Commit | Line | Data |
---|---|---|
86530b38 AT |
1 | |
2 | =head1 NAME | |
3 | ||
4 | Bit::Vector::String - Generic string import/export for Bit::Vector | |
5 | ||
6 | =head1 SYNOPSIS | |
7 | ||
8 | use Bit::Vector::String; | |
9 | ||
10 | to_Oct | |
11 | $string = $vector->to_Oct(); | |
12 | ||
13 | from_Oct | |
14 | $vector->from_Oct($string); | |
15 | ||
16 | new_Oct | |
17 | $vector = Bit::Vector->new_Oct($bits,$string); | |
18 | ||
19 | String_Export | |
20 | $string = $vector->String_Export($type); | |
21 | ||
22 | String_Import | |
23 | $type = $vector->String_Import($string); | |
24 | ||
25 | new_String | |
26 | $vector = Bit::Vector->new_String($bits,$string); | |
27 | ($vector,$type) = Bit::Vector->new_String($bits,$string); | |
28 | ||
29 | =head1 DESCRIPTION | |
30 | ||
31 | =over 2 | |
32 | ||
33 | =item * | |
34 | ||
35 | C<$string = $vector-E<gt>to_Oct();> | |
36 | ||
37 | Returns an octal string representing the given bit vector. | |
38 | ||
39 | Note that this method is not particularly efficient, since it | |
40 | is almost completely realized in Perl, and moreover internally | |
41 | operates on a Perl list of individual octal digits which it | |
42 | concatenates into the final string using "C<join('', ...)>". | |
43 | ||
44 | A benchmark reveals that this method is about 40 times slower | |
45 | than the method "C<to_Bin()>" (which is realized in C): | |
46 | ||
47 | Benchmark: timing 10000 iterations of to_Bin, to_Hex, to_Oct... | |
48 | to_Bin: 1 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) | |
49 | to_Hex: 1 wallclock secs ( 0.53 usr + 0.00 sys = 0.53 CPU) | |
50 | to_Oct: 40 wallclock secs (40.16 usr + 0.05 sys = 40.21 CPU) | |
51 | ||
52 | Note that since an octal digit is always worth three bits, | |
53 | the resulting string always represents a multiple of three | |
54 | bits, regardless of the true length (in bits) of the given | |
55 | bit vector. | |
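The three-bits-per-digit rule can be illustrated with a minimal pure-Perl sketch. The helper C<bin_to_oct_sketch()> is hypothetical and is not part of this module; it assumes the input is a string of binary digits with the most significant bit first:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bit::Vector::String): converts a string
# of binary digits, most significant bit first, into octal digits.
sub bin_to_oct_sketch {
    my ($bin) = @_;
    die "not a binary string\n" unless $bin =~ /^[01]+$/;
    # Left-pad with zeros so that the length is a multiple of three bits.
    my $pad = (3 - length($bin) % 3) % 3;
    $bin = ('0' x $pad) . $bin;
    # Each group of three bits becomes exactly one octal digit.
    return join('', map { oct("0b$_") } $bin =~ /(...)/g);
}

# 8 bits are padded to 9 bits, which yields 3 octal digits.
print bin_to_oct_sketch('10111010'), "\n";
```

Here an input of 8 bits produces 3 octal digits, i.e., a string that stands for 9 bits.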
56 | ||
57 | Also note that the B<LEAST> significant octal digit is | |
58 | located at the B<RIGHT> end of the resulting string, and | |
59 | the B<MOST> significant digit at the B<LEFT> end. | |
60 | ||
61 | Finally, note that this method does B<NOT> prepend any uniquely | |
62 | identifying format prefix (such as "0o") to the resulting string | |
63 | (which means that the result of this method only contains valid | |
64 | octal digits, i.e., [0-7]). | |
65 | ||
66 | If needed, however, such a prefix can easily be prepended | |
67 | as follows: | |
68 | ||
69 | $string = '0o' . $vector->to_Oct(); | |
70 | ||
71 | =item * | |
72 | ||
73 | C<$vector-E<gt>from_Oct($string);> | |
74 | ||
75 | Reads in the contents of a bit vector from an octal string, | |
76 | such as the one returned by the method "C<to_Oct()>" (see above). | |
77 | ||
78 | Note that this method is not particularly efficient, since it is | |
79 | almost completely realized in Perl, and moreover chops the input | |
80 | string into individual characters using "C<split(//, $string)>". | |
81 | ||
82 | Remember also that the least significant bits are always to the | |
83 | right of an octal string, and the most significant bits to the left. | |
84 | Therefore, the string is actually reversed internally before storing | |
85 | it in the given bit vector using the method "C<Chunk_List_Store()>", | |
86 | which expects the least significant chunks of data at the beginning | |
87 | of a list. | |
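The reversal described above can be sketched in pure Perl. The helper C<oct_chunks_sketch()> is hypothetical and only shows the ordering of the digits; it is not the module's actual code:

```perl
use strict;
use warnings;

# Hypothetical helper: splits an octal string into single digits and
# returns them with the LEAST significant digit first.
sub oct_chunks_sketch {
    my ($string) = @_;
    die "unknown string type\n" unless $string =~ /^[0-7]+$/;
    # Call in list context: reverse() then reverses the list of digits.
    return reverse split(//, $string);
}

my @chunks = oct_chunks_sketch('175');
print "@chunks\n";    # the rightmost digit of the string comes first
```

A list in this order is the kind of argument that "C<Chunk_List_Store()>" expects, presumably with a chunk size of three bits per digit.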
88 | ||
89 | A benchmark reveals that this method is about 40 times slower than | |
90 | the method "C<from_Bin()>" (which is realized in C): | |
91 | ||
92 | Benchmark: timing 10000 iterations of from_Bin, from_Hex, from_Oct... | |
93 | from_Bin: 1 wallclock secs ( 1.13 usr + 0.00 sys = 1.13 CPU) | |
94 | from_Hex: 1 wallclock secs ( 0.80 usr + 0.00 sys = 0.80 CPU) | |
95 | from_Oct: 46 wallclock secs (44.95 usr + 0.00 sys = 44.95 CPU) | |
96 | ||
97 | If the given string contains any character which is not an octal digit | |
98 | (i.e., [0-7]), a fatal syntax error ensues ("unknown string type"). | |
99 | ||
100 | Note especially that this method does B<NOT> accept any uniquely | |
101 | identifying format prefix (such as "0o") in the given string; the | |
102 | presence of such a prefix will also lead to the fatal "unknown | |
103 | string type" error. | |
104 | ||
105 | If the given string contains fewer octal digits than are needed to | |
106 | completely fill the given bit vector, the remaining (most significant) | |
107 | bits all remain cleared (i.e., set to zero). | |
108 | ||
109 | This also means that, even if the given string does not contain | |
110 | enough digits to completely fill the given bit vector, the previous | |
111 | contents of the bit vector are erased completely. | |
112 | ||
113 | If the given string is longer than needed to fill the given bit | |
114 | vector, the superfluous characters are simply ignored. | |
115 | ||
116 | This behaviour is intentional so that you may read in the string | |
117 | representing one bit vector into another bit vector of different | |
118 | size, i.e., as much of it as will fit. | |
119 | ||
120 | =item * | |
121 | ||
122 | C<$vector = Bit::Vector-E<gt>new_Oct($bits,$string);> | |
123 | ||
124 | This method is an alternative constructor which allows you to create | |
125 | a new bit vector object (with "C<$bits>" bits) and to initialize it | |
126 | all in one go. | |
127 | ||
128 | The method internally first calls the bit vector constructor method | |
129 | "C<new()>" and then stores the given string in the newly created | |
130 | bit vector using the same approach as the method "C<from_Oct()>" | |
131 | (described above). | |
132 | ||
133 | Note that this approach is not particularly efficient, since it | |
134 | is almost completely realized in Perl, and moreover chops the input | |
135 | string into individual characters using "C<split(//, $string)>". | |
136 | ||
137 | An exception will be raised if the necessary memory cannot be allocated | |
138 | (see the description of the method "C<new()>" in L<Bit::Vector(3)> for | |
139 | possible causes) or if the given string cannot be converted successfully | |
140 | (see the description of the method "C<from_Oct()>" above for details). | |
141 | ||
142 | Note especially that this method does B<NOT> accept any uniquely | |
143 | identifying format prefix (such as "0o") in the given string and that | |
144 | such a prefix will lead to a fatal "unknown string type" error. | |
145 | ||
146 | In case of an error, the memory occupied by the new bit vector is | |
147 | released again before the exception is actually thrown. | |
148 | ||
149 | If the number of bits "C<$bits>" given has the value "C<undef>", | |
150 | the method will automatically allocate a bit vector with a size | |
151 | (i.e., number of bits) of three times the length of the given string | |
152 | (since every octal digit is worth three bits). | |
153 | ||
154 | Note that this behaviour is different from that of the methods | |
155 | "C<new_Hex()>", "C<new_Bin()>", "C<new_Dec()>" and "C<new_Enum()>" | |
156 | (which are realized in C, internally); these methods will silently | |
157 | assume a value of 0 bits if "C<undef>" is given (and may warn | |
158 | about the "Use of uninitialized value" if warnings are enabled). | |
159 | ||
160 | =item * | |
161 | ||
162 | C<$string = $vector-E<gt>String_Export($type);> | |
163 | ||
164 | Returns a string representing the given bit vector in the | |
165 | format specified by "C<$type>": | |
166 | ||
167 | 1 | b | bin => binary (using "to_Bin()") | |
168 | 2 | o | oct => octal (using "to_Oct()") | |
169 | 3 | d | dec => decimal (using "to_Dec()") | |
170 | 4 | h | hex | x => hexadecimal (using "to_Hex()") | |
171 | 5 | e | enum => enumeration (using "to_Enum()") | |
172 | 6 | p | pack => packed binary (using "Block_Read()") | |
173 | ||
174 | The case (lower/upper/mixed case) of "C<$type>" is ignored. | |
175 | ||
176 | If "C<$type>" is omitted or "C<undef>" or false ("0" | |
177 | or the empty string), a hexadecimal string is returned | |
178 | as the default format. | |
179 | ||
180 | If "C<$type>" does not have any of the values described | |
181 | above, a fatal "unknown string type" error will occur. | |
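The handling of "C<$type>" described above can be sketched as a simple lookup. The table C<%TYPE> and the helper C<type_number_sketch()> are hypothetical, and the sketch assumes that only the exact aliases listed above are accepted:

```perl
use strict;
use warnings;

# Hypothetical lookup table built from the aliases listed above.
my %TYPE = (
    '1' => 1, 'b' => 1, 'bin'  => 1,
    '2' => 2, 'o' => 2, 'oct'  => 2,
    '3' => 3, 'd' => 3, 'dec'  => 3,
    '4' => 4, 'h' => 4, 'hex'  => 4, 'x' => 4,
    '5' => 5, 'e' => 5, 'enum' => 5,
    '6' => 6, 'p' => 6, 'pack' => 6,
);

# Hypothetical helper: maps a type argument to its format number.
sub type_number_sketch {
    my ($type) = @_;
    return 4 unless $type;             # omitted, undef, "0" or "": hexadecimal
    my $number = $TYPE{ lc $type };    # case is ignored
    die "unknown string type\n" unless defined $number;
    return $number;
}

print type_number_sketch('OCT'), "\n";    # same format as 'oct', 'o' or 2
```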
182 | ||
183 | Beware that in order to guarantee that the strings can | |
184 | be correctly parsed and read in by the methods | |
185 | "C<String_Import()>" and "C<new_String()>" (described | |
186 | below), the method "C<String_Export()>" provides | |
187 | uniquely identifying prefixes (and, in one case, | |
188 | a suffix) as follows: | |
189 | ||
190 | 1 | b | bin => '0b' . $vector->to_Bin(); | |
191 | 2 | o | oct => '0o' . $vector->to_Oct(); | |
192 | 3 | d | dec => $vector->to_Dec(); # prefix is [+-] | |
193 | 4 | h | hex | x => '0x' . $vector->to_Hex(); | |
194 | 5 | e | enum => '{' . $vector->to_Enum() . '}'; | |
195 | 6 | p | pack => ':' . $vector->Size() . | |
196 | ':' . $vector->Block_Read(); | |
197 | ||
198 | This is necessary because certain strings can be valid | |
199 | representations in more than one format. | |
200 | ||
201 | All strings in binary format, i.e., which only contain "0" | |
202 | and "1", are also valid number representations (of a different | |
203 | value, of course) in octal, decimal and hexadecimal. | |
204 | ||
205 | Likewise, a string in octal format is also valid in decimal | |
206 | and hexadecimal, and a string in decimal format is also valid | |
207 | in hexadecimal. | |
208 | ||
209 | Moreover, if the enumeration of set bits (as returned by | |
210 | "C<to_Enum()>") only contains one element, this element could | |
211 | be mistaken for a representation of the entire bit vector | |
212 | (instead of just one bit) in decimal. | |
213 | ||
214 | Beware also that the string returned by format "6" ("packed | |
215 | binary") will in general B<NOT BE PRINTABLE>, because it will | |
216 | usually consist of many unprintable characters! | |
217 | ||
218 | =item * | |
219 | ||
220 | C<$type = $vector-E<gt>String_Import($string);> | |
221 | ||
222 | Reads in the contents of a bit vector from a string | |
223 | which has previously been produced by "C<String_Export()>", | |
224 | "C<to_Bin()>", "C<to_Oct()>", "C<to_Dec()>", "C<to_Hex()>", | |
225 | "C<to_Enum()>" or "C<Block_Read()>", or written manually or by | |
226 | another program. | |
227 | ||
228 | Beware however that the string must have the correct format; | |
229 | otherwise a fatal "unknown string type" error will occur. | |
230 | ||
231 | The correct format is the one returned by "C<String_Export()>" | |
232 | (see immediately above). | |
233 | ||
234 | The method will also try to automatically recognize formats | |
235 | without an identifying prefix, such as those returned by the | |
236 | methods "C<to_Bin()>", "C<to_Oct()>", "C<to_Dec()>", "C<to_Hex()>" | |
237 | and "C<to_Enum()>". | |
238 | ||
239 | However, as explained above for the method "C<String_Export()>", | |
240 | due to the fact that a string may be a valid representation in | |
241 | more than one format, this may lead to unwanted results. | |
242 | ||
243 | The method will try to match the format of the given string | |
244 | in the following order: | |
245 | ||
246 | If the string consists only of [01], it will be considered | |
247 | to be in binary format (although it could be in octal, decimal | |
248 | or hexadecimal format or even be an enumeration with only | |
249 | one element as well). | |
250 | ||
251 | If the string consists only of [0-7], it will be considered | |
252 | to be in octal format (although it could be in decimal or | |
253 | hexadecimal format or even be an enumeration with only | |
254 | one element as well). | |
255 | ||
256 | If the string consists only of [0-9], it will be considered | |
257 | to be in decimal format (although it could be in hexadecimal | |
258 | format or even be an enumeration with only one element as well). | |
259 | ||
260 | If the string consists only of [0-9A-Fa-f], it will be considered | |
261 | to be in hexadecimal format. | |
262 | ||
263 | If the string only contains numbers in decimal format, separated | |
264 | by commas (",") or dashes ("-"), it is considered to be an | |
265 | enumeration (a single decimal number also qualifies). | |
266 | ||
267 | And if the string starts with ":[0-9]+:", the remainder of the | |
268 | string is read in with "C<Block_Store()>". | |
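The matching order for strings without a prefix can be sketched as a chain of regular expressions. The helper C<guess_type_sketch()> is hypothetical and merely mirrors the order documented above; it is not the module's actual implementation, and the exact patterns are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the format number that the documented
# matching order would assign to a string WITHOUT identifying prefix.
sub guess_type_sketch {
    my ($string) = @_;
    return 1 if $string =~ /^[01]+$/;                   # binary
    return 2 if $string =~ /^[0-7]+$/;                  # octal
    return 3 if $string =~ /^[0-9]+$/;                  # decimal
    return 4 if $string =~ /^[0-9A-Fa-f]+$/;            # hexadecimal
    return 5 if $string =~ /^[0-9]+(?:[,-][0-9]+)*$/;   # enumeration
    # Assumption: the size between the colons may have several digits.
    return 6 if $string =~ /^:[0-9]+:/;                 # packed binary
    return;                                             # no format matched
}

for my $s ('1011', '1017', '1019', '101F', '2,5-7') {
    print "$s => ", guess_type_sketch($s), "\n";
}
```

The first four inputs differ only in their last character, yet each one falls through to the next format in the list, which shows how easily an unprefixed string can be misinterpreted.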
269 | ||
270 | To avoid misinterpretations, it is therefore advisable to | |
271 | always either use the method "C<String_Export()>" or provide | |
272 | some uniquely identifying prefix (and suffix, in one case) | |
273 | yourself: | |
274 | ||
275 | binary => '0b' . $string; | |
276 | octal => '0o' . $string; | |
277 | decimal => '+' . $string; # in case "$string" | |
278 | => '-' . $string; # has no sign yet | |
279 | hexadecimal => '0x' . $string; | |
280 | => '0h' . $string; | |
281 | enumeration => '{' . $string . '}'; | |
282 | => '[' . $string . ']'; | |
283 | => '<' . $string . '>'; | |
284 | => '(' . $string . ')'; | |
285 | packed binary => ':' . $vector->Size() . | |
286 | ':' . $vector->Block_Read(); | |
287 | ||
288 | Note that case (lower/upper/mixed case) is not important | |
289 | and will be ignored by this method. | |
290 | ||
291 | Internally, the method uses the methods "C<from_Bin()>", | |
292 | "C<from_Oct()>", "C<from_Dec()>", "C<from_Hex()>", | |
293 | "C<from_Enum()>" and "C<Block_Store()>" for actually | |
294 | importing the contents of the string into the given | |
295 | bit vector. See their descriptions in this document | |
296 | and in L<Bit::Vector(3)> for any further conditions that | |
297 | must be met and corresponding possible fatal error messages. | |
298 | ||
299 | The method returns the number of the format that has been | |
300 | recognized: | |
301 | ||
302 | 1 => binary | |
303 | 2 => octal | |
304 | 3 => decimal | |
305 | 4 => hexadecimal | |
306 | 5 => enumeration | |
307 | 6 => packed binary | |
308 | ||
309 | =item * | |
310 | ||
311 | C<$vector = Bit::Vector-E<gt>new_String($bits,$string);> | |
312 | ||
313 | C<($vector,$type) = Bit::Vector-E<gt>new_String($bits,$string);> | |
314 | ||
315 | This method is an alternative constructor which allows you to create | |
316 | a new bit vector object (with "C<$bits>" bits) and to initialize it | |
317 | all in one go. | |
318 | ||
319 | The method internally first calls the bit vector constructor method | |
320 | "C<new()>" and then stores the given string in the newly created | |
321 | bit vector using the same approach as the method "C<String_Import()>" | |
322 | (described immediately above). | |
323 | ||
324 | An exception will be raised if the necessary memory cannot be allocated | |
325 | (see the description of the method "C<new()>" in L<Bit::Vector(3)> for | |
326 | possible causes) or if the given string cannot be converted successfully | |
327 | (see the description of the method "C<String_Import()>" above for details). | |
328 | ||
329 | In case of an error, the memory occupied by the new bit vector is | |
330 | released again before the exception is actually thrown. | |
331 | ||
332 | If the number of bits "C<$bits>" given has the value "C<undef>", the | |
333 | method will automatically determine this value for you and allocate | |
334 | a bit vector of the calculated size. | |
335 | ||
336 | Note that this behaviour is different from that of the methods | |
337 | "C<new_Hex()>", "C<new_Bin()>", "C<new_Dec()>" and "C<new_Enum()>" | |
338 | (which are realized in C, internally); these methods will silently | |
339 | assume a value of 0 bits if "C<undef>" is given (and may warn | |
340 | about the "Use of uninitialized value" if warnings are enabled). | |
341 | ||
342 | The necessary number of bits is calculated as follows: | |
343 | ||
344 | binary => length($string); | |
345 | octal => 3 * length($string); | |
346 | decimal => int( length($string) * log(10) / log(2) + 1 ); | |
347 | hexadecimal => 4 * length($string); | |
348 | enumeration => maximum of values found in $string + 1 | |
349 | packed binary => $string =~ /^:(\d+):/; | |
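The size formulas above can be tried out with a minimal pure-Perl sketch. The helper C<bits_needed_sketch()> is hypothetical; it assumes that "C<$type>" is the format number and that "C<$string>" holds only the digits, without prefix, sign or brackets (except for packed binary, where the leading ":size:" header must still be present):

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: applies the documented size formulas.
sub bits_needed_sketch {
    my ($type, $string) = @_;
    return length($string)     if $type == 1;               # binary
    return 3 * length($string) if $type == 2;               # octal
    return int(length($string) * log(10) / log(2) + 1)
        if $type == 3;                                      # decimal
    return 4 * length($string) if $type == 4;               # hexadecimal
    return max($string =~ /([0-9]+)/g) + 1 if $type == 5;   # enumeration
    return $1 if $type == 6 && $string =~ /^:(\d+):/;       # packed binary
    die "unknown string type\n";
}

print bits_needed_sketch(2, '377'), "\n";      # 3 digits * 3 bits = 9
print bits_needed_sketch(3, '255'), "\n";      # int(3 * 3.32... + 1) = 10
print bits_needed_sketch(5, '2,5-7'), "\n";    # highest index 7, plus 1 = 8
```

Note that the decimal formula is an upper bound: the value 255 fits into 8 bits, but three decimal digits can reach 999, which needs 10 bits.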
350 | ||
351 | If called in scalar context, the method returns the newly created | |
352 | bit vector object. | |
353 | ||
354 | If called in list context, the method additionally returns the | |
355 | number of the format which has been recognized, as explained | |
356 | above for the method "C<String_Import()>". | |
357 | ||
358 | =back | |
359 | ||
360 | =head1 SEE ALSO | |
361 | ||
362 | Bit::Vector(3), Bit::Vector::Overload(3). | |
363 | ||
364 | =head1 VERSION | |
365 | ||
366 | This man page documents "Bit::Vector::String" version 6.4. | |
367 | ||
368 | =head1 AUTHOR | |
369 | ||
370 | Steffen Beyer | |
371 | mailto:sb@engelschall.com | |
372 | http://www.engelschall.com/u/sb/download/ | |
373 | ||
374 | =head1 COPYRIGHT | |
375 | ||
376 | Copyright (c) 2004 by Steffen Beyer. All rights reserved. | |
377 | ||
378 | =head1 LICENSE | |
379 | ||
380 | This package is free software; you can redistribute it and/or | |
381 | modify it under the same terms as Perl itself, i.e., under the | |
382 | terms of the "Artistic License" or the "GNU General Public License". | |
383 | ||
384 | The C library at the core of this Perl module can additionally | |
385 | be redistributed and/or modified under the terms of the "GNU | |
386 | Library General Public License". | |
387 | ||
388 | Please refer to the files "Artistic.txt", "GNU_GPL.txt" and | |
389 | "GNU_LGPL.txt" in this distribution for details! | |
390 | ||
391 | =head1 DISCLAIMER | |
392 | ||
393 | This package is distributed in the hope that it will be useful, | |
394 | but WITHOUT ANY WARRANTY; without even the implied warranty of | |
395 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | |
396 | ||
397 | See the "GNU General Public License" for more details. | |
398 |