=head1 NAME

Bit::Vector::String - Generic string import/export for Bit::Vector

=head1 SYNOPSIS

  use Bit::Vector::String;

  to_Oct
      $string = $vector->to_Oct();

  from_Oct
      $vector->from_Oct($string);

  new_Oct
      $vector = Bit::Vector->new_Oct($bits,$string);

  String_Export
      $string = $vector->String_Export($type);

  String_Import
      $type = $vector->String_Import($string);

  new_String
      $vector = Bit::Vector->new_String($bits,$string);
      ($vector,$type) = Bit::Vector->new_String($bits,$string);

=head1 DESCRIPTION

=over 2

=item *

C<$string = $vector-E<gt>to_Oct();>

Returns an octal string representing the given bit vector.

Note that this method is not particularly efficient, since it
is implemented almost entirely in Perl and internally operates
on a Perl list of individual octal digits, which it concatenates
into the final string using "C<join('', ...)>".

A benchmark shows that this method is about 40 times slower
than the method "C<to_Bin()>" (which is implemented in C):

 Benchmark: timing 10000 iterations of to_Bin, to_Hex, to_Oct...
     to_Bin:  1 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU)
     to_Hex:  1 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU)
     to_Oct: 40 wallclock secs (40.16 usr +  0.05 sys = 40.21 CPU)

Note that since an octal digit always represents three bits,
the number of bits represented by the resulting string is
always a multiple of three, regardless of the true length
(in bits) of the given bit vector.

Also note that the B<LEAST> significant octal digit is
located at the B<RIGHT> end of the resulting string, and
the B<MOST> significant digit at the B<LEFT> end.

Finally, note that this method does B<NOT> prepend any uniquely
identifying format prefix (such as "0o") to the resulting string
(which means that the result of this method only contains valid
octal digits, i.e., [0-7]).

However, such a prefix is easily added where needed, as follows:

  $string = '0o' . $vector->to_Oct();
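
The grouping of bits into octal digits described above can be
illustrated without the module. The following is only a minimal
sketch in plain Perl: "C<bin_to_oct()>" is a hypothetical helper,
not a function of "Bit::Vector::String", and it assumes its input
is a string of binary digits with the most significant bit at
the left end.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bit::Vector::String): converts a string
# of binary digits, most significant bit first, into octal digits.
sub bin_to_oct {
    my ($bin) = @_;
    die "not a binary string\n" unless $bin =~ /\A[01]+\z/;
    # Pad at the most significant (left) end to a multiple of three bits.
    my $pad = (3 - length($bin) % 3) % 3;
    $bin = ('0' x $pad) . $bin;
    # Every group of three bits becomes exactly one octal digit.
    return join '', map { oct("0b$_") } unpack('(A3)*', $bin);
}

print bin_to_oct('10110'), "\n";   # 5 bits are padded to 6: prints "26"
```

In this sketch a 5-bit input yields two octal digits, i.e., six
bits' worth of output, which is the rounding effect noted above.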

=item *

C<$vector-E<gt>from_Oct($string);>

Reads the contents of a bit vector from an octal string, such as
the one returned by the method "C<to_Oct()>" (see above).

Note that this method is not particularly efficient, since it is
implemented almost entirely in Perl and splits the input string
into individual characters using "C<split(//, $string)>".

Remember also that the least significant bits are always at the
right end of an octal string, and the most significant bits at the
left end. Therefore, the list of digits is reversed internally before
it is stored in the given bit vector using the method
"C<Chunk_List_Store()>", which expects the least significant chunks
of data at the beginning of a list.

A benchmark shows that this method is about 40 times slower than
the method "C<from_Bin()>" (which is implemented in C):

 Benchmark: timing 10000 iterations of from_Bin, from_Hex, from_Oct...
   from_Bin:  1 wallclock secs ( 1.13 usr +  0.00 sys =  1.13 CPU)
   from_Hex:  1 wallclock secs ( 0.80 usr +  0.00 sys =  0.80 CPU)
   from_Oct: 46 wallclock secs (44.95 usr +  0.00 sys = 44.95 CPU)

If the given string contains any character which is not an octal digit
(i.e., [0-7]), a fatal syntax error ensues ("unknown string type").

Note especially that this method does B<NOT> accept any uniquely
identifying format prefix (such as "0o") in the given string; the
presence of such a prefix will also lead to the fatal "unknown
string type" error.

If the given string contains fewer octal digits than are needed to
completely fill the given bit vector, the remaining (most significant)
bits are all cleared (i.e., set to zero).

This also means that, even if the given string does not contain
enough digits to completely fill the given bit vector, the previous
contents of the bit vector are erased completely.

If the given string is longer than is needed to fill the given bit
vector, the superfluous characters are simply ignored.

This behaviour is intentional, so that you can read the string
representing one bit vector into another bit vector of a different
size, i.e., as much of it as will fit.
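
The rules above (reversal of the digit list, zero-filling and
truncation) can be modelled in a few lines of plain Perl. This is
only an illustrative sketch and not the module's implementation:
"C<oct_to_bin()>" is a hypothetical helper which returns the
resulting bits as a binary string instead of storing them in a bit
vector, and it assumes that the digits which do not fit are the
most significant ones, as storing the least significant digits
first implies.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bit::Vector::String): models how an
# octal string would fill a vector of $bits bits. Returns a binary
# string with the most significant bit at the left end.
sub oct_to_bin {
    my ($bits, $string) = @_;
    die "unknown string type\n" unless $string =~ /\A[0-7]+\z/;
    my @lsb_first;
    # Walk the digits from the right (least significant) end.
    for my $digit (reverse split //, $string) {
        my $value = 0 + $digit;
        push @lsb_first, $value & 1, ($value >> 1) & 1, ($value >> 2) & 1;
    }
    push @lsb_first, (0) x $bits;   # zero-fill bits the string did not cover
    splice @lsb_first, $bits;       # drop bits that do not fit
    return join '', reverse @lsb_first;
}

print oct_to_bin(8, '17'),  "\n";   # prints "00001111"
print oct_to_bin(4, '777'), "\n";   # prints "1111"
```

A string carrying a prefix, such as "0o17", fails the initial check
in this sketch and dies with "unknown string type", mirroring the
behaviour described above.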

=item *

C<$vector = Bit::Vector-E<gt>new_Oct($bits,$string);>

This method is an alternative constructor which allows you to create
a new bit vector object (with "C<$bits>" bits) and to initialize it
all in one go.

The method internally first calls the bit vector constructor method
"C<new()>" and then stores the given string in the newly created
bit vector using the same approach as the method "C<from_Oct()>"
(described above).

Note that this approach is not particularly efficient, since it
is implemented almost entirely in Perl and splits the input
string into individual characters using "C<split(//, $string)>".

An exception will be raised if the necessary memory cannot be allocated
(see the description of the method "C<new()>" in L<Bit::Vector(3)> for
possible causes) or if the given string cannot be converted successfully
(see the description of the method "C<from_Oct()>" above for details).

Note especially that this method does B<NOT> accept any uniquely
identifying format prefix (such as "0o") in the given string and that
such a prefix will lead to a fatal "unknown string type" error.

In case of an error, the memory occupied by the new bit vector is
released again before the exception is actually thrown.

If the number of bits "C<$bits>" given has the value "C<undef>",
the method will automatically allocate a bit vector with a size
(i.e., number of bits) of three times the length of the given string
(since every octal digit represents three bits).

Note that this behaviour is different from that of the methods
"C<new_Hex()>", "C<new_Bin()>", "C<new_Dec()>" and "C<new_Enum()>"
(which are implemented in C); these methods will silently
assume a value of 0 bits if "C<undef>" is given (and may warn
about the "Use of uninitialized value" if warnings are enabled).

=item *

C<$string = $vector-E<gt>String_Export($type);>

Returns a string representing the given bit vector in the
format specified by "C<$type>":

  1 | b | bin      =>  binary        (using "to_Bin()")
  2 | o | oct      =>  octal         (using "to_Oct()")
  3 | d | dec      =>  decimal       (using "to_Dec()")
  4 | h | hex | x  =>  hexadecimal   (using "to_Hex()")
  5 | e | enum     =>  enumeration   (using "to_Enum()")
  6 | p | pack     =>  packed binary (using "Block_Read()")

The case (lower/upper/mixed case) of "C<$type>" is ignored.

If "C<$type>" is omitted or "C<undef>" or false ("0"
or the empty string), a hexadecimal string is returned
as the default format.

If "C<$type>" does not have any of the values described
above, a fatal "unknown string type" error will occur.

Beware that in order to guarantee that the strings can
be correctly parsed and read in by the methods
"C<String_Import()>" and "C<new_String()>" (described
below), the method "C<String_Export()>" provides
uniquely identifying prefixes (and, in one case,
a suffix) as follows:

  1 | b | bin      =>  '0b' . $vector->to_Bin();
  2 | o | oct      =>  '0o' . $vector->to_Oct();
  3 | d | dec      =>         $vector->to_Dec(); # prefix is [+-]
  4 | h | hex | x  =>  '0x' . $vector->to_Hex();
  5 | e | enum     =>  '{'  . $vector->to_Enum() . '}';
  6 | p | pack     =>  ':'  . $vector->Size() .
                       ':'  . $vector->Block_Read();

This is necessary because certain strings can be valid
representations in more than one format.

All strings in binary format, i.e., which only contain "0"
and "1", are also valid number representations (of a different
value, of course) in octal, decimal and hexadecimal.

Likewise, a string in octal format is also valid in decimal
and hexadecimal, and a string in decimal format is also valid
in hexadecimal.

Moreover, if the enumeration of set bits (as returned by
"C<to_Enum()>") only contains one element, this element could
be mistaken for a representation of the entire bit vector
(instead of just one bit) in decimal.

Beware also that the string returned by format "6" ("packed
binary") will in general B<NOT BE PRINTABLE>, because it will
usually consist of many unprintable characters!
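
The handling of "C<$type>" described above (aliases, ignored case
and the hexadecimal default) can be sketched as a simple lookup
table. This is an illustration only, written in plain Perl;
"C<type_number()>" and "C<%TYPE_NUMBER>" are hypothetical names and
are not taken from the module.

```perl
use strict;
use warnings;

# Alias table as documented above; keys are stored in lower case.
my %TYPE_NUMBER = (
    '1' => 1, 'b' => 1, 'bin'  => 1,
    '2' => 2, 'o' => 2, 'oct'  => 2,
    '3' => 3, 'd' => 3, 'dec'  => 3,
    '4' => 4, 'h' => 4, 'hex'  => 4, 'x' => 4,
    '5' => 5, 'e' => 5, 'enum' => 5,
    '6' => 6, 'p' => 6, 'pack' => 6,
);

# Hypothetical helper: maps a $type argument to its format number.
sub type_number {
    my ($type) = @_;
    return 4 unless $type;     # omitted, undef, "0" or "": hexadecimal
    my $number = $TYPE_NUMBER{ lc $type };
    die "unknown string type\n" unless defined $number;
    return $number;
}

print type_number('HEX'), "\n";   # prints 4
print type_number('o'),   "\n";   # prints 2
print type_number(),      "\n";   # prints 4 (the default)
```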

=item *

C<$type = $vector-E<gt>String_Import($string);>

Reads the contents of a bit vector from a string which has
previously been produced by "C<String_Export()>", "C<to_Bin()>",
"C<to_Oct()>", "C<to_Dec()>", "C<to_Hex()>", "C<to_Enum()>" or
"C<Block_Read()>", written manually, or generated by another
program.

Beware however that the string must have the correct format;
otherwise a fatal "unknown string type" error will occur.

The correct format is the one returned by "C<String_Export()>"
(see immediately above).

The method will also try to automatically recognize formats
without an identifying prefix, such as those returned by the
methods "C<to_Bin()>", "C<to_Oct()>", "C<to_Dec()>", "C<to_Hex()>"
and "C<to_Enum()>".

However, as explained above for the method "C<String_Export()>",
due to the fact that a string may be a valid representation in
more than one format, this may lead to unwanted results.

The method will try to match the format of the given string
in the following order:

If the string consists only of [01], it will be considered
to be in binary format (although it could be in octal, decimal
or hexadecimal format or even be an enumeration with only
one element as well).

If the string consists only of [0-7], it will be considered
to be in octal format (although it could be in decimal or
hexadecimal format or even be an enumeration with only
one element as well).

If the string consists only of [0-9], it will be considered
to be in decimal format (although it could be in hexadecimal
format or even be an enumeration with only one element as well).

If the string consists only of [0-9A-Fa-f], it will be considered
to be in hexadecimal format.

If the string only contains numbers in decimal format, separated
by commas (",") or dashes ("-"), it is considered to be an
enumeration (a single decimal number also qualifies).

And if the string starts with ":[0-9]+:", the remainder of the
string is read in with "C<Block_Store()>".

To avoid misinterpretations, it is therefore advisable to
always either use the method "C<String_Export()>" or to provide
some uniquely identifying prefix (and suffix, in one case)
yourself:

  binary         =>  '0b' . $string;
  octal          =>  '0o' . $string;
  decimal        =>  '+'  . $string; # in case "$string"
                 =>  '-'  . $string; # has no sign yet
  hexadecimal    =>  '0x' . $string;
                 =>  '0h' . $string;
  enumeration    =>  '{'  . $string . '}';
                 =>  '['  . $string . ']';
                 =>  '<'  . $string . '>';
                 =>  '('  . $string . ')';
  packed binary  =>  ':'  . $vector->Size() .
                     ':'  . $vector->Block_Read();

Note that case (lower/upper/mixed case) is not important
and will be ignored by this method.

Internally, the method uses the methods "C<from_Bin()>",
"C<from_Oct()>", "C<from_Dec()>", "C<from_Hex()>",
"C<from_Enum()>" and "C<Block_Store()>" for actually
importing the contents of the string into the given
bit vector. See their descriptions in this document
and in L<Bit::Vector(3)> for any further conditions that
must be met and the corresponding possible fatal error messages.

The method returns the number of the format that has been
recognized:

  1  =>  binary
  2  =>  octal
  3  =>  decimal
  4  =>  hexadecimal
  5  =>  enumeration
  6  =>  packed binary
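
The recognition order for strings without an identifying prefix
can be sketched as a chain of pattern matches. The following plain
Perl is an illustration under stated assumptions, not the module's
code: "C<guess_type()>" is a hypothetical helper, prefixed forms
other than packed binary are deliberately left out, and the
enumeration pattern is a simplification of the syntax accepted by
"C<from_Enum()>".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the format number for an unprefixed
# string, testing the formats in the order documented above.
sub guess_type {
    my ($string) = @_;
    return 1 if $string =~ /\A[01]+\z/;                     # binary
    return 2 if $string =~ /\A[0-7]+\z/;                    # octal
    return 3 if $string =~ /\A[0-9]+\z/;                    # decimal
    return 4 if $string =~ /\A[0-9A-Fa-f]+\z/;              # hexadecimal
    return 5 if $string =~ /\A[0-9]+(?:[,-][0-9]+)*\z/;     # enumeration
    return 6 if $string =~ /\A:[0-9]+:/;                    # packed binary
    die "unknown string type\n";
}

print guess_type('101'),   "\n";   # prints 1, although "101" is valid octal too
print guess_type('19'),    "\n";   # prints 3
print guess_type('1,3-5'), "\n";   # prints 5
```

Because the earlier tests win, a string such as "10" is reported as
binary in this sketch even if the caller meant decimal ten, which is
exactly the ambiguity that an explicit prefix avoids.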

=item *

C<$vector = Bit::Vector-E<gt>new_String($bits,$string);>

C<($vector,$type) = Bit::Vector-E<gt>new_String($bits,$string);>

This method is an alternative constructor which allows you to create
a new bit vector object (with "C<$bits>" bits) and to initialize it
all in one go.

The method internally first calls the bit vector constructor method
"C<new()>" and then stores the given string in the newly created
bit vector using the same approach as the method "C<String_Import()>"
(described immediately above).

An exception will be raised if the necessary memory cannot be allocated
(see the description of the method "C<new()>" in L<Bit::Vector(3)> for
possible causes) or if the given string cannot be converted successfully
(see the description of the method "C<String_Import()>" above for details).

In case of an error, the memory occupied by the new bit vector is
released again before the exception is actually thrown.

If the number of bits "C<$bits>" given has the value "C<undef>", the
method will automatically determine this value for you and allocate
a bit vector of the calculated size.

Note that this behaviour is different from that of the methods
"C<new_Hex()>", "C<new_Bin()>", "C<new_Dec()>" and "C<new_Enum()>"
(which are implemented in C); these methods will silently
assume a value of 0 bits if "C<undef>" is given (and may warn
about the "Use of uninitialized value" if warnings are enabled).

The necessary number of bits is calculated as follows:

  binary         =>       length($string);
  octal          =>   3 * length($string);
  decimal        =>  int( length($string) * log(10) / log(2) + 1 );
  hexadecimal    =>   4 * length($string);
  enumeration    =>  maximum of values found in $string + 1
  packed binary  =>  $string =~ /^:(\d+):/;

If called in scalar context, the method returns the newly created
bit vector object.

If called in list context, the method additionally returns the
number of the format which has been recognized, as explained
above for the method "C<String_Import()>".
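
The size calculation listed above can be written out as runnable
Perl. This is only a sketch: "C<bits_needed()>" is a hypothetical
helper, it takes the format number as an argument instead of
detecting it, and it assumes that "C<$string>" carries no prefix
(except for packed binary, where the size is read from the prefix
itself).

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: number of bits implied by $string for the
# given format number (1..6), following the table above.
sub bits_needed {
    my ($type, $string) = @_;
    return length($string)     if $type == 1;             # binary
    return 3 * length($string) if $type == 2;             # octal
    return int(length($string) * log(10) / log(2) + 1)
                               if $type == 3;             # decimal
    return 4 * length($string) if $type == 4;             # hexadecimal
    return max($string =~ /(\d+)/g) + 1 if $type == 5;    # enumeration
    return $1 if $type == 6 && $string =~ /^:(\d+):/;     # packed binary
    die "unknown string type\n";
}

print bits_needed(3, '255'),   "\n";   # prints 10
print bits_needed(5, '1,3-5'), "\n";   # prints 6
```

Note that the decimal estimate is an upper bound derived from the
number of digits only: "255" fits into 8 bits, but three decimal
digits can encode values up to 999, so 10 bits are reserved.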

=back

=head1 SEE ALSO

Bit::Vector(3), Bit::Vector::Overload(3).

=head1 VERSION

This man page documents "Bit::Vector::String" version 6.4.

=head1 AUTHOR

  Steffen Beyer
  mailto:sb@engelschall.com
  http://www.engelschall.com/u/sb/download/

=head1 COPYRIGHT

Copyright (c) 2004 by Steffen Beyer. All rights reserved.

=head1 LICENSE

This package is free software; you can redistribute it and/or
modify it under the same terms as Perl itself, i.e., under the
terms of the "Artistic License" or the "GNU General Public License".

The C library at the core of this Perl module can additionally
be redistributed and/or modified under the terms of the "GNU
Library General Public License".

Please refer to the files "Artistic.txt", "GNU_GPL.txt" and
"GNU_LGPL.txt" in this distribution for details!

=head1 DISCLAIMER

This package is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the "GNU General Public License" for more details.