package PerlIO;

our $VERSION = '1.04';

# Map layer name to package that defines it
our %alias;

# Called (in effect) as "use PerlIO 'foo'" when an open() or binmode()
# layer specification names a layer 'foo' that is not yet defined.
# Each named layer is loaded from the package given in %alias, or
# from PerlIO::<name> if there is no alias for it.
sub import
{
    my $class = shift;
    while (@_)
    {
        my $layer = shift;
        if (exists $alias{$layer})
        {
            $layer = $alias{$layer};
        }
        else
        {
            $layer = "${class}::$layer";
        }
        # String eval so that $layer is treated as a module name;
        # a failure to load is reported as a warning, not fatal.
        eval "require $layer";
        warn $@ if $@;
    }
}

# Value of the PERLIO_F_UTF8 bit in a layer's flags, as returned
# by PerlIO::get_layers($fh, details => 1)
sub F_UTF8 () { 0x8000 }

1;
__END__

=head1 NAME

PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space

=head1 SYNOPSIS

    open($fh, "<:crlf", "my.txt"); # support platform-native and CRLF text files

    open($fh, "<", "his.jpg");     # portably open a binary file for reading
    binmode($fh);

    Shell:
      PERLIO=perlio perl ....

=head1 DESCRIPTION

When an undefined layer 'foo' is encountered in an C<open> or
C<binmode> layer specification, C code performs the equivalent of:

    use PerlIO 'foo';

The Perl code in PerlIO.pm then attempts to locate the layer by doing:

    require PerlIO::foo;

(If C<%PerlIO::alias> has an entry for 'foo', the package named there
is required instead.)

Apart from this on-demand loading, the C<PerlIO> package is a
placeholder for additional PerlIO-related functions.

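The name resolution done by C<import> can be shown with a small standalone function. This is a sketch, not part of the module. C<layer_package> and the C<mylayer> alias are hypothetical names made up for illustration, and the function only computes the package name. It does not call C<require>.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the name resolution in PerlIO::import:
# an entry in the alias map wins; otherwise the layer name is prefixed
# with the class name (normally "PerlIO"). Unlike import, it only
# computes the package name and does not require() anything.
sub layer_package {
    my ($class, $layer, $alias) = @_;
    return exists $alias->{$layer} ? $alias->{$layer} : "${class}::$layer";
}

# Hypothetical alias entry, standing in for %PerlIO::alias.
my %alias = (mylayer => 'My::Layer::Impl');

print layer_package('PerlIO', 'encoding', \%alias), "\n";   # PerlIO::encoding
print layer_package('PerlIO', 'mylayer',  \%alias), "\n";   # My::Layer::Impl
```
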
The following layers are currently defined:

=over 4

=item :unix

Lowest level layer which provides basic PerlIO operations in terms of
UNIX/POSIX numeric file descriptor calls
(open(), read(), write(), lseek(), close()).

=item :stdio

Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note
that as this is "real" stdio it will ignore any layers beneath it and
go straight to the operating system via the C library as usual.

=item :perlio

A from-scratch implementation of buffering for PerlIO. Provides fast
access to the buffer for C<sv_gets>, which implements Perl's readline/E<lt>E<gt>,
and in general attempts to minimize data copying.

C<:perlio> will insert a C<:unix> layer below itself to do low level IO.

=item :crlf

A layer that implements DOS/Windows-like CRLF line endings. On read it
converts pairs of CR,LF to a single "\n" newline character. On write it
converts each "\n" to a CR,LF pair. Note that this layer will not stack
on top of itself: it silently ignores attempts to be pushed into the
layer stack more than once.

It currently does I<not> mimic MS-DOS in treating Control-Z
as an end-of-file marker.

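A short round trip shows the translation in both directions. This sketch assumes a writable temporary directory, which it gets from the core C<File::Temp> module. It reads the file back through C<:raw> to expose the bytes actually stored.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Write through :crlf: each "\n" becomes CR,LF in the file.
open my $out, '>:crlf', $path or die "open: $!";
print {$out} "one\ntwo\n";
close $out or die "close: $!";

# Read the raw bytes back to see what actually landed on disk.
open my $raw, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read again through :crlf: CR,LF pairs become "\n" again.
open my $in, '<:crlf', $path or die "open: $!";
my $text = do { local $/; <$in> };
close $in;

printf "%d bytes on disk, %d characters read back\n",
    length $bytes, length $text;    # 10 bytes on disk, 8 characters read back
```
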
(Gory details follow) To be more exact, what happens is this: after
pushing itself to the stack, the C<:crlf> layer checks all the layers
below itself to find the first layer that is capable of being a CRLF
layer but is not yet enabled to be a CRLF layer. If it finds such a
layer, it enables the CRLFness of that other deeper layer, and then
pops itself off the stack. If it finds none, the newly pushed layer
stays in place.

The end result is that a C<:crlf> means "please enable the first CRLF
layer you can find, and if you can't find one, here would be a good
spot to place a new one."

Based on the C<:perlio> layer.

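C<PerlIO::get_layers> (described below) makes this behaviour visible. Pushing C<:crlf> twice still leaves a single C<crlf> entry on the stack. The following is a minimal sketch on a C<File::Temp> handle. The base layers it starts from depend on the platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh) = tempfile(UNLINK => 1);

# Push :crlf twice. A push that finds an existing CRLF layer directly
# below it (re-)enables that layer and removes itself again.
binmode($fh, ':crlf') or die "binmode: $!";
binmode($fh, ':crlf') or die "binmode: $!";

my $crlf_layers = grep { $_ eq 'crlf' } PerlIO::get_layers($fh);
print "crlf layers on the stack: $crlf_layers\n";    # 1
```
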
=item :mmap

A layer which implements "reading" of files by using C<mmap()> to
make the (whole) file appear in the process's address space, and then
using that as PerlIO's "buffer". This I<may> be faster in certain
circumstances for large files, and may result in less physical memory
use when multiple processes are reading the same file.

Files which are not C<mmap()>-able revert to behaving like the C<:perlio>
layer. Writes also behave like the C<:perlio> layer, as C<mmap()> for
writing needs extra housekeeping (to extend the file) which negates any
advantage.

The C<:mmap> layer will not exist if the platform does not support C<mmap()>.

=item :utf8

Declares that the stream accepts Perl's internal encoding of
characters. (This is UTF-8 on ASCII machines, but UTF-EBCDIC on
EBCDIC machines.) This allows any character Perl can represent to
be read from or written to the stream. The UTF-X encoding is chosen
to render simple text parts (i.e. non-accented letters, digits and
common punctuation) human readable in the encoded file.

Here is how to write your native data out using UTF-8 (or UTF-EBCDIC)
and then read it back in.

    open(my $out_fh, ">:utf8", "data.utf") or die "Cannot write data.utf: $!";
    print $out_fh $out;
    close($out_fh);

    open(my $in_fh, "<:utf8", "data.utf") or die "Cannot read data.utf: $!";
    $in = <$in_fh>;
    close($in_fh);

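The sketch below shows the difference between characters and the octets stored in the file. It writes a string containing two non-ASCII characters through C<:utf8>, then compares the raw file contents with the decoded string. It assumes an ASCII platform, because on EBCDIC the byte count differs. It also uses a C<File::Temp> file instead of C<data.utf>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);
my $out = "caf\x{e9} \x{263A}";    # 6 characters, two of them non-ASCII

open my $w, '>:utf8', $path or die "open: $!";
print {$w} $out;
close $w or die "close: $!";

# The octets actually stored in the file.
open my $raw, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# The same file decoded back into characters.
open my $r, '<:utf8', $path or die "open: $!";
my $in = do { local $/; <$r> };
close $r;

print length($bytes), " bytes, ", length($in), " characters\n";   # 9 bytes, 6 characters
```
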
=item :bytes

This is the inverse of the C<:utf8> layer. It turns off the flag
on the layer below so that data read from it is considered to
be "octets", i.e. characters in the range 0..255 only. Likewise,
on output Perl will warn if a "wide" character is written
to such a stream.

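The output side can be sketched as follows. Under C<:utf8> the handle first accepts a wide character without complaint. After C<binmode($fh, ':bytes')> clears the flag, the same C<print> triggers a "Wide character" warning, assuming C<use warnings> is in effect. The sketch collects warnings in C<@warnings> through C<$SIG{__WARN__}> rather than printing them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

open my $fh, '>:utf8', $path or die "open: $!";
print {$fh} "\x{263A}\n";    # no warning: the handle is flagged as UTF-8

binmode($fh, ':bytes') or die "binmode: $!";    # clear the UTF-8 flag again
print {$fh} "\x{263A}\n";    # wide character on an octet stream: warns
close $fh or die "close: $!";

print scalar(@warnings), " warning(s) collected\n";    # 1 warning(s) collected
```
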
=item :raw

The C<:raw> layer is I<defined> as being identical to calling
C<binmode($fh)>: the stream is made suitable for passing binary data,
i.e. each byte is passed as-is. The stream will still be
buffered.

In Perl 5.6 and some books the C<:raw> layer (previously sometimes also
referred to as a "discipline") is documented as the inverse of the
C<:crlf> layer. That is no longer the case: other layers which would
alter the binary nature of the stream are also disabled. If you want UNIX
line endings on a platform that normally does CRLF translation, but still
want UTF-8 or encoding defaults, the appropriate thing to do is to add
C<:perlio> to the PERLIO environment variable.

C<:raw> is implemented as a pseudo-layer which, when "pushed", pops
itself and then any layers which do not declare themselves as suitable
for binary data. (Undoing :utf8 and :crlf is implemented by clearing
flags rather than popping layers, but that is an implementation detail.)

Because C<:raw> normally pops layers, it usually only makes sense to
have it as the only or first element in a layer specification. When
used as the first element it provides a known base on which to build, e.g.

    open($fh, "<:raw:utf8", ...);

will construct a "binary" stream, but then enable UTF-8 translation.

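The sketch below exercises a C<:raw:utf8> stack. It stores UTF-8 octets followed by a CR,LF line ending, then reads them back. C<:raw> guarantees that no CRLF translation happens on any platform, and C<:utf8> on top decodes the octets into a character. The bytes used (U+263A encoded as UTF-8) are an arbitrary choice.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Store UTF-8 encoded text with a CR,LF line ending as plain octets.
open my $w, '>:raw', $path or die "open: $!";
print {$w} "\xE2\x98\xBA\r\n";          # U+263A encoded as UTF-8, then CR LF
close $w or die "close: $!";

# :raw gives a known binary base (no CRLF translation);
# :utf8 on top then decodes the octets into characters.
open my $r, '<:raw:utf8', $path or die "open: $!";
my $line = <$r>;
close $r;

printf "%d characters: U+%04X, then CR and LF kept as-is\n",
    length $line, ord $line;
```
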
=item :pop

A pseudo layer that removes the top-most layer. Gives Perl code
a way to manipulate the layer stack. It should be considered
experimental. Note that C<:pop> only works on real layers
and will not undo the effects of pseudo layers like C<:utf8>.
An example of a possible use might be:

    open($fh, ...);
    ...
    binmode($fh, ":encoding(...)"); # next chunk is encoded
    ...
    binmode($fh, ":pop");           # back to un-encoded

A more elegant (and safer) interface is needed.

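Here is a concrete version of that pattern. It uses C<:encoding(UTF-8)> as the pushed layer and C<PerlIO::get_layers> to show the stack before and after. The handle is a C<File::Temp> file chosen for the sketch. The exact layer names printed depend on the platform defaults, so only the relationship between the three stacks matters.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh) = tempfile(UNLINK => 1);

my @plain = PerlIO::get_layers($fh);
binmode($fh, ':encoding(UTF-8)') or die "binmode: $!";   # push an encoding layer
my @encoded = PerlIO::get_layers($fh);
binmode($fh, ':pop') or die "binmode: $!";               # remove the top-most layer
my @popped = PerlIO::get_layers($fh);

print "plain:   @plain\n";
print "encoded: @encoded\n";
print "popped:  @popped\n";
```
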
=item :win32

On Win32 platforms this I<experimental> layer uses native "handle" IO
rather than the unix-like numeric file descriptor layer. Known to be
buggy as of Perl 5.8.2.

=back

=head2 Custom Layers

It is possible to write custom layers in addition to the above built-in
ones, both in C/XS and Perl. Two such layers (and one example written
in Perl using the latter) come with the Perl distribution.

=over 4

=item :encoding

Use C<:encoding(ENCODING)> either in open() or binmode() to install
a layer that transparently performs character set and encoding
transformations, for example from Shift-JIS to Unicode. Note that under
C<stdio> an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding>
for more information.

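The same characters take a different number of octets depending on the encoding layer. This sketch writes a four-character string through C<:encoding(UTF-8)> and through C<:encoding(iso-8859-1)>, then compares the resulting file sizes. The two encodings were picked only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);
my $text = "caf\x{e9}";                 # 4 characters, the last one non-ASCII

# Encode the same characters with two different :encoding layers and
# record how many octets each one writes.
my %size;
for my $enc ('UTF-8', 'iso-8859-1') {
    open my $fh, ">:encoding($enc)", $path or die "open: $!";
    print {$fh} $text;
    close $fh or die "close: $!";
    $size{$enc} = -s $path;
}
print "$_: $size{$_} bytes\n" for sort keys %size;
```
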
=item :via

Use C<:via(MODULE)> either in open() or binmode() to install a layer
that applies an arbitrary transformation (for example compression /
decompression, encryption / decryption) to the data passing through
the filehandle. See L<PerlIO::via> for more information.

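The sketch below defines a tiny write-only layer in Perl. C<PerlIO::via::Upper> is a hypothetical package made up for this example. It implements only the C<PUSHED> and C<WRITE> methods of the interface described in L<PerlIO::via>, and C<:via(Upper)> finds it under the C<PerlIO::via::> prefix. C<PerlIO::via> itself is loaded on demand by the mechanism described in L</DESCRIPTION>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    # Hypothetical example layer: upper-cases everything written through it.
    package PerlIO::via::Upper;

    # Called when the layer is pushed; must return an object (or -1).
    sub PUSHED {
        my ($class, $mode, $fh) = @_;
        return bless {}, $class;
    }

    # Receives the data being written and the handle of the layer below.
    sub WRITE {
        my ($self, $buf, $fh) = @_;
        return (print {$fh} uc $buf) ? length $buf : -1;
    }
}

my (undef, $path) = tempfile(UNLINK => 1);

open my $out, '>:via(Upper)', $path or die "open: $!";
print {$out} "hello, world\n";
close $out or die "close: $!";

open my $in, '<:raw', $path or die "open: $!";
my $text = do { local $/; <$in> };
close $in;

print $text;    # HELLO, WORLD
```

A read-side layer would additionally implement C<FILL>, which returns the next chunk of transformed data or C<undef> at end of file.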
=back

=head2 Alternatives to raw

Another way to get a binary stream is to use:

    open($fh, ...);
    binmode($fh);

This has the advantage of being backward compatible with how such things
have had to be coded on some platforms for years.

To get an unbuffered stream, specify an unbuffered layer (e.g. C<:unix>)
in the open call:

    open($fh, "<:unix", $path);

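One way to observe the difference is to check the file size right after a C<print>, before closing the handle. With the default buffered layers the data is typically still in PerlIO's buffer. With C<:unix> alone each C<print> becomes a C<write(2)> call immediately. This is a sketch on a temporary file, and only the unbuffered case is deterministic.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);

# Default (buffered) layers: the data sits in PerlIO's buffer
# until it is flushed or the handle is closed.
open my $buffered, '>', $path or die "open: $!";
print {$buffered} "abc";
my $size_buffered = -s $path;      # typically still 0 at this point
close $buffered or die "close: $!";

# :unix only: every print goes straight to the file descriptor.
open my $unbuffered, '>:unix', $path or die "open: $!";
print {$unbuffered} "abc";
my $size_unbuffered = -s $path;    # already 3
close $unbuffered or die "close: $!";

printf "buffered: %d bytes visible, unbuffered: %d bytes visible\n",
    $size_buffered || 0, $size_unbuffered;
```
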
=head2 Defaults and how to override them

If the platform is MS-DOS like and normally does CRLF to "\n"
translation for text files, then the default layers are:

    unix crlf

(The low level "unix" layer may be replaced by a platform specific low
level layer.)

Otherwise, if C<Configure> found out how to do "fast" IO using the
system's stdio, then the default layers are:

    unix stdio

Otherwise the default layers are:

    unix perlio

These defaults may change once perlio has been better tested and tuned.

The default can be overridden by setting the environment variable
PERLIO to a space-separated list of layers (C<unix> or the platform's
low level layer is always pushed first).

This can be used to see the effect of (and bugs in) the various layers, e.g.

    cd .../perl/t
    PERLIO=stdio  ./perl harness
    PERLIO=perlio ./perl harness

For the various values of PERLIO see L<perlrun/PERLIO>.

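The override can also be checked from Perl. The sketch below starts a child C<perl> (C<$^X>) with C<PERLIO=perlio> in its environment and asks it for the layers on its C<STDOUT>. It assumes a platform that supports the list form of piped C<open>. It also assumes no other I/O-related settings, such as C<PERL_UNICODE>, are present in the environment.

```perl
use strict;
use warnings;

# Start a child perl ($^X) with PERLIO=perlio in its environment and
# ask it which layers its STDOUT (here: a pipe back to us) ended up with.
local $ENV{PERLIO} = 'perlio';
open my $kid, '-|', $^X, '-e', 'print join q{ }, PerlIO::get_layers(*STDOUT)'
    or die "Cannot run $^X: $!";
my $layers = do { local $/; <$kid> };
close $kid or die "child perl failed: $?";

print "STDOUT layers with PERLIO=perlio: $layers\n";
```
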
=head2 Querying the layers of filehandles

The following returns the B<names> of the PerlIO layers on a filehandle.

    my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH".

The layers are returned in the order an open() or binmode() call would
use them. Note that the "default stack" depends on the operating
system, on the Perl version, and on both the compile-time and
runtime configurations of Perl.

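As a small sketch, the code below queries a fresh C<File::Temp> handle before and after C<binmode($fh, ':utf8')>. Because C<:utf8> is a flag rather than a real layer, C<get_layers> reports it as an extra C<utf8> name after the layer that carries it. The base stack printed depends on the platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh) = tempfile(UNLINK => 1);

my @before = PerlIO::get_layers($fh);    # e.g. "unix perlio" on UNIX-like systems
binmode($fh, ':utf8') or die "binmode: $!";
my @after = PerlIO::get_layers($fh);

# :utf8 is a flag, not a layer; get_layers() reports it as an extra
# "utf8" name right after the layer that carries the flag.
print "before: @before\nafter:  @after\n";
```
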
The following table summarizes the default layers on UNIX-like and
DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>:

    PERLIO     UNIX-like                   DOS-like
    ------     ---------                   --------
    unset / "" unix perlio / stdio [1]     unix crlf
    stdio      unix perlio / stdio [1]     stdio
    perlio     unix perlio                 unix perlio
    mmap       unix mmap                   unix mmap

    # [1] "stdio" if Configure found out how to do "fast stdio" (depends
    # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio"

By default the layers from the input side of the filehandle are
returned; to get the output side, use the optional C<output> argument:

    my @layers = PerlIO::get_layers($fh, output => 1);

(Usually the layers are identical on either side of a filehandle, but
there may be differences, for example with sockets, or if you have
been using the C<open> pragma.)

There is no set_layers(), nor does get_layers() return a tied array
mirroring the stack, or anything fancy like that. This is not
accidental or unintentional. The PerlIO layer stack is a bit more
complicated than just a stack (see for example the behaviour of C<:raw>).
You are supposed to use open() and binmode() to manipulate the stack.

B<Implementation details follow, please close your eyes.>

The arguments to layers are by default returned in parentheses after
the name of the layer, and certain layers (like C<utf8>) are not real
layers but instead flags on real layers; to get all of these returned
separately, use the optional C<details> argument:

    my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1);

The result will be up to three times the number of layers:
the first element will be a name, the second element the arguments
(unspecified arguments will be C<undef>), the third element the flags,
the fourth element a name again, and so forth.

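The flat list is easiest to consume three elements at a time. This sketch walks the triples of a C<File::Temp> handle after C<binmode($fh, ':utf8')>. It tests each flags value against the C<PerlIO::F_UTF8> constant defined in this module, so the C<utf8> "layer" appears as a flag bit on the top-most real layer.

```perl
use strict;
use warnings;
use PerlIO;                     # defines the PerlIO::F_UTF8 constant
use File::Temp qw(tempfile);

my ($fh) = tempfile(UNLINK => 1);
binmode($fh, ':utf8') or die "binmode: $!";

# details => 1 returns flat (name, arguments, flags) triples,
# bottom layer first; the utf8 "layer" shows up only as a flag bit.
my @details = PerlIO::get_layers($fh, details => 1);
my @top;
while (my ($name, $args, $flags) = splice @details, 0, 3) {
    printf "%-8s args=%-6s flags=0x%08x%s\n",
        $name, (defined $args ? $args : 'undef'), $flags,
        ($flags & PerlIO::F_UTF8()) ? ' (utf8)' : '';
    @top = ($name, $args, $flags);    # ends up holding the top-most layer
}
```
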
B<You may open your eyes now.>

=head1 AUTHOR

Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt>

=head1 SEE ALSO

L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>,
L<Encode>

=cut