Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | package PerlIO; |
2 | ||
3 | our $VERSION = '1.04'; | |
4 | ||
5 | # Map layer name to package that defines it | |
6 | our %alias; | |
7 | ||
8 | sub import | |
9 | { | |
10 | my $class = shift; | |
11 | while (@_) | |
12 | { | |
13 | my $layer = shift; | |
14 | if (exists $alias{$layer}) | |
15 | { | |
16 | $layer = $alias{$layer} | |
17 | } | |
18 | else | |
19 | { | |
20 | $layer = "${class}::$layer"; | |
21 | } | |
22 | eval "require $layer"; | |
23 | warn $@ if $@; | |
24 | } | |
25 | } | |
26 | ||
27 | sub F_UTF8 () { 0x8000 } | |
28 | ||
29 | 1; | |
30 | __END__ | |
31 | ||
32 | =head1 NAME | |
33 | ||
34 | PerlIO - On demand loader for PerlIO layers and root of PerlIO::* name space | |
35 | ||
36 | =head1 SYNOPSIS | |
37 | ||
38 | open($fh,"<:crlf", "my.txt"); # support platform-native and CRLF text files | |
39 | ||
40 | open($fh,"<","his.jpg"); # portably open a binary file for reading | |
41 | binmode($fh); | |
42 | ||
43 | Shell: | |
44 | PERLIO=perlio perl .... | |
45 | ||
46 | =head1 DESCRIPTION | |
47 | ||
48 | When an undefined layer 'foo' is encountered in an C<open> or | |
49 | C<binmode> layer specification then C code performs the equivalent of: | |
50 | ||
51 | use PerlIO 'foo'; | |
52 | ||
53 | The perl code in PerlIO.pm then attempts to locate a layer by doing | |
54 | ||
55 | require PerlIO::foo; | |
56 | ||
57 | Otherwise the C<PerlIO> package is a placeholder for additional | |
58 | PerlIO related functions. | |
59 | ||
60 | The following layers are currently defined: | |
61 | ||
62 | =over 4 | |
63 | ||
64 | =item :unix | |
65 | ||
66 | Lowest level layer which provides basic PerlIO operations in terms of | |
67 | UNIX/POSIX numeric file descriptor calls | |
68 | (open(), read(), write(), lseek(), close()). | |
69 | ||
70 | =item :stdio | |
71 | ||
72 | Layer which calls C<fread>, C<fwrite> and C<fseek>/C<ftell> etc. Note | |
73 | that as this is "real" stdio it will ignore any layers beneath it and | |
74 | go straight to the operating system via the C library as usual. | |
75 | ||
76 | =item :perlio | |
77 | ||
78 | A from-scratch implementation of buffering for PerlIO. Provides fast | |
79 | access to the buffer for C<sv_gets> which implements perl's readline/E<lt>E<gt> | |
80 | and in general attempts to minimize data copying. | |
81 | ||
82 | C<:perlio> will insert a C<:unix> layer below itself to do low level IO. | |
83 | ||
84 | =item :crlf | |
85 | ||
86 | A layer that implements DOS/Windows like CRLF line endings. On read | |
87 | converts pairs of CR,LF to a single "\n" newline character. On write | |
88 | converts each "\n" to a CR,LF pair. Note that this layer likes to be | |
89 | one of its kind: it silently ignores attempts to be pushed into the | |
90 | layer stack more than once. | |
91 | ||
92 | It currently does I<not> mimic MS-DOS in treating Control-Z | |
93 | as an end-of-file marker. | |
94 | ||
95 | (Gory details follow) To be more exact what happens is this: after | |
96 | pushing itself to the stack, the C<:crlf> layer checks all the layers | |
97 | below itself to find the first layer that is capable of being a CRLF | |
98 | layer but is not yet enabled to be a CRLF layer. If it finds such a | |
99 | layer, it enables the CRLFness of that other deeper layer, and then | |
100 | pops itself off the stack. If not, fine, use the one we just pushed. | |
101 | ||
102 | The end result is that a C<:crlf> means "please enable the first CRLF | |
103 | layer you can find, and if you can't find one, here would be a good | |
104 | spot to place a new one." | |
105 | ||
106 | Based on the C<:perlio> layer. | |
107 | ||
108 | =item :mmap | |
109 | ||
110 | A layer which implements "reading" of files by using C<mmap()> to | |
111 | make the (whole) file appear in the process's address space, and then | |
112 | using that as PerlIO's "buffer". This I<may> be faster in certain | |
113 | circumstances for large files, and may result in less physical memory | |
114 | use when multiple processes are reading the same file. | |
115 | ||
116 | Files which are not C<mmap()>-able revert to behaving like the C<:perlio> | |
117 | layer. Writes also behave like the C<:perlio> layer as C<mmap()> for write | |
118 | needs extra house-keeping (to extend the file) which negates any advantage. | |
119 | ||
120 | The C<:mmap> layer will not exist if the platform does not support C<mmap()>. | |
121 | ||
122 | =item :utf8 | |
123 | ||
124 | Declares that the stream accepts perl's internal encoding of | |
125 | characters. (Which really is UTF-8 on ASCII machines, but is | |
126 | UTF-EBCDIC on EBCDIC machines.) This allows any character perl can | |
127 | represent to be read from or written to the stream. The UTF-X encoding | |
128 | is chosen to render simple text parts (i.e. non-accented letters, | |
129 | digits and common punctuation) human readable in the encoded file. | |
130 | ||
131 | Here is how to write your native data out using UTF-8 (or UTF-EBCDIC) | |
132 | and then read it back in. | |
133 | ||
134 | open(F, ">:utf8", "data.utf"); | |
135 | print F $out; | |
136 | close(F); | |
137 | ||
138 | open(F, "<:utf8", "data.utf"); | |
139 | $in = <F>; | |
140 | close(F); | |
141 | ||
142 | =item :bytes | |
143 | ||
144 | This is the inverse of the C<:utf8> layer. It turns off the flag | |
145 | on the layer below so that data read from it is considered to | |
146 | be "octets" i.e. characters in the range 0..255 only. Likewise | |
147 | on output perl will warn if a "wide" character is written | |
148 | to such a stream. | |
149 | ||
150 | =item :raw | |
151 | ||
152 | The C<:raw> layer is I<defined> as being identical to calling | |
153 | C<binmode($fh)> - the stream is made suitable for passing binary data | |
154 | i.e. each byte is passed as-is. The stream will still be | |
155 | buffered. | |
156 | ||
157 | In Perl 5.6 and some books the C<:raw> layer (previously sometimes also | |
158 | referred to as a "discipline") is documented as the inverse of the | |
159 | C<:crlf> layer. That is no longer the case - other layers which would | |
160 | alter the binary nature of the stream are also disabled. If you want UNIX | |
161 | line endings on a platform that normally does CRLF translation, but still | |
162 | want UTF-8 or encoding defaults, the appropriate thing to do is to add | |
163 | C<:perlio> to the PERLIO environment variable. | |
164 | ||
165 | The implementation of C<:raw> is as a pseudo-layer which when "pushed" | |
166 | pops itself and then any layers which do not declare themselves as suitable | |
167 | for binary data. (Undoing :utf8 and :crlf is implemented by clearing | |
168 | flags rather than popping layers but that is an implementation detail.) | |
169 | ||
170 | As a consequence of the fact that C<:raw> normally pops layers | |
171 | it usually only makes sense to have it as the only or first element in | |
172 | a layer specification. When used as the first element it provides | |
173 | a known base on which to build e.g. | |
174 | ||
175 | open($fh,"<:raw:utf8",...) | |
176 | ||
177 | will construct a "binary" stream, but then enable UTF-8 translation. | |
178 | ||
179 | =item :pop | |
180 | ||
181 | A pseudo layer that removes the top-most layer. Gives perl code | |
182 | a way to manipulate the layer stack. Should be considered | |
183 | as experimental. Note that C<:pop> only works on real layers | |
184 | and will not undo the effects of pseudo layers like C<:utf8>. | |
185 | An example of a possible use might be: | |
186 | ||
187 | open($fh,...) | |
188 | ... | |
189 | binmode($fh,":encoding(...)"); # next chunk is encoded | |
190 | ... | |
191 | binmode($fh,":pop"); # back to un-encoded | |
192 | ||
193 | A more elegant (and safer) interface is needed. | |
194 | ||
195 | =item :win32 | |
196 | ||
197 | On Win32 platforms this I<experimental> layer uses native "handle" IO | |
198 | rather than the unix-like numeric file descriptor layer. Known to be | |
199 | buggy as of perl 5.8.2. | |
200 | ||
201 | =back | |
202 | ||
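As a rough illustration of the C<:crlf> and C<:raw> descriptions above, the sketch below writes a scratch file through C<:crlf> and then reads it back twice, once as raw bytes and once through C<:crlf> again. The temporary file and the variable names are assumptions made for this example only; they are not part of PerlIO.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (generated name, removed at exit).
my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Write two lines through :crlf, so each "\n" is stored as a CR,LF pair.
open(my $out, ">:crlf", $path) or die "open for write: $!";
print {$out} "one\ntwo\n";
close($out) or die "close: $!";

# Read the file back as raw bytes to see what was actually stored.
open(my $raw, "<:raw", $path) or die "open raw: $!";
my $bytes = do { local $/; <$raw> };
close($raw);

# Read it again through :crlf, which folds each CR,LF pair back into "\n".
open(my $in, "<:crlf", $path) or die "open crlf: $!";
my $text = do { local $/; <$in> };
close($in);

printf "raw length %d, text length %d\n", length($bytes), length($text);
```

The raw read should show the CR,LF pairs on disk, while the C<:crlf> read should return the text as it was originally printed.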
203 | =head2 Custom Layers | |
204 | ||
205 | It is possible to write custom layers in addition to the above builtin | |
206 | ones, both in C/XS and Perl. Two such layers (and one example written | |
207 | in Perl using the latter) come with the Perl distribution. | |
208 | ||
209 | =over 4 | |
210 | ||
211 | =item :encoding | |
212 | ||
213 | Use C<:encoding(ENCODING)> either in open() or binmode() to install | |
214 | a layer that transparently does character set and encoding transformations, | |
215 | for example from Shift-JIS to Unicode. Note that under C<stdio> | |
216 | an C<:encoding> also enables C<:utf8>. See L<PerlIO::encoding> | |
217 | for more information. | |
218 | ||
219 | =item :via | |
220 | ||
221 | Use C<:via(MODULE)> either in open() or binmode() to install a layer | |
222 | that does whatever transformation (for example compression / | |
223 | decompression, encryption / decryption) to the filehandle. | |
224 | See L<PerlIO::via> for more information. | |
225 | ||
226 | =back | |
227 | ||
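To make the effect of C<:encoding> a little more concrete, here is a minimal sketch that writes the same one-character string through two different encodings and compares the resulting file sizes. The helper name C<bytes_on_disk> and the scratch file are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $string to $path through :encoding($encoding)
# and return the number of bytes that ended up in the file.
sub bytes_on_disk {
    my ($path, $string, $encoding) = @_;
    open(my $fh, ">:encoding($encoding)", $path) or die "open: $!";
    print {$fh} $string;
    close($fh) or die "close: $!";
    return -s $path;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

my $char = "\x{e9}";    # LATIN SMALL LETTER E WITH ACUTE, one character

my $latin1_size = bytes_on_disk($path, $char, "iso-8859-1");
my $utf8_size   = bytes_on_disk($path, $char, "UTF-8");

print "iso-8859-1: $latin1_size byte(s), UTF-8: $utf8_size byte(s)\n";
```

The same character occupies a different number of bytes depending on the encoding layer, which is the transformation the layer performs on the way out.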
228 | =head2 Alternatives to raw | |
229 | ||
230 | To get a binary stream an alternate method is to use: | |
231 | ||
232 | open($fh,"whatever"); | |
233 | binmode($fh); | |
234 | ||
235 | This has the advantage of being backward compatible with how such things have | |
236 | had to be coded on some platforms for years. | |
237 | ||
238 | To get an un-buffered stream specify an unbuffered layer (e.g. C<:unix>) | |
239 | in the open call: | |
240 | ||
241 | open($fh,"<:unix",$path) | |
242 | ||
243 | =head2 Defaults and how to override them | |
244 | ||
245 | If the platform is MS-DOS like and normally does CRLF to "\n" | |
246 | translation for text files then the default layers are: | |
247 | ||
248 | unix crlf | |
249 | ||
250 | (The low level "unix" layer may be replaced by a platform specific low | |
251 | level layer.) | |
252 | ||
253 | Otherwise if C<Configure> found out how to do "fast" IO using the system's | |
254 | stdio, then the default layers are: | |
255 | ||
256 | unix stdio | |
257 | ||
258 | Otherwise the default layers are: | |
259 | ||
260 | unix perlio | |
261 | ||
262 | These defaults may change once perlio has been better tested and tuned. | |
263 | ||
264 | The default can be overridden by setting the environment variable | |
265 | PERLIO to a space-separated list of layers (C<unix> or platform low | |
266 | level layer is always pushed first). | |
267 | ||
268 | This can be used to see the effect of/bugs in the various layers e.g. | |
269 | ||
270 | cd .../perl/t | |
271 | PERLIO=stdio ./perl harness | |
272 | PERLIO=perlio ./perl harness | |
273 | ||
274 | For the various values of PERLIO see L<perlrun/PERLIO>. | |
275 | ||
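The override can also be tried from Perl itself. The following sketch starts a child perl with PERLIO set to C<perlio> and asks the child which layers ended up on its STDOUT. It assumes that list-form pipe opens are available on the platform (long-standing on UNIX-like systems, only in newer perls on Windows); the variable names are illustrative.

```perl
use strict;
use warnings;

# One-liner for the child: print the layers on its STDOUT, space separated.
my $probe = 'print join(" ", PerlIO::get_layers(STDOUT)), "\n"';

my $layers;
{
    # Set PERLIO only for the child started inside this block.
    local $ENV{PERLIO} = "perlio";
    open(my $child, "-|", $^X, "-e", $probe) or die "cannot run $^X: $!";
    $layers = <$child>;
    close($child);
}
defined $layers or die "child produced no output";
chomp($layers);

print "child STDOUT layers under PERLIO=perlio: $layers\n";
```

Changing the value assigned to C<$ENV{PERLIO}> is a quick way to compare the default stacks without touching the parent process.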
276 | =head2 Querying the layers of filehandles | |
277 | ||
278 | The following returns the B<names> of the PerlIO layers on a filehandle. | |
279 | ||
280 | my @layers = PerlIO::get_layers($fh); # Or FH, *FH, "FH". | |
281 | ||
282 | The layers are returned in the order an open() or binmode() call would | |
283 | use them. Note that the "default stack" depends on the operating | |
284 | system and on the Perl version, and both the compile-time and | |
285 | runtime configurations of Perl. | |
286 | ||
287 | The following table summarizes the default layers on UNIX-like and | |
288 | DOS-like platforms, depending on the setting of C<$ENV{PERLIO}>: | |
289 | ||
290 | PERLIO       UNIX-like                 DOS-like | |
291 | ------       ---------                 -------- | |
292 | unset / ""   unix perlio / stdio [1]   unix crlf | |
293 | stdio        unix perlio / stdio [1]   stdio | |
294 | perlio       unix perlio               unix perlio | |
295 | mmap         unix mmap                 unix mmap | |
296 | ||
297 | # [1] "stdio" if Configure found out how to do "fast stdio" (depends | |
298 | # on the stdio implementation) and in Perl 5.8, otherwise "unix perlio" | |
299 | ||
300 | By default the layers from the input side of the filehandle are | |
301 | returned; to get the output side use the optional C<output> argument: | |
302 | ||
303 | my @layers = PerlIO::get_layers($fh, output => 1); | |
304 | ||
305 | (Usually the layers are identical on either side of a filehandle but | |
306 | for example with sockets there may be differences, or if you have | |
307 | been using the C<open> pragma.) | |
308 | ||
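A small sketch of both calls on one handle follows. It opens a scratch file read/write with C<:utf8> so that the handle has an input and an output side; the file and the variable names are assumptions for the example, and the exact layer names printed depend on the platform defaults described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);

# Open read/write so the handle has both an input and an output side.
open(my $fh, "+<:utf8", $path) or die "open: $!";

my @input  = PerlIO::get_layers($fh);               # input side (default)
my @output = PerlIO::get_layers($fh, output => 1);  # output side

print "input:  @input\n";
print "output: @output\n";
close($fh);
```

For an ordinary file both lists are normally the same; the C<utf8> entry appears last because it was requested last in the layer specification.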
309 | There is no set_layers(), nor does get_layers() return a tied array | |
310 | mirroring the stack, or anything fancy like that. This is not | |
311 | accidental or unintentional. The PerlIO layer stack is a bit more | |
312 | complicated than just a stack (see for example the behaviour of C<:raw>). | |
313 | You are supposed to use open() and binmode() to manipulate the stack. | |
314 | ||
315 | B<Implementation details follow, please close your eyes.> | |
316 | ||
317 | The arguments to layers are by default returned in parentheses after | |
318 | the name of the layer, and certain layers (like C<utf8>) are not real | |
319 | layers but instead flags on real layers: to get all of these returned | |
320 | separately use the optional C<details> argument: | |
321 | ||
322 | my @layer_and_args_and_flags = PerlIO::get_layers($fh, details => 1); | |
323 | ||
324 | The result will be up to three times the number of layers: | |
325 | the first element will be a name, the second element the arguments | |
326 | (unspecified arguments will be C<undef>), the third element the flags, | |
327 | the fourth element a name again, and so forth. | |
328 | ||
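One way to work with the flat list is to regroup it into triples, as in the sketch below. It relies on C<PerlIO::F_UTF8>, the constant defined in PerlIO.pm, to test the flags; the scratch file and the C<@triples> structure are assumptions made for this illustration.

```perl
use strict;
use warnings;
use PerlIO;                      # loads PerlIO.pm, which defines PerlIO::F_UTF8
use File::Temp qw(tempfile);

my ($tmp, $path) = tempfile(UNLINK => 1);
close($tmp);
open(my $fh, "<:utf8", $path) or die "open: $!";

my @flat = PerlIO::get_layers($fh, details => 1);

# Regroup the flat list into [name, arguments, flags] triples.
my @triples;
while (my ($name, $args, $flags) = splice(@flat, 0, 3)) {
    push @triples, [$name, $args, $flags];
}

for my $t (@triples) {
    my ($name, $args, $flags) = @$t;
    printf "%-8s args=%-6s utf8=%s\n",
        $name,
        defined $args ? $args : "undef",
        ($flags & PerlIO::F_UTF8()) ? "yes" : "no";
}
close($fh);
```

With C<details>, C<utf8> does not show up as a layer name of its own; it is visible only as a bit in the flags of a real layer.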
329 | B<You may open your eyes now.> | |
330 | ||
331 | =head1 AUTHOR | |
332 | ||
333 | Nick Ing-Simmons E<lt>nick@ing-simmons.netE<gt> | |
334 | ||
335 | =head1 SEE ALSO | |
336 | ||
337 | L<perlfunc/"binmode">, L<perlfunc/"open">, L<perlunicode>, L<perliol>, | |
338 | L<Encode> | |
339 | ||
340 | =cut |