=head1 NAME

perliol - C API for Perl's implementation of IO in Layers.

=head1 SYNOPSIS

    /* Defining a layer ... */
    #include <perliol.h>

=head1 DESCRIPTION

This document describes the behavior and implementation of the PerlIO
abstraction described in L<perlapio> when C<USE_PERLIO> is defined (and
C<USE_SFIO> is not).

=head2 History and Background

The PerlIO abstraction was introduced in perl5.003_02 but languished as
just an abstraction until perl5.7.0. However, during that time a number
of perl extensions switched to using it, so the API is mostly fixed to
maintain (source) compatibility.

The aim of the implementation is to provide the PerlIO API in a flexible
and platform-neutral manner. It is also a trial of an "Object Oriented
C, with vtables" approach which may be applied to perl6.

=head2 Layers vs Disciplines

Initial discussion of the ability to modify IO streams' behaviour used
the term "discipline" for the entities which were added. This came (I
believe) from the use of the term in "sfio", which in turn borrowed it
from "line disciplines" on Unix terminals. However, this document (and
the C code) uses the term "layer".

This is, I hope, a natural term given the implementation, and should
avoid connotations that are inherent in earlier uses of "discipline"
for things which are rather different.

=head2 Data Structures

The basic data structure is a PerlIOl:

    typedef struct _PerlIO PerlIOl;
    typedef struct _PerlIO_funcs PerlIO_funcs;
    typedef PerlIOl *PerlIO;

    struct _PerlIO
    {
     PerlIOl *      next;       /* Lower layer */
     PerlIO_funcs * tab;        /* Functions for this layer */
     IV             flags;      /* Various flags for state */
    };

A C<PerlIOl *> is a pointer to the struct, and the I<application>
level C<PerlIO *> is a pointer to a C<PerlIOl *> - i.e. a pointer
to a pointer to the struct. This allows the application level C<PerlIO *>
to remain constant while the actual C<PerlIOl *> underneath
changes. (Compare perl's C<SV *> which remains constant while its
C<sv_any> field changes as the scalar's type changes.) An IO stream is
then in general represented as a pointer to this linked-list of
"layers".

It should be noted that because of the double indirection in a C<PerlIO *>,
a C<< &(perlio->next) >> "is" a C<PerlIO *>, and so to some degree
at least one layer can use the "standard" API on the next layer down.
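
The double indirection can be illustrated outside perl with a small,
self-contained C model. It is a sketch only. C<toy_layer>, C<toy_handle>,
C<toy_push()>, C<toy_pop()> and C<toy_top_name()> are hypothetical
stand-ins for C<PerlIOl>, C<PerlIO> and perl's real push/pop code, and a
plain name string stands in for the function table.

```c

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical model of PerlIOl: one link in a stack of layers. */
    typedef struct toy_layer {
        struct toy_layer *next;   /* lower layer, like PerlIOl.next */
        const char       *name;   /* stands in for the function table */
        long              flags;
    } toy_layer;

    /* Like PerlIO: the application holds a pointer to a slot,
     * and the slot points at the current top layer. */
    typedef toy_layer *toy_handle;

    /* Push a new top layer: only the slot (*f) changes, never f. */
    static int toy_push(toy_handle *f, const char *name)
    {
        toy_layer *l = calloc(1, sizeof *l);
        if (l == NULL)
            return -1;
        l->name = name;
        l->next = *f;   /* the old top becomes the lower layer */
        *f = l;         /* the slot now points at the new top */
        return 0;
    }

    /* Pop the top layer, exposing the one below it. */
    static void toy_pop(toy_handle *f)
    {
        toy_layer *l = *f;
        assert(l != NULL);   /* popping an empty stack is a caller bug */
        *f = l->next;
        free(l);
    }

    /* Name of the top layer reachable through any handle, or NULL. */
    static const char *toy_top_name(toy_handle *f)
    {
        return *f != NULL ? (*f)->name : NULL;
    }

```

Suppose a handle C<h> refers to "perlio" stacked over "unix". Then
C<toy_top_name(h)> gives "perlio" and C<< toy_top_name(&(*h)->next) >>
gives "unix", because the address of a C<next> field is itself a valid
handle. After C<toy_pop(h)>, the same C<h> reaches "unix".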

A "layer" is composed of two parts:

=over 4

=item 1.

The functions and attributes of the "layer class".

=item 2.

The per-instance data for a particular handle.

=back

=head2 Functions and Attributes

The functions and attributes are accessed via the "tab" (for table)
member of C<PerlIOl>. The functions (methods of the layer "class") are
fixed, and are defined by the C<PerlIO_funcs> type. They are broadly the
same as the public C<PerlIO_xxxxx> functions:

    struct _PerlIO_funcs
    {
     Size_t     fsize;
     char *     name;
     Size_t     size;
     IV         kind;
     IV         (*Pushed)(pTHX_ PerlIO *f,const char *mode,SV *arg, PerlIO_funcs *tab);
     IV         (*Popped)(pTHX_ PerlIO *f);
     PerlIO *   (*Open)(pTHX_ PerlIO_funcs *tab,
                        AV *layers, IV n,
                        const char *mode,
                        int fd, int imode, int perm,
                        PerlIO *old,
                        int narg, SV **args);
     IV         (*Binmode)(pTHX_ PerlIO *f);
     SV *       (*Getarg)(pTHX_ PerlIO *f, CLONE_PARAMS *param, int flags);
     IV         (*Fileno)(pTHX_ PerlIO *f);
     PerlIO *   (*Dup)(pTHX_ PerlIO *f, PerlIO *o, CLONE_PARAMS *param, int flags);
     /* Unix-like functions - cf sfio line disciplines */
     SSize_t    (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);
     SSize_t    (*Unread)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
     SSize_t    (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);
     IV         (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);
     Off_t      (*Tell)(pTHX_ PerlIO *f);
     IV         (*Close)(pTHX_ PerlIO *f);
     /* Stdio-like buffered IO functions */
     IV         (*Flush)(pTHX_ PerlIO *f);
     IV         (*Fill)(pTHX_ PerlIO *f);
     IV         (*Eof)(pTHX_ PerlIO *f);
     IV         (*Error)(pTHX_ PerlIO *f);
     void       (*Clearerr)(pTHX_ PerlIO *f);
     void       (*Setlinebuf)(pTHX_ PerlIO *f);
     /* Perl's snooping functions */
     STDCHAR *  (*Get_base)(pTHX_ PerlIO *f);
     Size_t     (*Get_bufsiz)(pTHX_ PerlIO *f);
     STDCHAR *  (*Get_ptr)(pTHX_ PerlIO *f);
     SSize_t    (*Get_cnt)(pTHX_ PerlIO *f);
     void       (*Set_ptrcnt)(pTHX_ PerlIO *f,STDCHAR *ptr,SSize_t cnt);
    };

The first few members of the struct give a function table size for
compatibility checking, the "name" of the layer, the size to C<malloc>
for the per-instance data, and some flags which are attributes of the
class as a whole (such as whether it is a buffering layer). Then follow
the functions, which fall into four basic groups:

=over 4

=item 1.

Opening and setup functions

=item 2.

Basic IO operations

=item 3.

Stdio class buffering options.

=item 4.

Functions to support Perl's traditional "fast" access to the buffer.

=back

A layer does not have to implement all the functions, but the whole
table has to be present. Unimplemented slots can be NULL (which will
result in an error when called) or can be filled in with stubs to
"inherit" behaviour from a "base class". This "inheritance" is fixed
for all instances of the layer, but as the layer chooses which stubs
to populate the table with, limited "multiple inheritance" is possible.
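
The effect of NULL slots and "inherited" stubs can be sketched with a
cut-down table of just two slots. Everything here (C<toy_funcs>,
C<toy_fileno()>, C<base_fileno()> and the rest) is hypothetical. The model
reports a NULL slot as C<EINVAL>, but that is a choice made for the
sketch, not a statement about the errno perl sets. C<base_fileno()> plays
the role of a base-class stub, in the spirit of C<PerlIOBase_fileno()>,
which asks the next layer down.

```c

    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    typedef struct toy_layer toy_layer;

    /* Hypothetical, cut-down function table with just two slots. */
    typedef struct toy_funcs {
        const char *name;
        long (*Fileno)(toy_layer *f);
        long (*Flush)(toy_layer *f);
    } toy_funcs;

    struct toy_layer {
        toy_layer       *next;   /* lower layer */
        const toy_funcs *tab;    /* shared by every instance of a class */
        long             fd;     /* only used by the bottom layer */
    };

    /* Dispatchers: in this model a NULL slot fails with EINVAL. */
    static long toy_fileno(toy_layer *f)
    {
        if (f == NULL || f->tab->Fileno == NULL) {
            errno = EINVAL;
            return -1;
        }
        return f->tab->Fileno(f);
    }

    static long toy_flush(toy_layer *f)
    {
        if (f == NULL || f->tab->Flush == NULL) {
            errno = EINVAL;
            return -1;
        }
        return f->tab->Flush(f);
    }

    /* "Base class" stub: delegate to the next layer down. */
    static long base_fileno(toy_layer *f)
    {
        assert(f != NULL);
        return toy_fileno(f->next);
    }

    /* The bottom layer knows its descriptor and has nothing to flush. */
    static long bottom_fileno(toy_layer *f) { return f->fd; }
    static long bottom_flush(toy_layer *f)  { (void)f; return 0; }

    static const toy_funcs toy_bottom_tab = { "bottom", bottom_fileno, bottom_flush };
    /* Inherits Fileno from the "base class" and leaves Flush unimplemented. */
    static const toy_funcs toy_dummy_tab  = { "dummy",  base_fileno,   NULL };

```

Every instance shares the same table. As a result, a "dummy" layer always
inherits C<Fileno> and always fails on C<Flush>. Another class could pick
different stubs slot by slot.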

=head2 Per-instance Data

The per-instance data are held in memory beyond the basic PerlIOl
struct, by making a PerlIOl the first member of the layer's struct
thus:

    typedef struct
    {
     struct _PerlIO base;       /* Base "class" info */
     STDCHAR *      buf;        /* Start of buffer */
     STDCHAR *      end;        /* End of valid part of buffer */
     STDCHAR *      ptr;        /* Current position in buffer */
     Off_t          posn;       /* Offset of buf into the file */
     Size_t         bufsiz;     /* Real size of buffer */
     IV             oneword;    /* Emergency buffer */
    } PerlIOBuf;

In this way (as for perl's scalars) a pointer to a PerlIOBuf can be
treated as a pointer to a PerlIOl.
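
The following is a minimal sketch of the same first-member trick. It uses
hypothetical C<toy_base> and C<toy_buf> types in place of C<PerlIOl> and
C<PerlIOBuf>. Generic code works only with the base pointer, and layer
code casts it back, much as the C<PerlIOSelf()> macro does for real
layers. The hypothetical C<toy_alloc()> requires the instance size to be
at least the size of the base struct.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for struct _PerlIO (the generic part). */
    typedef struct toy_base {
        struct toy_base *next;
        long             flags;
    } toy_base;

    /* Per-instance data of a buffering layer: the base comes FIRST. */
    typedef struct toy_buf {
        toy_base base;     /* generic part, guaranteed at offset 0 */
        char    *buf;      /* start of buffer */
        char    *ptr;      /* current position in buffer */
        size_t   bufsiz;   /* real size of buffer */
    } toy_buf;

    /* Generic code only ever sees toy_base pointers ... */
    static void toy_set_flag(toy_base *b, long flag) { b->flags |= flag; }

    /* ... while layer code casts back, as PerlIOSelf() does. */
    static toy_buf *toy_self(toy_base *b) { return (toy_buf *)b; }

    /* Allocate "size" bytes for one instance and hand back the
     * generic view of it. */
    static toy_base *toy_alloc(size_t size)
    {
        assert(size >= sizeof(toy_base));
        return calloc(1, size);
    }

```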

=head2 Layers in action

                 table           perlio          unix
             |           |
             +-----------+    +----------+    +--------+
    PerlIO ->|           |--->|   next   |--->|  NULL  |
             +-----------+    +----------+    +--------+
             |           |    |  buffer  |    |   fd   |
             +-----------+    |          |    +--------+
             |           |    +----------+

The above attempts to show how the layer scheme works in a simple case.
The application's C<PerlIO *> points to an entry in the table(s)
representing open (allocated) handles. For example, the first three slots
in the table correspond to C<stdin>, C<stdout> and C<stderr>. The table
in turn points to the current "top" layer for the handle - in this case
an instance of the generic buffering layer "perlio". That layer in turn
points to the next layer down - in this case the low-level "unix" layer.

The above is roughly equivalent to a "stdio" buffered stream, but with
much more flexibility:

=over 4

=item *

If Unix-level C<read>/C<write>/C<lseek> is not appropriate for (say)
sockets, then the "unix" layer can be replaced (at open time or even
dynamically) with a "socket" layer.

=item *

Different handles can have different buffering schemes. The "top"
layer could be the "mmap" layer if reading disk files were quicker
using C<mmap> than C<read>. An "unbuffered" stream can be implemented
simply by not having a buffer layer.

=item *

Extra layers can be inserted to process the data as it flows through.
This was the driving need for including the scheme in perl 5.7.0+ - we
needed a mechanism to allow data to be translated between perl's
internal encoding (conceptually at least Unicode as UTF-8), and the
"native" format used by the system. This is provided by the
":encoding(xxxx)" layer, which typically sits above the buffering layer.

=item *

A layer can be added that does "\n" to CRLF translation. This layer
can be used on any platform, not just those that normally do such
things.

=back
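
The translation performed by such a layer can be sketched as two pure
functions over byte buffers. These are hypothetical helpers, not the
"crlf" layer's actual code. In particular, C<crlf_decode()> simply copies
a CR that ends the buffer. A real layer would instead have to hold that
CR back in case the matching LF arrives with the next C<Fill()>.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Output direction: every "\n" becomes CR,LF.
     * 'out' must have room for 2 * n bytes. Returns bytes written. */
    static size_t crlf_encode(const char *in, size_t n, char *out)
    {
        size_t o = 0;
        assert(n == 0 || (in != NULL && out != NULL));
        for (size_t i = 0; i < n; i++) {
            if (in[i] == '\n')
                out[o++] = '\r';
            out[o++] = in[i];
        }
        return o;
    }

    /* Input direction: CR,LF becomes "\n"; a lone CR passes through.
     * 'out' needs at most n bytes. Returns bytes written.
     * Simplification: a CR that ends 'in' is copied as-is. */
    static size_t crlf_decode(const char *in, size_t n, char *out)
    {
        size_t o = 0;
        assert(n == 0 || (in != NULL && out != NULL));
        for (size_t i = 0; i < n; i++) {
            if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
                continue;          /* drop the CR, keep the LF */
            out[o++] = in[i];
        }
        return o;
    }

```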

=head2 Per-instance flag bits

The generic flag bits are a hybrid of C<O_XXXXX> style flags deduced
from the mode string passed to C<PerlIO_open()>, and state bits for
typical buffer layers.

=over 4

=item PERLIO_F_EOF

End of file.

=item PERLIO_F_CANWRITE

Writes are permitted, i.e. opened as "w" or "r+" or "a", etc.

=item PERLIO_F_CANREAD

Reads are permitted, i.e. opened as "r" or "w+" (or even "a+" - ick).

=item PERLIO_F_ERROR

An error has occurred (for C<PerlIO_error()>).

=item PERLIO_F_TRUNCATE

The open mode requested that the file be truncated.

=item PERLIO_F_APPEND

All writes should be appends.

=item PERLIO_F_CRLF

Layer is performing Win32-like "\n" mapped to CR,LF for output and CR,LF
mapped to "\n" for input. Normally the provided "crlf" layer is the only
layer that need bother about this. C<PerlIO_binmode()> will mess with this
flag rather than add/remove layers if the C<PERLIO_K_CANCRLF> bit is set
for the layer's class.

=item PERLIO_F_UTF8

Data written to this layer should be UTF-8 encoded; data provided
by this layer should be considered UTF-8 encoded. Can be set on any layer
by the ":utf8" dummy layer. Also set on the ":encoding" layer.

=item PERLIO_F_UNBUF

Layer is unbuffered - i.e. a write to the next layer down should occur
for each write to this layer.

=item PERLIO_F_WRBUF

The buffer for this layer currently holds data written to it but not sent
to the next layer.

=item PERLIO_F_RDBUF

The buffer for this layer currently holds unconsumed data read from
the layer below.

=item PERLIO_F_LINEBUF

Layer is line buffered. Write data should be passed to the next layer
down whenever a "\n" is seen. Any data beyond the "\n" should then be
processed.

=item PERLIO_F_TEMP

File has been C<unlink()>ed, or should be deleted on C<close()>.

=item PERLIO_F_OPEN

Handle is open.

=item PERLIO_F_FASTGETS

This instance of this layer supports the "fast C<gets>" interface.
Normally set based on C<PERLIO_K_FASTGETS> for the class and by the
existence of the function(s) in the table. However, a class that
normally provides that interface may need to avoid it on a
particular instance. The "pending" layer needs to do this when
it is pushed above a layer which does not support the interface.
(Perl's C<sv_gets()> does not expect the stream's fast C<gets> behaviour
to change during one "get".)

=back
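
The sketch below shows how the open-related flags could be deduced from
a mode string matching C</^[I#]?[rwa]\+?[bt]?$/> (the pattern given under
C<Open> in L</Methods in Detail>). The C<TOY_F_*> bit values and
C<toy_mode_flags()> are hypothetical. The real C<PERLIO_F_*> values live
in F<perliol.h>, and real layers get this conversion from
C<PerlIOBase_pushed()>.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical bit values; the real PERLIO_F_* constants differ. */
    enum {
        TOY_F_CANREAD  = 1 << 0,
        TOY_F_CANWRITE = 1 << 1,
        TOY_F_TRUNCATE = 1 << 2,
        TOY_F_APPEND   = 1 << 3
    };

    /* Map a mode string matching /^[I#]?[rwa]\+?[bt]?$/ to flag bits.
     * Returns -1 if the string does not match that pattern. */
    static long toy_mode_flags(const char *mode)
    {
        long flags = 0;
        assert(mode != NULL);
        if (*mode == 'I' || *mode == '#')  /* std handle / sysopen marker */
            mode++;
        switch (*mode++) {
        case 'r': flags |= TOY_F_CANREAD;                   break;
        case 'w': flags |= TOY_F_CANWRITE | TOY_F_TRUNCATE; break;
        case 'a': flags |= TOY_F_CANWRITE | TOY_F_APPEND;   break;
        default:  return -1;
        }
        if (*mode == '+') {                /* read and write/append */
            flags |= TOY_F_CANREAD | TOY_F_CANWRITE;
            mode++;
        }
        if (*mode == 'b' || *mode == 't')  /* accepted but ignored */
            mode++;
        return *mode == '\0' ? flags : -1;
    }

```

The b/t suffix is accepted and then ignored. This follows the advice that
most layers should do their IO in binary mode.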

=head2 Methods in Detail

=over 4

=item fsize

    Size_t fsize;

Size of the function table. This is compared against the value PerlIO
code "knows" as a compatibility check. Future versions I<may> be able
to tolerate layers compiled against an old version of the headers.

=item name

    char * name;

The name of the layer whose open() method Perl should invoke on
open(). For example, if the layer is called APR, you will call:

    open $fh, ">:APR", ...

and Perl knows that it has to invoke the PerlIOAPR_open() method
implemented by the APR layer.

=item size

    Size_t size;

The size of the per-instance data structure, e.g.:

    sizeof(PerlIOAPR)

If this field is zero then C<PerlIO_pushed> does not malloc anything
and assumes the layer's Pushed function will do any required layer stack
manipulation - used to avoid malloc/free overhead for dummy layers.
If the field is non-zero it must be at least the size of C<PerlIOl>;
C<PerlIO_pushed> will then allocate memory for the layer's data structures
and link the new layer onto the stream's stack. (If the layer's Pushed
method returns an error indication the layer is popped again.)

=item kind

    IV kind;

Flags which are attributes of the layer class as a whole:

=over 4

=item * PERLIO_K_BUFFERED

The layer is buffered.

=item * PERLIO_K_RAW

The layer is acceptable to have in a binmode(FH) stack - i.e. it does not
(or will configure itself not to) transform bytes passing through it.

=item * PERLIO_K_CANCRLF

Layer can translate between "\n" and CRLF line ends.

=item * PERLIO_K_FASTGETS

Layer allows buffer snooping.

=item * PERLIO_K_MULTIARG

Used when the layer's open() accepts more arguments than usual. The
extra arguments should not come before the C<MODE> argument. When this
flag is used it's up to the layer to validate the args.

=back

=item Pushed

    IV (*Pushed)(pTHX_ PerlIO *f,const char *mode, SV *arg, PerlIO_funcs *tab);

The only absolutely mandatory method. Called when the layer is pushed
onto the stack. The C<mode> argument may be NULL if this occurs
post-open. The C<arg> will be non-C<NULL> if an argument string was
passed. In most cases this should call C<PerlIOBase_pushed()> to
convert C<mode> into the appropriate C<PERLIO_F_XXXXX> flags in
addition to any actions the layer itself takes. If a layer is not
expecting an argument it need neither save the one passed to it, nor
provide C<Getarg()> (it could perhaps C<Perl_warn> that the argument
was unexpected).

Returns 0 on success. On failure returns -1 and should set errno.

=item Popped

    IV (*Popped)(pTHX_ PerlIO *f);

Called when the layer is popped from the stack. A layer will normally
be popped after C<Close()> is called. But a layer can be popped
without being closed if the program is dynamically managing layers on
the stream. In such cases C<Popped()> should free any resources
(buffers, translation tables, ...) not held directly in the layer's
struct. It should also C<Unread()> any unconsumed data that has been
read and buffered from the layer below back to that layer, so that it
can be re-provided to whatever is now above.

Returns 0 on both success and failure.

=item Open

    PerlIO * (*Open)(...);

The C<Open()> method has lots of arguments because it combines the
functions of perl's C<open>, C<PerlIO_open>, perl's C<sysopen>,
C<PerlIO_fdopen> and C<PerlIO_reopen>. The full prototype is as
follows:

    PerlIO * (*Open)(pTHX_ PerlIO_funcs *tab,
                     AV *layers, IV n,
                     const char *mode,
                     int fd, int imode, int perm,
                     PerlIO *old,
                     int narg, SV **args);

Open should (perhaps indirectly) call C<PerlIO_allocate()> to allocate
a slot in the table and associate it with the layers information for
the opened file, by calling C<PerlIO_push>. The I<layers> AV is an
array of all the layers destined for the C<PerlIO *>, and any
arguments passed to them; I<n> is the index into that array of the
layer being called. The macro C<PerlIOArg> will return a (possibly
C<NULL>) SV * for the argument passed to the layer.

The I<mode> string is an "C<fopen()>-like" string which would match
the regular expression C</^[I#]?[rwa]\+?[bt]?$/>.

The C<'I'> prefix is used during creation of C<stdin>..C<stderr> via
special C<PerlIO_fdopen> calls; the C<'#'> prefix means that this is
C<sysopen> and that I<imode> and I<perm> should be passed to
C<PerlLIO_open3>; C<'r'> means B<r>ead, C<'w'> means B<w>rite and
C<'a'> means B<a>ppend. The C<'+'> suffix means that both reading and
writing/appending are permitted. The C<'b'> suffix means the file should
be binary, and C<'t'> means it is text. (Almost all layers should do
the IO in binary mode, and ignore the b/t bits. The C<:crlf> layer
should be pushed to handle the distinction.)

If I<old> is not C<NULL> then this is a C<PerlIO_reopen>. Perl itself
does not use this (yet?) and semantics are a little vague.

If I<fd> is not negative then it is the numeric file descriptor I<fd>,
which will be open in a manner compatible with the supplied mode
string; the call is thus equivalent to C<PerlIO_fdopen>. In this case
I<narg> will be zero.

If I<narg> is greater than zero then it gives the number of arguments
passed to C<open>; it will be 1 if, for example, C<PerlIO_open> was
called. In simple cases C<SvPV_nolen(*args)> is the pathname to open.

Having said all that, translation-only layers do not need to provide
C<Open()> at all, but rather leave the opening to a lower level layer
and wait to be "pushed". If a layer does provide C<Open()> it should
normally call the C<Open()> method of the next layer down (if any) and
then push itself on top if that succeeds.

Returns C<NULL> on failure.

=item Binmode

    IV (*Binmode)(pTHX_ PerlIO *f);

Optional. Used when the C<:raw> layer is pushed (explicitly or as a result
of binmode(FH)). If not present, the layer will be popped. If present, it
should configure the layer as binary (or pop itself) and return 0.
If it returns -1 for error, C<binmode> will fail with the layer
still on the stack.

=item Getarg

    SV * (*Getarg)(pTHX_ PerlIO *f,
                   CLONE_PARAMS *param, int flags);

Optional. If present, it should return an SV * representing the string
argument passed to the layer when it was pushed, e.g. ":encoding(ascii)"
would return an SvPV with value "ascii". (The I<param> and I<flags>
arguments can be ignored in most cases.)

=item Fileno

    IV (*Fileno)(pTHX_ PerlIO *f);

Returns the Unix/Posix numeric file descriptor for the handle. Normally
C<PerlIOBase_fileno()> (which just asks the next layer down) will suffice
for this.

Returns -1 on error, which is considered to include the case where the
layer cannot provide such a file descriptor.

=item Dup

    PerlIO * (*Dup)(pTHX_ PerlIO *f, PerlIO *o,
                    CLONE_PARAMS *param, int flags);

XXX: Needs more docs.

Used as part of the "clone" process when a thread is spawned (in which
case I<param> will be non-NULL) and when a stream is being duplicated via
'&' in the C<open>.

Similar to C<Open>, returns PerlIO* on success, C<NULL> on failure.

=item Read

    SSize_t (*Read)(pTHX_ PerlIO *f, void *vbuf, Size_t count);

Basic read operation.

Typically will call C<Fill> and manipulate pointers (possibly via the
API). C<PerlIOBuf_read()> may be suitable for derived classes which
provide "fast gets" methods.

Returns actual bytes read, or -1 on an error.

=item Unread

    SSize_t (*Unread)(pTHX_ PerlIO *f,
                      const void *vbuf, Size_t count);

A superset of stdio's C<ungetc()>. Should arrange for future reads to
see the bytes in C<vbuf>. If there is no obviously better implementation
then C<PerlIOBase_unread()> provides the function by pushing a "fake"
"pending" layer above the calling layer.

Returns the number of unread chars.

=item Write

    SSize_t (*Write)(pTHX_ PerlIO *f, const void *vbuf, Size_t count);

Basic write operation.

Returns bytes written or -1 on an error.

=item Seek

    IV (*Seek)(pTHX_ PerlIO *f, Off_t offset, int whence);

Position the file pointer. Should normally call its own C<Flush>
method and then the C<Seek> method of the next layer down.

Returns 0 on success, -1 on failure.

=item Tell

    Off_t (*Tell)(pTHX_ PerlIO *f);

Return the file pointer. May be based on the layer's cached concept of
position to avoid overhead.

Returns -1 on failure to get the file pointer.

=item Close

    IV (*Close)(pTHX_ PerlIO *f);

Close the stream. Should normally call C<PerlIOBase_close()> to flush
itself and close layers below, and then deallocate any data structures
(buffers, translation tables, ...) not held directly in the data
structure.

Returns 0 on success, -1 on failure.

=item Flush

    IV (*Flush)(pTHX_ PerlIO *f);

Should make the stream's state consistent with layers below. That is, any
buffered write data should be written, and the file position of lower
layers adjusted for data read from below but not actually consumed.
(Should perhaps C<Unread()> such data to the lower layer.)

Returns 0 on success, -1 on failure.

=item Fill

    IV (*Fill)(pTHX_ PerlIO *f);

The buffer for this layer should be filled (for read) from the layer
below. When you "subclass" the PerlIOBuf layer, you want to use its
I<_read> method and to supply your own fill method, which fills the
PerlIOBuf's buffer.

Returns 0 on success, -1 on failure.

=item Eof

    IV (*Eof)(pTHX_ PerlIO *f);

Return end-of-file indicator. C<PerlIOBase_eof()> is normally sufficient.

Returns 1 on end-of-file, 0 if not end-of-file, -1 on error.

=item Error

    IV (*Error)(pTHX_ PerlIO *f);

Return error indicator. C<PerlIOBase_error()> is normally sufficient.

Returns 1 if there is an error (usually when C<PERLIO_F_ERROR> is set),
0 otherwise.

=item Clearerr

    void (*Clearerr)(pTHX_ PerlIO *f);

Clear end-of-file and error indicators. Should call C<PerlIOBase_clearerr()>
to clear the C<PERLIO_F_XXXXX> flags, which may suffice.

=item Setlinebuf

    void (*Setlinebuf)(pTHX_ PerlIO *f);

Mark the stream as line buffered. C<PerlIOBase_setlinebuf()> sets the
PERLIO_F_LINEBUF flag and is normally sufficient.

=item Get_base

    STDCHAR * (*Get_base)(pTHX_ PerlIO *f);

Allocate (if not already done so) the read buffer for this layer and
return a pointer to it. Return NULL on failure.

=item Get_bufsiz

    Size_t (*Get_bufsiz)(pTHX_ PerlIO *f);

Return the number of bytes that the last C<Fill()> put in the buffer.

=item Get_ptr

    STDCHAR * (*Get_ptr)(pTHX_ PerlIO *f);

Return the current read pointer relative to this layer's buffer.

=item Get_cnt

    SSize_t (*Get_cnt)(pTHX_ PerlIO *f);

Return the number of bytes left to be read in the current buffer.

=item Set_ptrcnt

    void (*Set_ptrcnt)(pTHX_ PerlIO *f,
                       STDCHAR *ptr, SSize_t cnt);

Adjust the read pointer and count of bytes to match C<ptr> and/or C<cnt>.
The application (or layer above) must ensure they are consistent.
(Checking is allowed by the paranoid.)

=back
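
To make the snooping methods concrete, the sketch below writes a
C<Read()> purely in terms of C<Get_ptr()>, C<Get_cnt()>, C<Set_ptrcnt()>
and C<Fill()>. The C<toy_buf> type and its functions are hypothetical.
The layer below is modelled as an in-memory string, and the four-byte
buffer is deliberately tiny so that a single read spans several fills.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical buffering layer over an in-memory "layer below". */
    typedef struct toy_buf {
        const char *src;      /* data the layer below can supply */
        size_t      srclen;
        size_t      srcpos;   /* how much of src has been pulled up */
        char        buf[4];   /* deliberately tiny buffer */
        char       *ptr;      /* current read position in buf */
        char       *end;      /* end of valid data in buf */
    } toy_buf;

    static void toy_init(toy_buf *b, const char *src, size_t len)
    {
        b->src = src;
        b->srclen = len;
        b->srcpos = 0;
        b->ptr = b->end = b->buf;     /* start with an empty buffer */
    }

    /* Fill(): refill the buffer from below; -1 when nothing is left. */
    static int toy_fill(toy_buf *b)
    {
        size_t n = b->srclen - b->srcpos;
        if (n == 0)
            return -1;
        if (n > sizeof b->buf)
            n = sizeof b->buf;
        memcpy(b->buf, b->src + b->srcpos, n);
        b->srcpos += n;
        b->ptr = b->buf;
        b->end = b->buf + n;
        return 0;
    }

    /* The "snooping" accessors. */
    static char     *toy_get_ptr(toy_buf *b) { return b->ptr; }
    static ptrdiff_t toy_get_cnt(toy_buf *b) { return b->end - b->ptr; }
    static void toy_set_ptrcnt(toy_buf *b, char *ptr, ptrdiff_t cnt)
    {
        assert(ptr + cnt == b->end);  /* caller keeps them consistent */
        b->ptr = ptr;
    }

    /* Read() built only from the accessors plus Fill(). */
    static size_t toy_read(toy_buf *b, void *vbuf, size_t count)
    {
        char  *out = vbuf;
        size_t got = 0;
        while (got < count) {
            ptrdiff_t avail = toy_get_cnt(b);
            size_t    take;
            if (avail == 0) {
                if (toy_fill(b) != 0)
                    break;              /* end of data */
                continue;
            }
            take = (size_t)avail < count - got ? (size_t)avail : count - got;
            memcpy(out + got, toy_get_ptr(b), take);
            toy_set_ptrcnt(b, toy_get_ptr(b) + take, avail - (ptrdiff_t)take);
            got += take;
        }
        return got;
    }

```

Nothing in C<toy_read()> touches C<ptr> or C<end> directly. A derived
layer that supplied its own accessors and C<Fill()> could therefore reuse
it unchanged.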

=head2 Core Layers

The file C<perlio.c> provides the following layers:

=over 4

=item "unix"

A basic non-buffered layer which calls Unix/POSIX C<read()>, C<write()>,
C<lseek()>, C<close()>. No buffering. Even on platforms that distinguish
between O_TEXT and O_BINARY this layer is always O_BINARY.

=item "perlio"

A very complete generic buffering layer which provides the whole of
the PerlIO API. It is also intended to be used as a "base class" for other
layers. (For example, its C<Read()> method is implemented in terms of
the C<Get_cnt()>/C<Get_ptr()>/C<Set_ptrcnt()> methods.)

"perlio" over "unix" provides a complete replacement for stdio as seen
via the PerlIO API. This is the default for USE_PERLIO on systems whose
stdio does not permit perl's "fast gets" access and which do not
distinguish between C<O_TEXT> and C<O_BINARY>.

=item "stdio"

A layer which provides the PerlIO API via the layer scheme, but
implements it by calling the system's stdio. This is (currently) the
default on systems whose stdio provides sufficient access to allow perl's
"fast gets" access and which do not distinguish between C<O_TEXT> and
C<O_BINARY>.

=item "crlf"

A layer derived using "perlio" as a base class. It provides Win32-like
"\n" to CR,LF translation. It can either be applied above "perlio" or serve
as the buffer layer itself. "crlf" over "unix" is the default if the system
distinguishes between C<O_TEXT> and C<O_BINARY> opens. (At some point
"unix" will be replaced by a "native" Win32 IO layer on that platform,
as Win32's read/write layer has various drawbacks.) The "crlf" layer is
a reasonable model for a layer which transforms data in some way.

=item "mmap"

If Configure detects C<mmap()> functions, this layer is provided (with
"perlio" as a "base"); it does "read" operations by mmap()ing the
file. Performance improvement is marginal on modern systems, so it is
mainly there as a proof of concept. It is likely to be unbundled from
the core at some point. The "mmap" layer is a reasonable model for a
minimalist "derived" layer.

=item "pending"

An "internal" derivative of "perlio" which can be used to provide
the Unread() function for layers which have no buffer or cannot be
bothered. (Basically this layer's C<Fill()> pops itself off the stack
and so resumes reading from the layer below.)

=item "raw"

A dummy layer which never exists on the layer stack. Instead, when
"pushed" it actually pops the stack, removing itself; it then calls the
Binmode function table entry on all the layers in the stack. Normally
this (via PerlIOBase_binmode) removes any layers which do not have the
C<PERLIO_K_RAW> bit set. Layers can modify that behaviour by defining
their own Binmode entry.

=item "utf8"

Another dummy layer. When pushed it pops itself and sets the
C<PERLIO_F_UTF8> flag on the layer which was (and now is once more)
the top of the stack.

=back

In addition F<perlio.c> also provides a number of C<PerlIOBase_xxxx()>
functions which are intended to be used in the table slots of classes
which do not need to do anything special for a particular method.
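
The "pending" mechanism behind C<PerlIOBase_unread()> can be modelled in
a few lines. C<Unread()> pushes a layer holding a copy of the bytes, and
once that layer is drained it pops itself, so reading resumes from the
layer below. C<toy_layer>, C<toy_unread()> and C<toy_getc()> are
hypothetical. For brevity, the pop happens inside the read loop here
rather than in a separate C<Fill()>.

```c

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical layer: either a real source or a "pending" buffer. */
    typedef struct toy_layer {
        struct toy_layer *next;
        char             *data;     /* bytes this layer can hand out */
        size_t            len, pos;
        int               pending;  /* 1 if pushed by toy_unread() */
    } toy_layer;

    /* Unread(): push a "pending" layer holding a copy of the bytes.
     * Returns the number of bytes unread, or -1 if out of memory. */
    static long toy_unread(toy_layer **f, const void *vbuf, size_t count)
    {
        toy_layer *p;
        assert(f != NULL);
        p = calloc(1, sizeof *p);
        if (p == NULL || (p->data = malloc(count ? count : 1)) == NULL) {
            free(p);
            return -1;
        }
        if (count)
            memcpy(p->data, vbuf, count);
        p->len = count;
        p->pending = 1;
        p->next = *f;
        *f = p;
        return (long)count;
    }

    /* Read one byte. A drained pending layer pops itself, so reading
     * resumes from the layer below. Returns -1 at end of data. */
    static int toy_getc(toy_layer **f)
    {
        while (*f != NULL) {
            toy_layer *l = *f;
            if (l->pos < l->len)
                return (unsigned char)l->data[l->pos++];
            if (!l->pending)
                return -1;      /* the real source is exhausted */
            *f = l->next;       /* pending layer pops itself */
            free(l->data);
            free(l);
        }
        return -1;
    }

```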

=head2 Extension Layers

Layers can be made available by extension modules. When an unknown layer
is encountered the PerlIO code will perform the equivalent of:

    use PerlIO 'layer';

where I<layer> is the unknown layer. F<PerlIO.pm> will then attempt to:

    require PerlIO::layer;

If after that process the layer is still not defined then the C<open>
will fail.

The following extension layers are bundled with perl:

=over 4

=item ":encoding"

    use Encode;

makes this layer available, although F<PerlIO.pm> "knows" where to
find it. It is an example of a layer which takes an argument, as it is
called thus:

    open( $fh, "<:encoding(iso-8859-7)", $pathname );

=item ":scalar"

Provides support for reading data from and writing data to a scalar.

    open( $fh, "+<:scalar", \$scalar );

When a handle is so opened, then reads get bytes from the string value
of I<$scalar>, and writes change the value. In both cases the position
in I<$scalar> starts as zero but can be altered via C<seek>, and
determined via C<tell>.

Please note that this layer is implied when calling open() thus:

    open( $fh, "+<", \$scalar );

=item ":via"

Provided to allow layers to be implemented as Perl code. For instance:

    use PerlIO::via::StripHTML;
    open( my $fh, "<:via(StripHTML)", "index.html" );

See L<PerlIO::via> for details.

=back
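
The C<:scalar> semantics described above can be sketched in C as an
in-memory stream. Reads come from the string, writes change it, and the
position starts at zero and moves with C<seek>/C<tell>. This is a
hypothetical model, not the code of the bundled layer. C<toy_scalar> and
its functions are made-up names, and only absolute seeks are supported.

```c

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical in-memory stream in the spirit of ":scalar". */
    typedef struct toy_scalar {
        char  *str;   /* the "scalar's" string value (not NUL-terminated) */
        size_t len;   /* its length */
        size_t pos;   /* current position; starts at zero */
    } toy_scalar;

    /* Reads get bytes from the string value. */
    static long toy_sc_read(toy_scalar *s, void *vbuf, size_t count)
    {
        size_t avail;
        assert(s != NULL);
        avail = s->pos < s->len ? s->len - s->pos : 0;
        if (count > avail)
            count = avail;
        if (count == 0)
            return 0;
        memcpy(vbuf, s->str + s->pos, count);
        s->pos += count;
        return (long)count;
    }

    /* Writes change the value, growing it if needed. */
    static long toy_sc_write(toy_scalar *s, const void *vbuf, size_t count)
    {
        size_t need;
        assert(s != NULL);
        if (count == 0)
            return 0;
        need = s->pos + count;
        if (need > s->len) {
            char *p = realloc(s->str, need);
            if (p == NULL)
                return -1;
            if (s->pos > s->len)   /* zero-fill a gap left by seeking past the end */
                memset(p + s->len, 0, s->pos - s->len);
            s->str = p;
            s->len = need;
        }
        memcpy(s->str + s->pos, vbuf, count);
        s->pos += count;
        return (long)count;
    }

    /* Only absolute positioning, for brevity. */
    static int toy_sc_seek(toy_scalar *s, size_t offset) { s->pos = offset; return 0; }
    static size_t toy_sc_tell(const toy_scalar *s) { return s->pos; }

```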

=head1 TODO

Things that need to be done to improve this document.

=over

=item *

Explain how to make a valid fh without going through open() (i.e. apply
a layer). For example, if the file is not opened through perl, but we
want to get back a fh as if it had been opened by Perl.

How does PerlIO_apply_layera fit in, where are its docs, and was it made
public?

Currently the example could be something like this:

    PerlIO *foo_to_PerlIO(pTHX_ const char *mode, ...)
    {
        /* mode is "w", "r", etc */
        const char *layers = ":APR"; /* the layer name */
        PerlIO *f = PerlIO_allocate(aTHX);
        if (!f) {
            return NULL;
        }

        /* PerlIO_apply_layers() returns 0 on success */
        if (PerlIO_apply_layers(aTHX_ f, mode, layers) != 0) {
            return NULL;
        }

        {
            PerlIOAPR *st = PerlIOSelf(f, PerlIOAPR);
            /* fill in the st struct, as in _open(); "file" stands for
             * the APR file handle supplied through the elided arguments */
            st->file = file;
            PerlIOBase(f)->flags |= PERLIO_F_OPEN;
        }

        return f;
    }

=item *

Fix/add the documentation in places marked as XXX.

=item *

The handling of errors by the layer is not specified, e.g. when $!
should be set explicitly, and when the error handling should be just
delegated to the top layer.

Probably give some hints on using SETERRNO() or pointers to where they
can be found.

=item *

I think it would help to give some concrete examples to make it easier
to understand the API. Of course I agree that the API has to be
concise, but since there is no second document that is more of a
guide, I think that it'd make it easier to start with the doc which is
an API, but has examples in it in places where things are unclear, to
a person who is not a PerlIO guru (yet).

=back

=cut