.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.32
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. | will give a
.\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to
.\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C'
.\" expand to `' in nroff, nothing in troff, for use with C<>.
.tr \(*W-|\(bv\*(Tr
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.\"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.hy 0
.if n .na
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Encode::PerlIO 3"
.TH Encode::PerlIO 3 "2001-09-21" "perl v5.8.8" "Perl Programmers Reference Guide"
.SH "NAME"
Encode::PerlIO \-\- a detailed document on Encode and PerlIO
.SH "Overview"
.IX Header "Overview"
It is very common to want to do encoding transformations when
reading or writing files, network connections, pipes, etc.
If Perl is configured to use the new 'perlio' \s-1IO\s0 system, then
\&\f(CW\*(C`Encode\*(C'\fR provides a \*(L"layer\*(R" (see PerlIO) which can transform
data as it is read or written.
.PP
Here is how the blind poet would modernise the encoding:
.PP
.Vb 7
\& use Encode;
\& open(my $iliad, '<:encoding(iso-8859-7)', 'iliad.greek') or die "iliad.greek: $!";
\& open(my $utf8,  '>:utf8', 'iliad.utf8') or die "iliad.utf8: $!";
\& my @epic = <$iliad>;
\& print $utf8 @epic;
\& close($utf8) or die "iliad.utf8: $!";
\& close($iliad);
.Ve
.PP
In addition, the new \s-1IO\s0 system can also be configured to read/write
\&\s-1UTF\-8\s0 encoded characters (this is efficient, because Perl's internal
representation of wide characters is already \s-1UTF\-8\s0):
.PP
.Vb 3
\& use charnames ':full';   # needed for \eN{...} before Perl 5.16
\& open(my $fh, '>:utf8', 'anything') or die "anything: $!";
\& print $fh "Any \ex{0021} string \eN{WHITE SMILING FACE}\en";
.Ve
.PP
Either of the above forms of \*(L"layer\*(R" specification can be made the default
for a lexical scope with the \f(CW\*(C`use open ...\*(C'\fR pragma. See open.
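.PP
As a minimal sketch of the pragma (the scratch file from File::Temp and the
choice of \s-1ISO\-8859\-7\s0 are assumptions made for the demonstration), the
following makes \f(CW\*(C`:encoding(iso\-8859\-7)\*(C'\fR the default for output
handles opened in its lexical scope, then reads the file back raw to check
the resulting octet:
.PP
.Vb 15
\& use strict;
\& use warnings;
\& use File::Temp qw(tempfile);
\& use open OUT => ':encoding(iso-8859-7)';   # default for output handles here
\&
\& my ($tmp, $path) = tempfile(UNLINK => 1);  # scratch file
\& close($tmp);
\& open(my $out, '>', $path) or die "$path: $!";  # no layer given: default applies
\& print $out chr(0x3B1);                       # GREEK SMALL LETTER ALPHA
\& close($out) or die "$path: $!";
\&
\& open(my $in, '<:raw', $path) or die "$path: $!";
\& my $octets = do { local $/; <$in> };
\& close($in);
\& # In ISO-8859-7, alpha is the single octet 0xE1.
.Ve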
.PP
Once a handle is open, its layers can be altered using \f(CW\*(C`binmode\*(C'\fR.
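.PP
A small sketch of that, assuming a scratch file from File::Temp: the handle
starts out raw, the three \s-1UTF\-8\s0 octets of U+263A are written, and
\&\f(CW\*(C`binmode\*(C'\fR then pushes an \f(CW\*(C`:encoding(UTF\-8)\*(C'\fR layer so that the
same octets read back as one character:
.PP
.Vb 13
\& use strict;
\& use warnings;
\& use File::Temp qw(tempfile);
\&
\& my ($fh) = tempfile(UNLINK => 1);         # opened read/write
\& binmode($fh, ':raw');
\& print $fh pack('C*', 0xE2, 0x98, 0xBA);   # U+263A as UTF-8 octets
\& seek($fh, 0, 0) or die "seek: $!";
\&
\& binmode($fh, ':encoding(UTF-8)');         # change layers on an open handle
\& my $text = do { local $/; <$fh> };
\& close($fh);
\& # $text is now the single character U+263A (WHITE SMILING FACE).
.Ve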
.PP
Without any such configuration, or if Perl itself is built using the
system's own \s-1IO\s0, write operations assume that the file handle
accepts only \fIbytes\fR. Writing a character larger than 255 to such a
handle triggers a \*(L"Wide character\*(R" warning, and the character is
written out in Perl's internal \s-1UTF\-8\s0 form. When reading, each octet
from the handle becomes a byte\-in\-a\-character. Note that this default is
the same behaviour that bytes-only languages (including Perl before v5.6)
would have, and it is sufficient to handle native 8\-bit encodings,
e.g. iso\-8859\-1, \s-1EBCDIC\s0, etc., and any legacy mechanisms for handling
other encodings and binary data.
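.PP
To make the byte\-in\-a\-character behaviour concrete, here is a minimal
sketch (the scratch file from File::Temp is an assumption for the
demonstration): two \s-1UTF\-8\s0 octets are read back through a handle with
no encoding layer, and each octet becomes one character:
.PP
.Vb 11
\& use strict;
\& use warnings;
\& use File::Temp qw(tempfile);
\&
\& my ($fh) = tempfile(UNLINK => 1);
\& binmode($fh, ':raw');                  # bytes only, no decoding
\& print $fh pack('C*', 0xC3, 0xA9);      # U+00E9 encoded as UTF-8
\& seek($fh, 0, 0) or die "seek: $!";
\& my $data = do { local $/; <$fh> };
\& close($fh);
\& # length($data) is 2 (ord 0xC3, then ord 0xA9), not one U+00E9 character.
.Ve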
.PP
In other cases, it is the program's responsibility to transform
characters into bytes with \f(CW\*(C`Encode::encode\*(C'\fR before doing writes, and to
transform the bytes read from a handle into characters with
\&\f(CW\*(C`Encode::decode\*(C'\fR before doing \*(L"character operations\*(R"
(e.g. \f(CW\*(C`lc\*(C'\fR, \f(CW\*(C`/\eW+/\*(C'\fR, ...).
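.PP
A minimal sketch of that round trip with Encode's \f(CW\*(C`decode\*(C'\fR and
\&\f(CW\*(C`encode\*(C'\fR functions; the sample octets and the choice of
\&\s-1ISO\-8859\-1\s0 as the input encoding are assumptions made for illustration:
.PP
.Vb 11
\& use strict;
\& use warnings;
\& use Encode qw(decode encode);
\&
\& # G, R, U-umlaut, sharp s, E as ISO-8859-1 octets (hypothetical input)
\& my $octets = pack('C*', 0x47, 0x52, 0xDC, 0xDF, 0x45);
\&
\& my $chars = decode('iso-8859-1', $octets);   # bytes -> characters
\& my $lower = lc $chars;                       # a character operation
\& my $out   = encode('UTF-8', $lower);         # characters -> bytes for writing
\& # $out holds the UTF-8 octets 67 72 C3 BC C3 9F 65.
.Ve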
.PP
You can also use PerlIO to convert larger amounts of data you don't
want to bring into memory. For example, to convert between \s-1ISO\-8859\-1\s0
(Latin 1) and \s-1UTF\-8\s0 (or UTF-EBCDIC on \s-1EBCDIC\s0 machines):
.PP
.Vb 5
\& open(F, "<:encoding(iso-8859-1)", "data.txt") or die $!;
\& open(G, ">:utf8", "data.utf") or die $!;
\& while (<F>) { print G }
\& close(G) or die $!;
\& close(F);
.Ve
.PP
.Vb 2
\& # Could also do "print G <F>" but that would pull
\& # the whole file into memory just to write it out again.
.Ve
.PP
More examples:
.PP
.Vb 4
\& # The file names are placeholders.
\& open(my $f, "<:encoding(cp1252)", "in-cp1252.txt")      or die $!;
\& open(my $g, ">:encoding(iso-8859-2)", "out-latin2.txt") or die $!;
\& open(my $h, ">:encoding(latin9)", "out-latin9.txt")     or die $!;  # iso-8859-15
.Ve
.PP
See also encoding for how to change the default encoding of the
data in your script.
.SH "How does it work?"
.IX Header "How does it work?"
Here is a crude diagram of how filehandle, PerlIO, and Encode
interact.
.PP
.Vb 3
\&  filehandle <-> PerlIO        PerlIO <-> scalar (read/printed)
\&                       \e      /
\&                        Encode
.Ve
.PP
When PerlIO receives data from either direction, it fills a buffer
(currently 1024 bytes) and passes the buffer to Encode.
Encode tries to convert the valid part and passes it back to PerlIO,
leaving invalid parts (usually a partial character) in the buffer.
PerlIO then appends more data to the buffer, calls Encode again,
and so on until the data stream ends.
.PP
To do so, PerlIO always calls the \f(CW\*(C`decode\*(C'\fR and \f(CW\*(C`encode\*(C'\fR methods
with \s-1CHECK\s0 set to 1. This ensures that the method stops at the right
place when it encounters a partial character. The following is what
happens when PerlIO and Encode try to encode (from utf8) more than 1024
bytes and the buffer boundary happens to be in the middle of a character.
.PP
.Vb 5
\&   A  B  C  ....  ~ \ex{3000} ....
\&  41 42 43  .... 7E e3 80 80 ....
\&  <- buffer --------->
\&  << encoded >>>>>>
\&                    <- next buffer ------
.Ve
.PP
Encode converts from the beginning to \ex7E, leaving \exe3 in the buffer
because on its own it is invalid (a partial character).
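.PP
The carry-over can be imitated at the Perl level. The sketch below is an
illustration, not PerlIO's own code, and it uses \f(CW\*(C`Encode::FB_QUIET\*(C'\fR
rather than the \s-1CHECK\s0 value PerlIO passes: it decodes the octets from the
diagram above in deliberately tiny 5\-octet chunks. With \f(CW\*(C`FB_QUIET\*(C'\fR,
\&\f(CW\*(C`decode\*(C'\fR returns the valid prefix and leaves the unprocessed tail
(here the partial character) in the buffer for the next round:
.PP
.Vb 17
\& use strict;
\& use warnings;
\& use Encode qw(decode encode FB_QUIET);
\&
\& # "ABC~" followed by U+3000 (IDEOGRAPHIC SPACE): 41 42 43 7E E3 80 80
\& my $octets = encode('UTF-8', 'ABC~' . chr(0x3000));
\&
\& my $chunk_size = 5;   # tiny on purpose, so the first chunk ends after E3
\& my $buffer = '';      # undecoded octets carried between rounds
\& my $string = '';      # characters decoded so far
\& for (my $pos = 0; $pos < length $octets; $pos += $chunk_size) {
\&     $buffer .= substr($octets, $pos, $chunk_size);
\&     # Decodes the valid prefix; $buffer keeps any partial character.
\&     $string .= decode('UTF-8', $buffer, FB_QUIET);
\& }
\& # After round 1: $string is "ABC~" and $buffer holds E3.
\& # After round 2: $string ends with U+3000 and $buffer is empty.
.Ve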
.PP
Unfortunately, this scheme does not work well with escape-based
encodings such as \s-1ISO\-2022\-JP\s0.
.SH "Line Buffering"
.IX Header "Line Buffering"
Now let's see what happens when you try to decode from \s-1ISO\-2022\-JP\s0 and
the buffer ends in the middle of a character.
.PP
.Vb 5
\&                  JIS208-ESC \ex{5f3e}
\&   A  B  C  ....  ~ \ee  $  B |DAN |  ....
\&  41 42 43  .... 7E 1b 24 42  43 46  ....
\&  <- buffer --------------->
\&  << encoded >>>>>>>>>>>>>>>
.Ve
.PP
As you see, the next buffer begins with \ex43. But \ex43 is 'C' in
\&\s-1ASCII\s0, which is wrong in this case because we are now in the \s-1JIS\s0 X 0208
area, so it has to convert \ex43\ex46, not \ex43. Unlike utf8 and \s-1EUC\s0,
in escape-based encodings you can't tell if a given octet is a whole
character or just part of it.
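.PP
A short sketch makes the statefulness visible. It assumes the
\&\f(CW\*(C`iso\-2022\-jp\*(C'\fR encoding from Encode::JP, which ships with Encode:
the octets \f(CW\*(C`43 46\*(C'\fR decode to \*(L"CF\*(R" on their own, but to a single
\&\s-1JIS\s0 X 0208 character when they follow \f(CW\*(C`ESC $ B\*(C'\fR:
.PP
.Vb 12
\& use strict;
\& use warnings;
\& use Encode qw(decode);
\&
\& my $esc_jis   = chr(27) . '$B';   # ESC $ B: switch to JIS X 0208
\& my $esc_ascii = chr(27) . '(B';   # ESC ( B: switch back to ASCII
\&
\& my $alone  = decode('iso-2022-jp', 'CF');
\& my $in_jis = decode('iso-2022-jp', $esc_jis . 'CF' . $esc_ascii);
\& # $alone is the two characters "CF"; $in_jis is one character,
\& # labelled U+5F3E in the diagram above. The octets are the same;
\& # only the escape state differs.
.Ve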
.PP
Fortunately, PerlIO also supports a line buffer if you tell PerlIO to use
one instead of a fixed-size buffer. Since \s-1ISO\-2022\-JP\s0 is guaranteed to
revert to \s-1ASCII\s0 at the end of each line, a partial character never
occurs when a line buffer is used.
.PP
To tell PerlIO to use a line buffer, implement the \f(CW\*(C`\->needs_lines\*(C'\fR
method for your encoding object. See Encode::Encoding for details.
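.PP
A minimal sketch of such an encoding object follows. The class
Encode::Hypothetical and the name \f(CW\*(C`x\-hypothetical\*(C'\fR are made up for
illustration, and its \f(CW\*(C`decode\*(C'\fR and \f(CW\*(C`encode\*(C'\fR bodies are placeholders
that simply map each octet to the character with the same number; the
point is the \f(CW\*(C`needs_lines\*(C'\fR method returning true and the registration
through \f(CW\*(C`Define\*(C'\fR:
.PP
.Vb 27
\& package Encode::Hypothetical;        # hypothetical encoding class
\& use strict;
\& use warnings;
\& use base qw(Encode::Encoding);
\& __PACKAGE__->Define('x-hypothetical');
\&
\& sub needs_lines { 1 }   # ask PerlIO to pass whole lines
\&
\& sub decode {
\&     my ($self, $octets, $check) = @_;
\&     my $string = $octets;
\&     utf8::upgrade($string);   # placeholder: one octet -> one character
\&     $_[1] = '' if $check;     # report that all input was consumed
\&     return $string;
\& }
\&
\& sub encode {
\&     my ($self, $string, $check) = @_;
\&     my $octets = $string;
\&     utf8::downgrade($octets); # placeholder: dies above 0xFF
\&     $_[1] = '' if $check;
\&     return $octets;
\& }
\&
\& package main;
\& use Encode qw(find_encoding);
\& my $enc = find_encoding('x-hypothetical');   # $enc->needs_lines is true
.Ve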
.PP
Thanks to these efforts, most encodings that come with Encode support
PerlIO, but that still leaves the following encodings:
.PP
.Vb 4
\&  iso-2022-kr
\&  MIME-B
\&  MIME-Header
\&  MIME-Q
.Ve
.PP
Fortunately, iso\-2022\-kr is hardly used (according to Jungshik), and
MIME\-* are very unlikely to be fed to PerlIO because they are for mail
headers. See Encode::MIME::Header for details.
.Sh "How can I tell whether my encoding fully supports PerlIO ?"
.IX Subsection "How can I tell whether my encoding fully supports PerlIO ?"
As of this writing, any encoding whose class belongs to Encode::XS or
Encode::Unicode works. Encoding objects provide a \f(CW\*(C`perlio_ok\*(C'\fR method
which you can call before applying a PerlIO encoding layer to the
filehandle. Here is an example:
.PP
.Vb 10
\& use Encode qw(find_encoding);
\& # $enc (an encoding name) and $file are assumed to be set already.
\& my $obj = find_encoding($enc) or die "Unknown encoding: $enc";
\& my $use_perlio = $obj->perlio_ok;
\& my $layer = $use_perlio ? "<:encoding($enc)" : "<:raw";
\& open my $fh, $layer, $file or die "$file : $!";
\& while (<$fh>) {
\&     $_ = $obj->decode($_) unless $use_perlio;
\&     # ....
\& }
.Ve
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Encode::Encoding,
Encode::Supported,
Encode::PerlIO,
encoding,
perlebcdic,
\&\*(L"open\*(R" in perlfunc,
perlunicode,
utf8,
the Perl Unicode Mailing List <perl\-unicode@perl.org>