<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-unicodedata.html" />
<link rel="prev" href="module-textwrap.html" />
<link rel="parent" href="strings.html" />
<link rel="next" href="node130.html" />
<meta name='aesop' content='information' />
<title>4.9 codecs -- Codec registry and base classes</title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="4.8 textwrap "
 href="module-textwrap.html"><img src='../icons/previous.png'
 border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="4. String Services"
 href="strings.html"><img src='../icons/up.png'
 border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="4.9.1 Codec Base Classes"
 href="node130.html"><img src='../icons/next.png'
 border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
 href="contents.html"><img src='../icons/contents.png'
 border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
 border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
 href="genindex.html"><img src='../icons/index.png'
 border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="module-textwrap.html">4.8 textwrap </A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="strings.html">4. String Services</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="node130.html">4.9.1 Codec Base Classes</A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION006900000000000000000">
4.9 <tt class="module">codecs</tt> --
 Codec registry and base classes</A>
</H1>

<P>
<A NAME="module-codecs"></A>

<P>
<a id='l2h-994' xml:id='l2h-994'></a>
<a id='l2h-975' xml:id='l2h-975'></a><a id='l2h-976' xml:id='l2h-976'></a><a id='l2h-995' xml:id='l2h-995'></a>
<a id='l2h-977' xml:id='l2h-977'></a>
<P>
This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry, which manages the codec and error handler lookup process.

<P>
It defines the following functions:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-978' xml:id='l2h-978' class="function">register</tt></b>(</nobr></td>
 <td><var>search_function</var>)</td></tr></table></dt>
<dd>
Register a codec search function. Search functions are expected to
take one argument, the encoding name in all lowercase letters, and
return a tuple of functions <code>(<var>encoder</var>, <var>decoder</var>, <var>stream_reader</var>,
<var>stream_writer</var>)</code> with the following interfaces:

<P>
<var>encoder</var> and <var>decoder</var>: These must be functions or methods
 with the same interface as the
 <tt class="method">encode()</tt>/<tt class="method">decode()</tt> methods of Codec instances (see
 Codec Interface). They are expected to work in a
 stateless mode.

<P>
<var>stream_reader</var> and <var>stream_writer</var>: These must be
 factory functions with the following interface:

<P>
<code>factory(<var>stream</var>, <var>errors</var>='strict')</code>

<P>
The factory functions must return objects providing the interfaces
 defined by the base classes <tt class="class">StreamReader</tt> and
 <tt class="class">StreamWriter</tt>, respectively. Stream codecs can maintain
 state.

<P>
Possible values for <var>errors</var> are <code>'strict'</code> (raise an exception
 in case of an encoding error), <code>'replace'</code> (replace malformed
 data with a suitable replacement marker, such as "<tt class="character">?</tt>"),
 <code>'ignore'</code> (ignore malformed data and continue without further
 notice), <code>'xmlcharrefreplace'</code> (replace with the appropriate XML
 character reference; encoding only) and <code>'backslashreplace'</code>
 (replace with backslashed escape sequences; encoding only), as
 well as any other error handling name defined via
 <tt class="function">register_error()</tt>.

<P>
If a search function cannot find a given encoding, it should
return <code>None</code>.
</dl>
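<P>
The search function protocol above can be illustrated with a small sketch. It assumes a Python 3 interpreter, where encoded data is <code>bytes</code> rather than the 8-bit strings of this release, and it registers a hypothetical codec named <code>demo_rot13</code>; the codec name and every helper name below are invented for the example. Python 3 expects search functions to return a <tt class="class">CodecInfo</tt> object, which is a subclass of the four-item tuple described above.

```python
import codecs
import string

# Translation table for ROT13 on ASCII letters (ROT13 is its own inverse).
_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase
_ROT13 = str.maketrans(_LOWER + _UPPER,
                       _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13])


def rot13_encode(text, errors='strict'):
    # Stateless encoder: returns (output, number of input characters consumed).
    return text.translate(_ROT13).encode('ascii', errors), len(text)


def rot13_decode(data, errors='strict'):
    # bytes.decode() may pass a memoryview, so copy it into bytes first.
    return bytes(data).decode('ascii', errors).translate(_ROT13), len(data)


class Rot13Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return rot13_encode(input, errors)

    def decode(self, input, errors='strict'):
        return rot13_decode(input, errors)


# The classes themselves serve as factory(stream, errors='strict').
class Rot13StreamWriter(Rot13Codec, codecs.StreamWriter):
    pass


class Rot13StreamReader(Rot13Codec, codecs.StreamReader):
    pass


def search_demo_rot13(name):
    # The registry passes the encoding name in lower case.
    if name != 'demo_rot13':
        return None  # unknown here: let other search functions try
    return codecs.CodecInfo(rot13_encode, rot13_decode,
                            streamreader=Rot13StreamReader,
                            streamwriter=Rot13StreamWriter,
                            name='demo_rot13')


codecs.register(search_demo_rot13)

print('Hello'.encode('demo_rot13'))    # b'Uryyb'
print(b'Uryyb'.decode('DEMO_ROT13'))   # Hello
```

Because the search function returns <code>None</code> for every other name, it can coexist with the standard encodings search function.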

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-979' xml:id='l2h-979' class="function">lookup</tt></b>(</nobr></td>
 <td><var>encoding</var>)</td></tr></table></dt>
<dd>
Looks up a codec tuple in the Python codec registry and returns the
function tuple as defined above.

<P>
Encodings are first looked up in the registry's cache. If not found,
the list of registered search functions is scanned. If no codec tuple
is found, a <tt class="exception">LookupError</tt> is raised. Otherwise, the codec
tuple is stored in the cache and returned to the caller.
</dl>
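<P>
A short sketch of <tt class="function">lookup()</tt> in use, assuming a Python 3 interpreter: the returned object unpacks as the four-item tuple described above, and an unknown name (the deliberately bogus <code>no-such-encoding</code> here) raises <tt class="exception">LookupError</tt>.

```python
import codecs

info = codecs.lookup('UTF-8')  # encoding names are case-insensitive
encoder, decoder, reader_factory, writer_factory = info

data, consumed = encoder('caf\u00e9')
print(data, consumed)          # b'caf\xc3\xa9' 4
text, used = decoder(data)
print(repr(text), used)        # 'café' 5

try:
    codecs.lookup('no-such-encoding')
except LookupError as exc:
    print('LookupError:', exc)
```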

<P>
To simplify access to the various codecs, the module provides these
additional functions, which use <tt class="function">lookup()</tt> for the codec
lookup:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-980' xml:id='l2h-980' class="function">getencoder</tt></b>(</nobr></td>
 <td><var>encoding</var>)</td></tr></table></dt>
<dd>
Look up the codec for the given encoding and return its encoder
function.

<P>
Raises a <tt class="exception">LookupError</tt> if the encoding cannot be found.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-981' xml:id='l2h-981' class="function">getdecoder</tt></b>(</nobr></td>
 <td><var>encoding</var>)</td></tr></table></dt>
<dd>
Look up the codec for the given encoding and return its decoder
function.

<P>
Raises a <tt class="exception">LookupError</tt> if the encoding cannot be found.
</dl>
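<P>
<tt class="function">getencoder()</tt> and <tt class="function">getdecoder()</tt> return the stateless functions themselves, so the <code>(output, length consumed)</code> result pair is visible to the caller. This sketch assumes Python 3 and uses the standard Latin-1 codec; the optional second argument is the <var>errors</var> value.

```python
import codecs

encode_latin1 = codecs.getencoder('latin-1')
decode_latin1 = codecs.getdecoder('latin-1')

raw, n_chars = encode_latin1('na\u00efve')
print(raw, n_chars)                       # b'na\xefve' 5
text, n_bytes = decode_latin1(raw)
print(text == 'na\u00efve', n_bytes)      # True 5

# U+20AC (euro sign) has no Latin-1 encoding; 'replace' substitutes '?'.
print(encode_latin1('\u20ac1', 'replace'))   # (b'?1', 2)
```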

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-982' xml:id='l2h-982' class="function">getreader</tt></b>(</nobr></td>
 <td><var>encoding</var>)</td></tr></table></dt>
<dd>
Look up the codec for the given encoding and return its StreamReader
class or factory function.

<P>
Raises a <tt class="exception">LookupError</tt> if the encoding cannot be found.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-983' xml:id='l2h-983' class="function">getwriter</tt></b>(</nobr></td>
 <td><var>encoding</var>)</td></tr></table></dt>
<dd>
Look up the codec for the given encoding and return its StreamWriter
class or factory function.

<P>
Raises a <tt class="exception">LookupError</tt> if the encoding cannot be found.
</dl>
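<P>
The factories returned by <tt class="function">getreader()</tt> and <tt class="function">getwriter()</tt> wrap a byte stream. This Python 3 sketch uses an in-memory <tt class="class">io.BytesIO</tt> object in place of a real file.

```python
import codecs
import io

buffer = io.BytesIO()
Utf8Writer = codecs.getwriter('utf-8')
writer = Utf8Writer(buffer)          # factory(stream, errors='strict')
writer.write('gr\u00fc\u00dfe\n')    # text in, UTF-8 bytes out
raw = buffer.getvalue()
print(raw)                           # b'gr\xc3\xbc\xc3\x9fe\n'

Utf8Reader = codecs.getreader('utf-8')
reader = Utf8Reader(io.BytesIO(raw))
line = reader.readline()             # UTF-8 bytes in, text out
print(line == 'gr\u00fc\u00dfe\n')   # True
```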

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-984' xml:id='l2h-984' class="function">register_error</tt></b>(</nobr></td>
 <td><var>name, error_handler</var>)</td></tr></table></dt>
<dd>
Register the error handling function <var>error_handler</var> under the
name <var>name</var>. <var>error_handler</var> will be called during encoding
and decoding in case of an error, when <var>name</var> is specified as the
<var>errors</var> parameter.

<P>
For encoding, <var>error_handler</var> will be called with a
<tt class="exception">UnicodeEncodeError</tt> instance, which contains information about
the location of the error. The error handler must either raise this or
a different exception, or return a tuple with a replacement for the
unencodable part of the input and a position where encoding should
continue. The encoder will encode the replacement and continue encoding
the original input at the specified position. Negative position values
will be treated as being relative to the end of the input string. If the
resulting position is out of bounds, an <tt class="exception">IndexError</tt> will be raised.

<P>
Decoding and translating work similarly, except that <tt class="exception">UnicodeDecodeError</tt>
or <tt class="exception">UnicodeTranslateError</tt> will be passed to the handler and
that the replacement from the error handler will be put into the output
directly.
</dl>
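<P>
A minimal encoding error handler following the protocol above might look like this. The handler name <code>demo_curly</code> and the <code>{U+XXXX}</code> replacement format are invented for the example, and the code assumes Python 3.

```python
import codecs


def curly_replace(exc):
    # Hypothetical handler: spell out unencodable characters as {U+XXXX}.
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = ''.join('{U+%04X}' % ord(ch) for ch in bad)
        # Encode the replacement, then resume just after the bad characters.
        return replacement, exc.end
    # Decoding and translating errors are not handled by this sketch.
    raise exc


codecs.register_error('demo_curly', curly_replace)

encoded = 'caf\u00e9 \u2603'.encode('ascii', 'demo_curly')
print(encoded)    # b'caf{U+00E9} {U+2603}'
```

The returned position is <code>exc.end</code>, so encoding continues with the first character after the failing run.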

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-985' xml:id='l2h-985' class="function">lookup_error</tt></b>(</nobr></td>
 <td><var>name</var>)</td></tr></table></dt>
<dd>
Return the error handler previously registered under the name <var>name</var>.

<P>
Raises a <tt class="exception">LookupError</tt> if the handler cannot be found.
</dl>
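<P>
<tt class="function">lookup_error()</tt> returns the handler object itself, so it can be called directly on an exception instance. In this Python 3 sketch the exception is built by hand to describe one unencodable character at index 1, and <code>no_such_handler</code> is a deliberately unregistered name.

```python
import codecs

replace = codecs.lookup_error('replace')
exc = UnicodeEncodeError('ascii', 'a\u00e9b', 1, 2, 'ordinal not in range(128)')
print(replace(exc))     # ('?', 2): replacement text, resume position

try:
    codecs.lookup_error('no_such_handler')
except LookupError as err:
    print('LookupError:', err)
```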

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-986' xml:id='l2h-986' class="function">strict_errors</tt></b>(</nobr></td>
 <td><var>exception</var>)</td></tr></table></dt>
<dd>
Implements the <code>'strict'</code> error handling.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-987' xml:id='l2h-987' class="function">replace_errors</tt></b>(</nobr></td>
 <td><var>exception</var>)</td></tr></table></dt>
<dd>
Implements the <code>'replace'</code> error handling.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-988' xml:id='l2h-988' class="function">ignore_errors</tt></b>(</nobr></td>
 <td><var>exception</var>)</td></tr></table></dt>
<dd>
Implements the <code>'ignore'</code> error handling.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-989' xml:id='l2h-989' class="function">xmlcharrefreplace_errors</tt></b>(</nobr></td>
 <td><var>exception</var>)</td></tr></table></dt>
<dd>
Implements the <code>'xmlcharrefreplace'</code> error handling.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-990' xml:id='l2h-990' class="function">backslashreplace_errors</tt></b>(</nobr></td>
 <td><var>exception</var>)</td></tr></table></dt>
<dd>
Implements the <code>'backslashreplace'</code> error handling.
</dl>
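<P>
Calling the built-in handlers directly on a hand-built <tt class="exception">UnicodeEncodeError</tt> shows what each one contributes; encoding would resume at the returned position. This is a Python 3 sketch and covers only the encoding case.

```python
import codecs

# One unencodable character (U+00E9) at index 1 of the input.
exc = UnicodeEncodeError('ascii', 'a\u00e9b', 1, 2, 'ordinal not in range(128)')

print(codecs.replace_errors(exc))            # ('?', 2)
print(codecs.ignore_errors(exc))             # ('', 2)
print(codecs.xmlcharrefreplace_errors(exc))  # decimal XML character reference, 2
print(codecs.backslashreplace_errors(exc))   # ('\\xe9', 2)

try:
    codecs.strict_errors(exc)                # re-raises the exception it is given
except UnicodeEncodeError as err:
    print('strict:', err.reason)             # strict: ordinal not in range(128)
```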

<P>
To simplify working with encoded files or streams, the module
also defines these utility functions:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-991' xml:id='l2h-991' class="function">open</tt></b>(</nobr></td>
 <td><var>filename, mode</var><big>[</big><var>, encoding</var><big>[</big><var>,
 errors</var><big>[</big><var>, buffering</var><big>]</big><var></var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Open an encoded file using the given <var>mode</var> and return
a wrapped version providing transparent encoding/decoding.

<P>
<span class="note"><b class="label">Note:</b>
The wrapped version will only accept the object format
defined by the codecs, i.e. Unicode objects for most built-in
codecs. Output is also codec-dependent and will usually be Unicode as
well.</span>

<P>
<var>encoding</var> specifies the encoding to be used for the
file.

<P>
<var>errors</var> may be given to define the error handling. It defaults
to <code>'strict'</code>, which causes a <tt class="exception">ValueError</tt> to be raised
in case an encoding error occurs.

<P>
<var>buffering</var> has the same meaning as for the built-in
<tt class="function">open()</tt> function. It defaults to line buffering.
</dl>
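<P>
A round trip through <tt class="function">open()</tt> might look like the following Python 3 sketch; the temporary directory and the file name <code>demo.txt</code> are arbitrary choices for the example.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Text goes in; the codec writes UTF-8 bytes to disk.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write('\u00bfQu\u00e9 tal?\n')

with open(path, 'rb') as raw_file:   # inspect the stored bytes
    stored = raw_file.read()
print(stored)                        # b'\xc2\xbfQu\xc3\xa9 tal?\n'

with codecs.open(path, 'r', encoding='utf-8') as f:
    text = f.read()                  # decoded back to text
```

When an encoding is given, the underlying file is opened in binary mode, so no newline translation takes place.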

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-992' xml:id='l2h-992' class="function">EncodedFile</tt></b>(</nobr></td>
 <td><var>file, input</var><big>[</big><var>,
 output</var><big>[</big><var>, errors</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Return a wrapped version of <var>file</var> which provides transparent
encoding translation.

<P>
Strings written to the wrapped file are interpreted according to the
given <var>input</var> encoding and then written to the original file as
strings using the <var>output</var> encoding. Data read from the wrapped
file is decoded using the <var>output</var> encoding and returned encoded
with the <var>input</var> encoding. The intermediate encoding will
usually be Unicode but depends on the specified codecs.

<P>
If <var>output</var> is not given, it defaults to <var>input</var>.

<P>
<var>errors</var> may be given to define the error handling. It defaults to
<code>'strict'</code>, which causes a <tt class="exception">ValueError</tt> to be raised in case
an encoding error occurs.
</dl>
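<P>
The sketch below, assuming Python 3, wraps an in-memory byte stream that stores Latin-1 while the caller reads and writes UTF-8; the two encodings are passed positionally as <var>input</var> and <var>output</var>.

```python
import codecs
import io

backing = io.BytesIO()
# Caller side speaks UTF-8; the underlying stream stores Latin-1.
wrapped = codecs.EncodedFile(backing, 'utf-8', 'latin-1')
wrapped.write('d\u00e9j\u00e0 vu'.encode('utf-8'))
print(backing.getvalue())        # b'd\xe9j\xe0 vu'

# Reading recodes in the other direction: Latin-1 stored, UTF-8 returned.
reader = codecs.EncodedFile(io.BytesIO(b'd\xe9j\xe0 vu'), 'utf-8', 'latin-1')
print(reader.read())             # b'd\xc3\xa9j\xc3\xa0 vu'
```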

<P>
The module also provides the following constants, which are useful
for reading and writing platform-dependent files:

<P>
<dl><dt><b><tt id='l2h-993' xml:id='l2h-993'>BOM</tt></b></dt>
<dd>
<dt><b><tt id='l2h-996' xml:id='l2h-996'>BOM_BE</tt></b></dt><dd>
<dt><b><tt id='l2h-997' xml:id='l2h-997'>BOM_LE</tt></b></dt><dd>
<dt><b><tt id='l2h-998' xml:id='l2h-998'>BOM_UTF8</tt></b></dt><dd>
<dt><b><tt id='l2h-999' xml:id='l2h-999'>BOM_UTF16</tt></b></dt><dd>
<dt><b><tt id='l2h-1000' xml:id='l2h-1000'>BOM_UTF16_BE</tt></b></dt><dd>
<dt><b><tt id='l2h-1001' xml:id='l2h-1001'>BOM_UTF16_LE</tt></b></dt><dd>
<dt><b><tt id='l2h-1002' xml:id='l2h-1002'>BOM_UTF32</tt></b></dt><dd>
<dt><b><tt id='l2h-1003' xml:id='l2h-1003'>BOM_UTF32_BE</tt></b></dt><dd>
<dt><b><tt id='l2h-1004' xml:id='l2h-1004'>BOM_UTF32_LE</tt></b></dt><dd>
These constants define various encodings of the Unicode byte order mark
(BOM) used in UTF-16 and UTF-32 data streams to indicate the byte order
used in the stream or file, and in UTF-8 as a Unicode signature.
<tt class="constant">BOM_UTF16</tt> is either <tt class="constant">BOM_UTF16_BE</tt> or
<tt class="constant">BOM_UTF16_LE</tt>, depending on the platform's native byte order.
<tt class="constant">BOM</tt> is an alias for <tt class="constant">BOM_UTF16</tt>, <tt class="constant">BOM_LE</tt>
for <tt class="constant">BOM_UTF16_LE</tt>, and <tt class="constant">BOM_BE</tt> for <tt class="constant">BOM_UTF16_BE</tt>.
<tt class="constant">BOM_UTF8</tt> is the UTF-8 signature, and the remaining constants
are the UTF-32 byte order marks, with <tt class="constant">BOM_UTF32</tt> likewise
following the platform's native byte order.
</dd></dl>
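<P>
One common use of these constants is sniffing the encoding of incoming data. The helper <code>sniff_bom()</code> below is a hypothetical Python 3 sketch; it must test the UTF-32 marks before the UTF-16 ones, because <tt class="constant">BOM_UTF32_LE</tt> begins with the same two bytes as <tt class="constant">BOM_UTF16_LE</tt>.

```python
import codecs

# Longest marks first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
)


def sniff_bom(data):
    """Return (encoding name or None, data with any BOM removed)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


# The 'utf-16' codec writes a BOM in the platform's native byte order.
name, body = sniff_bom('hi'.encode('utf-16'))
print(name, body.decode(name))   # e.g. "utf-16-le hi" on little-endian hosts
```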

<P>

<p><br /></p><hr class='online-navigation' />
<div class='online-navigation'>
<!--Table of Child-Links-->
<A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a>

<UL CLASS="ChildLinks">
<LI><A href="node130.html">4.9.1 Codec Base Classes</a>
<UL>
<LI><A href="codec-objects.html">4.9.1.1 Codec Objects</a>
<LI><A href="stream-writer-objects.html">4.9.1.2 StreamWriter Objects</a>
<LI><A href="stream-reader-objects.html">4.9.1.3 StreamReader Objects</a>
<LI><A href="stream-reader-writer.html">4.9.1.4 StreamReaderWriter Objects</a>
<LI><A href="stream-recoder-objects.html">4.9.1.5 StreamRecoder Objects</a>
</ul>
<LI><A href="standard-encodings.html">4.9.2 Standard Encodings</a>
<LI><A href="module-encodings.idna.html">4.9.3 <tt class="module">encodings.idna</tt> --
 Internationalized Domain Names in Applications</a>
</ul>
<!--End of Table of Child-Links-->
</div>

<DIV CLASS="navigation">
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>