<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-unicodedata.html" />
<link rel="prev" href="module-textwrap.html" />
<link rel="parent" href="strings.html" />
<link rel="next" href="node130.html" />
<meta name='aesop' content='information' />
<title>4.9 codecs -- Codec registry and base classes</title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="4.8 textwrap "
href="module-textwrap.html"><img src='../icons/previous.png'
border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="4. String Services"
href="strings.html"><img src='../icons/up.png'
border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="4.9.1 Codec Base Classes"
href="node130.html"><img src='../icons/next.png'
border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
href="contents.html"><img src='../icons/contents.png'
border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
href="genindex.html"><img src='../icons/index.png'
border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="module-textwrap.html">4.8 textwrap </A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="strings.html">4. String Services</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="node130.html">4.9.1 Codec Base Classes</A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION006900000000000000000">
4.9 <tt class="module">codecs</tt> --
Codec registry and base classes</A>
</H1>

<P>
<A NAME="module-codecs"></A>

<P>
<a id='l2h-994' xml:id='l2h-994'></a>
<a id='l2h-975' xml:id='l2h-975'></a><a id='l2h-976' xml:id='l2h-976'></a><a id='l2h-995' xml:id='l2h-995'></a>
<a id='l2h-977' xml:id='l2h-977'></a>
<P>
This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry, which manages the codec and error handling lookup process.

<P>
It defines the following functions:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-978' xml:id='l2h-978' class="function">register</tt></b>(</nobr></td>
<td><var>search_function</var>)</td></tr></table></dt>
<dd>
Register a codec search function. Search functions are expected to
take one argument, the encoding name in all lower case letters, and
return a tuple of functions <code>(<var>encoder</var>, <var>decoder</var>, <var>stream_reader</var>,
<var>stream_writer</var>)</code>, whose items are described below.

<P>
<var>encoder</var> and <var>decoder</var>: These must be functions or methods
which have the same interface as the
<tt class="method">encode()</tt>/<tt class="method">decode()</tt> methods of Codec instances (see
Codec Interface). The functions/methods are expected to work in a
stateless mode.

<P>
<var>stream_reader</var> and <var>stream_writer</var>: These have to be
factory functions providing the following interface:

<P>
<code>factory(<var>stream</var>, <var>errors</var>='strict')</code>

<P>
The factory functions must return objects providing the interfaces
defined by the base classes <tt class="class">StreamReader</tt> and
<tt class="class">StreamWriter</tt>, respectively. Stream codecs can maintain
state.

<P>
Possible values for <var>errors</var> are <code>'strict'</code> (raise an exception
in case of an encoding error), <code>'replace'</code> (replace malformed
data with a suitable replacement marker, such as "<tt class="character">?</tt>"),
<code>'ignore'</code> (ignore malformed data and continue without further
notice), <code>'xmlcharrefreplace'</code> (replace with the appropriate XML
character reference; encoding only) and <code>'backslashreplace'</code>
(replace with backslashed escape sequences; encoding only), as
well as any other error handling name defined via
<tt class="function">register_error()</tt>.

<P>
If a search function cannot find a given encoding, it should
return <code>None</code>.
</dl>
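<P>
The following is a minimal sketch of a search function. It is written for Python 3, where text is <tt>str</tt> and encoded data is <tt>bytes</tt>; the 2.4 release documented here uses Unicode and plain string objects instead. The codec name <code>'shout'</code> and every <tt>shout</tt>/<tt>Shout</tt> name are hypothetical. The search function returns the four-item tuple described above for that one name and <code>None</code> for every other encoding.

```python
import codecs

# Hypothetical codec name, used only for this illustration.
CODEC_NAME = "shout"


def shout_encode(text, errors="strict"):
    # Stateless encoder: upper-case the text, then encode it as ASCII.
    # Returns (output, number of input characters consumed).
    return text.upper().encode("ascii", errors), len(text)


def shout_decode(data, errors="strict"):
    # Stateless decoder: plain ASCII decoding (the upper-casing is lossy).
    data = bytes(data)
    return data.decode("ascii", errors), len(data)


class ShoutStreamWriter(codecs.StreamWriter):
    def encode(self, text, errors="strict"):
        return shout_encode(text, errors)


class ShoutStreamReader(codecs.StreamReader):
    def decode(self, data, errors="strict"):
        return shout_decode(data, errors)


def search_shout(name):
    # Return the 4-tuple for our codec, or None so that other search
    # functions get a chance to handle the name.
    if name != CODEC_NAME:
        return None
    return (shout_encode, shout_decode, ShoutStreamReader, ShoutStreamWriter)


codecs.register(search_shout)

print(codecs.encode("hello", CODEC_NAME))  # b'HELLO'
```

Because the search function returns <code>None</code> for names it does not recognize, registering it leaves lookups of the standard encodings unaffected.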
120 | ||
121 | <P> | |
122 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
123 | <td><nobr><b><tt id='l2h-979' xml:id='l2h-979' class="function">lookup</tt></b>(</nobr></td> | |
124 | <td><var>encoding</var>)</td></tr></table></dt> | |
125 | <dd> | |
126 | Looks up a codec tuple in the Python codec registry and returns the | |
127 | function tuple as defined above. | |
128 | ||
129 | <P> | |
130 | Encodings are first looked up in the registry's cache. If not found, | |
131 | the list of registered search functions is scanned. If no codecs tuple | |
132 | is found, a <tt class="exception">LookupError</tt> is raised. Otherwise, the codecs | |
133 | tuple is stored in the cache and returned to the caller. | |
134 | </dl> | |
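<P>
A short illustration of <tt class="function">lookup()</tt>. It assumes Python 3, where the returned value is a <tt>codecs.CodecInfo</tt> object (a tuple subclass), so it still unpacks into four names. The example calls the stateless encoder and decoder and shows the <tt class="exception">LookupError</tt> for an unregistered name. <code>'no-such-encoding'</code> is an arbitrary made-up name.

```python
import codecs

# lookup() hands back the four-item codec tuple. In Python 3 it is a
# codecs.CodecInfo object, a tuple subclass, so it still unpacks.
encoder, decoder, reader_cls, writer_cls = codecs.lookup("UTF-8")

data, consumed = encoder("caf\u00e9")
print(data, consumed)  # b'caf\xc3\xa9' 4

text, used = decoder(data)
print(text == "caf\u00e9", used)  # True 5

# An encoding that no search function recognizes raises LookupError.
try:
    codecs.lookup("no-such-encoding")
except LookupError as exc:
    print("LookupError:", exc)
```

The second item of each result counts the input that was consumed. In this example, 4 characters produce 5 bytes, and decoding those 5 bytes gives back the 4 characters.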
135 | ||
136 | <P> | |
137 | To simplify access to the various codecs, the module provides these | |
138 | additional functions which use <tt class="function">lookup()</tt> for the codec | |
139 | lookup: | |
140 | ||
141 | <P> | |
142 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
143 | <td><nobr><b><tt id='l2h-980' xml:id='l2h-980' class="function">getencoder</tt></b>(</nobr></td> | |
144 | <td><var>encoding</var>)</td></tr></table></dt> | |
145 | <dd> | |
146 | Lookup up the codec for the given encoding and return its encoder | |
147 | function. | |
148 | ||
149 | <P> | |
150 | Raises a <tt class="exception">LookupError</tt> in case the encoding cannot be found. | |
151 | </dl> | |
152 | ||
153 | <P> | |
154 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
155 | <td><nobr><b><tt id='l2h-981' xml:id='l2h-981' class="function">getdecoder</tt></b>(</nobr></td> | |
156 | <td><var>encoding</var>)</td></tr></table></dt> | |
157 | <dd> | |
158 | Lookup up the codec for the given encoding and return its decoder | |
159 | function. | |
160 | ||
161 | <P> | |
162 | Raises a <tt class="exception">LookupError</tt> in case the encoding cannot be found. | |
163 | </dl> | |
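<P>
A sketch of <tt class="function">getencoder()</tt> and <tt class="function">getdecoder()</tt> in Python 3 syntax. The returned functions take the input and an optional error handling name. They return a pair containing the converted data and the amount of input consumed.

```python
import codecs

# getencoder()/getdecoder() return the stateless functions from the codec
# tuple; each call returns (output, number of input items consumed).
encode_latin1 = codecs.getencoder("latin-1")
decode_latin1 = codecs.getdecoder("latin-1")

raw, n_chars = encode_latin1("na\u00efve")
print(raw, n_chars)  # b'na\xefve' 5

text, n_bytes = decode_latin1(raw)
print(text == "na\u00efve", n_bytes)  # True 5

# The optional second argument selects the error handler.
print(encode_latin1("\u20ac", "replace"))  # (b'?', 1)
```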
164 | ||
165 | <P> | |
166 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
167 | <td><nobr><b><tt id='l2h-982' xml:id='l2h-982' class="function">getreader</tt></b>(</nobr></td> | |
168 | <td><var>encoding</var>)</td></tr></table></dt> | |
169 | <dd> | |
170 | Lookup up the codec for the given encoding and return its StreamReader | |
171 | class or factory function. | |
172 | ||
173 | <P> | |
174 | Raises a <tt class="exception">LookupError</tt> in case the encoding cannot be found. | |
175 | </dl> | |
176 | ||
177 | <P> | |
178 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
179 | <td><nobr><b><tt id='l2h-983' xml:id='l2h-983' class="function">getwriter</tt></b>(</nobr></td> | |
180 | <td><var>encoding</var>)</td></tr></table></dt> | |
181 | <dd> | |
182 | Lookup up the codec for the given encoding and return its StreamWriter | |
183 | class or factory function. | |
184 | ||
185 | <P> | |
186 | Raises a <tt class="exception">LookupError</tt> in case the encoding cannot be found. | |
187 | </dl> | |
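<P>
The factories returned by <tt class="function">getwriter()</tt> and <tt class="function">getreader()</tt> follow the <code>factory(stream, errors='strict')</code> interface described under <tt class="function">register()</tt>. The following minimal Python 3 sketch uses an in-memory <tt>io.BytesIO</tt> object as a stand-in for a real file.

```python
import codecs
import io

# getwriter()/getreader() return StreamWriter/StreamReader factories.
# Calling a factory with a byte stream (io.BytesIO stands in for a real
# file here) produces a wrapper that encodes or decodes on the fly.
buffer = io.BytesIO()
writer = codecs.getwriter("utf-16-le")(buffer)
writer.write("h\u00e9")
print(buffer.getvalue())  # b'h\x00\xe9\x00'

buffer.seek(0)
reader = codecs.getreader("utf-16-le")(buffer, errors="strict")
decoded = reader.read()
print(decoded == "h\u00e9")  # True
```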
188 | ||
189 | <P> | |
190 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
191 | <td><nobr><b><tt id='l2h-984' xml:id='l2h-984' class="function">register_error</tt></b>(</nobr></td> | |
192 | <td><var>name, error_handler</var>)</td></tr></table></dt> | |
193 | <dd> | |
194 | Register the error handling function <var>error_handler</var> under the | |
195 | name <var>name</var>. <var>error_handler</var> will be called during encoding | |
196 | and decoding in case of an error, when <var>name</var> is specified as the | |
197 | errors parameter. | |
198 | ||
199 | <P> | |
200 | For encoding <var>error_handler</var> will be called with a | |
201 | <tt class="exception">UnicodeEncodeError</tt> instance, which contains information about | |
202 | the location of the error. The error handler must either raise this or | |
203 | a different exception or return a tuple with a replacement for the | |
204 | unencodable part of the input and a position where encoding should | |
205 | continue. The encoder will encode the replacement and continue encoding | |
206 | the original input at the specified position. Negative position values | |
207 | will be treated as being relative to the end of the input string. If the | |
208 | resulting position is out of bound an IndexError will be raised. | |
209 | ||
210 | <P> | |
211 | Decoding and translating works similar, except <tt class="exception">UnicodeDecodeError</tt> | |
212 | or <tt class="exception">UnicodeTranslateError</tt> will be passed to the handler and | |
213 | that the replacement from the error handler will be put into the output | |
214 | directly. | |
215 | </dl> | |
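<P>
A minimal sketch of a custom error handler, written for Python 3. The handler name <code>'codepoint'</code> and the function <tt>codepoint_replace</tt> are hypothetical. For encoding errors, the handler returns a replacement string plus <code>exc.end</code> as the position to resume at. For any other error type, it re-raises the exception, which is one of the options described above.

```python
import codecs


def codepoint_replace(exc):
    # Hypothetical handler: replace each unencodable character with a
    # {U+XXXX} marker and resume encoding right after the bad span.
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        replacement = "".join("{U+%04X}" % ord(ch) for ch in bad)
        return replacement, exc.end
    # Decoding/translating errors are not handled: re-raise unchanged.
    raise exc


codecs.register_error("codepoint", codepoint_replace)

print("caf\u00e9 \u20ac".encode("ascii", "codepoint"))
# b'caf{U+00E9} {U+20AC}'

try:
    b"\xff".decode("ascii", "codepoint")
except UnicodeDecodeError as err:
    print("decode error re-raised:", err.start, err.end)
```

The replacement is made of ASCII characters only because the encoder has to encode it with the same codec.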
216 | ||
217 | <P> | |
218 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
219 | <td><nobr><b><tt id='l2h-985' xml:id='l2h-985' class="function">lookup_error</tt></b>(</nobr></td> | |
220 | <td><var>name</var>)</td></tr></table></dt> | |
221 | <dd> | |
222 | Return the error handler previously register under the name <var>name</var>. | |
223 | ||
224 | <P> | |
225 | Raises a <tt class="exception">LookupError</tt> in case the handler cannot be found. | |
226 | </dl> | |
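<P>
A brief Python 3 illustration of <tt class="function">lookup_error()</tt>. It fetches the built-in <code>'replace'</code> handler and calls it directly on a hand-built <tt class="exception">UnicodeEncodeError</tt>. <code>'no-such-handler'</code> is a made-up name used to trigger the <tt class="exception">LookupError</tt>.

```python
import codecs

# lookup_error() returns whatever callable is registered under a name;
# the built-in handler names are registered before any user code runs.
handler = codecs.lookup_error("replace")

# The exception an ASCII encoder would raise for the euro sign at index 1
# of a three-character string; the handler decides what replaces it.
exc = UnicodeEncodeError("ascii", "a\u20acb", 1, 2, "ordinal not in range(128)")
print(handler(exc))  # ('?', 2)

try:
    codecs.lookup_error("no-such-handler")
except LookupError as err:
    print("LookupError:", err)
```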
227 | ||
228 | <P> | |
229 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
230 | <td><nobr><b><tt id='l2h-986' xml:id='l2h-986' class="function">strict_errors</tt></b>(</nobr></td> | |
231 | <td><var>exception</var>)</td></tr></table></dt> | |
232 | <dd> | |
233 | Implements the <code>strict</code> error handling. | |
234 | </dl> | |
235 | ||
236 | <P> | |
237 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
238 | <td><nobr><b><tt id='l2h-987' xml:id='l2h-987' class="function">replace_errors</tt></b>(</nobr></td> | |
239 | <td><var>exception</var>)</td></tr></table></dt> | |
240 | <dd> | |
241 | Implements the <code>replace</code> error handling. | |
242 | </dl> | |
243 | ||
244 | <P> | |
245 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
246 | <td><nobr><b><tt id='l2h-988' xml:id='l2h-988' class="function">ignore_errors</tt></b>(</nobr></td> | |
247 | <td><var>exception</var>)</td></tr></table></dt> | |
248 | <dd> | |
249 | Implements the <code>ignore</code> error handling. | |
250 | </dl> | |
251 | ||
252 | <P> | |
253 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
254 | <td><nobr><b><tt id='l2h-989' xml:id='l2h-989' class="function">xmlcharrefreplace_errors_errors</tt></b>(</nobr></td> | |
255 | <td><var>exception</var>)</td></tr></table></dt> | |
256 | <dd> | |
257 | Implements the <code>xmlcharrefreplace</code> error handling. | |
258 | </dl> | |
259 | ||
260 | <P> | |
261 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
262 | <td><nobr><b><tt id='l2h-990' xml:id='l2h-990' class="function">backslashreplace_errors_errors</tt></b>(</nobr></td> | |
263 | <td><var>exception</var>)</td></tr></table></dt> | |
264 | <dd> | |
265 | Implements the <code>backslashreplace</code> error handling. | |
266 | </dl> | |
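<P>
The built-in handlers can also be called directly, which shows what each one returns. The following Python 3 sketch uses a hand-built <tt class="exception">UnicodeEncodeError</tt> for the one non-ASCII character in a short string. It then shows that the <var>errors</var> argument of an ordinary encode call selects the same behaviour.

```python
import codecs

# An encode error for index 1 of the string: an e with acute accent,
# which ASCII cannot represent.
text = "x\u00e9y"
exc = UnicodeEncodeError("ascii", text, 1, 2, "ordinal not in range(128)")

# Each handler returns (replacement, position to resume at) ...
print(codecs.replace_errors(exc))           # ('?', 2)
print(codecs.ignore_errors(exc))            # ('', 2)
print(codecs.backslashreplace_errors(exc))  # ('\\xe9', 2)
print(codecs.xmlcharrefreplace_errors(exc))

# ... except strict_errors, which raises the exception it receives.
try:
    codecs.strict_errors(exc)
except UnicodeEncodeError as err:
    print("strict re-raised:", err.reason)

# The same handling is selected by name through the errors argument.
print(text.encode("ascii", "backslashreplace"))  # b'x\\xe9y'
```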
267 | ||
268 | <P> | |
269 | To simplify working with encoded files or stream, the module | |
270 | also defines these utility functions: | |
271 | ||
272 | <P> | |
273 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
274 | <td><nobr><b><tt id='l2h-991' xml:id='l2h-991' class="function">open</tt></b>(</nobr></td> | |
275 | <td><var>filename, mode</var><big>[</big><var>, encoding</var><big>[</big><var>, | |
276 | errors</var><big>[</big><var>, buffering</var><big>]</big><var></var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt> | |
277 | <dd> | |
278 | Open an encoded file using the given <var>mode</var> and return | |
279 | a wrapped version providing transparent encoding/decoding. | |
280 | ||
281 | <P> | |
282 | <span class="note"><b class="label">Note:</b> | |
283 | The wrapped version will only accept the object format | |
284 | defined by the codecs, i.e. Unicode objects for most built-in | |
285 | codecs. Output is also codec-dependent and will usually be Unicode as | |
286 | well.</span> | |
287 | ||
288 | <P> | |
289 | <var>encoding</var> specifies the encoding which is to be used for the | |
290 | file. | |
291 | ||
292 | <P> | |
293 | <var>errors</var> may be given to define the error handling. It defaults | |
294 | to <code>'strict'</code> which causes a <tt class="exception">ValueError</tt> to be raised | |
295 | in case an encoding error occurs. | |
296 | ||
297 | <P> | |
298 | <var>buffering</var> has the same meaning as for the built-in | |
299 | <tt class="function">open()</tt> function. It defaults to line buffered. | |
300 | </dl> | |
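<P>
A minimal round trip through <tt class="function">open()</tt>, written for Python 3. The file is created in a fresh temporary directory, and the name <tt>sample.txt</tt> is arbitrary. Reading the file back with the built-in <tt class="function">open()</tt> in binary mode shows the encoded bytes that were actually stored.

```python
import codecs
import os
import tempfile

# Write Unicode text through codecs.open(); the wrapper encodes it to
# UTF-8 on the way out and decodes it again on the way back in.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # throwaway file

with codecs.open(path, "w", encoding="utf-8") as f:
    f.write("gr\u00fc\u00dfe\n")

# The bytes on disk are the UTF-8 encoding of the text.
with open(path, "rb") as raw:
    on_disk = raw.read()
print(on_disk)  # b'gr\xc3\xbc\xc3\x9fe\n'

with codecs.open(path, "r", encoding="utf-8", errors="strict") as f:
    round_trip = f.read()
print(round_trip == "gr\u00fc\u00dfe\n")  # True
```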
301 | ||
302 | <P> | |
303 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
304 | <td><nobr><b><tt id='l2h-992' xml:id='l2h-992' class="function">EncodedFile</tt></b>(</nobr></td> | |
305 | <td><var>file, input</var><big>[</big><var>, | |
306 | output</var><big>[</big><var>, errors</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt> | |
307 | <dd> | |
308 | Return a wrapped version of file which provides transparent | |
309 | encoding translation. | |
310 | ||
311 | <P> | |
312 | Strings written to the wrapped file are interpreted according to the | |
313 | given <var>input</var> encoding and then written to the original file as | |
314 | strings using the <var>output</var> encoding. The intermediate encoding will | |
315 | usually be Unicode but depends on the specified codecs. | |
316 | ||
317 | <P> | |
318 | If <var>output</var> is not given, it defaults to <var>input</var>. | |
319 | ||
320 | <P> | |
321 | <var>errors</var> may be given to define the error handling. It defaults to | |
322 | <code>'strict'</code>, which causes <tt class="exception">ValueError</tt> to be raised in case | |
323 | an encoding error occurs. | |
324 | </dl> | |
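<P>
A Python 3 sketch of <tt class="function">EncodedFile()</tt> recoding between Latin-1 (<var>input</var>) and UTF-8 (<var>output</var>). An <tt>io.BytesIO</tt> object stands in for the original file, and the encoding pair is chosen only for illustration. In Python 3 the two encoding parameters are named <var>data_encoding</var> and <var>file_encoding</var>, so they are passed positionally here.

```python
import codecs
import io

# The underlying "file" stores UTF-8. The caller talks to the wrapper in
# Latin-1: input="latin-1", output="utf-8" in this page's terms.
target = io.BytesIO()
recoder = codecs.EncodedFile(target, "latin-1", "utf-8")

recoder.write(b"caf\xe9")    # Latin-1 bytes in ...
print(target.getvalue())     # b'caf\xc3\xa9'  (... UTF-8 bytes stored)

target.seek(0)
round_trip = recoder.read()  # UTF-8 from the file, Latin-1 back out
print(round_trip)            # b'caf\xe9'
```

Reading goes the other way. Data is decoded with the <var>output</var> codec and handed back in the <var>input</var> encoding.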
325 | ||
326 | <P> | |
327 | The module also provides the following constants which are useful | |
328 | for reading and writing to platform dependent files: | |
329 | ||
330 | <P> | |
331 | <dl><dt><b><tt id='l2h-993' xml:id='l2h-993'>BOM</tt></b></dt> | |
332 | <dd> | |
333 | <dt><b><tt id='l2h-996' xml:id='l2h-996'>BOM_BE</tt></b></dt><dd> | |
334 | <dt><b><tt id='l2h-997' xml:id='l2h-997'>BOM_LE</tt></b></dt><dd> | |
335 | <dt><b><tt id='l2h-998' xml:id='l2h-998'>BOM_UTF8</tt></b></dt><dd> | |
336 | <dt><b><tt id='l2h-999' xml:id='l2h-999'>BOM_UTF16</tt></b></dt><dd> | |
337 | <dt><b><tt id='l2h-1000' xml:id='l2h-1000'>BOM_UTF16_BE</tt></b></dt><dd> | |
338 | <dt><b><tt id='l2h-1001' xml:id='l2h-1001'>BOM_UTF16_LE</tt></b></dt><dd> | |
339 | <dt><b><tt id='l2h-1002' xml:id='l2h-1002'>BOM_UTF32</tt></b></dt><dd> | |
340 | <dt><b><tt id='l2h-1003' xml:id='l2h-1003'>BOM_UTF32_BE</tt></b></dt><dd> | |
341 | <dt><b><tt id='l2h-1004' xml:id='l2h-1004'>BOM_UTF32_LE</tt></b></dt><dd> | |
342 | These constants define various encodings of the Unicode byte order mark | |
343 | (BOM) used in UTF-16 and UTF-32 data streams to indicate the byte order | |
344 | used in the stream or file and in UTF-8 as a Unicode signature. | |
345 | <tt class="constant">BOM_UTF16</tt> is either <tt class="constant">BOM_UTF16_BE</tt> or | |
346 | <tt class="constant">BOM_UTF16_LE</tt> depending on the platform's native byte order, | |
347 | <tt class="constant">BOM</tt> is an alias for <tt class="constant">BOM_UTF16</tt>, <tt class="constant">BOM_LE</tt> | |
348 | for <tt class="constant">BOM_UTF16_LE</tt> and <tt class="constant">BOM_BE</tt> for <tt class="constant">BOM_UTF16_BE</tt>. | |
349 | The others represent the BOM in UTF-8 and UTF-32 encodings. | |
350 | </dd></dl> | |
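<P>
One common use of these constants is checking the start of a byte stream for a BOM. The helper <tt>sniff_bom</tt> below is hypothetical and not part of the module. It is a Python 3 sketch that checks the UTF-32 marks before the UTF-16 ones, because <tt class="constant">BOM_UTF32_LE</tt> begins with the same two bytes as <tt class="constant">BOM_UTF16_LE</tt>.

```python
import codecs

# Longer marks first: BOM_UTF32_LE starts with the bytes of BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data):
    # Hypothetical helper: return (encoding name, data without the BOM),
    # or (None, data) when no known BOM is present.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


print(sniff_bom(codecs.BOM_UTF8 + b"hi"))              # ('utf-8', b'hi')
print(sniff_bom(codecs.BOM_UTF16_LE + b"h\x00i\x00"))  # ('utf-16-le', b'h\x00i\x00')
print(codecs.BOM_UTF16 in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))  # True
```

A BOM is only a hint. UTF-16-LE data whose first character is U+0000 starts with the same four bytes as <tt class="constant">BOM_UTF32_LE</tt>, so this check would misreport it.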
351 | ||
352 | <P> | |
353 | ||
354 | <p><br /></p><hr class='online-navigation' /> | |
355 | <div class='online-navigation'> | |
356 | <!--Table of Child-Links--> | |
357 | <A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a> | |
358 | ||
359 | <UL CLASS="ChildLinks"> | |
360 | <LI><A href="node130.html">4.9.1 Codec Base Classes</a> | |
361 | <UL> | |
362 | <LI><A href="codec-objects.html">4.9.1.1 Codec Objects</a> | |
363 | <LI><A href="stream-writer-objects.html">4.9.1.2 StreamWriter Objects</a> | |
364 | <LI><A href="stream-reader-objects.html">4.9.1.3 StreamReader Objects</a> | |
365 | <LI><A href="stream-reader-writer.html">4.9.1.4 StreamReaderWriter Objects</a> | |
366 | <LI><A href="stream-recoder-objects.html">4.9.1.5 StreamRecoder Objects</a> | |
367 | </ul> | |
368 | <LI><A href="standard-encodings.html">4.9.2 Standard Encodings</a> | |
369 | <LI><A href="module-encodings.idna.html">4.9.3 <tt class="module">encodings.idna</tt> -- | |
370 | Internationalized Domain Names in Applications</a> | |
371 | </ul> | |
372 | <!--End of Table of Child-Links--> | |
373 | </div> | |
374 | ||
<DIV CLASS="navigation">
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>