Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
2 | <html> | |
3 | <head> | |
4 | <link rel="STYLESHEET" href="api.css" type='text/css' /> | |
5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> | |
6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> | |
7 | <link rel="first" href="api.html" title='Python/C API Reference Manual' /> | |
8 | <link rel='contents' href='contents.html' title="Contents" /> | |
9 | <link rel='index' href='genindex.html' title='Index' /> | |
10 | <link rel='last' href='about.html' title='About this document...' /> | |
11 | <link rel='help' href='about.html' title='About this document...' /> | |
12 | <link rel="next" href="unicodeMethodsAndSlots.html" /> | |
13 | <link rel="prev" href="unicodeObjects.html" /> | |
14 | <link rel="parent" href="unicodeObjects.html" /> | |
16 | <meta name='aesop' content='information' /> | |
17 | <title>7.3.2.1 Built-in Codecs </title> | |
18 | </head> | |
19 | <body> | |
20 | <DIV CLASS="navigation"> | |
21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> | |
22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> | |
23 | <tr> | |
24 | <td class='online-navigation'><a rel="prev" title="7.3.2 Unicode Objects" | |
25 | href="unicodeObjects.html"><img src='../icons/previous.png' | |
26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> | |
27 | <td class='online-navigation'><a rel="parent" title="7.3.2 Unicode Objects" | |
28 | href="unicodeObjects.html"><img src='../icons/up.png' | |
29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> | |
30 | <td class='online-navigation'><a rel="next" title="7.3.2.2 Methods and Slot" | |
31 | href="unicodeMethodsAndSlots.html"><img src='../icons/next.png' | |
32 | border='0' height='32' alt='Next Page' width='32' /></A></td> | |
33 | <td align="center" width="100%">Python/C API Reference Manual</td> | |
34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" | |
35 | href="contents.html"><img src='../icons/contents.png' | |
36 | border='0' height='32' alt='Contents' width='32' /></A></td> | |
37 | <td class='online-navigation'><img src='../icons/blank.png' | |
38 | border='0' height='32' alt='' width='32' /></td> | |
39 | <td class='online-navigation'><a rel="index" title="Index" | |
40 | href="genindex.html"><img src='../icons/index.png' | |
41 | border='0' height='32' alt='Index' width='32' /></A></td> | |
42 | </tr></table> | |
43 | <div class='online-navigation'> | |
44 | <b class="navlabel">Previous:</b> | |
45 | <a class="sectref" rel="prev" href="unicodeObjects.html">7.3.2 Unicode Objects</A> | |
46 | <b class="navlabel">Up:</b> | |
47 | <a class="sectref" rel="parent" href="unicodeObjects.html">7.3.2 Unicode Objects</A> | |
48 | <b class="navlabel">Next:</b> | |
49 | <a class="sectref" rel="next" href="unicodeMethodsAndSlots.html">7.3.2.2 Methods and Slot</A> | |
50 | </div> | |
51 | <hr /></div> | |
52 | </DIV> | |
53 | <!--End of Navigation Panel--> | |
54 | ||
55 | <H3><A NAME="SECTION009321000000000000000"></A><A NAME="builtinCodecs"></A> | |
56 | <BR> | |
57 | 7.3.2.1 Built-in Codecs | |
58 | </H3> | |
59 | ||
60 | <P> | |
61 | Python provides a set of builtin codecs which are written in C | |
62 | for speed. All of these codecs are directly usable via the | |
63 | following functions. | |
64 | ||
65 | <P> | |
66 | Many of the following APIs take two arguments, encoding and | |
67 | errors. These parameters have the same semantics as the | |
68 | corresponding parameters of the builtin unicode() Unicode object constructor. | |
69 | ||
70 | <P> | |
71 | Setting encoding to <tt class="constant">NULL</tt> causes the default encoding to be used, | |
72 | which is ASCII. File system calls should use | |
73 | <tt class="cdata">Py_FileSystemDefaultEncoding</tt> as the encoding for file | |
74 | names. This variable should be treated as read-only: On some systems, | |
75 | it will be a pointer to a static string, on others, it will change at | |
76 | run-time (such as when the application invokes setlocale). | |
77 | ||
78 | <P> | |
79 | Error handling is set by errors, which may also be set to <tt class="constant">NULL</tt>, | |
80 | meaning that the default handling defined for the codec is used. Default | |
81 | error handling for all builtin codecs is ``strict'' | |
82 | (<tt class="exception">ValueError</tt> is raised). | |
83 | ||
84 | <P> | |
85 | The codecs all use a similar interface. For simplicity, only deviations from the | |
86 | following generic ones are documented. | |
87 | ||
88 | <P> | |
89 | These are the generic codec APIs: | |
90 | ||
91 | <P> | |
92 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-506' xml:id='l2h-506' class="cfunction">PyUnicode_Decode</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
93 | int <var>size</var>, | |
94 | const char *<var>encoding</var>, | |
95 | const char *<var>errors</var>)</td></tr></table></dt> | |
96 | <dd> | |
97 | <div class="refcount-info"> | |
98 | <span class="label">Return value:</span> | |
99 | <span class="value">New reference.</span> | |
100 | </div> | |
101 | Create a Unicode object by decoding <var>size</var> bytes of the encoded | |
102 | string <var>s</var>. <var>encoding</var> and <var>errors</var> have the same | |
103 | meaning as the parameters of the same name in the | |
104 | <tt class="function">unicode()</tt> builtin function. The codec to be used is | |
105 | looked up using the Python codec registry. Return <tt class="constant">NULL</tt> if an | |
106 | exception was raised by the codec. | |
107 | </dd></dl> | |
108 | ||
109 | <P> | |
110 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-507' xml:id='l2h-507' class="cfunction">PyUnicode_Encode</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
111 | int <var>size</var>, | |
112 | const char *<var>encoding</var>, | |
113 | const char *<var>errors</var>)</td></tr></table></dt> | |
114 | <dd> | |
115 | <div class="refcount-info"> | |
116 | <span class="label">Return value:</span> | |
117 | <span class="value">New reference.</span> | |
118 | </div> | |
119 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size and return | |
120 | a Python string object. <var>encoding</var> and <var>errors</var> have the | |
121 | same meaning as the parameters of the same name in the Unicode | |
122 | <tt class="method">encode()</tt> method. The codec to be used is looked up using | |
123 | the Python codec registry. Return <tt class="constant">NULL</tt> if an exception was | |
124 | raised by the codec. | |
125 | </dd></dl> | |
126 | ||
127 | <P> | |
128 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-508' xml:id='l2h-508' class="cfunction">PyUnicode_AsEncodedString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>, | |
129 | const char *<var>encoding</var>, | |
130 | const char *<var>errors</var>)</td></tr></table></dt> | |
131 | <dd> | |
132 | <div class="refcount-info"> | |
133 | <span class="label">Return value:</span> | |
134 | <span class="value">New reference.</span> | |
135 | </div> | |
136 | Encode a Unicode object and return the result as a Python string | |
137 | object. <var>encoding</var> and <var>errors</var> have the same meaning as the | |
138 | parameters of the same name in the Unicode <tt class="method">encode()</tt> method. | |
139 | The codec to be used is looked up using the Python codec registry. | |
140 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
141 | </dd></dl> | |
142 | ||
143 | <P> | |
144 | These are the UTF-8 codec APIs: | |
145 | ||
146 | <P> | |
147 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-509' xml:id='l2h-509' class="cfunction">PyUnicode_DecodeUTF8</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
148 | int <var>size</var>, | |
149 | const char *<var>errors</var>)</td></tr></table></dt> | |
150 | <dd> | |
151 | <div class="refcount-info"> | |
152 | <span class="label">Return value:</span> | |
153 | <span class="value">New reference.</span> | |
154 | </div> | |
155 | Create a Unicode object by decoding <var>size</var> bytes of the UTF-8 | |
156 | encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception was raised | |
157 | by the codec. | |
158 | </dd></dl> | |
159 | ||
160 | <P> | |
161 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-510' xml:id='l2h-510' class="cfunction">PyUnicode_DecodeUTF8Stateful</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
162 | int <var>size</var>, | |
163 | const char *<var>errors</var>, | |
164 | int *<var>consumed</var>)</td></tr></table></dt> | |
165 | <dd> | |
166 | If <var>consumed</var> is <tt class="constant">NULL</tt>, behave like <tt class="cfunction">PyUnicode_DecodeUTF8()</tt>. | |
167 | If <var>consumed</var> is not <tt class="constant">NULL</tt>, trailing incomplete UTF-8 byte sequences | |
168 | will not be treated as an error. Those bytes will not be decoded and the | |
169 | number of bytes that have been decoded will be stored in <var>consumed</var>. | |
170 | ||
171 | <span class="versionnote">New in version 2.4.</span> | |
172 | ||
173 | </dd></dl> | |
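<P>
The <var>consumed</var> behaviour described above can be modelled without the Python/C API. The following is a minimal standalone sketch, not the interpreter's implementation: the helper name <tt>utf8_complete_prefix</tt> is hypothetical, and it only looks at the tail of the buffer, assuming the earlier bytes are otherwise valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Python/C API): return how many
 * leading bytes of s form complete UTF-8 sequences. A caller feeding data
 * in chunks would decode that many bytes and keep the rest for later,
 * which is the role the consumed argument plays above. */
size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t back = 0;   /* trailing continuation bytes seen so far */
    size_t have, need;
    unsigned char lead;

    assert(s != NULL || size == 0);

    /* Walk back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (back < 3 && back < size && (s[size - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == size)
        return size;   /* empty, or no lead byte in range: hold nothing back */

    lead = s[size - 1 - back];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return size;   /* not a lead byte: leave it for the decoder to report */

    have = back + 1;   /* bytes present in the last sequence */
    return have < need ? size - have : size;
}
```

For a buffer holding <tt>'a'</tt> followed by the lone lead byte <tt>0xC3</tt>, the helper returns 1, so the second byte would be carried over to the next chunk.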
174 | ||
175 | <P> | |
176 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-511' xml:id='l2h-511' class="cfunction">PyUnicode_EncodeUTF8</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
177 | int <var>size</var>, | |
178 | const char *<var>errors</var>)</td></tr></table></dt> | |
179 | <dd> | |
180 | <div class="refcount-info"> | |
181 | <span class="label">Return value:</span> | |
182 | <span class="value">New reference.</span> | |
183 | </div> | |
184 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using UTF-8 | |
185 | and return a Python string object. Return <tt class="constant">NULL</tt> if an exception | |
186 | was raised by the codec. | |
187 | </dd></dl> | |
188 | ||
189 | <P> | |
190 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-512' xml:id='l2h-512' class="cfunction">PyUnicode_AsUTF8String</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
191 | <dd> | |
192 | <div class="refcount-info"> | |
193 | <span class="label">Return value:</span> | |
194 | <span class="value">New reference.</span> | |
195 | </div> | |
196 | Encode a Unicode object using UTF-8 and return the result as a | |
197 | Python string object. Error handling is ``strict''. Return | |
198 | <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
199 | </dd></dl> | |
200 | ||
201 | <P> | |
202 | These are the UTF-16 codec APIs: | |
203 | ||
204 | <P> | |
205 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-513' xml:id='l2h-513' class="cfunction">PyUnicode_DecodeUTF16</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
206 | int <var>size</var>, | |
207 | const char *<var>errors</var>, | |
208 | int *<var>byteorder</var>)</td></tr></table></dt> | |
209 | <dd> | |
210 | <div class="refcount-info"> | |
211 | <span class="label">Return value:</span> | |
212 | <span class="value">New reference.</span> | |
213 | </div> | |
214 | Decode <var>size</var> bytes from a UTF-16 encoded buffer string and | |
215 | return the corresponding Unicode object. <var>errors</var> (if | |
216 | non-<tt class="constant">NULL</tt>) defines the error handling. It defaults to ``strict''. | |
217 | ||
218 | <P> | |
219 | If <var>byteorder</var> is non-<tt class="constant">NULL</tt>, the decoder starts decoding using | |
220 | the given byte order: | |
221 | ||
222 | <P> | |
223 | <div class="verbatim"><pre> | |
224 | *byteorder == -1: little endian | |
225 | *byteorder == 0: native order | |
226 | *byteorder == 1: big endian | |
227 | </pre></div> | |
228 | ||
229 | <P> | |
230 | and then switches according to all byte order marks (BOM) it finds | |
231 | in the input data. BOMs are not copied into the resulting Unicode | |
232 | string. After completion, <var>*byteorder</var> is set to the current | |
233 | byte order at the end of input data. | |
234 | ||
235 | <P> | |
236 | If <var>byteorder</var> is <tt class="constant">NULL</tt>, the codec starts in native order mode. | |
237 | ||
238 | <P> | |
239 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
240 | </dd></dl> | |
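<P>
The byte order convention above (-1, 0, 1) can be illustrated with a small standalone sketch. It is an assumption-laden model, not the codec itself: the names <tt>utf16_skip_bom</tt> and <tt>utf16_read_unit</tt> are hypothetical, and only a BOM at the very start of the buffer is examined.

```c
#include <assert.h>
#include <stddef.h>

/* Detect the host byte order at run time. */
static int host_is_little_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: if the buffer starts with a BOM, update *byteorder
 * (-1 little endian, 1 big endian) and return the 2 bytes to skip;
 * otherwise leave *byteorder unchanged and return 0. */
size_t utf16_skip_bom(const unsigned char *s, size_t size, int *byteorder)
{
    assert(byteorder != NULL);
    if (size >= 2) {
        if (s[0] == 0xFF && s[1] == 0xFE) { *byteorder = -1; return 2; }
        if (s[0] == 0xFE && s[1] == 0xFF) { *byteorder = 1; return 2; }
    }
    return 0;
}

/* Hypothetical helper: read one 16-bit code unit from p using the
 * convention -1 = little endian, 0 = native order, 1 = big endian. */
unsigned int utf16_read_unit(const unsigned char *p, int byteorder)
{
    int little = byteorder < 0 || (byteorder == 0 && host_is_little_endian());
    return little ? (unsigned int)p[0] | ((unsigned int)p[1] << 8)
                  : ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}
```

A caller would start with the byte order it expects, call <tt>utf16_skip_bom</tt> once, and then read code units with the possibly updated value.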
241 | ||
242 | <P> | |
243 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-514' xml:id='l2h-514' class="cfunction">PyUnicode_DecodeUTF16Stateful</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
244 | int <var>size</var>, | |
245 | const char *<var>errors</var>, | |
246 | int *<var>byteorder</var>, | |
247 | int *<var>consumed</var>)</td></tr></table></dt> | |
248 | <dd> | |
249 | If <var>consumed</var> is <tt class="constant">NULL</tt>, behave like | |
250 | <tt class="cfunction">PyUnicode_DecodeUTF16()</tt>. If <var>consumed</var> is not <tt class="constant">NULL</tt>, | |
251 | <tt class="cfunction">PyUnicode_DecodeUTF16Stateful()</tt> will not treat trailing incomplete | |
252 | UTF-16 byte sequences (i.e. an odd number of bytes or a split surrogate pair) | |
253 | as an error. Those bytes will not be decoded and the number of bytes that | |
254 | have been decoded will be stored in <var>consumed</var>. | |
255 | ||
256 | <span class="versionnote">New in version 2.4.</span> | |
257 | ||
258 | </dd></dl> | |
259 | ||
260 | <P> | |
261 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-515' xml:id='l2h-515' class="cfunction">PyUnicode_EncodeUTF16</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
262 | int <var>size</var>, | |
263 | const char *<var>errors</var>, | |
264 | int <var>byteorder</var>)</td></tr></table></dt> | |
265 | <dd> | |
266 | <div class="refcount-info"> | |
267 | <span class="label">Return value:</span> | |
268 | <span class="value">New reference.</span> | |
269 | </div> | |
270 | Return a Python string object holding the UTF-16 encoded value of | |
271 | the Unicode data in <var>s</var>. Output is written according to the | |
272 | value of <var>byteorder</var>: | |
273 | ||
274 | <P> | |
275 | <div class="verbatim"><pre> | |
276 | byteorder == -1: little endian | |
277 | byteorder == 0: native byte order (writes a BOM mark) | |
278 | byteorder == 1: big endian | |
279 | </pre></div> | |
280 | ||
281 | <P> | |
282 | If byteorder is <code>0</code>, the output string will always start with | |
283 | the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark | |
284 | is prepended. | |
285 | ||
286 | <P> | |
287 | If <var>Py_UNICODE_WIDE</var> is defined, a single <tt class="ctype">Py_UNICODE</tt> | |
288 | value may get represented as a surrogate pair. If it is not | |
289 | defined, each <tt class="ctype">Py_UNICODE</tt> value is interpreted as a | |
290 | UCS-2 character. | |
291 | ||
292 | <P> | |
293 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
294 | </dd></dl> | |
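<P>
To make the surrogate pair remark concrete, here is a minimal standalone sketch of the standard UTF-16 split of one code point into code units. The function name <tt>utf16_units_for</tt> is hypothetical and the code does not call the Python/C API; it only shows the arithmetic that a wide value above U+FFFF would go through.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the UTF-16 code units for code point cp
 * into out and return how many were written (1 or 2). */
size_t utf16_units_for(unsigned long cp, unsigned short out[2])
{
    assert(cp <= 0x10FFFFUL);

    if (cp < 0x10000UL) {
        /* Fits in a single 16-bit code unit. */
        out[0] = (unsigned short)cp;
        return 1;
    }

    /* Split the remaining 20 bits into a high and a low surrogate. */
    cp -= 0x10000UL;
    out[0] = (unsigned short)(0xD800UL + (cp >> 10));
    out[1] = (unsigned short)(0xDC00UL + (cp & 0x3FFUL));
    return 2;
}
```

For example, U+10000 is the first code point that needs two units, and it maps to the pair 0xD800, 0xDC00.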
295 | ||
296 | <P> | |
297 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-516' xml:id='l2h-516' class="cfunction">PyUnicode_AsUTF16String</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
298 | <dd> | |
299 | <div class="refcount-info"> | |
300 | <span class="label">Return value:</span> | |
301 | <span class="value">New reference.</span> | |
302 | </div> | |
303 | Return a Python string using the UTF-16 encoding in native byte | |
304 | order. The string always starts with a BOM mark. Error handling is | |
305 | ``strict''. Return <tt class="constant">NULL</tt> if an exception was raised by the | |
306 | codec. | |
307 | </dd></dl> | |
308 | ||
309 | <P> | |
310 | These are the ``Unicode Escape'' codec APIs: | |
311 | ||
312 | <P> | |
313 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-517' xml:id='l2h-517' class="cfunction">PyUnicode_DecodeUnicodeEscape</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
314 | int <var>size</var>, | |
315 | const char *<var>errors</var>)</td></tr></table></dt> | |
316 | <dd> | |
317 | <div class="refcount-info"> | |
318 | <span class="label">Return value:</span> | |
319 | <span class="value">New reference.</span> | |
320 | </div> | |
321 | Create a Unicode object by decoding <var>size</var> bytes of the | |
322 | Unicode-Escape encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an | |
323 | exception was raised by the codec. | |
324 | </dd></dl> | |
325 | ||
326 | <P> | |
327 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-518' xml:id='l2h-518' class="cfunction">PyUnicode_EncodeUnicodeEscape</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
328 | int <var>size</var>, | |
329 | const char *<var>errors</var>)</td></tr></table></dt> | |
330 | <dd> | |
331 | <div class="refcount-info"> | |
332 | <span class="label">Return value:</span> | |
333 | <span class="value">New reference.</span> | |
334 | </div> | |
335 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using | |
336 | Unicode-Escape and return a Python string object. Return <tt class="constant">NULL</tt> | |
337 | if an exception was raised by the codec. | |
338 | </dd></dl> | |
339 | ||
340 | <P> | |
341 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-519' xml:id='l2h-519' class="cfunction">PyUnicode_AsUnicodeEscapeString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
342 | <dd> | |
343 | <div class="refcount-info"> | |
344 | <span class="label">Return value:</span> | |
345 | <span class="value">New reference.</span> | |
346 | </div> | |
347 | Encode a Unicode object using Unicode-Escape and return the | |
348 | result as a Python string object. Error handling is ``strict''. | |
349 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
350 | </dd></dl> | |
351 | ||
352 | <P> | |
353 | These are the ``Raw Unicode Escape'' codec APIs: | |
354 | ||
355 | <P> | |
356 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-520' xml:id='l2h-520' class="cfunction">PyUnicode_DecodeRawUnicodeEscape</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
357 | int <var>size</var>, | |
358 | const char *<var>errors</var>)</td></tr></table></dt> | |
359 | <dd> | |
360 | <div class="refcount-info"> | |
361 | <span class="label">Return value:</span> | |
362 | <span class="value">New reference.</span> | |
363 | </div> | |
364 | Create a Unicode object by decoding <var>size</var> bytes of the | |
365 | Raw-Unicode-Escape encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an | |
366 | exception was raised by the codec. | |
367 | </dd></dl> | |
368 | ||
369 | <P> | |
370 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-521' xml:id='l2h-521' class="cfunction">PyUnicode_EncodeRawUnicodeEscape</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
371 | int <var>size</var>, | |
372 | const char *<var>errors</var>)</td></tr></table></dt> | |
373 | <dd> | |
374 | <div class="refcount-info"> | |
375 | <span class="label">Return value:</span> | |
376 | <span class="value">New reference.</span> | |
377 | </div> | |
378 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using | |
379 | Raw-Unicode-Escape and return a Python string object. Return | |
380 | <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
381 | </dd></dl> | |
382 | ||
383 | <P> | |
384 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-522' xml:id='l2h-522' class="cfunction">PyUnicode_AsRawUnicodeEscapeString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
385 | <dd> | |
386 | <div class="refcount-info"> | |
387 | <span class="label">Return value:</span> | |
388 | <span class="value">New reference.</span> | |
389 | </div> | |
390 | Encode a Unicode object using Raw-Unicode-Escape and return the | |
391 | result as a Python string object. Error handling is ``strict''. | |
392 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
393 | </dd></dl> | |
394 | ||
395 | <P> | |
396 | These are the Latin-1 codec APIs. | |
397 | Latin-1 corresponds to the first 256 Unicode ordinals and only these | |
398 | are accepted by the codecs during encoding. | |
399 | ||
400 | <P> | |
401 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-523' xml:id='l2h-523' class="cfunction">PyUnicode_DecodeLatin1</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
402 | int <var>size</var>, | |
403 | const char *<var>errors</var>)</td></tr></table></dt> | |
404 | <dd> | |
405 | <div class="refcount-info"> | |
406 | <span class="label">Return value:</span> | |
407 | <span class="value">New reference.</span> | |
408 | </div> | |
409 | Create a Unicode object by decoding <var>size</var> bytes of the Latin-1 | |
410 | encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception was raised | |
411 | by the codec. | |
412 | </dd></dl> | |
413 | ||
414 | <P> | |
415 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-524' xml:id='l2h-524' class="cfunction">PyUnicode_EncodeLatin1</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
416 | int <var>size</var>, | |
417 | const char *<var>errors</var>)</td></tr></table></dt> | |
418 | <dd> | |
419 | <div class="refcount-info"> | |
420 | <span class="label">Return value:</span> | |
421 | <span class="value">New reference.</span> | |
422 | </div> | |
423 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using | |
424 | Latin-1 and return a Python string object. Return <tt class="constant">NULL</tt> if an | |
425 | exception was raised by the codec. | |
426 | </dd></dl> | |
427 | ||
428 | <P> | |
429 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-525' xml:id='l2h-525' class="cfunction">PyUnicode_AsLatin1String</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
430 | <dd> | |
431 | <div class="refcount-info"> | |
432 | <span class="label">Return value:</span> | |
433 | <span class="value">New reference.</span> | |
434 | </div> | |
435 | Encode a Unicode object using Latin-1 and return the result as a | |
436 | Python string object. Error handling is ``strict''. Return | |
437 | <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
438 | </dd></dl> | |
439 | ||
440 | <P> | |
441 | These are the ASCII codec APIs. Only 7-bit ASCII data is | |
442 | accepted. All other codes generate errors. | |
443 | ||
444 | <P> | |
445 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-526' xml:id='l2h-526' class="cfunction">PyUnicode_DecodeASCII</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
446 | int <var>size</var>, | |
447 | const char *<var>errors</var>)</td></tr></table></dt> | |
448 | <dd> | |
449 | <div class="refcount-info"> | |
450 | <span class="label">Return value:</span> | |
451 | <span class="value">New reference.</span> | |
452 | </div> | |
453 | Create a Unicode object by decoding <var>size</var> bytes of the | |
454 | ASCII encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception | |
455 | was raised by the codec. | |
456 | </dd></dl> | |
457 | ||
458 | <P> | |
459 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-527' xml:id='l2h-527' class="cfunction">PyUnicode_EncodeASCII</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
460 | int <var>size</var>, | |
461 | const char *<var>errors</var>)</td></tr></table></dt> | |
462 | <dd> | |
463 | <div class="refcount-info"> | |
464 | <span class="label">Return value:</span> | |
465 | <span class="value">New reference.</span> | |
466 | </div> | |
467 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using | |
468 | ASCII and return a Python string object. Return <tt class="constant">NULL</tt> if an | |
469 | exception was raised by the codec. | |
470 | </dd></dl> | |
471 | ||
472 | <P> | |
473 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-528' xml:id='l2h-528' class="cfunction">PyUnicode_AsASCIIString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
474 | <dd> | |
475 | <div class="refcount-info"> | |
476 | <span class="label">Return value:</span> | |
477 | <span class="value">New reference.</span> | |
478 | </div> | |
479 | Encode a Unicode object using ASCII and return the result as a | |
480 | Python string object. Error handling is ``strict''. Return | |
481 | <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
482 | </dd></dl> | |
483 | ||
484 | <P> | |
485 | These are the mapping codec APIs: | |
486 | ||
487 | <P> | |
488 | This codec is special in that it can be used to implement many | |
489 | different codecs (and this is in fact what was done to obtain most of | |
490 | the standard codecs included in the <tt class="module">encodings</tt> package). The | |
491 | codec uses a mapping to encode and decode characters. | |
492 | ||
493 | <P> | |
494 | Decoding mappings must map single string characters to single Unicode | |
495 | characters, integers (which are then interpreted as Unicode ordinals) | |
496 | or None (meaning "undefined mapping" and causing an error). | |
497 | ||
498 | <P> | |
499 | Encoding mappings must map single Unicode characters to single string | |
500 | characters, integers (which are then interpreted as Latin-1 ordinals) | |
501 | or None (meaning "undefined mapping" and causing an error). | |
502 | ||
503 | <P> | |
504 | The mapping objects provided must only support the __getitem__ mapping | |
505 | interface. | |
506 | ||
507 | <P> | |
508 | If a character lookup fails with a LookupError, the character is | |
509 | copied as-is, meaning that its ordinal value will be interpreted as a | |
510 | Unicode or Latin-1 ordinal, respectively. Because of this, mappings only need | |
511 | to contain those mappings which map characters to different code | |
512 | points. | |
513 | ||
514 | <P> | |
515 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-529' xml:id='l2h-529' class="cfunction">PyUnicode_DecodeCharmap</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
516 | int <var>size</var>, | |
517 | PyObject *<var>mapping</var>, | |
518 | const char *<var>errors</var>)</td></tr></table></dt> | |
519 | <dd> | |
520 | <div class="refcount-info"> | |
521 | <span class="label">Return value:</span> | |
522 | <span class="value">New reference.</span> | |
523 | </div> | |
524 | Create a Unicode object by decoding <var>size</var> bytes of the encoded | |
525 | string <var>s</var> using the given <var>mapping</var> object. Return | |
526 | <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
527 | </dd></dl> | |
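<P>
The decoding rules for mappings described above (mapped value, undefined mapping, and copy-as-is on a failed lookup) can be sketched in plain C. This is an illustrative model only: the real API takes a Python mapping object, whereas the sparse array, <tt>struct charmap_entry</tt>, <tt>CHARMAP_UNDEFINED</tt> and <tt>charmap_decode</tt> below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a mapping value of None ("undefined mapping"). */
#define CHARMAP_UNDEFINED (-1L)

/* One entry of a sparse byte-to-code-point mapping. */
struct charmap_entry {
    unsigned char byte;
    long codepoint;
};

/* Hypothetical helper: decode size bytes of s into out (one code point
 * per byte). Returns 0 on success, or -1 if a byte maps to
 * CHARMAP_UNDEFINED. Bytes with no entry are copied as-is, mirroring
 * the failed-lookup rule above. */
int charmap_decode(const unsigned char *s, size_t size,
                   const struct charmap_entry *map, size_t map_len,
                   long *out)
{
    size_t i, j;

    assert(out != NULL || size == 0);
    for (i = 0; i < size; i++) {
        long cp = s[i];                 /* default: byte value as ordinal */
        for (j = 0; j < map_len; j++) {
            if (map[j].byte == s[i]) {
                cp = map[j].codepoint;
                break;
            }
        }
        if (cp == CHARMAP_UNDEFINED)
            return -1;                  /* undefined mapping is an error */
        out[i] = cp;
    }
    return 0;
}
```

Because unmapped bytes fall through unchanged, the table in this sketch only needs entries for bytes whose code point differs from their own value.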
528 | ||
529 | <P> | |
530 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-530' xml:id='l2h-530' class="cfunction">PyUnicode_EncodeCharmap</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
531 | int <var>size</var>, | |
532 | PyObject *<var>mapping</var>, | |
533 | const char *<var>errors</var>)</td></tr></table></dt> | |
534 | <dd> | |
535 | <div class="refcount-info"> | |
536 | <span class="label">Return value:</span> | |
537 | <span class="value">New reference.</span> | |
538 | </div> | |
539 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using the | |
540 | given <var>mapping</var> object and return a Python string object. | |
541 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
542 | </dd></dl> | |
543 | ||
544 | <P> | |
545 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-531' xml:id='l2h-531' class="cfunction">PyUnicode_AsCharmapString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>, | |
546 | PyObject *<var>mapping</var>)</td></tr></table></dt> | |
547 | <dd> | |
548 | <div class="refcount-info"> | |
549 | <span class="label">Return value:</span> | |
550 | <span class="value">New reference.</span> | |
551 | </div> | |
552 | Encode a Unicode object using the given <var>mapping</var> object and | |
553 | return the result as a Python string object. Error handling is | |
554 | ``strict''. Return <tt class="constant">NULL</tt> if an exception was raised by the | |
555 | codec. | |
556 | </dd></dl> | |
557 | ||
558 | <P> | |
559 | The following codec API is special in that it maps Unicode to Unicode. | |
560 | ||
561 | <P> | |
562 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-532' xml:id='l2h-532' class="cfunction">PyUnicode_TranslateCharmap</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
563 | int <var>size</var>, | |
564 | PyObject *<var>table</var>, | |
565 | const char *<var>errors</var>)</td></tr></table></dt> | |
566 | <dd> | |
567 | <div class="refcount-info"> | |
568 | <span class="label">Return value:</span> | |
569 | <span class="value">New reference.</span> | |
570 | </div> | |
571 | Translate a <tt class="ctype">Py_UNICODE</tt> buffer of the given length by | |
572 | applying a character mapping <var>table</var> to it and return the | |
573 | resulting Unicode object. Return <tt class="constant">NULL</tt> when an exception was | |
574 | raised by the codec. | |
575 | ||
576 | <P> | |
577 | The mapping <var>table</var> must map Unicode ordinal integers to Unicode | |
578 | ordinal integers or None (causing deletion of the character). | |
579 | ||
580 | <P> | |
581 | Mapping tables need only provide the <tt class="method">__getitem__()</tt> | |
582 | interface; dictionaries and sequences work well. Unmapped character | |
583 | ordinals (ones which cause a <tt class="exception">LookupError</tt>) are left | |
584 | untouched and are copied as-is. | |
585 | </dd></dl> | |
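<P>
The translation rules above differ from charmap decoding in one respect: a table entry can delete a character, so the output may be shorter than the input. The following standalone sketch models that under stated assumptions; <tt>struct translate_entry</tt>, <tt>TRANSLATE_DELETE</tt> and <tt>translate_ordinals</tt> are hypothetical names, and a plain array stands in for the Python table object.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a table value of None (delete the character). */
#define TRANSLATE_DELETE (-1L)

/* One entry of a sparse ordinal-to-ordinal table. */
struct translate_entry {
    long from;
    long to;
};

/* Hypothetical helper: translate size ordinals from s into out and
 * return the number of ordinals written. Ordinals with no entry are
 * copied untouched; entries mapped to TRANSLATE_DELETE are dropped. */
size_t translate_ordinals(const long *s, size_t size,
                          const struct translate_entry *table,
                          size_t table_len, long *out)
{
    size_t i, j, n = 0;

    assert(out != NULL || size == 0);
    for (i = 0; i < size; i++) {
        long ord = s[i];                /* default: left untouched */
        for (j = 0; j < table_len; j++) {
            if (table[j].from == s[i]) {
                ord = table[j].to;
                break;
            }
        }
        if (ord != TRANSLATE_DELETE)
            out[n++] = ord;
    }
    return n;
}
```

The output buffer must have room for <var>size</var> ordinals, since in the worst case nothing is deleted.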
586 | ||
587 | <P> | |
588 | These are the MBCS codec APIs. They are currently only available on | |
589 | Windows and use the Win32 MBCS converters to implement the | |
590 | conversions. Note that MBCS (or DBCS) is a class of encodings, not | |
591 | just one. The target encoding is defined by the user settings on the | |
592 | machine running the codec. | |
593 | ||
594 | <P> | |
595 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-533' xml:id='l2h-533' class="cfunction">PyUnicode_DecodeMBCS</tt></b>(</nobr></td><td>const char *<var>s</var>, | |
596 | int <var>size</var>, | |
597 | const char *<var>errors</var>)</td></tr></table></dt> | |
598 | <dd> | |
599 | <div class="refcount-info"> | |
600 | <span class="label">Return value:</span> | |
601 | <span class="value">New reference.</span> | |
602 | </div> | |
603 | Create a Unicode object by decoding <var>size</var> bytes of the MBCS | |
604 | encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception was | |
605 | raised by the codec. | |
606 | </dd></dl> | |
607 | ||
608 | <P> | |
609 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-534' xml:id='l2h-534' class="cfunction">PyUnicode_EncodeMBCS</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, | |
610 | int <var>size</var>, | |
611 | const char *<var>errors</var>)</td></tr></table></dt> | |
612 | <dd> | |
613 | <div class="refcount-info"> | |
614 | <span class="label">Return value:</span> | |
615 | <span class="value">New reference.</span> | |
616 | </div> | |
617 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using MBCS | |
618 | and return a Python string object. Return <tt class="constant">NULL</tt> if an exception | |
619 | was raised by the codec. | |
620 | </dd></dl> | |
621 | ||
622 | <P> | |
623 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-535' xml:id='l2h-535' class="cfunction">PyUnicode_AsMBCSString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> | |
624 | <dd> | |
625 | <div class="refcount-info"> | |
626 | <span class="label">Return value:</span> | |
627 | <span class="value">New reference.</span> | |
628 | </div> | |
629 | Encode a Unicode object using MBCS and return the result as a | |
630 | Python string object. Error handling is "strict". Return | |
631 | <tt class="constant">NULL</tt> if an exception was raised by the codec. | |
632 | </dd></dl> | |
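<P>
Because the MBCS functions exist only on Windows, code that relies on them usually needs a
platform check. The sketch below shows one way to write that check from Python. It assumes
that the Python-level codec named <tt class="constant">mbcs</tt> is backed by the same Win32
converters as the C functions above, and the helper name
<tt class="function">mbcs_roundtrip</tt> is hypothetical.

```python
import sys


def mbcs_roundtrip(text):
    """Encode text with the 'mbcs' codec and decode it back.

    Returns None on platforms other than Windows, where the codec
    is not available.
    """
    if sys.platform != "win32":
        return None
    # The encoded bytes depend on the code page chosen in the user's settings.
    encoded = text.encode("mbcs")
    return encoded.decode("mbcs")


result = mbcs_roundtrip(u"plain ASCII")
```

<P>
Only ASCII text is used here because the bytes produced for other characters vary with the
target encoding selected on the machine.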
633 | ||
634 | <P> | |
635 | ||
636 | <DIV CLASS="navigation"> | |
669 | <hr /> | |
670 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> | |
671 | </DIV> | |
672 | <!--End of Navigation Panel--> | |
673 | <ADDRESS> | |
674 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. | |
675 | </ADDRESS> | |
676 | </BODY> | |
677 | </HTML> |