| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
| 2 | <html> |
| 3 | <head> |
| 4 | <link rel="STYLESHEET" href="api.css" type='text/css' /> |
| 5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> |
| 6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> |
| 7 | <link rel="first" href="api.html" title='Python/C API Reference Manual' /> |
| 8 | <link rel='contents' href='contents.html' title="Contents" /> |
| 9 | <link rel='index' href='genindex.html' title='Index' /> |
| 10 | <link rel='last' href='about.html' title='About this document...' /> |
| 11 | <link rel='help' href='about.html' title='About this document...' /> |
| 12 | <link rel="next" href="unicodeMethodsAndSlots.html" /> |
| 13 | <link rel="prev" href="unicodeObjects.html" /> |
| 14 | <link rel="parent" href="unicodeObjects.html" /> |
| 16 | <meta name='aesop' content='information' /> |
| 17 | <title>7.3.2.1 Built-in Codecs </title> |
| 18 | </head> |
| 19 | <body> |
| 20 | <DIV CLASS="navigation"> |
| 21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> |
| 22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> |
| 23 | <tr> |
| 24 | <td class='online-navigation'><a rel="prev" title="7.3.2 Unicode Objects" |
| 25 | href="unicodeObjects.html"><img src='../icons/previous.png' |
| 26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> |
| 27 | <td class='online-navigation'><a rel="parent" title="7.3.2 Unicode Objects" |
| 28 | href="unicodeObjects.html"><img src='../icons/up.png' |
| 29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> |
| 30 | <td class='online-navigation'><a rel="next" title="7.3.2.2 Methods and Slot" |
| 31 | href="unicodeMethodsAndSlots.html"><img src='../icons/next.png' |
| 32 | border='0' height='32' alt='Next Page' width='32' /></A></td> |
| 33 | <td align="center" width="100%">Python/C API Reference Manual</td> |
| 34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" |
| 35 | href="contents.html"><img src='../icons/contents.png' |
| 36 | border='0' height='32' alt='Contents' width='32' /></A></td> |
| 37 | <td class='online-navigation'><img src='../icons/blank.png' |
| 38 | border='0' height='32' alt='' width='32' /></td> |
| 39 | <td class='online-navigation'><a rel="index" title="Index" |
| 40 | href="genindex.html"><img src='../icons/index.png' |
| 41 | border='0' height='32' alt='Index' width='32' /></A></td> |
| 42 | </tr></table> |
| 43 | <div class='online-navigation'> |
| 44 | <b class="navlabel">Previous:</b> |
| 45 | <a class="sectref" rel="prev" href="unicodeObjects.html">7.3.2 Unicode Objects</A> |
| 46 | <b class="navlabel">Up:</b> |
| 47 | <a class="sectref" rel="parent" href="unicodeObjects.html">7.3.2 Unicode Objects</A> |
| 48 | <b class="navlabel">Next:</b> |
| 49 | <a class="sectref" rel="next" href="unicodeMethodsAndSlots.html">7.3.2.2 Methods and Slot</A> |
| 50 | </div> |
| 51 | <hr /></div> |
| 52 | </DIV> |
| 53 | <!--End of Navigation Panel--> |
| 54 | |
| 55 | <H3><A NAME="SECTION009321000000000000000"></A><A NAME="builtinCodecs"></A> |
| 56 | <BR> |
| 57 | 7.3.2.1 Built-in Codecs |
| 58 | </H3> |
| 59 | |
| 60 | <P> |
| 61 | Python provides a set of builtin codecs which are written in C |
| 62 | for speed. All of these codecs are directly usable via the |
| 63 | following functions. |
| 64 | |
| 65 | <P> |
| 66 | Many of the following APIs take two arguments, <var>encoding</var> and |
| 67 | <var>errors</var>. These parameters have the same semantics |
| 68 | as those of the builtin unicode() Unicode object constructor. |
| 69 | |
| 70 | <P> |
| 71 | Setting <var>encoding</var> to <tt class="constant">NULL</tt> causes the default encoding, which is ASCII, |
| 72 | to be used. File system calls should use |
| 73 | <tt class="cdata">Py_FileSystemDefaultEncoding</tt> as the encoding for file |
| 74 | names. This variable should be treated as read-only: on some systems, |
| 75 | it will be a pointer to a static string; on others, it will change at |
| 76 | run-time (such as when the application invokes setlocale). |
| 77 | |
| 78 | <P> |
| 79 | Error handling is set by <var>errors</var>, which may also be set to <tt class="constant">NULL</tt>, |
| 80 | meaning that the default handling defined for the codec is used. Default |
| 81 | error handling for all builtin codecs is ``strict'' |
| 82 | (<tt class="exception">ValueError</tt> is raised). |
| 83 | |
| 84 | <P> |
| 85 | The codecs all use a similar interface. For simplicity, only deviations |
| 86 | from the following generic APIs are documented. |
| 87 | |
| 88 | <P> |
| 89 | These are the generic codec APIs: |
| 90 | |
| 91 | <P> |
| 92 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-506' xml:id='l2h-506' class="cfunction">PyUnicode_Decode</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 93 | int <var>size</var>, |
| 94 | const char *<var>encoding</var>, |
| 95 | const char *<var>errors</var>)</td></tr></table></dt> |
| 96 | <dd> |
| 97 | <div class="refcount-info"> |
| 98 | <span class="label">Return value:</span> |
| 99 | <span class="value">New reference.</span> |
| 100 | </div> |
| 101 | Create a Unicode object by decoding <var>size</var> bytes of the encoded |
| 102 | string <var>s</var>. <var>encoding</var> and <var>errors</var> have the same |
| 103 | meaning as the parameters of the same name in the |
| 104 | <tt class="function">unicode()</tt> builtin function. The codec to be used is |
| 105 | looked up using the Python codec registry. Return <tt class="constant">NULL</tt> if an |
| 106 | exception was raised by the codec. |
| 107 | </dd></dl> |
| 108 | |
| 109 | <P> |
| 110 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-507' xml:id='l2h-507' class="cfunction">PyUnicode_Encode</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 111 | int <var>size</var>, |
| 112 | const char *<var>encoding</var>, |
| 113 | const char *<var>errors</var>)</td></tr></table></dt> |
| 114 | <dd> |
| 115 | <div class="refcount-info"> |
| 116 | <span class="label">Return value:</span> |
| 117 | <span class="value">New reference.</span> |
| 118 | </div> |
| 119 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size and return |
| 120 | a Python string object. <var>encoding</var> and <var>errors</var> have the |
| 121 | same meaning as the parameters of the same name in the Unicode |
| 122 | <tt class="method">encode()</tt> method. The codec to be used is looked up using |
| 123 | the Python codec registry. Return <tt class="constant">NULL</tt> if an exception was |
| 124 | raised by the codec. |
| 125 | </dd></dl> |
| 126 | |
| 127 | <P> |
| 128 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-508' xml:id='l2h-508' class="cfunction">PyUnicode_AsEncodedString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>, |
| 129 | const char *<var>encoding</var>, |
| 130 | const char *<var>errors</var>)</td></tr></table></dt> |
| 131 | <dd> |
| 132 | <div class="refcount-info"> |
| 133 | <span class="label">Return value:</span> |
| 134 | <span class="value">New reference.</span> |
| 135 | </div> |
| 136 | Encode a Unicode object and return the result as a Python string |
| 137 | object. <var>encoding</var> and <var>errors</var> have the same meaning as the |
| 138 | parameters of the same name in the Unicode <tt class="method">encode()</tt> method. |
| 139 | The codec to be used is looked up using the Python codec registry. |
| 140 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 141 | </dd></dl> |
| 142 | |
| 143 | <P> |
| 144 | These are the UTF-8 codec APIs: |
| 145 | |
| 146 | <P> |
| 147 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-509' xml:id='l2h-509' class="cfunction">PyUnicode_DecodeUTF8</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 148 | int <var>size</var>, |
| 149 | const char *<var>errors</var>)</td></tr></table></dt> |
| 150 | <dd> |
| 151 | <div class="refcount-info"> |
| 152 | <span class="label">Return value:</span> |
| 153 | <span class="value">New reference.</span> |
| 154 | </div> |
| 155 | Create a Unicode object by decoding <var>size</var> bytes of the UTF-8 |
| 156 | encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception was raised |
| 157 | by the codec. |
| 158 | </dd></dl> |
| 159 | |
| 160 | <P> |
| 161 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-510' xml:id='l2h-510' class="cfunction">PyUnicode_DecodeUTF8Stateful</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 162 | int <var>size</var>, |
| 163 | const char *<var>errors</var>, |
| 164 | int *<var>consumed</var>)</td></tr></table></dt> |
| 165 | <dd> |
| 166 | If <var>consumed</var> is <tt class="constant">NULL</tt>, behave like <tt class="cfunction">PyUnicode_DecodeUTF8()</tt>. |
| 167 | If <var>consumed</var> is not <tt class="constant">NULL</tt>, trailing incomplete UTF-8 byte sequences |
| 168 | will not be treated as an error. Those bytes will not be decoded and the |
| 169 | number of bytes that have been decoded will be stored in <var>consumed</var>. |
| 170 | |
| 171 | <span class="versionnote">New in version 2.4.</span> |
| 172 | |
| 173 | </dd></dl> |
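<P>
The <var>consumed</var> bookkeeping described above can be modelled outside the interpreter. The following is a minimal sketch in plain C, not the CPython implementation: the helper names <tt>utf8_seq_len</tt> and <tt>utf8_complete_prefix</tt> are hypothetical, and the sketch assumes the input is otherwise valid UTF-8 (it does not check continuation bytes). It only shows how a decoder might find the number of leading bytes that form complete sequences, leaving a truncated tail for the next call.

```c
#include <stddef.h>

/* Expected length of a UTF-8 sequence, judged from its lead byte.
   Returns 0 for a continuation byte or an invalid lead byte. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical helper: number of leading bytes of s that form complete
   UTF-8 sequences. A stateful decoder would decode exactly this many
   bytes and report the count through its consumed argument. */
size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = 0;
    while (i < size) {
        size_t n = utf8_seq_len(s[i]);
        if (n == 0)
            break;          /* invalid byte: a real codec would raise an error */
        if (n > size - i)
            break;          /* trailing sequence is incomplete: stop before it */
        i += n;
    }
    return i;
}
```

A caller reading from a stream would keep the bytes after the returned offset and prepend them to the next chunk before decoding again.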
| 174 | |
| 175 | <P> |
| 176 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-511' xml:id='l2h-511' class="cfunction">PyUnicode_EncodeUTF8</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 177 | int <var>size</var>, |
| 178 | const char *<var>errors</var>)</td></tr></table></dt> |
| 179 | <dd> |
| 180 | <div class="refcount-info"> |
| 181 | <span class="label">Return value:</span> |
| 182 | <span class="value">New reference.</span> |
| 183 | </div> |
| 184 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using UTF-8 |
| 185 | and return a Python string object. Return <tt class="constant">NULL</tt> if an exception |
| 186 | was raised by the codec. |
| 187 | </dd></dl> |
| 188 | |
| 189 | <P> |
| 190 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-512' xml:id='l2h-512' class="cfunction">PyUnicode_AsUTF8String</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 191 | <dd> |
| 192 | <div class="refcount-info"> |
| 193 | <span class="label">Return value:</span> |
| 194 | <span class="value">New reference.</span> |
| 195 | </div> |
| 196 | Encode a Unicode object using UTF-8 and return the result as a |
| 197 | Python string object. Error handling is ``strict''. Return |
| 198 | <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 199 | </dd></dl> |
| 200 | |
| 201 | <P> |
| 202 | These are the UTF-16 codec APIs: |
| 203 | |
| 204 | <P> |
| 205 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-513' xml:id='l2h-513' class="cfunction">PyUnicode_DecodeUTF16</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 206 | int <var>size</var>, |
| 207 | const char *<var>errors</var>, |
| 208 | int *<var>byteorder</var>)</td></tr></table></dt> |
| 209 | <dd> |
| 210 | <div class="refcount-info"> |
| 211 | <span class="label">Return value:</span> |
| 212 | <span class="value">New reference.</span> |
| 213 | </div> |
| 214 | Decode <var>size</var> bytes from a UTF-16 encoded buffer string and |
| 215 | return the corresponding Unicode object. <var>errors</var> (if |
| 216 | non-<tt class="constant">NULL</tt>) defines the error handling. It defaults to ``strict''. |
| 217 | |
| 218 | <P> |
| 219 | If <var>byteorder</var> is non-<tt class="constant">NULL</tt>, the decoder starts decoding using |
| 220 | the given byte order: |
| 221 | |
| 222 | <P> |
| 223 | <div class="verbatim"><pre> |
| 224 | *byteorder == -1: little endian |
| 225 | *byteorder == 0: native order |
| 226 | *byteorder == 1: big endian |
| 227 | </pre></div> |
| 228 | |
| 229 | <P> |
| 230 | and then switches according to all byte order marks (BOM) it finds |
| 231 | in the input data. BOMs are not copied into the resulting Unicode |
| 232 | string. After completion, <var>*byteorder</var> is set to the current |
| 233 | byte order at the end of input data. |
| 234 | |
| 235 | <P> |
| 236 | If <var>byteorder</var> is <tt class="constant">NULL</tt>, the codec starts in native order mode. |
| 237 | |
| 238 | <P> |
| 239 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 240 | </dd></dl> |
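<P>
As an illustration of the <var>byteorder</var> convention (-1, 0, 1), here is a small self-contained C sketch. It is not taken from CPython: the function names are hypothetical, and as a simplifying assumption it looks only at a BOM at the very start of the buffer and only when the caller asked for native order (<code>*byteorder == 0</code>).

```c
#include <stddef.h>

/* Returns nonzero on a little-endian host. */
static int host_is_little_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: if *byteorder is 0 and the buffer starts with a
   BOM, set *byteorder to -1 (little endian) or 1 (big endian).
   Returns the number of leading bytes to skip (0 or 2). */
size_t utf16_resolve_order(const unsigned char *s, size_t size, int *byteorder)
{
    if (*byteorder == 0 && size >= 2) {
        if (s[0] == 0xFF && s[1] == 0xFE) { *byteorder = -1; return 2; }
        if (s[0] == 0xFE && s[1] == 0xFF) { *byteorder = 1;  return 2; }
    }
    return 0;
}

/* Read one 16-bit code unit at p using the -1 / 0 / 1 convention;
   0 means the byte order of the host. */
unsigned int utf16_read_unit(const unsigned char *p, int byteorder)
{
    int little = byteorder < 0 || (byteorder == 0 && host_is_little_endian());
    return little ? (unsigned int)(p[0] | (p[1] << 8))
                  : (unsigned int)((p[0] << 8) | p[1]);
}
```

Passing a pointer to an <code>int</code> initialised to 0 lets the caller learn afterwards which order was detected, which mirrors how <var>*byteorder</var> is updated by the API described above.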
| 241 | |
| 242 | <P> |
| 243 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-514' xml:id='l2h-514' class="cfunction">PyUnicode_DecodeUTF16Stateful</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 244 | int <var>size</var>, |
| 245 | const char *<var>errors</var>, |
| 246 | int *<var>byteorder</var>, |
| 247 | int *<var>consumed</var>)</td></tr></table></dt> |
| 248 | <dd> |
| 249 | If <var>consumed</var> is <tt class="constant">NULL</tt>, behave like |
| 250 | <tt class="cfunction">PyUnicode_DecodeUTF16()</tt>. If <var>consumed</var> is not <tt class="constant">NULL</tt>, |
| 251 | <tt class="cfunction">PyUnicode_DecodeUTF16Stateful()</tt> will not treat trailing incomplete |
| 252 | UTF-16 byte sequences (i.e. an odd number of bytes or a split surrogate pair) |
| 253 | as an error. Those bytes will not be decoded and the number of bytes that |
| 254 | have been decoded will be stored in <var>consumed</var>. |
| 255 | |
| 256 | <span class="versionnote">New in version 2.4.</span> |
| 257 | |
| 258 | </dd></dl> |
| 259 | |
| 260 | <P> |
| 261 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-515' xml:id='l2h-515' class="cfunction">PyUnicode_EncodeUTF16</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 262 | int <var>size</var>, |
| 263 | const char *<var>errors</var>, |
| 264 | int <var>byteorder</var>)</td></tr></table></dt> |
| 265 | <dd> |
| 266 | <div class="refcount-info"> |
| 267 | <span class="label">Return value:</span> |
| 268 | <span class="value">New reference.</span> |
| 269 | </div> |
| 270 | Return a Python string object holding the UTF-16 encoded value of |
| 271 | the Unicode data in <var>s</var>. |
| 272 | Output is written according to the following byte order: |
| 273 | |
| 274 | <P> |
| 275 | <div class="verbatim"><pre> |
| 276 | byteorder == -1: little endian |
| 277 | byteorder == 0: native byte order (writes a BOM) |
| 278 | byteorder == 1: big endian |
| 279 | </pre></div> |
| 280 | |
| 281 | <P> |
| 282 | If <var>byteorder</var> is <code>0</code>, the output string will always start with |
| 283 | the Unicode BOM (U+FEFF). In the other two modes, no BOM |
| 284 | is prepended. |
| 285 | |
| 286 | <P> |
| 287 | If <var>Py_UNICODE_WIDE</var> is defined, a single <tt class="ctype">Py_UNICODE</tt> |
| 288 | value may get represented as a surrogate pair. If it is not |
| 289 | defined, each <tt class="ctype">Py_UNICODE</tt> value is interpreted as a |
| 290 | UCS-2 character. |
| 291 | |
| 292 | <P> |
| 293 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 294 | </dd></dl> |
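<P>
The BOM rule and the surrogate-pair rule described above can be made concrete with a short, self-contained C sketch. It is an illustration under stated assumptions rather than the codec's actual code: <tt>utf16_encode</tt> and <tt>put_unit</tt> are hypothetical names, code points are passed as <code>unsigned long</code> values no larger than 0x10FFFF, and the caller supplies an output buffer of at least <code>2 + 4 * size</code> bytes.

```c
#include <stddef.h>

/* Returns nonzero on a little-endian host. */
static int host_is_little_endian(void)
{
    const unsigned short probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Write one 16-bit code unit in the requested order; returns 2. */
static size_t put_unit(unsigned char *out, unsigned int unit, int little)
{
    out[0] = (unsigned char)(little ? unit & 0xFF : unit >> 8);
    out[1] = (unsigned char)(little ? unit >> 8 : unit & 0xFF);
    return 2;
}

/* Hypothetical encoder: byteorder is -1 (little), 0 (native, with BOM)
   or 1 (big). Returns the number of bytes written to out. */
size_t utf16_encode(const unsigned long *s, size_t size,
                    int byteorder, unsigned char *out)
{
    int little = byteorder < 0 || (byteorder == 0 && host_is_little_endian());
    size_t n = 0, i;

    if (byteorder == 0)                 /* only native mode writes a BOM */
        n += put_unit(out + n, 0xFEFF, little);

    for (i = 0; i < size; i++) {
        unsigned long cp = s[i];
        if (cp > 0xFFFF) {              /* outside the BMP: surrogate pair */
            cp -= 0x10000;
            n += put_unit(out + n, 0xD800 | (unsigned int)(cp >> 10), little);
            n += put_unit(out + n, 0xDC00 | (unsigned int)(cp & 0x3FF), little);
        } else {
            n += put_unit(out + n, (unsigned int)cp, little);
        }
    }
    return n;
}
```

For example, U+1D11E encodes to the surrogate pair D834 DD1E, so the two code points U+0041 and U+1D11E occupy six bytes when an explicit byte order is requested and eight bytes in native mode because of the BOM.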
| 295 | |
| 296 | <P> |
| 297 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-516' xml:id='l2h-516' class="cfunction">PyUnicode_AsUTF16String</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 298 | <dd> |
| 299 | <div class="refcount-info"> |
| 300 | <span class="label">Return value:</span> |
| 301 | <span class="value">New reference.</span> |
| 302 | </div> |
| 303 | Return a Python string using the UTF-16 encoding in native byte |
| 304 | order. The string always starts with a BOM. Error handling is |
| 305 | ``strict''. Return <tt class="constant">NULL</tt> if an exception was raised by the |
| 306 | codec. |
| 307 | </dd></dl> |
| 308 | |
| 309 | <P> |
| 310 | These are the ``Unicode Escape'' codec APIs: |
| 311 | |
| 312 | <P> |
| 313 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-517' xml:id='l2h-517' class="cfunction">PyUnicode_DecodeUnicodeEscape</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 314 | int <var>size</var>, |
| 315 | const char *<var>errors</var>)</td></tr></table></dt> |
| 316 | <dd> |
| 317 | <div class="refcount-info"> |
| 318 | <span class="label">Return value:</span> |
| 319 | <span class="value">New reference.</span> |
| 320 | </div> |
| 321 | Create a Unicode object by decoding <var>size</var> bytes of the |
| 322 | Unicode-Escape encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an |
| 323 | exception was raised by the codec. |
| 324 | </dd></dl> |
| 325 | |
| 326 | <P> |
| 327 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-518' xml:id='l2h-518' class="cfunction">PyUnicode_EncodeUnicodeEscape</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 328 | int <var>size</var>, |
| 329 | const char *<var>errors</var>)</td></tr></table></dt> |
| 330 | <dd> |
| 331 | <div class="refcount-info"> |
| 332 | <span class="label">Return value:</span> |
| 333 | <span class="value">New reference.</span> |
| 334 | </div> |
| 335 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using |
| 336 | Unicode-Escape and return a Python string object. Return <tt class="constant">NULL</tt> |
| 337 | if an exception was raised by the codec. |
| 338 | </dd></dl> |
| 339 | |
| 340 | <P> |
| 341 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-519' xml:id='l2h-519' class="cfunction">PyUnicode_AsUnicodeEscapeString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 342 | <dd> |
| 343 | <div class="refcount-info"> |
| 344 | <span class="label">Return value:</span> |
| 345 | <span class="value">New reference.</span> |
| 346 | </div> |
| 347 | Encode a Unicode object using Unicode-Escape and return the |
| 348 | result as a Python string object. Error handling is ``strict''. |
| 349 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 350 | </dd></dl> |
| 351 | |
| 352 | <P> |
| 353 | These are the ``Raw Unicode Escape'' codec APIs: |
| 354 | |
| 355 | <P> |
| 356 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-520' xml:id='l2h-520' class="cfunction">PyUnicode_DecodeRawUnicodeEscape</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 357 | int <var>size</var>, |
| 358 | const char *<var>errors</var>)</td></tr></table></dt> |
| 359 | <dd> |
| 360 | <div class="refcount-info"> |
| 361 | <span class="label">Return value:</span> |
| 362 | <span class="value">New reference.</span> |
| 363 | </div> |
| 364 | Create a Unicode object by decoding <var>size</var> bytes of the |
| 365 | Raw-Unicode-Escape encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an |
| 366 | exception was raised by the codec. |
| 367 | </dd></dl> |
| 368 | |
| 369 | <P> |
| 370 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-521' xml:id='l2h-521' class="cfunction">PyUnicode_EncodeRawUnicodeEscape</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 371 | int <var>size</var>, |
| 372 | const char *<var>errors</var>)</td></tr></table></dt> |
| 373 | <dd> |
| 374 | <div class="refcount-info"> |
| 375 | <span class="label">Return value:</span> |
| 376 | <span class="value">New reference.</span> |
| 377 | </div> |
| 378 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using |
| 379 | Raw-Unicode-Escape and return a Python string object. Return |
| 380 | <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 381 | </dd></dl> |
| 382 | |
| 383 | <P> |
| 384 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-522' xml:id='l2h-522' class="cfunction">PyUnicode_AsRawUnicodeEscapeString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 385 | <dd> |
| 386 | <div class="refcount-info"> |
| 387 | <span class="label">Return value:</span> |
| 388 | <span class="value">New reference.</span> |
| 389 | </div> |
| 390 | Encode a Unicode object using Raw-Unicode-Escape and return the |
| 391 | result as a Python string object. Error handling is ``strict''. |
| 392 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 393 | </dd></dl> |
| 394 | |
| 395 | <P> |
| 396 | These are the Latin-1 codec APIs. |
| 397 | Latin-1 corresponds to the first 256 Unicode ordinals, and only these |
| 398 | are accepted by the codecs during encoding. |
| 399 | |
| 400 | <P> |
| 401 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-523' xml:id='l2h-523' class="cfunction">PyUnicode_DecodeLatin1</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 402 | int <var>size</var>, |
| 403 | const char *<var>errors</var>)</td></tr></table></dt> |
| 404 | <dd> |
| 405 | <div class="refcount-info"> |
| 406 | <span class="label">Return value:</span> |
| 407 | <span class="value">New reference.</span> |
| 408 | </div> |
| 409 | Create a Unicode object by decoding <var>size</var> bytes of the Latin-1 |
| 410 | encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception was raised |
| 411 | by the codec. |
| 412 | </dd></dl> |
| 413 | |
| 414 | <P> |
| 415 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-524' xml:id='l2h-524' class="cfunction">PyUnicode_EncodeLatin1</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 416 | int <var>size</var>, |
| 417 | const char *<var>errors</var>)</td></tr></table></dt> |
| 418 | <dd> |
| 419 | <div class="refcount-info"> |
| 420 | <span class="label">Return value:</span> |
| 421 | <span class="value">New reference.</span> |
| 422 | </div> |
| 423 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using |
| 424 | Latin-1 and return a Python string object. Return <tt class="constant">NULL</tt> if an |
| 425 | exception was raised by the codec. |
| 426 | </dd></dl> |
| 427 | |
| 428 | <P> |
| 429 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-525' xml:id='l2h-525' class="cfunction">PyUnicode_AsLatin1String</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 430 | <dd> |
| 431 | <div class="refcount-info"> |
| 432 | <span class="label">Return value:</span> |
| 433 | <span class="value">New reference.</span> |
| 434 | </div> |
| 435 | Encode a Unicode object using Latin-1 and return the result as a |
| 436 | Python string object. Error handling is ``strict''. Return |
| 437 | <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 438 | </dd></dl> |
| 439 | |
| 440 | <P> |
| 441 | These are the ASCII codec APIs. Only 7-bit ASCII data is |
| 442 | accepted. All other codes generate errors. |
| 443 | |
| 444 | <P> |
| 445 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-526' xml:id='l2h-526' class="cfunction">PyUnicode_DecodeASCII</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 446 | int <var>size</var>, |
| 447 | const char *<var>errors</var>)</td></tr></table></dt> |
| 448 | <dd> |
| 449 | <div class="refcount-info"> |
| 450 | <span class="label">Return value:</span> |
| 451 | <span class="value">New reference.</span> |
| 452 | </div> |
| 453 | Create a Unicode object by decoding <var>size</var> bytes of the |
| 454 | ASCII encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception |
| 455 | was raised by the codec. |
| 456 | </dd></dl> |
| 457 | |
| 458 | <P> |
| 459 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-527' xml:id='l2h-527' class="cfunction">PyUnicode_EncodeASCII</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 460 | int <var>size</var>, |
| 461 | const char *<var>errors</var>)</td></tr></table></dt> |
| 462 | <dd> |
| 463 | <div class="refcount-info"> |
| 464 | <span class="label">Return value:</span> |
| 465 | <span class="value">New reference.</span> |
| 466 | </div> |
| 467 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using |
| 468 | ASCII and return a Python string object. Return <tt class="constant">NULL</tt> if an |
| 469 | exception was raised by the codec. |
| 470 | </dd></dl> |
| 471 | |
| 472 | <P> |
| 473 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-528' xml:id='l2h-528' class="cfunction">PyUnicode_AsASCIIString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 474 | <dd> |
| 475 | <div class="refcount-info"> |
| 476 | <span class="label">Return value:</span> |
| 477 | <span class="value">New reference.</span> |
| 478 | </div> |
| 479 | Encode a Unicode object using ASCII and return the result as a |
| 480 | Python string object. Error handling is ``strict''. Return |
| 481 | <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 482 | </dd></dl> |
| 483 | |
| 484 | <P> |
| 485 | These are the mapping codec APIs: |
| 486 | |
| 487 | <P> |
| 488 | This codec is special in that it can be used to implement many |
| 489 | different codecs (and this is in fact what was done to obtain most of |
| 490 | the standard codecs included in the <tt class="module">encodings</tt> package). The |
| 491 | codec uses mappings to encode and decode characters. |
| 492 | |
| 493 | <P> |
| 494 | Decoding mappings must map single string characters to single Unicode |
| 495 | characters, integers (which are then interpreted as Unicode ordinals) |
| 496 | or None (meaning "undefined mapping" and causing an error). |
| 497 | |
| 498 | <P> |
| 499 | Encoding mappings must map single Unicode characters to single string |
| 500 | characters, integers (which are then interpreted as Latin-1 ordinals) |
| 501 | or None (meaning "undefined mapping" and causing an error). |
| 502 | |
| 503 | <P> |
| 504 | The mapping objects provided need only support the __getitem__ mapping |
| 505 | interface. |
| 506 | |
| 507 | <P> |
| 508 | If a character lookup fails with a LookupError, the character is |
| 509 | copied as-is, meaning that its ordinal value will be interpreted as a |
| 510 | Unicode ordinal (when decoding) or a Latin-1 ordinal (when encoding). Because of this, mappings only need |
| 511 | to contain those entries which map characters to different code |
| 512 | points. |
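<P>
The three lookup outcomes described above (a mapped value, an undefined mapping, and a failed lookup that copies the character as-is) can be sketched in plain C. This is only a model of the decoding rules, not the codec itself: a 256-entry <code>long</code> array stands in for the Python mapping object, and the sentinels <tt>MAP_MISSING</tt> and <tt>MAP_UNDEFINED</tt> are hypothetical stand-ins for a LookupError and for None respectively.

```c
#include <stddef.h>

#define MAP_MISSING   (-1L)   /* key absent: stands in for a LookupError */
#define MAP_UNDEFINED (-2L)   /* stands in for a None entry */

/* Hypothetical model of charmap decoding. Decodes size bytes of s
   through table and stores one code point per byte in out.
   Returns the number of code points written, or -1 if a byte has an
   undefined mapping (the real codec would raise an error here). */
long charmap_decode(const unsigned char *s, size_t size,
                    const long table[256], unsigned long *out)
{
    size_t i;
    for (i = 0; i < size; i++) {
        long mapped = table[s[i]];
        if (mapped == MAP_UNDEFINED)
            return -1;
        /* A missing entry copies the byte value through unchanged. */
        out[i] = (mapped == MAP_MISSING) ? (unsigned long)s[i]
                                         : (unsigned long)mapped;
    }
    return (long)size;
}
```

Because missing entries fall through unchanged, a table for a Latin-1 variant only has to list the bytes whose meaning differs, which is the point made in the paragraph above.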
| 513 | |
| 514 | <P> |
| 515 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-529' xml:id='l2h-529' class="cfunction">PyUnicode_DecodeCharmap</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 516 | int <var>size</var>, |
| 517 | PyObject *<var>mapping</var>, |
| 518 | const char *<var>errors</var>)</td></tr></table></dt> |
| 519 | <dd> |
| 520 | <div class="refcount-info"> |
| 521 | <span class="label">Return value:</span> |
| 522 | <span class="value">New reference.</span> |
| 523 | </div> |
| 524 | Create a Unicode object by decoding <var>size</var> bytes of the encoded |
| 525 | string <var>s</var> using the given <var>mapping</var> object. Return |
| 526 | <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 527 | </dd></dl> |
| 528 | |
| 529 | <P> |
| 530 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-530' xml:id='l2h-530' class="cfunction">PyUnicode_EncodeCharmap</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 531 | int <var>size</var>, |
| 532 | PyObject *<var>mapping</var>, |
| 533 | const char *<var>errors</var>)</td></tr></table></dt> |
| 534 | <dd> |
| 535 | <div class="refcount-info"> |
| 536 | <span class="label">Return value:</span> |
| 537 | <span class="value">New reference.</span> |
| 538 | </div> |
| 539 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using the |
| 540 | given <var>mapping</var> object and return a Python string object. |
| 541 | Return <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 542 | </dd></dl> |
| 543 | |
| 544 | <P> |
| 545 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-531' xml:id='l2h-531' class="cfunction">PyUnicode_AsCharmapString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>, |
| 546 | PyObject *<var>mapping</var>)</td></tr></table></dt> |
| 547 | <dd> |
| 548 | <div class="refcount-info"> |
| 549 | <span class="label">Return value:</span> |
| 550 | <span class="value">New reference.</span> |
| 551 | </div> |
| 552 | Encode a Unicode object using the given <var>mapping</var> object and |
| 553 | return the result as a Python string object. Error handling is |
| 554 | ``strict''. Return <tt class="constant">NULL</tt> if an exception was raised by the |
| 555 | codec. |
| 556 | </dd></dl> |
| 557 | |
| 558 | <P> |
| 559 | The following codec API is special in that it maps Unicode to Unicode. |
| 560 | |
| 561 | <P> |
| 562 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-532' xml:id='l2h-532' class="cfunction">PyUnicode_TranslateCharmap</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 563 | int <var>size</var>, |
| 564 | PyObject *<var>table</var>, |
| 565 | const char *<var>errors</var>)</td></tr></table></dt> |
| 566 | <dd> |
| 567 | <div class="refcount-info"> |
| 568 | <span class="label">Return value:</span> |
| 569 | <span class="value">New reference.</span> |
| 570 | </div> |
| 571 | Translate a <tt class="ctype">Py_UNICODE</tt> buffer of the given <var>size</var> by |
| 572 | applying a character mapping <var>table</var> to it and return the |
| 573 | resulting Unicode object. Return <tt class="constant">NULL</tt> when an exception was |
| 574 | raised by the codec. |
| 575 | |
| 576 | <P> |
| 577 | The mapping <var>table</var> must map Unicode ordinal integers to Unicode |
| 578 | ordinal integers or None (causing deletion of the character). |
| 579 | |
| 580 | <P> |
| 581 | Mapping tables need only provide the <tt class="method">__getitem__()</tt> |
| 582 | interface; dictionaries and sequences work well. Unmapped character |
| 583 | ordinals (ones which cause a <tt class="exception">LookupError</tt>) are left |
| 584 | untouched and are copied as-is. |
| 585 | </dd></dl> |
| 586 | |
| 587 | <P> |
| 588 | These are the MBCS codec APIs. They are currently only available on |
| 589 | Windows and use the Win32 MBCS converters to implement the |
| 590 | conversions. Note that MBCS (or DBCS) is a class of encodings, not |
| 591 | just one. The target encoding is defined by the user settings on the |
| 592 | machine running the codec. |
| 593 | |
| 594 | <P> |
| 595 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-533' xml:id='l2h-533' class="cfunction">PyUnicode_DecodeMBCS</tt></b>(</nobr></td><td>const char *<var>s</var>, |
| 596 | int <var>size</var>, |
| 597 | const char *<var>errors</var>)</td></tr></table></dt> |
| 598 | <dd> |
| 599 | <div class="refcount-info"> |
| 600 | <span class="label">Return value:</span> |
| 601 | <span class="value">New reference.</span> |
| 602 | </div> |
| 603 | Create a Unicode object by decoding <var>size</var> bytes of the MBCS |
| 604 | encoded string <var>s</var>. Return <tt class="constant">NULL</tt> if an exception was |
| 605 | raised by the codec. |
| 606 | </dd></dl> |
| 607 | |
| 608 | <P> |
| 609 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-534' xml:id='l2h-534' class="cfunction">PyUnicode_EncodeMBCS</tt></b>(</nobr></td><td>const Py_UNICODE *<var>s</var>, |
| 610 | int <var>size</var>, |
| 611 | const char *<var>errors</var>)</td></tr></table></dt> |
| 612 | <dd> |
| 613 | <div class="refcount-info"> |
| 614 | <span class="label">Return value:</span> |
| 615 | <span class="value">New reference.</span> |
| 616 | </div> |
| 617 | Encode the <tt class="ctype">Py_UNICODE</tt> buffer of the given size using MBCS |
| 618 | and return a Python string object. Return <tt class="constant">NULL</tt> if an exception |
| 619 | was raised by the codec. |
| 620 | </dd></dl> |
| 621 | |
| 622 | <P> |
| 623 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject* <b><tt id='l2h-535' xml:id='l2h-535' class="cfunction">PyUnicode_AsMBCSString</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt> |
| 624 | <dd> |
| 625 | <div class="refcount-info"> |
| 626 | <span class="label">Return value:</span> |
| 627 | <span class="value">New reference.</span> |
| 628 | </div> |
| 629 | Encode a Unicode object using MBCS and return the result as a |
| 630 | Python string object. Error handling is "strict". Return |
| 631 | <tt class="constant">NULL</tt> if an exception was raised by the codec. |
| 632 | </dd></dl> |
| 633 | |
| 634 | <P> |
| 635 | |
| 636 | <DIV CLASS="navigation"> |
| 669 | <hr /> |
| 670 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> |
| 671 | </DIV> |
| 672 | <!--End of Navigation Panel--> |
| 673 | <ADDRESS> |
| 674 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. |
| 675 | </ADDRESS> |
| 676 | </BODY> |
| 677 | </HTML> |