Initial commit of OpenSPARC T2 architecture model.
[OpenSPARC-T2-SAM] / sam-t2 / devtools / v8plus / html / python / api / unicodeObjects.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="api.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="api.html" title='Python/C API Reference Manual' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="bufferObjects.html" />
<link rel="prev" href="stringObjects.html" />
<link rel="parent" href="sequenceObjects.html" />
<link rel="next" href="builtinCodecs.html" />
<meta name='aesop' content='information' />
<title>7.3.2 Unicode Objects </title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="7.3.1 String Objects"
href="stringObjects.html"><img src='../icons/previous.png'
border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="7.3 Sequence Objects"
href="sequenceObjects.html"><img src='../icons/up.png'
border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="7.3.2.1 Built-in Codecs"
href="builtinCodecs.html"><img src='../icons/next.png'
border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python/C API Reference Manual</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
href="contents.html"><img src='../icons/contents.png'
border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><img src='../icons/blank.png'
border='0' height='32' alt='' width='32' /></td>
<td class='online-navigation'><a rel="index" title="Index"
href="genindex.html"><img src='../icons/index.png'
border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="stringObjects.html">7.3.1 String Objects</A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="sequenceObjects.html">7.3 Sequence Objects</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="builtinCodecs.html">7.3.2.1 Built-in Codecs</A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->
<H2><A NAME="SECTION009320000000000000000"></A><A NAME="unicodeObjects"></A>
<BR>
7.3.2 Unicode Objects
</H2>
<P>
These are the basic Unicode object types used for the Unicode
implementation in Python:
<P>
<dl><dt><b><tt class="ctype"><a id='l2h-474' xml:id='l2h-474'>Py_UNICODE</a></tt></b></dt>
<dd>
This type represents a 16-bit unsigned storage type which is used by
Python internally as basis for holding Unicode ordinals. On
platforms where <tt class="ctype">wchar_t</tt> is available and also has 16-bits,
<tt class="ctype">Py_UNICODE</tt> is a typedef alias for <tt class="ctype">wchar_t</tt> to enhance
native platform compatibility. On all other platforms,
<tt class="ctype">Py_UNICODE</tt> is a typedef alias for <tt class="ctype">unsigned short</tt>.
</dl>
<P>
<dl><dt><b><tt class="ctype"><a id='l2h-475' xml:id='l2h-475'>PyUnicodeObject</a></tt></b></dt>
<dd>
This subtype of <tt class="ctype">PyObject</tt> represents a Python Unicode object.
</dl>
<P>
<dl><dt>PyTypeObject <b><tt id='l2h-476' xml:id='l2h-476' class="cdata">PyUnicode_Type</tt></b></dt>
<dd>
This instance of <tt class="ctype">PyTypeObject</tt> represents the Python Unicode
type.
</dd></dl>
<P>
The following APIs are really C macros and can be used to do fast
checks and to access internal read-only data of Unicode objects:
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-477' xml:id='l2h-477' class="cfunction">PyUnicode_Check</tt></b>(</nobr></td><td>PyObject *<var>o</var>)</td></tr></table></dt>
<dd>
Return true if the object <var>o</var> is a Unicode object or an
instance of a Unicode subtype.
<span class="versionnote">Changed in version 2.2:
Allowed subtypes to be accepted.</span>
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-478' xml:id='l2h-478' class="cfunction">PyUnicode_CheckExact</tt></b>(</nobr></td><td>PyObject *<var>o</var>)</td></tr></table></dt>
<dd>
Return true if the object <var>o</var> is a Unicode object, but not an
instance of a subtype.
<span class="versionnote">New in version 2.2.</span>
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-479' xml:id='l2h-479' class="cfunction">PyUnicode_GET_SIZE</tt></b>(</nobr></td><td>PyObject *<var>o</var>)</td></tr></table></dt>
<dd>
Return the size of the object. <var>o</var> has to be a
<tt class="ctype">PyUnicodeObject</tt> (not checked).
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-480' xml:id='l2h-480' class="cfunction">PyUnicode_GET_DATA_SIZE</tt></b>(</nobr></td><td>PyObject *<var>o</var>)</td></tr></table></dt>
<dd>
Return the size of the object's internal buffer in bytes. <var>o</var>
has to be a <tt class="ctype">PyUnicodeObject</tt> (not checked).
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>Py_UNICODE*&nbsp;<b><tt id='l2h-481' xml:id='l2h-481' class="cfunction">PyUnicode_AS_UNICODE</tt></b>(</nobr></td><td>PyObject *<var>o</var>)</td></tr></table></dt>
<dd>
Return a pointer to the internal <tt class="ctype">Py_UNICODE</tt> buffer of the
object. <var>o</var> has to be a <tt class="ctype">PyUnicodeObject</tt> (not checked).
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>const char*&nbsp;<b><tt id='l2h-482' xml:id='l2h-482' class="cfunction">PyUnicode_AS_DATA</tt></b>(</nobr></td><td>PyObject *<var>o</var>)</td></tr></table></dt>
<dd>
Return a pointer to the internal buffer of the object.
<var>o</var> has to be a <tt class="ctype">PyUnicodeObject</tt> (not checked).
</dd></dl>
<P>
Unicode provides many different character properties. The most often
needed ones are available through these macros which are mapped to C
functions depending on the Python configuration.
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-483' xml:id='l2h-483' class="cfunction">Py_UNICODE_ISSPACE</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a whitespace
character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-484' xml:id='l2h-484' class="cfunction">Py_UNICODE_ISLOWER</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a lowercase character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-485' xml:id='l2h-485' class="cfunction">Py_UNICODE_ISUPPER</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is an uppercase
character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-486' xml:id='l2h-486' class="cfunction">Py_UNICODE_ISTITLE</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a titlecase character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-487' xml:id='l2h-487' class="cfunction">Py_UNICODE_ISLINEBREAK</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a linebreak character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-488' xml:id='l2h-488' class="cfunction">Py_UNICODE_ISDECIMAL</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a decimal character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-489' xml:id='l2h-489' class="cfunction">Py_UNICODE_ISDIGIT</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a digit character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-490' xml:id='l2h-490' class="cfunction">Py_UNICODE_ISNUMERIC</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is a numeric character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-491' xml:id='l2h-491' class="cfunction">Py_UNICODE_ISALPHA</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is an alphabetic
character.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-492' xml:id='l2h-492' class="cfunction">Py_UNICODE_ISALNUM</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return 1 or 0 depending on whether <var>ch</var> is an alphanumeric
character.
</dd></dl>
<P>
These APIs can be used for fast direct character conversions:
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>Py_UNICODE&nbsp;<b><tt id='l2h-493' xml:id='l2h-493' class="cfunction">Py_UNICODE_TOLOWER</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return the character <var>ch</var> converted to lower case.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>Py_UNICODE&nbsp;<b><tt id='l2h-494' xml:id='l2h-494' class="cfunction">Py_UNICODE_TOUPPER</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return the character <var>ch</var> converted to upper case.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>Py_UNICODE&nbsp;<b><tt id='l2h-495' xml:id='l2h-495' class="cfunction">Py_UNICODE_TOTITLE</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return the character <var>ch</var> converted to title case.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-496' xml:id='l2h-496' class="cfunction">Py_UNICODE_TODECIMAL</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return the character <var>ch</var> converted to a decimal positive
integer. Return <code>-1</code> if this is not possible. This macro
does not raise exceptions.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-497' xml:id='l2h-497' class="cfunction">Py_UNICODE_TODIGIT</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return the character <var>ch</var> converted to a single digit integer.
Return <code>-1</code> if this is not possible. This macro does not raise
exceptions.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>double&nbsp;<b><tt id='l2h-498' xml:id='l2h-498' class="cfunction">Py_UNICODE_TONUMERIC</tt></b>(</nobr></td><td>Py_UNICODE <var>ch</var>)</td></tr></table></dt>
<dd>
Return the character <var>ch</var> converted to a (positive) double.
Return <code>-1.0</code> if this is not possible. This macro does not raise
exceptions.
</dd></dl>
<P>
To create Unicode objects and access their basic sequence properties,
use these APIs:
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject*&nbsp;<b><tt id='l2h-499' xml:id='l2h-499' class="cfunction">PyUnicode_FromUnicode</tt></b>(</nobr></td><td>const Py_UNICODE *<var>u</var>,
int <var>size</var>)</td></tr></table></dt>
<dd>
<div class="refcount-info">
<span class="label">Return value:</span>
<span class="value">New reference.</span>
</div>
Create a Unicode Object from the Py_UNICODE buffer <var>u</var> of the
given size. <var>u</var> may be <tt class="constant">NULL</tt> which causes the contents to be
undefined. It is the user's responsibility to fill in the needed
data. The buffer is copied into the new object. If the buffer is
not <tt class="constant">NULL</tt>, the return value might be a shared object. Therefore,
modification of the resulting Unicode object is only allowed when
<var>u</var> is <tt class="constant">NULL</tt>.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>Py_UNICODE*&nbsp;<b><tt id='l2h-500' xml:id='l2h-500' class="cfunction">PyUnicode_AsUnicode</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt>
<dd>
Return a read-only pointer to the Unicode object's internal
<tt class="ctype">Py_UNICODE</tt> buffer, <tt class="constant">NULL</tt> if <var>unicode</var> is not a Unicode
object.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-501' xml:id='l2h-501' class="cfunction">PyUnicode_GetSize</tt></b>(</nobr></td><td>PyObject *<var>unicode</var>)</td></tr></table></dt>
<dd>
Return the length of the Unicode object.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject*&nbsp;<b><tt id='l2h-502' xml:id='l2h-502' class="cfunction">PyUnicode_FromEncodedObject</tt></b>(</nobr></td><td>PyObject *<var>obj</var>,
const char *<var>encoding</var>,
const char *<var>errors</var>)</td></tr></table></dt>
<dd>
<div class="refcount-info">
<span class="label">Return value:</span>
<span class="value">New reference.</span>
</div>
Coerce an encoded object <var>obj</var> to an Unicode object and return a
reference with incremented refcount.
<P>
Coercion is done in the following way:
<P>
<OL>
<LI>Unicode objects are passed back as-is with incremented
refcount. <span class="note"><b class="label">Note:</b>
These cannot be decoded; passing a non-<tt class="constant">NULL</tt>
value for encoding will result in a <tt class="exception">TypeError</tt>.</span>
<P>
</LI>
<LI>String and other char buffer compatible objects are decoded
according to the given encoding and using the error handling
defined by errors. Both can be <tt class="constant">NULL</tt> to have the interface
use the default values (see the next section for details).
<P>
</LI>
<LI>All other objects cause an exception.
</LI>
</OL>
<P>
The API returns <tt class="constant">NULL</tt> if there was an error. The caller is
responsible for decref'ing the returned objects.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject*&nbsp;<b><tt id='l2h-503' xml:id='l2h-503' class="cfunction">PyUnicode_FromObject</tt></b>(</nobr></td><td>PyObject *<var>obj</var>)</td></tr></table></dt>
<dd>
<div class="refcount-info">
<span class="label">Return value:</span>
<span class="value">New reference.</span>
</div>
Shortcut for <code>PyUnicode_FromEncodedObject(obj, NULL, "strict")</code>
which is used throughout the interpreter whenever coercion to
Unicode is needed.
</dd></dl>
<P>
If the platform supports <tt class="ctype">wchar_t</tt> and provides a header file
wchar.h, Python can interface directly to this type using the
following functions. Support is optimized if Python's own
<tt class="ctype">Py_UNICODE</tt> type is identical to the system's <tt class="ctype">wchar_t</tt>.
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>PyObject*&nbsp;<b><tt id='l2h-504' xml:id='l2h-504' class="cfunction">PyUnicode_FromWideChar</tt></b>(</nobr></td><td>const wchar_t *<var>w</var>,
int <var>size</var>)</td></tr></table></dt>
<dd>
<div class="refcount-info">
<span class="label">Return value:</span>
<span class="value">New reference.</span>
</div>
Create a Unicode object from the <tt class="ctype">wchar_t</tt> buffer <var>w</var> of
the given size. Return <tt class="constant">NULL</tt> on failure.
</dd></dl>
<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"><td><nobr>int&nbsp;<b><tt id='l2h-505' xml:id='l2h-505' class="cfunction">PyUnicode_AsWideChar</tt></b>(</nobr></td><td>PyUnicodeObject *<var>unicode</var>,
wchar_t *<var>w</var>,
int <var>size</var>)</td></tr></table></dt>
<dd>
Copy the Unicode object contents into the <tt class="ctype">wchar_t</tt> buffer
<var>w</var>. At most <var>size</var> <tt class="ctype">wchar_t</tt> characters are copied
(excluding a possibly trailing 0-termination character). Return
the number of <tt class="ctype">wchar_t</tt> characters copied or -1 in case of an
error. Note that the resulting <tt class="ctype">wchar_t</tt> string may or may
not be 0-terminated. It is the responsibility of the caller to make
sure that the <tt class="ctype">wchar_t</tt> string is 0-terminated in case this is
required by the application.
</dd></dl>
<P>
<p><br /></p><hr class='online-navigation' />
<div class='online-navigation'>
<!--Table of Child-Links-->
<A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a>
<UL CLASS="ChildLinks">
<LI><A href="builtinCodecs.html">7.3.2.1 Built-in Codecs</a>
<LI><A href="unicodeMethodsAndSlots.html">7.3.2.2 Methods and Slot Functions</a>
</ul>
<!--End of Table of Child-Links-->
</div>
<DIV CLASS="navigation">
<div class='online-navigation'>
<p></p><hr />
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="7.3.1 String Objects"
href="stringObjects.html"><img src='../icons/previous.png'
border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="7.3 Sequence Objects"
href="sequenceObjects.html"><img src='../icons/up.png'
border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="7.3.2.1 Built-in Codecs"
href="builtinCodecs.html"><img src='../icons/next.png'
border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python/C API Reference Manual</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
href="contents.html"><img src='../icons/contents.png'
border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><img src='../icons/blank.png'
border='0' height='32' alt='' width='32' /></td>
<td class='online-navigation'><a rel="index" title="Index"
href="genindex.html"><img src='../icons/index.png'
border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="stringObjects.html">7.3.1 String Objects</A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="sequenceObjects.html">7.3 Sequence Objects</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="builtinCodecs.html">7.3.2.1 Built-in Codecs</A>
</div>
</div>
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>