<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-stringprep.html" />
<link rel="prev" href="module-codecs.html" />
<link rel="parent" href="strings.html" />
<meta name='aesop' content='information' />
<title>4.10 unicodedata -- Unicode Database</title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="4.9.3 encodings.idna "
 href="module-encodings.idna.html"><img src='../icons/previous.png'
 border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="4. String Services"
 href="strings.html"><img src='../icons/up.png'
 border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="4.11 stringprep "
 href="module-stringprep.html"><img src='../icons/next.png'
 border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
 href="contents.html"><img src='../icons/contents.png'
 border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
 border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
 href="genindex.html"><img src='../icons/index.png'
 border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="module-encodings.idna.html">4.9.3 encodings.idna </A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="strings.html">4. String Services</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="module-stringprep.html">4.11 stringprep </A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION0061000000000000000000">
4.10 <tt class="module">unicodedata</tt> --
 Unicode Database</A>
</H1>

<P>
<A NAME="module-unicodedata"></A>

<P>
<a id='l2h-1037' xml:id='l2h-1037'></a>
<a id='l2h-1023' xml:id='l2h-1023'></a>
<P>
This module provides access to the Unicode Character Database, which
defines character properties for all Unicode characters. The data in
this database is based on the <span class="file">UnicodeData.txt</span> file version
3.2.0, which is publicly available from <a class="url" href="ftp://ftp.unicode.org/">ftp://ftp.unicode.org/</a>.

<P>
The module uses the same names and symbols as defined by the
UnicodeData File Format 3.2.0 (see
<a class="url" href="http://www.unicode.org/Public/UNIDATA/UnicodeData.html">http://www.unicode.org/Public/UNIDATA/UnicodeData.html</a>). It
defines the following functions:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1024' xml:id='l2h-1024' class="function">lookup</tt></b>(</nobr></td>
 <td><var>name</var>)</td></tr></table></dt>
<dd>
 Look up character by name. If a character with the
 given name is found, return the corresponding Unicode
 character. If not found, <tt class="exception">KeyError</tt> is raised.
</dl>
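
<P>
A minimal sketch of <tt class="function">lookup</tt>, assuming a current interpreter. The variable names are illustrative and not part of the module:

```python
import unicodedata

# Resolve a character from its official Unicode name.
c_cedilla = unicodedata.lookup('LATIN CAPITAL LETTER C WITH CEDILLA')

# An unknown name raises KeyError, so guard the call when the name
# comes from untrusted input.
try:
    unicodedata.lookup('NO SUCH CHARACTER NAME')
    unknown_name_found = True
except KeyError:
    unknown_name_found = False
```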

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1025' xml:id='l2h-1025' class="function">name</tt></b>(</nobr></td>
 <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
 Returns the name assigned to the Unicode character
 <var>unichr</var> as a string. If no name is defined,
 <var>default</var> is returned, or, if not given,
 <tt class="exception">ValueError</tt> is raised.
</dl>
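
<P>
The two failure modes of <tt class="function">name</tt> can be sketched as follows. The <code>u''</code> prefix is kept only so that the literals are Unicode strings on Python 2 as well, and the fallback text <code>'UNNAMED'</code> is an arbitrary choice for illustration:

```python
import unicodedata

solidus_name = unicodedata.name(u'/')

# U+0000 is a control character and has no name in the database,
# so the supplied default is returned instead.
fallback = unicodedata.name(u'\u0000', 'UNNAMED')

# Without a default, the same call raises ValueError.
try:
    unicodedata.name(u'\u0000')
    raised_value_error = False
except ValueError:
    raised_value_error = True
```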

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1026' xml:id='l2h-1026' class="function">decimal</tt></b>(</nobr></td>
 <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
 Returns the decimal value assigned to the Unicode character
 <var>unichr</var> as an integer. If no such value is defined,
 <var>default</var> is returned, or, if not given,
 <tt class="exception">ValueError</tt> is raised.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1027' xml:id='l2h-1027' class="function">digit</tt></b>(</nobr></td>
 <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
 Returns the digit value assigned to the Unicode character
 <var>unichr</var> as an integer. If no such value is defined,
 <var>default</var> is returned, or, if not given,
 <tt class="exception">ValueError</tt> is raised.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1028' xml:id='l2h-1028' class="function">numeric</tt></b>(</nobr></td>
 <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
 Returns the numeric value assigned to the Unicode character
 <var>unichr</var> as a float. If no such value is defined, <var>default</var> is
 returned, or, if not given, <tt class="exception">ValueError</tt> is raised.
</dl>
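
<P>
The difference between <tt class="function">decimal</tt>, <tt class="function">digit</tt> and <tt class="function">numeric</tt> is easiest to see side by side. The helper <code>numeric_properties</code> below is hypothetical and exists only for this sketch; it passes <code>None</code> as the default so that a missing value does not raise:

```python
import unicodedata

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined."""
    return (unicodedata.decimal(ch, None),
            unicodedata.digit(ch, None),
            unicodedata.numeric(ch, None))

nine = numeric_properties(u'9')        # (9, 9, 9.0)
sup_two = numeric_properties(u'\u00b2')  # SUPERSCRIPT TWO: (None, 2, 2.0)
half = numeric_properties(u'\u00bd')     # VULGAR FRACTION ONE HALF: (None, None, 0.5)
```

An ordinary digit carries all three values, a superscript digit has no decimal value, and a fraction has only a numeric value.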

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1029' xml:id='l2h-1029' class="function">category</tt></b>(</nobr></td>
 <td><var>unichr</var>)</td></tr></table></dt>
<dd>
 Returns the general category assigned to the Unicode character
 <var>unichr</var> as a string.
</dl>
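
<P>
As an illustration, the two-letter codes returned by <tt class="function">category</tt> can be used to filter text by character class. The sample string and variable names are assumptions made for this sketch:

```python
import unicodedata

# Upper-case letter, lower-case letter, digit, space, combining cedilla.
sample = u'Aa1 \u0327'
categories = [unicodedata.category(ch) for ch in sample]

# Every letter category starts with 'L' (Lu, Ll, Lt, Lm, Lo).
letters = u''.join([ch for ch in sample
                    if unicodedata.category(ch).startswith('L')])
```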

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1030' xml:id='l2h-1030' class="function">bidirectional</tt></b>(</nobr></td>
 <td><var>unichr</var>)</td></tr></table></dt>
<dd>
 Returns the bidirectional category assigned to the Unicode character
 <var>unichr</var> as a string. If no such value is defined, an empty string
 is returned.
</dl>
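
<P>
One possible use of <tt class="function">bidirectional</tt> is a rough check for right-to-left text. The function <code>has_rtl</code> is a hypothetical helper written for this sketch, and it only looks at the strong right-to-left categories <code>'R'</code> and <code>'AL'</code>:

```python
import unicodedata

latin = unicodedata.bidirectional(u'A')         # 'L'  (left-to-right)
hebrew = unicodedata.bidirectional(u'\u05d0')   # 'R'  (HEBREW LETTER ALEF)
arabic = unicodedata.bidirectional(u'\u0627')   # 'AL' (ARABIC LETTER ALEF)
digit = unicodedata.bidirectional(u'1')         # 'EN' (European number)

def has_rtl(text):
    """Return True if text contains a strongly right-to-left character."""
    for ch in text:
        if unicodedata.bidirectional(ch) in ('R', 'AL'):
            return True
    return False
```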

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1031' xml:id='l2h-1031' class="function">combining</tt></b>(</nobr></td>
 <td><var>unichr</var>)</td></tr></table></dt>
<dd>
 Returns the canonical combining class assigned to the Unicode
 character <var>unichr</var> as an integer. Returns <code>0</code> if no combining
 class is defined.
</dl>
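
<P>
A short sketch of <tt class="function">combining</tt>: base characters report class <code>0</code>, while combining marks report a non-zero class. Dropping the non-zero entries is one simple way to strip marks from an already decomposed string; the names below are illustrative:

```python
import unicodedata

cedilla_class = unicodedata.combining(u'\u0327')  # COMBINING CEDILLA
acute_class = unicodedata.combining(u'\u0301')    # COMBINING ACUTE ACCENT
base_class = unicodedata.combining(u'C')

# Keep only characters whose combining class is 0.
decomposed = u'C\u0327a\u0301'
without_marks = u''.join([ch for ch in decomposed
                          if unicodedata.combining(ch) == 0])
```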

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1032' xml:id='l2h-1032' class="function">east_asian_width</tt></b>(</nobr></td>
 <td><var>unichr</var>)</td></tr></table></dt>
<dd>
 Returns the East Asian width assigned to the Unicode character
 <var>unichr</var> as a string.

<span class="versionnote">New in version 2.4.</span>

</dl>
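
<P>
The width codes are commonly used to estimate how many terminal columns a string occupies. The helper <code>display_width</code> below is a hypothetical heuristic for this sketch: it counts wide (<code>'W'</code>) and fullwidth (<code>'F'</code>) characters as two columns and everything else as one, and it ignores combining marks:

```python
import unicodedata

narrow = unicodedata.east_asian_width(u'A')          # 'Na'
fullwidth = unicodedata.east_asian_width(u'\uff21')  # 'F'  (FULLWIDTH LATIN CAPITAL LETTER A)
wide = unicodedata.east_asian_width(u'\u4e2d')       # 'W'  (CJK ideograph)
halfwidth = unicodedata.east_asian_width(u'\uff71')  # 'H'  (HALFWIDTH KATAKANA LETTER A)

def display_width(text):
    """Estimate the column width of text on a fixed-width display."""
    width = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            width += 2
        else:
            width += 1
    return width

columns = display_width(u'A\u4e2d\uff21')
```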

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1033' xml:id='l2h-1033' class="function">mirrored</tt></b>(</nobr></td>
 <td><var>unichr</var>)</td></tr></table></dt>
<dd>
 Returns the mirrored property assigned to the Unicode character
 <var>unichr</var> as an integer. Returns <code>1</code> if the character has been
 identified as a ``mirrored'' character in bidirectional text,
 <code>0</code> otherwise.
</dl>
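
<P>
For example, brackets are mirrored in right-to-left runs while letters are not. A small sketch with illustrative variable names:

```python
import unicodedata

# Pair each character with its mirrored flag (1 or 0).
flags = [(ch, unicodedata.mirrored(ch)) for ch in u'(A[']
```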

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1034' xml:id='l2h-1034' class="function">decomposition</tt></b>(</nobr></td>
 <td><var>unichr</var>)</td></tr></table></dt>
<dd>
 Returns the character decomposition mapping assigned to the Unicode
 character <var>unichr</var> as a string. An empty string is returned if
 no such mapping is defined.
</dl>
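
<P>
The returned string holds space-separated hexadecimal code points, optionally preceded by a formatting tag in angle brackets for compatibility mappings. The parser <code>split_decomposition</code> below is a hypothetical helper sketched for illustration:

```python
import unicodedata

def split_decomposition(ch):
    """Return (tag, code_points); tag is None for canonical mappings."""
    fields = unicodedata.decomposition(ch).split()
    tag = None
    # Compatibility mappings start with a tag such as '<compat>'.
    if fields and fields[0].startswith('<'):
        tag = fields.pop(0)
    return tag, [int(field, 16) for field in fields]

canonical = split_decomposition(u'\u00c7')  # (None, [0x43, 0x327])
compat = split_decomposition(u'\u2160')     # ('<compat>', [0x49])
unmapped = split_decomposition(u'A')        # (None, [])
```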

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-1035' xml:id='l2h-1035' class="function">normalize</tt></b>(</nobr></td>
 <td><var>form, unistr</var>)</td></tr></table></dt>
<dd>

<P>
Return the normal form <var>form</var> for the Unicode string <var>unistr</var>.
Valid values for <var>form</var> are 'NFC', 'NFKC', 'NFD', and 'NFKD'.

<P>
The Unicode standard defines various normalization forms of a Unicode
string, based on the definition of canonical equivalence and
compatibility equivalence. In Unicode, several characters can be
expressed in various ways. For example, the character U+00C7 (LATIN
CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence
U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

<P>
For each character, there are two normal forms: normal form C and
normal form D. Normal form D (NFD) is also known as canonical
decomposition, and translates each character into its decomposed form.
Normal form C (NFC) first applies a canonical decomposition, then
composes pre-combined characters again.

<P>
In addition to these two forms, there are two additional normal forms
based on compatibility equivalence. In Unicode, certain characters are
supported which normally would be unified with other characters. For
example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049
(LATIN CAPITAL LETTER I). However, it is supported in Unicode for
compatibility with existing character sets (e.g. gb2312).

<P>
The normal form KD (NFKD) will apply the compatibility decomposition,
i.e. replace all compatibility characters with their equivalents. The
normal form KC (NFKC) first applies the compatibility decomposition,
followed by the canonical composition.

<P>

<span class="versionnote">New in version 2.3.</span>

</dl>
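
<P>
The two examples from the description above, U+00C7 and U+2160, can be checked directly. This is a minimal sketch with illustrative variable names:

```python
import unicodedata

composed = u'\u00c7'      # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = u'C\u0327'   # LATIN CAPITAL LETTER C + COMBINING CEDILLA

# Canonical forms: NFD splits the character, NFC puts it back together.
nfd = unicodedata.normalize('NFD', composed)
nfc = unicodedata.normalize('NFC', decomposed)

# Compatibility forms: ROMAN NUMERAL ONE is kept by NFC but replaced
# by LATIN CAPITAL LETTER I under NFKC and NFKD.
roman_one = u'\u2160'
roman_nfc = unicodedata.normalize('NFC', roman_one)
roman_nfkc = unicodedata.normalize('NFKC', roman_one)
roman_nfkd = unicodedata.normalize('NFKD', roman_one)
```

Normalizing both operands to the same form before comparing them is the usual way to make canonically equivalent strings compare equal.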

<P>
In addition, the module exposes the following constant:

<P>
<dl><dt><b><tt id='l2h-1036' xml:id='l2h-1036'>unidata_version</tt></b></dt>
<dd>
The version of the Unicode database used in this module.

<P>

<span class="versionnote">New in version 2.3.</span>

</dd></dl>
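
<P>
Because the constant is a string, comparing versions is easier after converting it to a tuple of integers. The sketch below assumes the value is a dotted version such as <code>'3.2.0'</code>; the name <code>version_tuple</code> is illustrative:

```python
import unicodedata

# Split the dotted version string into integer components.
version_tuple = tuple([int(part)
                       for part in unicodedata.unidata_version.split('.')])

# Check that the database is at least the 3.2.0 release described here.
is_at_least_3_2 = version_tuple >= (3, 2, 0)
```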

<DIV CLASS="navigation">
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>