Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
2 | <html> | |
3 | <head> | |
4 | <link rel="STYLESHEET" href="lib.css" type='text/css' /> | |
5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> | |
6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> | |
7 | <link rel="first" href="lib.html" title='Python Library Reference' /> | |
8 | <link rel='contents' href='contents.html' title="Contents" /> | |
9 | <link rel='index' href='genindex.html' title='Index' /> | |
10 | <link rel='last' href='about.html' title='About this document...' /> | |
11 | <link rel='help' href='about.html' title='About this document...' /> | |
12 | <link rel="next" href="module-stringprep.html" /> | |
13 | <link rel="prev" href="module-codecs.html" /> | |
14 | <link rel="parent" href="strings.html" /> | |
16 | <meta name='aesop' content='information' /> | |
17 | <title>4.10 unicodedata -- Unicode Database</title> | |
18 | </head> | |
19 | <body> | |
20 | <DIV CLASS="navigation"> | |
21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> | |
22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> | |
23 | <tr> | |
24 | <td class='online-navigation'><a rel="prev" title="4.9.3 encodings.idna " | |
25 | href="module-encodings.idna.html"><img src='../icons/previous.png' | |
26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> | |
27 | <td class='online-navigation'><a rel="parent" title="4. String Services" | |
28 | href="strings.html"><img src='../icons/up.png' | |
29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> | |
30 | <td class='online-navigation'><a rel="next" title="4.11 stringprep " | |
31 | href="module-stringprep.html"><img src='../icons/next.png' | |
32 | border='0' height='32' alt='Next Page' width='32' /></A></td> | |
33 | <td align="center" width="100%">Python Library Reference</td> | |
34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" | |
35 | href="contents.html"><img src='../icons/contents.png' | |
36 | border='0' height='32' alt='Contents' width='32' /></A></td> | |
37 | <td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png' | |
38 | border='0' height='32' alt='Module Index' width='32' /></a></td> | |
39 | <td class='online-navigation'><a rel="index" title="Index" | |
40 | href="genindex.html"><img src='../icons/index.png' | |
41 | border='0' height='32' alt='Index' width='32' /></A></td> | |
42 | </tr></table> | |
43 | <div class='online-navigation'> | |
44 | <b class="navlabel">Previous:</b> | |
45 | <a class="sectref" rel="prev" href="module-encodings.idna.html">4.9.3 encodings.idna </A> | |
46 | <b class="navlabel">Up:</b> | |
47 | <a class="sectref" rel="parent" href="strings.html">4. String Services</A> | |
48 | <b class="navlabel">Next:</b> | |
49 | <a class="sectref" rel="next" href="module-stringprep.html">4.11 stringprep </A> | |
50 | </div> | |
51 | <hr /></div> | |
52 | </DIV> | |
53 | <!--End of Navigation Panel--> | |
54 | ||
55 | <H1><A NAME="SECTION0061000000000000000000"> | |
56 | 4.10 <tt class="module">unicodedata</tt> -- | |
57 | Unicode Database</A> | |
58 | </H1> | |
59 | ||
60 | <P> | |
61 | <A NAME="module-unicodedata"></A> | |
62 | ||
63 | <P> | |
64 | <a id='l2h-1037' xml:id='l2h-1037'></a> | |
65 | <a id='l2h-1023' xml:id='l2h-1023'></a> | |
66 | <P> | |
67 | This module provides access to the Unicode Character Database, which | |
68 | defines character properties for all Unicode characters. The data in | |
69 | this database is based on version 3.2.0 of the <span class="file">UnicodeData.txt</span> file, | |
70 | which is publicly available from <a class="url" href="ftp://ftp.unicode.org/">ftp://ftp.unicode.org/</a>. | |
71 | ||
72 | <P> | |
73 | The module uses the same names and symbols as defined by the | |
74 | UnicodeData File Format 3.2.0 (see | |
75 | <a class="url" href="http://www.unicode.org/Public/UNIDATA/UnicodeData.html">http://www.unicode.org/Public/UNIDATA/UnicodeData.html</a>). It | |
76 | defines the following functions: | |
77 | ||
78 | <P> | |
79 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
80 | <td><nobr><b><tt id='l2h-1024' xml:id='l2h-1024' class="function">lookup</tt></b>(</nobr></td> | |
81 | <td><var>name</var>)</td></tr></table></dt> | |
82 | <dd> | |
83 | Look up a character by name. If a character with the | |
84 | given name is found, return the corresponding Unicode | |
85 | character. If not found, <tt class="exception">KeyError</tt> is raised. | |
86 | </dl> | |
87 | ||
88 | <P> | |
89 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
90 | <td><nobr><b><tt id='l2h-1025' xml:id='l2h-1025' class="function">name</tt></b>(</nobr></td> | |
91 | <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt> | |
92 | <dd> | |
93 | Returns the name assigned to the Unicode character | |
94 | <var>unichr</var> as a string. If no name is defined, | |
95 | <var>default</var> is returned, or, if not given, | |
96 | <tt class="exception">ValueError</tt> is raised. | |
97 | </dl> | |
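<P>
As a rough sketch of how <code>lookup()</code> and <code>name()</code> mirror each other, the example below round-trips one character and shows both failure modes. It assumes Python 3, where every <code>str</code> is a Unicode string (under the release documented here, <code>u'...'</code> literals would be needed); the helper <code>describe</code> and its placeholder default are hypothetical names used only for illustration.

```python
import unicodedata

def describe(ch, default="<unnamed>"):
    """Return the Unicode name of ch, or default when no name is defined."""
    return unicodedata.name(ch, default)

# lookup() maps a character name to the character itself.
cedilla = unicodedata.lookup("LATIN CAPITAL LETTER C WITH CEDILLA")
print(describe(cedilla))

# Control characters such as the newline have no name, so the default is used.
print(describe("\n"))

# An unknown name raises KeyError.
try:
    unicodedata.lookup("NO SUCH CHARACTER NAME")
except KeyError as exc:
    print("lookup failed:", exc)
```

Passing a default to <code>name()</code> avoids a <code>try</code>/<code>except</code> block when unnamed characters are expected in the input.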
98 | ||
99 | <P> | |
100 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
101 | <td><nobr><b><tt id='l2h-1026' xml:id='l2h-1026' class="function">decimal</tt></b>(</nobr></td> | |
102 | <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt> | |
103 | <dd> | |
104 | Returns the decimal value assigned to the Unicode character | |
105 | <var>unichr</var> as an integer. If no such value is defined, | |
106 | <var>default</var> is returned, or, if not given, | |
107 | <tt class="exception">ValueError</tt> is raised. | |
108 | </dl> | |
109 | ||
110 | <P> | |
111 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
112 | <td><nobr><b><tt id='l2h-1027' xml:id='l2h-1027' class="function">digit</tt></b>(</nobr></td> | |
113 | <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt> | |
114 | <dd> | |
115 | Returns the digit value assigned to the Unicode character | |
116 | <var>unichr</var> as an integer. If no such value is defined, | |
117 | <var>default</var> is returned, or, if not given, | |
118 | <tt class="exception">ValueError</tt> is raised. | |
119 | </dl> | |
120 | ||
121 | <P> | |
122 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
123 | <td><nobr><b><tt id='l2h-1028' xml:id='l2h-1028' class="function">numeric</tt></b>(</nobr></td> | |
124 | <td><var>unichr</var><big>[</big><var>, default</var><big>]</big><var></var>)</td></tr></table></dt> | |
125 | <dd> | |
126 | Returns the numeric value assigned to the Unicode character | |
127 | <var>unichr</var> as a float. If no such value is defined, <var>default</var> is | |
128 | returned, or, if not given, <tt class="exception">ValueError</tt> is raised. | |
129 | </dl> | |
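<P>
The three numeric accessors differ in which characters they accept. The sketch below, assuming Python 3, collects all three values for a character and uses <code>None</code> as the default so that no exception is raised; <code>numeric_properties</code> is a hypothetical helper name, not part of the module.

```python
import unicodedata

def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# DIGIT NINE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF, and a plain letter.
for ch in ("9", "\u00b2", "\u00bd", "A"):
    print("U+%04X" % ord(ch), numeric_properties(ch))
```

An ordinary digit has all three values, a superscript digit has a digit and numeric value but no decimal value, and a fraction has only a numeric value.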
130 | ||
131 | <P> | |
132 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
133 | <td><nobr><b><tt id='l2h-1029' xml:id='l2h-1029' class="function">category</tt></b>(</nobr></td> | |
134 | <td><var>unichr</var>)</td></tr></table></dt> | |
135 | <dd> | |
136 | Returns the general category assigned to the Unicode character | |
137 | <var>unichr</var> as a string. | |
138 | </dl> | |
139 | ||
140 | <P> | |
141 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
142 | <td><nobr><b><tt id='l2h-1030' xml:id='l2h-1030' class="function">bidirectional</tt></b>(</nobr></td> | |
143 | <td><var>unichr</var>)</td></tr></table></dt> | |
144 | <dd> | |
145 | Returns the bidirectional category assigned to the Unicode character | |
146 | <var>unichr</var> as a string. If no such value is defined, an empty string | |
147 | is returned. | |
148 | </dl> | |
149 | ||
150 | <P> | |
151 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
152 | <td><nobr><b><tt id='l2h-1031' xml:id='l2h-1031' class="function">combining</tt></b>(</nobr></td> | |
153 | <td><var>unichr</var>)</td></tr></table></dt> | |
154 | <dd> | |
155 | Returns the canonical combining class assigned to the Unicode | |
156 | character <var>unichr</var> as an integer. Returns <code>0</code> if no combining | |
157 | class is defined. | |
158 | </dl> | |
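<P>
To illustrate the three classification functions above side by side, the following sketch (Python 3 assumed) gathers the general category, bidirectional category, and canonical combining class of a character into one tuple. The helper name <code>classify</code> is hypothetical.

```python
import unicodedata

def classify(ch):
    """Return (general category, bidirectional category, combining class)."""
    return (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        unicodedata.combining(ch),
    )

# A letter, a digit, a space, and COMBINING CEDILLA.
for ch in ("A", "1", " ", "\u0327"):
    print("U+%04X" % ord(ch), classify(ch))
```

Only the combining cedilla reports a non-zero combining class; the other characters fall back to <code>0</code>.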
159 | ||
160 | <P> | |
161 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
162 | <td><nobr><b><tt id='l2h-1032' xml:id='l2h-1032' class="function">east_asian_width</tt></b>(</nobr></td> | |
163 | <td><var>unichr</var>)</td></tr></table></dt> | |
164 | <dd> | |
165 | Returns the East Asian width assigned to the Unicode character | |
166 | <var>unichr</var> as a string. | |
167 | ||
168 | <span class="versionnote">New in version 2.4.</span> | |
169 | ||
170 | </dl> | |
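<P>
One possible use of <code>east_asian_width()</code> is estimating how many terminal columns a string occupies. The sketch below assumes Python 3 and makes a deliberate simplification: characters classified as fullwidth (<code>F</code>) or wide (<code>W</code>) count as two columns and everything else as one, ignoring combining marks and ambiguous-width characters. The function <code>display_width</code> is a hypothetical helper.

```python
import unicodedata

def display_width(text):
    """Estimate column width: 2 for fullwidth/wide characters, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
        for ch in text
    )

# LATIN CAPITAL LETTER A, a CJK ideograph, FULLWIDTH LATIN CAPITAL LETTER A.
sample = "A\u4e2d\uff21"
print([unicodedata.east_asian_width(ch) for ch in sample])
print(display_width(sample))
```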
171 | ||
172 | <P> | |
173 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
174 | <td><nobr><b><tt id='l2h-1033' xml:id='l2h-1033' class="function">mirrored</tt></b>(</nobr></td> | |
175 | <td><var>unichr</var>)</td></tr></table></dt> | |
176 | <dd> | |
177 | Returns the mirrored property assigned to the Unicode character | |
178 | <var>unichr</var> as an integer. Returns <code>1</code> if the character has been | |
179 | identified as a "mirrored" character in bidirectional text, | |
180 | <code>0</code> otherwise. | |
181 | </dl> | |
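<P>
As a small illustration, and assuming Python 3, the sketch below filters a string down to the characters whose mirrored property is set. The sample string and the variable name <code>mirrored_chars</code> are arbitrary choices for the example.

```python
import unicodedata

# Brackets and comparison signs are mirrored in right-to-left text;
# ordinary letters are not.
sample = "(a)<b>[c]"
mirrored_chars = [ch for ch in sample if unicodedata.mirrored(ch)]
print(mirrored_chars)
```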
182 | ||
183 | <P> | |
184 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
185 | <td><nobr><b><tt id='l2h-1034' xml:id='l2h-1034' class="function">decomposition</tt></b>(</nobr></td> | |
186 | <td><var>unichr</var>)</td></tr></table></dt> | |
187 | <dd> | |
188 | Returns the character decomposition mapping assigned to the Unicode | |
189 | character <var>unichr</var> as a string. An empty string is returned if | |
190 | no such mapping is defined. | |
191 | </dl> | |
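<P>
The string returned by <code>decomposition()</code> is a space-separated list of hexadecimal code points, optionally preceded by a tag in angle brackets for compatibility mappings. The sketch below, assuming Python 3, splits that string into the tag and the decomposed characters; <code>parse_decomposition</code> is a hypothetical helper, not something the module provides.

```python
import unicodedata

def parse_decomposition(ch):
    """Return (tag, characters) for the decomposition mapping of ch.

    tag is None for canonical mappings or when no mapping exists;
    characters is the empty string when no mapping exists.
    """
    fields = unicodedata.decomposition(ch).split()
    tag = None
    if fields and fields[0].startswith("<"):
        tag = fields.pop(0)
    return tag, "".join(chr(int(field, 16)) for field in fields)

# C WITH CEDILLA (canonical), ROMAN NUMERAL ONE (compatibility), plain A.
for ch in ("\u00c7", "\u2160", "A"):
    print("U+%04X" % ord(ch), repr(unicodedata.decomposition(ch)),
          parse_decomposition(ch))
```

Note that the mapping covers a single step only; fully decomposing a string is the job of <code>normalize()</code>.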
192 | ||
193 | <P> | |
194 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
195 | <td><nobr><b><tt id='l2h-1035' xml:id='l2h-1035' class="function">normalize</tt></b>(</nobr></td> | |
196 | <td><var>form, unistr</var>)</td></tr></table></dt> | |
197 | <dd> | |
198 | ||
199 | <P> | |
200 | Return the normal form <var>form</var> for the Unicode string <var>unistr</var>. | |
201 | Valid values for <var>form</var> are 'NFC', 'NFKC', 'NFD', and 'NFKD'. | |
202 | ||
203 | <P> | |
204 | The Unicode standard defines various normalization forms of a Unicode | |
205 | string, based on the definition of canonical equivalence and | |
206 | compatibility equivalence. In Unicode, several characters can be | |
207 | expressed in various ways. For example, the character U+00C7 (LATIN | |
208 | CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence | |
209 | U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA). | |
210 | ||
211 | <P> | |
212 | For each character, there are two normal forms: normal form C and | |
213 | normal form D. Normal form D (NFD) is also known as canonical | |
214 | decomposition, and translates each character into its decomposed form. | |
215 | Normal form C (NFC) first applies a canonical decomposition, then | |
216 | composes pre-combined characters again. | |
217 | ||
218 | <P> | |
219 | In addition to these two forms, there are two additional normal forms | |
220 | based on compatibility equivalence. In Unicode, certain characters are | |
221 | supported which normally would be unified with other characters. For | |
222 | example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 | |
223 | (LATIN CAPITAL LETTER I). However, it is supported in Unicode for | |
224 | compatibility with existing character sets (e.g. gb2312). | |
225 | ||
226 | <P> | |
227 | The normal form KD (NFKD) will apply the compatibility decomposition, | |
228 | i.e. replace all compatibility characters with their equivalents. The | |
229 | normal form KC (NFKC) first applies the compatibility decomposition, | |
230 | followed by the canonical composition. | |
231 | ||
232 | <P> | |
233 | ||
234 | <span class="versionnote">New in version 2.3.</span> | |
235 | ||
236 | </dl> | |
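<P>
The two examples from the description of <code>normalize()</code>, U+00C7 and U+2160, can be checked directly. The sketch below assumes Python 3; <code>same_text</code> is a hypothetical helper that compares two strings after normalizing both to the same form.

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "\u00c7"       # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = "C\u0327"    # LATIN CAPITAL LETTER C + COMBINING CEDILLA
roman_one = "\u2160"      # ROMAN NUMERAL ONE

# Canonical forms: NFD splits the character, NFC puts it back together.
print(unicodedata.normalize("NFD", composed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)

# The canonical forms leave compatibility characters alone; the K forms
# replace them with their equivalents.
print(same_text(roman_one, "I", "NFC"))
print(same_text(roman_one, "I", "NFKC"))
```

Comparing strings without normalizing them first treats the composed and decomposed spellings as different, which is why a helper of this kind is often applied before equality tests or dictionary lookups.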
237 | ||
238 | <P> | |
239 | In addition, the module exposes the following constant: | |
240 | ||
241 | <P> | |
242 | <dl><dt><b><tt id='l2h-1036' xml:id='l2h-1036'>unidata_version</tt></b></dt> | |
243 | <dd> | |
244 | The version of the Unicode database used in this module. | |
245 | ||
246 | <P> | |
247 | ||
248 | <span class="versionnote">New in version 2.3.</span> | |
249 | ||
250 | </dd></dl> | |
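<P>
Because character properties can change between database versions, code that depends on a particular property may want to check <code>unidata_version</code> first. The sketch below assumes Python 3 and assumes the constant is a dotted string of integers such as the 3.2.0 mentioned above; <code>unicode_version</code> is a hypothetical helper.

```python
import unicodedata

def unicode_version():
    """Return unidata_version as a tuple of integers for easy comparison."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))

print(unicodedata.unidata_version)
print(unicode_version() >= (3, 2, 0))
```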
251 | ||
252 | <DIV CLASS="navigation"> | |
285 | <hr /> | |
286 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> | |
287 | </DIV> | |
288 | <!--End of Navigation Panel--> | |
289 | <ADDRESS> | |
290 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. | |
291 | </ADDRESS> | |
292 | </BODY> | |
293 | </HTML> |