Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
2 | <html> | |
3 | <head> | |
4 | <link rel="STYLESHEET" href="tut.css" type='text/css' /> | |
5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> | |
6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> | |
7 | <link rel="first" href="tut.html" title='Python Tutorial' /> | |
8 | <link rel='contents' href='node2.html' title="Contents" /> | |
9 | <link rel='index' href='node19.html' title='Index' /> | |
10 | <link rel='last' href='about.html' title='About this document...' /> | |
11 | <link rel='help' href='about.html' title='About this document...' /> | |
12 | <link rel="next" href="node6.html" /> | |
13 | <link rel="prev" href="node4.html" /> | |
14 | <link rel="parent" href="tut.html" /> | |
16 | <meta name='aesop' content='information' /> | |
17 | <title>3. An Informal Introduction to Python </title> | |
18 | </head> | |
19 | <body> | |
20 | <DIV CLASS="navigation"> | |
21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> | |
22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> | |
23 | <tr> | |
24 | <td class='online-navigation'><a rel="prev" title="2. Using the Python" | |
25 | href="node4.html"><img src='../icons/previous.png' | |
26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> | |
27 | <td class='online-navigation'><a rel="parent" title="Python Tutorial" | |
28 | href="tut.html"><img src='../icons/up.png' | |
29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> | |
30 | <td class='online-navigation'><a rel="next" title="4. More Control Flow" | |
31 | href="node6.html"><img src='../icons/next.png' | |
32 | border='0' height='32' alt='Next Page' width='32' /></A></td> | |
33 | <td align="center" width="100%">Python Tutorial</td> | |
34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" | |
35 | href="node2.html"><img src='../icons/contents.png' | |
36 | border='0' height='32' alt='Contents' width='32' /></A></td> | |
37 | <td class='online-navigation'><img src='../icons/blank.png' | |
38 | border='0' height='32' alt='' width='32' /></td> | |
39 | <td class='online-navigation'><a rel="index" title="Index" | |
40 | href="node19.html"><img src='../icons/index.png' | |
41 | border='0' height='32' alt='Index' width='32' /></A></td> | |
42 | </tr></table> | |
43 | <div class='online-navigation'> | |
44 | <b class="navlabel">Previous:</b> | |
45 | <a class="sectref" rel="prev" href="node4.html">2. Using the Python</A> | |
46 | <b class="navlabel">Up:</b> | |
47 | <a class="sectref" rel="parent" href="tut.html">Python Tutorial</A> | |
48 | <b class="navlabel">Next:</b> | |
49 | <a class="sectref" rel="next" href="node6.html">4. More Control Flow</A> | |
50 | </div> | |
51 | <hr /></div> | |
52 | </DIV> | |
53 | <!--End of Navigation Panel--> | |
54 | <div class='online-navigation'> | |
55 | <!--Table of Child-Links--> | |
56 | <A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a> | |
57 | ||
58 | <UL CLASS="ChildLinks"> | |
59 | <LI><A href="node5.html#SECTION005100000000000000000">3.1 Using Python as a Calculator</a> | |
60 | <UL> | |
61 | <LI><A href="node5.html#SECTION005110000000000000000">3.1.1 Numbers</a> | |
62 | <LI><A href="node5.html#SECTION005120000000000000000">3.1.2 Strings</a> | |
63 | <LI><A href="node5.html#SECTION005130000000000000000">3.1.3 Unicode Strings</a> | |
64 | <LI><A href="node5.html#SECTION005140000000000000000">3.1.4 Lists</a> | |
65 | </ul> | |
66 | <LI><A href="node5.html#SECTION005200000000000000000">3.2 First Steps Towards Programming</a> | |
67 | </ul> | |
68 | <!--End of Table of Child-Links--> | |
69 | </div> | |
70 | <HR> | |
71 | ||
72 | <H1><A NAME="SECTION005000000000000000000"></A><A NAME="informal"></A> | |
73 | <BR> | |
74 | 3. An Informal Introduction to Python | |
75 | </H1> | |
76 | ||
77 | <P> | |
78 | In the following examples, input and output are distinguished by the | |
79 | presence or absence of prompts ("<tt class="samp">><code>></code>> </tt>" and "<tt class="samp">... </tt>"): to repeat | |
80 | the example, you must type everything after the prompt, when the | |
81 | prompt appears; lines that do not begin with a prompt are output from | |
82 | the interpreter. Note that a secondary prompt on a line by itself in an example means | |
83 | you must type a blank line; this is used to end a multi-line command. | |
84 | ||
85 | <P> | |
86 | Many of the examples in this manual, even those entered at the | |
87 | interactive prompt, include comments. Comments in Python start with | |
88 | the hash character, "<tt class="character">#</tt>", and extend to the end of the | |
89 | physical line. A comment may appear at the start of a line or | |
90 | following whitespace or code, but not within a string literal. A hash | |
91 | character within a string literal is just a hash character. | |
92 | ||
93 | <P> | |
94 | Some examples: | |
95 | ||
96 | <P> | |
97 | <div class="verbatim"><pre> | |
98 | # this is the first comment | |
99 | SPAM = 1 # and this is the second comment | |
100 | # ... and now a third! | |
101 | STRING = "# This is not a comment." | |
102 | </pre></div> | |
103 | ||
104 | <P> | |
105 | ||
106 | <H1><A NAME="SECTION005100000000000000000"></A><A NAME="calculator"></A> | |
107 | <BR> | |
108 | 3.1 Using Python as a Calculator | |
109 | </H1> | |
110 | ||
111 | <P> | |
112 | Let's try some simple Python commands. Start the interpreter and wait | |
113 | for the primary prompt, "<tt class="samp">><code>></code>> </tt>". (It shouldn't take long.) | |
114 | ||
115 | <P> | |
116 | ||
117 | <H2><A NAME="SECTION005110000000000000000"></A><A NAME="numbers"></A> | |
118 | <BR> | |
119 | 3.1.1 Numbers | |
120 | </H2> | |
121 | ||
122 | <P> | |
123 | The interpreter acts as a simple calculator: you can type an | |
124 | expression at it and it will write the value. Expression syntax is | |
125 | straightforward: the operators <code>+</code>, <code>-</code>, <code>*</code> and | |
126 | <code>/</code> work just like in most other languages (for example, Pascal | |
127 | or C); parentheses can be used for grouping. For example: | |
128 | ||
129 | <P> | |
130 | <div class="verbatim"><pre> | |
131 | >>> 2+2 | |
132 | 4 | |
133 | >>> # This is a comment | |
134 | ... 2+2 | |
135 | 4 | |
136 | >>> 2+2 # and a comment on the same line as code | |
137 | 4 | |
138 | >>> (50-5*6)/4 | |
139 | 5 | |
140 | >>> # Integer division returns the floor: | |
141 | ... 7/3 | |
142 | 2 | |
143 | >>> 7/-3 | |
144 | -3 | |
145 | </pre></div> | |
146 | ||
147 | <P> | |
148 | The equal sign ("<tt class="character">=</tt>") is used to assign a value to a variable. | |
149 | Afterwards, no result is displayed before the next interactive prompt: | |
150 | ||
151 | <P> | |
152 | <div class="verbatim"><pre> | |
153 | >>> width = 20 | |
154 | >>> height = 5*9 | |
155 | >>> width * height | |
156 | 900 | |
157 | </pre></div> | |
158 | ||
159 | <P> | |
160 | A value can be assigned to several variables simultaneously: | |
161 | ||
162 | <P> | |
163 | <div class="verbatim"><pre> | |
164 | >>> x = y = z = 0 # Zero x, y and z | |
165 | >>> x | |
166 | 0 | |
167 | >>> y | |
168 | 0 | |
169 | >>> z | |
170 | 0 | |
171 | </pre></div> | |
172 | ||
173 | <P> | |
174 | There is full support for floating point; operators with mixed type | |
175 | operands convert the integer operand to floating point: | |
176 | ||
177 | <P> | |
178 | <div class="verbatim"><pre> | |
179 | >>> 3 * 3.75 / 1.5 | |
180 | 7.5 | |
181 | >>> 7.0 / 2 | |
182 | 3.5 | |
183 | </pre></div> | |
184 | ||
185 | <P> | |
186 | Complex numbers are also supported; imaginary numbers are written with | |
187 | a suffix of "<tt class="samp">j</tt>" or "<tt class="samp">J</tt>". Complex numbers with a nonzero | |
188 | real component are written as "<tt class="samp">(<var>real</var>+<var>imag</var>j)</tt>", or can | |
189 | be created with the "<tt class="samp">complex(<var>real</var>, <var>imag</var>)</tt>" function. | |
190 | ||
191 | <P> | |
192 | <div class="verbatim"><pre> | |
193 | >>> 1j * 1J | |
194 | (-1+0j) | |
195 | >>> 1j * complex(0,1) | |
196 | (-1+0j) | |
197 | >>> 3+1j*3 | |
198 | (3+3j) | |
199 | >>> (3+1j)*3 | |
200 | (9+3j) | |
201 | >>> (1+2j)/(1+1j) | |
202 | (1.5+0.5j) | |
203 | </pre></div> | |
204 | ||
205 | <P> | |
206 | Complex numbers are always represented as two floating point numbers, | |
207 | the real and imaginary part. To extract these parts from a complex | |
208 | number <var>z</var>, use <code><var>z</var>.real</code> and <code><var>z</var>.imag</code>. | |
209 | ||
210 | <P> | |
211 | <div class="verbatim"><pre> | |
212 | >>> a=1.5+0.5j | |
213 | >>> a.real | |
214 | 1.5 | |
215 | >>> a.imag | |
216 | 0.5 | |
217 | </pre></div> | |
218 | ||
219 | <P> | |
220 | The conversion functions to floating point and integer | |
221 | (<tt class="function">float()</tt>, <tt class="function">int()</tt> and <tt class="function">long()</tt>) don't | |
222 | work for complex numbers -- there is no one correct way to convert a | |
223 | complex number to a real number. Use <code>abs(<var>z</var>)</code> to get its | |
224 | magnitude (as a float) or <code>z.real</code> to get its real part. | |
225 | ||
226 | <P> | |
227 | <div class="verbatim"><pre> | |
228 | >>> a=3.0+4.0j | |
229 | >>> float(a) | |
230 | Traceback (most recent call last): | |
231 | File "<stdin>", line 1, in ? | |
232 | TypeError: can't convert complex to float; use abs(z) | |
233 | >>> a.real | |
234 | 3.0 | |
235 | >>> a.imag | |
236 | 4.0 | |
237 | >>> abs(a) # sqrt(a.real**2 + a.imag**2) | |
238 | 5.0 | |
239 | >>> | |
240 | </pre></div> | |
241 | ||
242 | <P> | |
243 | In interactive mode, the last printed expression is assigned to the | |
244 | variable <code>_</code>. This means that when you are using Python as a | |
245 | desk calculator, it is somewhat easier to continue calculations, for | |
246 | example: | |
247 | ||
248 | <P> | |
249 | <div class="verbatim"><pre> | |
250 | >>> tax = 12.5 / 100 | |
251 | >>> price = 100.50 | |
252 | >>> price * tax | |
253 | 12.5625 | |
254 | >>> price + _ | |
255 | 113.0625 | |
256 | >>> round(_, 2) | |
257 | 113.06 | |
258 | >>> | |
259 | </pre></div> | |
260 | ||
261 | <P> | |
262 | This variable should be treated as read-only by the user. Don't | |
263 | explicitly assign a value to it -- you would create an independent | |
264 | local variable with the same name masking the built-in variable with | |
265 | its magic behavior. | |
266 | ||
267 | <P> | |
268 | ||
269 | <H2><A NAME="SECTION005120000000000000000"></A><A NAME="strings"></A> | |
270 | <BR> | |
271 | 3.1.2 Strings | |
272 | </H2> | |
273 | ||
274 | <P> | |
275 | Besides numbers, Python can also manipulate strings, which can be | |
276 | expressed in several ways. They can be enclosed in single quotes or | |
277 | double quotes: | |
278 | ||
279 | <P> | |
280 | <div class="verbatim"><pre> | |
281 | >>> 'spam eggs' | |
282 | 'spam eggs' | |
283 | >>> 'doesn\'t' | |
284 | "doesn't" | |
285 | >>> "doesn't" | |
286 | "doesn't" | |
287 | >>> '"Yes," he said.' | |
288 | '"Yes," he said.' | |
289 | >>> "\"Yes,\" he said." | |
290 | '"Yes," he said.' | |
291 | >>> '"Isn\'t," she said.' | |
292 | '"Isn\'t," she said.' | |
293 | </pre></div> | |
294 | ||
295 | <P> | |
296 | String literals can span multiple lines in several ways. Continuation | |
297 | lines can be used, with a backslash as the last character on the line | |
298 | indicating that the next line is a logical continuation of the line: | |
299 | ||
300 | <P> | |
301 | <div class="verbatim"><pre> | |
302 | hello = "This is a rather long string containing\n\ | |
303 | several lines of text just as you would do in C.\n\ | |
304 | Note that whitespace at the beginning of the line is\ | |
305 | significant." | |
306 | ||
307 | print hello | |
308 | </pre></div> | |
309 | ||
310 | <P> | |
311 | Note that newlines still need to be embedded in the string using | |
312 | <code>\n</code>; the newline following the trailing backslash is | |
313 | discarded. This example would print the following: | |
314 | ||
315 | <P> | |
316 | <div class="verbatim"><pre> | |
317 | This is a rather long string containing | |
318 | several lines of text just as you would do in C. | |
319 | Note that whitespace at the beginning of the line is significant. | |
320 | </pre></div> | |
321 | ||
322 | <P> | |
323 | If we make the string literal a ``raw'' string, however, the | |
324 | <code>\n</code> sequences are not converted to newlines, but the backslash | |
325 | at the end of the line, and the newline character in the source, are | |
326 | both included in the string as data. Thus, the example: | |
327 | ||
328 | <P> | |
329 | <div class="verbatim"><pre> | |
330 | hello = r"This is a rather long string containing\n\ | |
331 | several lines of text much as you would do in C." | |
332 | ||
333 | print hello | |
334 | </pre></div> | |
335 | ||
336 | <P> | |
337 | would print: | |
338 | ||
339 | <P> | |
340 | <div class="verbatim"><pre> | |
341 | This is a rather long string containing\n\ | |
342 | several lines of text much as you would do in C. | |
343 | </pre></div> | |
344 | ||
345 | <P> | |
346 | Or, strings can be surrounded by a pair of matching triple-quotes: | |
347 | <code>"""</code> or <code>'''</code>. Line ends do not need to be escaped | |
348 | when using triple-quotes, but they will be included in the string. | |
349 | ||
350 | <P> | |
351 | <div class="verbatim"><pre> | |
352 | print """ | |
353 | Usage: thingy [OPTIONS] | |
354 | -h Display this usage message | |
355 | -H hostname Hostname to connect to | |
356 | """ | |
357 | </pre></div> | |
358 | ||
359 | <P> | |
360 | produces the following output: | |
361 | ||
362 | <P> | |
363 | <div class="verbatim"><pre> | |
364 | Usage: thingy [OPTIONS] | |
365 | -h Display this usage message | |
366 | -H hostname Hostname to connect to | |
367 | </pre></div> | |
368 | ||
369 | <P> | |
370 | The interpreter prints the result of string operations in the same way | |
371 | as they are typed for input: inside quotes, and with quotes and other | |
372 | funny characters escaped by backslashes, to show the precise | |
373 | value. The string is enclosed in double quotes if the string contains | |
374 | a single quote and no double quotes, else it's enclosed in single | |
375 | quotes. (The <tt class="keyword">print</tt> statement, described later, can be used | |
376 | to write strings without quotes or escapes.) | |
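<P>
As a rough illustration of the difference described above, the built-in
<tt class="function">repr()</tt> returns the quoted, escaped form that the interpreter echoes,
while printing the string writes its characters directly. This is only a sketch: the
names <code>samples</code> and <code>echoed</code> are made up for the example, and the calls are
written with parentheses so that the same lines run on later Python versions as well.

```python
# Sketch: repr() gives the quoted, escaped form that the interactive
# interpreter echoes; printing the string itself writes the raw characters.
samples = ["doesn't", '"Yes," he said.', 'two\nlines']

# Quoted forms, chosen by the rule above: double quotes only when the
# string contains a single quote and no double quotes.
echoed = [repr(s) for s in samples]

for s, r in zip(samples, echoed):
    print(r)   # quoted and escaped, as the interpreter would show it
    print(s)   # plain text, without quotes or escapes
```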
377 | ||
378 | <P> | |
379 | Strings can be concatenated (glued together) with the | |
380 | <code>+</code> operator, and repeated with <code>*</code>: | |
381 | ||
382 | <P> | |
383 | <div class="verbatim"><pre> | |
384 | >>> word = 'Help' + 'A' | |
385 | >>> word | |
386 | 'HelpA' | |
387 | >>> '<' + word*5 + '>' | |
388 | '<HelpAHelpAHelpAHelpAHelpA>' | |
389 | </pre></div> | |
390 | ||
391 | <P> | |
392 | Two string literals next to each other are automatically concatenated; | |
393 | the first line above could also have been written "<tt class="samp">word = 'Help' | |
394 | 'A'</tt>"; this only works with two literals, not with arbitrary string | |
395 | expressions: | |
396 | ||
397 | <P> | |
398 | <div class="verbatim"><pre> | |
399 | >>> 'str' 'ing' # <- This is ok | |
400 | 'string' | |
401 | >>> 'str'.strip() + 'ing' # <- This is ok | |
402 | 'string' | |
403 | >>> 'str'.strip() 'ing' # <- This is invalid | |
404 | File "<stdin>", line 1, in ? | |
405 | 'str'.strip() 'ing' | |
406 | ^ | |
407 | SyntaxError: invalid syntax | |
408 | </pre></div> | |
409 | ||
410 | <P> | |
411 | Strings can be subscripted (indexed); as in C, the first character | |
412 | of a string has subscript (index) 0. There is no separate character | |
413 | type; a character is simply a string of size one. As in Icon, | |
414 | substrings can be specified with the <em>slice notation</em>: two indices | |
415 | separated by a colon. | |
416 | ||
417 | <P> | |
418 | <div class="verbatim"><pre> | |
419 | >>> word[4] | |
420 | 'A' | |
421 | >>> word[0:2] | |
422 | 'He' | |
423 | >>> word[2:4] | |
424 | 'lp' | |
425 | </pre></div> | |
426 | ||
427 | <P> | |
428 | Slice indices have useful defaults; an omitted first index defaults to | |
429 | zero, an omitted second index defaults to the size of the string being | |
430 | sliced. | |
431 | ||
432 | <P> | |
433 | <div class="verbatim"><pre> | |
434 | >>> word[:2] # The first two characters | |
435 | 'He' | |
436 | >>> word[2:] # Everything except the first two characters | |
437 | 'lpA' | |
438 | </pre></div> | |
439 | ||
440 | <P> | |
441 | Unlike C strings, Python strings cannot be changed. Assigning to an | |
442 | indexed position in the string results in an error: | |
443 | ||
444 | <P> | |
445 | <div class="verbatim"><pre> | |
446 | >>> word[0] = 'x' | |
447 | Traceback (most recent call last): | |
448 | File "<stdin>", line 1, in ? | |
449 | TypeError: object doesn't support item assignment | |
450 | >>> word[:1] = 'Splat' | |
451 | Traceback (most recent call last): | |
452 | File "<stdin>", line 1, in ? | |
453 | TypeError: object doesn't support slice assignment | |
454 | </pre></div> | |
455 | ||
456 | <P> | |
457 | However, creating a new string with the combined content is easy and | |
458 | efficient: | |
459 | ||
460 | <P> | |
461 | <div class="verbatim"><pre> | |
462 | >>> 'x' + word[1:] | |
463 | 'xelpA' | |
464 | >>> 'Splat' + word[4] | |
465 | 'SplatA' | |
466 | </pre></div> | |
467 | ||
468 | <P> | |
469 | Here's a useful invariant of slice operations: | |
470 | <code>s[:i] + s[i:]</code> equals <code>s</code>. | |
471 | ||
472 | <P> | |
473 | <div class="verbatim"><pre> | |
474 | >>> word[:2] + word[2:] | |
475 | 'HelpA' | |
476 | >>> word[:3] + word[3:] | |
477 | 'HelpA' | |
478 | </pre></div> | |
479 | ||
480 | <P> | |
481 | Degenerate slice indices are handled gracefully: an index that is too | |
482 | large is replaced by the string size, an upper bound smaller than the | |
483 | lower bound returns an empty string. | |
484 | ||
485 | <P> | |
486 | <div class="verbatim"><pre> | |
487 | >>> word[1:100] | |
488 | 'elpA' | |
489 | >>> word[10:] | |
490 | '' | |
491 | >>> word[2:1] | |
492 | '' | |
493 | </pre></div> | |
494 | ||
495 | <P> | |
496 | Indices may be negative numbers, to start counting from the right. | |
497 | For example: | |
498 | ||
499 | <P> | |
500 | <div class="verbatim"><pre> | |
501 | >>> word[-1] # The last character | |
502 | 'A' | |
503 | >>> word[-2] # The last-but-one character | |
504 | 'p' | |
505 | >>> word[-2:] # The last two characters | |
506 | 'pA' | |
507 | >>> word[:-2] # Everything except the last two characters | |
508 | 'Hel' | |
509 | </pre></div> | |
510 | ||
511 | <P> | |
512 | But note that -0 is really the same as 0, so it does not count from | |
513 | the right! | |
514 | ||
515 | <P> | |
516 | <div class="verbatim"><pre> | |
517 | >>> word[-0] # (since -0 equals 0) | |
518 | 'H' | |
519 | </pre></div> | |
520 | ||
521 | <P> | |
522 | Out-of-range negative slice indices are truncated, but don't try this | |
523 | for single-element (non-slice) indices: | |
524 | ||
525 | <P> | |
526 | <div class="verbatim"><pre> | |
527 | >>> word[-100:] | |
528 | 'HelpA' | |
529 | >>> word[-10] # error | |
530 | Traceback (most recent call last): | |
531 | File "<stdin>", line 1, in ? | |
532 | IndexError: string index out of range | |
533 | </pre></div> | |
534 | ||
535 | <P> | |
536 | The best way to remember how slices work is to think of the indices as | |
537 | pointing <em>between</em> characters, with the left edge of the first | |
538 | character numbered 0. Then the right edge of the last character of a | |
539 | string of <var>n</var> characters has index <var>n</var>, for example: | |
540 | ||
541 | <P> | |
542 | <div class="verbatim"><pre> | |
543 | +---+---+---+---+---+ | |
544 | | H | e | l | p | A | | |
545 | +---+---+---+---+---+ | |
546 | 0 1 2 3 4 5 | |
547 | -5 -4 -3 -2 -1 | |
548 | </pre></div> | |
549 | ||
550 | <P> | |
551 | The first row of numbers gives the position of the indices 0...5 in | |
552 | the string; the second row gives the corresponding negative indices. | |
553 | The slice from <var>i</var> to <var>j</var> consists of all characters between | |
554 | the edges labeled <var>i</var> and <var>j</var>, respectively. | |
555 | ||
556 | <P> | |
557 | For non-negative indices, the length of a slice is the difference of | |
558 | the indices, if both are within bounds. For example, the length of | |
559 | <code>word[1:3]</code> is 2. | |
560 | ||
561 | <P> | |
562 | The built-in function <tt class="function">len()</tt> returns the length of a string: | |
563 | ||
564 | <P> | |
565 | <div class="verbatim"><pre> | |
566 | >>> s = 'supercalifragilisticexpialidocious' | |
567 | >>> len(s) | |
568 | 34 | |
569 | </pre></div> | |
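<P>
With <tt class="function">len()</tt> available, the slice-length rule and the
"between characters" picture given earlier can be checked mechanically. The following
is a minimal sketch, not part of the original text; the names <code>lengths_ok</code>
and <code>same_edge</code> are hypothetical, and plain loops are used so that it does not
depend on newer built-ins.

```python
word = 'HelpA'
n = len(word)

# For in-bounds, non-negative indices i <= j, the slice word[i:j]
# contains exactly j - i characters.
lengths_ok = True
for i in range(n + 1):
    for j in range(i, n + 1):
        if len(word[i:j]) != j - i:
            lengths_ok = False

# A negative index -k labels the same edge as the index n - k.
same_edge = True
for k in range(1, n + 1):
    if word[-k:] != word[n - k:]:
        same_edge = False

print(lengths_ok)
print(same_edge)
```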
570 | ||
571 | <P> | |
572 | <div class="seealso"> | |
573 | <p class="heading">See Also:</p> | |
574 | ||
575 | <dl compact="compact" class="seetitle"> | |
576 | <dt><em class="citetitle"><a href="../lib/typesseq.html" | |
577 | >Sequence Types</a></em></dt> | |
578 | <dd>Strings, and the Unicode strings described in the next | |
579 | section, are examples of <em>sequence types</em>, and | |
580 | support the common operations supported by such types.</dd> | |
581 | </dl> | |
582 | <dl compact="compact" class="seetitle"> | |
583 | <dt><em class="citetitle"><a href="../lib/string-methods.html" | |
584 | >String Methods</a></em></dt> | |
585 | <dd>Both strings and Unicode strings support a large number of | |
586 | methods for basic transformations and searching.</dd> | |
587 | </dl> | |
588 | <dl compact="compact" class="seetitle"> | |
589 | <dt><em class="citetitle"><a href="../lib/typesseq-strings.html" | |
590 | >String Formatting Operations</a></em></dt> | |
591 | <dd>The formatting operations invoked when strings and Unicode | |
592 | strings are the left operand of the <code>%</code> operator are | |
593 | described in more detail here.</dd> | |
594 | </dl> | |
595 | </div> | |
596 | ||
597 | <P> | |
598 | ||
599 | <H2><A NAME="SECTION005130000000000000000"></A><A NAME="unicodeStrings"></A> | |
600 | <BR> | |
601 | 3.1.3 Unicode Strings | |
602 | </H2> | |
603 | ||
604 | <P> | |
605 | Starting with Python 2.0 a new data type for storing text data is | |
606 | available to the programmer: the Unicode object. It can be used to | |
607 | store and manipulate Unicode data (see <a class="url" href="http://www.unicode.org/">http://www.unicode.org/</a>) | |
608 | and integrates well with the existing string objects, providing | |
609 | auto-conversions where necessary. | |
610 | ||
611 | <P> | |
612 | Unicode has the advantage of providing one ordinal for every character | |
613 | in every script used in modern and ancient texts. Previously, there | |
614 | were only 256 possible ordinals for script characters and texts were | |
615 | typically bound to a code page which mapped the ordinals to script | |
616 | characters. This lead to very much confusion especially with respect | |
617 | to internationalization (usually written as "<tt class="samp">i18n</tt>" -- | |
618 | "<tt class="character">i</tt>" + 18 characters + "<tt class="character">n</tt>") of software. Unicode | |
619 | solves these problems by defining one code page for all scripts. | |
620 | ||
621 | <P> | |
622 | Creating Unicode strings in Python is just as simple as creating | |
623 | normal strings: | |
624 | ||
625 | <P> | |
626 | <div class="verbatim"><pre> | |
627 | >>> u'Hello World !' | |
628 | u'Hello World !' | |
629 | </pre></div> | |
630 | ||
631 | <P> | |
632 | The small "<tt class="character">u</tt>" in front of the quote indicates that a | |
633 | Unicode string is to be created. If you want to include | |
634 | special characters in the string, you can do so by using the Python | |
635 | <em>Unicode-Escape</em> encoding. The following example shows how: | |
636 | ||
637 | <P> | |
638 | <div class="verbatim"><pre> | |
639 | >>> u'Hello\u0020World !' | |
640 | u'Hello World !' | |
641 | </pre></div> | |
642 | ||
643 | <P> | |
644 | The escape sequence <code>\u0020</code> inserts the Unicode | |
645 | character with the ordinal value 0x0020 (the space character) at the | |
646 | given position. | |
647 | ||
648 | <P> | |
649 | Other characters are interpreted by using their respective ordinal | |
650 | values directly as Unicode ordinals. If you have literal strings | |
651 | in the standard Latin-1 encoding that is used in many Western countries, | |
652 | you will find it convenient that the lower 256 characters | |
653 | of Unicode are the same as the 256 characters of Latin-1. | |
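<P>
The correspondence between Latin-1 and the first 256 Unicode ordinals can be observed
directly. This is a small sketch under stated assumptions: the three sample characters
and the variable names are chosen only for illustration, and the byte values are read
through one-element slices so that the code behaves the same on later Python versions.

```python
# Sketch: each Latin-1 byte value equals the Unicode ordinal of the
# character it encodes.
text = u'\u00e4\u00f6\u00fc'            # a, o and u with diaeresis

ordinals = [ord(ch) for ch in text]      # Unicode ordinals
encoded = text.encode('latin-1')         # one byte per character

# Read each byte through a one-element slice, then take its value.
byte_values = [ord(encoded[i:i + 1]) for i in range(len(encoded))]

round_trip = encoded.decode('latin-1')   # back to a Unicode string

print(ordinals)
print(byte_values)
```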
654 | ||
655 | <P> | |
656 | For experts, there is also a raw mode just like the one for normal | |
657 | strings. You have to prefix the opening quote with 'ur' to have | |
658 | Python use the <em>Raw-Unicode-Escape</em> encoding. It will only apply | |
659 | the above <code>\uXXXX</code> conversion if there is an odd number of | |
660 | backslashes in front of the small 'u'. | |
661 | ||
662 | <P> | |
663 | <div class="verbatim"><pre> | |
664 | >>> ur'Hello\u0020World !' | |
665 | u'Hello World !' | |
666 | >>> ur'Hello\\u0020World !' | |
667 | u'Hello\\\\u0020World !' | |
668 | </pre></div> | |
669 | ||
670 | <P> | |
671 | The raw mode is most useful when you have to enter lots of | |
672 | backslashes, as can be necessary in regular expressions. | |
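<P>
To make the point about regular expressions concrete, here is a hedged sketch using the
standard <tt class="module">re</tt> module. It uses an ordinary raw string rather than the
<code>ur</code> form, because that prefix is not accepted by every Python version; the
pattern and the sample text are invented for the example.

```python
import re

# Without the raw prefix, every backslash meant for the regular
# expression engine has to be doubled in the string literal.
plain = '\\d+\\.\\d+'
raw = r'\d+\.\d+'

# Both literals denote the same pattern text.
same_pattern = (plain == raw)

# Match a decimal number at the start of the sample text.
match = re.match(raw, '3.14 is close to pi')
matched_text = match.group(0)

print(same_pattern)
print(matched_text)
```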
673 | ||
674 | <P> | |
675 | Apart from these standard encodings, Python provides a whole set of | |
676 | other ways of creating Unicode strings on the basis of a known | |
677 | encoding. | |
678 | ||
679 | <P> | |
680 | The built-in function <tt class="function">unicode()</tt><a id='l2h-3' xml:id='l2h-3'></a> provides | |
681 | access to all registered Unicode codecs (COders and DECoders). Some of | |
682 | the better-known encodings which these codecs can convert are | |
683 | <em>Latin-1</em>, <em>ASCII</em>, <em>UTF-8</em>, and <em>UTF-16</em>. | |
684 | The latter two are variable-length encodings that store each Unicode | |
685 | character in one or more bytes. The default encoding is | |
686 | normally set to ASCII, which passes through characters in the range | |
687 | 0 to 127 and rejects any other characters with an error. | |
688 | When a Unicode string is printed, written to a file, or converted | |
689 | with <tt class="function">str()</tt>, conversion takes place using this default encoding. | |
690 | ||
691 | <P> | |
692 | <div class="verbatim"><pre> | |
693 | >>> u"abc" | |
694 | u'abc' | |
695 | >>> str(u"abc") | |
696 | 'abc' | |
697 |