<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="matching-searching.html" />
<link rel="prev" href="module-re.html" />
<link rel="parent" href="module-re.html" />
<meta name='aesop' content='information' />
<title>4.2.1 Regular Expression Syntax </title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="4.2 re "
 href="module-re.html"><img src='../icons/previous.png'
 border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="4.2 re "
 href="module-re.html"><img src='../icons/up.png'
 border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="4.2.2 Matching vs Searching"
 href="matching-searching.html"><img src='../icons/next.png'
 border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
 href="contents.html"><img src='../icons/contents.png'
 border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
 border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
 href="genindex.html"><img src='../icons/index.png'
 border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="module-re.html">4.2 re </A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="module-re.html">4.2 re </A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="matching-searching.html">4.2.2 Matching vs Searching</A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H2><A NAME="SECTION006210000000000000000"></A><A NAME="re-syntax"></A>
<BR>
4.2.1 Regular Expression Syntax
</H2>

<P>
A regular expression (or RE) specifies a set of strings that matches
it; the functions in this module let you check if a particular string
matches a given regular expression (or if a given regular expression
matches a particular string, which comes down to the same thing).

<P>
Regular expressions can be concatenated to form new regular
expressions; if <em>A</em> and <em>B</em> are both regular expressions,
then <em>AB</em> is also a regular expression. In general, if a string
<em>p</em> matches <em>A</em> and another string <em>q</em> matches <em>B</em>,
the string <em>pq</em> will match <em>AB</em>. This holds unless <em>A</em> or
<em>B</em> contains low-precedence operations, boundary conditions between
<em>A</em> and <em>B</em>, or numbered group references. Thus, complex
expressions can easily be constructed from simpler primitive
expressions like the ones described here. For details of the theory
and implementation of regular expressions, consult the Friedl book
referenced above, or almost any textbook about compiler construction.
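
<P>
The concatenation rule and one of its exceptions can be sketched as
follows. This is only an illustration; the helper
<code>matches_whole</code> and the sample patterns are hypothetical names
introduced here, not part of the <code>re</code> module.

```python
import re

def matches_whole(pattern, string):
    """Hypothetical helper: true if pattern matches all of string."""
    return re.match('(?:' + pattern + r')\Z', string) is not None

A, B = r'ab+', r'c?d'
p, q = 'abb', 'd'

# p matches A and q matches B, so pq matches AB.
simple_case = (matches_whole(A, p), matches_whole(B, q),
               matches_whole(A + B, p + q))

# With a low-precedence operator the rule can fail: 'x|y' + 'z' is parsed
# as x|(yz), so 'xz' is no longer matched as a whole.
naive = matches_whole(r'x|y' + r'z', 'xz')

# Wrapping the first RE in a non-capturing group restores the expected result.
grouped = matches_whole(r'(?:x|y)' + r'z', 'xz')
```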

<P>
A brief explanation of the format of regular expressions follows. For
further information and a gentler presentation, consult the Regular
Expression HOWTO, accessible from <a class="url" href="http://www.python.org/doc/howto/">http://www.python.org/doc/howto/</a>.

<P>
Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "<tt class="character">A</tt>", "<tt class="character">a</tt>", or
"<tt class="character">0</tt>", are the simplest regular expressions; they simply match
themselves. You can concatenate ordinary characters, so <tt class="regexp">last</tt>
matches the string <code>'last'</code>. (In the rest of this section, we'll
write REs in <tt class="regexp">this special style</tt>, usually without quotes, and
strings to be matched <code>'in single quotes'</code>.)

<P>
Some characters, like "<tt class="character">|</tt>" or "<tt class="character">(</tt>", are special.
Special characters either stand for classes of ordinary characters, or
affect how the regular expressions around them are interpreted.

<P>
The special characters are:
<DL>
<DT><STRONG>"<tt class="character">.</tt>"</STRONG></DT>
<DD>(Dot.) In the default mode, this matches any
character except a newline. If the <tt class="constant">DOTALL</tt> flag has been
specified, this matches any character including a newline.
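
<P>
A small sketch of the difference, assuming a sample string chosen here
for illustration:

```python
import re

text = 'a\nb'

# By default "." refuses to match the newline between 'a' and 'b'.
default_match = re.match(r'a.b', text)

# With DOTALL the same pattern matches across the newline.
dotall_match = re.match(r'a.b', text, re.DOTALL)
```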

<P>
</DD>
<DT><STRONG>"<tt class="character">^</tt>"</STRONG></DT>
<DD>(Caret.) Matches the start of the
string, and in <tt class="constant">MULTILINE</tt> mode also matches immediately
after each newline.

<P>
</DD>
<DT><STRONG>"<tt class="character">$</tt>"</STRONG></DT>
<DD>Matches the end of the string or just before the
newline at the end of the string, and in <tt class="constant">MULTILINE</tt> mode
also matches before a newline. <tt class="regexp">foo</tt> matches both 'foo' and
'foobar', while the regular expression <tt class="regexp">foo$</tt> matches only
'foo'. More interestingly, searching for <tt class="regexp">foo.$</tt> in
'foo1&#92;nfoo2&#92;n' matches 'foo2' normally,
but 'foo1' in <tt class="constant">MULTILINE</tt> mode.
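
<P>
The <tt class="constant">MULTILINE</tt> behaviour described for
"<tt class="character">^</tt>" and "<tt class="character">$</tt>" can be
checked with a short sketch; the variable names are illustrative only.

```python
import re

text = 'foo1\nfoo2\n'

# Default mode: "$" matches only at the end, or just before a final newline.
default_hit = re.search(r'foo.$', text).group(0)

# MULTILINE mode: "$" also matches before every newline.
multiline_hit = re.search(r'foo.$', text, re.MULTILINE).group(0)

# "^" behaves analogously at the start of lines.
default_starts = re.findall(r'^foo', text)
multiline_starts = re.findall(r'^foo', text, re.MULTILINE)
```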

<P>
</DD>
<DT><STRONG>"<tt class="character">*</tt>"</STRONG></DT>
<DD>Causes the resulting RE to
match 0 or more repetitions of the preceding RE, as many repetitions
as are possible. <tt class="regexp">ab*</tt> will
match 'a', 'ab', or 'a' followed by any number of 'b's.

<P>
</DD>
<DT><STRONG>"<tt class="character">+</tt>"</STRONG></DT>
<DD>Causes the
resulting RE to match 1 or more repetitions of the preceding RE.
<tt class="regexp">ab+</tt> will match 'a' followed by any non-zero number of 'b's; it
will not match just 'a'.

<P>
</DD>
<DT><STRONG>"<tt class="character">?</tt>"</STRONG></DT>
<DD>Causes the resulting RE to
match 0 or 1 repetitions of the preceding RE. <tt class="regexp">ab?</tt> will
match either 'a' or 'ab'.

<P>
</DD>
<DT><STRONG><code>*?</code>, <code>+?</code>, <code>??</code></STRONG></DT>
<DD>The "<tt class="character">*</tt>",
"<tt class="character">+</tt>", and "<tt class="character">?</tt>" quantifiers are all <i class="dfn">greedy</i>; they
match as much text as possible. Sometimes this behaviour isn't
desired; if the RE <tt class="regexp">&lt;.*&gt;</tt> is matched against
<code>'&lt;H1&gt;title&lt;/H1&gt;'</code>, it will match the entire string, and not just
<code>'&lt;H1&gt;'</code>. Adding "<tt class="character">?</tt>" after the quantifier makes it
perform the match in <i class="dfn">non-greedy</i> or <i class="dfn">minimal</i> fashion; as
<em>few</em> characters as possible will be matched. Using the RE
<tt class="regexp">&lt;.*?&gt;</tt> instead will match only <code>'&lt;H1&gt;'</code>.
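
<P>
As a brief sketch of greedy versus minimal matching, using the same sample
string as the text (variable names are illustrative):

```python
import re

html = '<H1>title</H1>'

# Greedy: ".*" runs to the last ">" in the string.
greedy = re.match(r'<.*>', html).group(0)

# Minimal: ".*?" stops at the first ">" that lets the match succeed.
minimal = re.match(r'<.*?>', html).group(0)

# The same idea applies to "+?" and "??".
one_b = re.match(r'ab+?', 'abbb').group(0)
no_b = re.match(r'ab??', 'ab').group(0)
```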

<P>
</DD>
<DT><STRONG><code>{<var>m</var>}</code></STRONG></DT>
<DD>Specifies that exactly <var>m</var> copies of the previous RE should be
matched; fewer matches cause the entire RE not to match. For example,
<tt class="regexp">a{6}</tt> will match exactly six "<tt class="character">a</tt>" characters, but
not five.

<P>
</DD>
<DT><STRONG><code>{<var>m</var>,<var>n</var>}</code></STRONG></DT>
<DD>Causes the resulting RE to match from
<var>m</var> to <var>n</var> repetitions of the preceding RE, attempting to
match as many repetitions as possible. For example, <tt class="regexp">a{3,5}</tt>
will match from 3 to 5 "<tt class="character">a</tt>" characters. Omitting <var>m</var>
specifies a lower bound of zero,
and omitting <var>n</var> specifies an infinite upper bound. As an
example, <tt class="regexp">a{4,}b</tt> will match <code>aaaab</code> or a thousand
"<tt class="character">a</tt>" characters followed by a <code>b</code>, but not <code>aaab</code>.
The comma may not be omitted or the quantifier would be confused with
the previously described form.

<P>
</DD>
<DT><STRONG><code>{<var>m</var>,<var>n</var>}?</code></STRONG></DT>
<DD>Causes the resulting RE to
match from <var>m</var> to <var>n</var> repetitions of the preceding RE,
attempting to match as <em>few</em> repetitions as possible. This is
the non-greedy version of the previous quantifier. For example, on the
6-character string <code>'aaaaaa'</code>, <tt class="regexp">a{3,5}</tt> will match 5
"<tt class="character">a</tt>" characters, while <tt class="regexp">a{3,5}?</tt> will only match 3
characters.
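
<P>
The three counted forms can be compared in a short sketch that reuses the
examples from the text; the variable names are illustrative only.

```python
import re

# {m}: exactly six repetitions are required.
five = re.match(r'a{6}', 'aaaaa')
six = re.match(r'a{6}', 'aaaaaa').group(0)

# {m,}: at least four "a" characters, then "b".
open_ended = re.compile(r'a{4,}b')
short_run = open_ended.match('aaab')
exact_run = open_ended.match('aaaab')
long_run = open_ended.match('a' * 1000 + 'b')

# {m,n} is greedy, {m,n}? takes as few repetitions as possible.
greedy = re.match(r'a{3,5}', 'aaaaaa').group(0)
lazy = re.match(r'a{3,5}?', 'aaaaaa').group(0)
```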

<P>
</DD>
<DT><STRONG>"<tt class="character">&#92;</tt>"</STRONG></DT>
<DD>Either escapes special characters (permitting
you to match characters like "<tt class="character">*</tt>", "<tt class="character">?</tt>", and so
forth), or signals a special sequence; special sequences are discussed
below.

<P>
If you're not using a raw string to
express the pattern, remember that Python also uses the
backslash as an escape character in string literals; if the escape
sequence isn't recognized by Python's parser, the backslash and
subsequent character are included in the resulting string. However,
if Python would recognize the resulting sequence, the backslash should
be doubled. This is complicated and hard to understand, so
it's highly recommended that you use raw strings for all but the
simplest expressions.
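
<P>
A minimal sketch of why raw strings are recommended. The sample strings
are chosen here for illustration only.

```python
import re

# A pattern for one literal backslash must contain two characters: \\
plain = '\\\\'   # ordinary literal: four typed backslashes yield two
raw = r'\\'      # raw literal: two typed backslashes stay two
same_pattern = (plain == raw)

windows_path = 'C:\\temp'   # the seven-character string C:\temp
found = re.search(raw, windows_path).group(0)

# '\b' is a backspace in an ordinary literal, but the word-boundary
# sequence when written as a raw string.
backspace_hit = re.search('\bcat\b', 'a cat')   # the text has no backspaces
boundary_hit = re.search(r'\bcat\b', 'a cat')
```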

<P>
</DD>
<DT><STRONG><code>[]</code></STRONG></DT>
<DD>Used to indicate a set of characters. Characters can
be listed individually, or a range of characters can be indicated by
giving two characters and separating them by a "<tt class="character">-</tt>". Special
characters are not active inside sets. For example, <tt class="regexp">[akm$]</tt>
will match any of the characters "<tt class="character">a</tt>", "<tt class="character">k</tt>",
"<tt class="character">m</tt>", or "<tt class="character">$</tt>"; <tt class="regexp">[a-z]</tt>
will match any lowercase letter, and <code>[a-zA-Z0-9]</code> matches any
letter or digit. Character classes such as <code>&#92;w</code> or <code>&#92;S</code>
(defined below) are also acceptable inside a set. If you want to
include a "<tt class="character">]</tt>" or a "<tt class="character">-</tt>" inside a set, precede it with a
backslash, or place it as the first character. The
pattern <tt class="regexp">[]]</tt> will match <code>']'</code>, for example.

<P>
You can match the characters not listed within a set by <i class="dfn">complementing</i>
the set. This is indicated by including a
"<tt class="character">^</tt>" as the first character of the set;
"<tt class="character">^</tt>" elsewhere will simply match the
"<tt class="character">^</tt>" character. For example,
<tt class="regexp">[^5]</tt> will match
any character except "<tt class="character">5</tt>", and
<tt class="regexp">[^<code>^</code>]</tt> will match any character
except "<tt class="character">^</tt>".
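
<P>
The set notation can be tried out with a few sample strings. This is a
sketch only; the inputs and variable names are made up for illustration.

```python
import re

# Individually listed characters; "$" is not special inside a set.
listed = re.findall(r'[akm$]', 'make $5')

# A range combined with a character class.
ranged = re.findall(r'[a-c\d]', 'a1-dz9')

# "]" placed first in the set is taken literally.
bracket = re.match(r'[]]', ']')

# Complemented sets.
not_five = re.findall(r'[^5]', '1552')
not_caret = re.findall(r'[^^]', 'a^b')
```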

<P>
</DD>
<DT><STRONG>"<tt class="character">|</tt>"</STRONG></DT>
<DD><code>A|B</code>, where A and B can be arbitrary REs,
creates a regular expression that will match either A or B. An
arbitrary number of REs can be separated by the "<tt class="character">|</tt>" in this
way. This can be used inside groups (see below) as well. As the target
string is scanned, REs separated by "<tt class="character">|</tt>" are tried from left to
right. When one pattern completely matches, that branch is accepted.
This means that once <code>A</code> matches, <code>B</code> will not be tested further,
even if it would produce a longer overall match. In other words, the
"<tt class="character">|</tt>" operator is never greedy. To match a literal "<tt class="character">|</tt>",
use <tt class="regexp">&#92;|</tt>, or enclose it inside a character class, as in <tt class="regexp">[|]</tt>.
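
<P>
The left-to-right behaviour of alternation can be seen in a small sketch
(sample patterns chosen here for illustration):

```python
import re

# The first alternative that matches wins, even if a later one is longer.
first_wins = re.match(r'a|ab', 'ab').group(0)

# Listing the longer alternative first changes the result.
reordered = re.match(r'ab|a', 'ab').group(0)

# Two ways to match a literal "|".
escaped = re.findall(r'\|', 'a|b|c')
in_class = re.findall(r'[|]', 'a|b|c')
```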

<P>
</DD>
<DT><STRONG><code>(...)</code></STRONG></DT>
<DD>Matches whatever regular expression is inside the
parentheses, and indicates the start and end of a group; the contents
of a group can be retrieved after a match has been performed, and can
be matched later in the string with the <tt class="regexp">&#92;<var>number</var></tt> special
sequence, described below. To match the literals "<tt class="character">(</tt>" or
"<tt class="character">)</tt>", use <tt class="regexp">&#92;(</tt> or <tt class="regexp">&#92;)</tt>, or enclose them
inside a character class: <tt class="regexp">[(] [)]</tt>.

<P>
</DD>
<DT><STRONG><code>(?...)</code></STRONG></DT>
<DD>This is an extension notation (a "<tt class="character">?</tt>"
following a "<tt class="character">(</tt>" is not meaningful otherwise). The first
character after the "<tt class="character">?</tt>"
determines what the meaning and further syntax of the construct is.
Extensions usually do not create a new group;
<tt class="regexp">(?P&lt;<var>name</var>&gt;...)</tt> is the only exception to this rule.
Following are the currently supported extensions.

<P>
</DD>
<DT><STRONG><code>(?iLmsux)</code></STRONG></DT>
<DD>(One or more letters from the set "<tt class="character">i</tt>",
"<tt class="character">L</tt>", "<tt class="character">m</tt>", "<tt class="character">s</tt>", "<tt class="character">u</tt>",
"<tt class="character">x</tt>".) The group matches the empty string; the letters set
the corresponding flags (<tt class="constant">re.I</tt>, <tt class="constant">re.L</tt>,
<tt class="constant">re.M</tt>, <tt class="constant">re.S</tt>, <tt class="constant">re.U</tt>, <tt class="constant">re.X</tt>)
for the entire regular expression. This is useful if you wish to
include the flags as part of the regular expression, instead of
passing a <var>flag</var> argument to the <tt class="function">compile()</tt> function.

<P>
Note that the <tt class="regexp">(?x)</tt> flag changes how the expression is parsed.
It should be used first in the expression string, or after one or more
whitespace characters. If there are non-whitespace characters before
the flag, the results are undefined.
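
<P>
A short sketch of inline flags, with each flag placed first in the
expression as advised above. The sample patterns are illustrative only.

```python
import re

# The inline flag and the flag argument have the same effect.
inline = re.match(r'(?i)python', 'PyThOn')
by_argument = re.match(r'python', 'PyThOn', re.I)
no_flag = re.match(r'python', 'PyThOn')

# (?x) at the very start switches on verbose parsing, so whitespace
# and "#" comments inside the pattern are ignored.
verbose = re.match(r'''(?x)
    \d+      # integer part
    \.       # decimal point
    \d*      # optional fraction
''', '3.14')
```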

<P>
</DD>
<DT><STRONG><code>(?:...)</code></STRONG></DT>
<DD>A non-grouping version of regular parentheses.
Matches whatever regular expression is inside the parentheses, but the
substring matched by the
group <em>cannot</em> be retrieved after performing a match or
referenced later in the pattern.

<P>
</DD>
<DT><STRONG><code>(?P&lt;<var>name</var>&gt;...)</code></STRONG></DT>
<DD>Similar to regular parentheses, but
the substring matched by the group is accessible via the symbolic group
name <var>name</var>. Group names must be valid Python identifiers, and
each group name must be defined only once within a regular expression. A
symbolic group is also a numbered group, just as if the group were not
named. So the group named 'id' in the example below can also be
referenced as the numbered group 1.

<P>
For example, if the pattern is
<tt class="regexp">(?P&lt;id&gt;[a-zA-Z_]&#92;w*)</tt>, the group can be referenced by its
name in arguments to methods of match objects, such as
<code>m.group('id')</code> or <code>m.end('id')</code>, and also by name in
pattern text (for example, <tt class="regexp">(?P=id)</tt>) and replacement text
(such as <code>&#92;g&lt;id&gt;</code>).
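
<P>
The ways of referring to a named group can be sketched as follows. The
input strings and the replacement template are assumptions made for
illustration.

```python
import re

m = re.match(r'(?P<id>[a-zA-Z_]\w*)', 'spam_1 = 2')
by_name = m.group('id')
by_number = m.group(1)       # the named group is also group 1
end_pos = m.end('id')

# (?P=id) refers back to the named group inside the pattern.
doubled = re.match(r'(?P<id>[a-zA-Z_]\w*) (?P=id)', 'x1 x1')
not_doubled = re.match(r'(?P<id>[a-zA-Z_]\w*) (?P=id)', 'x1 y2')

# \g<id> refers to the named group in replacement text.
bracketed = re.sub(r'(?P<id>[a-zA-Z_]\w*)', r'[\g<id>]', 'a b')
```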

<P>
</DD>
<DT><STRONG><code>(?P=<var>name</var>)</code></STRONG></DT>
<DD>Matches whatever text was matched by the
earlier group named <var>name</var>.

<P>
</DD>
<DT><STRONG><code>(?#...)</code></STRONG></DT>
<DD>A comment; the contents of the parentheses are
simply ignored.

<P>
</DD>
<DT><STRONG><code>(?=...)</code></STRONG></DT>
<DD>Matches if <tt class="regexp">...</tt> matches next, but doesn't
consume any of the string. This is called a lookahead assertion. For
example, <tt class="regexp">Isaac (?=Asimov)</tt> will match <code>'Isaac&nbsp;'</code> only if it's
followed by <code>'Asimov'</code>.

<P>
</DD>
<DT><STRONG><code>(?!...)</code></STRONG></DT>
<DD>Matches if <tt class="regexp">...</tt> doesn't match next. This
is a negative lookahead assertion. For example,
<tt class="regexp">Isaac (?!Asimov)</tt> will match <code>'Isaac&nbsp;'</code> only if it's <em>not</em>
followed by <code>'Asimov'</code>.
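
<P>
Both lookahead forms can be compared side by side in a small sketch; the
second input string is an assumption added for contrast.

```python
import re

# Positive lookahead: 'Asimov' must follow, but is not consumed.
followed = re.search(r'Isaac (?=Asimov)', 'Isaac Asimov')
not_followed = re.search(r'Isaac (?=Asimov)', 'Isaac Newton')

# Negative lookahead: 'Asimov' must not follow.
neg_other = re.search(r'Isaac (?!Asimov)', 'Isaac Newton')
neg_asimov = re.search(r'Isaac (?!Asimov)', 'Isaac Asimov')
```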

<P>
</DD>
<DT><STRONG><code>(?&lt;=...)</code></STRONG></DT>
<DD>Matches if the current position in the string
is preceded by a match for <tt class="regexp">...</tt> that ends at the current
position. This is called a <i class="dfn">positive lookbehind assertion</i>.
<tt class="regexp">(?&lt;=abc)def</tt> will find a match in "<tt class="samp">abcdef</tt>", since the
lookbehind will back up 3 characters and check if the contained
pattern matches. The contained pattern must only match strings of
some fixed length, meaning that <tt class="regexp">abc</tt> or <tt class="regexp">a|b</tt> are
allowed, but <tt class="regexp">a*</tt> and <tt class="regexp">a{3,4}</tt> are not. Note that
patterns which start with positive lookbehind assertions will never
match at the beginning of the string being searched; you will most
likely want to use the <tt class="function">search()</tt> function rather than the
<tt class="function">match()</tt> function:

<P>
<div class="verbatim"><pre>
&gt;&gt;&gt; import re
&gt;&gt;&gt; m = re.search('(?&lt;=abc)def', 'abcdef')
&gt;&gt;&gt; m.group(0)
'def'
</pre></div>

<P>
This example looks for a word following a hyphen:

<P>
<div class="verbatim"><pre>
&gt;&gt;&gt; m = re.search(r'(?&lt;=-)\w+', 'spam-egg')
&gt;&gt;&gt; m.group(0)
'egg'
</pre></div>

<P>
</DD>
<DT><STRONG><code>(?&lt;!...)</code></STRONG></DT>
<DD>Matches if the current position in the string
is not preceded by a match for <tt class="regexp">...</tt>. This is called a
<i class="dfn">negative lookbehind assertion</i>. Similar to positive lookbehind
assertions, the contained pattern must only match strings of some
fixed length. Patterns which start with negative lookbehind
assertions may match at the beginning of the string being searched.
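
<P>
A sketch of a negative lookbehind, reusing the hyphen example; the input
strings are illustrative assumptions.

```python
import re

# Words that do not directly follow a hyphen.
unhyphenated = re.findall(r'(?<!-)\b\w+', 'spam-egg ham')

# Unlike a positive lookbehind, this can succeed at the very start of
# the string, so match() is usable as well.
at_start = re.match(r'(?<!-)\w+', 'egg')
```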

<P>
</DD>
<DT><STRONG><code>(?(<var>id/name</var>)yes-pattern|no-pattern)</code></STRONG></DT>
<DD>Will try to match
with <tt class="regexp">yes-pattern</tt> if the group with given <var>id</var> or <var>name</var>
exists, and with <tt class="regexp">no-pattern</tt> if it doesn't. <tt class="regexp">|no-pattern</tt>
is optional and can be omitted. For example,
<tt class="regexp">(&lt;)?(&#92;w+@&#92;w+(?:&#92;.&#92;w+)+)(?(1)&gt;)</tt> is a poor email matching
pattern, which will match with <code>'&lt;user@host.com&gt;'</code> as well as
<code>'user@host.com'</code>, but not with <code>'&lt;user@host.com'</code>.

<span class="versionnote">New in version 2.4.</span>
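
<P>
The conditional pattern from the text can be exercised as below. This is
a sketch; it uses <code>match()</code> so that every attempt is anchored
at the start of the string.

```python
import re

email = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)')

# Group 1 matched "<", so the yes-pattern ">" is required.
bracketed = email.match('<user@host.com>')

# Group 1 did not take part, so nothing more is required.
bare = email.match('user@host.com')

# Opening "<" without the closing ">" is rejected.
unbalanced = email.match('<user@host.com')
```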

<P>
</DD>
</DL>

<P>
The special sequences consist of "<tt class="character">&#92;</tt>" and a character from the
list below. If the ordinary character is not on the list, then the
resulting RE will match the second character. For example,
<tt class="regexp">&#92;$</tt> matches the character "<tt class="character">$</tt>".
<DL>
<DT><STRONG><code>&#92;<var>number</var></code></STRONG></DT>
<DD>Matches the contents of the group of the
same number. Groups are numbered starting from 1. For example,
<tt class="regexp">(.+) &#92;1</tt> matches <code>'the the'</code> or <code>'55 55'</code>, but not
<code>'the end'</code> (note
the space after the group). This special sequence can only be used to
match one of the first 99 groups. If the first digit of <var>number</var>
is 0, or <var>number</var> is 3 octal digits long, it will not be interpreted
as a group match, but as the character with octal value <var>number</var>.
Inside the "<tt class="character">[</tt>" and "<tt class="character">]</tt>" of a character class, all numeric
escapes are treated as characters.
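
<P>
A quick check of the backreference example, sketched with
<code>match()</code> so that the group must start at the beginning of
each string:

```python
import re

repeat = re.compile(r'(.+) \1')

the_the = repeat.match('the the')
fives = repeat.match('55 55')
the_end = repeat.match('the end')
```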

<P>
</DD>
<DT><STRONG><code>&#92;A</code></STRONG></DT>
<DD>Matches only at the start of the string.

<P>
</DD>
<DT><STRONG><code>&#92;b</code></STRONG></DT>
<DD>Matches the empty string, but only at the
beginning or end of a word. A word is defined as a sequence of
alphanumeric or underscore characters, so the end of a word is indicated by
whitespace or a non-alphanumeric, non-underscore character. Note that
<code>&#92;b</code> is defined as the boundary between <code>&#92;w</code> and
<code>&#92;W</code>, so the precise set of characters deemed to be alphanumeric depends on the
values of the <code>UNICODE</code> and <code>LOCALE</code> flags. Inside a character
range, <tt class="regexp">&#92;b</tt> represents the backspace character, for compatibility
with Python's string literals.
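
<P>
Word boundaries can be illustrated with a short sketch. The sample text
is an assumption and uses ASCII only, so the flags mentioned above do not
affect the result.

```python
import re

text = 'class subclass class_ (class)'

# Only the stand-alone occurrences are bounded on both sides.
whole_words = re.findall(r'\bclass\b', text)

# \B requires that there is no boundary at this position.
inside_word = re.search(r'py\B', 'python')
at_word_end = re.search(r'py\B', 'py.')

# Inside a character set, \b means the backspace character.
backspace = re.match(r'[\b]', '\x08')
```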

<P>
</DD>
<DT><STRONG><code>&#92;B</code></STRONG></DT>
<DD>Matches the empty string, but only when it is <em>not</em>
at the beginning or end of a word. This is just the opposite of
<code>&#92;b</code>, so is also subject to the settings of <code>LOCALE</code> and <code>UNICODE</code>.

<P>
</DD>
<DT><STRONG><code>&#92;d</code></STRONG></DT>
<DD>When the <tt class="constant">UNICODE</tt> flag is not specified, matches
any decimal digit; this is equivalent to the set <tt class="regexp">[0-9]</tt>.
With <tt class="constant">UNICODE</tt>, it will match whatever is classified as a digit
in the Unicode character properties database.

<P>
</DD>
<DT><STRONG><code>&#92;D</code></STRONG></DT>
<DD>When the <tt class="constant">UNICODE</tt> flag is not specified, matches
any non-digit character; this is equivalent to the set
<tt class="regexp">[^0-9]</tt>. With <tt class="constant">UNICODE</tt>, it will match
anything other than characters marked as digits in the Unicode character
properties database.

<P>
</DD>
<DT><STRONG><code>&#92;s</code></STRONG></DT>
<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
flags are not specified, matches any whitespace character; this is
equivalent to the set <tt class="regexp">[ &#92;t&#92;n&#92;r&#92;f&#92;v]</tt>.
With <tt class="constant">LOCALE</tt>, it will match this set plus whatever characters
are defined as space for the current locale. If <tt class="constant">UNICODE</tt> is set,
this will match the characters <tt class="regexp">[ &#92;t&#92;n&#92;r&#92;f&#92;v]</tt> plus
whatever is classified as space in the Unicode character properties
database.

<P>
</DD>
<DT><STRONG><code>&#92;S</code></STRONG></DT>
<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
flags are not specified, matches any non-whitespace character; this is
equivalent to the set <tt class="regexp">[^ &#92;t&#92;n&#92;r&#92;f&#92;v]</tt>.
With <tt class="constant">LOCALE</tt>, it will match any character not in this set,
and not defined as space in the current locale. If <tt class="constant">UNICODE</tt>
is set, this will match anything other than <tt class="regexp">[ &#92;t&#92;n&#92;r&#92;f&#92;v]</tt>
and characters marked as space in the Unicode character properties database.

<P>
</DD>
<DT><STRONG><code>&#92;w</code></STRONG></DT>
<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
flags are not specified, matches any alphanumeric character and the
underscore; this is equivalent to the set
<tt class="regexp">[a-zA-Z0-9_]</tt>. With <tt class="constant">LOCALE</tt>, it will match the set
<tt class="regexp">[0-9_]</tt> plus whatever characters are defined as alphanumeric for
the current locale. If <tt class="constant">UNICODE</tt> is set, this will match the
characters <tt class="regexp">[0-9_]</tt> plus whatever is classified as alphanumeric
in the Unicode character properties database.

<P>
</DD>
<DT><STRONG><code>&#92;W</code></STRONG></DT>
<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
flags are not specified, matches any non-alphanumeric character; this
is equivalent to the set <tt class="regexp">[^a-zA-Z0-9_]</tt>. With
<tt class="constant">LOCALE</tt>, it will match any character not in the set
<tt class="regexp">[0-9_]</tt>, and not defined as alphanumeric for the current locale.
If <tt class="constant">UNICODE</tt> is set, this will match anything other than
<tt class="regexp">[0-9_]</tt> and characters marked as alphanumeric in the Unicode
character properties database.

<P>
</DD>
<DT><STRONG><code>&#92;Z</code></STRONG></DT>
<DD>Matches only at the end of the string.

<P>
</DD>
</DL>
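
<P>
The character-class and anchor sequences listed above can be sketched on
a few sample strings. The inputs are assumptions and are restricted to
ASCII, so the results should not depend on the <code>LOCALE</code> or
<code>UNICODE</code> settings.

```python
import re

digits = re.findall(r'\d+', 'a1 b22')
non_digits = re.findall(r'\D+', 'a1 b22')
fields = re.split(r'\s+', 'a b\tc\nd')
words = re.findall(r'\w+', 'foo_bar, baz9!')
non_word = re.findall(r'\W', 'a-b c')

# "^" follows MULTILINE, while \A matches only at the start of the string.
line_starts = re.findall(r'^\w+', 'one\ntwo', re.MULTILINE)
string_start = re.findall(r'\A\w+', 'one\ntwo', re.MULTILINE)

# "$" accepts a trailing newline, while \Z matches only at the very end.
dollar_end = re.search(r'two$', 'one\ntwo\n')
strict_end = re.search(r'two\Z', 'one\ntwo\n')
```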

<P>
Most of the standard escapes supported by Python string literals are
also accepted by the regular expression parser:

<P>
<div class="verbatim"><pre>
\a \b \f \n
\r \t \v \x
\\
</pre></div>

<P>
Octal escapes are included in a limited form: If the first digit is a
0, or if there are three octal digits, it is considered an octal
escape. Otherwise, it is a group reference. As for string literals,
octal escapes are always at most three digits in length.
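
<P>
A small sketch of how the parser tells octal escapes from group
references. The patterns are written as raw strings so that the regular
expression parser, not Python's string literal parser, interprets the
escapes; the sample values are illustrative only.

```python
import re

# Three octal digits: octal 101 is 65, the code of 'A'.
octal_three = re.match(r'\101', 'A')

# A leading 0 also marks an octal escape: octal 12 is the newline.
octal_zero = re.match(r'\012', '\n')

# A hexadecimal escape and a tab escape handled by the RE parser.
hex_escape = re.match(r'\x41', 'A')
tab = re.match(r'a\tb', 'a\tb')

# Anything else is a group reference.
group_ref = re.match(r'(a)\1', 'aa')
```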

<P>

<DIV CLASS="navigation">
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>