git.subgeniuskitty.com - OpenSPARC-T2-SAM/.git/blame_incremental - sam-t2/devtools/v9/html/python/lib/re-syntax.html

Commit	Line	Data
	1	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
	2	<html>
	3	<head>
	4	<link rel="STYLESHEET" href="lib.css" type='text/css' />
	5	<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
	6	<link rel='start' href='../index.html' title='Python Documentation Index' />
	7	<link rel="first" href="lib.html" title='Python Library Reference' />
	8	<link rel='contents' href='contents.html' title="Contents" />
	9	<link rel='index' href='genindex.html' title='Index' />
	10	<link rel='last' href='about.html' title='About this document...' />
	11	<link rel='help' href='about.html' title='About this document...' />
	12	<link rel="next" href="matching-searching.html" />
	13	<link rel="prev" href="module-re.html" />
	14	<link rel="parent" href="module-re.html" />
	15	<link rel="next" href="matching-searching.html" />
	16	<meta name='aesop' content='information' />
	17	<title>4.2.1 Regular Expression Syntax </title>
	18	</head>
	19	<body>
	20	<DIV CLASS="navigation">
	21	<div id='top-navigation-panel' xml:id='top-navigation-panel'>
	22	<table align="center" width="100%" cellpadding="0" cellspacing="2">
	23	<tr>
	24	<td class='online-navigation'><a rel="prev" title="4.2 re "
	25	href="module-re.html"><img src='../icons/previous.png'
	26	border='0' height='32' alt='Previous Page' width='32' /></A></td>
	27	<td class='online-navigation'><a rel="parent" title="4.2 re "
	28	href="module-re.html"><img src='../icons/up.png'
	29	border='0' height='32' alt='Up One Level' width='32' /></A></td>
	30	<td class='online-navigation'><a rel="next" title="4.2.2 Matching vs Searching"
	31	href="matching-searching.html"><img src='../icons/next.png'
	32	border='0' height='32' alt='Next Page' width='32' /></A></td>
	33	<td align="center" width="100%">Python Library Reference</td>
	34	<td class='online-navigation'><a rel="contents" title="Table of Contents"
	35	href="contents.html"><img src='../icons/contents.png'
	36	border='0' height='32' alt='Contents' width='32' /></A></td>
	37	<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
	38	border='0' height='32' alt='Module Index' width='32' /></a></td>
	39	<td class='online-navigation'><a rel="index" title="Index"
	40	href="genindex.html"><img src='../icons/index.png'
	41	border='0' height='32' alt='Index' width='32' /></A></td>
	42	</tr></table>
	43	<div class='online-navigation'>
	44	<b class="navlabel">Previous:</b>
	45	<a class="sectref" rel="prev" href="module-re.html">4.2 re </A>
	46	<b class="navlabel">Up:</b>
	47	<a class="sectref" rel="parent" href="module-re.html">4.2 re </A>
	48	<b class="navlabel">Next:</b>
	49	<a class="sectref" rel="next" href="matching-searching.html">4.2.2 Matching vs Searching</A>
	50	</div>
	51	<hr /></div>
	52	</DIV>
	53	<!--End of Navigation Panel-->
	54
	55	<H2><A NAME="SECTION006210000000000000000"></A><A NAME="re-syntax"></A>
	56	<BR>
	57	4.2.1 Regular Expression Syntax
	58	</H2>
	59
	60	<P>
	61	A regular expression (or RE) specifies a set of strings that matches
	62	it; the functions in this module let you check if a particular string
	63	matches a given regular expression (or if a given regular expression
	64	matches a particular string, which comes down to the same thing).
	65
	66	<P>
	67	Regular expressions can be concatenated to form new regular
	68	expressions; if <em>A</em> and <em>B</em> are both regular expressions,
	69	then <em>AB</em> is also a regular expression. In general, if a string
	70	<em>p</em> matches <em>A</em> and another string <em>q</em> matches <em>B</em>,
	71	the string <em>pq</em> will match <em>AB</em>. This holds unless <em>A</em> or
	72	<em>B</em> contains low-precedence operations, boundary conditions between
	73	<em>A</em> and <em>B</em>, or numbered group references. Thus, complex
	74	expressions can easily be constructed from simpler primitive
	75	expressions like the ones described here. For details of the theory
	76	and implementation of regular expressions, consult the Friedl book
	77	referenced above, or almost any textbook about compiler construction.
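
<P>
As a rough illustration of the concatenation rule and of one exception to it, the sketch below assumes a recent Python 3 interpreter (for <code>re.fullmatch()</code>); the variable names are illustrative only.

```python
import re

# Two simple REs and strings that match them.
A, B = r"ab", r"cd"
p, q = "ab", "cd"

# p matches A and q matches B, so pq matches the concatenation AB.
simple = re.fullmatch(A + B, p + q) is not None

# Exception: A2 contains the low-precedence "|" operator.
A2, B2 = r"a|b", r"c"
# "a" matches A2 and "c" matches B2, but A2 + B2 is "a|bc",
# which means "a" or "bc" and therefore does not match "ac".
naive = re.fullmatch(A2 + B2, "ac") is not None
# Wrapping A2 in a non-capturing group restores the expected behaviour.
grouped = re.fullmatch("(?:" + A2 + ")" + B2, "ac") is not None

print(simple, naive, grouped)
```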
	78
	79	<P>
	80	A brief explanation of the format of regular expressions follows. For
	81	further information and a gentler presentation, consult the Regular
	82	Expression HOWTO, accessible from <a class="url" href="http://www.python.org/doc/howto/">http://www.python.org/doc/howto/</a>.
	83
	84	<P>
	85	Regular expressions can contain both special and ordinary characters.
	86	Most ordinary characters, like "<tt class="character">A</tt>", "<tt class="character">a</tt>", or
	87	"<tt class="character">0</tt>", are the simplest regular expressions; they simply match
	88	themselves. You can concatenate ordinary characters, so <tt class="regexp">last</tt>
	89	matches the string <code>'last'</code>. (In the rest of this section, we'll
	90	write REs in <tt class="regexp">this special style</tt>, usually without quotes, and
	91	strings to be matched <code>'in single quotes'</code>.)
	92
	93	<P>
	94	Some characters, like "<tt class="character">|</tt>" or "<tt class="character">(</tt>", are special.
	95	Special characters either stand for classes of ordinary characters, or
	96	affect how the regular expressions around them are interpreted.
	97
	98	<P>
	99	The special characters are:
	100	<DL>
	101	<DT><STRONG>"<tt class="character">.</tt>"</STRONG></DT>
	102	<DD>(Dot.) In the default mode, this matches any
	103	character except a newline. If the <tt class="constant">DOTALL</tt> flag has been
	104	specified, this matches any character including a newline.
	105
	106	<P>
	107	</DD>
	108	<DT><STRONG>"<tt class="character">^</tt>"</STRONG></DT>
	109	<DD>(Caret.) Matches the start of the
	110	string, and in <tt class="constant">MULTILINE</tt> mode also matches immediately
	111	after each newline.
	112
	113	<P>
	114	</DD>
	115	<DT><STRONG>"<tt class="character">$</tt>"</STRONG></DT>
	116	<DD>Matches the end of the string or just before the
	117	newline at the end of the string, and in <tt class="constant">MULTILINE</tt> mode
	118	also matches before a newline. <tt class="regexp">foo</tt> matches both 'foo' and
	119	'foobar', while the regular expression <tt class="regexp">foo$</tt> matches only
	120	'foo'. More interestingly, searching for <tt class="regexp">foo.$</tt> in
	121	'foo1\nfoo2\n' matches 'foo2' normally,
	122	but 'foo1' in <tt class="constant">MULTILINE</tt> mode.
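
<P>
The difference between the default and <tt class="constant">MULTILINE</tt> behaviour of the two anchors can be checked with a short sketch such as the following; the sample text is the one used above, and the variable names are illustrative.

```python
import re

text = "foo1\nfoo2\n"

# Default mode: "$" matches only at the very end of the string,
# or just before a newline that ends the string.
normal = re.search(r"foo.$", text).group(0)
# MULTILINE mode: "$" also matches before every newline.
multiline = re.search(r"foo.$", text, re.MULTILINE).group(0)

# "^" behaves analogously at the start of the string and of each line.
starts_default = re.findall(r"^foo", text)
starts_multiline = re.findall(r"^foo", text, re.MULTILINE)

print(normal, multiline, starts_default, starts_multiline)
```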
	123
	124	<P>
	125	</DD>
	126	<DT><STRONG>"<tt class="character">*</tt>"</STRONG></DT>
	127	<DD>Causes the resulting RE to
	128	match 0 or more repetitions of the preceding RE, as many repetitions
	129	as are possible. <tt class="regexp">ab*</tt> will
	130	match 'a', 'ab', or 'a' followed by any number of 'b's.
	131
	132	<P>
	133	</DD>
	134	<DT><STRONG>"<tt class="character">+</tt>"</STRONG></DT>
	135	<DD>Causes the
	136	resulting RE to match 1 or more repetitions of the preceding RE.
	137	<tt class="regexp">ab+</tt> will match 'a' followed by any non-zero number of 'b's; it
	138	will not match just 'a'.
	139
	140	<P>
	141	</DD>
	142	<DT><STRONG>"<tt class="character">?</tt>"</STRONG></DT>
	143	<DD>Causes the resulting RE to
	144	match 0 or 1 repetitions of the preceding RE. <tt class="regexp">ab?</tt> will
	145	match either 'a' or 'ab'.
	146
	147	<P>
	148	</DD>
	149	<DT><STRONG><code>*?</code>, <code>+?</code>, <code>??</code></STRONG></DT>
	150	<DD>The "<tt class="character">*</tt>",
	151	"<tt class="character">+</tt>", and "<tt class="character">?</tt>" qualifiers are all <i class="dfn">greedy</i>; they
	152	match as much text as possible. Sometimes this behaviour isn't
	153	desired; if the RE <tt class="regexp"><.*></tt> is matched against
	154	<code>'<H1>title</H1>'</code>, it will match the entire string, and not just
	155	<code>'<H1>'</code>. Adding "<tt class="character">?</tt>" after the qualifier makes it
	156	perform the match in <i class="dfn">non-greedy</i> or <i class="dfn">minimal</i> fashion; as
	157	<em>few</em> characters as possible will be matched. Using <tt class="regexp">.*?</tt>
	158	in the previous expression will match only <code>'<H1>'</code>.
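
<P>
A minimal sketch of the greedy and non-greedy forms, using the same heading string as above; the variable names are illustrative.

```python
import re

html = "<H1>title</H1>"

# Greedy: ".*" consumes as much as possible, so the match runs to the last ">".
greedy = re.match(r"<.*>", html).group(0)
# Non-greedy: ".*?" consumes as little as possible, stopping at the first ">".
minimal = re.match(r"<.*?>", html).group(0)

print(greedy, minimal)
```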
	159
	160	<P>
	161	</DD>
	162	<DT><STRONG><code>{<var>m</var>}</code></STRONG></DT>
	163	<DD>Specifies that exactly <var>m</var> copies of the previous RE should be
	164	matched; fewer matches cause the entire RE not to match. For example,
	165	<tt class="regexp">a{6}</tt> will match exactly six "<tt class="character">a</tt>" characters, but
	166	not five.
	167
	168	<P>
	169	</DD>
	170	<DT><STRONG><code>{<var>m</var>,<var>n</var>}</code></STRONG></DT>
	171	<DD>Causes the resulting RE to match from
	172	<var>m</var> to <var>n</var> repetitions of the preceding RE, attempting to
	173	match as many repetitions as possible. For example, <tt class="regexp">a{3,5}</tt>
	174	will match from 3 to 5 "<tt class="character">a</tt>" characters. Omitting <var>m</var>
	175	specifies a lower bound of zero,
	176	and omitting <var>n</var> specifies an infinite upper bound. As an
	177	example, <tt class="regexp">a{4,}b</tt> will match <code>aaaab</code> or a thousand
	178	"<tt class="character">a</tt>" characters followed by a <code>b</code>, but not <code>aaab</code>.
	179	The comma may not be omitted or the modifier would be confused with
	180	the previously described form.
	181
	182	<P>
	183	</DD>
	184	<DT><STRONG><code>{<var>m</var>,<var>n</var>}?</code></STRONG></DT>
	185	<DD>Causes the resulting RE to
	186	match from <var>m</var> to <var>n</var> repetitions of the preceding RE,
	187	attempting to match as <em>few</em> repetitions as possible. This is
	188	the non-greedy version of the previous qualifier. For example, on the
	189	6-character string <code>'aaaaaa'</code>, <tt class="regexp">a{3,5}</tt> will match 5
	190	"<tt class="character">a</tt>" characters, while <tt class="regexp">a{3,5}?</tt> will only match 3
	191	characters.
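
<P>
The brace qualifiers described above can be tried out as follows. This is only a sketch, and it assumes a recent Python 3 interpreter for <code>re.fullmatch()</code>.

```python
import re

s = "aaaaaa"  # six characters

# {m,n} is greedy and takes as many repetitions as allowed.
greedy = re.match(r"a{3,5}", s).group(0)
# {m,n}? is non-greedy and takes as few repetitions as allowed.
minimal = re.match(r"a{3,5}?", s).group(0)

# {m} requires exactly m repetitions.
six_ok = re.fullmatch(r"a{6}", "a" * 6) is not None
five_ok = re.fullmatch(r"a{6}", "a" * 5) is not None

# Omitting n leaves the upper bound open.
four_a = re.fullmatch(r"a{4,}b", "aaaab") is not None
many_a = re.fullmatch(r"a{4,}b", "a" * 1000 + "b") is not None
three_a = re.fullmatch(r"a{4,}b", "aaab") is not None

print(greedy, minimal, six_ok, five_ok, four_a, many_a, three_a)
```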
	192
	193	<P>
	194	</DD>
	195	<DT><STRONG>"<tt class="character">\</tt>"</STRONG></DT>
	196	<DD>Either escapes special characters (permitting
	197	you to match characters like "<tt class="character">*</tt>", "<tt class="character">?</tt>", and so
	198	forth), or signals a special sequence; special sequences are discussed
	199	below.
	200
	201	<P>
	202	If you're not using a raw string to
	203	express the pattern, remember that Python also uses the
	204	backslash as an escape sequence in string literals; if the escape
	205	sequence isn't recognized by Python's parser, the backslash and
	206	subsequent character are included in the resulting string. However,
	207	if Python would recognize the resulting sequence, the backslash should
	208	be doubled. This is complicated and hard to understand, so
	209	it's highly recommended that you use raw strings for all but the
	210	simplest expressions.
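
<P>
The following sketch shows why raw strings are the safer choice. It relies only on the fact that Python's parser turns <code>"\b"</code> into a backspace character; the sample text and names are illustrative.

```python
import re

# In an ordinary string literal "\b" is a single backspace character;
# in a raw string r"\b" is a backslash followed by "b".
plain_len = len("\b")
raw_len = len(r"\b")

text = "cat concat cat"
# Raw string: the RE parser sees \b and treats it as a word boundary.
raw_hits = re.findall(r"\bcat\b", text)
# Ordinary string: the RE contains backspace characters, which the text lacks.
plain_hits = re.findall("\bcat\b", text)

# Matching one literal backslash: r"\\" is the same pattern as "\\\\".
same_pattern = (r"\\" == "\\\\")
backslashes = re.findall(r"\\", "a\\b\\c")

print(plain_len, raw_len, raw_hits, plain_hits, same_pattern, backslashes)
```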
	211
	212	<P>
	213	</DD>
	214	<DT><STRONG><code>[]</code></STRONG></DT>
	215	<DD>Used to indicate a set of characters. Characters can
	216	be listed individually, or a range of characters can be indicated by
	217	giving two characters and separating them by a "<tt class="character">-</tt>". Special
	218	characters are not active inside sets. For example, <tt class="regexp">[akm$]</tt>
	219	will match any of the characters "<tt class="character">a</tt>", "<tt class="character">k</tt>",
	220	"<tt class="character">m</tt>", or "<tt class="character">$</tt>"; <tt class="regexp">[a-z]</tt>
	221	will match any lowercase letter, and <code>[a-zA-Z0-9]</code> matches any
	222	letter or digit. Character classes such as <code>\w</code> or <code>\S</code>
	223	(defined below) are also acceptable inside a set. If you want to
	224	include a "<tt class="character">]</tt>" or a "<tt class="character">-</tt>" inside a set, precede it with a
	225	backslash, or place it as the first character. The
	226	pattern <tt class="regexp">[]]</tt> will match <code>']'</code>, for example.
	227
	228	<P>
	229	You can match the characters not within a set by <i class="dfn">complementing</i>
	230	the set. This is indicated by including a
	231	"<tt class="character">^</tt>" as the first character of the set;
	232	"<tt class="character">^</tt>" elsewhere will simply match the
	233	"<tt class="character">^</tt>" character. For example,
	234	<tt class="regexp">[^5]</tt> will match
	235	any character except "<tt class="character">5</tt>", and
	236	<tt class="regexp">[^<code>^</code>]</tt> will match any character
	237	except "<tt class="character">^</tt>".
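
<P>
A few of the set forms described above, as a short sketch with illustrative sample strings:

```python
import re

# Special characters such as "$" lose their meaning inside a set.
specials = re.findall(r"[akm$]", "make $")
# Ranges are written with "-".
lower = re.findall(r"[a-z]", "aB3c")
alnum = re.findall(r"[a-zA-Z0-9]", "a_B-3")
# "]" placed first in the set is taken literally.
bracket = re.findall(r"[]]", "x]y")
# A leading "^" complements the set; elsewhere it is an ordinary character.
not_five = re.findall(r"[^5]", "1525")
five_or_caret = re.findall(r"[5^]", "5^6")

print(specials, lower, alnum, bracket, not_five, five_or_caret)
```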
	238
	239	<P>
	240	</DD>
	241	<DT><STRONG>"<tt class="character">|</tt>"</STRONG></DT>
	242	<DD><code>A|B</code>, where A and B can be arbitrary REs,
	243	creates a regular expression that will match either A or B. An
	244	arbitrary number of REs can be separated by the "<tt class="character">|</tt>" in this
	245	way. This can be used inside groups (see below) as well. As the target
	246	string is scanned, REs separated by "<tt class="character">|</tt>" are tried from left to
	247	right. When one pattern completely matches, that branch is accepted.
	248	This means that once <code>A</code> matches, <code>B</code> will not be tested further,
	249	even if it would produce a longer overall match. In other words, the
	250	"<tt class="character">|</tt>" operator is never greedy. To match a literal "<tt class="character">|</tt>",
	251	use <tt class="regexp">\|</tt>, or enclose it inside a character class, as in <tt class="regexp">[|]</tt>.
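
<P>
The left-to-right behaviour of alternation, and the two ways of matching a literal vertical bar, can be seen in a sketch like this one (names and sample strings are illustrative):

```python
import re

# The first alternative that matches wins, even if a later one is longer.
first_wins = re.match(r"a|ab", "ab").group(0)
# Reordering the alternatives changes the result.
reordered = re.match(r"ab|a", "ab").group(0)

# A literal vertical bar: escaped, or inside a character class.
escaped = re.findall(r"\|", "a|b|c")
in_class = re.findall(r"[|]", "a|b|c")

print(first_wins, reordered, escaped, in_class)
```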
	252
	253	<P>
	254	</DD>
	255	<DT><STRONG><code>(...)</code></STRONG></DT>
	256	<DD>Matches whatever regular expression is inside the
	257	parentheses, and indicates the start and end of a group; the contents
	258	of a group can be retrieved after a match has been performed, and can
	259	be matched later in the string with the <tt class="regexp">\<var>number</var></tt> special
	260	sequence, described below. To match the literals "<tt class="character">(</tt>" or
	261	"<tt class="character">)</tt>", use <tt class="regexp">\(</tt> or <tt class="regexp">\)</tt>, or enclose them
	262	inside a character class: <tt class="regexp">[(] [)]</tt>.
	263
	264	<P>
	265	</DD>
	266	<DT><STRONG><code>(?...)</code></STRONG></DT>
	267	<DD>This is an extension notation (a "<tt class="character">?</tt>"
	268	following a "<tt class="character">(</tt>" is not meaningful otherwise). The first
	269	character after the "<tt class="character">?</tt>"
	270	determines what the meaning and further syntax of the construct is.
	271	Extensions usually do not create a new group;
	272	<tt class="regexp">(?P<<var>name</var>>...)</tt> is the only exception to this rule.
	273	Following are the currently supported extensions.
	274
	275	<P>
	276	</DD>
	277	<DT><STRONG><code>(?iLmsux)</code></STRONG></DT>
	278	<DD>(One or more letters from the set "<tt class="character">i</tt>",
	279	"<tt class="character">L</tt>", "<tt class="character">m</tt>", "<tt class="character">s</tt>", "<tt class="character">u</tt>",
	280	"<tt class="character">x</tt>".) The group matches the empty string; the letters set
	281	the corresponding flags (<tt class="constant">re.I</tt>, <tt class="constant">re.L</tt>,
	282	<tt class="constant">re.M</tt>, <tt class="constant">re.S</tt>, <tt class="constant">re.U</tt>, <tt class="constant">re.X</tt>)
	283	for the entire regular expression. This is useful if you wish to
	284	include the flags as part of the regular expression, instead of
	285	passing a <var>flag</var> argument to the <tt class="function">compile()</tt> function.
	286
	287	<P>
	288	Note that the <tt class="regexp">(?x)</tt> flag changes how the expression is parsed.
	289	It should be used first in the expression string, or after one or more
	290	whitespace characters. If there are non-whitespace characters before
	291	the flag, the results are undefined.
	292
	293	<P>
	294	</DD>
	295	<DT><STRONG><code>(?:...)</code></STRONG></DT>
	296	<DD>A non-capturing version of regular parentheses.
	297	Matches whatever regular expression is inside the parentheses, but the
	298	substring matched by the
	299	group <em>cannot</em> be retrieved after performing a match or
	300	referenced later in the pattern.
	301
	302	<P>
	303	</DD>
	304	<DT><STRONG><code>(?P<<var>name</var>>...)</code></STRONG></DT>
	305	<DD>Similar to regular parentheses, but
	306	the substring matched by the group is accessible via the symbolic group
	307	name <var>name</var>. Group names must be valid Python identifiers, and
	308	each group name must be defined only once within a regular expression. A
	309	symbolic group is also a numbered group, just as if the group were not
	310	named. So the group named 'id' in the example below can also be
	311	referenced as the numbered group 1.
	312
	313	<P>
	314	For example, if the pattern is
	315	<tt class="regexp">(?P<id>[a-zA-Z_]\w*)</tt>, the group can be referenced by its
	316	name in arguments to methods of match objects, such as
	317	<code>m.group('id')</code> or <code>m.end('id')</code>, and also by name in
	318	pattern text (for example, <tt class="regexp">(?P=id)</tt>) and replacement text
	319	(such as <code>\g<id></code>).
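
<P>
The different ways of referring to the named group can be sketched as follows; the sample strings and variable names are illustrative.

```python
import re

m = re.match(r"(?P<id>[a-zA-Z_]\w*)", "spam_1 = 2")

# The same group is reachable by name and by number.
by_name = m.group("id")
by_number = m.group(1)
end_pos = m.end("id")

# By name in replacement text.
wrapped = re.sub(r"(?P<id>[a-zA-Z_]\w*)", r"<\g<id>>", "x y")

# By name in pattern text: (?P=id) must repeat what the group matched.
doubled = re.search(r"(?P<id>\w+) (?P=id)", "the the end").group(0)

print(by_name, by_number, end_pos, wrapped, doubled)
```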
	320
	321	<P>
	322	</DD>
	323	<DT><STRONG><code>(?P=<var>name</var>)</code></STRONG></DT>
	324	<DD>Matches whatever text was matched by the
	325	earlier group named <var>name</var>.
	326
	327	<P>
	328	</DD>
	329	<DT><STRONG><code>(?#...)</code></STRONG></DT>
	330	<DD>A comment; the contents of the parentheses are
	331	simply ignored.
	332
	333	<P>
	334	</DD>
	335	<DT><STRONG><code>(?=...)</code></STRONG></DT>
	336	<DD>Matches if <tt class="regexp">...</tt> matches next, but doesn't
	337	consume any of the string. This is called a lookahead assertion. For
	338	example, <tt class="regexp">Isaac (?=Asimov)</tt> will match <code>'Isaac '</code> only if it's
	339	followed by <code>'Asimov'</code>.
	340
	341	<P>
	342	</DD>
	343	<DT><STRONG><code>(?!...)</code></STRONG></DT>
	344	<DD>Matches if <tt class="regexp">...</tt> doesn't match next. This
	345	is a negative lookahead assertion. For example,
	346	<tt class="regexp">Isaac (?!Asimov)</tt> will match <code>'Isaac '</code> only if it's <em>not</em>
	347	followed by <code>'Asimov'</code>.
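
<P>
Both lookahead assertions can be exercised with the patterns given above; in this sketch the second sample string is an illustrative addition.

```python
import re

# Positive lookahead: 'Isaac ' matches only when 'Asimov' follows,
# and 'Asimov' itself is not part of the match.
pos_hit = re.search(r"Isaac (?=Asimov)", "Isaac Asimov").group(0)
pos_miss = re.search(r"Isaac (?=Asimov)", "Isaac Newton")

# Negative lookahead: 'Isaac ' matches only when 'Asimov' does not follow.
neg_hit = re.search(r"Isaac (?!Asimov)", "Isaac Newton").group(0)
neg_miss = re.search(r"Isaac (?!Asimov)", "Isaac Asimov")

print(repr(pos_hit), pos_miss, repr(neg_hit), neg_miss)
```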
	348
	349	<P>
	350	</DD>
	351	<DT><STRONG><code>(?<=...)</code></STRONG></DT>
	352	<DD>Matches if the current position in the string
	353	is preceded by a match for <tt class="regexp">...</tt> that ends at the current
	354	position. This is called a <i class="dfn">positive lookbehind assertion</i>.
	355	<tt class="regexp">(?<=abc)def</tt> will find a match in "<tt class="samp">abcdef</tt>", since the
	356	lookbehind will back up 3 characters and check if the contained
	357	pattern matches. The contained pattern must only match strings of
	358	some fixed length, meaning that <tt class="regexp">abc</tt> or <tt class="regexp">a|b</tt> are
	359	allowed, but <tt class="regexp">a*</tt> and <tt class="regexp">a{3,4}</tt> are not. Note that
	360	patterns which start with positive lookbehind assertions will never
	361	match at the beginning of the string being searched; you will most
	362	likely want to use the <tt class="function">search()</tt> function rather than the
	363	<tt class="function">match()</tt> function:
	364
	365	<P>
	366	<div class="verbatim"><pre>
	367	>>> import re
	368	>>> m = re.search('(?<=abc)def', 'abcdef')
	369	>>> m.group(0)
	370	'def'
	371	</pre></div>
	372
	373	<P>
	374	This example looks for a word following a hyphen:
	375
	376	<P>
	377	<div class="verbatim"><pre>
	378	>>> m = re.search('(?<=-)\w+', 'spam-egg')
	379	>>> m.group(0)
	380	'egg'
	381	</pre></div>
	382
	383	<P>
	384	</DD>
	385	<DT><STRONG><code>(?<!...)</code></STRONG></DT>
	386	<DD>Matches if the current position in the string
	387	is not preceded by a match for <tt class="regexp">...</tt>. This is called a
	388	<i class="dfn">negative lookbehind assertion</i>. Similar to positive lookbehind
	389	assertions, the contained pattern must only match strings of some
	390	fixed length. Patterns which start with negative lookbehind
	391	assertions may match at the beginning of the string being searched.
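
<P>
A small sketch of a negative lookbehind, including a match at the start of the string and the fixed-length restriction; the sample strings are illustrative.

```python
import re

# Words that are not directly preceded by a hyphen.
words = re.findall(r"(?<!-)\b\w+", "spam-egg ham")

# Unlike the positive form, this can match at the very start of the string.
at_start = re.match(r"(?<!-)\w+", "egg").group(0)

# A variable-length pattern inside the lookbehind is rejected at compile time.
try:
    re.compile(r"(?<!a*)b")
    rejected = False
except re.error:
    rejected = True

print(words, at_start, rejected)
```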
	392
	393	<P>
	394	</DD>
	395	<DT><STRONG><code>(?(<var>id/name</var>)yes-pattern|no-pattern)</code></STRONG></DT>
	396	<DD>Will try to match
	397	with <tt class="regexp">yes-pattern</tt> if the group with given <var>id</var> or <var>name</var>
	398	exists, and with <tt class="regexp">no-pattern</tt> if it doesn't. <tt class="regexp">|no-pattern</tt>
	399	is optional and can be omitted. For example,
	400	<tt class="regexp">(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)</tt> is a poor email matching
	401	pattern, which will match with <code>'<user@host.com>'</code> as well as
	402	<code>'user@host.com'</code>, but not with <code>'<user@host.com'</code>.
	403
	404	<span class="versionnote">New in version 2.4.</span>
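
<P>
The email pattern above can be tried against the three sample addresses. The sketch uses <tt class="function">match()</tt> so that matching is anchored at the start of each string.

```python
import re

pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)")

# Group 1 matched "<", so the closing ">" is required and present.
bracketed = pattern.match("<user@host.com>")
# Group 1 did not take part, so nothing further is required.
bare = pattern.match("user@host.com")
# Group 1 matched "<" but the closing ">" is missing.
unclosed = pattern.match("<user@host.com")

print(bracketed.group(0), bare.group(0), unclosed)
```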
	405
	406	<P>
	407	</DD>
	408	</DL>
	409
	410	<P>
	411	The special sequences consist of "<tt class="character">\</tt>" and a character from the
	412	list below. If the ordinary character is not on the list, then the
	413	resulting RE will match the second character. For example,
	414	<tt class="regexp">\$</tt> matches the character "<tt class="character">$</tt>".
	415	<DL>
	416	<DT><STRONG><code>\<var>number</var></code></STRONG></DT>
	417	<DD>Matches the contents of the group of the
	418	same number. Groups are numbered starting from 1. For example,
	419	<tt class="regexp">(.+) \1</tt> matches <code>'the the'</code> or <code>'55 55'</code>, but not
	420	<code>'the end'</code> (note
	421	the space after the group). This special sequence can only be used to
	422	match one of the first 99 groups. If the first digit of <var>number</var>
	423	is 0, or <var>number</var> is 3 octal digits long, it will not be interpreted
	424	as a group match, but as the character with octal value <var>number</var>.
	425	Inside the "<tt class="character">[</tt>" and "<tt class="character">]</tt>" of a character class, all numeric
	426	escapes are treated as characters.
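
<P>
A sketch of the numbered back-reference, assuming a recent Python 3 interpreter for <code>re.fullmatch()</code>. Whole-string matching is used because a search could still find a short repeated substring such as <code>'e e'</code> inside <code>'the end'</code>.

```python
import re

pattern = re.compile(r"(.+) \1")

# \1 must repeat exactly what group 1 matched.
the_the = pattern.fullmatch("the the") is not None
fives = pattern.fullmatch("55 55") is not None
the_end = pattern.fullmatch("the end") is not None

print(the_the, fives, the_end)
```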
	427
	428	<P>
	429	</DD>
	430	<DT><STRONG><code>\A</code></STRONG></DT>
	431	<DD>Matches only at the start of the string.
	432
	433	<P>
	434	</DD>
	435	<DT><STRONG><code>\b</code></STRONG></DT>
	436	<DD>Matches the empty string, but only at the
	437	beginning or end of a word. A word is defined as a sequence of
	438	alphanumeric or underscore characters, so the end of a word is indicated by
	439	whitespace or a non-alphanumeric, non-underscore character. Note that
	440	<code>\b</code> is defined as the boundary between <code>\w</code> and
	441	<code>\W</code>, so the precise set of characters deemed to be alphanumeric depends on the
	442	values of the <code>UNICODE</code> and <code>LOCALE</code> flags. Inside a character
	443	range, <tt class="regexp">\b</tt> represents the backspace character, for compatibility
	444	with Python's string literals.
	445
	446	<P>
	447	</DD>
	448	<DT><STRONG><code>\B</code></STRONG></DT>
	449	<DD>Matches the empty string, but only when it is <em>not</em>
	450	at the beginning or end of a word. This is just the opposite of
	451	<code>\b</code>, so is also subject to the settings of <code>LOCALE</code> and <code>UNICODE</code>.
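
<P>
The two word-boundary sequences, and the backspace meaning of <code>\b</code> inside a character class, in a short sketch with illustrative sample strings:

```python
import re

# \b: "foo" only where it stands as a whole word.
whole_words = re.findall(r"\bfoo\b", "foo foobar (foo) barfoo")

# \B: "py" only where another word character follows immediately.
inside_words = re.findall(r"py\B", "python py py3")

# Inside a character class, \b is the backspace character.
backspaces = re.findall(r"[\b]", "a\bb")

print(whole_words, inside_words, backspaces)
```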
	452
	453	<P>
	454	</DD>
	455	<DT><STRONG><code>\d</code></STRONG></DT>
	456	<DD>When the <tt class="constant">UNICODE</tt> flag is not specified, matches
	457	any decimal digit; this is equivalent to the set <tt class="regexp">[0-9]</tt>.
	458	With <tt class="constant">UNICODE</tt>, it will match whatever is classified as a digit
	459	in the Unicode character properties database.
	460
	461	<P>
	462	</DD>
	463	<DT><STRONG><code>\D</code></STRONG></DT>
	464	<DD>When the <tt class="constant">UNICODE</tt> flag is not specified, matches
	465	any non-digit character; this is equivalent to the set
	466	<tt class="regexp">[^0-9]</tt>. With <tt class="constant">UNICODE</tt>, it will match
	467	anything other than characters marked as digits in the Unicode character
	468	properties database.
	469
	470	<P>
	471	</DD>
	472	<DT><STRONG><code>\s</code></STRONG></DT>
	473	<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
	474	flags are not specified, matches any whitespace character; this is
	475	equivalent to the set <tt class="regexp">[ \t\n\r\f\v]</tt>.
	476	With <tt class="constant">LOCALE</tt>, it will match this set plus whatever characters
	477	are defined as space for the current locale. If <tt class="constant">UNICODE</tt> is set,
	478	this will match the characters <tt class="regexp">[ \t\n\r\f\v]</tt> plus
	479	whatever is classified as space in the Unicode character properties
	480	database.
	481
	482	<P>
	483	</DD>
	484	<DT><STRONG><code>\S</code></STRONG></DT>
	485	<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
	486	flags are not specified, matches any non-whitespace character; this is
	487	equivalent to the set <tt class="regexp">[^ \t\n\r\f\v]</tt>.
	488	With <tt class="constant">LOCALE</tt>, it will match any character not in this set,
	489	and not defined as space in the current locale. If <tt class="constant">UNICODE</tt>
	490	is set, this will match anything other than <tt class="regexp">[ \t\n\r\f\v]</tt>
	491	and characters marked as space in the Unicode character properties database.
	492
	493	<P>
	494	</DD>
	495	<DT><STRONG><code>\w</code></STRONG></DT>
	496	<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
	497	flags are not specified, matches any alphanumeric character and the
	498	underscore; this is equivalent to the set
	499	<tt class="regexp">[a-zA-Z0-9_]</tt>. With <tt class="constant">LOCALE</tt>, it will match the set
	500	<tt class="regexp">[0-9_]</tt> plus whatever characters are defined as alphanumeric for
	501	the current locale. If <tt class="constant">UNICODE</tt> is set, this will match the
	502	characters <tt class="regexp">[0-9_]</tt> plus whatever is classified as alphanumeric
	503	in the Unicode character properties database.
	504
	505	<P>
	506	</DD>
	507	<DT><STRONG><code>\W</code></STRONG></DT>
	508	<DD>When the <tt class="constant">LOCALE</tt> and <tt class="constant">UNICODE</tt>
	509	flags are not specified, matches any non-alphanumeric character; this
	510	is equivalent to the set <tt class="regexp">[^a-zA-Z0-9_]</tt>. With
	511	<tt class="constant">LOCALE</tt>, it will match any character not in the set
	512	<tt class="regexp">[0-9_]</tt>, and not defined as alphanumeric for the current locale.
	513	If <tt class="constant">UNICODE</tt> is set, this will match anything other than
	514	<tt class="regexp">[0-9_]</tt> and characters marked as alphanumeric in the Unicode
	515	character properties database.
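
<P>
The character-class sequences above can be sketched as follows. One assumption to note: on a current Python 3 interpreter, string patterns use Unicode matching by default, so this sketch passes <code>re.ASCII</code> (a Python 3 flag, not part of the version documented here) to restrict each class to the plain sets listed above.

```python
import re

# \d and \D: digits and non-digits.
digits = re.findall(r"\d+", "tel 555-0100", re.ASCII)
non_digits = re.findall(r"\D+", "a1b22c", re.ASCII)

# \s: split on runs of whitespace.
fields = re.split(r"\s+", "a \t b\nc", flags=re.ASCII)

# \w and \W: alphanumerics plus underscore, and everything else.
words = re.findall(r"\w+", "snake_case x1!", re.ASCII)
others = re.findall(r"\W", "a-b_c!", re.ASCII)

print(digits, non_digits, fields, words, others)
```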
	516
	517	<P>
	518	</DD>
	519	<DT><STRONG><code>\Z</code></STRONG></DT>
	520	<DD>Matches only at the end of the string.
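
<P>
Unlike "<tt class="character">^</tt>" and "<tt class="character">$</tt>", the <code>\A</code> and <code>\Z</code> sequences are not affected by newlines or by <tt class="constant">MULTILINE</tt> mode. A minimal sketch, with illustrative sample strings:

```python
import re

# "$" also matches just before a newline that ends the string; \Z does not.
dollar_hit = re.search(r"foo$", "foo\n") is not None
z_hit = re.search(r"foo\Z", "foo\n") is not None

# In MULTILINE mode "^" matches after each newline; \A still matches
# only at the very start of the string.
caret_hits = re.findall(r"^b", "a\nb", re.MULTILINE)
a_hits = re.findall(r"\Ab", "a\nb", re.MULTILINE)

print(dollar_hit, z_hit, caret_hits, a_hits)
```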
	521
	522	<P>
	523	</DD>
	524	</DL>
	525
	526	<P>
	527	Most of the standard escapes supported by Python string literals are
	528	also accepted by the regular expression parser:
	529
	530	<P>
	531	<div class="verbatim"><pre>
	532	\a \b \f \n
	533	\r \t \v \x
	534	\\
	535	</pre></div>
	536
	537	<P>
	538	Octal escapes are included in a limited form: If the first digit is a
	539	0, or if there are three octal digits, it is considered an octal
	540	escape. Otherwise, it is a group reference. As for string literals,
	541	octal escapes are always at most three digits in length.
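
<P>
The rule for telling octal escapes from group references can be checked with a sketch like the following, which assumes a recent Python 3 interpreter for <code>re.fullmatch()</code>:

```python
import re

# Three octal digits: \101 is the character with octal value 101, i.e. "A".
three_digits = re.fullmatch(r"\101", "A") is not None
# A leading 0 also marks an octal escape: \0 is the null character.
leading_zero = re.fullmatch(r"\0", "\x00") is not None
# Otherwise the number is a group reference: \1 repeats group 1.
group_ref = re.fullmatch(r"(a)\1", "aa") is not None
# \x introduces a hexadecimal escape, as in string literals.
hex_escape = re.fullmatch(r"\x41", "A") is not None

print(three_digits, leading_zero, group_ref, hex_escape)
```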
	542
	543	<P>
	544
	545	<DIV CLASS="navigation">
	579	<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
	580	</DIV>
	581	<!--End of Navigation Panel-->
	582	<ADDRESS>
	583	See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
	584	</ADDRESS>
	585	</BODY>
	586	</HTML>