<!DOCTYPE html PUBLIC
"-//W3C//DTD HTML 4.0 Transitional//EN">
<link rel=
"STYLESHEET" href=
"lib.css" type='text/css'
/>
<link rel=
"SHORTCUT ICON" href=
"../icons/pyfav.png" type=
"image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index'
/>
<link rel=
"first" href=
"lib.html" title='Python Library Reference'
/>
<link rel='contents' href='contents.html'
title=
"Contents" />
<link rel='index' href='genindex.html' title='Index'
/>
<link rel='last' href='about.html' title='About this document...'
/>
<link rel='help' href='about.html' title='About this document...'
/>
<link rel=
"next" href=
"re-objects.html" />
<link rel=
"prev" href=
"matching-searching.html" />
<link rel=
"parent" href=
"module-re.html" />
<meta name='aesop' content='information'
/>
<title>4.2.3 Module Contents
</title>
<div id='top-navigation-panel' xml:id='top-navigation-panel'
>
<table align=
"center" width=
"100%" cellpadding=
"0" cellspacing=
"2">
<td class='online-navigation'
><a rel=
"prev" title=
"4.2.2 Matching vs Searching"
href=
"matching-searching.html"><img src='../icons/previous.png'
border='
0' height='
32' alt='Previous Page' width='
32'
/></A></td>
<td class='online-navigation'
><a rel=
"parent" title=
"4.2 re "
href=
"module-re.html"><img src='../icons/up.png'
border='
0' height='
32' alt='Up One Level' width='
32'
/></A></td>
<td class='online-navigation'
><a rel=
"next" title=
"4.2.4 Regular Expression Objects"
href=
"re-objects.html"><img src='../icons/next.png'
border='
0' height='
32' alt='Next Page' width='
32'
/></A></td>
<td align=
"center" width=
"100%">Python Library Reference
</td>
<td class='online-navigation'
><a rel=
"contents" title=
"Table of Contents"
href=
"contents.html"><img src='../icons/contents.png'
border='
0' height='
32' alt='Contents' width='
32'
/></A></td>
<td class='online-navigation'
><a href=
"modindex.html" title=
"Module Index"><img src='../icons/modules.png'
border='
0' height='
32' alt='Module Index' width='
32'
/></a></td>
<td class='online-navigation'
><a rel=
"index" title=
"Index"
href=
"genindex.html"><img src='../icons/index.png'
border='
0' height='
32' alt='Index' width='
32'
/></A></td>
<div class='online-navigation'
>
<b class=
"navlabel">Previous:
</b>
<a class=
"sectref" rel=
"prev" href=
"matching-searching.html">4.2.2 Matching vs Searching
</A>
<b class=
"navlabel">Up:
</b>
<a class=
"sectref" rel=
"parent" href=
"module-re.html">4.2 re
</A>
<b class=
"navlabel">Next:
</b>
<a class=
"sectref" rel=
"next" href=
"re-objects.html">4.2.4 Regular Expression Objects
</A>
<!--End of Navigation Panel-->
<H2><A NAME=
"SECTION006230000000000000000">
4.2.3 Module Contents
</A></H2>
<A NAME=
"Contents_of_Module_re"></A>
The module defines several functions, constants, and an exception. Some of the
functions are simplified versions of the full-featured methods for compiled
regular expressions. Most non-trivial applications always use the compiled
form.
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
869' xml:id='l2h-
869'
class=
"function">compile
</tt></b>(
</nobr></td>
<td><var>pattern
</var><big>[
</big><var>, flags
</var><big>]
</big><var></var>)
</td></tr></table></dt>
Compile a regular expression pattern into a regular expression
object, which can be used for matching using its
<tt class=
"function">match()
</tt> and
<tt class=
"function">search()
</tt> methods, described below.
The expression's behaviour can be modified by specifying a
<var>flags
</var> value. Values can be any of the following variables,
combined using bitwise OR (the
<code>|
</code> operator).
The sequence
<div class=
"verbatim"><pre>
prog = re.compile(pat)
result = prog.match(str)
</pre></div>
is equivalent to
<div class=
"verbatim"><pre>
result = re.match(pat, str)
</pre></div>
but the version using
<tt class=
"function">compile()
</tt> is more efficient when the
expression will be used several times in a single program.
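As a minimal sketch of the compile-once pattern, assuming a current Python 3 interpreter: the pattern, the sample strings, and the names prog, m1 and m2 are illustrative only and do not come from the reference text.

```python
import re

# Compile once; two flags are combined with bitwise OR.
prog = re.compile(r"""
    [a-z]+      # a run of letters (any case, because of IGNORECASE)
    \d*         # optional trailing digits
""", re.IGNORECASE | re.VERBOSE)

# The compiled object is then reused for several calls.
m1 = prog.match("Python24 rocks")   # anchored at the start of the string
m2 = prog.search("... Regex")       # scans forward for the first match

print(m1.group(0))
print(m2.group(0))
```

Reusing prog avoids passing the pattern and flags again on every call.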
<dl><dt><b><tt id='l2h-
870' xml:id='l2h-
870'
>I
</tt></b></dt>
<dt><b><tt id='l2h-
885' xml:id='l2h-
885'
>IGNORECASE
</tt></b></dt><dd>
Perform case-insensitive matching; expressions like
<tt class=
"regexp">[A-Z]
</tt>
will match lowercase letters, too. This is not affected by the
current locale.
<dl><dt><b><tt id='l2h-
871' xml:id='l2h-
871'
>L
</tt></b></dt>
<dt><b><tt id='l2h-
886' xml:id='l2h-
886'
>LOCALE
</tt></b></dt><dd>
Make
<tt class=
"regexp">\w
</tt>,
<tt class=
"regexp">\W
</tt>,
<tt class=
"regexp">\b
</tt>,
<tt class=
"regexp">\B
</tt>,
<tt class=
"regexp">\s
</tt> and
<tt class=
"regexp">\S
</tt> dependent on the current locale.
<dl><dt><b><tt id='l2h-
872' xml:id='l2h-
872'
>M
</tt></b></dt>
<dt><b><tt id='l2h-
887' xml:id='l2h-
887'
>MULTILINE
</tt></b></dt><dd>
When specified, the pattern character
"<tt class="character
">^</tt>"
matches at the beginning of the string and at the beginning of each
line (immediately following each newline); and the pattern character
"<tt class="character
">$</tt>" matches at the end of the string and at the end of each
line (immediately preceding each newline). By default,
"<tt class="character
">^</tt>" matches only at the beginning of the
string, and
"<tt class="character
">$</tt>" only at the end of the string and
immediately before the newline (if any) at the end of the string.
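The difference can be checked with a short sketch, assuming a current Python 3 interpreter; the sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line\n"

# Default: ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)

# MULTILINE: ^ also matches immediately after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Default: $ matches at the end of the string and just before
# a newline that ends the string.
default_ends = re.findall(r"\w+$", text)

# MULTILINE: $ also matches just before every newline.
multiline_ends = re.findall(r"\w+$", text, re.MULTILINE)

print(default_starts, multiline_starts)
print(default_ends, multiline_ends)
```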
<dl><dt><b><tt id='l2h-
873' xml:id='l2h-
873'
>S
</tt></b></dt>
<dt><b><tt id='l2h-
888' xml:id='l2h-
888'
>DOTALL
</tt></b></dt><dd>
Make the
"<tt class="character
">.</tt>" special character match any character at all,
including a newline; without this flag,
"<tt class="character
">.</tt>" will match
anything
<em>except
</em> a newline.
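A small sketch of the effect, assuming a current Python 3 interpreter; the sample string is illustrative.

```python
import re

text = "start\nend"

# Without DOTALL, "." refuses to match the newline, so the search fails.
without_flag = re.search(r"start.end", text)

# With DOTALL, "." matches the newline as well.
with_flag = re.search(r"start.end", text, re.DOTALL)

print(without_flag)
print(repr(with_flag.group(0)))
```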
<dl><dt><b><tt id='l2h-
874' xml:id='l2h-
874'
>U
</tt></b></dt>
<dt><b><tt id='l2h-
889' xml:id='l2h-
889'
>UNICODE
</tt></b></dt><dd>
Make
<tt class=
"regexp">\w
</tt>,
<tt class=
"regexp">\W
</tt>,
<tt class=
"regexp">\b
</tt>,
<tt class=
"regexp">\B
</tt>,
<tt class=
"regexp">\d
</tt>,
<tt class=
"regexp">\D
</tt>,
<tt class=
"regexp">\s
</tt> and
<tt class=
"regexp">\S
</tt>
dependent on the Unicode character properties database.
<span class=
"versionnote">New in version
2.0.
</span>
<dl><dt><b><tt id='l2h-
875' xml:id='l2h-
875'
>X
</tt></b></dt>
<dt><b><tt id='l2h-
890' xml:id='l2h-
890'
>VERBOSE
</tt></b></dt><dd>
This flag allows you to write regular expressions that look nicer.
Whitespace within the pattern is ignored,
except when in a character class or preceded by an unescaped
backslash, and, when a line contains a
"<tt class="character
">#</tt>" neither in a
character class nor preceded by an unescaped backslash, all characters
from the leftmost such
"<tt class="character
">#</tt>" through the end of the line are
ignored.
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
876' xml:id='l2h-
876'
class=
"function">search
</tt></b>(
</nobr></td>
<td><var>pattern, string
</var><big>[
</big><var>, flags
</var><big>]
</big><var></var>)
</td></tr></table></dt>
Scan through
<var>string
</var> looking for a location where the regular
expression
<var>pattern
</var> produces a match, and return a
corresponding
<tt class=
"class">MatchObject
</tt> instance.
Return
<code>None
</code> if no
position in the string matches the pattern; note that this is
different from finding a zero-length match at some point in the string.
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
877' xml:id='l2h-
877'
class=
"function">match
</tt></b>(
</nobr></td>
<td><var>pattern, string
</var><big>[
</big><var>, flags
</var><big>]
</big><var></var>)
</td></tr></table></dt>
If zero or more characters at the beginning of
<var>string
</var> match
the regular expression
<var>pattern
</var>, return a corresponding
<tt class=
"class">MatchObject
</tt> instance. Return
<code>None
</code> if the string does not
match the pattern; note that this is different from a zero-length
match.
<span class=
"note"><b class=
"label">Note:
</b>
If you want to locate a match anywhere in
<var>string
</var>, use
<tt class=
"method">search()
</tt> instead.
</span>
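The following sketch contrasts the two functions and shows the zero-length case, assuming a current Python 3 interpreter; the sample text and names are illustrative.

```python
import re

text = "say hello"

# match() only tries at the beginning of the string.
at_start = re.match(r"hello", text)

# search() scans forward and finds the word at index 4.
anywhere = re.search(r"hello", text)

# A pattern that can match nothing yields a zero-length match,
# which is a match object and not None.
empty = re.match(r"x*", text)

print(at_start)
print(anywhere.start())
print(repr(empty.group(0)))
```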
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
878' xml:id='l2h-
878'
class=
"function">split
</tt></b>(
</nobr></td>
<td><var>pattern, string
</var><big>[
</big><var>, maxsplit
<code> =
0</code></var><big>]
</big><var></var>)
</td></tr></table></dt>
Split
<var>string
</var> by the occurrences of
<var>pattern
</var>. If
capturing parentheses are used in
<var>pattern
</var>, then the text of all
groups in the pattern is also returned as part of the resulting list.
If
<var>maxsplit
</var> is nonzero, at most
<var>maxsplit
</var> splits
occur, and the remainder of the string is returned as the final
element of the list. (Incompatibility note: in the original Python
1.5 release,
<var>maxsplit
</var> was ignored. This has been fixed in
later releases.)
<div class=
"verbatim"><pre>
>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']
</pre></div>
This function combines and extends the functionality of
the old
<tt class=
"function">regsub.split()
</tt> and
<tt class=
"function">regsub.splitx()
</tt>.
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
879' xml:id='l2h-
879'
class=
"function">findall
</tt></b>(
</nobr></td>
<td><var>pattern, string
</var><big>[
</big><var>, flags
</var><big>]
</big><var></var>)
</td></tr></table></dt>
Return a list of all non-overlapping matches of
<var>pattern
</var> in
<var>string
</var>. If one or more groups are present in the pattern,
return a list of groups; this will be a list of tuples if the
pattern has more than one group. Empty matches are included in the
result unless they touch the beginning of another match.
<span class=
"versionnote">New in version
1.5.2.
</span>
<span class=
"versionnote">Changed in version
2.4:
Added the optional flags argument.
</span>
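How the shape of the result depends on the number of groups can be sketched as follows, assuming a current Python 3 interpreter; the sample text is made up.

```python
import re

text = "width=80 height=24"

# No groups: a list of the full matches.
whole = re.findall(r"\w+=\d+", text)

# One group: a list of that group's text.
one_group = re.findall(r"\w+=(\d+)", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(whole)
print(one_group)
print(two_groups)
```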
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
880' xml:id='l2h-
880'
class=
"function">finditer
</tt></b>(
</nobr></td>
<td><var>pattern, string
</var><big>[
</big><var>, flags
</var><big>]
</big><var></var>)
</td></tr></table></dt>
Return an iterator over all non-overlapping matches for the RE
<var>pattern
</var> in
<var>string
</var>. For each match, the iterator returns
a match object. Empty matches are included in the result unless they
touch the beginning of another match.
<span class=
"versionnote">New in version
2.2.
</span>
<span class=
"versionnote">Changed in version
2.4:
Added the optional flags argument.
</span>
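Because each item is a match object, positions are available as well as text. A minimal sketch, assuming a current Python 3 interpreter and an illustrative sample string:

```python
import re

text = "one two three"

# Collect the matched text together with its start and end offsets.
spans = [(m.group(0), m.start(), m.end())
         for m in re.finditer(r"\w+", text)]

print(spans)
```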
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
881' xml:id='l2h-
881'
class=
"function">sub
</tt></b>(
</nobr></td>
<td><var>pattern, repl, string
</var><big>[
</big><var>, count
</var><big>]
</big><var></var>)
</td></tr></table></dt>
Return the string obtained by replacing the leftmost non-overlapping
occurrences of
<var>pattern
</var> in
<var>string
</var> by the replacement
<var>repl
</var>. If the pattern isn't found,
<var>string
</var> is returned
unchanged.
<var>repl
</var> can be a string or a function; if it is a
string, any backslash escapes in it are processed. That is,
"<tt class="samp
">\n</tt>" is converted to a single newline character,
"<tt class="samp
">\r</tt>" is converted to a carriage return, and so forth. Unknown escapes such as
"<tt class="samp
">\j</tt>" are left alone. Backreferences, such as
"<tt class="samp
">\6</tt>", are
replaced with the substring matched by group
6 in the pattern. For example:
<div class=
"verbatim"><pre>
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'
</pre></div>
If
<var>repl
</var> is a function, it is called for every non-overlapping
occurrence of
<var>pattern
</var>. The function takes a single match
object argument, and returns the replacement string. For example:
<div class=
"verbatim"><pre>
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
</pre></div>
The pattern may be a string or an RE object; if you need to specify
regular expression flags, you must use an RE object, or use embedded
modifiers in a pattern; for example,
"<tt class="samp">sub("(?i)b+", "x", "bbbb BBBB")</tt>" returns
<code>'x x'
</code>.
The optional argument
<var>count
</var> is the maximum number of pattern
occurrences to be replaced;
<var>count
</var> must be a non-negative
integer. If omitted or zero, all occurrences will be replaced.
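The effect of count can be sketched as below, assuming a current Python 3 interpreter (where passing count by keyword is the safer spelling); the sample string is illustrative.

```python
import re

text = "a-b-c-d"

# Omitted or zero: every occurrence is replaced.
all_replaced = re.sub(r"-", "+", text)
also_all = re.sub(r"-", "+", text, count=0)

# A positive count limits the number of replacements, left to right.
first_two = re.sub(r"-", "+", text, count=2)

print(all_replaced)
print(first_two)
```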
Empty matches for the pattern are replaced only when not adjacent to
a previous match, so
"<tt class="samp
">sub('x*', '-', 'abc')</tt>" returns
<code>'-a-b-c-'
</code>.
In addition to character escapes and backreferences as described
above,
"<tt class="samp
">\g&lt;name&gt;</tt>" will use the substring matched by the group
named
"<tt class="samp
">name</tt>", as defined by the
<tt class=
"regexp">(?P&lt;name&gt;...)
</tt> syntax.
"<tt class="samp
">\g&lt;number&gt;</tt>" uses the corresponding group number;
"<tt class="samp
">\g&lt;2&gt;</tt>" is therefore equivalent to
"<tt class="samp
">\2</tt>", but isn't
ambiguous in a replacement such as
"<tt class="samp
">\g&lt;2&gt;0</tt>".
"<tt class="samp
">\20</tt>" would be interpreted as a reference to group
20, not a reference to
group
2 followed by the literal character
"<tt class="character
">0</tt>". The
backreference
"<tt class="samp
">\g&lt;0&gt;</tt>" substitutes in the entire substring
matched by the RE.
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
882' xml:id='l2h-
882'
class=
"function">subn
</tt></b>(
</nobr></td>
<td><var>pattern, repl, string
</var><big>[
</big><var>, count
</var><big>]
</big><var></var>)
</td></tr></table></dt>
Perform the same operation as
<tt class=
"function">sub()
</tt>, but return a tuple
<code>(
<var>new_string
</var>,
<var>number_of_subs_made
</var>)
</code>.
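Since subn() accepts the same replacement templates as sub(), named-group references work here too. A minimal sketch, assuming a current Python 3 interpreter; the pattern, the dates, and the variable names are invented for illustration.

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
template = r"\g<day>/\g<month>/\g<year>"
text = "released 2005-09-28, revised 2005-10-03"

# The result is a (new_string, number_of_subs_made) tuple.
new_string, number_of_subs = re.subn(pattern, template, text)

# With count=1 only the leftmost occurrence is rewritten.
limited = re.subn(pattern, template, text, count=1)

print(new_string, number_of_subs)
print(limited)
```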
<dl><dt><table cellpadding=
"0" cellspacing=
"0"><tr valign=
"baseline">
<td><nobr><b><tt id='l2h-
883' xml:id='l2h-
883'
class=
"function">escape
</tt></b>(
</nobr></td>
<td><var>string
</var>)
</td></tr></table></dt>
Return
<var>string
</var> with all non-alphanumerics backslashed; this is
useful if you want to match an arbitrary literal string that may have
regular expression metacharacters in it.
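A short sketch of why escaping matters, assuming a current Python 3 interpreter; the literal and the haystack are made up, and the exact escaped form is not shown because it can differ between Python versions.

```python
import re

literal = "1+1=2 (really?)"
haystack = "He said 1+1=2 (really?) twice"

# Used directly, "+", "(", ")" and "?" act as metacharacters,
# so the pattern does not match the literal text.
unescaped = re.search(literal, haystack)

# After escape(), the same text matches itself exactly.
escaped = re.search(re.escape(literal), haystack)

print(unescaped)
print(escaped.group(0))
```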
<dl><dt><b><span class=
"typelabel">exception
</span> <tt id='l2h-
884' xml:id='l2h-
884'
class=
"exception">error
</tt></b></dt>
Exception raised when a string passed to one of the functions here
is not a valid regular expression (for example, it might contain
unmatched parentheses) or when some other error occurs during
compilation or matching. It is never an error if a string contains
no match for a pattern.
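One way to use the exception is to validate a pattern before relying on it. This is only a sketch, assuming a current Python 3 interpreter; is_valid is a hypothetical helper, not part of the module.

```python
import re

def is_valid(pattern):
    """Return True if pattern compiles (hypothetical helper)."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

bad = is_valid("(unclosed")     # unmatched parenthesis raises re.error
good = is_valid("(closed)")

# A pattern that simply fails to match raises nothing; the result is None.
no_match = re.search("z", "abc")

print(bad, good, no_match)
```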
<span class=
"release-info">Release
2.4.2, documentation updated on
28 September
2005.
</span>
<!--End of Navigation Panel-->
See
<i><a href=
"about.html">About this document...
</a></i> for information on suggesting changes.