Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
2 | <html> | |
3 | <head> | |
4 | <link rel="STYLESHEET" href="lib.css" type='text/css' /> | |
5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> | |
6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> | |
7 | <link rel="first" href="lib.html" title='Python Library Reference' /> | |
8 | <link rel='contents' href='contents.html' title="Contents" /> | |
9 | <link rel='index' href='genindex.html' title='Index' /> | |
10 | <link rel='last' href='about.html' title='About this document...' /> | |
11 | <link rel='help' href='about.html' title='About this document...' /> | |
12 | <link rel="next" href="re-objects.html" /> | |
13 | <link rel="prev" href="matching-searching.html" /> | |
14 | <link rel="parent" href="module-re.html" /> | |
16 | <meta name='aesop' content='information' /> | |
17 | <title>4.2.3 Module Contents</title> | |
18 | </head> | |
19 | <body> | |
20 | <DIV CLASS="navigation"> | |
21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> | |
22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> | |
23 | <tr> | |
24 | <td class='online-navigation'><a rel="prev" title="4.2.2 Matching vs Searching" | |
25 | href="matching-searching.html"><img src='../icons/previous.png' | |
26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> | |
27 | <td class='online-navigation'><a rel="parent" title="4.2 re " | |
28 | href="module-re.html"><img src='../icons/up.png' | |
29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> | |
30 | <td class='online-navigation'><a rel="next" title="4.2.4 Regular Expression Objects" | |
31 | href="re-objects.html"><img src='../icons/next.png' | |
32 | border='0' height='32' alt='Next Page' width='32' /></A></td> | |
33 | <td align="center" width="100%">Python Library Reference</td> | |
34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" | |
35 | href="contents.html"><img src='../icons/contents.png' | |
36 | border='0' height='32' alt='Contents' width='32' /></A></td> | |
37 | <td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png' | |
38 | border='0' height='32' alt='Module Index' width='32' /></a></td> | |
39 | <td class='online-navigation'><a rel="index" title="Index" | |
40 | href="genindex.html"><img src='../icons/index.png' | |
41 | border='0' height='32' alt='Index' width='32' /></A></td> | |
42 | </tr></table> | |
43 | <div class='online-navigation'> | |
44 | <b class="navlabel">Previous:</b> | |
45 | <a class="sectref" rel="prev" href="matching-searching.html">4.2.2 Matching vs Searching</A> | |
46 | <b class="navlabel">Up:</b> | |
47 | <a class="sectref" rel="parent" href="module-re.html">4.2 re </A> | |
48 | <b class="navlabel">Next:</b> | |
49 | <a class="sectref" rel="next" href="re-objects.html">4.2.4 Regular Expression Objects</A> | |
50 | </div> | |
51 | <hr /></div> | |
52 | </DIV> | |
53 | <!--End of Navigation Panel--> | |
54 | ||
55 | <H2><A NAME="SECTION006230000000000000000"> | |
56 | 4.2.3 Module Contents</A> | |
57 | </H2> | |
58 | <A NAME="Contents_of_Module_re"></A> | |
59 | <P> | |
60 | The module defines several functions, constants, and an exception. Some of the | |
61 | functions are simplified versions of the full featured methods for compiled | |
62 | regular expressions. Most non-trivial applications use the compiled | |
63 | form. | |
64 | ||
65 | <P> | |
66 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
67 | <td><nobr><b><tt id='l2h-869' xml:id='l2h-869' class="function">compile</tt></b>(</nobr></td> | |
68 | <td><var>pattern</var><big>[</big><var>, flags</var><big>]</big><var></var>)</td></tr></table></dt> | |
69 | <dd> | |
70 | Compile a regular expression pattern into a regular expression | |
71 | object, which can be used for matching using its <tt class="function">match()</tt> and | |
72 | <tt class="function">search()</tt> methods, described below. | |
73 | ||
74 | <P> | |
75 | The expression's behaviour can be modified by specifying a | |
76 | <var>flags</var> value. Values can be any of the following variables, | |
77 | combined using bitwise OR (the <code>|</code> operator). | |
78 | ||
79 | <P> | |
80 | The sequence | |
81 | ||
82 | <P> | |
83 | <div class="verbatim"><pre> | |
84 | prog = re.compile(pat) | |
85 | result = prog.match(str) | |
86 | </pre></div> | |
87 | ||
88 | <P> | |
89 | is equivalent to | |
90 | ||
91 | <P> | |
92 | <div class="verbatim"><pre> | |
93 | result = re.match(pat, str) | |
94 | </pre></div> | |
95 | ||
96 | <P> | |
97 | but the version using <tt class="function">compile()</tt> is more efficient when the | |
98 | expression will be used several times in a single program. | |
99 | </dl> | |
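<P>
As a minimal sketch of combining flags with bitwise OR, the following compiles one pattern with both <code>IGNORECASE</code> and <code>MULTILINE</code> and reuses the compiled object. The pattern and the sample text are made up for illustration.

```python
import re

# Hypothetical pattern: a line that starts with "error" in any letter case.
prog = re.compile(r'^error\b', re.IGNORECASE | re.MULTILINE)

# Hypothetical sample text, used only for this sketch.
text = "ok\nError: disk full\nERROR: retry\n"

# The compiled object is reused for several calls without recompiling.
first = prog.search(text)   # first matching line start
hits = prog.findall(text)   # every matching line start
```

Here <code>first.group(0)</code> is <code>'Error'</code> and <code>hits</code> is <code>['Error', 'ERROR']</code>.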
100 | ||
101 | <P> | |
102 | <dl><dt><b><tt id='l2h-870' xml:id='l2h-870'>I</tt></b></dt> | |
103 | <dd> | |
104 | <dt><b><tt id='l2h-885' xml:id='l2h-885'>IGNORECASE</tt></b></dt><dd> | |
105 | Perform case-insensitive matching; expressions like <tt class="regexp">[A-Z]</tt> | |
106 | will match lowercase letters, too. This is not affected by the | |
107 | current locale. | |
108 | </dd></dl> | |
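<P>
A small sketch of the effect, using an invented sample string: the same character class is applied with and without the flag.

```python
import re

sample = 'Spam and EGGS'  # hypothetical input

# Without the flag, [A-Z] matches uppercase letters only.
strict = re.findall(r'[A-Z]+', sample)

# With re.I (the short name for re.IGNORECASE), lowercase letters match too.
relaxed = re.findall(r'[A-Z]+', sample, re.I)
```

<code>strict</code> is <code>['S', 'EGGS']</code>, while <code>relaxed</code> is <code>['Spam', 'and', 'EGGS']</code>.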
109 | ||
110 | <P> | |
111 | <dl><dt><b><tt id='l2h-871' xml:id='l2h-871'>L</tt></b></dt> | |
112 | <dd> | |
113 | <dt><b><tt id='l2h-886' xml:id='l2h-886'>LOCALE</tt></b></dt><dd> | |
114 | Make <tt class="regexp">\w</tt>, <tt class="regexp">\W</tt>, <tt class="regexp">\b</tt>, <tt class="regexp">\B</tt>, | |
115 | <tt class="regexp">\s</tt> and <tt class="regexp">\S</tt> dependent on the current locale. | |
116 | </dd></dl> | |
117 | ||
118 | <P> | |
119 | <dl><dt><b><tt id='l2h-872' xml:id='l2h-872'>M</tt></b></dt> | |
120 | <dd> | |
121 | <dt><b><tt id='l2h-887' xml:id='l2h-887'>MULTILINE</tt></b></dt><dd> | |
122 | When specified, the pattern character "<tt class="character">^</tt>" | |
123 | matches at the beginning of the string and at the beginning of each | |
124 | line (immediately following each newline); and the pattern character | |
125 | "<tt class="character">$</tt>" matches at the end of the string and at the end of each | |
126 | line (immediately preceding each newline). By default, | |
127 | "<tt class="character">^</tt>" matches only at the beginning of the | |
128 | string, and "<tt class="character">$</tt>" only at the end of the string and | |
129 | immediately before the newline (if any) at the end of the string. | |
130 | </dd></dl> | |
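<P>
The difference can be sketched with a three-line string (an invented example): the anchors are tried with the default behaviour and then with <code>re.M</code>.

```python
import re

text = "first\nsecond\nthird"  # hypothetical input

# Default: ^ matches only at the start of the string, $ only at its end.
default_starts = re.findall(r'^\w+', text)
default_ends = re.findall(r'\w+$', text)

# With re.M (re.MULTILINE), ^ and $ also match at every line boundary.
multiline_starts = re.findall(r'^\w+', text, re.M)
multiline_ends = re.findall(r'\w+$', text, re.M)
```

The default calls yield <code>['first']</code> and <code>['third']</code>; both multiline calls yield all three words.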
131 | ||
132 | <P> | |
133 | <dl><dt><b><tt id='l2h-873' xml:id='l2h-873'>S</tt></b></dt> | |
134 | <dd> | |
135 | <dt><b><tt id='l2h-888' xml:id='l2h-888'>DOTALL</tt></b></dt><dd> | |
136 | Make the "<tt class="character">.</tt>" special character match any character at all, | |
137 | including a newline; without this flag, "<tt class="character">.</tt>" will match | |
138 | anything <em>except</em> a newline. | |
139 | </dd></dl> | |
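<P>
A brief sketch with an invented two-line string shows where the dot stops in each mode.

```python
import re

text = "line one\nline two"  # hypothetical input

# Without the flag, '.' stops at the first newline.
without_flag = re.match(r'.*', text).group(0)

# With re.S (re.DOTALL), '.' also consumes the newline.
with_flag = re.match(r'.*', text, re.S).group(0)
```

<code>without_flag</code> is <code>'line one'</code>, and <code>with_flag</code> is the whole string.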
140 | ||
141 | <P> | |
142 | <dl><dt><b><tt id='l2h-874' xml:id='l2h-874'>U</tt></b></dt> | |
143 | <dd> | |
144 | <dt><b><tt id='l2h-889' xml:id='l2h-889'>UNICODE</tt></b></dt><dd> | |
145 | Make <tt class="regexp">\w</tt>, <tt class="regexp">\W</tt>, <tt class="regexp">\b</tt>, <tt class="regexp">\B</tt>, | |
146 | <tt class="regexp">\d</tt>, <tt class="regexp">\D</tt>, <tt class="regexp">\s</tt> and <tt class="regexp">\S</tt> | |
147 | dependent on the Unicode character properties database. | |
148 | ||
149 | <span class="versionnote">New in version 2.0.</span> | |
150 | ||
151 | </dd></dl> | |
152 | ||
153 | <P> | |
154 | <dl><dt><b><tt id='l2h-875' xml:id='l2h-875'>X</tt></b></dt> | |
155 | <dd> | |
156 | <dt><b><tt id='l2h-890' xml:id='l2h-890'>VERBOSE</tt></b></dt><dd> | |
157 | This flag allows you to write regular expressions that look nicer. | |
158 | Whitespace within the pattern is ignored, | |
159 | except when in a character class or preceded by an unescaped | |
160 | backslash, and, when a line contains a "<tt class="character">#</tt>" neither in a | |
161 | character class nor preceded by an unescaped backslash, all characters | |
162 | from the leftmost such "<tt class="character">#</tt>" through the end of the line are | |
163 | ignored. | |
164 | </dd></dl> | |
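<P>
As an illustration, the sketch below writes one hypothetical pattern (a simple decimal number) in verbose form and in compact form. Both are expected to match the same text.

```python
import re

# Verbose form: whitespace and '#' comments inside the pattern are ignored.
number = re.compile(r"""
    \d+    # integer part
    \.     # literal decimal point
    \d*    # optional fractional digits
""", re.X)

# Equivalent compact form of the same (hypothetical) pattern.
compact = re.compile(r"\d+\.\d*")

verbose_hit = number.match('3.14 rest').group(0)
compact_hit = compact.match('3.14 rest').group(0)
```

Both <code>verbose_hit</code> and <code>compact_hit</code> are <code>'3.14'</code>.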
165 | ||
166 | <P> | |
167 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
168 | <td><nobr><b><tt id='l2h-876' xml:id='l2h-876' class="function">search</tt></b>(</nobr></td> | |
169 | <td><var>pattern, string</var><big>[</big><var>, flags</var><big>]</big><var></var>)</td></tr></table></dt> | |
170 | <dd> | |
171 | Scan through <var>string</var> looking for a location where the regular | |
172 | expression <var>pattern</var> produces a match, and return a | |
173 | corresponding <tt class="class">MatchObject</tt> instance. | |
174 | Return <code>None</code> if no | |
175 | position in the string matches the pattern; note that this is | |
176 | different from finding a zero-length match at some point in the string. | |
177 | </dl> | |
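<P>
The distinction between "no match" and "zero-length match" can be sketched as follows; the sample string is invented.

```python
import re

sample = 'aabbcc'  # hypothetical input

# An ordinary match: span() reports where it was found.
found = re.search(r'b+', sample)

# 'x+' needs at least one 'x', so there is no match at all.
missing = re.search(r'x+', sample)

# 'x*' can match zero characters, so it matches (emptily) at position 0.
zero = re.search(r'x*', sample)
```

<code>found.span()</code> is <code>(2, 4)</code>, <code>missing</code> is <code>None</code>, and <code>zero</code> is a match object whose span is <code>(0, 0)</code>.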
178 | ||
179 | <P> | |
180 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
181 | <td><nobr><b><tt id='l2h-877' xml:id='l2h-877' class="function">match</tt></b>(</nobr></td> | |
182 | <td><var>pattern, string</var><big>[</big><var>, flags</var><big>]</big><var></var>)</td></tr></table></dt> | |
183 | <dd> | |
184 | If zero or more characters at the beginning of <var>string</var> match | |
185 | the regular expression <var>pattern</var>, return a corresponding | |
186 | <tt class="class">MatchObject</tt> instance. Return <code>None</code> if the string does not | |
187 | match the pattern; note that this is different from a zero-length | |
188 | match. | |
189 | ||
190 | <P> | |
191 | <span class="note"><b class="label">Note:</b> | |
192 | If you want to locate a match anywhere in | |
193 | <var>string</var>, use <tt class="method">search()</tt> instead.</span> | |
194 | </dl> | |
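<P>
A short sketch of the note above, with invented inputs: <tt class="function">match()</tt> only tries the start of the string, whereas <tt class="function">search()</tt> scans forward.

```python
import re

# Digits at the very beginning: match() succeeds.
at_start = re.match(r'\d+', '123abc')

# Digits later in the string: match() returns None ...
not_at_start = re.match(r'\d+', 'abc123')

# ... but search() still locates them.
anywhere = re.search(r'\d+', 'abc123')
```

<code>at_start.group(0)</code> and <code>anywhere.group(0)</code> are both <code>'123'</code>; <code>not_at_start</code> is <code>None</code>.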
195 | ||
196 | <P> | |
197 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
198 | <td><nobr><b><tt id='l2h-878' xml:id='l2h-878' class="function">split</tt></b>(</nobr></td> | |
199 | <td><var>pattern, string</var><big>[</big><var>, maxsplit<code> = 0</code></var><big>]</big><var></var>)</td></tr></table></dt> | |
200 | <dd> | |
201 | Split <var>string</var> by the occurrences of <var>pattern</var>. If | |
202 | capturing parentheses are used in <var>pattern</var>, then the text of all | |
203 | groups in the pattern is also returned as part of the resulting list. | |
204 | If <var>maxsplit</var> is nonzero, at most <var>maxsplit</var> splits | |
205 | occur, and the remainder of the string is returned as the final | |
206 | element of the list. (Incompatibility note: in the original Python | |
207 | 1.5 release, <var>maxsplit</var> was ignored. This has been fixed in | |
208 | later releases.) | |
209 | ||
210 | <P> | |
211 | <div class="verbatim"><pre> | |
212 | >>> re.split(r'\W+', 'Words, words, words.') | |
213 | ['Words', 'words', 'words', ''] | |
214 | >>> re.split(r'(\W+)', 'Words, words, words.') | |
215 | ['Words', ', ', 'words', ', ', 'words', '.', ''] | |
216 | >>> re.split(r'\W+', 'Words, words, words.', 1) | |
217 | ['Words', 'words, words.'] | |
218 | </pre></div> | |
219 | ||
220 | <P> | |
221 | This function combines and extends the functionality of | |
222 | the old <tt class="function">regsub.split()</tt> and <tt class="function">regsub.splitx()</tt>. | |
223 | </dl> | |
224 | ||
225 | <P> | |
226 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
227 | <td><nobr><b><tt id='l2h-879' xml:id='l2h-879' class="function">findall</tt></b>(</nobr></td> | |
228 | <td><var>pattern, string</var><big>[</big><var>, flags</var><big>]</big><var></var>)</td></tr></table></dt> | |
229 | <dd> | |
230 | Return a list of all non-overlapping matches of <var>pattern</var> in | |
231 | <var>string</var>. If one or more groups are present in the pattern, | |
232 | return a list of groups; this will be a list of tuples if the | |
233 | pattern has more than one group. Empty matches are included in the | |
234 | result unless they touch the beginning of another match. | |
235 | ||
236 | <span class="versionnote">New in version 1.5.2.</span> | |
237 | ||
238 | <span class="versionnote">Changed in version 2.4: | |
239 | Added the optional flags argument.</span> | |
240 | ||
241 | </dl> | |
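<P>
How the shape of the result depends on the number of groups can be sketched like this; the input string and patterns are invented for illustration.

```python
import re

text = 'x=1, y=22, z=333'  # hypothetical input

# No groups: a list of whole matches.
whole = re.findall(r'\w=\d+', text)

# One group: a list of that group's text.
one_group = re.findall(r'\w=(\d+)', text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'(\w)=(\d+)', text)
```

<code>whole</code> is <code>['x=1', 'y=22', 'z=333']</code>, <code>one_group</code> is <code>['1', '22', '333']</code>, and <code>two_groups</code> is <code>[('x', '1'), ('y', '22'), ('z', '333')]</code>.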
242 | ||
243 | <P> | |
244 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
245 | <td><nobr><b><tt id='l2h-880' xml:id='l2h-880' class="function">finditer</tt></b>(</nobr></td> | |
246 | <td><var>pattern, string</var><big>[</big><var>, flags</var><big>]</big><var></var>)</td></tr></table></dt> | |
247 | <dd> | |
248 | Return an iterator over all non-overlapping matches for the RE | |
249 | <var>pattern</var> in <var>string</var>. For each match, the iterator returns | |
250 | a match object. Empty matches are included in the result unless they | |
251 | touch the beginning of another match. | |
252 | ||
253 | <span class="versionnote">New in version 2.2.</span> | |
254 | ||
255 | <span class="versionnote">Changed in version 2.4: | |
256 | Added the optional flags argument.</span> | |
257 | ||
258 | </dl> | |
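<P>
Because each item is a match object, positions are available as well as text. A minimal sketch with an invented input:

```python
import re

sample = 'a1bb22ccc333'  # hypothetical input

# Collect the matched text together with its start and end offsets.
spans = [(m.group(0), m.start(), m.end())
         for m in re.finditer(r'\d+', sample)]
```

<code>spans</code> is <code>[('1', 1, 2), ('22', 4, 6), ('333', 9, 12)]</code>.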
259 | ||
260 | <P> | |
261 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
262 | <td><nobr><b><tt id='l2h-881' xml:id='l2h-881' class="function">sub</tt></b>(</nobr></td> | |
263 | <td><var>pattern, repl, string</var><big>[</big><var>, count</var><big>]</big><var></var>)</td></tr></table></dt> | |
264 | <dd> | |
265 | Return the string obtained by replacing the leftmost non-overlapping | |
266 | occurrences of <var>pattern</var> in <var>string</var> by the replacement | |
267 | <var>repl</var>. If the pattern isn't found, <var>string</var> is returned | |
268 | unchanged. <var>repl</var> can be a string or a function; if it is a | |
269 | string, any backslash escapes in it are processed. That is, | |
270 | "<tt class="samp">\n</tt>" is converted to a single newline character, "<tt class="samp">\r</tt>" is converted to a carriage return, and so forth. Unknown escapes such as | |
271 | "<tt class="samp">\j</tt>" are left alone. Backreferences, such as "<tt class="samp">\6</tt>", are | |
272 | replaced with the substring matched by group 6 in the pattern. For | |
273 | example: | |
274 | ||
275 | <P> | |
276 | <div class="verbatim"><pre> | |
277 | >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):', | |
278 | ... r'static PyObject*\npy_\1(void)\n{', | |
279 | ... 'def myfunc():') | |
280 | 'static PyObject*\npy_myfunc(void)\n{' | |
281 | </pre></div> | |
282 | ||
283 | <P> | |
284 | If <var>repl</var> is a function, it is called for every non-overlapping | |
285 | occurrence of <var>pattern</var>. The function takes a single match | |
286 | object argument, and returns the replacement string. For example: | |
287 | ||
288 | <P> | |
289 | <div class="verbatim"><pre> | |
290 | >>> def dashrepl(matchobj): | |
291 | ... if matchobj.group(0) == '-': return ' ' | |
292 | ... else: return '-' | |
293 | >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files') | |
294 | 'pro--gram files' | |
295 | </pre></div> | |
296 | ||
297 | <P> | |
298 | The pattern may be a string or an RE object; if you need to specify | |
299 | regular expression flags, you must use an RE object, or use embedded | |
300 | modifiers in a pattern; for example, "<tt class="samp">sub("(?i)b+", "x", "bbbb | |
301 | BBBB")</tt>" returns <code>'x x'</code>. | |
302 | ||
303 | <P> | |
304 | The optional argument <var>count</var> is the maximum number of pattern | |
305 | occurrences to be replaced; <var>count</var> must be a non-negative | |
306 | integer. If omitted or zero, all occurrences will be replaced. | |
307 | Empty matches for the pattern are replaced only when not adjacent to | |
308 | a previous match, so "<tt class="samp">sub('x*', '-', 'abc')</tt>" returns | |
309 | <code>'-a-b-c-'</code>. | |
310 | ||
311 | <P> | |
312 | In addition to character escapes and backreferences as described | |
313 | above, "<tt class="samp">\g<name></tt>" will use the substring matched by the group | |
314 | named "<tt class="samp">name</tt>", as defined by the <tt class="regexp">(?P<name>...)</tt> syntax. | |
315 | "<tt class="samp">\g<number></tt>" uses the corresponding group number; | |
316 | "<tt class="samp">\g<2></tt>" is therefore equivalent to "<tt class="samp">\2</tt>", but isn't | |
317 | ambiguous in a replacement such as "<tt class="samp">\g<2>0</tt>". "<tt class="samp">\20</tt>" would be interpreted as a reference to group 20, not a reference to | |
318 | group 2 followed by the literal character "<tt class="character">0</tt>". The | |
319 | backreference "<tt class="samp">\g<0></tt>" substitutes in the entire substring | |
320 | matched by the RE. | |
321 | </dl> | |
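<P>
The named and numbered group references and the <var>count</var> argument can be sketched as follows. All patterns and inputs here are invented for illustration.

```python
import re

# Named groups referenced from the replacement string.
pattern = r'(?P<first>\w+) (?P<last>\w+)'   # hypothetical pattern
swapped = re.sub(pattern, r'\g<last>, \g<first>', 'Ada Lovelace')

# \g<1>0 means "group 1, then a literal 0"; it is not read as group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')

# count limits the number of replacements, working from the left.
limited = re.sub(r'-', '+', 'a-b-c', count=1)
```

<code>swapped</code> is <code>'Lovelace, Ada'</code>, <code>padded</code> is <code>'a10b20'</code>, and <code>limited</code> is <code>'a+b-c'</code>.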
322 | ||
323 | <P> | |
324 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
325 | <td><nobr><b><tt id='l2h-882' xml:id='l2h-882' class="function">subn</tt></b>(</nobr></td> | |
326 | <td><var>pattern, repl, string</var><big>[</big><var>, count</var><big>]</big><var></var>)</td></tr></table></dt> | |
327 | <dd> | |
328 | Perform the same operation as <tt class="function">sub()</tt>, but return a tuple | |
329 | <code>(<var>new_string</var>, <var>number_of_subs_made</var>)</code>. | |
330 | </dl> | |
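<P>
A minimal sketch, using an invented input, that collapses runs of whitespace and reports how many runs were replaced:

```python
import re

# The tuple unpacks into the new string and the number of substitutions.
new_string, number_of_subs_made = re.subn(r'\s+', ' ', 'a  b \t c')
```

<code>new_string</code> is <code>'a b c'</code> and <code>number_of_subs_made</code> is <code>2</code>.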
331 | ||
332 | <P> | |
333 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
334 | <td><nobr><b><tt id='l2h-883' xml:id='l2h-883' class="function">escape</tt></b>(</nobr></td> | |
335 | <td><var>string</var>)</td></tr></table></dt> | |
336 | <dd> | |
337 | Return <var>string</var> with all non-alphanumerics backslashed; this is | |
338 | useful if you want to match an arbitrary literal string that may have | |
339 | regular expression metacharacters in it. | |
340 | </dl> | |
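<P>
The sketch below, with an invented literal, shows why escaping matters. It checks matching behaviour and not the escaped text itself, since exactly which characters get a backslash has varied between Python versions.

```python
import re

literal = 'price: $5.00 (approx.)'            # hypothetical literal text
haystack = 'The price: $5.00 (approx.) today'  # hypothetical input

# Escaped: every metacharacter is treated as an ordinary character.
escaped_hit = re.search(re.escape(literal), haystack)

# Unescaped: '$' is an end-of-string anchor here, so nothing matches.
raw_hit = re.search(literal, haystack)
```

<code>escaped_hit.group(0)</code> equals <code>literal</code>, while <code>raw_hit</code> is <code>None</code>.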
341 | ||
342 | <P> | |
343 | <dl><dt><b><span class="typelabel">exception</span> <tt id='l2h-884' xml:id='l2h-884' class="exception">error</tt></b></dt> | |
344 | <dd> | |
345 | Exception raised when a string passed to one of the functions here | |
346 | is not a valid regular expression (for example, it might contain | |
347 | unmatched parentheses) or when some other error occurs during | |
348 | compilation or matching. It is never an error if a string contains | |
349 | no match for a pattern. | |
350 | </dd></dl> | |
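<P>
A small sketch of both cases described above: an invalid pattern (here, one with an unmatched parenthesis) raises the exception, whereas a valid pattern that finds nothing simply returns <code>None</code>.

```python
import re

# An unmatched '(' makes the pattern invalid, so compile() raises re.error.
try:
    re.compile('(unclosed')
    compiled_ok = True
except re.error:
    compiled_ok = False

# A valid pattern with no match is not an error; the result is just None.
no_match = re.search('x', 'abc')
```

<code>compiled_ok</code> ends up <code>False</code>, and <code>no_match</code> is <code>None</code>.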
351 | ||
352 | <P> | |
353 | ||
354 | <DIV CLASS="navigation"> | |
387 | <hr /> | |
388 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> | |
389 | </DIV> | |
390 | <!--End of Navigation Panel--> | |
391 | <ADDRESS> | |
392 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. | |
393 | </ADDRESS> | |
394 | </BODY> | |
395 | </HTML> |