1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2<html>
3<head>
4<link rel="STYLESHEET" href="lib.css" type='text/css' />
5<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
6<link rel='start' href='../index.html' title='Python Documentation Index' />
7<link rel="first" href="lib.html" title='Python Library Reference' />
8<link rel='contents' href='contents.html' title="Contents" />
9<link rel='index' href='genindex.html' title='Index' />
10<link rel='last' href='about.html' title='About this document...' />
11<link rel='help' href='about.html' title='About this document...' />
12<link rel="next" href="module-fpformat.html" />
13<link rel="prev" href="module-struct.html" />
14<link rel="parent" href="strings.html" />
15<link rel="next" href="sequence-matcher.html" />
16<meta name='aesop' content='information' />
17<title>4.4 difflib -- Helpers for computing deltas</title>
18</head>
19<body>
20<DIV CLASS="navigation">
21<div id='top-navigation-panel' xml:id='top-navigation-panel'>
22<table align="center" width="100%" cellpadding="0" cellspacing="2">
23<tr>
24<td class='online-navigation'><a rel="prev" title="4.3 struct "
25 href="module-struct.html"><img src='../icons/previous.png'
26 border='0' height='32' alt='Previous Page' width='32' /></A></td>
27<td class='online-navigation'><a rel="parent" title="4. String Services"
28 href="strings.html"><img src='../icons/up.png'
29 border='0' height='32' alt='Up One Level' width='32' /></A></td>
30<td class='online-navigation'><a rel="next" title="4.4.1 SequenceMatcher Objects"
31 href="sequence-matcher.html"><img src='../icons/next.png'
32 border='0' height='32' alt='Next Page' width='32' /></A></td>
33<td align="center" width="100%">Python Library Reference</td>
34<td class='online-navigation'><a rel="contents" title="Table of Contents"
35 href="contents.html"><img src='../icons/contents.png'
36 border='0' height='32' alt='Contents' width='32' /></A></td>
37<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
38 border='0' height='32' alt='Module Index' width='32' /></a></td>
39<td class='online-navigation'><a rel="index" title="Index"
40 href="genindex.html"><img src='../icons/index.png'
41 border='0' height='32' alt='Index' width='32' /></A></td>
42</tr></table>
43<div class='online-navigation'>
44<b class="navlabel">Previous:</b>
45<a class="sectref" rel="prev" href="module-struct.html">4.3 struct </A>
46<b class="navlabel">Up:</b>
47<a class="sectref" rel="parent" href="strings.html">4. String Services</A>
48<b class="navlabel">Next:</b>
49<a class="sectref" rel="next" href="sequence-matcher.html">4.4.1 SequenceMatcher Objects</A>
50</div>
51<hr /></div>
52</DIV>
53<!--End of Navigation Panel-->
54
55<H1><A NAME="SECTION006400000000000000000">
564.4 <tt class="module">difflib</tt> --
57 Helpers for computing deltas</A>
58</H1>
59
60<P>
61<A NAME="module-difflib"></A>
62
63<P>
64
65<span class="versionnote">New in version 2.1.</span>
66
67<P>
68<dl><dt><b><span class="typelabel">class</span>&nbsp;<tt id='l2h-923' xml:id='l2h-923' class="class">SequenceMatcher</tt></b>
69<dd>
70 This is a flexible class for comparing pairs of sequences of any
71 type, so long as the sequence elements are hashable. The basic
72 algorithm predates, and is a little fancier than, an algorithm
73 published in the late 1980's by Ratcliff and Obershelp under the
74 hyperbolic name ``gestalt pattern matching.'' The idea is to find
75 the longest contiguous matching subsequence that contains no
76 ``junk'' elements (the Ratcliff and Obershelp algorithm doesn't
77 address junk). The same idea is then applied recursively to the
78 pieces of the sequences to the left and to the right of the matching
79 subsequence. This does not yield minimal edit sequences, but does
80 tend to yield matches that ``look right'' to people.
81
82<P>
83<strong>Timing:</strong> The basic Ratcliff-Obershelp algorithm is cubic
84 time in the worst case and quadratic time in the expected case.
85 <tt class="class">SequenceMatcher</tt> is quadratic time for the worst case and has
86 expected-case behavior dependent in a complicated way on how many
87 elements the sequences have in common; best case time is linear.
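<P>
The match-then-recurse idea described above can be observed on a pair of short strings. This is only an illustrative sketch: the sample strings are arbitrary, and passing <code>None</code> as the first argument is an assumption meaning that no element is treated as junk. It uses the call form of <code>print</code> so that it also runs on later Python versions.

```python
from difflib import SequenceMatcher

a = "abxcd"
b = "abcd"

# First argument is the junk predicate; None means nothing is junk.
matcher = SequenceMatcher(None, a, b)

# Longest contiguous block common to all of a and all of b.
# When several blocks tie, the one starting earliest in a is returned.
i, j, size = matcher.find_longest_match(0, len(a), 0, len(b))
longest = a[i:i + size]

# Every matching block, found by repeating the search to the left and
# right of each match. The final entry is a zero-length sentinel.
blocks = matcher.get_matching_blocks()
matched = sum([n for (_, _, n) in blocks])

# ratio() is 2 * matched / (len(a) + len(b)), a float in [0, 1].
similarity = matcher.ratio()

print(longest)
print(matched)
```

Here the two blocks <code>ab</code> and <code>cd</code> are matched around the unmatched <code>x</code>, so four of the nine elements in total are paired up on each side.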
88</dl>
89
90<P>
91<dl><dt><b><span class="typelabel">class</span>&nbsp;<tt id='l2h-924' xml:id='l2h-924' class="class">Differ</tt></b>
92<dd>
93 This is a class for comparing sequences of lines of text, and
94 producing human-readable differences or deltas. Differ uses
95 <tt class="class">SequenceMatcher</tt> both to compare sequences of lines, and to
96 compare sequences of characters within similar (near-matching)
97 lines.
98
99<P>
100Each line of a <tt class="class">Differ</tt> delta begins with a two-letter code:
101
102<P>
103<div class="center"><table class="realtable">
104 <thead>
105 <tr>
106 <th class="left" >Code</th>
107 <th class="left" >Meaning</th>
108 </tr>
109 </thead>
110 <tbody>
111 <tr><td class="left" valign="baseline"><code>'- '</code></td>
112 <td class="left" >line unique to sequence 1</td></tr>
113 <tr><td class="left" valign="baseline"><code>'+ '</code></td>
114 <td class="left" >line unique to sequence 2</td></tr>
 <tr><td class="left" valign="baseline"><code>'&nbsp;&nbsp;'</code></td>
116 <td class="left" >line common to both sequences</td></tr>
117 <tr><td class="left" valign="baseline"><code>'? '</code></td>
118 <td class="left" >line not present in either input sequence</td></tr></tbody>
119</table></div>
120
121<P>
122Lines beginning with `<code>?&nbsp;</code>' attempt to guide the eye to
123 intraline differences, and were not present in either input
124 sequence. These lines can be confusing if the sequences contain tab
125 characters.
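<P>
As a rough illustration of the codes in the table, the sketch below compares two small lists of lines and collects the two-character prefix of each delta line. The input lines are made up for the example, and the exact placement of any `<code>?&nbsp;</code>' guide lines is left to <tt class="class">Differ</tt>.

```python
from difflib import Differ

old = ["one\n", "two\n", "three\n"]
new = ["one\n", "tree\n", "emu\n"]

# compare() yields delta lines lazily; list() materializes them.
delta = list(Differ().compare(old, new))

# The first two characters of each delta line are its code.
codes = [line[:2] for line in delta]

# Dropping the prefix from '- ' and '  ' lines recovers sequence 1;
# doing the same with '+ ' and '  ' lines recovers sequence 2.
from_old = [line[2:] for line in delta if line[:2] in ("- ", "  ")]
from_new = [line[2:] for line in delta if line[:2] in ("+ ", "  ")]

print("".join(delta))
```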
126</dl>
127
128<P>
129<dl><dt><b><span class="typelabel">class</span>&nbsp;<tt id='l2h-925' xml:id='l2h-925' class="class">HtmlDiff</tt></b>
130<dd>
131
132<P>
133This class can be used to create an HTML table (or a complete HTML file
134 containing the table) showing a side by side, line by line comparison
135 of text with inter-line and intra-line change highlights. The table can
136 be generated in either full or contextual difference mode.
137
138<P>
139The constructor for this class is:
140
141<P>
142<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
143 <td><nobr><b><tt id='l2h-926' xml:id='l2h-926' class="function">__init__</tt></b>(</nobr></td>
144 <td><var></var><big>[</big><var>tabsize</var><big>]</big><var></var><big>[</big><var>,
145 wrapcolumn</var><big>]</big><var></var><big>[</big><var>, linejunk</var><big>]</big><var></var><big>[</big><var>, charjunk</var><big>]</big><var></var>)</td></tr></table></dt>
146<dd>
147
148<P>
Initializes an instance of <tt class="class">HtmlDiff</tt>.
150
151<P>
152<var>tabsize</var> is an optional keyword argument to specify tab stop spacing
153 and defaults to <code>8</code>.
154
155<P>
<var>wrapcolumn</var> is an optional keyword argument specifying the column at
 which lines are broken and wrapped. It defaults to <code>None</code>, in which
 case lines are not wrapped.
159
160<P>
161<var>linejunk</var> and <var>charjunk</var> are optional keyword arguments passed
162 into <code>ndiff()</code> (used by <tt class="class">HtmlDiff</tt> to generate the
163 side by side HTML differences). See <code>ndiff()</code> documentation for
164 argument default values and descriptions.
165
166<P>
167</dl>
168
169<P>
170The following methods are public:
171
172<P>
173<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
174 <td><nobr><b><tt id='l2h-927' xml:id='l2h-927' class="function">make_file</tt></b>(</nobr></td>
175 <td><var>fromlines, tolines
176 </var><big>[</big><var>, fromdesc</var><big>]</big><var></var><big>[</big><var>, todesc</var><big>]</big><var></var><big>[</big><var>, context</var><big>]</big><var></var><big>[</big><var>,
177 numlines</var><big>]</big><var></var>)</td></tr></table></dt>
178<dd>
179 Compares <var>fromlines</var> and <var>tolines</var> (lists of strings) and returns
180 a string which is a complete HTML file containing a table showing line by
181 line differences with inter-line and intra-line changes highlighted.
182
183<P>
184<var>fromdesc</var> and <var>todesc</var> are optional keyword arguments to specify
185 from/to file column header strings (both default to an empty string).
186
187<P>
188<var>context</var> and <var>numlines</var> are both optional keyword arguments.
189 Set <var>context</var> to <code>True</code> when contextual differences are to be
190 shown, else the default is <code>False</code> to show the full files.
191 <var>numlines</var> defaults to <code>5</code>. When <var>context</var> is <code>True</code>
192 <var>numlines</var> controls the number of context lines which surround the
193 difference highlights. When <var>context</var> is <code>False</code> <var>numlines</var>
194 controls the number of lines which are shown before a difference
195 highlight when using the "next" hyperlinks (setting to zero would cause
196 the "next" hyperlinks to place the next difference highlight at the top of
197 the browser without any leading context).
198 </dl>
199
200<P>
201<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
202 <td><nobr><b><tt id='l2h-928' xml:id='l2h-928' class="function">make_table</tt></b>(</nobr></td>
203 <td><var>fromlines, tolines
204 </var><big>[</big><var>, fromdesc</var><big>]</big><var></var><big>[</big><var>, todesc</var><big>]</big><var></var><big>[</big><var>, context</var><big>]</big><var></var><big>[</big><var>,
205 numlines</var><big>]</big><var></var>)</td></tr></table></dt>
206<dd>
207 Compares <var>fromlines</var> and <var>tolines</var> (lists of strings) and returns
208 a string which is a complete HTML table showing line by line differences
209 with inter-line and intra-line changes highlighted.
210
211<P>
212The arguments for this method are the same as those for the
213 <tt class="method">make_file()</tt> method.
214 </dl>
215
216<P>
217<span class="file">Tools/scripts/diff.py</span> is a command-line front-end to this class
218 and contains a good example of its use.
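<P>
A minimal sketch of both public methods follows. The sample lines and the column descriptions <code>"before"</code> and <code>"after"</code> are assumptions chosen for illustration, as are the <var>tabsize</var> and <var>wrapcolumn</var> values.

```python
from difflib import HtmlDiff

fromlines = ["alpha\n", "beta\n", "gamma\n"]
tolines = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# Tabs expand to 4 columns; lines longer than 60 columns are wrapped.
differ = HtmlDiff(tabsize=4, wrapcolumn=60)

# Only the table markup, suitable for embedding in an existing page.
table = differ.make_table(fromlines, tolines,
                          fromdesc="before", todesc="after")

# A complete HTML document. With context=True only changed regions
# are shown, each surrounded by numlines lines of context.
page = differ.make_file(fromlines, tolines,
                        fromdesc="before", todesc="after",
                        context=True, numlines=1)
```

<tt class="method">make_table()</tt> is the natural choice when the surrounding page supplies its own markup and styling; <tt class="method">make_file()</tt> returns something that can be written straight to disk and opened in a browser.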
219
220<P>
221
222<span class="versionnote">New in version 2.4.</span>
223
224</dl>
225
226<P>
227<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
228 <td><nobr><b><tt id='l2h-929' xml:id='l2h-929' class="function">context_diff</tt></b>(</nobr></td>
229 <td><var>a, b</var><big>[</big><var>, fromfile</var><big>]</big><var></var><big>[</big><var>,
230 tofile</var><big>]</big><var></var><big>[</big><var>, fromfiledate</var><big>]</big><var></var><big>[</big><var>, tofiledate</var><big>]</big><var></var><big>[</big><var>,
231 n</var><big>]</big><var></var><big>[</big><var>, lineterm</var><big>]</big><var></var>)</td></tr></table></dt>
232<dd>
233 Compare <var>a</var> and <var>b</var> (lists of strings); return a
234 delta (a generator generating the delta lines) in context diff
235 format.
236
237<P>
238Context diffs are a compact way of showing just the lines that have
239 changed plus a few lines of context. The changes are shown in a
240 before/after style. The number of context lines is set by <var>n</var>
241 which defaults to three.
242
243<P>
244By default, the diff control lines (those with <code>***</code> or <code>-&#45;-</code>)
245 are created with a trailing newline. This is helpful so that inputs created
246 from <tt class="function">file.readlines()</tt> result in diffs that are suitable for use
247 with <tt class="function">file.writelines()</tt> since both the inputs and outputs have
248 trailing newlines.
249
250<P>
251For inputs that do not have trailing newlines, set the <var>lineterm</var>
252 argument to <code>""</code> so that the output will be uniformly newline free.
253
254<P>
255The context diff format normally has a header for filenames and
256 modification times. Any or all of these may be specified using strings for
257 <var>fromfile</var>, <var>tofile</var>, <var>fromfiledate</var>, and <var>tofiledate</var>.
258 The modification times are normally expressed in the format returned by
259 <tt class="function">time.ctime()</tt>. If not specified, the strings default to blanks.
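<P>
The following sketch shows the default behaviour with newline-terminated input. The line contents and the header names <code>before.py</code> and <code>after.py</code> are placeholders, not real files.

```python
import sys
from difflib import context_diff

old = ["bacon\n", "eggs\n", "ham\n", "guido\n"]
new = ["python\n", "eggy\n", "hamster\n", "guido\n"]

# The generator is materialized so it can be both inspected and written.
delta = list(context_diff(old, new,
                          fromfile="before.py", tofile="after.py"))

# Input lines and control lines all end in a newline, so the delta
# can be passed to writelines() unchanged.
sys.stdout.writelines(delta)
```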
260
261<P>
262<span class="file">Tools/scripts/diff.py</span> is a command-line front-end for this
263 function.
264
265<P>
266
267<span class="versionnote">New in version 2.3.</span>
268
269</dl>
270
271<P>
272<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
273 <td><nobr><b><tt id='l2h-930' xml:id='l2h-930' class="function">get_close_matches</tt></b>(</nobr></td>
274 <td><var>word, possibilities</var><big>[</big><var>,
275 n</var><big>]</big><var></var><big>[</big><var>, cutoff</var><big>]</big><var></var>)</td></tr></table></dt>
276<dd>
277 Return a list of the best ``good enough'' matches. <var>word</var> is a
278 sequence for which close matches are desired (typically a string),
279 and <var>possibilities</var> is a list of sequences against which to
280 match <var>word</var> (typically a list of strings).
281
282<P>
283Optional argument <var>n</var> (default <code>3</code>) is the maximum number
284 of close matches to return; <var>n</var> must be greater than <code>0</code>.
285
286<P>
287Optional argument <var>cutoff</var> (default <code>0.6</code>) is a float in
288 the range [0, 1]. Possibilities that don't score at least that
289 similar to <var>word</var> are ignored.
290
291<P>
292The best (no more than <var>n</var>) matches among the possibilities are
293 returned in a list, sorted by similarity score, most similar first.
294
295<P>
296<div class="verbatim"><pre>
&gt;&gt;&gt; get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
&gt;&gt;&gt; import keyword
&gt;&gt;&gt; get_close_matches('wheel', keyword.kwlist)
['while']
&gt;&gt;&gt; get_close_matches('apple', keyword.kwlist)
[]
&gt;&gt;&gt; get_close_matches('accept', keyword.kwlist)
['except']
306</pre></div>
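<P>
The effect of the optional arguments can be sketched with the same word list as above. The particular values of <var>n</var> and <var>cutoff</var> are chosen only for illustration.

```python
from difflib import get_close_matches

words = ["ape", "apple", "peach", "puppy"]

# n=1 keeps only the single most similar candidate.
best_only = get_close_matches("appel", words, n=1)

# A cutoff of 0.9 is stricter than the default 0.6; no candidate in
# this list is that similar to 'appel', so the result is empty.
strict = get_close_matches("appel", words, cutoff=0.9)

print(best_only)
print(strict)
```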
307</dl>
308
309<P>
310<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
311 <td><nobr><b><tt id='l2h-931' xml:id='l2h-931' class="function">ndiff</tt></b>(</nobr></td>
312 <td><var>a, b</var><big>[</big><var>, linejunk</var><big>]</big><var></var><big>[</big><var>, charjunk</var><big>]</big><var></var>)</td></tr></table></dt>
313<dd>
314 Compare <var>a</var> and <var>b</var> (lists of strings); return a
315 <tt class="class">Differ</tt>-style delta (a generator generating the delta lines).
316
317<P>
318Optional keyword parameters <var>linejunk</var> and <var>charjunk</var> are
319 for filter functions (or <code>None</code>):
320
321<P>
322<var>linejunk</var>: A function that accepts a single string
323 argument, and returns true if the string is junk, or false if not.
 The default is <code>None</code>, starting with Python 2.3. Before then,
325 the default was the module-level function
326 <tt class="function">IS_LINE_JUNK()</tt>, which filters out lines without visible
327 characters, except for at most one pound character ("<tt class="character">#</tt>").
328 As of Python 2.3, the underlying <tt class="class">SequenceMatcher</tt> class
329 does a dynamic analysis of which lines are so frequent as to
330 constitute noise, and this usually works better than the pre-2.3
331 default.
332
333<P>
334<var>charjunk</var>: A function that accepts a character (a string of
 length 1), and returns true if the character is junk, or false if not.
336 The default is module-level function <tt class="function">IS_CHARACTER_JUNK()</tt>,
337 which filters out whitespace characters (a blank or tab; note: bad
338 idea to include newline in this!).
339
340<P>
341<span class="file">Tools/scripts/ndiff.py</span> is a command-line front-end to this
342 function.
343
344<P>
345<div class="verbatim"><pre>
&gt;&gt;&gt; diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
&gt;&gt;&gt; print ''.join(diff),
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
358</pre></div>
359</dl>
360
361<P>
362<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
363 <td><nobr><b><tt id='l2h-932' xml:id='l2h-932' class="function">restore</tt></b>(</nobr></td>
364 <td><var>sequence, which</var>)</td></tr></table></dt>
365<dd>
366 Return one of the two sequences that generated a delta.
367
368<P>
369Given a <var>sequence</var> produced by <tt class="method">Differ.compare()</tt> or
370 <tt class="function">ndiff()</tt>, extract lines originating from file 1 or 2
371 (parameter <var>which</var>), stripping off line prefixes.
372
373<P>
374Example:
375
376<P>
377<div class="verbatim"><pre>
&gt;&gt;&gt; diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
&gt;&gt;&gt; diff = list(diff) # materialize the generated delta into a list
&gt;&gt;&gt; print ''.join(restore(diff, 1)),
one
two
three
&gt;&gt;&gt; print ''.join(restore(diff, 2)),
ore
tree
emu
389</pre></div>
390
391<P>
392</dl>
393
394<P>
395<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
396 <td><nobr><b><tt id='l2h-933' xml:id='l2h-933' class="function">unified_diff</tt></b>(</nobr></td>
397 <td><var>a, b</var><big>[</big><var>, fromfile</var><big>]</big><var></var><big>[</big><var>,
398 tofile</var><big>]</big><var></var><big>[</big><var>, fromfiledate</var><big>]</big><var></var><big>[</big><var>, tofiledate</var><big>]</big><var></var><big>[</big><var>,
399 n</var><big>]</big><var></var><big>[</big><var>, lineterm</var><big>]</big><var></var>)</td></tr></table></dt>
400<dd>
401 Compare <var>a</var> and <var>b</var> (lists of strings); return a
402 delta (a generator generating the delta lines) in unified diff
403 format.
404
405<P>
406Unified diffs are a compact way of showing just the lines that have
407 changed plus a few lines of context. The changes are shown in a
 changed plus a few lines of context. The changes are shown in an
 inline style (instead of separate before/after blocks). The number
409 of context lines is set by <var>n</var> which defaults to three.
410
411<P>
412By default, the diff control lines (those with <code>-&#45;-</code>, <code>+++</code>,
413 or <code>@@</code>) are created with a trailing newline. This is helpful so
414 that inputs created from <tt class="function">file.readlines()</tt> result in diffs
415 that are suitable for use with <tt class="function">file.writelines()</tt> since both
416 the inputs and outputs have trailing newlines.
417
418<P>
419For inputs that do not have trailing newlines, set the <var>lineterm</var>
420 argument to <code>""</code> so that the output will be uniformly newline free.
421
422<P>
The unified diff format normally has a header for filenames and
424 modification times. Any or all of these may be specified using strings for
425 <var>fromfile</var>, <var>tofile</var>, <var>fromfiledate</var>, and <var>tofiledate</var>.
426 The modification times are normally expressed in the format returned by
427 <tt class="function">time.ctime()</tt>. If not specified, the strings default to blanks.
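<P>
The sketch below illustrates the <var>lineterm</var> case described above, where the input lines carry no trailing newlines. The line contents and the header names <code>before.py</code> and <code>after.py</code> are placeholders.

```python
from difflib import unified_diff

# Input lines without trailing newlines.
old = "bacon eggs ham guido".split()
new = "python eggy hamster guido".split()

# lineterm="" keeps the control lines newline free as well, so every
# line of the delta is uniform and can be joined explicitly.
delta = list(unified_diff(old, new,
                          fromfile="before.py", tofile="after.py",
                          lineterm=""))

print("\n".join(delta))
```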
428
429<P>
430<span class="file">Tools/scripts/diff.py</span> is a command-line front-end for this
431 function.
432
433<P>
434
435<span class="versionnote">New in version 2.3.</span>
436
437</dl>
438
439<P>
440<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
441 <td><nobr><b><tt id='l2h-934' xml:id='l2h-934' class="function">IS_LINE_JUNK</tt></b>(</nobr></td>
442 <td><var>line</var>)</td></tr></table></dt>
443<dd>
444 Return true for ignorable lines. The line <var>line</var> is ignorable
445 if <var>line</var> is blank or contains a single "<tt class="character">#</tt>",
446 otherwise it is not ignorable. Used as a default for parameter
447 <var>linejunk</var> in <tt class="function">ndiff()</tt> before Python 2.3.
448</dl>
449
450<P>
451<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
452 <td><nobr><b><tt id='l2h-935' xml:id='l2h-935' class="function">IS_CHARACTER_JUNK</tt></b>(</nobr></td>
453 <td><var>ch</var>)</td></tr></table></dt>
454<dd>
455 Return true for ignorable characters. The character <var>ch</var> is
456 ignorable if <var>ch</var> is a space or tab, otherwise it is not
457 ignorable. Used as a default for parameter <var>charjunk</var> in
458 <tt class="function">ndiff()</tt>.
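<P>
The two predicates can be tried directly; the sample inputs below are arbitrary. Note that a newline is not treated as a junk character.

```python
from difflib import IS_LINE_JUNK, IS_CHARACTER_JUNK

# Lines that are blank, or hold only a single '#', are ignorable.
blank_is_junk = IS_LINE_JUNK("\n")
lone_hash_is_junk = IS_LINE_JUNK("  #   \n")
text_is_junk = IS_LINE_JUNK("hello\n")

# Only a space or a tab counts as an ignorable character.
space_is_junk = IS_CHARACTER_JUNK(" ")
tab_is_junk = IS_CHARACTER_JUNK("\t")
newline_is_junk = IS_CHARACTER_JUNK("\n")
letter_is_junk = IS_CHARACTER_JUNK("x")

print(blank_is_junk)
print(newline_is_junk)
```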
459</dl>
460
461<P>
462<div class="seealso">
463 <p class="heading">See Also:</p>
464
465 <dl compact="compact" class="seetitle">
466 <dt><em class="citetitle"><a href="http://www.ddj.com/documents/s=1103/ddj8807c/"
467 >Pattern Matching: The Gestalt Approach</a></em></dt>
468 <dd>Discussion of a
469 similar algorithm by John W. Ratcliff and D. E. Metzener.
470 This was published in
471 <em class="citetitle"><a
472 href="http://www.ddj.com/"
473 title="Dr. Dobb's Journal"
474 >Dr. Dobb's Journal</a></em> in
475 July, 1988.</dd>
476 </dl>
477</div>
478
479<P>
480
481<p><br /></p><hr class='online-navigation' />
482<div class='online-navigation'>
483<!--Table of Child-Links-->
484<A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a>
485
486<UL CLASS="ChildLinks">
487<LI><A href="sequence-matcher.html">4.4.1 SequenceMatcher Objects</a>
488<LI><A href="sequencematcher-examples.html">4.4.2 SequenceMatcher Examples</a>
489<LI><A href="differ-objects.html">4.4.3 Differ Objects</a>
490<LI><A href="differ-examples.html">4.4.4 Differ Example</a>
491</ul>
492<!--End of Table of Child-Links-->
493</div>
494
495<DIV CLASS="navigation">
528<hr />
529<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
530</DIV>
531<!--End of Navigation Panel-->
532<ADDRESS>
533See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
534</ADDRESS>
535</BODY>
536</HTML>