Initial commit of OpenSPARC T2 architecture model.
[OpenSPARC-T2-SAM] / sam-t2 / devtools / v8plus / html / python / lib / sequence-matcher.html
CommitLineData
920dae64
AT
1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
2<html>
3<head>
4<link rel="STYLESHEET" href="lib.css" type='text/css' />
5<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
6<link rel='start' href='../index.html' title='Python Documentation Index' />
7<link rel="first" href="lib.html" title='Python Library Reference' />
8<link rel='contents' href='contents.html' title="Contents" />
9<link rel='index' href='genindex.html' title='Index' />
10<link rel='last' href='about.html' title='About this document...' />
11<link rel='help' href='about.html' title='About this document...' />
12<link rel="next" href="sequencematcher-examples.html" />
13<link rel="prev" href="module-difflib.html" />
14<link rel="parent" href="module-difflib.html" />
15<link rel="next" href="sequencematcher-examples.html" />
16<meta name='aesop' content='information' />
17<title>4.4.1 SequenceMatcher Objects </title>
18</head>
19<body>
20<DIV CLASS="navigation">
21<div id='top-navigation-panel' xml:id='top-navigation-panel'>
22<table align="center" width="100%" cellpadding="0" cellspacing="2">
23<tr>
24<td class='online-navigation'><a rel="prev" title="4.4 difflib "
25 href="module-difflib.html"><img src='../icons/previous.png'
26 border='0' height='32' alt='Previous Page' width='32' /></A></td>
27<td class='online-navigation'><a rel="parent" title="4.4 difflib "
28 href="module-difflib.html"><img src='../icons/up.png'
29 border='0' height='32' alt='Up One Level' width='32' /></A></td>
30<td class='online-navigation'><a rel="next" title="4.4.2 SequenceMatcher Examples"
31 href="sequencematcher-examples.html"><img src='../icons/next.png'
32 border='0' height='32' alt='Next Page' width='32' /></A></td>
33<td align="center" width="100%">Python Library Reference</td>
34<td class='online-navigation'><a rel="contents" title="Table of Contents"
35 href="contents.html"><img src='../icons/contents.png'
36 border='0' height='32' alt='Contents' width='32' /></A></td>
37<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
38 border='0' height='32' alt='Module Index' width='32' /></a></td>
39<td class='online-navigation'><a rel="index" title="Index"
40 href="genindex.html"><img src='../icons/index.png'
41 border='0' height='32' alt='Index' width='32' /></A></td>
42</tr></table>
43<div class='online-navigation'>
44<b class="navlabel">Previous:</b>
45<a class="sectref" rel="prev" href="module-difflib.html">4.4 difflib </A>
46<b class="navlabel">Up:</b>
47<a class="sectref" rel="parent" href="module-difflib.html">4.4 difflib </A>
48<b class="navlabel">Next:</b>
49<a class="sectref" rel="next" href="sequencematcher-examples.html">4.4.2 SequenceMatcher Examples</A>
50</div>
51<hr /></div>
52</DIV>
53<!--End of Navigation Panel-->
54
55<H2><A NAME="SECTION006410000000000000000"></A><A NAME="sequence-matcher"></A>
56<BR>
574.4.1 SequenceMatcher Objects
58</H2>
59
60<P>
61The <tt class="class">SequenceMatcher</tt> class has this constructor:
62
63<P>
64<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
65 <td><nobr><b><span class="typelabel">class</span>&nbsp;<tt id='l2h-936' xml:id='l2h-936' class="class">SequenceMatcher</tt></b>(</nobr></td>
66 <td><var></var><big>[</big><var>isjunk</var><big>[</big><var>,
67 a</var><big>[</big><var>, b</var><big>]</big><var></var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
68<dd>
69 Optional argument <var>isjunk</var> must be <code>None</code> (the default) or
70 a one-argument function that takes a sequence element and returns
71 true if and only if the element is ``junk'' and should be ignored.
72 Passing <code>None</code> for <var>isjunk</var> is equivalent to passing
73 <code>lambda x: 0</code>; in other words, no elements are ignored. For
74 example, pass:
75
76<P>
77<div class="verbatim"><pre>
78lambda x: x in " \t"
79</pre></div>
80
81<P>
82if you're comparing lines as sequences of characters, and don't want
83 to synch up on blanks or hard tabs.
84
85<P>
86The optional arguments <var>a</var> and <var>b</var> are sequences to be
87 compared; both default to empty strings. The elements of both
88 sequences must be hashable.
89</dl>
90
91<P>
92<tt class="class">SequenceMatcher</tt> objects have the following methods:
93
94<P>
95<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
96 <td><nobr><b><tt id='l2h-937' xml:id='l2h-937' class="method">set_seqs</tt></b>(</nobr></td>
97 <td><var>a, b</var>)</td></tr></table></dt>
98<dd>
99 Set the two sequences to be compared.
100</dl>
101
102<P>
103<tt class="class">SequenceMatcher</tt> computes and caches detailed information about
104the second sequence, so if you want to compare one sequence against
105many sequences, use <tt class="method">set_seq2()</tt> to set the commonly used
106sequence once and call <tt class="method">set_seq1()</tt> repeatedly, once for each
107of the other sequences.
108
109<P>
110<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
111 <td><nobr><b><tt id='l2h-938' xml:id='l2h-938' class="method">set_seq1</tt></b>(</nobr></td>
112 <td><var>a</var>)</td></tr></table></dt>
113<dd>
114 Set the first sequence to be compared. The second sequence to be
115 compared is not changed.
116</dl>
117
118<P>
119<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
120 <td><nobr><b><tt id='l2h-939' xml:id='l2h-939' class="method">set_seq2</tt></b>(</nobr></td>
121 <td><var>b</var>)</td></tr></table></dt>
122<dd>
123 Set the second sequence to be compared. The first sequence to be
124 compared is not changed.
125</dl>
126
127<P>
128<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
129 <td><nobr><b><tt id='l2h-940' xml:id='l2h-940' class="method">find_longest_match</tt></b>(</nobr></td>
130 <td><var>alo, ahi, blo, bhi</var>)</td></tr></table></dt>
131<dd>
132 Find longest matching block in <code><var>a</var>[<var>alo</var>:<var>ahi</var>]</code>
133 and <code><var>b</var>[<var>blo</var>:<var>bhi</var>]</code>.
134
135<P>
136If <var>isjunk</var> was omitted or <code>None</code>,
137 <tt class="method">get_longest_match()</tt> returns <code>(<var>i</var>, <var>j</var>,
138 <var>k</var>)</code> such that <code><var>a</var>[<var>i</var>:<var>i</var>+<var>k</var>]</code> is equal
139 to <code><var>b</var>[<var>j</var>:<var>j</var>+<var>k</var>]</code>, where
140 <code><var>alo</var> &lt;= <var>i</var> &lt;= <var>i</var>+<var>k</var> &lt;= <var>ahi</var></code> and
141 <code><var>blo</var> &lt;= <var>j</var> &lt;= <var>j</var>+<var>k</var> &lt;= <var>bhi</var></code>.
142 For all <code>(<var>i'</var>, <var>j'</var>, <var>k'</var>)</code> meeting those
143 conditions, the additional conditions
144 <code><var>k</var> &gt;= <var>k'</var></code>,
145 <code><var>i</var> &lt;= <var>i'</var></code>,
146 and if <code><var>i</var> == <var>i'</var></code>, <code><var>j</var> &lt;= <var>j'</var></code>
147 are also met.
148 In other words, of all maximal matching blocks, return one that
149 starts earliest in <var>a</var>, and of all those maximal matching blocks
150 that start earliest in <var>a</var>, return the one that starts earliest
151 in <var>b</var>.
152
153<P>
154<div class="verbatim"><pre>
155&gt;&gt;&gt; s = SequenceMatcher(None, " abcd", "abcd abcd")
156&gt;&gt;&gt; s.find_longest_match(0, 5, 0, 9)
157(0, 4, 5)
158</pre></div>
159
160<P>
161If <var>isjunk</var> was provided, first the longest matching block is
162 determined as above, but with the additional restriction that no
163 junk element appears in the block. Then that block is extended as
164 far as possible by matching (only) junk elements on both sides.
165 So the resulting block never matches on junk except as identical
166 junk happens to be adjacent to an interesting match.
167
168<P>
169Here's the same example as before, but considering blanks to be junk.
170 That prevents <code>' abcd'</code> from matching the <code>' abcd'</code> at the
171 tail end of the second sequence directly. Instead only the
172 <code>'abcd'</code> can match, and matches the leftmost <code>'abcd'</code> in
173 the second sequence:
174
175<P>
176<div class="verbatim"><pre>
177&gt;&gt;&gt; s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
178&gt;&gt;&gt; s.find_longest_match(0, 5, 0, 9)
179(1, 0, 4)
180</pre></div>
181
182<P>
183If no blocks match, this returns <code>(<var>alo</var>, <var>blo</var>, 0)</code>.
184</dl>
185
186<P>
187<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
188 <td><nobr><b><tt id='l2h-941' xml:id='l2h-941' class="method">get_matching_blocks</tt></b>(</nobr></td>
189 <td><var></var>)</td></tr></table></dt>
190<dd>
191 Return list of triples describing matching subsequences.
192 Each triple is of the form <code>(<var>i</var>, <var>j</var>, <var>n</var>)</code>, and
193 means that <code><var>a</var>[<var>i</var>:<var>i</var>+<var>n</var>] ==
194 <var>b</var>[<var>j</var>:<var>j</var>+<var>n</var>]</code>. The triples are monotonically
195 increasing in <var>i</var> and <var>j</var>.
196
197<P>
198The last triple is a dummy, and has the value <code>(len(<var>a</var>),
199 len(<var>b</var>), 0)</code>. It is the only triple with <code><var>n</var> == 0</code>.
200
201<P>
202<div class="verbatim"><pre>
203&gt;&gt;&gt; s = SequenceMatcher(None, "abxcd", "abcd")
204&gt;&gt;&gt; s.get_matching_blocks()
205[(0, 0, 2), (3, 2, 2), (5, 4, 0)]
206</pre></div>
207</dl>
208
209<P>
210<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
211 <td><nobr><b><tt id='l2h-942' xml:id='l2h-942' class="method">get_opcodes</tt></b>(</nobr></td>
212 <td><var></var>)</td></tr></table></dt>
213<dd>
214 Return list of 5-tuples describing how to turn <var>a</var> into <var>b</var>.
215 Each tuple is of the form <code>(<var>tag</var>, <var>i1</var>, <var>i2</var>,
216 <var>j1</var>, <var>j2</var>)</code>. The first tuple has <code><var>i1</var> ==
217 <var>j1</var> == 0</code>, and remaining tuples have <var>i1</var> equal to the
218 <var>i2</var> from the preceding tuple, and, likewise, <var>j1</var> equal to
219 the previous <var>j2</var>.
220
221<P>
222The <var>tag</var> values are strings, with these meanings:
223
224<P>
225<div class="center"><table class="realtable">
226 <thead>
227 <tr>
228 <th class="left" >Value</th>
229 <th class="left" >Meaning</th>
230 </tr>
231 </thead>
232 <tbody>
233 <tr><td class="left" valign="baseline"><code>'replace'</code></td>
234 <td class="left" ><code><var>a</var>[<var>i1</var>:<var>i2</var>]</code> should be
235 replaced by <code><var>b</var>[<var>j1</var>:<var>j2</var>]</code>.</td></tr>
236 <tr><td class="left" valign="baseline"><code>'delete'</code></td>
237 <td class="left" ><code><var>a</var>[<var>i1</var>:<var>i2</var>]</code> should be
238 deleted. Note that <code><var>j1</var> == <var>j2</var></code> in
239 this case.</td></tr>
240 <tr><td class="left" valign="baseline"><code>'insert'</code></td>
241 <td class="left" ><code><var>b</var>[<var>j1</var>:<var>j2</var>]</code> should be
242 inserted at <code><var>a</var>[<var>i1</var>:<var>i1</var>]</code>.
243 Note that <code><var>i1</var> == <var>i2</var></code> in this
244 case.</td></tr>
245 <tr><td class="left" valign="baseline"><code>'equal'</code></td>
246 <td class="left" ><code><var>a</var>[<var>i1</var>:<var>i2</var>] ==
247 <var>b</var>[<var>j1</var>:<var>j2</var>]</code> (the sub-sequences are
248 equal).</td></tr></tbody>
249</table></div>
250
251<P>
252For example:
253
254<P>
255<div class="verbatim"><pre>
256&gt;&gt;&gt; a = "qabxcd"
257&gt;&gt;&gt; b = "abycdf"
258&gt;&gt;&gt; s = SequenceMatcher(None, a, b)
259&gt;&gt;&gt; for tag, i1, i2, j1, j2 in s.get_opcodes():
260... print ("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
261... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
262 delete a[0:1] (q) b[0:0] ()
263 equal a[1:3] (ab) b[0:2] (ab)
264replace a[3:4] (x) b[2:3] (y)
265 equal a[4:6] (cd) b[3:5] (cd)
266 insert a[6:6] () b[5:6] (f)
267</pre></div>
268</dl>
269
270<P>
271<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
272 <td><nobr><b><tt id='l2h-943' xml:id='l2h-943' class="method">get_grouped_opcodes</tt></b>(</nobr></td>
273 <td><var></var><big>[</big><var>n</var><big>]</big><var></var>)</td></tr></table></dt>
274<dd>
275 Return a generator of groups with up to <var>n</var> lines of context.
276
277<P>
278Starting with the groups returned by <tt class="method">get_opcodes()</tt>,
279 this method splits out smaller change clusters and eliminates
280 intervening ranges which have no changes.
281
282<P>
283The groups are returned in the same format as <tt class="method">get_opcodes()</tt>.
284
285<span class="versionnote">New in version 2.3.</span>
286
287</dl>
288
289<P>
290<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
291 <td><nobr><b><tt id='l2h-944' xml:id='l2h-944' class="method">ratio</tt></b>(</nobr></td>
292 <td><var></var>)</td></tr></table></dt>
293<dd>
294 Return a measure of the sequences' similarity as a float in the
295 range [0, 1].
296
297<P>
298Where T is the total number of elements in both sequences, and M is
299 the number of matches, this is 2.0*M / T. Note that this is
300 <code>1.0</code> if the sequences are identical, and <code>0.0</code> if they
301 have nothing in common.
302
303<P>
304This is expensive to compute if <tt class="method">get_matching_blocks()</tt> or
305 <tt class="method">get_opcodes()</tt> hasn't already been called, in which case you
306 may want to try <tt class="method">quick_ratio()</tt> or
307 <tt class="method">real_quick_ratio()</tt> first to get an upper bound.
308</dl>
309
310<P>
311<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
312 <td><nobr><b><tt id='l2h-945' xml:id='l2h-945' class="method">quick_ratio</tt></b>(</nobr></td>
313 <td><var></var>)</td></tr></table></dt>
314<dd>
315 Return an upper bound on <tt class="method">ratio()</tt> relatively quickly.
316
317<P>
318This isn't defined beyond that it is an upper bound on
319 <tt class="method">ratio()</tt>, and is faster to compute.
320</dl>
321
322<P>
323<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
324 <td><nobr><b><tt id='l2h-946' xml:id='l2h-946' class="method">real_quick_ratio</tt></b>(</nobr></td>
325 <td><var></var>)</td></tr></table></dt>
326<dd>
327 Return an upper bound on <tt class="method">ratio()</tt> very quickly.
328
329<P>
330This isn't defined beyond that it is an upper bound on
331 <tt class="method">ratio()</tt>, and is faster to compute than either
332 <tt class="method">ratio()</tt> or <tt class="method">quick_ratio()</tt>.
333</dl>
334
335<P>
336The three methods that return the ratio of matching to total characters
337can give different results due to differing levels of approximation,
338although <tt class="method">quick_ratio()</tt> and <tt class="method">real_quick_ratio()</tt> are always
339at least as large as <tt class="method">ratio()</tt>:
340
341<P>
342<div class="verbatim"><pre>
343&gt;&gt;&gt; s = SequenceMatcher(None, "abcd", "bcde")
344&gt;&gt;&gt; s.ratio()
3450.75
346&gt;&gt;&gt; s.quick_ratio()
3470.75
348&gt;&gt;&gt; s.real_quick_ratio()
3491.0
350</pre></div>
351
352<P>
353
354<DIV CLASS="navigation">
355<div class='online-navigation'>
356<p></p><hr />
357<table align="center" width="100%" cellpadding="0" cellspacing="2">
358<tr>
359<td class='online-navigation'><a rel="prev" title="4.4 difflib "
360 href="module-difflib.html"><img src='../icons/previous.png'
361 border='0' height='32' alt='Previous Page' width='32' /></A></td>
362<td class='online-navigation'><a rel="parent" title="4.4 difflib "
363 href="module-difflib.html"><img src='../icons/up.png'
364 border='0' height='32' alt='Up One Level' width='32' /></A></td>
365<td class='online-navigation'><a rel="next" title="4.4.2 SequenceMatcher Examples"
366 href="sequencematcher-examples.html"><img src='../icons/next.png'
367 border='0' height='32' alt='Next Page' width='32' /></A></td>
368<td align="center" width="100%">Python Library Reference</td>
369<td class='online-navigation'><a rel="contents" title="Table of Contents"
370 href="contents.html"><img src='../icons/contents.png'
371 border='0' height='32' alt='Contents' width='32' /></A></td>
372<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
373 border='0' height='32' alt='Module Index' width='32' /></a></td>
374<td class='online-navigation'><a rel="index" title="Index"
375 href="genindex.html"><img src='../icons/index.png'
376 border='0' height='32' alt='Index' width='32' /></A></td>
377</tr></table>
378<div class='online-navigation'>
379<b class="navlabel">Previous:</b>
380<a class="sectref" rel="prev" href="module-difflib.html">4.4 difflib </A>
381<b class="navlabel">Up:</b>
382<a class="sectref" rel="parent" href="module-difflib.html">4.4 difflib </A>
383<b class="navlabel">Next:</b>
384<a class="sectref" rel="next" href="sequencematcher-examples.html">4.4.2 SequenceMatcher Examples</A>
385</div>
386</div>
387<hr />
388<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
389</DIV>
390<!--End of Navigation Panel-->
391<ADDRESS>
392See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
393</ADDRESS>
394</BODY>
395</HTML>