| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
| 2 | <html> |
| 3 | <head> |
| 4 | <link rel="STYLESHEET" href="lib.css" type='text/css' /> |
| 5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> |
| 6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> |
| 7 | <link rel="first" href="lib.html" title='Python Library Reference' /> |
| 8 | <link rel='contents' href='contents.html' title="Contents" /> |
| 9 | <link rel='index' href='genindex.html' title='Index' /> |
| 10 | <link rel='last' href='about.html' title='About this document...' /> |
| 11 | <link rel='help' href='about.html' title='About this document...' /> |
| 12 | <link rel="next" href="module-tabnanny.html" /> |
| 13 | <link rel="prev" href="module-keyword.html" /> |
| 14 | <link rel="parent" href="language.html" /> |
| 16 | <meta name='aesop' content='information' /> |
| 17 | <title>18.5 tokenize -- Tokenizer for Python source</title> |
| 18 | </head> |
| 19 | <body> |
| 20 | <DIV CLASS="navigation"> |
| 21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> |
| 22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> |
| 23 | <tr> |
| 24 | <td class='online-navigation'><a rel="prev" title="18.4 keyword " |
| 25 | href="module-keyword.html"><img src='../icons/previous.png' |
| 26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> |
| 27 | <td class='online-navigation'><a rel="parent" title="18. Python Language Services" |
| 28 | href="language.html"><img src='../icons/up.png' |
| 29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> |
| 30 | <td class='online-navigation'><a rel="next" title="18.6 tabnanny " |
| 31 | href="module-tabnanny.html"><img src='../icons/next.png' |
| 32 | border='0' height='32' alt='Next Page' width='32' /></A></td> |
| 33 | <td align="center" width="100%">Python Library Reference</td> |
| 34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" |
| 35 | href="contents.html"><img src='../icons/contents.png' |
| 36 | border='0' height='32' alt='Contents' width='32' /></A></td> |
| 37 | <td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png' |
| 38 | border='0' height='32' alt='Module Index' width='32' /></a></td> |
| 39 | <td class='online-navigation'><a rel="index" title="Index" |
| 40 | href="genindex.html"><img src='../icons/index.png' |
| 41 | border='0' height='32' alt='Index' width='32' /></A></td> |
| 42 | </tr></table> |
| 43 | <div class='online-navigation'> |
| 44 | <b class="navlabel">Previous:</b> |
| 45 | <a class="sectref" rel="prev" href="module-keyword.html">18.4 keyword </A> |
| 46 | <b class="navlabel">Up:</b> |
| 47 | <a class="sectref" rel="parent" href="language.html">18. Python Language Services</A> |
| 48 | <b class="navlabel">Next:</b> |
| 49 | <a class="sectref" rel="next" href="module-tabnanny.html">18.6 tabnanny </A> |
| 50 | </div> |
| 51 | <hr /></div> |
| 52 | </DIV> |
| 53 | <!--End of Navigation Panel--> |
| 54 | |
| 55 | <H1><A NAME="SECTION0020500000000000000000"> |
| 56 | 18.5 <tt class="module">tokenize</tt> -- |
| 57 | Tokenizer for Python source</A> |
| 58 | </H1> |
| 59 | |
| 60 | <P> |
| 61 | <A NAME="module-tokenize"></A> |
| 62 | |
| 63 | <P> |
| 64 | The <tt class="module">tokenize</tt> module provides a lexical scanner for Python |
| 65 | source code, implemented in Python. The scanner in this module |
| 66 | returns comments as tokens as well, making it useful for implementing |
| 67 | "pretty-printers," including colorizers for on-screen displays. |
| 68 | |
| 69 | <P> |
| 70 | The primary entry point is a generator: |
| 71 | |
| 72 | <P> |
| 73 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 74 | <td><nobr><b><tt id='l2h-4969' xml:id='l2h-4969' class="function">generate_tokens</tt></b>(</nobr></td> |
| 75 | <td><var>readline</var>)</td></tr></table></dt> |
| 76 | <dd> |
| 77 | The <tt class="function">generate_tokens()</tt> generator requires one argument, |
| 78 | <var>readline</var>, which must be a callable object which |
| 79 | provides the same interface as the <tt class="method">readline()</tt> method of |
| 80 | built-in file objects (see section <A href="bltin-file-objects.html#bltin-file-objects">2.3.9</A>). Each |
| 81 | call to the function should return one line of input as a string. |
| 82 | |
| 83 | <P> |
| 84 | The generator produces 5-tuples with these members: |
| 85 | the token type; |
| 86 | the token string; |
| 87 | a 2-tuple <code>(<var>srow</var>, <var>scol</var>)</code> of ints specifying the |
| 88 | row and column where the token begins in the source; |
| 89 | a 2-tuple <code>(<var>erow</var>, <var>ecol</var>)</code> of ints specifying the |
| 90 | row and column where the token ends in the source; |
| 91 | and the line on which the token was found. |
| 92 | The line passed is the <em>logical</em> line; |
| 93 | continuation lines are included. |
| 94 | |
| 95 | <span class="versionnote">New in version 2.2.</span> |
| 96 | |
| 97 | </dl> |
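<P>
The 5-tuples described above can be inspected with a short loop. This is
a minimal sketch, not part of the original reference: it assumes a current
Python 3 interpreter, where <code>io.StringIO</code> supplies the
<var>readline</var> callable and <code>print</code> is a function (on the
release documented here, the <tt class="module">StringIO</tt> module would
play the same role). The sample source string and the <code>tokens</code>
list are illustrative names only.

```python
import io
import tokenize

# Illustrative input: one logical line with a trailing comment.
source = "x = 1  # set x\n"

# StringIO(...).readline returns one line per call and "" at end of input,
# which is the interface generate_tokens() expects from its argument.
readline = io.StringIO(source).readline

tokens = []
for tok_type, tok_string, start, end, line in tokenize.generate_tokens(readline):
    # tok_name maps the numeric token type to a readable name.
    name = tokenize.tok_name[tok_type]
    tokens.append((name, tok_string, start, end))
    print(name, repr(tok_string), start, end)
```

<P>
Each <code>start</code> and <code>end</code> value is a
<code>(row, col)</code> pair; rows count from 1 and columns from 0.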
| 98 | |
| 99 | <P> |
| 100 | An older entry point is retained for backward compatibility: |
| 101 | |
| 102 | <P> |
| 103 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 104 | <td><nobr><b><tt id='l2h-4970' xml:id='l2h-4970' class="function">tokenize</tt></b>(</nobr></td> |
| 105 | <td><var>readline</var><big>[</big><var>, tokeneater</var><big>]</big>)</td></tr></table></dt> |
| 106 | <dd> |
| 107 | The <tt class="function">tokenize()</tt> function accepts two parameters: one |
| 108 | representing the input stream, and one providing an output mechanism |
| 109 | for <tt class="function">tokenize()</tt>. |
| 110 | |
| 111 | <P> |
| 112 | The first parameter, <var>readline</var>, must be a callable object which |
| 113 | provides the same interface as the <tt class="method">readline()</tt> method of |
| 114 | built-in file objects (see section <A href="bltin-file-objects.html#bltin-file-objects">2.3.9</A>). Each |
| 115 | call to the function should return one line of input as a string. |
| 116 | |
| 117 | <P> |
| 118 | The second parameter, <var>tokeneater</var>, must also be a callable |
| 119 | object. It is called once for each token, with five arguments, |
| 120 | corresponding to the tuples generated by <tt class="function">generate_tokens()</tt>. |
| 121 | </dl> |
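<P>
The callback convention described above, one call per token with five
arguments, can be sketched as follows. This is an illustration under stated
assumptions, not the module's implementation: <code>feed_tokens</code> and
<code>collect_names</code> are hypothetical names, and the helper is built
on <tt class="function">generate_tokens()</tt> so that it also runs on a
current Python 3 interpreter, where the two-argument
<tt class="function">tokenize()</tt> documented here is not available in
this form.

```python
import io
import tokenize

def feed_tokens(readline, tokeneater):
    # Hypothetical helper: mimics the callback style by calling
    # tokeneater once per token with the five tuple members.
    for tok_type, tok_string, start, end, line in tokenize.generate_tokens(readline):
        tokeneater(tok_type, tok_string, start, end, line)

seen = []

def collect_names(tok_type, tok_string, start, end, line):
    # Example tokeneater: record each NAME token and where it starts.
    if tok_type == tokenize.NAME:
        seen.append((tok_string, start))

feed_tokens(io.StringIO("total = price * qty\n").readline, collect_names)
print(seen)
```

<P>
A callback keeps its own state (here the <code>seen</code> list), whereas
the generator interface lets the caller drive the loop directly.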
| 122 | |
| 123 | <P> |
| 124 | All constants from the <tt class="module"><a href="module-token.html">token</a></tt> module are also exported from |
| 125 | <tt class="module">tokenize</tt>, as are two additional token type values that might be |
| 126 | passed to the <var>tokeneater</var> function by <tt class="function">tokenize()</tt>: |
| 127 | |
| 128 | <P> |
| 129 | <dl><dt><b><tt id='l2h-4971' xml:id='l2h-4971'>COMMENT</tt></b></dt> |
| 130 | <dd> |
| 131 | Token value used to indicate a comment. |
| 132 | </dd></dl> |
| 133 | <dl><dt><b><tt id='l2h-4972' xml:id='l2h-4972'>NL</tt></b></dt> |
| 134 | <dd> |
| 135 | Token value used to indicate a non-terminating newline. The NEWLINE |
| 136 | token indicates the end of a logical line of Python code; NL tokens |
| 137 | are generated when a logical line of code is continued over multiple |
| 138 | physical lines. |
| 139 | </dd></dl> |
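<P>
The difference between <code>COMMENT</code>, <code>NL</code>, and
<code>NEWLINE</code> is easiest to see on a statement that spans two
physical lines. The sketch below is not from the original reference and
assumes a current Python 3 interpreter with <code>io.StringIO</code> as the
line source; the sample source and the <code>kinds</code> list are
illustrative.

```python
import io
import tokenize

# One logical line split over two physical lines inside brackets.
source = (
    "values = [1,  # first\n"
    "          2]\n"
)

interesting = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE)

kinds = []
for tok_type, tok_string, start, end, line in tokenize.generate_tokens(
        io.StringIO(source).readline):
    if tok_type in interesting:
        # Keep the readable type name and the row on which the token starts.
        kinds.append((tokenize.tok_name[tok_type], start[0]))

print(kinds)
```

<P>
The line break inside the brackets is reported as <code>NL</code> because
the logical line continues; only the final line break, which ends the
statement, is reported as <code>NEWLINE</code>.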
| 140 | |
| 141 | <DIV CLASS="navigation"> |
| 174 | <hr /> |
| 175 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> |
| 176 | </DIV> |
| 177 | <!--End of Navigation Panel--> |
| 178 | <ADDRESS> |
| 179 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. |
| 180 | </ADDRESS> |
| 181 | </BODY> |
| 182 | </HTML> |