<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-htmlentitydefs.html" />
<link rel="prev" href="module-sgmllib.html" />
<link rel="parent" href="markup.html" />
<link rel="next" href="html-parser-objects.html" />
<meta name='aesop' content='information' />
<title>13.3 htmllib -- A parser for HTML documents</title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="13.2 sgmllib "
 href="module-sgmllib.html"><img src='../icons/previous.png'
 border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="13. Structured Markup Processing"
 href="markup.html"><img src='../icons/up.png'
 border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="13.3.1 HTMLParser Objects"
 href="html-parser-objects.html"><img src='../icons/next.png'
 border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
 href="contents.html"><img src='../icons/contents.png'
 border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
 border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
 href="genindex.html"><img src='../icons/index.png'
 border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="module-sgmllib.html">13.2 sgmllib </A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="markup.html">13. Structured Markup Processing</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="html-parser-objects.html">13.3.1 HTMLParser Objects</A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION0015300000000000000000">
13.3 <tt class="module">htmllib</tt> --
 A parser for HTML documents</A>
</H1>

<P>
<A NAME="module-htmllib"></A>

<P>
<a id='l2h-4286' xml:id='l2h-4286'></a>

<P>
This module defines a class which can serve as a base for parsing text
files formatted in the HyperText Mark-up Language (HTML). The class
is not directly concerned with I/O: input must be passed to it
in string form via its <tt class="method">feed()</tt> method, and it produces
output by calling methods of a &ldquo;formatter&rdquo; object. The
<tt class="class">HTMLParser</tt> class is designed to be used as a base class for
other classes in order to add functionality, and allows most of its
methods to be extended or overridden. It is itself derived
from, and extends, the <tt class="class">SGMLParser</tt> class defined in module
<tt class="module"><a href="module-sgmllib.html">sgmllib</a></tt><a id='l2h-4287' xml:id='l2h-4287'></a>. The <tt class="class">HTMLParser</tt>
implementation supports the HTML 2.0 language as described in
<a class="rfc" id='rfcref-91438' xml:id='rfcref-91438'
href="http://www.faqs.org/rfcs/rfc1866.html">RFC 1866</a>. Two implementations of formatter objects are provided in
the <tt class="module"><a href="module-formatter.html">formatter</a></tt><a id='l2h-4288' xml:id='l2h-4288'></a> module; refer to the
documentation for that module for information on the formatter
interface.
<a id='l2h-4283' xml:id='l2h-4283'></a>
<P>
The following is a summary of the interface defined by
<tt class="class">sgmllib.SGMLParser</tt>:

<P>

<UL>
<LI>Data is fed to an instance through the <tt class="method">feed()</tt>
method, which takes a string argument. It can be called with as
little or as much text at a time as desired; "<tt class="samp">p.feed(a);
p.feed(b)</tt>" has the same effect as "<tt class="samp">p.feed(a+b)</tt>". Complete
HTML markup constructs in the data are processed immediately;
incomplete constructs are saved in a buffer. To force processing of all
unprocessed data, call the <tt class="method">close()</tt> method.

<P>
For example, to parse the entire contents of a file, use:
<div class="verbatim"><pre>
parser.feed(open('myfile.html').read())
parser.close()
</pre></div>

<P>
</LI>
<LI>The interface to define semantics for HTML tags is very simple: derive
a class and define methods called <tt class="method">start_<var>tag</var>()</tt>,
<tt class="method">end_<var>tag</var>()</tt>, or <tt class="method">do_<var>tag</var>()</tt>. The parser will
call these at appropriate moments: <tt class="method">start_<var>tag</var>()</tt> or
<tt class="method">do_<var>tag</var>()</tt> is called when an opening tag of the form
<code>&lt;<var>tag</var> ...&gt;</code> is encountered; <tt class="method">end_<var>tag</var>()</tt> is called
when a closing tag of the form <code>&lt;/<var>tag</var>&gt;</code> is encountered. If
an opening tag requires a corresponding closing tag, like <code>&lt;H1&gt;</code>
... <code>&lt;/H1&gt;</code>, the class should define the <tt class="method">start_<var>tag</var>()</tt>
method; if a tag requires no closing tag, like <code>&lt;P&gt;</code>, the class
should define the <tt class="method">do_<var>tag</var>()</tt> method.

<P>
</LI>
</UL>
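
<P>
The two conventions summarized above (buffered input through
<tt class="method">feed()</tt> and <tt class="method">close()</tt>, and handler lookup by
method name) can be sketched with a small self-contained toy parser. This is
only an illustration under stated assumptions: <tt class="class">MiniParser</tt>,
<tt class="class">DemoParser</tt>, the <code>events</code> log and the
<code>_process()</code> and <code>_dispatch()</code> helpers are hypothetical names
invented for this sketch, and the code is not the
<tt class="module">sgmllib</tt> implementation. It ignores attributes, entities,
comments and the tag stack.

```python
import re


class MiniParser(object):
    """Toy model of the feed()/close() and start_/do_/end_ conventions.

    Hypothetical illustration only; this is not sgmllib's implementation.
    """

    # A complete opening or closing tag, e.g. <h1>, <p class="x"> or </h1>.
    _tag = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")

    def __init__(self):
        self.rawdata = ""  # input received but not yet processed
        self.events = []   # log of handler calls, kept for inspection

    def feed(self, data):
        self.rawdata += data
        self._process(final=False)

    def close(self):
        self._process(final=True)

    def _process(self, final):
        data, pos = self.rawdata, 0
        while pos < len(data):
            lt = data.find("<", pos)
            if lt != pos:
                # Plain text up to the next "<" (or to the end of the input).
                end = len(data) if lt < 0 else lt
                self.handle_data(data[pos:end])
                pos = end
                continue
            match = self._tag.match(data, pos)
            if match is None:
                if final:
                    # close(): nothing more will arrive, emit the rest as text.
                    self.handle_data(data[pos:])
                    pos = len(data)
                break  # otherwise keep the incomplete construct buffered
            closing, name = match.group(1), match.group(2).lower()
            if closing:
                self._dispatch(("end_",), name)
            else:
                self._dispatch(("start_", "do_"), name)
            pos = match.end()
        self.rawdata = data[pos:]

    def _dispatch(self, prefixes, name):
        # Call the first handler method that exists, e.g. start_h1 or do_p.
        for prefix in prefixes:
            handler = getattr(self, prefix + name, None)
            if handler is not None:
                handler()
                return
        self.events.append(("unknown", name))

    def handle_data(self, text):
        self.events.append(("data", text))


class DemoParser(MiniParser):
    def start_h1(self):  # H1 needs a closing tag: define start_ and end_
        self.events.append(("start", "h1"))

    def end_h1(self):
        self.events.append(("end", "h1"))

    def do_p(self):      # P needs no closing tag: define do_
        self.events.append(("do", "p"))


whole = DemoParser()
whole.feed("<H1>Title</H1><P>Body")
whole.close()

chunked = DemoParser()
chunked.feed("<H1>Title</H")  # the trailing "</H" is incomplete and stays buffered
chunked.feed("1><P>Body")
chunked.close()
```

<P>
In this sketch <code>whole.events</code> and <code>chunked.events</code> end up
holding the same sequence of handler calls, which mirrors the
"<tt class="samp">p.feed(a); p.feed(b)</tt>" equivalence described in the list
above. A tag with no matching method is simply logged as unknown here; the
real classes provide their own handling for that case.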

<P>
The module defines a parser class and an exception:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><span class="typelabel">class</span>&nbsp;<tt id='l2h-4284' xml:id='l2h-4284' class="class">HTMLParser</tt></b>(</nobr></td>
 <td><var>formatter</var>)</td></tr></table></dt>
<dd>
This is the basic HTML parser class. It supports all entity names
required by the XHTML 1.0 Recommendation (<a class="url" href="http://www.w3.org/TR/xhtml1">http://www.w3.org/TR/xhtml1</a>).
It also defines handlers for all HTML 2.0 and many HTML 3.0 and 3.2 elements.
</dl>

<P>
<dl><dt><b><span class="typelabel">exception</span>&nbsp;<tt id='l2h-4285' xml:id='l2h-4285' class="exception">HTMLParseError</tt></b></dt>
<dd>
Exception raised by the <tt class="class">HTMLParser</tt> class when it encounters an
error while parsing.

<span class="versionnote">New in version 2.4.</span>

</dd></dl>

<P>
<div class="seealso">
 <p class="heading">See Also:</p>

 <dl compact="compact" class="seemodule">
 <dt>Module <b><tt class="module"><a href="module-formatter.html">formatter</a></tt>:</b>
 <dd>Interface definition for transforming an
 abstract flow of formatting events into
 specific output events on writer objects.
 </dl>
 <dl compact="compact" class="seemodule">
 <dt>Module <b><tt class="module"><a href="module-HTMLParser.html">HTMLParser</a></tt>:</b>
 <dd>Alternate HTML parser that offers a slightly
 lower-level view of the input, but is
 designed to work with XHTML, and does not
 implement some of the SGML syntax not used in
 &ldquo;HTML as deployed&rdquo; and which isn't legal
 for XHTML.
 </dl>
 <dl compact="compact" class="seemodule">
 <dt>Module <b><tt class="module"><a href="module-htmlentitydefs.html">htmlentitydefs</a></tt>:</b>
 <dd>Definition of replacement text for XHTML 1.0
 entities.
 </dl>
 <dl compact="compact" class="seemodule">
 <dt>Module <b><tt class="module"><a href="module-sgmllib.html">sgmllib</a></tt>:</b>
 <dd>Base class for <tt class="class">HTMLParser</tt>.
 </dl>
</div>

<P>

<p><br /></p><hr class='online-navigation' />
<div class='online-navigation'>
<!--Table of Child-Links-->
<A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a>

<UL CLASS="ChildLinks">
<LI><A href="html-parser-objects.html">13.3.1 HTMLParser Objects</a>
</ul>
<!--End of Table of Child-Links-->
</div>

<DIV CLASS="navigation">
<div class='online-navigation'>
<p></p><hr />
</div>
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>