<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-csv.html" />
<link rel="prev" href="module-netrc.html" />
<link rel="parent" href="netdata.html" />
<link rel="next" href="module-csv.html" />
<meta name='aesop' content='information' />
<title>12.19 robotparser -- Parser for robots.txt</title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="12.18.1 netrc Objects"
 href="netrc-objects.html"><img src='../icons/previous.png'
 border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="12. Internet Data Handling"
 href="netdata.html"><img src='../icons/up.png'
 border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="12.20 csv "
 href="module-csv.html"><img src='../icons/next.png'
 border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
 href="contents.html"><img src='../icons/contents.png'
 border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
 border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
 href="genindex.html"><img src='../icons/index.png'
 border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="netrc-objects.html">12.18.1 netrc Objects</A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="netdata.html">12. Internet Data Handling</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="module-csv.html">12.20 csv </A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION00141900000000000000000">
12.19 <tt class="module">robotparser</tt> --
 Parser for robots.txt</A>
</H1>

<P>
<A NAME="module-robotparser"></A>

<P>
<a id='l2h-4209' xml:id='l2h-4209'></a>

<P>
This module provides a single class, <tt class="class">RobotFileParser</tt>, which answers
questions about whether a particular user agent may fetch a URL on
the Web site that published the <span class="file">robots.txt</span> file. For more details on
the structure of <span class="file">robots.txt</span> files, see
<a class="url" href="http://www.robotstxt.org/wc/norobots.html">http://www.robotstxt.org/wc/norobots.html</a>.

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><span class="typelabel">class</span>&nbsp;<tt id='l2h-4202' xml:id='l2h-4202' class="class">RobotFileParser</tt></b>(</nobr></td>
 <td><var></var>)</td></tr></table></dt>
<dd>

<P>
This class provides methods to read and parse a single
<span class="file">robots.txt</span> file and to answer questions about it.

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-4203' xml:id='l2h-4203' class="method">set_url</tt></b>(</nobr></td>
 <td><var>url</var>)</td></tr></table></dt>
<dd>
Sets the URL referring to a <span class="file">robots.txt</span> file.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-4204' xml:id='l2h-4204' class="method">read</tt></b>(</nobr></td>
 <td><var></var>)</td></tr></table></dt>
<dd>
Fetches the <span class="file">robots.txt</span> file from the URL given to
<tt class="method">set_url()</tt> and feeds its contents to the parser.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-4205' xml:id='l2h-4205' class="method">parse</tt></b>(</nobr></td>
 <td><var>lines</var>)</td></tr></table></dt>
<dd>
Parses <var>lines</var>, a sequence of lines from a <span class="file">robots.txt</span> file.
</dl>
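
<P>
As a minimal sketch of <tt class="method">parse()</tt>, the rules can be supplied from text that is already in memory instead of being fetched over the network. The rules and the <code>www.example.com</code> URLs below are hypothetical, and the import fallback is an assumption for readers on Python 3, where the same module lives at <code>urllib.robotparser</code>.

```python
try:
    import robotparser  # module name used in this document (Python 2)
except ImportError:
    from urllib import robotparser  # same module under Python 3

# Hypothetical robots.txt text, e.g. loaded from a local cache.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse() takes a sequence of lines

# Ask about two hypothetical URLs on the same site.
allowed_home = rp.can_fetch("*", "http://www.example.com/")
allowed_search = rp.can_fetch("*", "http://www.example.com/cgi-bin/search")
print(allowed_home)
print(allowed_search)
```

<P>
Parsing local text this way keeps the check independent of network access, which can be convenient in tests.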

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-4206' xml:id='l2h-4206' class="method">can_fetch</tt></b>(</nobr></td>
 <td><var>useragent, url</var>)</td></tr></table></dt>
<dd>
Returns <code>True</code> if the <var>useragent</var> is allowed to fetch the <var>url</var>
according to the rules contained in the parsed <span class="file">robots.txt</span> file.
</dl>
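
<P>
The sketch below illustrates how the answer from <tt class="method">can_fetch()</tt> depends on the user agent as well as the URL. The agent names <code>BadBot</code> and <code>GoodBot</code>, the rules, and the helper <code>check</code> are all hypothetical; the import fallback is an assumption for Python 3, where the module is <code>urllib.robotparser</code>.

```python
try:
    import robotparser  # module name used in this document (Python 2)
except ImportError:
    from urllib import robotparser  # same module under Python 3

# Hypothetical rules: one record for a named agent, one for everyone else.
RULES = [
    "User-agent: BadBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(RULES)

def check(agent, path):
    """Ask whether `agent` may fetch `path` on a hypothetical site."""
    return rp.can_fetch(agent, "http://www.example.com" + path)

badbot_index = check("BadBot", "/index.html")         # matched by the BadBot record
goodbot_index = check("GoodBot", "/index.html")       # falls through to the * record
goodbot_private = check("GoodBot", "/private/a.html")
print(badbot_index, goodbot_index, goodbot_private)
```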

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-4207' xml:id='l2h-4207' class="method">mtime</tt></b>(</nobr></td>
 <td><var></var>)</td></tr></table></dt>
<dd>
Returns the time the <span class="file">robots.txt</span> file was last fetched. This is
useful for long-running web spiders that need to check for new
<span class="file">robots.txt</span> files periodically.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
 <td><nobr><b><tt id='l2h-4208' xml:id='l2h-4208' class="method">modified</tt></b>(</nobr></td>
 <td><var></var>)</td></tr></table></dt>
<dd>
Sets the time the <span class="file">robots.txt</span> file was last fetched to the current
time.
</dl>
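
<P>
A long-running spider might combine <tt class="method">mtime()</tt> and <tt class="method">modified()</tt> roughly as sketched below. The helper <code>needs_refresh</code> and the one-hour interval are hypothetical, and the sketch assumes the recorded time is comparable with <code>time.time()</code>. It calls <tt class="method">modified()</tt> explicitly on the assumption that not every Python version records the fetch time automatically.

```python
import time

try:
    import robotparser  # module name used in this document (Python 2)
except ImportError:
    from urllib import robotparser  # same module under Python 3

MAX_AGE_SECONDS = 3600  # hypothetical refresh interval

def needs_refresh(parser, max_age=MAX_AGE_SECONDS):
    """Return True if the recorded fetch time is older than max_age seconds."""
    return time.time() - parser.mtime() > max_age

rp = robotparser.RobotFileParser()
stale_before = needs_refresh(rp)  # nothing has been fetched yet

# A real spider would call rp.read() here; this sketch only records the time.
rp.modified()
stale_after = needs_refresh(rp)
print(stale_before, stale_after)
```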

<P>
</dl>

<P>
The following example demonstrates basic use of the <tt class="class">RobotFileParser</tt> class.

<P>
<div class="verbatim"><pre>
&gt;&gt;&gt; import robotparser
&gt;&gt;&gt; rp = robotparser.RobotFileParser()
&gt;&gt;&gt; rp.set_url("http://www.musi-cal.com/robots.txt")
&gt;&gt;&gt; rp.read()
&gt;&gt;&gt; rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
False
&gt;&gt;&gt; rp.can_fetch("*", "http://www.musi-cal.com/")
True
</pre></div>

<DIV CLASS="navigation">
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>