<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<link rel="STYLESHEET" href="lib.css" type='text/css' />
<link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" />
<link rel='start' href='../index.html' title='Python Documentation Index' />
<link rel="first" href="lib.html" title='Python Library Reference' />
<link rel='contents' href='contents.html' title="Contents" />
<link rel='index' href='genindex.html' title='Index' />
<link rel='last' href='about.html' title='About this document...' />
<link rel='help' href='about.html' title='About this document...' />
<link rel="next" href="module-urllib2.html" />
<link rel="prev" href="module-cgitb.html" />
<link rel="parent" href="internet.html" />
<link rel="next" href="urlopener-objs.html" />
<meta name='aesop' content='information' />
<title>11.4 urllib -- Open arbitrary resources by URL</title>
</head>
<body>
<DIV CLASS="navigation">
<div id='top-navigation-panel' xml:id='top-navigation-panel'>
<table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td class='online-navigation'><a rel="prev" title="11.3 cgitb "
href="module-cgitb.html"><img src='../icons/previous.png'
border='0' height='32' alt='Previous Page' width='32' /></A></td>
<td class='online-navigation'><a rel="parent" title="11. Internet Protocols and"
href="internet.html"><img src='../icons/up.png'
border='0' height='32' alt='Up One Level' width='32' /></A></td>
<td class='online-navigation'><a rel="next" title="11.4.1 URLopener Objects"
href="urlopener-objs.html"><img src='../icons/next.png'
border='0' height='32' alt='Next Page' width='32' /></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td class='online-navigation'><a rel="contents" title="Table of Contents"
href="contents.html"><img src='../icons/contents.png'
border='0' height='32' alt='Contents' width='32' /></A></td>
<td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png'
border='0' height='32' alt='Module Index' width='32' /></a></td>
<td class='online-navigation'><a rel="index" title="Index"
href="genindex.html"><img src='../icons/index.png'
border='0' height='32' alt='Index' width='32' /></A></td>
</tr></table>
<div class='online-navigation'>
<b class="navlabel">Previous:</b>
<a class="sectref" rel="prev" href="module-cgitb.html">11.3 cgitb </A>
<b class="navlabel">Up:</b>
<a class="sectref" rel="parent" href="internet.html">11. Internet Protocols and</A>
<b class="navlabel">Next:</b>
<a class="sectref" rel="next" href="urlopener-objs.html">11.4.1 URLopener Objects</A>
</div>
<hr /></div>
</DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION0013400000000000000000">
11.4 <tt class="module">urllib</tt> --
Open arbitrary resources by URL</A>
</H1>

<P>
<A NAME="module-urllib"></A>

<P>
<a id='l2h-3202' xml:id='l2h-3202'></a>

<P>
This module provides a high-level interface for fetching data across
the World Wide Web. In particular, the <tt class="function">urlopen()</tt> function
is similar to the built-in function <tt class="function">open()</tt>, but accepts
Uniform Resource Locators (URLs) instead of filenames. Some
restrictions apply: it can only open URLs for reading, and no seek
operations are available.

<P>
It defines the following public functions:

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3184' xml:id='l2h-3184' class="function">urlopen</tt></b>(</nobr></td>
<td><var>url</var><big>[</big><var>, data</var><big>[</big><var>, proxies</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Open a network object denoted by a URL for reading. If the URL does
not have a scheme identifier, or if it has <span class="file">file:</span> as its scheme
identifier, this opens a local file (without universal newlines);
otherwise it opens a socket to a server somewhere on the network. If
the connection cannot be made, or if the server returns an error code,
the <tt class="exception">IOError</tt> exception is raised. If all went well, a
file-like object is returned. This object supports the following methods:
<tt class="method">read()</tt>, <tt class="method">readline()</tt>, <tt class="method">readlines()</tt>, <tt class="method">fileno()</tt>,
<tt class="method">close()</tt>, <tt class="method">info()</tt> and <tt class="method">geturl()</tt>. It also has
proper support for the iterator protocol.
One caveat: the <tt class="method">read()</tt> method, if the size argument is
omitted or negative, may not read until the end of the data stream;
there is no good way to determine that the entire stream from a socket
has been read in the general case.

<P>
Except for the <tt class="method">info()</tt> and <tt class="method">geturl()</tt> methods,
these methods have the same interface as for
file objects; see section <A href="bltin-file-objects.html#bltin-file-objects">2.3.9</A> in this
manual. (It is not a built-in file object, however, so it can't be
used at those few places where a true built-in file object is
required.)

<P>
The <tt class="method">info()</tt> method returns an instance of the class
<tt class="class">mimetools.Message</tt> containing meta-information associated
with the URL. When the scheme is HTTP, these headers are those
returned by the server at the head of the retrieved HTML page
(including Content-Length and Content-Type). When the scheme is FTP,
a Content-Length header will be present if (as is now usual) the
server passed back a file length in response to the FTP retrieval
request. A Content-Type header will be present if the MIME type can
be guessed. When the scheme is local-file, returned headers will include
a Date representing the file's last-modified time, a Content-Length
giving file size, and a Content-Type containing a guess at the file's
type. See also the description of the
<tt class="module"><a href="module-mimetools.html">mimetools</a></tt><a id='l2h-3203' xml:id='l2h-3203'></a> module.

<P>
The <tt class="method">geturl()</tt> method returns the real URL of the page. In
some cases, the HTTP server redirects a client to another URL. The
<tt class="function">urlopen()</tt> function handles this transparently, but in some
cases the caller needs to know which URL the client was redirected
to. The <tt class="method">geturl()</tt> method can be used to retrieve this
redirected URL.
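<P>
A minimal sketch of the file-like object described above, assuming Python 2.6 or later (for the <code>b''</code> literal). To run without network access it opens a small temporary local file through <tt class="function">urlopen()</tt> using a <span class="file">file:</span> URL built with <tt class="function">pathname2url()</tt>, then calls <tt class="method">read()</tt>, <tt class="method">info()</tt> and <tt class="method">geturl()</tt>. The helper <code>read_via_url()</code> is hypothetical, and the <code>ImportError</code> fallback to <code>urllib.request</code> is an added assumption so the same sketch also runs on Python 3, where these functions were moved.

```python
import os
import tempfile

try:
    # Python 2 location, as documented in this section.
    from urllib import urlopen, pathname2url
except ImportError:
    # Assumption: on Python 3 the same functions live in urllib.request.
    from urllib.request import urlopen, pathname2url


def read_via_url(path):
    """Hypothetical helper: open a local file through urlopen() and
    return (data, headers, real_url)."""
    url = 'file:' + pathname2url(os.path.abspath(path))
    f = urlopen(url)
    try:
        data = f.read()        # file-like read()
        headers = f.info()     # meta-information about the resource
        real_url = f.geturl()  # the URL that was actually opened
    finally:
        f.close()
    return data, headers, real_url


# Create a small temporary file to read back through a file: URL.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello, world\n')
os.close(fd)

data, headers, real_url = read_via_url(path)
os.remove(path)

print(repr(data))
print(headers['Content-Length'])  # header lookup is case-insensitive
print(real_url)
```

For an <span class="file">http:</span> URL the same calls apply; if the server redirected the request, <tt class="method">geturl()</tt> would report the final URL rather than the one passed in.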

<P>
If the <var>url</var> uses the <span class="file">http:</span> scheme identifier, the optional
<var>data</var> argument may be given to specify a <code>POST</code> request
(normally the request type is <code>GET</code>). The <var>data</var> argument
must be in standard <span class="mimetype">application/x-www-form-urlencoded</span> format;
see the <tt class="function">urlencode()</tt> function below.

<P>
The <tt class="function">urlopen()</tt> function works transparently with proxies
which do not require authentication. In a <span class="Unix">Unix</span> or Windows
environment, set the <a class="envvar" id='l2h-3204' xml:id='l2h-3204'>http_proxy</a>, <a class="envvar" id='l2h-3205' xml:id='l2h-3205'>ftp_proxy</a> or
<a class="envvar" id='l2h-3206' xml:id='l2h-3206'>gopher_proxy</a> environment variables to a URL that identifies
the proxy server before starting the Python interpreter. For example
(the "<tt class="character">%</tt>" is the command prompt):

<P>
<div class="verbatim"><pre>
% http_proxy="http://www.someproxy.com:3128"
% export http_proxy
% python
...
</pre></div>

<P>
In a Windows environment, if no proxy environment variables are set,
proxy settings are obtained from the registry's Internet Settings
section.

<P>
In a Macintosh environment, <tt class="function">urlopen()</tt> will retrieve proxy
information from Internet<a id='l2h-3207' xml:id='l2h-3207'></a> Config.

<P>
Alternatively, the optional <var>proxies</var> argument may be used to
explicitly specify proxies. It must be a dictionary mapping scheme
names to proxy URLs, where an empty dictionary causes no proxies to be
used, and <code>None</code> (the default value) causes environmental proxy
settings to be used as discussed above. For example:

<P>
<div class="verbatim"><pre>
import urllib

some_url = "http://www.python.org/"   # placeholder: substitute the URL to fetch

# Use http://www.someproxy.com:3128 for http proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
# Don't use any proxies
filehandle = urllib.urlopen(some_url, proxies={})
# Use proxies from environment - both versions are equivalent
filehandle = urllib.urlopen(some_url, proxies=None)
filehandle = urllib.urlopen(some_url)
</pre></div>

<P>
If you need more control over proxy handling than the <var>proxies</var>
argument provides, use <tt class="class">URLopener</tt>, or a subclass such as
<tt class="class">FancyURLopener</tt>, directly; its constructor accepts the same
<var>proxies</var> dictionary.

<P>
Proxies which require authentication for use are not currently
supported; this is considered an implementation limitation.

<P>

<span class="versionnote">Changed in version 2.3:
Added the <var>proxies</var> support.</span>

</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3185' xml:id='l2h-3185' class="function">urlretrieve</tt></b>(</nobr></td>
<td><var>url</var><big>[</big><var>, filename</var><big>[</big><var>,
reporthook</var><big>[</big><var>, data</var><big>]</big><var></var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Copy a network object denoted by a URL to a local file, if necessary.
If the URL points to a local file, or a valid cached copy of the
object exists, the object is not copied. Return a tuple
<code>(<var>filename</var>, <var>headers</var>)</code> where <var>filename</var> is the
local file name under which the object can be found, and <var>headers</var>
is whatever the <tt class="method">info()</tt> method of the object returned by
<tt class="function">urlopen()</tt> returned (for a remote object, possibly cached).
Exceptions are the same as for <tt class="function">urlopen()</tt>.

<P>
The second argument, if present, specifies the file location to copy
to (if absent, the location will be a tempfile with a generated name).
The third argument, if present, is a hook function that will be called
once on establishment of the network connection and once after each
block read thereafter. The hook will be passed three arguments: a
count of blocks transferred so far, a block size in bytes, and the
total size of the file. The total size may be <code>-1</code> on older
FTP servers which do not return a file size in response to a retrieval
request.
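<P>
A minimal sketch of a <var>reporthook</var>, assuming Python 2.6 or later (with an added fallback import for Python 3's <code>urllib.request</code>). The hook records each <code>(block count, block size, total size)</code> call and prints a progress line, treating a total size of <code>-1</code> as unknown. The names <code>record_progress</code> and <code>calls</code> are hypothetical. To avoid network access it copies a temporary local file through a <span class="file">file:</span> URL; it assumes that passing an explicit <var>filename</var> forces a real copy, and therefore calls to the hook, even for a local URL.

```python
import os
import tempfile

try:
    # Python 2 location, as documented in this section.
    from urllib import urlretrieve, pathname2url
except ImportError:
    # Assumption: on Python 3 the same functions live in urllib.request.
    from urllib.request import urlretrieve, pathname2url

calls = []  # hypothetical log of (block_count, block_size, total_size)


def record_progress(block_count, block_size, total_size):
    """reporthook: called once when the connection is set up and once
    after each block is read."""
    calls.append((block_count, block_size, total_size))
    if total_size >= 0:
        done = min(block_count * block_size, total_size)
        print('%d of %d bytes' % (done, total_size))
    else:
        # -1 means the server (e.g. an older FTP server) sent no size.
        print('%d bytes so far' % (block_count * block_size))


payload = b'x' * 20000  # enough data for several blocks

fd, src = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)
fd, dst = tempfile.mkstemp()
os.close(fd)

# An explicit filename makes urlretrieve() copy the local file,
# so the hook is actually invoked.
filename, headers = urlretrieve('file:' + pathname2url(src), dst,
                                record_progress)
f = open(filename, 'rb')
copied = f.read()
f.close()
os.remove(src)
os.remove(dst)
```

The first call arrives with a block count of <code>0</code>, before any data has been read, which matches the "once on establishment of the network connection" behaviour described above.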

<P>
If the <var>url</var> uses the <span class="file">http:</span> scheme identifier, the optional
<var>data</var> argument may be given to specify a <code>POST</code> request
(normally the request type is <code>GET</code>). The <var>data</var> argument
must be in standard <span class="mimetype">application/x-www-form-urlencoded</span> format;
see the <tt class="function">urlencode()</tt> function below.
</dl>

<P>
<dl><dt><b><tt id='l2h-3186' xml:id='l2h-3186'>_urlopener</tt></b></dt>
<dd>
The public functions <tt class="function">urlopen()</tt> and
<tt class="function">urlretrieve()</tt> create an instance of the
<tt class="class">FancyURLopener</tt> class and use it to perform their requested
actions. To override this functionality, programmers can create a
subclass of <tt class="class">URLopener</tt> or <tt class="class">FancyURLopener</tt>, then assign
an instance of that class to the
<code>urllib._urlopener</code> variable before calling the desired function.
For example, applications may want to specify a different
<span class="mailheader">User-Agent:</span> header than <tt class="class">URLopener</tt> defines. This
can be accomplished with the following code:

<P>
<div class="verbatim"><pre>
import urllib

class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"

urllib._urlopener = AppURLopener()
</pre></div>
</dd></dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3187' xml:id='l2h-3187' class="function">urlcleanup</tt></b>(</nobr></td>
<td><var></var>)</td></tr></table></dt>
<dd>
Clear the cache that may have been built up by previous calls to
<tt class="function">urlretrieve()</tt>.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3188' xml:id='l2h-3188' class="function">quote</tt></b>(</nobr></td>
<td><var>string</var><big>[</big><var>, safe</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Replace special characters in <var>string</var> using the "<tt class="samp">%xx</tt>" escape.
Letters, digits, and the characters "<tt class="character">_.-</tt>" are never quoted.
The optional <var>safe</var> parameter specifies additional characters
that should not be quoted; its default value is <code>'/'</code>.

<P>
Example: <code>quote('/~connolly/')</code> yields <code>'/%7Econnolly/'</code>.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3189' xml:id='l2h-3189' class="function">quote_plus</tt></b>(</nobr></td>
<td><var>string</var><big>[</big><var>, safe</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Like <tt class="function">quote()</tt>, but also replaces spaces by plus signs, as
required for quoting HTML form values. Plus signs in the original
string are escaped unless they are included in <var>safe</var>. Unlike
<tt class="function">quote()</tt>, its <var>safe</var> parameter does not default to <code>'/'</code>.
</dl>
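<P>
A short sketch contrasting <tt class="function">quote()</tt> and <tt class="function">quote_plus()</tt>. The input strings are made up for illustration and avoid the tilde, whose handling differs between Python versions. The <code>ImportError</code> fallback is an added assumption that lets the same code run on Python 3, where these functions live in <code>urllib.parse</code>.

```python
try:
    # Python 2 location, as documented in this section.
    from urllib import quote, quote_plus
except ImportError:
    # Assumption: on Python 3 the same functions live in urllib.parse.
    from urllib.parse import quote, quote_plus

# quote(): '/' is safe by default, so path separators survive.
path_part = quote('/docs/annual report.html')
# Passing safe='' quotes '/' as well.
whole_part = quote('/docs/annual report.html', safe='')
# quote_plus(): spaces become '+', and a literal '+' is escaped.
form_value = quote_plus('C++ rocks')

print(path_part)   # /docs/annual%20report.html
print(whole_part)  # %2Fdocs%2Fannual%20report.html
print(form_value)  # C%2B%2B+rocks
```

<tt class="function">quote()</tt> suits the path component of a URL, while <tt class="function">quote_plus()</tt> suits form values, where a space must become <code>+</code>.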

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3190' xml:id='l2h-3190' class="function">unquote</tt></b>(</nobr></td>
<td><var>string</var>)</td></tr></table></dt>
<dd>
Replace "<tt class="samp">%xx</tt>" escapes by their single-character equivalent.

<P>
Example: <code>unquote('/%7Econnolly/')</code> yields <code>'/~connolly/'</code>.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3191' xml:id='l2h-3191' class="function">unquote_plus</tt></b>(</nobr></td>
<td><var>string</var>)</td></tr></table></dt>
<dd>
Like <tt class="function">unquote()</tt>, but also replaces plus signs by spaces, as
required for unquoting HTML form values.
</dl>
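<P>
The difference between <tt class="function">unquote()</tt> and <tt class="function">unquote_plus()</tt> shows up only for plus signs, as this small sketch illustrates. The encoded strings are made up for illustration; the <code>ImportError</code> fallback is an added assumption so the code also runs on Python 3, where these functions live in <code>urllib.parse</code>.

```python
try:
    # Python 2 location, as documented in this section.
    from urllib import unquote, unquote_plus
except ImportError:
    # Assumption: on Python 3 the same functions live in urllib.parse.
    from urllib.parse import unquote, unquote_plus

encoded = 'annual%20report+2005.html'

plain = unquote(encoded)        # '+' is left alone
form = unquote_plus(encoded)    # '+' becomes a space
lang = unquote_plus('C%2B%2B+rocks')  # escaped '+' signs are restored

print(plain)  # annual report+2005.html
print(form)   # annual report 2005.html
print(lang)   # C++ rocks
```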

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3192' xml:id='l2h-3192' class="function">urlencode</tt></b>(</nobr></td>
<td><var>query</var><big>[</big><var>, doseq</var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Convert a mapping object or a sequence of two-element tuples to a
"url-encoded" string, suitable to pass to
<tt class="function">urlopen()</tt> above as the optional <var>data</var> argument. This
is useful to pass a dictionary of form fields to a <code>POST</code>
request. The resulting string is a series of
<code><var>key</var>=<var>value</var></code> pairs separated by "<tt class="character">&amp;</tt>"
characters, where both <var>key</var> and <var>value</var> are quoted using
<tt class="function">quote_plus()</tt> above. If the optional parameter <var>doseq</var> is
present and evaluates to true, individual <code><var>key</var>=<var>value</var></code> pairs
are generated for each element of any value that is a sequence.
When a sequence of two-element tuples is used as the <var>query</var> argument,
the first element of each tuple is a key and the second is a value. The
order of parameters in the encoded string will match the order of parameter
tuples in the sequence.
The <tt class="module"><a href="module-cgi.html">cgi</a></tt> module provides the functions
<tt class="function">parse_qs()</tt> and <tt class="function">parse_qsl()</tt> which are used to
parse query strings into Python data structures.
</dl>
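<P>
A minimal sketch of <tt class="function">urlencode()</tt> with a sequence of tuples and with the <var>doseq</var> flag. The field names and values are hypothetical. The resulting string is what you would pass as the <var>data</var> argument of <tt class="function">urlopen()</tt> for a <code>POST</code> request; that step is not run here because it needs a server. The <code>ImportError</code> fallback is an added assumption so the code also runs on Python 3, where the function lives in <code>urllib.parse</code>.

```python
try:
    # Python 2 location, as documented in this section.
    from urllib import urlencode
except ImportError:
    # Assumption: on Python 3 the function lives in urllib.parse.
    from urllib.parse import urlencode

# A sequence of two-element tuples keeps its order in the output,
# and values are quoted with quote_plus() (space -> '+').
query = urlencode([('q', 'python urllib'), ('lang', 'en')])

# With doseq true, each element of a sequence value gets its own pair.
multi = urlencode({'tag': ['web', 'http']}, True)

# Without doseq, the sequence is converted with str() and quoted whole.
flat = urlencode({'tag': ['web', 'http']})

print(query)
print(multi)
print(flat)
```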

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3193' xml:id='l2h-3193' class="function">pathname2url</tt></b>(</nobr></td>
<td><var>path</var>)</td></tr></table></dt>
<dd>
Convert the pathname <var>path</var> from the local syntax for a path to
the form used in the path component of a URL. This does not produce a
complete URL. The return value will already be quoted using the
<tt class="function">quote()</tt> function.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><tt id='l2h-3194' xml:id='l2h-3194' class="function">url2pathname</tt></b>(</nobr></td>
<td><var>path</var>)</td></tr></table></dt>
<dd>
Convert the path component <var>path</var> from an encoded URL to the local
syntax for a path. This does not accept a complete URL. This
function uses <tt class="function">unquote()</tt> to decode <var>path</var>.
</dl>
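<P>
A sketch of a round trip through <tt class="function">pathname2url()</tt> and <tt class="function">url2pathname()</tt>, assuming a local path that contains a space. The file name <code>annual report.txt</code> is hypothetical and the file need not exist; the exact form of <code>url_path</code> depends on the platform (a POSIX path keeps its slashes, a Windows drive path is rewritten). The <code>ImportError</code> fallback is an added assumption so the code also runs on Python 3, where both functions live in <code>urllib.request</code>.

```python
import os
import tempfile

try:
    # Python 2 location, as documented in this section.
    from urllib import pathname2url, url2pathname
except ImportError:
    # Assumption: on Python 3 both functions live in urllib.request.
    from urllib.request import pathname2url, url2pathname

# A local path containing a space (the file does not have to exist).
local_path = os.path.join(tempfile.gettempdir(), 'annual report.txt')

url_path = pathname2url(local_path)  # quoted path component, not a full URL
back = url2pathname(url_path)        # unquoted and converted to local syntax

print(url_path)
print(back == local_path)
```

Because the result is only a path component, a complete URL still needs a scheme in front of it, for example <code>'file:' + url_path</code>.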

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><span class="typelabel">class</span> <tt id='l2h-3195' xml:id='l2h-3195' class="class">URLopener</tt></b>(</nobr></td>
<td><var></var><big>[</big><var>proxies</var><big>[</big><var>, **x509</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt>
<dd>
Base class for opening and reading URLs. Unless you need to support
opening objects using schemes other than <span class="file">http:</span>, <span class="file">ftp:</span>,
<span class="file">gopher:</span> or <span class="file">file:</span>, you probably want to use
<tt class="class">FancyURLopener</tt>.

<P>
By default, the <tt class="class">URLopener</tt> class sends a
<span class="mailheader">User-Agent:</span> header of "<tt class="samp">urllib/<var>VVV</var></tt>", where
<var>VVV</var> is the <tt class="module">urllib</tt> version number. Applications can
define their own <span class="mailheader">User-Agent:</span> header by subclassing
<tt class="class">URLopener</tt> or <tt class="class">FancyURLopener</tt> and setting the class
attribute <tt class="member">version</tt> to an appropriate string value in the
subclass definition.

<P>
The optional <var>proxies</var> parameter should be a dictionary mapping
scheme names to proxy URLs, where an empty dictionary turns proxies
off completely. Its default value is <code>None</code>, in which case
environmental proxy settings will be used if present, as discussed in
the definition of <tt class="function">urlopen()</tt>, above.

<P>
Additional keyword parameters, collected in <var>x509</var>, are used for
authentication with the <span class="file">https:</span> scheme. The keywords
<var>key_file</var> and <var>cert_file</var> are supported; both are needed to
actually retrieve a resource at an <span class="file">https:</span> URL.
</dl>

<P>
<dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline">
<td><nobr><b><span class="typelabel">class</span> <tt id='l2h-3196' xml:id='l2h-3196' class="class">FancyURLopener</tt></b>(</nobr></td>
<td><var>...</var>)</td></tr></table></dt>
<dd>
<tt class="class">FancyURLopener</tt> subclasses <tt class="class">URLopener</tt>, providing default
handling for the following HTTP response codes: 301, 302, 303, 307 and
401. For the 30x response codes listed above, the
<span class="mailheader">Location:</span> header is used to fetch the actual URL. For 401
response codes (authentication required), basic HTTP authentication is
performed. For the 30x response codes, recursion is bounded by the
value of the <var>maxtries</var> attribute, which defaults to 10.

<P>
<span class="note"><b class="label">Note:</b>
According to the letter of <a class="rfc" id='rfcref-89922' xml:id='rfcref-89922'
href="http://www.faqs.org/rfcs/rfc2616.html">RFC 2616</a>, 301 and 302 responses to
POST requests must not be automatically redirected without
confirmation by the user. In reality, browsers do allow automatic
redirection of these responses, changing the POST to a GET, and
<tt class="module">urllib</tt> reproduces this behaviour.</span>

<P>
The parameters to the constructor are the same as those for
<tt class="class">URLopener</tt>.

<P>
<span class="note"><b class="label">Note:</b>
When performing basic authentication, a
<tt class="class">FancyURLopener</tt> instance calls its
<tt class="method">prompt_user_passwd()</tt> method. The default implementation asks
the user for the required information on the controlling terminal. A
subclass may override this method to support more appropriate behavior
if needed.</span>
</dl>

<P>
Restrictions:

<P>

<UL>
<LI>Currently, only the following protocols are supported: HTTP (versions
0.9 and 1.0), Gopher (but not Gopher-+), FTP, and local files.
<a id='l2h-3197' xml:id='l2h-3197'></a><a id='l2h-3198' xml:id='l2h-3198'></a><a id='l2h-3199' xml:id='l2h-3199'></a>
<P>
</LI>
<LI>The caching feature of <tt class="function">urlretrieve()</tt> has been disabled
until proper processing of expiration time headers is implemented.

<P>
</LI>
<LI>There should be a function to query whether a particular URL is in
the cache.

<P>
</LI>
<LI>For backward compatibility, if a URL appears to point to a local file
but the file can't be opened, the URL is re-interpreted using the FTP
protocol. This can sometimes cause confusing error messages.

<P>
</LI>
<LI>The <tt class="function">urlopen()</tt> and <tt class="function">urlretrieve()</tt> functions can
cause arbitrarily long delays while waiting for a network connection
to be set up. This means that it is difficult to build an interactive
Web client using these functions without using threads.

<P>
</LI>
<LI>The data returned by <tt class="function">urlopen()</tt> or <tt class="function">urlretrieve()</tt>
is the raw data returned by the server. This may be binary data
(such as an image), plain text or (for example) HTML<a id='l2h-3208' xml:id='l2h-3208'></a>. The
HTTP<a id='l2h-3200' xml:id='l2h-3200'></a> protocol provides type information in the
reply header, which can be inspected by looking at the
<span class="mailheader">Content-Type:</span> header. For the
Gopher<a id='l2h-3201' xml:id='l2h-3201'></a> protocol, type information is encoded
in the URL; there is currently no easy way to extract it. If the
returned data is HTML, you can use the module
<tt class="module"><a href="module-htmllib.html">htmllib</a></tt><a id='l2h-3209' xml:id='l2h-3209'></a> to parse it.

<P>
</LI>
<LI>The code handling the FTP<a id='l2h-3210' xml:id='l2h-3210'></a> protocol cannot differentiate
between a file and a directory. This can lead to unexpected behavior
when attempting to read a URL that points to a file that is not
accessible. If the URL ends in a <code>/</code>, it is assumed to refer to
a directory and will be handled accordingly. But if an attempt to
read a file leads to a 550 error (meaning the URL cannot be found or
is not accessible, often for permission reasons), then the path is
treated as a directory in order to handle the case when a directory is
specified by a URL but the trailing <code>/</code> has been left off. This can
cause misleading results when you try to fetch a file whose read
permissions make it inaccessible; the FTP code will try to read it,
fail with a 550 error, and then perform a directory listing for the
unreadable file. If fine-grained control is needed, consider using the
<tt class="module">ftplib</tt> module, subclassing <tt class="class">FancyURLopener</tt>, or changing
<code>urllib._urlopener</code> to meet your needs.

<P>
</LI>
<LI>This module does not support the use of proxies which require
authentication. This may be implemented in the future.

<P>
</LI>
<LI>Although the <tt class="module">urllib</tt> module contains (undocumented) routines
to parse and unparse URL strings, the recommended interface for URL
manipulation is in module <tt class="module"><a href="module-urlparse.html">urlparse</a></tt><a id='l2h-3211' xml:id='l2h-3211'></a>.

<P>
</LI>
</UL>

<P>

<p><br /></p><hr class='online-navigation' />
<div class='online-navigation'>
<!--Table of Child-Links-->
<A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></a>

<UL CLASS="ChildLinks">
<LI><A href="urlopener-objs.html">11.4.1 URLopener Objects</a>
<LI><A href="node483.html">11.4.2 Examples</a>
</ul>
<!--End of Table of Child-Links-->
</div>

<DIV CLASS="navigation">
<hr />
<span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span>
</DIV>
<!--End of Navigation Panel-->
<ADDRESS>
See <i><a href="about.html">About this document...</a></i> for information on suggesting changes.
</ADDRESS>
</BODY>
</HTML>