| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
| 2 | <html> |
| 3 | <head> |
| 4 | <link rel="STYLESHEET" href="lib.css" type='text/css' /> |
| 5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> |
| 6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> |
| 7 | <link rel="first" href="lib.html" title='Python Library Reference' /> |
| 8 | <link rel='contents' href='contents.html' title="Contents" /> |
| 9 | <link rel='index' href='genindex.html' title='Index' /> |
| 10 | <link rel='last' href='about.html' title='About this document...' /> |
| 11 | <link rel='help' href='about.html' title='About this document...' /> |
| 12 | <link rel="next" href="module-SocketServer.html" /> |
| 13 | <link rel="prev" href="module-telnetlib.html" /> |
| 14 | <link rel="parent" href="internet.html" /> |
| 16 | <meta name='aesop' content='information' /> |
| 17 | <title>11.15 urlparse -- Parse URLs into components</title> |
| 18 | </head> |
| 19 | <body> |
| 20 | <DIV CLASS="navigation"> |
| 21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> |
| 22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> |
| 23 | <tr> |
| 24 | <td class='online-navigation'><a rel="prev" title="11.14.2 Telnet Example" |
| 25 | href="telnet-example.html"><img src='../icons/previous.png' |
| 26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> |
| 27 | <td class='online-navigation'><a rel="parent" title="11. Internet Protocols and" |
| 28 | href="internet.html"><img src='../icons/up.png' |
| 29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> |
| 30 | <td class='online-navigation'><a rel="next" title="11.16 SocketServer " |
| 31 | href="module-SocketServer.html"><img src='../icons/next.png' |
| 32 | border='0' height='32' alt='Next Page' width='32' /></A></td> |
| 33 | <td align="center" width="100%">Python Library Reference</td> |
| 34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" |
| 35 | href="contents.html"><img src='../icons/contents.png' |
| 36 | border='0' height='32' alt='Contents' width='32' /></A></td> |
| 37 | <td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png' |
| 38 | border='0' height='32' alt='Module Index' width='32' /></a></td> |
| 39 | <td class='online-navigation'><a rel="index" title="Index" |
| 40 | href="genindex.html"><img src='../icons/index.png' |
| 41 | border='0' height='32' alt='Index' width='32' /></A></td> |
| 42 | </tr></table> |
| 43 | <div class='online-navigation'> |
| 44 | <b class="navlabel">Previous:</b> |
| 45 | <a class="sectref" rel="prev" href="telnet-example.html">11.14.2 Telnet Example</A> |
| 46 | <b class="navlabel">Up:</b> |
| 47 | <a class="sectref" rel="parent" href="internet.html">11. Internet Protocols and</A> |
| 48 | <b class="navlabel">Next:</b> |
| 49 | <a class="sectref" rel="next" href="module-SocketServer.html">11.16 SocketServer </A> |
| 50 | </div> |
| 51 | <hr /></div> |
| 52 | </DIV> |
| 53 | <!--End of Navigation Panel--> |
| 54 | |
| 55 | <H1><A NAME="SECTION00131500000000000000000"> |
| 56 | 11.15 <tt class="module">urlparse</tt> -- |
| 57 | Parse URLs into components</A> |
| 58 | </H1> |
| 59 | <A NAME="module-urlparse"></A> |
| 60 | <P> |
| 61 | |
| 62 | <P> |
| 63 | <a id='l2h-3546' xml:id='l2h-3546'></a> |
| 64 | <a id='l2h-3538' xml:id='l2h-3538'></a><a id='l2h-3539' xml:id='l2h-3539'></a> |
| 65 | <P> |
| 66 | This module defines a standard interface for splitting Uniform Resource |
| 67 | Locator (URL) strings into components (addressing scheme, network |
| 68 | location, path, etc.), for combining the components back into a URL |
| 69 | string, and for converting a "relative URL" to an absolute URL given a |
| 70 | "base URL". |
| 71 | |
| 72 | <P> |
| 73 | The module was designed to match RFC 1808, the Internet RFC on Relative |
| 74 | Uniform Resource Locators (and writing it uncovered a bug in an earlier |
| 75 | draft of that RFC). |
| 76 | |
| 77 | <P> |
| 78 | It defines the following functions: |
| 79 | |
| 80 | <P> |
| 81 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 82 | <td><nobr><b><tt id='l2h-3540' xml:id='l2h-3540' class="function">urlparse</tt></b>(</nobr></td> |
| 83 | <td><var>urlstring</var><big>[</big><var>, default_scheme</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt> |
| 84 | <dd> |
| 85 | Parse a URL into 6 components, returning a 6-tuple: (addressing |
| 86 | scheme, network location, path, parameters, query, fragment |
| 87 | identifier). This corresponds to the general structure of a URL: |
| 88 | <code><var>scheme</var>://<var>netloc</var>/<var>path</var>;<var>parameters</var>?<var>query</var>#<var>fragment</var></code>. |
| 89 | Each tuple item is a string, possibly empty. |
| 90 | The components are not broken up into smaller parts (e.g. the network |
| 91 | location is a single string), and <code>%</code> escapes are not expanded. |
| 92 | The delimiters shown above are not part of the tuple items, |
| 93 | except for a leading slash in the <var>path</var> component, which is |
| 94 | retained if present. |
| 95 | |
| 96 | <P> |
| 97 | Example: |
| 98 | |
| 99 | <P> |
| 100 | <div class="verbatim"><pre> |
| 101 | urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') |
| 102 | </pre></div> |
| 103 | |
| 104 | <P> |
| 105 | yields the tuple |
| 106 | |
| 107 | <P> |
| 108 | <div class="verbatim"><pre> |
| 109 | ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '') |
| 110 | </pre></div> |
| 111 | |
| 112 | <P> |
| 113 | If the <var>default_scheme</var> argument is specified, it gives the |
| 114 | default addressing scheme, to be used only if the URL string does not |
| 115 | specify one. The default value for this argument is the empty string. |
| 116 | |
| 117 | <P> |
| 118 | If the <var>allow_fragments</var> argument is false, fragment identifiers |
| 119 | are not recognized, even if the URL's addressing scheme normally does |
| 120 | support them. The default value for this argument is <code>1</code>. |
| 121 | </dl> |
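<P>
As a rough illustration of the two optional arguments, the sketch below parses a URL that has no scheme of its own, a URL whose scheme overrides the default, and a URL with fragment handling switched off. The example URLs are made up, and the fallback import from <code>urllib.parse</code> is an assumption that lets the same calls run on Python 3, where these functions were relocated; it is not part of the release documented here.

```python
try:
    from urlparse import urlparse  # Python 2, the release documented here
except ImportError:
    from urllib.parse import urlparse  # assumption: Python 3 home of the same function

# No scheme in the URL string, so default_scheme supplies the first item.
with_default = tuple(urlparse('//www.cwi.nl/%7Eguido/Python.html', 'http'))

# A scheme present in the URL string takes precedence over default_scheme.
explicit = tuple(urlparse('ftp://www.cwi.nl/pub/README', 'http'))

# With allow_fragments false, '#intro' is not split off into the last item.
no_fragment = tuple(urlparse('http://www.cwi.nl/Python.html#intro', '', 0))
```

<P>
In the last call the fragment item comes back empty and the <code>#intro</code> text stays attached to the path item.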
| 122 | |
| 123 | <P> |
| 124 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 125 | <td><nobr><b><tt id='l2h-3541' xml:id='l2h-3541' class="function">urlunparse</tt></b>(</nobr></td> |
| 126 | <td><var>tuple</var>)</td></tr></table></dt> |
| 127 | <dd> |
| 128 | Construct a URL string from a tuple as returned by <tt class="function">urlparse()</tt>. |
| 129 | The result may be a slightly different but equivalent URL if the |
| 130 | URL that was originally parsed had redundant delimiters, e.g. a <code>?</code> with |
| 131 | an empty query (the RFC states that these are equivalent). |
| 132 | </dl> |
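<P>
The round trip described above can be sketched as follows. The second URL, with its trailing <code>?</code>, is a made-up example of a redundant delimiter; the sketch only checks that the rebuilt string parses to the same components, since the exact rebuilt text may vary. The <code>urllib.parse</code> fallback is again an assumption for Python 3.

```python
try:
    from urlparse import urlparse, urlunparse  # Python 2, the release documented here
except ImportError:
    from urllib.parse import urlparse, urlunparse  # assumption: Python 3 location

# A URL without redundant delimiters survives the round trip unchanged.
url = 'http://www.cwi.nl:80/%7Eguido/Python.html'
rebuilt = urlunparse(urlparse(url))

# A trailing '?' with an empty query is a redundant delimiter.
redundant = 'http://www.cwi.nl/%7Eguido/Python.html?'
normalized = urlunparse(urlparse(redundant))

# The rebuilt string may differ from the input, but it parses identically.
same_parts = tuple(urlparse(normalized)) == tuple(urlparse(redundant))
```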
| 133 | |
| 134 | <P> |
| 135 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 136 | <td><nobr><b><tt id='l2h-3542' xml:id='l2h-3542' class="function">urlsplit</tt></b>(</nobr></td> |
| 137 | <td><var>urlstring</var><big>[</big><var>, |
| 138 | default_scheme</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt> |
| 139 | <dd> |
| 140 | This is similar to <tt class="function">urlparse()</tt>, but does not split the |
| 141 | parameters from the URL. It should generally be used instead of |
| 142 | <tt class="function">urlparse()</tt> when the more recent URL syntax is wanted, which allows |
| 143 | parameters to be applied to each segment of the <var>path</var> portion of |
| 144 | the URL (see <a class="rfc" id='rfcref-90402' xml:id='rfcref-90402' |
| 145 | href="http://www.faqs.org/rfcs/rfc2396.html">RFC 2396</a>). A separate function is needed to |
| 146 | separate the path segments and parameters. This function returns a |
| 147 | 5-tuple: (addressing scheme, network location, path, query, fragment |
| 148 | identifier). |
| 149 | |
| 150 | <span class="versionnote">New in version 2.2.</span> |
| 151 | |
| 152 | </dl> |
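<P>
To make the difference from <tt class="function">urlparse()</tt> concrete, the sketch below parses one URL both ways. The URL and its <code>;type=a</code> parameter are invented for illustration, and the <code>urllib.parse</code> fallback is an assumption for running on Python 3.

```python
try:
    from urlparse import urlparse, urlsplit  # Python 2, the release documented here
except ImportError:
    from urllib.parse import urlparse, urlsplit  # assumption: Python 3 location

url = 'http://www.cwi.nl/%7Eguido/Python.html;type=a?lang=en#intro'

# urlparse() splits the parameters into their own item (6 items in total).
six = tuple(urlparse(url))

# urlsplit() leaves ';type=a' attached to the path (5 items in total).
five = tuple(urlsplit(url))
```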
| 153 | |
| 154 | <P> |
| 155 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 156 | <td><nobr><b><tt id='l2h-3543' xml:id='l2h-3543' class="function">urlunsplit</tt></b>(</nobr></td> |
| 157 | <td><var>tuple</var>)</td></tr></table></dt> |
| 158 | <dd> |
| 159 | Combine the elements of a tuple as returned by <tt class="function">urlsplit()</tt> |
| 160 | into a complete URL as a string. |
| 161 | |
| 162 | <span class="versionnote">New in version 2.2.</span> |
| 163 | |
| 164 | </dl> |
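<P>
One plausible use of <tt class="function">urlunsplit()</tt> is to change a single component and rebuild the URL. The sketch below swaps the query string; the URL and the <code>page</code> query key are hypothetical, and the <code>urllib.parse</code> fallback is an assumption for Python 3.

```python
try:
    from urlparse import urlsplit, urlunsplit  # Python 2, the release documented here
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # assumption: Python 3 location

# Split into the five components, then rebuild with a different query.
scheme, netloc, path, query, fragment = urlsplit(
    'http://www.cwi.nl/%7Eguido/search.html?page=1#results')
next_page = urlunsplit((scheme, netloc, path, 'page=2', fragment))
```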
| 165 | |
| 166 | <P> |
| 167 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 168 | <td><nobr><b><tt id='l2h-3544' xml:id='l2h-3544' class="function">urljoin</tt></b>(</nobr></td> |
| 169 | <td><var>base, url</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var>)</td></tr></table></dt> |
| 170 | <dd> |
| 171 | Construct a full ("absolute") URL by combining a "base URL" |
| 172 | (<var>base</var>) with a "relative URL" (<var>url</var>). Informally, this |
| 173 | uses components of the base URL, in particular the addressing scheme, |
| 174 | the network location and (part of) the path, to provide missing |
| 175 | components in the relative URL. |
| 176 | |
| 177 | <P> |
| 178 | Example: |
| 179 | |
| 180 | <P> |
| 181 | <div class="verbatim"><pre> |
| 182 | urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') |
| 183 | </pre></div> |
| 184 | |
| 185 | <P> |
| 186 | yields the string |
| 187 | |
| 188 | <P> |
| 189 | <div class="verbatim"><pre> |
| 190 | 'http://www.cwi.nl/%7Eguido/FAQ.html' |
| 191 | </pre></div> |
| 192 | |
| 193 | <P> |
| 194 | The <var>allow_fragments</var> argument has the same meaning as for |
| 195 | <code>urlparse()</code>. |
| 196 | </dl> |
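<P>
A few more cases may help show which parts of the base URL are reused. The relative references below are invented for illustration, and the <code>urllib.parse</code> fallback is an assumption for running on Python 3.

```python
try:
    from urlparse import urljoin  # Python 2, the release documented here
except ImportError:
    from urllib.parse import urljoin  # assumption: Python 3 location

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# A bare file name replaces the last path segment of the base.
sibling = urljoin(base, 'FAQ.html')

# A leading slash replaces the whole path but keeps scheme and location.
from_root = urljoin(base, '/index.html')

# '..' climbs one directory above the base document's directory.
parent = urljoin(base, '../icons/logo.png')

# A URL that is already absolute is returned as given.
absolute = urljoin(base, 'http://www.python.org/')
```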
| 197 | |
| 198 | <P> |
| 199 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> |
| 200 | <td><nobr><b><tt id='l2h-3545' xml:id='l2h-3545' class="function">urldefrag</tt></b>(</nobr></td> |
| 201 | <td><var>url</var>)</td></tr></table></dt> |
| 202 | <dd> |
| 203 | If <var>url</var> contains a fragment identifier, return a pair consisting of a modified |
| 204 | version of <var>url</var> with no fragment identifier, and the fragment |
| 205 | identifier as a separate string. If there is no fragment identifier |
| 206 | in <var>url</var>, return <var>url</var> unmodified and an empty string. |
| 207 | </dl> |
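<P>
Both cases of <tt class="function">urldefrag()</tt> can be sketched as below. The <code>#intro</code> fragment is a made-up example, and the <code>urllib.parse</code> fallback is an assumption for running on Python 3.

```python
try:
    from urlparse import urldefrag  # Python 2, the release documented here
except ImportError:
    from urllib.parse import urldefrag  # assumption: Python 3 location

# With a fragment: the URL comes back without it, plus the fragment text.
stripped, fragment = urldefrag('http://www.cwi.nl/%7Eguido/Python.html#intro')

# Without a fragment: the URL is unchanged and the second item is empty.
unchanged, empty = urldefrag('http://www.cwi.nl/%7Eguido/Python.html')
```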
| 208 | |
| 209 | <P> |
| 210 | <div class="seealso"> |
| 211 | <p class="heading">See Also:</p> |
| 212 | |
| 213 | <dl compact="compact" class="seerfc"> |
| 214 | <dt><a href="http://www.faqs.org/rfcs/rfc1738.html" |
| 215 | title="Uniform Resource Locators (URL)" |
| 216 | >RFC 1738, <em>Uniform Resource Locators (URL)</em></a> |
| 217 | <dd> |
| 218 | This specifies the formal syntax and semantics of absolute |
| 219 | URLs. |
| 220 | </dl> |
| 221 | <dl compact="compact" class="seerfc"> |
| 222 | <dt><a href="http://www.faqs.org/rfcs/rfc1808.html" |
| 223 | title="Relative Uniform Resource Locators" |
| 224 | >RFC 1808, <em>Relative Uniform Resource Locators</em></a> |
| 225 | <dd> |
| 226 | This Request For Comments includes the rules for joining an |
| 227 | absolute and a relative URL, including a fair number of |
| 228 | "Abnormal Examples" which govern the treatment of edge |
| 229 | cases. |
| 230 | </dl> |
| 231 | <dl compact="compact" class="seerfc"> |
| 232 | <dt><a href="http://www.faqs.org/rfcs/rfc2396.html" |
| 233 | title="Uniform Resource Identifiers (URI): Generic Syntax" |
| 234 | >RFC 2396, <em>Uniform Resource Identifiers (URI): Generic Syntax</em></a> |
| 235 | <dd> |
| 236 | Document describing the generic syntactic requirements for |
| 237 | both Uniform Resource Names (URNs) and Uniform Resource |
| 238 | Locators (URLs). |
| 239 | </dl> |
| 240 | </div> |
| 241 | |
| 242 | <DIV CLASS="navigation"> |
| 275 | <hr /> |
| 276 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> |
| 277 | </DIV> |
| 278 | <!--End of Navigation Panel--> |
| 279 | <ADDRESS> |
| 280 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. |
| 281 | </ADDRESS> |
| 282 | </BODY> |
| 283 | </HTML> |