Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
2 | <html> | |
3 | <head> | |
4 | <link rel="STYLESHEET" href="lib.css" type='text/css' /> | |
5 | <link rel="SHORTCUT ICON" href="../icons/pyfav.png" type="image/png" /> | |
6 | <link rel='start' href='../index.html' title='Python Documentation Index' /> | |
7 | <link rel="first" href="lib.html" title='Python Library Reference' /> | |
8 | <link rel='contents' href='contents.html' title="Contents" /> | |
9 | <link rel='index' href='genindex.html' title='Index' /> | |
10 | <link rel='last' href='about.html' title='About this document...' /> | |
11 | <link rel='help' href='about.html' title='About this document...' /> | |
12 | <link rel="next" href="module-SocketServer.html" /> | |
13 | <link rel="prev" href="module-telnetlib.html" /> | |
14 | <link rel="parent" href="internet.html" /> | |
16 | <meta name='aesop' content='information' /> | |
17 | <title>11.15 urlparse -- Parse URLs into components</title> | |
18 | </head> | |
19 | <body> | |
20 | <DIV CLASS="navigation"> | |
21 | <div id='top-navigation-panel' xml:id='top-navigation-panel'> | |
22 | <table align="center" width="100%" cellpadding="0" cellspacing="2"> | |
23 | <tr> | |
24 | <td class='online-navigation'><a rel="prev" title="11.14.2 Telnet Example" | |
25 | href="telnet-example.html"><img src='../icons/previous.png' | |
26 | border='0' height='32' alt='Previous Page' width='32' /></A></td> | |
27 | <td class='online-navigation'><a rel="parent" title="11. Internet Protocols and" | |
28 | href="internet.html"><img src='../icons/up.png' | |
29 | border='0' height='32' alt='Up One Level' width='32' /></A></td> | |
30 | <td class='online-navigation'><a rel="next" title="11.16 SocketServer " | |
31 | href="module-SocketServer.html"><img src='../icons/next.png' | |
32 | border='0' height='32' alt='Next Page' width='32' /></A></td> | |
33 | <td align="center" width="100%">Python Library Reference</td> | |
34 | <td class='online-navigation'><a rel="contents" title="Table of Contents" | |
35 | href="contents.html"><img src='../icons/contents.png' | |
36 | border='0' height='32' alt='Contents' width='32' /></A></td> | |
37 | <td class='online-navigation'><a href="modindex.html" title="Module Index"><img src='../icons/modules.png' | |
38 | border='0' height='32' alt='Module Index' width='32' /></a></td> | |
39 | <td class='online-navigation'><a rel="index" title="Index" | |
40 | href="genindex.html"><img src='../icons/index.png' | |
41 | border='0' height='32' alt='Index' width='32' /></A></td> | |
42 | </tr></table> | |
43 | <div class='online-navigation'> | |
44 | <b class="navlabel">Previous:</b> | |
45 | <a class="sectref" rel="prev" href="telnet-example.html">11.14.2 Telnet Example</A> | |
46 | <b class="navlabel">Up:</b> | |
47 | <a class="sectref" rel="parent" href="internet.html">11. Internet Protocols and</A> | |
48 | <b class="navlabel">Next:</b> | |
49 | <a class="sectref" rel="next" href="module-SocketServer.html">11.16 SocketServer </A> | |
50 | </div> | |
51 | <hr /></div> | |
52 | </DIV> | |
53 | <!--End of Navigation Panel--> | |
54 | ||
55 | <H1><A NAME="SECTION00131500000000000000000"> | |
56 | 11.15 <tt class="module">urlparse</tt> -- | |
57 | Parse URLs into components</A> | |
58 | </H1> | |
59 | <A NAME="module-urlparse"></A> | |
60 | <P> | |
61 | ||
62 | <P> | |
63 | <a id='l2h-3546' xml:id='l2h-3546'></a> | |
64 | <a id='l2h-3538' xml:id='l2h-3538'></a><a id='l2h-3539' xml:id='l2h-3539'></a> | |
65 | <P> | |
66 | This module defines a standard interface to break Uniform Resource | |
67 | Locator (URL) strings up into components (addressing scheme, network | |
68 | location, path, etc.), to combine the components back into a URL | |
69 | string, and to convert a "relative URL" to an absolute URL given a | |
70 | "base URL". | |
71 | ||
72 | <P> | |
73 | The module was designed to match RFC 1808, the Internet RFC on Relative | |
74 | Uniform Resource Locators (and writing it uncovered a bug in an earlier | |
75 | draft of that RFC). | |
76 | ||
77 | <P> | |
78 | It defines the following functions: | |
79 | ||
80 | <P> | |
81 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
82 | <td><nobr><b><tt id='l2h-3540' xml:id='l2h-3540' class="function">urlparse</tt></b>(</nobr></td> | |
83 | <td><var>urlstring</var><big>[</big><var>, default_scheme</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt> | |
84 | <dd> | |
85 | Parse a URL into 6 components, returning a 6-tuple: (addressing | |
86 | scheme, network location, path, parameters, query, fragment | |
87 | identifier). This corresponds to the general structure of a URL: | |
88 | <code><var>scheme</var>://<var>netloc</var>/<var>path</var>;<var>parameters</var>?<var>query</var>#<var>fragment</var></code>. | |
89 | Each tuple item is a string, possibly empty. | |
90 | The components are not broken up into smaller parts (e.g. the network | |
91 | location is a single string), and % escapes are not expanded. | |
92 | The delimiters shown above are not part of the tuple items, | |
93 | except for a leading slash in the <var>path</var> component, which is | |
94 | retained if present. | |
95 | ||
96 | <P> | |
97 | Example: | |
98 | ||
99 | <P> | |
100 | <div class="verbatim"><pre> | |
101 | urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') | |
102 | </pre></div> | |
103 | ||
104 | <P> | |
105 | yields the tuple | |
106 | ||
107 | <P> | |
108 | <div class="verbatim"><pre> | |
109 | ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '') | |
110 | </pre></div> | |
111 | ||
112 | <P> | |
113 | If the <var>default_scheme</var> argument is specified, it gives the | |
114 | default addressing scheme, to be used only if the URL string does not | |
115 | specify one. The default value for this argument is the empty string. | |
116 | ||
117 | <P> | |
118 | If the <var>allow_fragments</var> argument is zero, fragment identifiers | |
119 | are not split off and remain part of the preceding component, even if the URL's addressing scheme normally | |
120 | supports them. The default value for this argument is <code>1</code>. | |
121 | </dl> | |
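<P>
The behaviour described above can be tried with the short sketch below. It is an illustration rather than part of the reference text: the sample URLs (including the <code>;v=1</code>, <code>?lang=en</code> and <code>#intro</code> parts) are made up, and the fallback import assumes that on Python 3 the same function is available from <code>urllib.parse</code>.

```python
try:
    from urlparse import urlparse  # Python 2 module documented here
except ImportError:
    from urllib.parse import urlparse  # same function on Python 3

# All six components present: the delimiters are dropped,
# but the leading slash of the path is kept.
full = tuple(urlparse(
    'http://www.cwi.nl:80/%7Eguido/Python.html;v=1?lang=en#intro'))

# The URL has no scheme, so the second argument supplies one.
defaulted = tuple(urlparse('//www.cwi.nl/%7Eguido/Python.html', 'http'))

# Third argument false: '#intro' is not split off as a fragment.
no_fragment = tuple(urlparse('http://www.cwi.nl/Python.html#intro', '', False))

print(full)
print(defaulted)
print(no_fragment)
```

<P>
The scheme and <var>allow_fragments</var> arguments are passed positionally here because the keyword name of the second argument differs between Python versions.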
122 | ||
123 | <P> | |
124 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
125 | <td><nobr><b><tt id='l2h-3541' xml:id='l2h-3541' class="function">urlunparse</tt></b>(</nobr></td> | |
126 | <td><var>tuple</var>)</td></tr></table></dt> | |
127 | <dd> | |
128 | Construct a URL string from a tuple as returned by <code>urlparse()</code>. | |
129 | This may result in a slightly different but equivalent URL if the | |
130 | URL that was originally parsed had redundant delimiters, e.g. a ? with | |
131 | an empty query (the RFC states that these are equivalent). | |
132 | </dl> | |
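<P>
As a rough illustration of the round trip, the sketch below builds a URL from a hand-written 6-tuple and then re-assembles a parsed URL that ends in a redundant <code>?</code>. The tuple values and URLs are made-up samples, and the fallback import assumes Python 3's <code>urllib.parse</code> provides the same functions.

```python
try:
    from urlparse import urlparse, urlunparse  # Python 2
except ImportError:
    from urllib.parse import urlparse, urlunparse  # Python 3

# Build a URL from the six components; empty items add no delimiter.
built = urlunparse(
    ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', 'lang=en', 'intro'))

# A trailing '?' with an empty query is a redundant delimiter,
# so it does not survive a parse/unparse round trip.
original = 'http://www.cwi.nl/%7Eguido/Python.html?'
round_trip = urlunparse(urlparse(original))

print(built)
print(round_trip)
```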
133 | ||
134 | <P> | |
135 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
136 | <td><nobr><b><tt id='l2h-3542' xml:id='l2h-3542' class="function">urlsplit</tt></b>(</nobr></td> | |
137 | <td><var>urlstring</var><big>[</big><var>, | |
138 | default_scheme</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var><big>]</big><var></var>)</td></tr></table></dt> | |
139 | <dd> | |
140 | This is similar to <tt class="function">urlparse()</tt>, but does not split the | |
141 | parameters from the URL. Use it instead of | |
142 | <tt class="function">urlparse()</tt> when you need the more recent URL syntax, which allows | |
143 | parameters to be applied to each segment of the <var>path</var> portion of | |
144 | the URL (see <a class="rfc" id='rfcref-90402' xml:id='rfcref-90402' | |
145 | href="http://www.faqs.org/rfcs/rfc2396.html">RFC 2396</a>). A separate function is needed to | |
146 | separate the path segments and parameters. This function returns a | |
147 | 5-tuple: (addressing scheme, network location, path, query, fragment | |
148 | identifier). | |
149 | ||
150 | <span class="versionnote">New in version 2.2.</span> | |
151 | ||
152 | </dl> | |
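<P>
The difference from <code>urlparse()</code> shows up when more than one path segment carries parameters. The sketch below compares the two on a made-up URL. The helper <code>split_segment_params()</code> is hypothetical (it is not part of the module) and only hints at what the "separate function" mentioned above might look like.

```python
try:
    from urlparse import urlparse, urlsplit  # Python 2
except ImportError:
    from urllib.parse import urlparse, urlsplit  # Python 3

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#f'

# urlsplit() leaves every ';parameters' part inside the path.
split = tuple(urlsplit(url))

# urlparse() strips only the parameters of the last path segment.
parsed = tuple(urlparse(url))


def split_segment_params(path):
    """Hypothetical helper: pair each path segment with its parameters."""
    pairs = []
    for segment in path.split('/'):
        name, _, params = segment.partition(';')
        pairs.append((name, params))
    return pairs


# The first pair is empty because the path starts with a slash.
segments = split_segment_params(split[2])

print(split)
print(parsed)
print(segments)
```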
153 | ||
154 | <P> | |
155 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
156 | <td><nobr><b><tt id='l2h-3543' xml:id='l2h-3543' class="function">urlunsplit</tt></b>(</nobr></td> | |
157 | <td><var>tuple</var>)</td></tr></table></dt> | |
158 | <dd> | |
159 | Combine the elements of a tuple as returned by <tt class="function">urlsplit()</tt> | |
160 | into a complete URL as a string. | |
161 | ||
162 | <span class="versionnote">New in version 2.2.</span> | |
163 | ||
164 | </dl> | |
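<P>
A common use of the split/unsplit pair is to change one component and rebuild the URL. The sketch below does that for the query string. The URL and the replacement query are made-up samples, and the fallback import assumes Python 3's <code>urllib.parse</code>.

```python
try:
    from urlparse import urlsplit, urlunsplit  # Python 2
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # Python 3

scheme, netloc, path, query, fragment = urlsplit(
    'http://www.cwi.nl/%7Eguido/Python.html?lang=en#intro')

# Swap the query, drop the fragment, and reassemble the URL.
rebuilt = urlunsplit((scheme, netloc, path, 'lang=nl', ''))

print(rebuilt)
```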
165 | ||
166 | <P> | |
167 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
168 | <td><nobr><b><tt id='l2h-3544' xml:id='l2h-3544' class="function">urljoin</tt></b>(</nobr></td> | |
169 | <td><var>base, url</var><big>[</big><var>, allow_fragments</var><big>]</big><var></var>)</td></tr></table></dt> | |
170 | <dd> | |
171 | Construct a full ("absolute") URL by combining a "base URL" | |
172 | (<var>base</var>) with a "relative URL" (<var>url</var>). Informally, this | |
173 | uses components of the base URL, in particular the addressing scheme, | |
174 | the network location and (part of) the path, to provide missing | |
175 | components in the relative URL. | |
176 | ||
177 | <P> | |
178 | Example: | |
179 | ||
180 | <P> | |
181 | <div class="verbatim"><pre> | |
182 | urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') | |
183 | </pre></div> | |
184 | ||
185 | <P> | |
186 | yields the string | |
187 | ||
188 | <P> | |
189 | <div class="verbatim"><pre> | |
190 | 'http://www.cwi.nl/%7Eguido/FAQ.html' | |
191 | </pre></div> | |
192 | ||
193 | <P> | |
194 | The <var>allow_fragments</var> argument has the same meaning as for | |
195 | <code>urlparse()</code>. | |
196 | </dl> | |
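<P>
The sketch below shows how different kinds of relative reference borrow from the base URL. Apart from the <code>FAQ.html</code> case taken from the example above, the relative references are made-up samples chosen to exercise each case, and the fallback import assumes Python 3's <code>urllib.parse</code>.

```python
try:
    from urlparse import urljoin  # Python 2
except ImportError:
    from urllib.parse import urljoin  # Python 3

base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Replaces the last path segment of the base.
sibling = urljoin(base, 'FAQ.html')

# '..' climbs one directory level before appending.
parent = urljoin(base, '../index.html')

# A leading slash replaces the whole base path.
rooted = urljoin(base, '/icons/up.png')

# Only a fragment: the base URL is kept and the fragment is added.
fragment_only = urljoin(base, '#top')

# A URL that is already absolute takes nothing from the base.
absolute = urljoin(base, 'http://www.python.org/')

for result in (sibling, parent, rooted, fragment_only, absolute):
    print(result)
```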
197 | ||
198 | <P> | |
199 | <dl><dt><table cellpadding="0" cellspacing="0"><tr valign="baseline"> | |
200 | <td><nobr><b><tt id='l2h-3545' xml:id='l2h-3545' class="function">urldefrag</tt></b>(</nobr></td> | |
201 | <td><var>url</var>)</td></tr></table></dt> | |
202 | <dd> | |
203 | If <var>url</var> contains a fragment identifier, returns a 2-tuple holding a modified | |
204 | version of <var>url</var> with no fragment identifier and the fragment | |
205 | identifier as a separate string. If there is no fragment identifier | |
206 | in <var>url</var>, returns <var>url</var> unmodified and an empty string. | |
207 | </dl> | |
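<P>
Both cases can be seen in the short sketch below. The URLs are made-up samples, and the fallback import assumes Python 3's <code>urllib.parse</code>. The result is unpacked into two names, which works whether the function returns a plain tuple or a tuple subclass.

```python
try:
    from urlparse import urldefrag  # Python 2
except ImportError:
    from urllib.parse import urldefrag  # Python 3

# A fragment is present: it is removed from the URL and returned separately.
stripped, fragment = urldefrag('http://www.cwi.nl/%7Eguido/Python.html#intro')

# No fragment: the URL comes back unchanged with an empty string.
same, empty = urldefrag('http://www.cwi.nl/%7Eguido/Python.html')

print(stripped, fragment)
print(same, repr(empty))
```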
208 | ||
209 | <P> | |
210 | <div class="seealso"> | |
211 | <p class="heading">See Also:</p> | |
212 | ||
213 | <dl compact="compact" class="seerfc"> | |
214 | <dt><a href="http://www.faqs.org/rfcs/rfc1738.html" | |
215 | title="Uniform Resource Locators (URL)" | |
216 | >RFC 1738, <em>Uniform Resource Locators (URL)</em></a> | |
217 | <dd> | |
218 | This specifies the formal syntax and semantics of absolute | |
219 | URLs. | |
220 | </dl> | |
221 | <dl compact="compact" class="seerfc"> | |
222 | <dt><a href="http://www.faqs.org/rfcs/rfc1808.html" | |
223 | title="Relative Uniform Resource Locators" | |
224 | >RFC 1808, <em>Relative Uniform Resource Locators</em></a> | |
225 | <dd> | |
226 | This Request For Comments includes the rules for joining an | |
227 | absolute and a relative URL, including a fair number of | |
228 | "Abnormal Examples" which govern the treatment of border | |
229 | cases. | |
230 | </dl> | |
231 | <dl compact="compact" class="seerfc"> | |
232 | <dt><a href="http://www.faqs.org/rfcs/rfc2396.html" | |
233 | title="Uniform Resource Identifiers (URI): Generic Syntax" | |
234 | >RFC 2396, <em>Uniform Resource Identifiers (URI): Generic Syntax</em></a> | |
235 | <dd> | |
236 | Document describing the generic syntactic requirements for | |
237 | both Uniform Resource Names (URNs) and Uniform Resource | |
238 | Locators (URLs). | |
239 | </dl> | |
240 | </div> | |
241 | ||
242 | <DIV CLASS="navigation"> | |
275 | <hr /> | |
276 | <span class="release-info">Release 2.4.2, documentation updated on 28 September 2005.</span> | |
277 | </DIV> | |
278 | <!--End of Navigation Panel--> | |
279 | <ADDRESS> | |
280 | See <i><a href="about.html">About this document...</a></i> for information on suggesting changes. | |
281 | </ADDRESS> | |
282 | </BODY> | |
283 | </HTML> |