Commit | Line | Data |
---|---|---|
d60e8dff MK |
1 | .\" Copyright (c) 1986 Regents of the University of California. |
2 | .\" All rights reserved. The Berkeley software License Agreement | |
3 | .\" specifies the terms and conditions for redistribution. | |
4 | .\" | |
5 | .\" @(#)2.t 1.2 (Berkeley) %G% | |
6 | .\" | |
200989e9 MK |
7 | .ds RH "Basics |
8 | .bp | |
9 | .nr H1 2 | |
10 | .nr H2 0 | |
11 | .bp | |
12 | .LG | |
13 | .B | |
14 | .ce | |
15 | 2. BASICS | |
16 | .sp 2 | |
17 | .R | |
18 | .NL | |
19 | .PP | |
20 | The basic building block for communication is the \fIsocket\fP. | |
21 | A socket is an endpoint of communication to which a name may | |
22 | be \fIbound\fP. Each socket in use has a \fItype\fP | |
23 | and one or more associated processes. Sockets exist within | |
24 | \fIcommunication domains\fP. | |
25 | A communication domain is an | |
26 | abstraction introduced to bundle common properties of | |
27 | processes communicating through sockets. | |
28 | One such property is the scheme used to name sockets. For | |
29 | example, in the UNIX communication domain sockets are | |
30 | named with UNIX path names; e.g. a | |
31 | socket may be named \*(lq/dev/foo\*(rq. Sockets normally | |
32 | exchange data only with | |
33 | sockets in the same domain (it may be possible to cross domain | |
34 | boundaries, but only if some translation process is | |
35 | performed). The | |
d60e8dff MK |
36 | 4.3BSD IPC facilities support three separate communication domains: |
37 | the UNIX domain, for on-system communication; | |
38 | the Internet domain, which is used by | |
200989e9 | 39 | processes which communicate |
d60e8dff MK |
40 | using the DARPA standard communication protocols; |
41 | and the NS domain, which is used by processes which | |
42 | communicate using the Xerox standard communication | |
43 | protocols*. | |
44 | .FS | |
45 | * See \fIInternet Transport Protocols\fP, Xerox System Integration | |
46 | Standard (XSIS) 028112 for more information. This document is | |
47 | almost a necessity for one trying to write NS applications. | |
48 | .FE | |
200989e9 MK |
49 | The underlying communication |
50 | facilities provided by these domains have a significant influence | |
51 | on the internal system implementation as well as the interface to | |
52 | socket facilities available to a user. An example of the | |
53 | latter is that a socket \*(lqoperating\*(rq in the UNIX domain | |
d60e8dff MK |
54 | sees a subset of the error conditions which are possible |
55 | when operating in the Internet (or NS) domain. | |
200989e9 MK |
56 | .NH 2 |
57 | Socket types | |
58 | .PP | |
59 | Sockets are | |
60 | typed according to the communication properties visible to a | |
61 | user. | |
62 | Processes are presumed to communicate only between sockets of | |
63 | the same type, although there is | |
64 | nothing that prevents communication between sockets of different | |
65 | types should the underlying communication | |
66 | protocols support this. | |
67 | .PP | |
d60e8dff | 68 | Four types of sockets currently are available to a user. |
200989e9 MK |
69 | A \fIstream\fP socket provides for the bidirectional, reliable, |
70 | sequenced, and unduplicated flow of data without record boundaries. | |
71 | Aside from the bidirectionality of data flow, a pair of connected | |
d60e8dff | 72 | stream sockets provides an interface nearly identical to that of pipes\(dg. |
200989e9 | 73 | .FS |
d60e8dff | 74 | \(dg In the UNIX domain, in fact, the semantics are identical and, |
200989e9 MK |
75 | as one might expect, pipes have been implemented internally |
76 | as simply a pair of connected stream sockets. | |
77 | .FE | |
78 | .PP | |
79 | A \fIdatagram\fP socket supports bidirectional flow of data which | |
80 | is not promised to be sequenced, reliable, or unduplicated. | |
81 | That is, a process | |
82 | receiving messages on a datagram socket may find messages duplicated, | |
83 | and, possibly, | |
84 | in an order different from the order in which they were sent. | |
85 | An important characteristic of a datagram | |
86 | socket is that record boundaries in data are preserved. Datagram | |
87 | sockets closely model the facilities found in many contemporary | |
88 | packet switched networks such as the Ethernet. | |
89 | .PP | |
90 | A \fIraw\fP socket provides users access to | |
91 | the underlying communication | |
92 | protocols which support socket abstractions. | |
93 | These sockets are normally datagram oriented, though their | |
94 | exact characteristics are dependent on the interface provided by | |
95 | the protocol. Raw sockets are not intended for the general user; they | |
96 | have been provided mainly for those interested in developing new | |
97 | communication protocols, or for gaining access to some of the more | |
98 | esoteric facilities of an existing protocol. The use of raw sockets | |
99 | is considered in section 5. | |
100 | .PP | |
d60e8dff MK |
101 | A \fIsequenced packet\fP socket is similar to a stream socket, |
102 | with the exception that record boundaries are preserved. This | |
103 | interface is provided only as part of the NS socket abstraction, | |
104 | and is very important in most serious NS applications. | |
105 | Sequenced-packet sockets allow the user to manipulate the | |
106 | SPP or IDP headers on a packet or a group of packets either | |
107 | by writing a prototype header along with whatever data is | |
108 | to be sent, or by specifying a default header to be used with | |
109 | all outgoing data, and allow the user to receive the headers | |
110 | on incoming packets. The use of these options is considered in | |
111 | section 5. | |
112 | .PP | |
113 | Another potential socket type which has interesting properties is | |
114 | the \fIreliably delivered | |
115 | message\fP socket. | |
200989e9 MK |
116 | The reliably delivered message socket has |
117 | similar properties to a datagram socket, but with | |
d60e8dff MK |
118 | reliable delivery. There is currently no support for this |
119 | type of socket, but a reliably delivered message protocol | |
120 | similar to Xerox's Packet Exchange Protocol (PEX) may be | |
121 | simulated at the user level. More information on this topic | |
122 | can be found in section 5. | |
200989e9 MK |
123 | .NH 2 |
124 | Socket creation | |
125 | .PP | |
126 | To create a socket the \fIsocket\fP system call is used: | |
127 | .DS | |
128 | s = socket(domain, type, protocol); | |
129 | .DE | |
130 | This call requests that the system create a socket in the specified | |
131 | \fIdomain\fP and of the specified \fItype\fP. A particular protocol may | |
132 | also be requested. If the protocol is left unspecified (a value | |
133 | of 0), the system will select an appropriate protocol from those | |
134 | protocols which comprise the communication domain and which | |
135 | may be used to support the requested socket type. The user is | |
136 | returned a descriptor (a small integer number) which may be used | |
137 | in later system calls which operate on sockets. The domain is specified as | |
138 | one of the manifest constants defined in the file <\fIsys/socket.h\fP>. | |
139 | For the UNIX domain the constant is AF_UNIX*; for the Internet | |
140 | .FS | |
141 | * The manifest constants are named AF_whatever as they indicate | |
142 | the ``address format'' to use in interpreting names. | |
143 | .FE | |
d60e8dff MK |
144 | domain AF_INET; and for the NS domain, AF_NS. |
145 | The socket types are also defined in this file | |
146 | and one of SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, or SOCK_SEQPACKET | |
147 | must be specified. | |
200989e9 MK |
148 | To create a stream socket in the Internet domain the following |
149 | call might be used: | |
150 | .DS | |
151 | s = socket(AF_INET, SOCK_STREAM, 0); | |
152 | .DE | |
153 | This call would result in a stream socket being created with the TCP | |
154 | protocol providing the underlying communication support. To | |
d60e8dff | 155 | create a datagram socket for on-machine use the call might |
200989e9 MK |
156 | be: |
157 | .DS | |
158 | s = socket(AF_UNIX, SOCK_DGRAM, 0); | |
159 | .DE | |
160 | .PP | |
d60e8dff MK |
161 | The default protocol (used when the \fIprotocol\fP argument to the |
162 | \fIsocket\fP call is 0) should be correct for almost every | |
163 | situation. However, it is possible to specify a protocol | |
164 | other than the default; this will be covered in | |
165 | section 5. | |
200989e9 MK |
166 | .PP |
167 | There are several reasons a socket call may fail. Aside from | |
168 | the rare occurrence of lack of memory (ENOBUFS), a socket | |
169 | request may fail due to a request for an unknown protocol | |
170 | (EPROTONOSUPPORT), or a request for a type of socket for | |
171 | which there is no supporting protocol (EPROTOTYPE). | |
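A minimal sketch of checking the result of a \fIsocket\fP call follows. The helper name open_socket is hypothetical and is not part of the socket interface; the sketch also assumes a modern C library that provides \fIstrerror\fP, and the exact error reported for a bad request may differ from system to system.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * open_socket() is a hypothetical wrapper used only for illustration.
 * It returns a descriptor on success.  On failure it prints the reason
 * taken from errno (for example EPROTONOSUPPORT, EPROTOTYPE or ENOBUFS)
 * and returns -1.
 */
int
open_socket(int domain, int type, int protocol)
{
	int s = socket(domain, type, protocol);

	if (s < 0) {
		fprintf(stderr, "socket(%d, %d, %d): %s\n",
		    domain, type, protocol, strerror(errno));
		return (-1);
	}
	return (s);
}
```

A caller might write s = open_socket(AF_INET, SOCK_STREAM, 0) and give up if the result is negative.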
172 | .NH 2 | |
d60e8dff | 173 | Binding local names |
200989e9 MK |
174 | .PP |
175 | A socket is created without a name. Until a name is bound | |
176 | to a socket, processes have no way to reference it and, consequently, | |
d60e8dff MK |
177 | no messages may be received on it. |
178 | Communicating processes are bound | |
179 | by an \fIassociation\fP. In the Internet and NS domains, | |
180 | an association | |
181 | is composed of local and foreign | |
182 | addresses, and local and foreign ports, | |
183 | while in the UNIX domain, an association is composed of | |
184 | local and foreign path names (the phrase ``foreign pathname'' | |
185 | means a pathname created by a foreign process, not a pathname | |
186 | on a foreign system). | |
187 | Associations are always unique. That is, in the Internet domain, there | |
188 | may never be duplicate <protocol, local address, local port, foreign | |
189 | address, foreign port> tuples. Similarly, in the UNIX domain, | |
190 | there may never be duplicate <protocol, local pathname, foreign | |
191 | pathname> tuples, and the pathnames must (in 4.3; the situation | |
192 | may change in future releases) be unique with respect to the files | |
193 | already existing on the system. | |
194 | .PP | |
195 | The \fIbind\fP system call allows a process to specify half of | |
196 | an association, <local address, local port> | |
197 | (or <local pathname>), while the \fIconnect\fP | |
198 | and \fIaccept\fP primitives are used to complete a socket's association. | |
199 | .PP | |
200 | In the Internet domain, | |
201 | binding names to sockets can be fairly complex. | |
202 | Fortunately, it is usually not necessary to specifically bind an | |
203 | address and port number to a socket, because the | |
204 | \fIconnect\fP and \fIsend\fP calls will automatically | |
205 | bind an appropriate address if they are used with an | |
206 | unbound socket. The process of binding names to NS | |
207 | sockets is similar in most ways to that of | |
208 | binding names to Internet sockets. | |
209 | While binding names to sockets in the | |
210 | UNIX domain is less complex, the \fIconnect\fP and \fIsend\fP | |
211 | calls can still be used to automatically bind local names. | |
212 | .PP | |
213 | The \fIbind\fP system call is used as follows: | |
200989e9 MK |
214 | .DS |
215 | bind(s, name, namelen); | |
216 | .DE | |
217 | The bound name is a variable length byte string which is interpreted | |
218 | by the supporting protocol(s). Its interpretation may vary from | |
219 | communication domain to communication domain (this is one of | |
d60e8dff MK |
220 | the properties which comprise the \*(lqdomain\*(rq). |
221 | As mentioned, in the | |
222 | Internet domain names contain an Internet address and port | |
223 | number. NS domain names contain an NS address and | |
224 | port number. In the UNIX domain, names contain a path name and | |
225 | a family, which is always AF_UNIX. If one wanted to bind | |
226 | the name \*(lq/tmp/foo\*(rq to a UNIX domain socket, the | |
227 | following code would be used*: | |
228 | .FS | |
229 | * Note that, although the tendency here is to call the \*(lqaddr\*(rq | |
230 | structure \*(lqsun\*(rq, doing so would cause problems if the code | |
231 | were ever ported to a Sun workstation. | |
232 | .FE | |
200989e9 | 233 | .DS |
d60e8dff MK |
234 | #include <sys/un.h> |
235 | ... | |
236 | struct sockaddr_un addr; | |
237 | ... | |
238 | strcpy(addr.sun_path, "/tmp/foo"); | |
239 | addr.sun_family = AF_UNIX; | |
240 | bind(s, (struct sockaddr *) &addr, strlen(addr.sun_path) + | |
241 | sizeof (addr.sun_family)); | |
200989e9 | 242 | .DE |
d60e8dff MK |
243 | Note that in determining the size of a UNIX domain address null |
244 | bytes are not counted, which is why \fIstrlen\fP is used. In | |
245 | the current implementation of UNIX domain IPC under 4.3BSD, | |
246 | the file name | |
247 | referred to in \fIaddr.sun_path\fP is created as a socket | |
248 | in the system file space. | |
249 | The caller must, therefore, have | |
250 | write permission in the directory where | |
251 | \fIaddr.sun_path\fP is to reside, and this file should be deleted by the | |
252 | caller when it is no longer needed. Future versions of 4BSD | |
253 | may not create this file. | |
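The UNIX domain binding steps can be gathered into one routine, sketched below under some assumptions. The helper name bind_unix_dgram is hypothetical. The sketch passes the size of the whole address structure as the name length, which current systems generally accept, rather than the \fIstrlen\fP-based length described above for 4.3BSD. It also removes any stale file of the same name before binding, since the pathname must not already exist.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * bind_unix_dgram() is a hypothetical helper: it creates a UNIX domain
 * datagram socket and binds it to `path'.  Returns the descriptor, or
 * -1 with errno set.  The caller should unlink(path) when finished,
 * because the bound name remains in the file system.
 */
int
bind_unix_dgram(const char *path)
{
	struct sockaddr_un addr;
	int s;

	if (strlen(path) >= sizeof (addr.sun_path)) {
		errno = ENAMETOOLONG;	/* name will not fit in sun_path */
		return (-1);
	}
	s = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (s < 0)
		return (-1);

	memset(&addr, 0, sizeof (addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path);

	(void) unlink(path);	/* discard a name left by an earlier run */
	if (bind(s, (struct sockaddr *) &addr, sizeof (addr)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}
```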
254 | .PP | |
255 | In binding an Internet address things become more | |
256 | complicated. The actual call is similar, | |
200989e9 MK |
257 | .DS |
258 | #include <sys/types.h> | |
259 | #include <netinet/in.h> | |
260 | ... | |
261 | struct sockaddr_in sin; | |
262 | ... | |
d60e8dff | 263 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); |
200989e9 MK |
264 | .DE |
265 | but the selection of what to place in the address \fIsin\fP | |
266 | requires some discussion. We will come back to the problem | |
267 | of formulating Internet addresses in section 3 when | |
268 | the library routines used in name resolution are discussed. | |
d60e8dff MK |
269 | .PP |
270 | Binding an NS address to a socket is even more |
271 | difficult, | |
272 | especially since the Internet library routines do not | |
273 | work with NS hostnames. The actual call is again similar: | |
274 | .DS | |
275 | #include <sys/types.h> | |
276 | #include <netns/ns.h> | |
277 | ... | |
278 | struct sockaddr_ns sns; | |
279 | ... | |
280 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); | |
281 | .DE | |
282 | Again, discussion of what to place in a \*(lqstruct sockaddr_ns\*(rq | |
283 | will be deferred to section 3. | |
200989e9 MK |
284 | .NH 2 |
285 | Connection establishment | |
286 | .PP | |
287 | With a bound socket it is possible to rendezvous with | |
288 | an unrelated process. This operation is usually asymmetric | |
289 | with one process a \*(lqclient\*(rq and the other a \*(lqserver\*(rq. | |
290 | The client requests services from the server by initiating a | |
291 | \*(lqconnection\*(rq to the server's socket. The server, when | |
292 | willing to offer its advertised services, passively \*(lqlistens\*(rq | |
293 | on its socket. On the client side the \fIconnect\fP call is | |
294 | used to initiate a connection. Using the UNIX domain, this | |
295 | might appear as, | |
296 | .DS | |
d60e8dff MK |
297 | struct sockaddr_un server; |
298 | ... | |
299 | connect(s, (struct sockaddr *)&server, strlen(server.sun_path) + | |
300 | sizeof (server.sun_family)); | |
200989e9 MK |
301 | .DE |
302 | while in the Internet domain, | |
303 | .DS | |
304 | struct sockaddr_in server; | |
d60e8dff MK |
305 | ... |
306 | connect(s, (struct sockaddr *)&server, sizeof (server)); | |
307 | .DE | |
308 | and in the NS domain, | |
309 | .DS | |
310 | struct sockaddr_ns server; | |
311 | ... | |
312 | connect(s, (struct sockaddr *)&server, sizeof (server)); | |
200989e9 | 313 | .DE |
d60e8dff MK |
314 | where \fIserver\fP in the example above would contain either the UNIX |
315 | pathname, Internet address and port number, or NS address and | |
316 | port number of the server to which the | |
317 | client process wishes to speak. | |
200989e9 MK |
318 | If the client process's socket is unbound at the time of |
319 | the connect call, | |
320 | the system will automatically select and bind a name to | |
321 | the socket; cf. section 5.4. | |
d60e8dff MK |
322 | This is the usual way that local addresses are bound |
323 | to a socket. | |
324 | .PP | |
325 | An error is returned if the connection was unsuccessful | |
200989e9 MK |
326 | (any name automatically bound by the system, however, remains). |
327 | Otherwise, the socket is associated with the server and | |
d60e8dff MK |
328 | data transfer may begin. Some of the more common errors returned |
329 | when a connection attempt fails are: | |
200989e9 MK |
330 | .IP ETIMEDOUT |
331 | .br | |
332 | After failing to establish a connection for a period of time, | |
333 | the system decided there was no point in retrying the | |
334 | connection attempt any more. This usually occurs because | |
335 | the destination host is down, or because problems in | |
336 | the network resulted in transmissions being lost. | |
337 | .IP ECONNREFUSED | |
338 | .br | |
d60e8dff MK |
339 | The host refused service for some reason. |
340 | This is usually | |
200989e9 MK |
341 | due to a server process |
342 | not being present at the requested name. | |
343 | .IP "ENETDOWN or EHOSTDOWN" | |
344 | .br | |
345 | These operational errors are | |
346 | returned based on status information delivered to | |
347 | the client host by the underlying communication services. | |
348 | .IP "ENETUNREACH or EHOSTUNREACH" | |
349 | .br | |
350 | These operational errors can occur either because the network | |
351 | or host is unknown (no route to the network or host is present), | |
352 | or because of status information returned by intermediate | |
353 | gateways or switching nodes. Many times the status returned | |
354 | is not sufficient to distinguish a network being down from a | |
d60e8dff | 355 | host being down, in which case the system |
200989e9 MK |
356 | indicates the entire network is unreachable. |
357 | .PP | |
358 | For the server to receive a client's connection it must perform | |
359 | two steps after binding its socket. | |
360 | The first is to indicate a willingness to listen for | |
361 | incoming connection requests: | |
362 | .DS | |
363 | listen(s, 5); | |
364 | .DE | |
365 | The second parameter to the \fIlisten\fP call specifies the maximum | |
366 | number of outstanding connections which may be queued awaiting | |
d60e8dff MK |
367 | acceptance by the server process; this number |
368 | may be limited by the system. Should a connection be | |
200989e9 MK |
369 | requested while the queue is full, the connection will not be |
370 | refused, but rather the individual messages which comprise the | |
371 | request will be ignored. This gives a harried server time to | |
372 | make room in its pending connection queue while the client | |
373 | retries the connection request. Had the connection been returned | |
374 | with the ECONNREFUSED error, the client would be unable to tell | |
375 | if the server was up or not. As it is now it is still possible | |
376 | to get the ETIMEDOUT error back, though this is unlikely. The | |
d60e8dff | 377 | backlog figure supplied with the listen call is currently limited |
200989e9 MK |
378 | by the system to a maximum of 5 pending connections on any |
379 | one queue. This avoids the problem of processes hogging system | |
380 | resources by setting an infinite backlog, then ignoring | |
381 | all connection requests. | |
382 | .PP | |
383 | With a socket marked as listening, a server may \fIaccept\fP | |
384 | a connection: | |
385 | .DS | |
d60e8dff MK |
386 | struct sockaddr_in from; |
387 | ... | |
200989e9 | 388 | fromlen = sizeof (from); |
d60e8dff | 389 | newsock = accept(s, (struct sockaddr *)&from, &fromlen); |
200989e9 | 390 | .DE |
d60e8dff MK |
391 | (For the UNIX domain, \fIfrom\fP would be declared as a |
392 | \fIstruct sockaddr_un\fP, and for the NS domain, \fIfrom\fP | |
393 | would be declared as a \fIstruct sockaddr_ns\fP, | |
394 | but nothing different would need | |
395 | to be done as far as \fIfromlen\fP is concerned. In the examples | |
396 | which follow, only Internet routines will be discussed.) A new | |
397 | descriptor is returned on receipt of a connection (along with | |
200989e9 MK |
398 | a new socket). If the server wishes to find out who its client is, |
399 | it may supply a buffer for the client socket's name. The value-result | |
400 | parameter \fIfromlen\fP is initialized by the server to indicate how | |
401 | much space is associated with \fIfrom\fP, then modified on return | |
402 | to reflect the true size of the name. If the client's name is not | |
d60e8dff | 403 | of interest, the second parameter may be a null pointer. |
200989e9 | 404 | .PP |
d60e8dff | 405 | \fIAccept\fP normally blocks. That is, \fIaccept\fP |
200989e9 MK |
406 | will not return until a connection is available or the system call |
407 | is interrupted by a signal to the process. Further, there is no | |
408 | way for a process to indicate it will accept connections from only | |
409 | a specific individual, or individuals. It is up to the user process | |
410 | to consider who the connection is from and close down the connection | |
411 | if it does not wish to speak to the process. If the server process | |
412 | wants to accept connections on more than one socket, or not block | |
d60e8dff | 413 | on the accept call, there are alternatives; they will be considered |
200989e9 MK |
414 | in section 5. |
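The server and client halves of connection establishment can be sketched together as follows. The helper names listen_loopback and connect_loopback are hypothetical. The sketch assumes the Internet loopback address and lets the system choose the port; it reads the chosen port back with \fIgetsockname\fP, a call not otherwise discussed in this section. Filling in real Internet addresses is left to section 3.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a stream socket bound to the loopback
 * address on a system-chosen port and mark it as listening.  The port
 * (host byte order) is stored in *port.  Returns the socket or -1.
 */
int
listen_loopback(unsigned short *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return (-1);
	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;		/* 0 asks the system to pick a port */
	if (bind(s, (struct sockaddr *) &sin, sizeof (sin)) < 0 ||
	    listen(s, 5) < 0 ||
	    getsockname(s, (struct sockaddr *) &sin, &len) < 0) {
		close(s);
		return (-1);
	}
	*port = ntohs(sin.sin_port);
	return (s);
}

/*
 * Hypothetical helper: connect an unbound stream socket to the given
 * loopback port.  The system binds the local name automatically.
 */
int
connect_loopback(unsigned short port)
{
	struct sockaddr_in server;
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s < 0)
		return (-1);
	memset(&server, 0, sizeof (server));
	server.sin_family = AF_INET;
	server.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	server.sin_port = htons(port);
	if (connect(s, (struct sockaddr *) &server, sizeof (server)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}
```

The server would then call \fIaccept\fP on the descriptor returned by listen_loopback to obtain a new socket for each client.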
415 | .NH 2 | |
416 | Data transfer | |
417 | .PP | |
418 | With a connection established, data may begin to flow. To send | |
419 | and receive data there are a number of possible calls. | |
420 | With the peer entity at each end of a connection | |
421 | anchored, a user can send or receive a message without specifying | |
422 | the peer. As one might expect, in this case | |
d60e8dff | 423 | the normal \fIread\fP and \fIwrite\fP system calls are usable, |
200989e9 MK |
424 | .DS |
425 | write(s, buf, sizeof (buf)); | |
426 | read(s, buf, sizeof (buf)); | |
427 | .DE | |
428 | In addition to \fIread\fP and \fIwrite\fP, | |
429 | the new calls \fIsend\fP and \fIrecv\fP | |
430 | may be used: | |
431 | .DS | |
432 | send(s, buf, sizeof (buf), flags); | |
433 | recv(s, buf, sizeof (buf), flags); | |
434 | .DE | |
435 | While \fIsend\fP and \fIrecv\fP are virtually identical to | |
436 | \fIwrite\fP and \fIread\fP, | |
d60e8dff MK |
437 | the extra \fIflags\fP argument is important. The flags, |
438 | defined in \fI<sys/socket.h>\fP, may be | |
200989e9 MK |
439 | specified as a non-zero value if one or more |
440 | of the following is required: | |
441 | .DS | |
442 | .TS | |
443 | l l. | |
d60e8dff MK |
444 | MSG_OOB send/receive out of band data |
445 | MSG_PEEK look at data without reading | |
446 | MSG_DONTROUTE send data without routing packets | |
200989e9 MK |
447 | .TE |
448 | .DE | |
449 | Out of band data is a notion specific to stream sockets, and one | |
450 | which we will not immediately consider. The option to have data | |
451 | sent without routing applied to the outgoing packets is currently | |
452 | used only by the routing table management process, and is | |
453 | unlikely to be of interest to the casual user. The ability | |
d60e8dff | 454 | to preview data is, however, of interest. When MSG_PEEK |
200989e9 MK |
455 | is specified with a \fIrecv\fP call, any data present is returned |
456 | to the user, but treated as still \*(lqunread\*(rq. That | |
457 | is, the next \fIread\fP or \fIrecv\fP call applied to the socket will | |
458 | return the data previously previewed. | |
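The effect of MSG_PEEK can be sketched with a pair of connected sockets. The function name peek_matches_read is hypothetical, and the sketch assumes the \fIsocketpair\fP call is available to create two connected UNIX domain stream sockets within one process.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical demonstration: send five bytes, preview them with
 * MSG_PEEK, then receive them normally.  Returns 1 if both calls
 * delivered the same five bytes, 0 if not, and -1 on error.
 */
int
peek_matches_read(void)
{
	int sv[2];
	char peeked[16], taken[16];
	ssize_t np, nr;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return (-1);
	if (send(sv[0], "hello", 5, 0) != 5) {
		close(sv[0]);
		close(sv[1]);
		return (-1);
	}
	/* The data is returned but is still treated as unread. */
	np = recv(sv[1], peeked, sizeof (peeked), MSG_PEEK);
	/* The next receive returns the data previously previewed. */
	nr = recv(sv[1], taken, sizeof (taken), 0);

	close(sv[0]);
	close(sv[1]);
	return (np == 5 && nr == 5 && memcmp(peeked, taken, 5) == 0);
}
```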
459 | .NH 2 | |
460 | Discarding sockets | |
461 | .PP | |
462 | Once a socket is no longer of interest, it may be discarded | |
463 | by applying a \fIclose\fP to the descriptor, | |
464 | .DS | |
465 | close(s); | |
466 | .DE | |
467 | If data is associated with a socket which promises reliable delivery | |
468 | (e.g. a stream socket) when a close takes place, the system will | |
469 | continue to attempt to transfer the data. | |
470 | However, after a fairly long period of | |
471 | time, if the data is still undelivered, it will be discarded. | |
472 | Should a user have no use for any pending data, it may | |
473 | perform a \fIshutdown\fP on the socket prior to closing it. | |
474 | This call is of the form: | |
475 | .DS | |
476 | shutdown(s, how); | |
477 | .DE | |
478 | where \fIhow\fP is 0 if the user is no longer interested in reading | |
479 | data, 1 if no more data will be sent, or 2 if no data is to | |
d60e8dff | 480 | be sent or received. |
200989e9 MK |
481 | .NH 2 |
482 | Connectionless sockets | |
483 | .PP | |
484 | To this point we have been concerned mostly with sockets which | |
485 | follow a connection oriented model. However, there is also | |
486 | support for connectionless interactions typical of the datagram | |
487 | facilities found in contemporary packet switched networks. | |
488 | A datagram socket provides a symmetric interface to data | |
489 | exchange. While processes are still likely to be client | |
490 | and server, there is no requirement for connection establishment. | |
491 | Instead, each message includes the destination address. | |
492 | .PP | |
493 | Datagram sockets are created as before, and each should | |
494 | have a name bound to it in order that the recipient of | |
495 | a message may identify the sender. To send data, | |
496 | the \fIsendto\fP primitive is used, | |
497 | .DS | |
d60e8dff | 498 | sendto(s, buf, buflen, flags, (struct sockaddr *)&to, tolen); |
200989e9 MK |
499 | .DE |
500 | The \fIs\fP, \fIbuf\fP, \fIbuflen\fP, and \fIflags\fP | |
501 | parameters are used as before. | |
502 | The \fIto\fP and \fItolen\fP | |
503 | values are used to indicate the intended recipient of the | |
d60e8dff MK |
504 | message. When |
505 | using an unreliable datagram interface, it is | |
200989e9 MK |
506 | unlikely any errors will be reported to the sender. Where |
507 | information is present locally to recognize a message which may | |
508 | never be delivered (for instance when a network is unreachable), | |
509 | the call will return \-1 and the global value \fIerrno\fP will | |
510 | contain an error number. | |
511 | .PP | |
512 | To receive messages on an unconnected datagram socket, the | |
513 | \fIrecvfrom\fP primitive is provided: | |
514 | .DS | |
d60e8dff | 515 | recvfrom(s, buf, buflen, flags, (struct sockaddr *)&from, &fromlen); |
200989e9 MK |
516 | .DE |
517 | Once again, the \fIfromlen\fP parameter is handled in | |
518 | a value-result fashion, initially containing the size of | |
d60e8dff MK |
519 | the \fIfrom\fP buffer, and modified on return to indicate |
520 | the actual size of the from address. | |
200989e9 MK |
521 | .PP |
522 | In addition to the two calls mentioned above, datagram | |
523 | sockets may also use the \fIconnect\fP call to associate | |
d60e8dff | 524 | a socket with a specific destination address. In this case, any |
200989e9 MK |
525 | data sent on the socket will automatically be addressed |
526 | to the connected peer, and only data received from that | |
527 | peer will be delivered to the user. Only one connected | |
528 | address is permitted for each socket (i.e. no multi-casting). | |
529 | Connect requests on datagram sockets return immediately, | |
530 | as this simply results in the system recording | |
531 | the peer's address (as compared to a stream socket where a | |
532 | connect request initiates establishment of an end to end | |
d60e8dff MK |
533 | connection). \fIAccept\fP and \fIlisten\fP are not |
534 | used with datagram sockets. | |
200989e9 MK |
535 | Some of the less |
536 | important details of datagram sockets are described | |
537 | in section 5. | |
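The \fIsendto\fP and \fIrecvfrom\fP calls can be exercised within a single process, as in the sketch below. The function name datagram_roundtrip is hypothetical. The sketch assumes the Internet loopback address with a system-chosen port, which it reads back with \fIgetsockname\fP; the sending socket is left unbound, so the system binds a local name for it when the message is sent.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical demonstration: send the four bytes "ping" from one
 * datagram socket to another over the loopback address.  Returns the
 * number of bytes received into buf, or -1 on error.
 */
ssize_t
datagram_roundtrip(char *buf, size_t buflen)
{
	struct sockaddr_in to, from;
	socklen_t tolen = sizeof (to), fromlen = sizeof (from);
	ssize_t n = -1;
	int rx, tx;

	rx = socket(AF_INET, SOCK_DGRAM, 0);	/* receiver */
	tx = socket(AF_INET, SOCK_DGRAM, 0);	/* sender, left unbound */
	if (rx < 0 || tx < 0)
		goto out;

	memset(&to, 0, sizeof (to));
	to.sin_family = AF_INET;
	to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	to.sin_port = 0;			/* system picks the port */
	if (bind(rx, (struct sockaddr *) &to, sizeof (to)) < 0 ||
	    getsockname(rx, (struct sockaddr *) &to, &tolen) < 0)
		goto out;

	/* Each message carries its destination address. */
	if (sendto(tx, "ping", 4, 0, (struct sockaddr *) &to, tolen) != 4)
		goto out;

	/* fromlen is value-result: buffer size in, address size out. */
	n = recvfrom(rx, buf, buflen, 0, (struct sockaddr *) &from, &fromlen);
out:
	if (rx >= 0)
		close(rx);
	if (tx >= 0)
		close(tx);
	return (n);
}
```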
538 | .NH 2 | |
539 | Input/Output multiplexing | |
540 | .PP | |
541 | One last facility often used in developing applications | |
542 | is the ability to multiplex I/O requests among multiple | |
543 | sockets and/or files. This is done using the \fIselect\fP | |
544 | call: | |
545 | .DS | |
d60e8dff MK |
546 | #define FD_SETSIZE 128 /* How many file descriptors we're interested in */ |
547 | ... | |
548 | #include <sys/time.h> | |
549 | #include <sys/types.h> | |
550 | ... | |
551 | ||
552 | fd_set readmask, writemask, exceptmask; | |
553 | struct timeval timeout; | |
554 | ... | |
555 | select(nfds, &readmask, &writemask, &exceptmask, &timeout); | |
200989e9 | 556 | .DE |
d60e8dff | 557 | \fISelect\fP takes as arguments pointers to three sets, one for |
200989e9 MK |
558 | the set of file descriptors from which the caller wishes to |
559 | read data, one for those descriptors to which | |
560 | data is to be written, and one for which exceptional conditions | |
d60e8dff MK |
561 | are pending; out-of-band data is the only |
562 | exceptional condition currently implemented by the socket | |
563 | abstraction. | |
564 | If it is known | |
565 | that the maximum number of open file descriptors will be less than | |
566 | a given value, | |
567 | then this number should be used as the definition of FD_SETSIZE | |
568 | (which must be done before \fI<sys/types.h>\fP is included). | |
569 | Otherwise, it is acceptable to let FD_SETSIZE default to the | |
570 | value specified in \fI<sys/types.h>\fP. | |
571 | .PP | |
572 | Each set is actually a structure containing an array of | |
573 | long integers; the length of the array is implicitly set | |
574 | by the definition of FD_SETSIZE, and the array will be | |
575 | long enough to hold one bit for each of FD_SETSIZE file descriptors. | |
576 | If the user is not interested | |
577 | in certain conditions (i.e., read, write, or exceptions), | |
578 | the corresponding argument to the \fIselect\fP should | |
579 | be a null pointer. | |
580 | .PP | |
581 | The macros \fIFD_SET(fd, &mask)\fP and | |
582 | \fIFD_CLR(fd, &mask)\fP | |
583 | have been provided for adding file descriptor \fIfd\fP to, and | |
584 | removing it from, the set \fImask\fP. The | |
585 | set should be zeroed before use, and | |
586 | the macro \fIFD_ZERO(&mask)\fP has been provided | |
587 | to clear the set \fImask\fP. | |
588 | The parameter \fInfds\fP in the \fIselect\fP call specifies the range | |
200989e9 | 589 | of file descriptors (i.e. one plus the value of the largest |
d60e8dff | 590 | descriptor) specified in a set. |
200989e9 MK |
591 | .PP |
592 | A timeout value may be specified if the selection | |
593 | is not to last more than a predetermined period of time. If | |
d60e8dff MK |
594 | the fields in \fItimeout\fP are set to 0, the selection takes |
595 | the form of a | |
200989e9 MK |
596 | \fIpoll\fP, returning immediately. If the last parameter is |
597 | a null pointer, the selection will block indefinitely*. | |
598 | .FS | |
599 | * To be more specific, a return takes place only when a | |
600 | descriptor is selectable, or when a signal is received by | |
601 | the caller, interrupting the system call. | |
602 | .FE | |
d60e8dff MK |
603 | \fISelect\fP normally returns the number of file descriptors selected; |
604 | if the \fIselect\fP call returns due to the timeout expiring, then | |
605 | the value 0 is returned. | |
606 | If the \fIselect\fP terminates because of an error, a \-1 is returned | |
607 | with the error number in \fIerrno\fP. | |
608 | .PP | |
609 | Assuming a successful return, the three sets will | |
610 | indicate which | |
611 | file descriptors are ready to be read from, written to, or | |
612 | have exceptional conditions pending. | |
613 | The status of a file descriptor in a select mask may be | |
614 | tested with the \fIFD_ISSET(fd, &mask)\fP macro, which | |
615 | returns a non-zero value if \fIfd\fP is a member of the set | |
616 | \fImask\fP, and 0 if it is not. | |
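The polling form of \fIselect\fP, together with the set macros, can be sketched as a small routine. The name readable_now is hypothetical, and the sketch assumes a system where the set macros are declared in <\fIsys/select.h\fP>; on 4.3BSD they come from <\fIsys/types.h\fP>.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether `fd' can be read without
 * blocking.  Returns 1 if readable, 0 if not, and -1 on error.
 */
int
readable_now(int fd)
{
	fd_set readmask;
	struct timeval zero;

	if (fd < 0 || fd >= FD_SETSIZE)
		return (-1);		/* descriptor will not fit in the set */

	FD_ZERO(&readmask);		/* the set must be cleared before use */
	FD_SET(fd, &readmask);

	zero.tv_sec = 0;		/* both fields 0: poll, do not block */
	zero.tv_usec = 0;

	/* nfds is one plus the value of the largest descriptor in a set. */
	if (select(fd + 1, &readmask, (fd_set *) 0, (fd_set *) 0, &zero) < 0)
		return (-1);

	return (FD_ISSET(fd, &readmask) ? 1 : 0);
}
```

Applied to a listening socket, a result of 1 would indicate that a connection is waiting to be accepted.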
617 | .PP | |
618 | To determine if there are connections waiting | |
619 | on a socket to be used with an \fIaccept\fP call, | |
620 | \fIselect\fP can be used, followed by | |
621 | a \fIFD_ISSET(fd, &mask)\fP macro to check for read | |
622 | readiness on the appropriate socket. If \fIFD_ISSET\fP | |
623 | returns a non-zero value, indicating permission to read, then a | |
624 | connection is pending on the socket. | |
625 | .PP | |
626 | As an example, to read data from two sockets, \fIs1\fP and | |
627 | \fIs2\fP as it is available from each and with a one-second | |
628 | timeout, the following code | |
629 | might be used: | |
630 | .DS | |
631 | #include <sys/time.h> | |
632 | #include <sys/types.h> | |
633 | ... | |
634 | fd_set read_template; | |
635 | struct timeval wait; | |
636 | ... | |
637 | for (;;) { | |
638 | wait.tv_sec = 1; /* one second */ | |
639 | wait.tv_usec = 0; | |
640 | ||
641 | FD_ZERO(&read_template); | |
642 | ||
643 | FD_SET(s1, &read_template); | |
644 | FD_SET(s2, &read_template); | |
645 | ||
646 | nb = select(FD_SETSIZE, &read_template, (fd_set *) 0, (fd_set *) 0, &wait); | |
647 | if (nb <= 0) { | |
648 | \fIAn error occurred during the \fPselect\fI, or | |
649 | the \fPselect\fI timed out.\fP | |
650 | } | |
651 | ||
652 | if (FD_ISSET(s1, &read_template)) { | |
653 | \fISocket #1 is ready to be read from.\fP | |
654 | } | |
655 | ||
656 | if (FD_ISSET(s2, &read_template)) { | |
657 | \fISocket #2 is ready to be read from.\fP | |
658 | } | |
659 | } | |
660 | .DE | |
661 | .PP | |
662 | In 4.2, the arguments to \fIselect\fP were pointers to integers | |
663 | instead of pointers to \fIfd_set\fPs. This type of call | |
664 | will still work as long as the number of file descriptors | |
665 | being examined is less than the number of bits in an | |
666 | integer; however, the methods illustrated above should | |
667 | be used in all current programs. | |
200989e9 MK |
668 | .PP |
669 | \fISelect\fP provides a synchronous multiplexing scheme. | |
670 | Asynchronous notification of output completion, input availability, | |
671 | and exceptional conditions is possible through use of the | |
672 | SIGIO and SIGURG signals described in section 5. |