Commit | Line | Data |
---|---|---|
15637ed4 RG |
1 | .\" Copyright (c) 1986 The Regents of the University of California. |
2 | .\" All rights reserved. | |
3 | .\" | |
4 | .\" Redistribution and use in source and binary forms, with or without | |
5 | .\" modification, are permitted provided that the following conditions | |
6 | .\" are met: | |
7 | .\" 1. Redistributions of source code must retain the above copyright | |
8 | .\" notice, this list of conditions and the following disclaimer. | |
9 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
10 | .\" notice, this list of conditions and the following disclaimer in the | |
11 | .\" documentation and/or other materials provided with the distribution. | |
12 | .\" 3. All advertising materials mentioning features or use of this software | |
13 | .\" must display the following acknowledgement: | |
14 | .\" This product includes software developed by the University of | |
15 | .\" California, Berkeley and its contributors. | |
16 | .\" 4. Neither the name of the University nor the names of its contributors | |
17 | .\" may be used to endorse or promote products derived from this software | |
18 | .\" without specific prior written permission. | |
19 | .\" | |
20 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
21 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
22 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
23 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
24 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
25 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
26 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
27 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
28 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
29 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
30 | .\" SUCH DAMAGE. | |
31 | .\" | |
32 | .\" @(#)5.t 5.1 (Berkeley) 4/17/91 | |
33 | .\" | |
34 | .\".ds RH "Advanced Topics | |
35 | .bp | |
36 | .nr H1 5 | |
37 | .nr H2 0 | |
38 | .LG | |
39 | .B | |
40 | .ce | |
41 | 5. ADVANCED TOPICS | |
42 | .sp 2 | |
43 | .R | |
44 | .NL | |
45 | .PP | |
46 | A number of facilities have yet to be discussed. For most users | |
47 | of the IPC the mechanisms already | |
48 | described will suffice in constructing distributed | |
49 | applications. However, others will find the need to utilize some | |
50 | of the features which we consider in this section. | |
51 | .NH 2 | |
52 | Out of band data | |
53 | .PP | |
54 | The stream socket abstraction includes the notion of \*(lqout | |
55 | of band\*(rq data. Out of band data is a logically independent | |
56 | transmission channel associated with each pair of connected | |
57 | stream sockets. Out of band data is delivered to the user | |
58 | independently of normal data. | |
59 | The abstraction defines that the out of band data facilities | |
60 | must support the reliable delivery of at least one | |
61 | out of band message at a time. This message may contain at least one | |
62 | byte of data, and at least one message may be pending delivery | |
63 | to the user at any one time. For communications protocols which | |
64 | support only in-band signaling (i.e. the urgent data is | |
65 | delivered in sequence with the normal data), the system normally extracts | |
66 | the data from the normal data stream and stores it separately. | |
67 | This allows users to choose between receiving the urgent data | |
68 | in order and receiving it out of sequence without having to | |
69 | buffer all the intervening data. It is possible | |
70 | to ``peek'' (via MSG_PEEK) at out of band data. | |
71 | If the socket has a process group, a SIGURG signal is generated | |
72 | when the protocol is notified of its existence. | |
73 | A process can set the process group | |
74 | or process id to be informed by the SIGURG signal via the | |
75 | appropriate \fIfcntl\fP call, as described below for | |
76 | SIGIO. | |
77 | If multiple sockets may have out of band data awaiting | |
78 | delivery, a \fIselect\fP call for exceptional conditions | |
79 | may be used to determine those sockets with such data pending. | |
80 | Neither the signal nor the select indicates the actual arrival | |
81 | of the out-of-band data, but only notification that it is pending. | |
82 | .PP | |
83 | In addition to the information passed, a logical mark is placed in | |
84 | the data stream to indicate the point at which the out | |
85 | of band data was sent. The remote login and remote shell | |
86 | applications use this facility to propagate signals between | |
87 | client and server processes. When a signal | |
88 | flushes any pending output from the remote process(es), all | |
89 | data up to the mark in the data stream is discarded. | |
90 | .PP | |
91 | To send an out of band message the MSG_OOB flag is supplied to | |
92 | a \fIsend\fP or \fIsendto\fP call, | |
93 | while to receive out of band data MSG_OOB should be indicated | |
94 | when performing a \fIrecvfrom\fP or \fIrecv\fP call. | |
95 | To find out if the read pointer is currently pointing at | |
96 | the mark in the data stream, the SIOCATMARK ioctl is provided: | |
97 | .DS | |
98 | ioctl(s, SIOCATMARK, &yes); | |
99 | .DE | |
100 | If \fIyes\fP is a 1 on return, the next read will return data | |
101 | after the mark. Otherwise (assuming out of band data has arrived), | |
102 | the next read will provide data sent by the client prior | |
103 | to transmission of the out of band signal. The routine used | |
104 | in the remote login process to flush output on receipt of an | |
105 | interrupt or quit signal is shown in Figure 5. | |
106 | It reads the normal data up to the mark (to discard it), | |
107 | then reads the out-of-band byte. | |
108 | .KF | |
109 | .DS | |
110 | #include <sys/ioctl.h> | |
111 | #include <sys/file.h> | |
112 | ... | |
113 | oob() | |
114 | { | |
115 | int out = FWRITE, mark; | |
116 | char waste[BUFSIZ]; | |
117 | ||
118 | /* flush local terminal output */ | |
119 | ioctl(1, TIOCFLUSH, (char *)&out); | |
120 | for (;;) { | |
121 | if (ioctl(rem, SIOCATMARK, &mark) < 0) { | |
122 | perror("ioctl"); | |
123 | break; | |
124 | } | |
125 | if (mark) | |
126 | break; | |
127 | (void) read(rem, waste, sizeof (waste)); | |
128 | } | |
129 | if (recv(rem, &mark, 1, MSG_OOB) < 0) { | |
130 | perror("recv"); | |
131 | ... | |
132 | } | |
133 | ... | |
134 | } | |
135 | .DE | |
136 | .ce | |
137 | Figure 5. Flushing terminal I/O on receipt of out of band data. | |
138 | .sp | |
139 | .KE | |
140 | .PP | |
141 | A process may also read or peek at the out-of-band data | |
142 | without first reading up to the mark. | |
143 | This is more difficult when the underlying protocol delivers | |
144 | the urgent data in-band with the normal data, and only sends | |
145 | notification of its presence ahead of time (e.g., the TCP protocol | |
146 | used to implement streams in the Internet domain). | |
147 | With such protocols, the out-of-band byte may not yet have arrived | |
148 | when a \fIrecv\fP is done with the MSG_OOB flag. | |
149 | In that case, the call will return an error of EWOULDBLOCK. | |
150 | Worse, there may be enough in-band data in the input buffer | |
151 | that normal flow control prevents the peer from sending the urgent data | |
152 | until the buffer is cleared. | |
153 | The process must then read enough of the queued data | |
154 | that the urgent data may be delivered. | |
155 | .PP | |
156 | Certain programs that use multiple bytes of urgent data and must | |
157 | handle multiple urgent signals (e.g., \fItelnet\fP\|(1C)) | |
158 | need to retain the position of urgent data within the stream. | |
159 | This treatment is available as a socket-level option, SO_OOBINLINE; | |
160 | see \fIsetsockopt\fP\|(2) for usage. | |
161 | With this option, the position of urgent data (the \*(lqmark\*(rq) | |
162 | is retained, but the urgent data immediately follows the mark | |
163 | within the normal data stream returned without the MSG_OOB flag. | |
164 | Reception of multiple urgent indications causes the mark to move, | |
165 | but no out-of-band data are lost. | |
166 | .NH 2 | |
167 | Non-Blocking Sockets | |
168 | .PP | |
169 | It is occasionally convenient to make use of sockets | |
170 | which do not block; that is, I/O requests which | |
171 | cannot complete immediately and | |
172 | would therefore cause the process to be suspended awaiting completion are | |
173 | not executed, and an error code is returned. | |
174 | Once a socket has been created via | |
175 | the \fIsocket\fP call, it may be marked as non-blocking | |
176 | by \fIfcntl\fP as follows: | |
177 | .DS | |
178 | #include <fcntl.h> | |
179 | ... | |
180 | int s; | |
181 | ... | |
182 | s = socket(AF_INET, SOCK_STREAM, 0); | |
183 | ... | |
184 | if (fcntl(s, F_SETFL, FNDELAY) < 0) { | |
185 | perror("fcntl F_SETFL, FNDELAY"); | |
186 | exit(1); | |
187 | } | |
188 | ... | |
189 | .DE | |
190 | .PP | |
191 | When performing non-blocking I/O on sockets, one must be | |
192 | careful to check for the error EWOULDBLOCK (stored in the | |
193 | global variable \fIerrno\fP), which occurs when | |
194 | an operation would normally block, but the socket it | |
195 | was performed on is marked as non-blocking. | |
196 | In particular, \fIaccept\fP, \fIconnect\fP, \fIsend\fP, \fIrecv\fP, | |
197 | \fIread\fP, and \fIwrite\fP can | |
198 | all return EWOULDBLOCK, and processes should be prepared | |
199 | to deal with such return codes. | |
200 | If an operation such as a \fIsend\fP cannot be done in its entirety, | |
201 | but partial writes are sensible (for example, when using a stream socket), | |
202 | the data that can be sent immediately will be processed, | |
203 | and the return value will indicate the amount actually sent. | |
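The error handling described above can be sketched as follows. The sketch assumes a POSIX system, where O_NONBLOCK plays the role of the FNDELAY flag used in the text; the helper names \fIset_nonblocking\fP and \fItry_recv\fP and the return value \-2 are inventions for illustration only.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Mark s as non-blocking without disturbing its other status flags. */
int set_nonblocking(int s)
{
    int flags = fcntl(s, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(s, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Receive without blocking.  Returns the byte count (0 means the peer
 * closed the connection), -2 if the call would have blocked, or -1 on
 * any other error.
 */
long try_recv(int s, void *buf, size_t len)
{
    ssize_t n = recv(s, buf, len, 0);

    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return -2;
    return (long)n;
}
```

Reading the flags with F_GETFL first is a design choice: setting the flag word outright would clear any other status flags already set on the descriptor.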
204 | .NH 2 | |
205 | Interrupt driven socket I/O | |
206 | .PP | |
207 | The SIGIO signal allows a process to be notified | |
208 | via a signal when a socket (or more generally, a file | |
209 | descriptor) has data waiting to be read. Use of | |
210 | the SIGIO facility requires three steps: First, | |
211 | the process must set up a SIGIO signal handler | |
212 | by use of the \fIsignal\fP or \fIsigvec\fP calls. Second, | |
213 | it must set the process id or process group id which is to receive | |
214 | notification of pending input to its own process id, | |
215 | or the process group id of its process group (note that | |
216 | the default process group of a socket is group zero). | |
217 | This is accomplished by use of an \fIfcntl\fP call. | |
218 | Third, it must enable asynchronous notification of pending I/O requests | |
219 | with another \fIfcntl\fP call. Sample code to | |
220 | allow a given process to receive information on | |
221 | pending I/O requests as they occur for a socket \fIs\fP | |
222 | is given in Figure 6. With the addition of a handler for SIGURG, | |
223 | this code can also be used to prepare for receipt of SIGURG signals. | |
224 | .KF | |
225 | .DS | |
226 | #include <fcntl.h> | |
227 | ... | |
228 | int io_handler(); | |
229 | ... | |
230 | signal(SIGIO, io_handler); | |
231 | ||
232 | /* Set the process receiving SIGIO/SIGURG signals to us */ | |
233 | ||
234 | if (fcntl(s, F_SETOWN, getpid()) < 0) { | |
235 | perror("fcntl F_SETOWN"); | |
236 | exit(1); | |
237 | } | |
238 | ||
239 | /* Allow receipt of asynchronous I/O signals */ | |
240 | ||
241 | if (fcntl(s, F_SETFL, FASYNC) < 0) { | |
242 | perror("fcntl F_SETFL, FASYNC"); | |
243 | exit(1); | |
244 | } | |
245 | .DE | |
246 | .ce | |
247 | Figure 6. Use of asynchronous notification of I/O requests. | |
248 | .sp | |
249 | .KE | |
250 | .NH 2 | |
251 | Signals and process groups | |
252 | .PP | |
253 | Due to the existence of the SIGURG and SIGIO signals each socket has an | |
254 | associated process number, just as is done for terminals. | |
255 | This value is initialized to zero, | |
256 | but may be redefined at a later time with the F_SETOWN | |
257 | \fIfcntl\fP, such as was done in the code above for SIGIO. | |
258 | To set the socket's process id for signals, positive arguments | |
259 | should be given to the \fIfcntl\fP call. To set the socket's | |
260 | process group for signals, negative arguments should be | |
261 | passed to \fIfcntl\fP. Note that the process number indicates | |
262 | either the associated process id or the associated process | |
263 | group; it is impossible to specify both at the same time. | |
264 | A similar \fIfcntl\fP, F_GETOWN, is available for determining the | |
265 | current process number of a socket. | |
266 | .PP | |
267 | Another signal which is useful when constructing server processes | |
268 | is SIGCHLD. This signal is delivered to a process when any | |
269 | child processes have changed state. Normally servers use | |
270 | the signal to \*(lqreap\*(rq child processes that have exited | |
271 | without explicitly awaiting their termination | |
272 | or periodically polling for exit status. | |
273 | For example, the remote login server loop shown in Figure 2 | |
274 | may be augmented as shown in Figure 7. | |
275 | .KF | |
276 | .DS | |
277 | int reaper(); | |
278 | ... | |
279 | signal(SIGCHLD, reaper); | |
280 | listen(f, 5); | |
281 | for (;;) { | |
282 | int g, len = sizeof (from); | |
283 | ||
284 | g = accept(f, (struct sockaddr *)&from, &len); | |
285 | if (g < 0) { | |
286 | if (errno != EINTR) | |
287 | syslog(LOG_ERR, "rlogind: accept: %m"); | |
288 | continue; | |
289 | } | |
290 | ... | |
291 | } | |
292 | ... | |
293 | #include <sys/wait.h> | |
294 | reaper() | |
295 | { | |
296 | union wait status; | |
297 | ||
298 | while (wait3(&status, WNOHANG, 0) > 0) | |
299 | ; | |
300 | } | |
301 | .DE | |
302 | .sp | |
303 | .ce | |
304 | Figure 7. Use of the SIGCHLD signal. | |
305 | .sp | |
306 | .KE | |
307 | .PP | |
308 | If the parent server process fails to reap its children, | |
309 | a large number of \*(lqzombie\*(rq processes may be created. | |
310 | .NH 2 | |
311 | Pseudo terminals | |
312 | .PP | |
313 | Many programs will not function properly without a terminal | |
314 | for standard input and output. Since sockets do not provide | |
315 | the semantics of terminals, | |
316 | it is often necessary to have a process communicating over | |
317 | the network do so through a \fIpseudo-terminal\fP. A pseudo- | |
318 | terminal is actually a pair of devices, master and slave, | |
319 | which allow a process to serve as an active agent in communication | |
320 | between processes and users. Data written on the slave side | |
321 | of a pseudo-terminal is supplied as input to a process reading | |
322 | from the master side, while data written on the master side are | |
323 | processed as terminal input for the slave. | |
324 | In this way, the process manipulating | |
325 | the master side of the pseudo-terminal has control over the | |
326 | information read and written on the slave side | |
327 | as if it were manipulating the keyboard and reading the screen | |
328 | on a real terminal. | |
329 | The purpose of this abstraction is to | |
330 | preserve terminal semantics over a network connection\(em | |
331 | that is, the slave side appears as a normal terminal to | |
332 | any process reading from or writing to it. | |
333 | .PP | |
334 | For example, the remote | |
335 | login server uses pseudo-terminals for remote login sessions. | |
336 | A user logging in to a machine across the network is provided | |
337 | a shell with a slave pseudo-terminal as standard input, output, | |
338 | and error. The server process then handles the communication | |
339 | between the programs invoked by the remote shell and the user's | |
340 | local client process. | |
341 | When a user sends a character that generates an interrupt | |
342 | on the remote machine that flushes terminal output, | |
343 | the pseudo-terminal generates a control message for the server process. | |
344 | The server then sends an out of band message | |
345 | to the client process to signal a flush of data at the real terminal | |
346 | and on the intervening data buffered in the network. | |
347 | .PP | |
348 | Under 4.3BSD, the name of the slave side of a pseudo-terminal is of the form | |
349 | \fI/dev/ttyxy\fP, where \fIx\fP is a single letter | |
350 | starting at `p' and continuing to `t'. | |
351 | \fIy\fP is a hexadecimal digit (i.e., a single | |
352 | character in the range 0 through 9 or `a' through `f'). | |
353 | The master side of a pseudo-terminal is \fI/dev/ptyxy\fP, | |
354 | where \fIx\fP and \fIy\fP correspond to the | |
355 | slave side of the pseudo-terminal. | |
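The naming rule can be expressed as a small routine. The sketch below is illustrative only; the name \fIpty_names\fP is hypothetical, and the routine merely builds the two path names without opening either device.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the master and slave names for bank letter x ('p' and up) and
 * unit 0 through 15.  Each buffer must hold at least 11 bytes.
 * Returns 0 on success, -1 if the unit is out of range.
 */
int pty_names(char x, int unit, char *master, char *slave)
{
    static const char digits[] = "0123456789abcdef";

    if (unit < 0 || unit > 15)
        return -1;
    snprintf(master, 11, "/dev/pty%c%c", x, digits[unit]);
    snprintf(slave, 11, "/dev/tty%c%c", x, digits[unit]);
    return 0;
}
```

For bank `p' and unit 10, the routine yields \fI/dev/ptypa\fP for the master and \fI/dev/ttypa\fP for the slave.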
356 | .PP | |
357 | In general, the method of obtaining a pair of master and | |
358 | slave pseudo-terminals is to | |
359 | find a pseudo-terminal which | |
360 | is not currently in use. | |
361 | The master half of a pseudo-terminal is a single-open device; | |
362 | thus, each master may be opened in turn until an open succeeds. | |
363 | The slave side of the pseudo-terminal is then opened, | |
364 | and is set to the proper terminal modes if necessary. | |
365 | The process then \fIfork\fPs; the child closes | |
366 | the master side of the pseudo-terminal, and \fIexec\fPs the | |
367 | appropriate program. Meanwhile, the parent closes the | |
368 | slave side of the pseudo-terminal and begins reading and | |
369 | writing from the master side. Sample code making use of | |
370 | pseudo-terminals is given in Figure 8; this code assumes | |
371 | that a connection on a socket \fIs\fP exists, connected | |
372 | to a peer who wants a service of some kind, and that the | |
373 | process has disassociated itself from any previous controlling terminal. | |
374 | .KF | |
375 | .DS | |
376 | gotpty = 0; | |
377 | for (c = 'p'; !gotpty && c <= 's'; c++) { | |
378 | line = "/dev/ptyXX"; | |
379 | line[sizeof("/dev/pty")-1] = c; | |
380 | line[sizeof("/dev/ptyp")-1] = '0'; | |
381 | if (stat(line, &statbuf) < 0) | |
382 | break; | |
383 | for (i = 0; i < 16; i++) { | |
384 | line[sizeof("/dev/ptyp")-1] = "0123456789abcdef"[i]; | |
385 | master = open(line, O_RDWR); | |
386 | if (master >= 0) { | |
387 | gotpty = 1; | |
388 | break; | |
389 | } | |
390 | } | |
391 | } | |
392 | if (!gotpty) { | |
393 | syslog(LOG_ERR, "All network ports in use"); | |
394 | exit(1); | |
395 | } | |
396 | ||
397 | line[sizeof("/dev/")-1] = 't'; | |
398 | slave = open(line, O_RDWR); /* \fIslave\fP is now slave side */ | |
399 | if (slave < 0) { | |
400 | syslog(LOG_ERR, "Cannot open slave pty %s", line); | |
401 | exit(1); | |
402 | } | |
403 | ||
404 | ioctl(slave, TIOCGETP, &b); /* Set slave tty modes */ | |
405 | b.sg_flags = CRMOD|XTABS|ANYP; | |
406 | ioctl(slave, TIOCSETP, &b); | |
407 | ||
408 | i = fork(); | |
409 | if (i < 0) { | |
410 | syslog(LOG_ERR, "fork: %m"); | |
411 | exit(1); | |
412 | } else if (i) { /* Parent */ | |
413 | close(slave); | |
414 | ... | |
415 | } else { /* Child */ | |
416 | (void) close(s); | |
417 | (void) close(master); | |
418 | dup2(slave, 0); | |
419 | dup2(slave, 1); | |
420 | dup2(slave, 2); | |
421 | if (slave > 2) | |
422 | (void) close(slave); | |
423 | ... | |
424 | } | |
425 | .DE | |
426 | .ce | |
427 | Figure 8. Creation and use of a pseudo terminal. | |
428 | .sp | |
429 | .KE | |
430 | .NH 2 | |
431 | Selecting specific protocols | |
432 | .PP | |
433 | If the third argument to the \fIsocket\fP call is 0, | |
434 | \fIsocket\fP will select a default protocol to use with | |
435 | the returned socket of the type requested. | |
436 | The default protocol is usually correct, and alternate choices are not | |
437 | usually available. | |
438 | However, when using ``raw'' sockets to communicate directly with | |
439 | lower-level protocols or hardware interfaces, | |
440 | the protocol argument may be important for setting up demultiplexing. | |
441 | For example, raw sockets in the Internet family may be used to implement | |
442 | a new protocol above IP, and the socket will receive packets | |
443 | only for the protocol specified. | |
444 | To obtain a particular protocol one determines the protocol number | |
445 | as defined within the communication domain. For the Internet | |
446 | domain one may use one of the library routines | |
447 | discussed in section 3, such as \fIgetprotobyname\fP: | |
448 | .DS | |
449 | #include <sys/types.h> | |
450 | #include <sys/socket.h> | |
451 | #include <netinet/in.h> | |
452 | #include <netdb.h> | |
453 | ... | |
454 | pp = getprotobyname("newtcp"); | |
455 | s = socket(AF_INET, SOCK_STREAM, pp->p_proto); | |
456 | .DE | |
457 | This would result in a socket \fIs\fP using a stream | |
458 | based connection, but with protocol type of ``newtcp'' | |
459 | instead of the default ``tcp.'' | |
460 | .PP | |
461 | In the NS domain, the available socket protocols are defined in | |
462 | <\fInetns/ns.h\fP>. To create a raw socket for Xerox Error Protocol | |
463 | messages, one might use: | |
464 | .DS | |
465 | #include <sys/types.h> | |
466 | #include <sys/socket.h> | |
467 | #include <netns/ns.h> | |
468 | ... | |
469 | s = socket(AF_NS, SOCK_RAW, NSPROTO_ERROR); | |
470 | .DE | |
471 | .NH 2 | |
472 | Address binding | |
473 | .PP | |
474 | As was mentioned in section 2, | |
475 | binding addresses to sockets in the Internet and NS domains can be | |
476 | fairly complex. As a brief reminder, these associations | |
477 | are composed of local and foreign | |
478 | addresses, and local and foreign ports. Port numbers are | |
479 | allocated out of separate spaces, one for each system and one | |
480 | for each domain on that system. | |
481 | Through the \fIbind\fP system call, a | |
482 | process may specify half of an association, the | |
483 | <local address, local port> part, while the | |
484 | \fIconnect\fP | |
485 | and \fIaccept\fP | |
486 | primitives are used to complete a socket's association by | |
487 | specifying the <foreign address, foreign port> part. | |
488 | Since the association is created in two steps the association | |
489 | uniqueness requirement indicated previously could be violated unless | |
490 | care is taken. Further, it is unrealistic to expect user | |
491 | programs to always know proper values to use for the local address | |
492 | and local port since a host may reside on multiple networks and | |
493 | the set of allocated port numbers is not directly accessible | |
494 | to a user. | |
495 | .PP | |
496 | To simplify local address binding in the Internet domain the notion of a | |
497 | \*(lqwildcard\*(rq address has been provided. When an address | |
498 | is specified as INADDR_ANY (a manifest constant defined in | |
499 | <netinet/in.h>), the system interprets the address as | |
500 | \*(lqany valid address\*(rq. For example, to bind a specific | |
501 | port number to a socket, but leave the local address unspecified, | |
502 | the following code might be used: | |
503 | .DS | |
504 | #include <sys/types.h> | |
505 | #include <netinet/in.h> | |
506 | ... | |
507 | struct sockaddr_in sin; | |
508 | ... | |
509 | s = socket(AF_INET, SOCK_STREAM, 0); | |
510 | sin.sin_family = AF_INET; | |
511 | sin.sin_addr.s_addr = htonl(INADDR_ANY); | |
512 | sin.sin_port = htons(MYPORT); | |
513 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); | |
514 | .DE | |
515 | Sockets with wildcarded local addresses may receive messages | |
516 | directed to the specified port number, and sent to any | |
517 | of the possible addresses assigned to a host. For example, | |
518 | if a host has addresses 128.32.0.4 and 10.0.0.78, and a socket is bound as | |
519 | above, the process will be | |
520 | able to accept connection requests which are addressed to | |
521 | 128.32.0.4 or 10.0.0.78. | |
522 | If a server process wished to only allow hosts on a | |
523 | given network to connect to it, it would bind | |
524 | the address of the host on the appropriate network. | |
525 | .PP | |
526 | In a similar fashion, a local port may be left unspecified | |
527 | (specified as zero), in which case the system will select an | |
528 | appropriate port number for it. This shortcut will work | |
529 | both in the Internet and NS domains. For example, to | |
530 | bind a specific local address to a socket, but to leave the | |
531 | local port number unspecified: | |
532 | .DS | |
533 | hp = gethostbyname(hostname); | |
534 | if (hp == NULL) { | |
535 | ... | |
536 | } | |
537 | bcopy(hp->h_addr, (char *) &sin.sin_addr, hp->h_length); | |
538 | sin.sin_port = htons(0); | |
539 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); | |
540 | .DE | |
541 | The system selects the local port number based on two criteria. | |
542 | The first is that on 4BSD systems, | |
543 | Internet ports below IPPORT_RESERVED (1024) (for the Xerox domain, | |
544 | 0 through 3000) are reserved | |
545 | for privileged users (i.e., the super user); | |
546 | Internet ports above IPPORT_USERRESERVED (50000) are reserved | |
547 | for non-privileged servers. The second is | |
548 | that the port number is not currently bound to some other | |
549 | socket. In order to find a free Internet port number in the privileged | |
550 | range the \fIrresvport\fP library routine may be used as follows | |
551 | to return a stream socket with a privileged port number: | |
552 | .DS | |
553 | int lport = IPPORT_RESERVED \- 1; | |
554 | int s; | |
78ed81a3 | 555 | \&... |
15637ed4 RG |
556 | s = rresvport(&lport); |
557 | if (s < 0) { | |
558 | if (errno == EAGAIN) | |
559 | fprintf(stderr, "socket: all ports in use\en"); | |
560 | else | |
561 | perror("rresvport: socket"); | |
562 | ... | |
563 | } | |
564 | .DE | |
565 | The restriction on allocating ports was done to allow processes | |
566 | executing in a \*(lqsecure\*(rq environment to perform authentication | |
567 | based on the originating address and port number. For example, | |
568 | the \fIrlogin\fP(1) command allows users to log in across a network | |
569 | without being asked for a password, if two conditions hold: | |
570 | First, the name of the system the user | |
571 | is logging in from is in the file | |
572 | \fI/etc/hosts.equiv\fP on the system he is logging | |
573 | in to (or the system name and the user name are in | |
574 | the user's \fI.rhosts\fP file in the user's home | |
575 | directory), and second, that the user's rlogin | |
576 | process is coming from a privileged port on the machine from which he is | |
577 | logging in. The port number and network address of the | |
578 | machine from which the user is logging in can be determined either | |
579 | by the \fIfrom\fP result of the \fIaccept\fP call, or | |
580 | from the \fIgetpeername\fP call. | |
581 | .PP | |
582 | In certain cases the algorithm used by the system in selecting | |
583 | port numbers is unsuitable for an application. This is because | |
584 | associations are created in a two step process. For example, | |
585 | the Internet file transfer protocol, FTP, specifies that data | |
586 | connections must always originate from the same local port. However, | |
587 | duplicate associations are avoided by connecting to different foreign | |
588 | ports. In this situation the system would disallow binding the | |
589 | same local address and port number to a socket if a previous data | |
590 | connection's socket still existed. To override the default port | |
591 | selection algorithm, an option call must be performed prior | |
592 | to address binding: | |
593 | .DS | |
594 | ... | |
595 | int on = 1; | |
596 | ... | |
597 | setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)); | |
598 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); | |
599 | .DE | |
600 | With the above call, local addresses may be bound which | |
601 | are already in use. This does not violate the uniqueness | |
602 | requirement as the system still checks at connect time to | |
603 | be sure any other sockets with the same local address and | |
604 | port do not have the same foreign address and port. | |
605 | If the association already exists, the error EADDRINUSE is returned. | |
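The pieces of this section can be combined as in the sketch below: the option is set before the bind, the address is wildcarded, and a port of zero lets the system choose. The routines \fIbind_reusable\fP and \fIlocal_port\fP are hypothetical; the second uses \fIgetsockname\fP to discover which port the system selected.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Create a stream socket bound to the wildcard address and the given
 * port (host byte order; 0 lets the system choose), with SO_REUSEADDR
 * set before the bind.  Returns the descriptor or -1.
 */
int bind_reusable(unsigned short port)
{
    struct sockaddr_in sin;
    int on = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Return the local port of s in host byte order, or -1 on error. */
int local_port(int s)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(s, (struct sockaddr *)&sin, &len) < 0)
        return -1;
    return ntohs(sin.sin_port);
}
```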
606 | .NH 2 | |
607 | Broadcasting and determining network configuration | |
608 | .PP | |
609 | By using a datagram socket, it is possible to send broadcast | |
610 | packets on many networks supported by the system. | |
611 | The network itself must support broadcast; the system | |
612 | provides no simulation of broadcast in software. | |
613 | Broadcast messages can place a high load on a network since they force | |
614 | every host on the network to service them. Consequently, | |
615 | the ability to send broadcast packets has been limited | |
616 | to sockets which are explicitly marked as allowing broadcasting. | |
617 | Broadcast is typically used for one of two reasons: | |
618 | it is desired to find a resource on a local network without prior | |
619 | knowledge of its address, | |
620 | or important functions such as routing require that information | |
621 | be sent to all accessible neighbors. | |
622 | .PP | |
623 | To send a broadcast message, a datagram socket | |
624 | should be created: | |
625 | .DS | |
626 | s = socket(AF_INET, SOCK_DGRAM, 0); | |
627 | .DE | |
628 | or | |
629 | .DS | |
630 | s = socket(AF_NS, SOCK_DGRAM, 0); | |
631 | .DE | |
632 | The socket is marked as allowing broadcasting, | |
633 | .DS | |
634 | int on = 1; | |
635 | ||
636 | setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on)); | |
637 | .DE | |
638 | and at least a port number should be bound to the socket: | |
639 | .DS | |
640 | sin.sin_family = AF_INET; | |
641 | sin.sin_addr.s_addr = htonl(INADDR_ANY); | |
642 | sin.sin_port = htons(MYPORT); | |
643 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); | |
644 | .DE | |
645 | or, for the NS domain, | |
646 | .DS | |
647 | sns.sns_family = AF_NS; | |
648 | netnum = htonl(net); | |
649 | sns.sns_addr.x_net = *(union ns_net *) &netnum; /* insert net number */ | |
650 | sns.sns_addr.x_port = htons(MYPORT); | |
651 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); | |
652 | .DE | |
653 | The destination address of the message to be broadcast | |
654 | depends on the network(s) on which the message is to be broadcast. | |
655 | The Internet domain supports a shorthand notation for broadcast | |
656 | on the local network, the address INADDR_BROADCAST (defined in | |
657 | <\fInetinet/in.h\fP>). | |
658 | To determine the list of addresses for all reachable neighbors | |
659 | requires knowledge of the networks to which the host is connected. | |
660 | Since this information should | |
661 | be obtained in a host-independent fashion and may be impossible | |
662 | to derive, 4.3BSD provides a method of | |
663 | retrieving this information from the system data structures. | |
664 | The SIOCGIFCONF \fIioctl\fP call returns the interface | |
665 | configuration of a host in the form of a | |
666 | single \fIifconf\fP structure; this structure contains | |
667 | a ``data area'' which is made up of an array of | |
668 | \fIifreq\fP structures, one for each network interface | |
669 | to which the host is connected. | |
670 | These structures are defined in | |
671 | \fI<net/if.h>\fP as follows: | |
672 | .DS | |
673 | .if t .ta .5i 1.0i 1.5i 3.5i | |
674 | .if n .ta .7i 1.4i 2.1i 3.4i | |
675 | struct ifconf { | |
676 | int ifc_len; /* size of associated buffer */ | |
677 | union { | |
678 | caddr_t ifcu_buf; | |
679 | struct ifreq *ifcu_req; | |
680 | } ifc_ifcu; | |
681 | }; | |
682 | ||
683 | #define ifc_buf ifc_ifcu.ifcu_buf /* buffer address */ | |
684 | #define ifc_req ifc_ifcu.ifcu_req /* array of structures returned */ | |
685 | ||
686 | #define IFNAMSIZ 16 | |
687 | ||
688 | struct ifreq { | |
689 | char ifr_name[IFNAMSIZ]; /* if name, e.g. "en0" */ | |
690 | union { | |
691 | struct sockaddr ifru_addr; | |
692 | struct sockaddr ifru_dstaddr; | |
693 | struct sockaddr ifru_broadaddr; | |
694 | short ifru_flags; | |
695 | caddr_t ifru_data; | |
696 | } ifr_ifru; | |
697 | }; | |
698 | ||
699 | .if t .ta \w' #define'u +\w' ifr_broadaddr'u +\w' ifr_ifru.ifru_broadaddr'u | |
700 | #define ifr_addr ifr_ifru.ifru_addr /* address */ | |
701 | #define ifr_dstaddr ifr_ifru.ifru_dstaddr /* other end of p-to-p link */ | |
702 | #define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */ | |
703 | #define ifr_flags ifr_ifru.ifru_flags /* flags */ | |
704 | #define ifr_data ifr_ifru.ifru_data /* for use by interface */ | |
705 | .DE | |
706 | The actual call which obtains the | |
707 | interface configuration is | |
708 | .DS | |
709 | struct ifconf ifc; | |
710 | char buf[BUFSIZ]; | |
711 | ||
712 | ifc.ifc_len = sizeof (buf); | |
713 | ifc.ifc_buf = buf; | |
714 | if (ioctl(s, SIOCGIFCONF, (char *) &ifc) < 0) { | |
715 | ... | |
716 | } | |
717 | .DE | |
718 | After this call \fIbuf\fP will contain one \fIifreq\fP structure for | |
719 | each network to which the host is connected, and | |
720 | \fIifc.ifc_len\fP will have been modified to reflect the number | |
721 | of bytes used by the \fIifreq\fP structures. | |
722 | .PP | |
723 | For each structure | |
724 | there exists a set of ``interface flags'' which tell | |
725 | whether the network corresponding to that interface is | |
726 | up or down, point to point or broadcast, etc. The | |
727 | SIOCGIFFLAGS \fIioctl\fP retrieves these | |
728 | flags for an interface specified by an \fIifreq\fP | |
729 | structure as follows: | |
730 | .DS | |
731 | struct ifreq *ifr; | |
732 | ||
733 | ifr = ifc.ifc_req; | |
734 | ||
735 | for (n = ifc.ifc_len / sizeof (struct ifreq); --n >= 0; ifr++) { | |
736 | /* | |
737 | * We must be careful that we don't use an interface | |
738 | * devoted to an address family other than those intended; | |
739 | * if we were interested in NS interfaces, the | |
740 | * AF_INET would be AF_NS. | |
741 | */ | |
742 | if (ifr->ifr_addr.sa_family != AF_INET) | |
743 | continue; | |
744 | if (ioctl(s, SIOCGIFFLAGS, (char *) ifr) < 0) { | |
745 | ... | |
746 | } | |
747 | /* | |
748 | * Skip boring cases. | |
749 | */ | |
750 | if ((ifr->ifr_flags & IFF_UP) == 0 || | |
751 | (ifr->ifr_flags & IFF_LOOPBACK) || | |
752 | (ifr->ifr_flags & (IFF_BROADCAST | IFF_POINTOPOINT)) == 0) | |
753 | continue; | |
754 | .DE | |
755 | .PP | |
756 | Once the flags have been obtained, the broadcast address | |
757 | must be obtained. In the case of broadcast networks this is | |
758 | done via the SIOCGIFBRDADDR \fIioctl\fP, while for point-to-point networks | |
759 | the address of the destination host is obtained with SIOCGIFDSTADDR. | |
760 | .DS | |
761 | struct sockaddr dst; | |
762 | ||
763 | if (ifr->ifr_flags & IFF_POINTOPOINT) { | |
764 | if (ioctl(s, SIOCGIFDSTADDR, (char *) ifr) < 0) { | |
765 | ... | |
766 | } | |
767 | bcopy((char *) &ifr->ifr_dstaddr, (char *) &dst, sizeof (ifr->ifr_dstaddr)); | |
768 | } else if (ifr->ifr_flags & IFF_BROADCAST) { | |
769 | if (ioctl(s, SIOCGIFBRDADDR, (char *) ifr) < 0) { | |
770 | ... | |
771 | } | |
772 | bcopy((char *) &ifr->ifr_broadaddr, (char *) &dst, sizeof (ifr->ifr_broadaddr)); | |
773 | } | |
774 | .DE | |
775 | .PP | |
776 | After the appropriate \fIioctl\fP's have obtained the broadcast | |
777 | or destination address (now in \fIdst\fP), the \fIsendto\fP call may be | |
778 | used: | |
779 | .DS | |
780 | sendto(s, buf, buflen, 0, (struct sockaddr *)&dst, sizeof (dst)); | |
781 | } | |
782 | .DE | |
783 | In the above loop one \fIsendto\fP occurs for every | |
784 | interface to which the host is connected that supports the notion of | |
785 | broadcast or point-to-point addressing. | |
786 | If a process only wished to send broadcast | |
787 | messages on a given network, code similar to that outlined above | |
788 | would be used, but the loop would need to find the | |
789 | correct destination address. | |
790 | .PP | |
791 | Received broadcast messages contain the sender's address | |
792 | and port, as datagram sockets are bound before | |
793 | a message is allowed to go out. | |
794 | .NH 2 | |
795 | Socket Options | |
796 | .PP | |
797 | It is possible to set and get a number of options on sockets | |
798 | via the \fIsetsockopt\fP and \fIgetsockopt\fP system calls. | |
799 | These options include such things as marking a socket for | |
800 | broadcasting, bypassing routing, lingering on close, etc. | |
801 | The general forms of the calls are: | |
802 | .DS | |
803 | setsockopt(s, level, optname, optval, optlen); | |
804 | .DE | |
805 | and | |
806 | .DS | |
807 | getsockopt(s, level, optname, optval, optlen); | |
808 | .DE | |
809 | .PP | |
810 | The parameters to the calls are as follows: \fIs\fP | |
811 | is the socket on which the option is to be applied. | |
812 | \fILevel\fP specifies the protocol layer on which the | |
813 | option is to be applied; in most cases this is | |
814 | the ``socket level'', indicated by the symbolic constant | |
815 | SOL_SOCKET, defined in \fI<sys/socket.h>.\fP | |
816 | The actual option is specified in \fIoptname\fP, and is | |
817 | a symbolic constant also defined in \fI<sys/socket.h>\fP. | |
818 | \fIOptval\fP and \fIOptlen\fP point to the value of the | |
819 | option (in most cases, whether the option is to be turned | |
820 | on or off), and the length of the value of the option, | |
821 | respectively. | |
822 | For \fIgetsockopt\fP, \fIoptlen\fP is | |
823 | a value-result parameter, initially set to the size of | |
824 | the storage area pointed to by \fIoptval\fP, and modified | |
825 | upon return to indicate the actual amount of storage used. | |
826 | .PP | |
827 | An example should help clarify things. It is sometimes | |
828 | useful to determine the type (e.g., stream, datagram, etc.) | |
829 | of an existing socket; programs | |
830 | under \fIinetd\fP (described below) may need to perform this | |
831 | task. This can be accomplished as follows via the | |
832 | SO_TYPE socket option and the \fIgetsockopt\fP call: | |
833 | .DS | |
834 | #include <sys/types.h> | |
835 | #include <sys/socket.h> | |
836 | ||
837 | int type, size; | |
838 | ||
839 | size = sizeof (int); | |
840 | ||
841 | if (getsockopt(s, SOL_SOCKET, SO_TYPE, (char *) &type, &size) < 0) { | |
842 | ... | |
843 | } | |
844 | .DE | |
845 | After the \fIgetsockopt\fP call, \fItype\fP will be set | |
846 | to the value of the socket type, as defined in | |
847 | \fI<sys/socket.h>\fP. If, for example, the socket were | |
848 | a datagram socket, \fItype\fP would have the value | |
849 | corresponding to SOCK_DGRAM. | |
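Setting an option follows the same pattern. The sketch below wraps the two calls for integer-valued options at the socket level; the helper names \fIset_flag_option\fP and \fIget_int_option\fP are hypothetical, and the length argument is declared as \fIsocklen_t\fP on the assumption of a current system (the example above uses a plain \fIint\fP).

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Turn a boolean socket-level option on or off; 0 on success, -1 on error. */
int
set_flag_option(int s, int optname, int on)
{
	return (setsockopt(s, SOL_SOCKET, optname, (char *) &on, sizeof (on)));
}

/* Fetch an integer socket-level option; its value, or -1 on error. */
int
get_int_option(int s, int optname)
{
	int val = 0;
	socklen_t len = sizeof (val);	/* value-result: size in, size used out */

	if (getsockopt(s, SOL_SOCKET, optname, (char *) &val, &len) < 0)
		return (-1);
	return (val);
}
```

For instance, \fIset_flag_option(s, SO_BROADCAST, 1)\fP marks a datagram socket for broadcasting, and \fIget_int_option(s, SO_TYPE)\fP returns the socket type.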
850 | .NH 2 | |
851 | NS Packet Sequences | |
852 | .PP | |
853 | The semantics of NS connections demand that | |
854 | the user both be able to look inside the network header associated | |
855 | with any incoming packet and be able to specify what should go | |
856 | in certain fields of an outgoing packet. | |
857 | Using different calls to \fIsetsockopt\fP, it is possible | |
858 | to indicate whether prototype headers will be associated by | |
859 | the user with each outgoing packet (SO_HEADERS_ON_OUTPUT), | |
860 | to indicate whether the headers received by the system should be | |
861 | delivered to the user (SO_HEADERS_ON_INPUT), or to indicate | |
862 | default information that should be associated with all | |
863 | outgoing packets on a given socket (SO_DEFAULT_HEADERS). | |
864 | .PP | |
865 | The contents of a SPP header (minus the IDP header) are: | |
866 | .DS | |
867 | .if t .ta \w" #define"u +\w" u_short"u +2.0i | |
868 | struct sphdr { | |
869 | u_char sp_cc; /* connection control */ | |
870 | #define SP_SP 0x80 /* system packet */ | |
871 | #define SP_SA 0x40 /* send acknowledgement */ | |
872 | #define SP_OB 0x20 /* attention (out of band data) */ | |
873 | #define SP_EM 0x10 /* end of message */ | |
874 | u_char sp_dt; /* datastream type */ | |
875 | u_short sp_sid; /* source connection identifier */ | |
876 | u_short sp_did; /* destination connection identifier */ | |
877 | u_short sp_seq; /* sequence number */ | |
878 | u_short sp_ack; /* acknowledge number */ | |
879 | u_short sp_alo; /* allocation number */ | |
880 | }; | |
881 | .DE | |
882 | Here, the items of interest are the \fIdatastream type\fP and | |
883 | the \fIconnection control\fP fields. The semantics of the | |
884 | datastream type are defined by the application(s) in question; | |
885 | the value of this field is, by default, zero, but it can be | |
886 | used to indicate things such as Xerox's Bulk Data Transfer | |
887 | Protocol (in which case it is set to one). The connection control | |
888 | field is a mask of the flags defined just below it. The user may | |
889 | set or clear the end-of-message bit to indicate | |
890 | that a given message is the last of a given substream type, | |
891 | or may set/clear the attention bit as an alternate way to | |
892 | indicate that a packet should be sent out-of-band. | |
893 | As an example, to associate prototype headers with outgoing | |
894 | SPP packets, consider: | |
895 | .DS | |
896 | #include <sys/types.h> | |
897 | #include <sys/socket.h> | |
898 | #include <netns/ns.h> | |
899 | #include <netns/sp.h> | |
900 | ... | |
901 | struct sockaddr_ns sns, to; | |
902 | int s, on = 1; | |
903 | struct databuf { | |
904 | struct sphdr proto_spp; /* prototype header */ | |
905 | char buf[534]; /* max. possible data by Xerox std. */ | |
906 | } buf; | |
907 | ... | |
908 | s = socket(AF_NS, SOCK_SEQPACKET, 0); | |
909 | ... | |
910 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); | |
911 | setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &on, sizeof(on)); | |
912 | ... | |
913 | buf.proto_spp.sp_dt = 1; /* bulk data */ | |
914 | buf.proto_spp.sp_cc = SP_EM; /* end-of-message */ | |
915 | strcpy(buf.buf, "hello world\en"); | |
916 | sendto(s, (char *) &buf, sizeof(struct sphdr) + strlen("hello world\en"), | |
917 | 0, (struct sockaddr *) &to, sizeof(to)); | |
918 | ... | |
919 | .DE | |
920 | Note that one must be careful when writing headers; if the prototype | |
921 | header is not written with the data with which it is to be associated, | |
922 | the kernel will treat the first few bytes of the data as the | |
923 | header, with unpredictable results. | |
924 | To turn off the above association, and to indicate that packet | |
925 | headers received by the system should be passed up to the user, | |
926 | one might use: | |
927 | .DS | |
928 | #include <sys/types.h> | |
929 | #include <sys/socket.h> | |
930 | #include <netns/ns.h> | |
931 | #include <netns/sp.h> | |
932 | ... | |
933 | struct sockaddr_ns sns; | |
934 | int s, on = 1, off = 0; | |
935 | ... | |
936 | s = socket(AF_NS, SOCK_SEQPACKET, 0); | |
937 | ... | |
938 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); | |
939 | setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &off, sizeof(off)); | |
940 | setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_INPUT, &on, sizeof(on)); | |
941 | ... | |
942 | .DE | |
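Because the connection control field is an ordinary bit mask, setting or clearing the end-of-message or attention bit is a matter of simple bit operations. The sketch below copies the flag values from the \fIsphdr\fP listing so that it can stand alone; on a system that provides \fI<netns/sp.h>\fP, that header should be used instead. The type \fIcc_t\fP and the three helper names are hypothetical.

```c
/* Flag values as given in the sphdr listing. */
#ifndef SP_EM
#define	SP_SP	0x80	/* system packet */
#define	SP_SA	0x40	/* send acknowledgement */
#define	SP_OB	0x20	/* attention (out of band data) */
#define	SP_EM	0x10	/* end of message */
#endif

typedef unsigned char cc_t;	/* hypothetical alias for the type of sp_cc */

/* Return cc with the given flag bit(s) turned on. */
cc_t
cc_set(cc_t cc, cc_t flag)
{
	return ((cc_t) (cc | flag));
}

/* Return cc with the given flag bit(s) turned off. */
cc_t
cc_clear(cc_t cc, cc_t flag)
{
	return ((cc_t) (cc & ~flag));
}

/* Nonzero if any of the given flag bit(s) is set in cc. */
int
cc_isset(cc_t cc, cc_t flag)
{
	return ((cc & flag) != 0);
}
```

A prototype header marked as both end-of-message and attention would thus carry \fIcc_set(SP_EM, SP_OB)\fP in its \fIsp_cc\fP field.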
943 | .PP | |
944 | Output is handled somewhat differently in the IDP world. | |
945 | The header of an IDP-level packet looks like: | |
946 | .DS | |
947 | .if t .ta \w'struct 'u +\w" struct ns_addr"u +2.0i | |
948 | struct idp { | |
949 | u_short idp_sum; /* Checksum */ | |
950 | u_short idp_len; /* Length, in bytes, including header */ | |
951 | u_char idp_tc; /* Transport Control (i.e., hop count) */ | |
952 | u_char idp_pt; /* Packet Type (i.e., level 2 protocol) */ | |
953 | struct ns_addr idp_dna; /* Destination Network Address */ | |
954 | struct ns_addr idp_sna; /* Source Network Address */ | |
955 | }; | |
956 | .DE | |
957 | The primary field of interest in an IDP header is the \fIpacket type\fP | |
958 | field. The standard values for this field are (as defined | |
959 | in <\fInetns/ns.h\fP>): | |
960 | .DS | |
961 | .if t .ta \w" #define"u +\w" NSPROTO_ERROR"u +1.0i | |
962 | #define NSPROTO_RI 1 /* Routing Information */ | |
963 | #define NSPROTO_ECHO 2 /* Echo Protocol */ | |
964 | #define NSPROTO_ERROR 3 /* Error Protocol */ | |
965 | #define NSPROTO_PE 4 /* Packet Exchange */ | |
966 | #define NSPROTO_SPP 5 /* Sequenced Packet */ | |
967 | .DE | |
968 | For SPP connections, the contents of this field are | |
969 | automatically set to NSPROTO_SPP; for IDP packets, | |
970 | this value defaults to zero, which means ``unknown''. | |
971 | .PP | |
972 | Setting the value of that field with SO_DEFAULT_HEADERS is | |
973 | easy: | |
974 | .DS | |
975 | #include <sys/types.h> | |
976 | #include <sys/socket.h> | |
977 | #include <netns/ns.h> | |
978 | #include <netns/idp.h> | |
979 | ... | |
980 | struct sockaddr_ns sns; | |
981 | struct idp proto_idp; /* prototype header */ | |
982 | int s, on = 1; | |
983 | ... | |
984 | s = socket(AF_NS, SOCK_DGRAM, 0); | |
985 | ... | |
986 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); | |
987 | proto_idp.idp_pt = NSPROTO_PE; /* packet exchange */ | |
988 | setsockopt(s, NSPROTO_IDP, SO_DEFAULT_HEADERS, (char *) &proto_idp, | |
989 | sizeof(proto_idp)); | |
990 | ... | |
991 | .DE | |
992 | .PP | |
993 | Using SO_HEADERS_ON_OUTPUT is somewhat more difficult. When | |
994 | SO_HEADERS_ON_OUTPUT is turned on for an IDP socket, the socket | |
995 | becomes (for all intents and purposes) a raw socket. In this | |
996 | case, all the fields of the prototype header (except the | |
997 | length and checksum fields, which are computed by the kernel) | |
998 | must be filled in correctly in order for the socket to send and | |
999 | receive data in a sensible manner. To be more specific, the | |
1000 | source address must be set to that of the host sending the | |
1001 | data; the destination address must be set to that of the | |
1002 | host for whom the data is intended; the packet type must be | |
1003 | set to whatever value is desired; and the hopcount must be | |
1004 | set to some reasonable value (almost always zero). It should | |
1005 | also be noted that simply sending data using \fIwrite\fP | |
1006 | will not work unless a \fIconnect\fP or \fIsendto\fP call | |
1007 | is used, in spite of the fact that it is the destination | |
1008 | address in the prototype header that is used, not the one | |
1009 | given in either of those calls. For almost | |
1010 | all IDP applications, using SO_DEFAULT_HEADERS is easier and | |
1011 | more desirable than writing headers. | |
1012 | .NH 2 | |
1013 | Three-way Handshake | |
1014 | .PP | |
1015 | The semantics of SPP connections indicate that a three-way | |
1016 | handshake, involving changes in the datastream type, should \(em | |
1017 | but is not absolutely required to \(em take place before a SPP | |
1018 | connection is closed. Almost all SPP connections are | |
1019 | ``well-behaved'' in this manner; when communicating with | |
1020 | any process, it is best to assume that the three-way handshake | |
1021 | is required unless it is known for certain that it is not | |
1022 | required. In a three-way close, the closing process | |
1023 | indicates that it wishes to close the connection by sending | |
1024 | a zero-length packet with end-of-message set and with | |
1025 | datastream type 254. The other side of the connection | |
1026 | indicates that it is OK to close by sending a zero-length | |
1027 | packet with end-of-message set and datastream type 255. Finally, | |
1028 | the closing process replies with a zero-length packet with | |
1029 | datastream type 255; at this point, the connection is considered | |
1030 | closed. The following code fragments are simplified examples | |
1031 | of how one might handle this three-way handshake at the user | |
1032 | level; in the future, support for this type of close will | |
1033 | probably be provided as part of the C library or as part of | |
1034 | the kernel. The first code fragment below illustrates how a process | |
1035 | might handle the three-way handshake if it sees that the process it | |
1036 | is communicating with wants to close the connection: | |
1037 | .DS | |
1038 | #include <sys/types.h> | |
1039 | #include <sys/socket.h> | |
1040 | #include <netns/ns.h> | |
1041 | #include <netns/sp.h> | |
1042 | ... | |
1043 | #ifndef SPPSST_END | |
1044 | #define SPPSST_END 254 | |
1045 | #define SPPSST_ENDREPLY 255 | |
1046 | #endif | |
1047 | struct sphdr proto_sp; | |
1048 | int s; | |
1049 | ... | |
1050 | read(s, buf, BUFSIZE); | |
1051 | if (((struct sphdr *)buf)->sp_dt == SPPSST_END) { | |
1052 | /* | |
1053 | * SPPSST_END indicates that the other side wants to | |
1054 | * close. | |
1055 | */ | |
1056 | proto_sp.sp_dt = SPPSST_ENDREPLY; | |
1057 | proto_sp.sp_cc = SP_EM; | |
1058 | setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, | |
1059 | sizeof(proto_sp)); | |
1060 | write(s, buf, 0); | |
1061 | /* | |
1062 | * Write a zero-length packet with datastream type = SPPSST_ENDREPLY | |
1063 | * to indicate that the close is OK with us. The packet that we | |
1064 | * don't see (because we don't look for it) is another packet | |
1065 | * from the other side of the connection, with SPPSST_ENDREPLY | |
1066 | * on it, too. Once that packet is sent, the connection is | |
1067 | * considered closed; note that we really ought to retransmit | |
1068 | * the close for some time if we do not get a reply. | |
1069 | */ | |
1070 | close(s); | |
1071 | } | |
1072 | ... | |
1073 | .DE | |
1074 | To indicate to another process that we would like to close the | |
1075 | connection, the following code would suffice: | |
1076 | .DS | |
1077 | #include <sys/types.h> | |
1078 | #include <sys/socket.h> | |
1079 | #include <netns/ns.h> | |
1080 | #include <netns/sp.h> | |
1081 | ... | |
1082 | #ifndef SPPSST_END | |
1083 | #define SPPSST_END 254 | |
1084 | #define SPPSST_ENDREPLY 255 | |
1085 | #endif | |
1086 | struct sphdr proto_sp; | |
1087 | int s; | |
1088 | ... | |
1089 | proto_sp.sp_dt = SPPSST_END; | |
1090 | proto_sp.sp_cc = SP_EM; | |
1091 | setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, | |
1092 | sizeof(proto_sp)); | |
1093 | write(s, buf, 0); /* send the end request */ | |
1094 | proto_sp.sp_dt = SPPSST_ENDREPLY; | |
1095 | setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, | |
1096 | sizeof(proto_sp)); | |
1097 | /* | |
1098 | * We assume (perhaps unwisely) | |
1099 | * that the other side will send the | |
1100 | * ENDREPLY, so we'll just send our final ENDREPLY | |
1101 | * as if we'd seen theirs already. | |
1102 | */ | |
1103 | write(s, buf, 0); | |
1104 | close(s); | |
1105 | ... | |
1106 | .DE | |
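Both fragments above take shortcuts: neither waits for the packet it expects from the other side. One way to avoid this is to track the close as a small state machine driven by the datastream type of each received packet. The following is a sketch only; the state names and the function \fIclose_step\fP are hypothetical, and retransmission of a lost close request is not handled.

```c
#ifndef SPPSST_END
#define	SPPSST_END	254
#define	SPPSST_ENDREPLY	255
#endif

enum close_state {
	CS_OPEN,	/* no close in progress */
	CS_END_SENT,	/* we sent SPPSST_END and await the reply */
	CS_REPLY_SENT,	/* we answered the peer's SPPSST_END */
	CS_CLOSED	/* handshake complete */
};

/*
 * Advance the state given the datastream type dt of a received packet.
 * *reply is set to the datastream type of the zero-length packet that
 * should be sent in response, or to -1 if nothing should be sent.
 */
enum close_state
close_step(enum close_state st, int dt, int *reply)
{
	*reply = -1;
	switch (st) {
	case CS_OPEN:
		if (dt == SPPSST_END) {		/* peer wants to close */
			*reply = SPPSST_ENDREPLY;
			return (CS_REPLY_SENT);
		}
		break;
	case CS_END_SENT:
		if (dt == SPPSST_ENDREPLY) {	/* peer agreed; confirm */
			*reply = SPPSST_ENDREPLY;
			return (CS_CLOSED);
		}
		break;
	case CS_REPLY_SENT:
		if (dt == SPPSST_ENDREPLY)	/* peer's final confirmation */
			return (CS_CLOSED);
		break;
	case CS_CLOSED:
		break;
	}
	return (st);
}
```

A process that initiates the close would send its SPPSST_END packet, enter the CS_END_SENT state, and call \fIclose\fP on the socket only once \fIclose_step\fP returns CS_CLOSED.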
1107 | .NH 2 | |
1108 | Packet Exchange | |
1109 | .PP | |
1110 | The Xerox standard protocols include a protocol that is both | |
1111 | reliable and datagram-oriented. This protocol is known as | |
1112 | Packet Exchange (PEX or PE) and, like SPP, is layered on top | |
1113 | of IDP. PEX is important for a number of things: Courier | |
1114 | remote procedure calls may be expedited through the use | |
1115 | of PEX, and many Xerox servers are located by doing a PEX | |
1116 | ``BroadcastForServers'' operation. Although there is no | |
1117 | implementation of PEX in the kernel, | |
1118 | it may be simulated at the user level with some clever coding | |
1119 | and the use of one peculiar \fIgetsockopt\fP. A PEX packet | |
1120 | looks like: | |
1121 | .DS | |
1122 | .if t .ta \w'struct 'u +\w" struct idp"u +2.0i | |
1123 | /* | |
1124 | * The packet-exchange header shown here is not defined | |
1125 | * as part of any of the system include files. | |
1126 | */ | |
1127 | struct pex { | |
1128 | struct idp p_idp; /* idp header */ | |
1129 | u_short ph_id[2]; /* unique transaction ID for pex */ | |
1130 | u_short ph_client; /* client type field for pex */ | |
1131 | }; | |
1132 | .DE | |
1133 | The \fIph_id\fP field is used to hold a ``unique id'' that | |
1134 | is used in duplicate suppression; the \fIph_client\fP | |
1135 | field indicates the PEX client type (similar to the packet | |
1136 | type field in the IDP header). PEX reliability stems from the | |
1137 | fact that it is an idempotent (``I send a packet to you, you | |
1138 | send a packet to me'') protocol. Processes on each side of | |
1139 | the connection may use the unique id to determine if they have | |
1140 | seen a given packet before (the unique id field differs on each | |
1141 | packet sent) so that duplicates may be detected, and to indicate | |
1142 | which message a given packet is in response to. If a packet with | |
1143 | a given unique id is sent and no response is received in a given | |
1144 | amount of time, the packet is retransmitted until it is decided | |
1145 | that no response will ever be received. To simulate PEX, one | |
1146 | must be able to generate unique ids -- something that is hard to | |
1147 | do at the user level with any real guarantee that the id is really | |
1148 | unique. Therefore, a means (via \fIgetsockopt\fP) has been provided | |
1149 | for getting unique ids from the kernel. The following code fragment | |
1150 | indicates how to get a unique id: | |
1151 | .DS | |
1152 | long uniqueid; | |
1153 | int s, idsize = sizeof(uniqueid); | |
1154 | ... | |
1155 | s = socket(AF_NS, SOCK_DGRAM, 0); | |
1156 | ... | |
1157 | /* get id from the kernel -- only on IDP sockets */ | |
1158 | getsockopt(s, NSPROTO_PE, SO_SEQNO, (char *)&uniqueid, &idsize); | |
1159 | ... | |
1160 | .DE | |
1161 | The retransmission and duplicate suppression code required to | |
1162 | simulate PEX fully is left as an exercise for the reader. | |
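As a starting point for the duplicate suppression half of that exercise, the sketch below remembers the most recently seen ids in a small fixed-size table and reports whether a newly arrived id has been seen before. It is not part of the system; the names \fIseen_ids\fP, \fIseen_init\fP and \fIseen_check\fP and the table size of 64 are hypothetical, and a real implementation would also have to age entries out by time.

```c
#include <string.h>

#define	SEEN_MAX	64	/* hypothetical table size */

struct seen_ids {
	unsigned long ids[SEEN_MAX];	/* recently seen unique ids */
	int count;			/* number of valid entries */
	int next;			/* slot to overwrite next */
};

/* Empty the table. */
void
seen_init(struct seen_ids *c)
{
	memset(c, 0, sizeof (*c));
}

/*
 * Return 1 if id is already in the table (a duplicate); otherwise
 * record it, overwriting the oldest entry if the table is full,
 * and return 0.
 */
int
seen_check(struct seen_ids *c, unsigned long id)
{
	int i;

	for (i = 0; i < c->count; i++)
		if (c->ids[i] == id)
			return (1);
	c->ids[c->next] = id;
	c->next = (c->next + 1) % SEEN_MAX;
	if (c->count < SEEN_MAX)
		c->count++;
	return (0);
}
```

A receiver would call \fIseen_check\fP with the id taken from the \fIph_id\fP field of each arriving packet and discard the packet when the result is 1.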
1163 | .NH 2 | |
1164 | Inetd | |
1165 | .PP | |
1166 | One of the daemons provided with 4.3BSD is \fIinetd\fP, the | |
1167 | so-called ``internet super-server.'' \fIInetd\fP is invoked at boot | |
1168 | time, and determines from the file \fI/etc/inetd.conf\fP the | |
1169 | servers for which it is to listen. Once this information has been | |
1170 | read and a pristine environment created, \fIinetd\fP proceeds | |
1171 | to create one socket for each service it is to listen for, | |
1172 | binding the appropriate port number to each socket. | |
1173 | .PP | |
1174 | \fIInetd\fP then performs a \fIselect\fP on all these | |
1175 | sockets for read availability, waiting for somebody wishing | |
1176 | a connection to the service corresponding to | |
1177 | that socket. \fIInetd\fP then performs an \fIaccept\fP on | |
1178 | the socket in question, \fIfork\fPs, \fIdup\fPs the new | |
1179 | socket to file descriptors 0 and 1 (stdin and | |
1180 | stdout), closes other open file | |
1181 | descriptors, and \fIexec\fPs the appropriate server. | |
1182 | .PP | |
1183 | Servers making use of \fIinetd\fP are considerably simplified, | |
1184 | as \fIinetd\fP takes care of the majority of the IPC work | |
1185 | required in establishing a connection. The server invoked | |
1186 | by \fIinetd\fP expects the socket connected to its client | |
1187 | on file descriptors 0 and 1, and may immediately perform | |
1188 | any operations such as \fIread\fP, \fIwrite\fP, \fIsend\fP, | |
1189 | or \fIrecv\fP. Indeed, servers may use | |
1190 | buffered I/O as provided by the ``stdio'' conventions, as | |
1191 | long as they remember to use \fIfflush\fP when appropriate. | |
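As an illustration of the buffered I/O point, the body of a trivial echo service might look like the sketch below. The function name \fIecho_lines\fP and the 512-byte buffer are hypothetical; the function takes the two streams as parameters so that it can be exercised without \fIinetd\fP. Input longer than the buffer is echoed in several pieces, each counted separately.

```c
#include <stdio.h>

/*
 * Copy input from in to out one line (or buffer-full) at a time,
 * flushing after each so the client sees the reply promptly.
 * Returns the number of pieces echoed.
 */
int
echo_lines(FILE *in, FILE *out)
{
	char line[512];
	int n = 0;

	while (fgets(line, sizeof (line), in) != NULL) {
		if (fputs(line, out) == EOF)
			break;
		fflush(out);	/* without this, output may sit in the stdio buffer */
		n++;
	}
	return (n);
}
```

A server started by \fIinetd\fP would simply call \fIecho_lines(stdin, stdout)\fP from its \fImain\fP routine.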
1192 | .PP | |
1193 | One call which may be of interest to individuals writing | |
1194 | servers under \fIinetd\fP is the \fIgetpeername\fP call, | |
1195 | which returns the address of the peer (process) connected | |
1196 | on the other end of the socket. For example, to log the | |
1197 | Internet address in ``dot notation'' (e.g., ``128.32.0.4'') | |
1198 | of a client connected to a server under | |
1199 | \fIinetd\fP, the following code might be used: | |
1200 | .DS | |
1201 | struct sockaddr_in name; | |
1202 | int namelen = sizeof (name); | |
1203 | ... | |
1204 | if (getpeername(0, (struct sockaddr *)&name, &namelen) < 0) { | |
1205 | syslog(LOG_ERR, "getpeername: %m"); | |
1206 | exit(1); | |
1207 | } else | |
1208 | syslog(LOG_INFO, "Connection from %s", inet_ntoa(name.sin_addr)); | |
1209 | ... | |
1210 | .DE | |
1211 | While the \fIgetpeername\fP call is especially useful when | |
1212 | writing programs to run with \fIinetd\fP, it can be used | |
1213 | under other circumstances. Be warned, however, that \fIgetpeername\fP will | |
1214 | fail on UNIX domain sockets. |