| 1 | .\" Copyright (c) 1986 Regents of the University of California. |
| 2 | .\" All rights reserved. The Berkeley software License Agreement |
| 3 | .\" specifies the terms and conditions for redistribution. |
| 4 | .\" |
| 5 | .\" @(#)5.t 1.2 (Berkeley) %G% |
| 6 | .\" |
| 7 | .ds RH "Advanced Topics |
| 8 | .bp |
| 9 | .nr H1 5 |
| 10 | .nr H2 0 |
| 11 | .bp |
| 12 | .LG |
| 13 | .B |
| 14 | .ce |
| 15 | 5. ADVANCED TOPICS |
| 16 | .sp 2 |
| 17 | .R |
| 18 | .NL |
| 19 | .PP |
| 20 | A number of facilities have yet to be discussed. For most users |
| 21 | of the IPC the mechanisms already |
| 22 | described will suffice in constructing distributed |
| 23 | applications. However, others will find the need to utilize some |
| 24 | of the features which we consider in this section. |
| 25 | .NH 2 |
| 26 | Out of band data |
| 27 | .PP |
| 28 | The stream socket abstraction includes the notion of \*(lqout |
| 29 | of band\*(rq data. Out of band data is a logically independent |
| 30 | transmission channel associated with each pair of connected |
| 31 | stream sockets. Out of band data is delivered to the user |
| 32 | independently of normal data along with the SIGURG signal |
| 33 | (if multiple sockets may have out of band data awaiting |
| 34 | delivery, a \fIselect\fP call may be used to determine those |
| 35 | sockets with such data). A process can set the process group |
| 36 | or process id to be informed by the SIGURG signal via the |
| 37 | appropriate \fIfcntl\fP call, as described below for |
| 38 | SIGIO. |
| 39 | .PP |
| 40 | In addition to the information passed, a logical mark is placed in |
| 41 | the data stream to indicate the point at which the out |
| 42 | of band data was sent. The remote login and remote shell |
| 43 | applications use this facility to propagate signals between |
| 44 | client and server processes. When a signal is expected to |
| 45 | flush any pending output from the remote process(es), all |
| 46 | data up to the mark in the data stream is discarded. |
| 47 | .PP |
| 48 | The |
| 49 | stream abstraction defines that the out of band data facilities |
| 50 | must support the reliable delivery of at least one |
| 51 | out of band message at a time. This message must be able to carry at least one |
| 52 | byte of data, and at least one message may be pending delivery |
| 53 | to the user at any one time. For communications protocols which |
| 54 | support only in-band signaling (i.e. the urgent data is |
| 55 | delivered in sequence with the normal data), the system extracts |
| 56 | the data from the normal data stream and stores it separately. |
| 57 | This allows users to choose between receiving the urgent data |
| 58 | in order and receiving it out of sequence without having to |
| 59 | buffer all the intervening data. It is not possible |
| 60 | to ``peek'' (via MSG_PEEK) at out of band data. |
| 61 | .PP |
| 62 | To send an out of band message the MSG_OOB flag is supplied to |
| 63 | a \fIsend\fP or \fIsendto\fP call, |
| 64 | while to receive out of band data MSG_OOB should be indicated |
| 65 | when performing a \fIrecvfrom\fP or \fIrecv\fP call. |
| 66 | To find out if the read pointer is currently pointing at |
| 67 | the mark in the data stream, the SIOCATMARK ioctl is provided: |
| 68 | .DS |
| 69 | ioctl(s, SIOCATMARK, &yes); |
| 70 | .DE |
| 71 | If \fIyes\fP is a 1 on return, the next read will return data |
| 72 | after the mark. Otherwise (assuming out of band data has arrived), |
| 73 | the next read will provide data sent by the client prior |
| 74 | to transmission of the out of band signal. The routine used |
| 75 | in the remote login process to flush output on receipt of an |
| 76 | interrupt or quit signal is shown in Figure 5. |
| 77 | .KF |
| 78 | .DS |
| 79 | #include <sys/ioctl.h> |
| 80 | #include <sys/file.h> |
| 81 | ... |
| 82 | oob() |
| 83 | { |
| 84 | int out = FWRITE, mark; /* SIOCATMARK stores an int */ |
| 85 | char waste[BUFSIZ]; |
| 86 | |
| 87 | /* flush local terminal output */ |
| 88 | ioctl(1, TIOCFLUSH, (char *)&out); |
| 89 | for (;;) { |
| 90 | if (ioctl(rem, SIOCATMARK, &mark) < 0) { |
| 91 | perror("ioctl"); |
| 92 | break; |
| 93 | } |
| 94 | if (mark) |
| 95 | break; |
| 96 | (void) read(rem, waste, sizeof (waste)); |
| 97 | } |
| 98 | if (recv(rem, &mark, 1, MSG_OOB) < 0) { |
| 99 | perror("recv"); |
| 100 | ... |
| 101 | } |
| 102 | ... |
| 103 | } |
| 104 | .DE |
| 105 | .ce |
| 106 | Figure 5. Flushing terminal i/o on receipt of out of band data. |
| 107 | .sp |
| 108 | .KE |
| 109 | .NH 2 |
| 110 | Interrupt driven socket i/o |
| 111 | .PP |
| 112 | The SIGIO signal allows a process to be notified |
| 113 | via a signal when a socket (or more generally, a file |
| 114 | descriptor) has data waiting to be read. Use of |
| 115 | the SIGIO facility requires three steps: First, |
| 116 | the process must set up a SIGIO signal handler |
| 117 | by use of the \fIsignal\fP call. Second, |
| 118 | it must set the process id or process group id which is to receive |
| 119 | notification of pending input to its own process id, |
| 120 | or the process group id of its process group (note that |
| 121 | the default process group of a socket is group zero). |
| 122 | This is accomplished by use of a \fIfcntl\fP call. |
| 123 | Third, it must turn on notification of pending i/o requests |
| 124 | with another \fIfcntl\fP call. Sample code to |
| 125 | allow a given process to receive information on |
| 126 | pending i/o requests as they occur for a socket \fIs\fP |
| 127 | is given in Figure 6. With slight changes, this code can also |
| 128 | be used to prepare for receipt of SIGURG signals. |
| 129 | .KF |
| 130 | .DS |
| 131 | #include <fcntl.h> |
| 132 | ... |
| 133 | int io_handler(); |
| 134 | ... |
| 135 | signal(SIGIO, io_handler); |
| 136 | |
| 137 | /* Set the process receiving SIGIO/SIGURG signals to us */ |
| 138 | |
| 139 | if (fcntl(s, F_SETOWN, getpid()) < 0) { |
| 140 | perror("fcntl F_SETOWN"); |
| 141 | exit(1); |
| 142 | } |
| 143 | |
| 144 | /* Allow receipt of asynchronous i/o signals */ |
| 145 | |
| 146 | if (fcntl(s, F_SETFL, FASYNC) < 0) { |
| 147 | perror("fcntl F_SETFL, FASYNC"); |
| 148 | exit(1); |
| 149 | } |
| 150 | .DE |
| 151 | .ce |
| 152 | Figure 6. Use of asynchronous notification of i/o requests. |
| 153 | .sp |
| 154 | .KE |
| 155 | .NH 2 |
| 156 | Signals and process groups |
| 157 | .PP |
| 158 | Due to the existence of the SIGURG and SIGIO signals each socket has an |
| 159 | associated process number, just as is done for terminals. |
| 160 | This value is initialized to zero, |
| 161 | but may be redefined at a later time with the F_SETOWN |
| 162 | \fIfcntl\fP, as was done in the code above for SIGIO. |
| 163 | To set the socket's process id for signals, positive arguments |
| 164 | should be given to the \fIfcntl\fP call. To set the socket's |
| 165 | process group for signals, negative arguments should be |
| 166 | passed to \fIfcntl\fP. Note that the process number indicates |
| 167 | either the associated process id or the associated process |
| 168 | group; it is impossible to specify both at the same time. |
| 169 | A similar \fIfcntl\fP, F_GETOWN, is available for determining the |
| 170 | current process number of a socket. |
| 171 | .PP |
| 172 | An old signal which is useful when constructing server processes |
| 173 | is SIGCHLD. This signal is delivered to a process when any |
| 174 | child processes have changed state. Normally servers use |
| 175 | the signal to \*(lqreap\*(rq child processes that have exited. |
| 176 | For example, the remote login server loop shown in Figure 2 |
| 177 | may be augmented as shown in Figure 7. |
| 178 | .KF |
| 179 | .DS |
| 180 | int reaper(); |
| 181 | ... |
| 182 | signal(SIGCHLD, reaper); |
| 183 | listen(f, 5); |
| 184 | for (;;) { |
| 185 | int g, len = sizeof (from); |
| 186 | |
| 187 | g = accept(f, (struct sockaddr *)&from, &len); |
| 188 | if (g < 0) { |
| 189 | if (errno != EINTR) |
| 190 | syslog(LOG_ERR, "rlogind: accept: %m"); |
| 191 | continue; |
| 192 | } |
| 193 | ... |
| 194 | } |
| 195 | ... |
| 196 | #include <sys/wait.h> |
| 197 | reaper() |
| 198 | { |
| 199 | union wait status; |
| 200 | |
| 201 | while (wait3(&status, WNOHANG, 0) > 0) |
| 202 | ; |
| 203 | } |
| 204 | .DE |
| 205 | .sp |
| 206 | .ce |
| 207 | Figure 7. Use of the SIGCHLD signal. |
| 208 | .sp |
| 209 | .KE |
| 210 | .PP |
| 211 | If the parent server process fails to reap its children, |
| 212 | a large number of \*(lqzombie\*(rq processes may be created. |
| 213 | .NH 2 |
| 214 | Pseudo terminals |
| 215 | .PP |
| 216 | Many programs will not function properly without a terminal |
| 217 | for standard input and output. Since a socket is not a terminal, |
| 218 | it is often necessary to have a process communicating over |
| 219 | the network do so through a \fIpseudo terminal\fP. A pseudo |
| 220 | terminal is actually a pair of devices, master and slave, |
| 221 | which allow a process to serve as an active agent in communication |
| 222 | between processes and users. Data written on the slave side |
| 223 | of a pseudo terminal is supplied as input to a process reading |
| 224 | from the master side, while data written on the master side is |
| 225 | given to the slave as input. In this way, the process manipulating |
| 226 | the master side of the pseudo terminal has control over the |
| 227 | information read and written on the slave side. |
| 228 | The purpose of this abstraction is to |
| 229 | preserve terminal semantics over a network connection \(em |
| 230 | that is, the slave side looks like a normal terminal to |
| 231 | any process reading from or writing to it. |
| 232 | .PP |
| 233 | For example, the remote |
| 234 | login server uses pseudo terminals for remote login sessions. |
| 235 | A user logging in to a machine across the network is provided |
| 236 | a shell with a slave pseudo terminal as standard input, output, |
| 237 | and error. The server process then handles the communication |
| 238 | between the programs invoked by the remote shell and the user's |
| 239 | local client process. When a user sends an interrupt or quit |
| 240 | signal to a process executing on a remote machine, the client |
| 241 | login program traps the signal, sends an out of band message |
| 242 | to the server process, which then uses the signal number, sent |
| 243 | as the data value in the out of band message, to perform a |
| 244 | \fIkillpg\fP(2) on the appropriate process group. |
| 245 | .PP |
| 246 | Under 4.3BSD, the slave side of a pseudo terminal is |
| 247 | \fI/dev/ttyxy\fP, where \fIx\fP is a single letter |
| 248 | starting at `p' and perhaps continuing as far down |
| 249 | as `t'. \fIy\fP is a hexadecimal ``digit'' (i.e., a single |
| 250 | character in the range 0 through 9 or `a' through `f'). |
| 251 | The master side of a pseudo terminal is \fI/dev/ptyxy\fP, |
| 252 | where \fIx\fP and \fIy\fP correspond to the same letters |
| 253 | in the slave side of the pseudo terminal. |
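The naming rule can be captured in a few lines. The function below is a hypothetical helper written for illustration; it encodes only the scheme described in this paragraph and makes no claim about how any particular system lays out its devices.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build the device name for one side of a
 * pseudo terminal.  bank is the letter x ('p' through 't'), unit is
 * the value of the hexadecimal digit y (0 through 15), and a nonzero
 * master selects /dev/ptyxy rather than /dev/ttyxy.
 * Returns 0, or -1 on bad arguments or a buffer that is too small.
 */
int
pty_name(char *buf, size_t len, char bank, int unit, int master)
{
	if (bank < 'p' || bank > 't' || unit < 0 || unit > 15)
		return (-1);
	if (snprintf(buf, len, "/dev/%cty%c%c", master ? 'p' : 't',
	    bank, "0123456789abcdef"[unit]) >= (int)len)
		return (-1);
	return (0);
}
```

With bank 'q' and unit 10, the master side comes out as /dev/ptyqa and the slave side as /dev/ttyqa.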
| 254 | .PP |
| 255 | In general, the method of obtaining a pair of master and |
| 256 | slave pseudo terminals is made up of three components. |
| 257 | First, the process must find a pseudo terminal which |
| 258 | is not currently in use. Having done so, |
| 259 | it then opens both the master and the slave side of |
| 260 | the device, taking care to open the master side of the device first. |
| 261 | The process then \fIfork\fPs; the child closes |
| 262 | the master side of the pseudo terminal, and \fIexec\fPs the |
| 263 | appropriate program. Meanwhile, the parent closes the |
| 264 | slave side of the pseudo terminal and begins reading and |
| 265 | writing from the master side. Sample code making use of |
| 266 | pseudo terminals is given in Figure 8; this code assumes |
| 267 | that a connection on a socket \fIs\fP exists, connected |
| 268 | to a peer who wants a service of some kind, and that the |
| 269 | process has disassociated itself from a controlling terminal. |
| 270 | .KF |
| 271 | .DS |
| 272 | gotpty = 0; |
| 273 | for (c = 'p'; !gotpty && c <= 's'; c++) { |
| 274 | (void) strcpy(line, "/dev/ptyXX"); /* assumes \fIline\fP is a char array; literals may be read-only */ |
| 275 | line[sizeof("/dev/pty")-1] = c; |
| 276 | line[sizeof("/dev/ptyp")-1] = '0'; |
| 277 | if (stat(line, &statbuf) < 0) |
| 278 | break; |
| 279 | for (i = 0; i < 16; i++) { |
| 280 | line[sizeof("/dev/ptyp")-1] = "0123456789abcdef"[i]; |
| 281 | master = open(line, O_RDWR); |
| 282 | if (master >= 0) { |
| 283 | gotpty = 1; |
| 284 | break; |
| 285 | } |
| 286 | } |
| 287 | } |
| 288 | if (!gotpty) { |
| 289 | syslog(LOG_ERR, "All network ports in use"); |
| 290 | exit(1); |
| 291 | } |
| 292 | |
| 293 | line[sizeof("/dev/")-1] = 't'; |
| 294 | slave = open(line, O_RDWR); /* \fIslave\fP is now slave side */ |
| 295 | if (slave < 0) { |
| 296 | syslog(LOG_ERR, "Cannot open slave pty %s", line); |
| 297 | exit(1); |
| 298 | } |
| 299 | |
| 300 | ioctl(slave, TIOCGETP, &b); /* Set slave tty modes */ |
| 301 | b.sg_flags = CRMOD|XTABS|ANYP; |
| 302 | ioctl(slave, TIOCSETP, &b); |
| 303 | |
| 304 | i = fork(); |
| 305 | if (i < 0) { |
| 306 | syslog(LOG_ERR, "fork: %m"); |
| 307 | exit(1); |
| 308 | } else if (i) { /* Parent */ |
| 309 | close(slave); |
| 310 | ... |
| 311 | } else { /* Child */ |
| 312 | (void) close(s); |
| 313 | (void) close(master); |
| 314 | dup2(slave, 0); |
| 315 | dup2(slave, 1); |
| 316 | dup2(slave, 2); |
| 317 | if (slave > 2) |
| 318 | (void) close(slave); |
| 319 | ... |
| 320 | } |
| 321 | .DE |
| 322 | .ce |
| 323 | Figure 8. Creation and use of a pseudo terminal |
| 324 | .sp |
| 325 | .KE |
| 326 | .NH 2 |
| 327 | Selecting specific protocols |
| 328 | .PP |
| 329 | If the third argument to the \fIsocket\fP call is 0, |
| 330 | \fIsocket\fP will select a default protocol to use with |
| 331 | the returned socket of the type requested. This |
| 332 | protocol should be correct for almost every situation. |
| 333 | Still, it is conceivable that the user may wish to specify |
| 334 | a particular protocol for use with a given socket. |
| 335 | .PP |
| 336 | To obtain a particular protocol one selects the protocol number, |
| 337 | as defined within the communication domain. For the Internet |
| 338 | domain the available protocols are defined in <\fInetinet/in.h\fP> |
| 339 | or, better yet, one may use one of the library routines |
| 340 | discussed in section 3, such as \fIgetprotobyname\fP: |
| 341 | .DS |
| 342 | #include <sys/types.h> |
| 343 | #include <sys/socket.h> |
| 344 | #include <netinet/in.h> |
| 345 | #include <netdb.h> |
| 346 | ... |
| 347 | pp = getprotobyname("newtcp"); |
| 348 | s = socket(AF_INET, SOCK_STREAM, pp->p_proto); |
| 349 | .DE |
| 350 | This would result in a socket \fIs\fP using a stream |
| 351 | based connection, but with protocol type of ``newtcp'' |
| 352 | instead of the default ``tcp.'' |
| 353 | .PP |
| 354 | In the NS domain, the available socket protocols are defined in |
| 355 | <\fInetns/ns.h\fP>. To create a raw socket for Xerox Error Protocol |
| 356 | messages, one might use: |
| 357 | .DS |
| 358 | #include <sys/types.h> |
| 359 | #include <sys/socket.h> |
| 360 | #include <netns/ns.h> |
| 361 | ... |
| 362 | s = socket(AF_NS, SOCK_RAW, NSPROTO_ERROR); |
| 363 | .DE |
| 364 | .NH 2 |
| 365 | Address binding |
| 366 | .PP |
| 367 | As was mentioned in section 2, |
| 368 | binding addresses to sockets in the Internet and NS domains can be |
| 369 | fairly complex. As a brief reminder, these associations |
| 370 | are composed of local and foreign |
| 371 | addresses, and local and foreign ports. Port numbers are |
| 372 | allocated out of separate spaces, one for each system and one |
| 373 | for each domain on that system. |
| 374 | Through the \fIbind\fP system call, a |
| 375 | process may specify half of an association, the |
| 376 | <local address, local port> part, while the |
| 377 | \fIconnect\fP |
| 378 | and \fIaccept\fP |
| 379 | primitives are used to complete a socket's association by |
| 380 | specifying the <foreign address, foreign port> part. |
| 381 | Since the association is created in two steps the association |
| 382 | uniqueness requirement indicated previously could be violated unless |
| 383 | care is taken. Further, it is unrealistic to expect user |
| 384 | programs to always know proper values to use for the local address |
| 385 | and local port since a host may reside on multiple networks and |
| 386 | the set of allocated port numbers is not directly accessible |
| 387 | to a user. |
| 388 | .PP |
| 389 | To simplify local address binding in the Internet domain the notion of a |
| 390 | \*(lqwildcard\*(rq address has been provided. When an address |
| 391 | is specified as INADDR_ANY (a manifest constant defined in |
| 392 | <netinet/in.h>), the system interprets the address as |
| 393 | \*(lqany valid address\*(rq. For example, to bind a specific |
| 394 | port number to a socket, but leave the local address unspecified, |
| 395 | the following code might be used: |
| 396 | .DS |
| 397 | #include <sys/types.h> |
| 398 | #include <netinet/in.h> |
| 399 | ... |
| 400 | struct sockaddr_in sin; |
| 401 | ... |
| 402 | s = socket(AF_INET, SOCK_STREAM, 0); |
| 403 | sin.sin_family = AF_INET; |
| 404 | sin.sin_addr.s_addr = htonl(INADDR_ANY); |
| 405 | sin.sin_port = htons(MYPORT); |
| 406 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); |
| 407 | .DE |
| 408 | Sockets with wildcarded local addresses may receive messages |
| 409 | directed to the specified port number, and addressed to any |
| 410 | of the possible addresses assigned to a host. For example, |
| 411 | if a host is on networks 128.32 and 10, and a socket is bound as |
| 412 | above and an accept call is then performed, the process will be |
| 413 | able to accept connection requests which arrive from either |
| 414 | network 128.32 or network 10. |
| 415 | If a server process wished to allow only hosts on a |
| 416 | given network to connect to it, it would bind |
| 417 | the address of the host on the appropriate network. Such |
| 418 | an address could perhaps be determined by a routine |
| 419 | such as \fIgethostbynameandnet\fP, as mentioned in section 3. |
| 420 | .PP |
| 421 | In a similar fashion, a local port may be left unspecified |
| 422 | (specified as zero), in which case the system will select an |
| 423 | appropriate port number for it. This shortcut will work |
| 424 | both in the Internet and NS domains. For example, to |
| 425 | bind a specific local address to a socket, but to leave the |
| 426 | local port number unspecified: |
| 427 | .DS |
| 428 | hp = gethostbyname(hostname); |
| 429 | if (hp == NULL) { |
| 430 | ... |
| 431 | } |
| 432 | bcopy(hp->h_addr, (char *) &sin.sin_addr, hp->h_length); |
| 433 | sin.sin_port = htons(0); |
| 434 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); |
| 435 | .DE |
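When the port is left to the system, the process often needs to learn which port was chosen, for instance to advertise it to a peer. One way is to ask with \fIgetsockname\fP after the bind. The sketch below is an illustration only: bind_any_port is a hypothetical name, and the wildcard address is used so that the fragment does not depend on a host name lookup.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a stream socket, bind it with the port
 * left as zero, and report the port the system chose (in host byte
 * order) through *port.  Returns the socket, or -1 on failure.
 */
int
bind_any_port(unsigned short *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = htons(0);	/* let the system choose */
	if (bind(s, (struct sockaddr *)&sin, sizeof (sin)) < 0 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) < 0) {
		(void)close(s);
		return (-1);
	}
	*port = ntohs(sin.sin_port);
	return (s);
}
```
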
| 436 | The system selects the local port number based on two criteria. |
| 437 | The first is that on 4BSD systems, |
| 438 | local ports numbered 0 through 1023 (for the Xerox domain, |
| 439 | 0 through 3000) are reserved |
| 440 | for privileged users (i.e., the super user). The second is |
| 441 | that the port number is not currently bound to some other |
| 442 | socket. In order to find a free Internet port number in the privileged |
| 443 | range the \fIrresvport\fP library routine may be used as follows |
| 444 | to return a stream socket with a privileged port number: |
| 445 | .DS |
| 446 | int lport = IPPORT_RESERVED \- 1; |
| 447 | int s; |
| 448 | ... |
| 449 | s = rresvport(&lport); |
| 450 | if (s < 0) { |
| 451 | if (errno == EAGAIN) |
| 452 | fprintf(stderr, "socket: all ports in use\en"); |
| 453 | else |
| 454 | perror("rresvport: socket"); |
| 455 | ... |
| 456 | } |
| 457 | .DE |
| 458 | The restriction on allocating ports was imposed to allow processes |
| 459 | executing in a \*(lqsecure\*(rq environment to perform authentication |
| 460 | based on the originating address and port number. For example, |
| 461 | the \fIrlogin\fP(1) command allows users to log in across a network |
| 462 | without being asked for a password, if two conditions hold: |
| 463 | First, the name of the system the user |
| 464 | is logging in from is in the file |
| 465 | \fI/etc/hosts.equiv\fP on the system he is logging |
| 466 | in to (or the system name and the user name are in |
| 467 | the user's \fI.rhosts\fP file in the user's home |
| 468 | directory), and second, that the user's rlogin |
| 469 | process is coming from a privileged port on the machine he is |
| 470 | logging in from. The port number and network address of the |
| 471 | machine the user is logging in from can be determined either |
| 472 | by the \fIfrom\fP value-result parameter to the \fIaccept\fP call, or |
| 473 | from the \fIgetpeername\fP call. |
| 474 | .PP |
| 475 | In certain cases the algorithm used by the system in selecting |
| 476 | port numbers is unsuitable for an application. This is due to |
| 477 | associations being created in a two step process. For example, |
| 478 | the Internet file transfer protocol, FTP, specifies that data |
| 479 | connections must always originate from the same local port. However, |
| 480 | duplicate associations are avoided by connecting to different foreign |
| 481 | ports. In this situation the system would disallow binding the |
| 482 | same local address and port number to a socket if a previous data |
| 483 | connection's socket still existed. To override the default port |
| 484 | selection algorithm, an option call must be performed prior |
| 485 | to address binding: |
| 486 | .DS |
| 487 | ... |
| 488 | int on = 1; |
| 489 | ... |
| 490 | setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)); |
| 491 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); |
| 492 | .DE |
| 493 | With the above call, local addresses may be bound which |
| 494 | are already in use. This does not violate the uniqueness |
| 495 | requirement as the system still checks at connect time to |
| 496 | be sure any other sockets with the same local address and |
| 497 | port do not have the same foreign address and port (if an |
| 498 | association already exists, the error EADDRINUSE is returned). |
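Because a failed option call would only show up later as a puzzling bind error, it is worth checking the result of \fIsetsockopt\fP. A small sketch, using hypothetical helper names, that sets the option and reads it back:

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on SO_REUSEADDR for s; returns 0 or -1. */
int
allow_address_reuse(int s)
{
	int on = 1;

	return (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof (on)));
}

/*
 * Hypothetical helper: returns 1 if SO_REUSEADDR is set on s, 0 if it
 * is not, and -1 if the option cannot be read.
 */
int
address_reuse_enabled(int s)
{
	int on = 0;
	socklen_t len = sizeof (on);

	if (getsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, &len) < 0)
		return (-1);
	return (on != 0);
}
```
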
| 499 | .NH 2 |
| 500 | Broadcasting and datagram sockets |
| 501 | .PP |
| 502 | By using a datagram socket it is possible to send broadcast |
| 503 | packets on many networks supported by the system (the network |
| 504 | itself must support the notion of broadcasting; the system |
| 505 | provides no broadcast simulation in software). Broadcast |
| 506 | messages can place a high load on a network since they force |
| 507 | every host on the network to service them. Consequently, |
| 508 | the ability to send broadcast packets has been limited |
| 509 | to sockets which are explicitly marked as allowing broadcasting. |
| 510 | .PP |
| 511 | To send a broadcast message, a datagram socket |
| 512 | should be created: |
| 513 | .DS |
| 514 | s = socket(AF_INET, SOCK_DGRAM, 0); |
| 515 | .DE |
| 516 | or |
| 517 | .DS |
| 518 | s = socket(AF_NS, SOCK_DGRAM, 0); |
| 519 | .DE |
| 520 | The socket is marked as allowing broadcasting, |
| 521 | .DS |
| 522 | int on = 1; |
| 523 | |
| 524 | setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on)); |
| 525 | .DE |
| 526 | and at least a port number should be bound to the socket: |
| 527 | .DS |
| 528 | sin.sin_family = AF_INET; |
| 529 | sin.sin_addr.s_addr = htonl(INADDR_ANY); |
| 530 | sin.sin_port = htons(MYPORT); |
| 531 | bind(s, (struct sockaddr *) &sin, sizeof (sin)); |
| 532 | .DE |
| 533 | or, for the NS domain, |
| 534 | .DS |
| 535 | sns.sns_family = AF_NS; |
| 536 | netnum = htonl(net); |
| 537 | sns.sns_addr.x_net = *(union ns_net *) &netnum; /* insert net number */ |
| 538 | sns.sns_addr.x_port = htons(MYPORT); |
| 539 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); |
| 540 | .DE |
| 541 | The destination address of the message to be broadcast |
| 542 | depends on the network the message is to be broadcast |
| 543 | on, and therefore requires some knowledge of the networks |
| 544 | to which the host is connected. Since this information should |
| 545 | be obtained in a host-independent fashion, 4.3BSD provides a method of |
| 546 | retrieving this information from the system data structures. |
| 547 | The SIOCGIFCONF \fIioctl\fP call returns the interface |
| 548 | configuration of a host in the form of a |
| 549 | single \fIifconf\fP structure; this structure contains |
| 550 | a ``data area'' which is made up of an array |
| 551 | of \fIifreq\fP structures, one for each network interface |
| 552 | to which the host is connected. |
| 553 | These structures are defined in |
| 554 | \fI<net/if.h>\fP as follows: |
| 555 | .DS |
| 556 | .if t .ta .5i 1.0i 1.5i 3.5i |
| 557 | .if n .ta .7i 1.4i 2.1i 3.4i |
| 558 | struct ifconf { |
| 559 | int ifc_len; /* size of associated buffer */ |
| 560 | union { |
| 561 | caddr_t ifcu_buf; |
| 562 | struct ifreq *ifcu_req; |
| 563 | } ifc_ifcu; |
| 564 | }; |
| 565 | |
| 566 | #define ifc_buf ifc_ifcu.ifcu_buf /* buffer address */ |
| 567 | #define ifc_req ifc_ifcu.ifcu_req /* array of structures returned */ |
| 568 | |
| 569 | #define IFNAMSIZ 16 |
| 570 | |
| 571 | struct ifreq { |
| 572 | char ifr_name[IFNAMSIZ]; /* if name, e.g. "en0" */ |
| 573 | union { |
| 574 | struct sockaddr ifru_addr; |
| 575 | struct sockaddr ifru_dstaddr; |
| 576 | struct sockaddr ifru_broadaddr; |
| 577 | short ifru_flags; |
| 578 | caddr_t ifru_data; |
| 579 | } ifr_ifru; |
| 580 | }; |
| 581 | |
| 582 | .if t .ta \w' #define'u +\w' ifr_broadaddr'u +\w' ifr_ifru.ifru_broadaddr'u |
| 583 | #define ifr_addr ifr_ifru.ifru_addr /* address */ |
| 584 | #define ifr_dstaddr ifr_ifru.ifru_dstaddr /* other end of p-to-p link */ |
| 585 | #define ifr_broadaddr ifr_ifru.ifru_broadaddr /* broadcast address */ |
| 586 | #define ifr_flags ifr_ifru.ifru_flags /* flags */ |
| 587 | #define ifr_data ifr_ifru.ifru_data /* for use by interface */ |
| 588 | .DE |
| 589 | The actual call which obtains the |
| 590 | interface configuration is |
| 591 | .DS |
| 592 | struct ifconf ifc; |
| 593 | char buf[BUFSIZ]; |
| 594 | |
| 595 | ifc.ifc_len = sizeof (buf); |
| 596 | ifc.ifc_buf = buf; |
| 597 | if (ioctl(s, SIOCGIFCONF, (char *) &ifc) < 0) { |
| 598 | ... |
| 599 | } |
| 600 | .DE |
| 601 | After this call \fIbuf\fP will contain one \fIifreq\fP structure for |
| 602 | each network to which the host is connected, and |
| 603 | \fIifc.ifc_len\fP will have been modified to reflect the number |
| 604 | of bytes used by the \fIifreq\fP structures. |
| 605 | .PP |
| 606 | For each structure |
| 607 | there exists a set of ``interface flags'' which tell |
| 608 | whether the network corresponding to that interface is |
| 609 | up or down, point to point or broadcast, etc. The |
| 610 | SIOCGIFFLAGS \fIioctl\fP retrieves these |
| 611 | flags for an interface specified by an \fIifreq\fP |
| 612 | structure as follows: |
| 613 | .DS |
| 614 | struct ifreq *ifr; |
| 615 | |
| 616 | ifr = ifc.ifc_req; |
| 617 | |
| 618 | for (n = ifc.ifc_len / sizeof (struct ifreq); --n >= 0; ifr++) { |
| 619 | /* |
| 620 | * We must be careful that we don't use an interface |
| 621 | * devoted to an address family other than our own; |
| 622 | * if we were interested in NS interfaces, the |
| 623 | * AF_INET would be AF_NS. |
| 624 | */ |
| 625 | if (ifr->ifr_addr.sa_family != AF_INET) |
| 626 | continue; |
| 627 | if (ioctl(s, SIOCGIFFLAGS, (char *) ifr) < 0) { |
| 628 | ... |
| 629 | } |
| 630 | if ((ifr->ifr_flags & IFF_UP) == 0 || /* Skip boring cases */ |
| 631 | (ifr->ifr_flags & (IFF_BROADCAST | IFF_POINTTOPOINT)) == 0) |
| 632 | continue; |
| 633 | .DE |
| 634 | .PP |
| 635 | Once the flags have been obtained, the broadcast address |
| 636 | must be obtained. In the case of broadcast networks this is |
| 637 | done via the SIOCGIFBRDADDR \fIioctl\fP, while for point-to-point networks |
| 638 | the address of the destination host is obtained with SIOCGIFDSTADDR. |
| 639 | .DS |
| 640 | struct sockaddr dst; |
| 641 | |
| 642 | if (ifr->ifr_flags & IFF_POINTTOPOINT) { |
| 643 | if (ioctl(s, SIOCGIFDSTADDR, (char *) ifr) < 0) { |
| 644 | ... |
| 645 | } |
| 646 | bcopy((char *) &ifr->ifr_dstaddr, (char *) &dst, sizeof (ifr->ifr_dstaddr)); |
| 647 | } else if (ifr->ifr_flags & IFF_BROADCAST) { |
| 648 | if (ioctl(s, SIOCGIFBRDADDR, (char *) ifr) < 0) { |
| 649 | ... |
| 650 | } |
| 651 | bcopy((char *) &ifr->ifr_broadaddr, (char *) &dst, sizeof (ifr->ifr_broadaddr)); |
| 652 | } |
| 653 | .DE |
| 654 | .PP |
| 655 | After the appropriate \fIioctl\fPs have obtained the broadcast |
| 656 | or destination address (now in \fIdst\fP), the \fIsendto\fP call may be |
| 657 | used: |
| 658 | .DS |
| 659 | sendto(s, buf, buflen, 0, (struct sockaddr *)&dst, sizeof (dst)); |
| 660 | } |
| 661 | .DE |
| 662 | In the above loop one \fIsendto\fP occurs for every |
| 663 | interface the host is connected to which supports the notion of |
| 664 | broadcast or point-to-point addressing. |
| 665 | If a process only wished to send broadcast |
| 666 | messages on a given network, code similar to that outlined above |
| 667 | would be used, but the loop would need to find the |
| 668 | correct destination address. |
| 669 | .PP |
| 670 | Received broadcast messages contain the sender's address |
| 671 | and port, as datagram sockets are bound before |
| 672 | a message is allowed to go out. |
| 673 | .NH 2 |
| 674 | Socket Options |
| 675 | .PP |
| 676 | It is possible to set and get a number of options on sockets |
| 677 | via the \fIsetsockopt\fP and \fIgetsockopt\fP system calls. |
| 678 | These options include such things as marking a socket for |
| 679 | broadcasting, not to route, to linger on close, etc. |
| 680 | The general forms of the calls are: |
| 681 | .DS |
| 682 | setsockopt(s, level, optname, optval, optlen); |
| 683 | .DE |
| 684 | and |
| 685 | .DS |
| 686 | getsockopt(s, level, optname, optval, optlen); |
| 687 | .DE |
| 688 | .PP |
| 689 | The parameters to the calls are as follows: \fIs\fP |
| 690 | is the socket on which the option is to be applied. |
| 691 | \fILevel\fP specifies the protocol layer on which the |
| 692 | option is to be applied; in most cases this is |
| 693 | the ``socket level'', indicated by the symbolic constant |
| 694 | SOL_SOCKET, defined in \fI<sys/socket.h>.\fP |
| 695 | The actual option is specified in \fIoptname\fP, and is |
| 696 | a symbolic constant also defined in \fI<sys/socket.h>\fP. |
| 697 | \fIOptval\fP and \fIoptlen\fP point to the value of the |
| 698 | option (in most cases, whether the option is to be turned |
| 699 | on or off), and the length of the value of the option, |
| 700 | respectively. |
| 701 | For \fIgetsockopt\fP, \fIoptlen\fP is |
| 702 | a value-result parameter, initially set to the size of |
| 703 | the storage area pointed to by \fIoptval\fP, and modified |
| 704 | upon return to indicate the actual amount of storage used. |
| 705 | .PP |
| 706 | An example should help clarify things. It is sometimes |
| 707 | useful to determine the type (e.g., stream, datagram, etc.) |
| 708 | of an existing socket; programs |
| 709 | under \fIinetd\fP (described below) may need to perform this |
| 710 | task. This can be accomplished as follows via the |
| 711 | SO_TYPE socket option and the \fIgetsockopt\fP call: |
| 712 | .DS |
| 713 | #include <sys/types.h> |
| 714 | #include <sys/socket.h> |
| 715 | |
| 716 | int type, size; |
| 717 | |
| 718 | size = sizeof (int); |
| 719 | |
| 720 | if (getsockopt(s, SOL_SOCKET, SO_TYPE, (char *) &type, &size) < 0) { |
| 721 | ... |
| 722 | } |
| 723 | .DE |
| 724 | After the \fIgetsockopt\fP call, \fItype\fP will be set |
| 725 | to the value of the socket type, as defined in |
| 726 | \fI<sys/socket.h>\fP. If, for example, the socket were |
| 727 | a datagram socket, \fItype\fP would have the value |
| 728 | corresponding to SOCK_DGRAM. |
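Wrapped as a function, the same query might look as follows. This is a sketch: socket_type is a hypothetical name, and the length is declared as socklen_t, which is what current systems expect in place of the int used above.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the type of socket s (SOCK_STREAM,
 * SOCK_DGRAM, and so on), or -1 if the option cannot be read.
 */
int
socket_type(int s)
{
	int type;
	socklen_t size = sizeof (type);

	if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &size) < 0)
		return (-1);
	return (type);
}
```

A server started with its socket already open could call socket_type(0) to decide whether to \fIaccept\fP a connection or to read datagrams directly.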
| 729 | .NH 2 |
| 730 | NS Packet Sequences |
| 731 | .PP |
| 732 | The semantics of NS connections demand that |
| 733 | the user both be able to look inside the network header associated |
| 734 | with any incoming packet and be able to specify what should go |
| 735 | in certain fields of an outgoing packet. The header of an |
| 736 | IDP-level packet looks like: |
| 737 | .DS |
| 738 | .if t .ta \w'struct 'u +\w" struct ns_addr"u +2.0i |
| 739 | struct idp { |
| 740 | u_short idp_sum; /* Checksum */ |
| 741 | u_short idp_len; /* Length, in bytes, including header */ |
| 742 | u_char idp_tc; /* Transport Control (i.e., hop count) */ |
| 743 | u_char idp_pt; /* Packet Type (i.e., level 2 protocol) */ |
| 744 | struct ns_addr idp_dna; /* Destination Network Address */ |
| 745 | struct ns_addr idp_sna; /* Source Network Address */ |
| 746 | }; |
| 747 | .DE |
| 748 | Most of the fields are filled in automatically; the only |
| 749 | field that the user should be concerned with is the |
| 750 | \fIpacket type\fP field. The standard values for this |
| 751 | field are (as defined in \fI<netns/ns.h>\fP): |
| 752 | .DS |
| 753 | .if t .ta \w" #define"u +\w" NSPROTO_ERROR"u +1.0i |
| 754 | #define NSPROTO_RI 1 /* Routing Information */ |
| 755 | #define NSPROTO_ECHO 2 /* Echo Protocol */ |
| 756 | #define NSPROTO_ERROR 3 /* Error Protocol */ |
| 757 | #define NSPROTO_PE 4 /* Packet Exchange */ |
| 758 | #define NSPROTO_SPP 5 /* Sequenced Packet */ |
| 759 | .DE |
| 760 | For SPP connections, the contents of this field are |
| 761 | automatically set to NSPROTO_SPP; for IDP packets, |
| 762 | this value defaults to zero, which means ``unknown''. |
| 763 | .PP |
| 764 | The contents of an SPP header (minus the IDP header) are: |
| 765 | .DS |
| 766 | .if t .ta \w" #define"u +\w" u_short"u +2.0i |
| 767 | struct sphdr { |
| 768 | u_char sp_cc; /* connection control */ |
| 769 | #define SP_SP 0x80 /* system packet */ |
| 770 | #define SP_SA 0x40 /* send acknowledgement */ |
| 771 | #define SP_OB 0x20 /* attention (out of band data) */ |
| 772 | #define SP_EM 0x10 /* end of message */ |
| 773 | u_char sp_dt; /* datastream type */ |
| 774 | u_short sp_sid; /* source connection identifier */ |
| 775 | u_short sp_did; /* destination connection identifier */ |
| 776 | u_short sp_seq; /* sequence number */ |
| 777 | u_short sp_ack; /* acknowledge number */ |
| 778 | u_short sp_alo; /* allocation number */ |
| 779 | }; |
| 780 | .DE |
| 781 | Here, the items of interest are the \fIdatastream type\fP and |
| 782 | the \fIconnection control\fP fields. The semantics of the |
| 783 | datastream type are defined by the application(s) in question; |
| 784 | the value of this field is, by default, zero, but it can be |
| 785 | used to indicate things such as Xerox's Bulk Data Transfer |
| 786 | Protocol (in which case it is set to one). The connection control |
| 787 | field is a mask of the flags defined above. The user may |
| 788 | set or clear the end-of-message bit to indicate |
| 789 | that a given message is the last of a given substream type, |
| 790 | or may set/clear the attention bit as an alternate way to |
| 791 | indicate that a packet should be sent out-of-band. |
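.PP
Because the connection control field is a bit mask, individual flags are
set and cleared with the usual bitwise operations. The sketch below copies
the flag values from the \fIsphdr\fP definition above; the helper names
\fIspp_set\fP, \fIspp_clear\fP, and \fIspp_isset\fP are hypothetical and
exist only for illustration.

```c
/* Flag values as given in the sphdr definition. */
#define SP_SP	0x80	/* system packet */
#define SP_SA	0x40	/* send acknowledgement */
#define SP_OB	0x20	/* attention (out of band data) */
#define SP_EM	0x10	/* end of message */

/* Return cc with the given flag turned on. */
unsigned char
spp_set(unsigned char cc, unsigned char flag)
{
	return (unsigned char)(cc | flag);
}

/* Return cc with the given flag turned off; other bits are kept. */
unsigned char
spp_clear(unsigned char cc, unsigned char flag)
{
	return (unsigned char)(cc & ~flag);
}

/* Return non-zero if the given flag is on in cc. */
int
spp_isset(unsigned char cc, unsigned char flag)
{
	return ((cc & flag) != 0);
}
```

With a prototype header \fIhdr\fP, one would write, for example,
\fIhdr.sp_cc = spp_set(hdr.sp_cc, SP_EM)\fP before sending the last packet
of a message.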
| 792 | .PP |
| 793 | Using different calls to \fIsetsockopt\fP, it is possible |
| 794 | to indicate whether prototype headers will be associated by |
| 795 | the user with each outgoing packet (SO_HEADERS_ON_OUTPUT), |
| 796 | to indicate whether the headers received by the system should be |
| 797 | delivered to the user (SO_HEADERS_ON_INPUT), or to indicate |
| 798 | default information that should be associated with all |
| 799 | outgoing packets on a given socket (SO_DEFAULT_HEADERS). |
| 800 | For example, to associate prototype headers with outgoing |
| 801 | SPP packets, one might use: |
| 802 | .DS |
| 803 | #include <sys/types.h> |
| 804 | #include <sys/socket.h> |
| 805 | #include <netns/ns.h> |
| 806 | #include <netns/sp.h> |
| 807 | ... |
| 808 | struct sockaddr_ns sns, to; |
| 809 | int s, on = 1; |
| 810 | struct databuf { |
| 811 | struct sphdr proto_spp; /* prototype header */ |
| 812 | char buf[534]; /* max. possible data by Xerox std. */ |
| 813 | } buf; |
| 814 | ... |
| 815 | s = socket(AF_NS, SOCK_SEQPACKET, 0); |
| 816 | ... |
| 817 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); |
| 818 | setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &on, sizeof(on)); |
| 819 | ... |
| 820 | buf.proto_spp.sp_dt = 1; /* bulk data */ |
| 821 | buf.proto_spp.sp_cc = SP_EM; /* end-of-message */ |
| 822 | strcpy(buf.buf, "hello world\en"); |
| 823 | sendto(s, (char *) &buf, sizeof(struct sphdr) + strlen("hello world\en"), |
| 824 | 0, (struct sockaddr *) &to, sizeof(to)); /* flags argument is 0 */ |
| 825 | ... |
| 826 | .DE |
| 827 | Note that one must be careful when writing headers; if the prototype |
| 828 | header is not written with the data with which it is to be associated, |
| 829 | the kernel will treat the first few bytes of the data as the |
| 830 | header, with unpredictable results. |
| 831 | To turn off the above association, and to indicate that packet |
| 832 | headers received by the system should be passed up to the user, |
| 833 | one might use: |
| 834 | .DS |
| 835 | #include <sys/types.h> |
| 836 | #include <sys/socket.h> |
| 837 | #include <netns/ns.h> |
| 838 | #include <netns/sp.h> |
| 839 | ... |
| 840 | struct sockaddr_ns sns; |
| 841 | int s, on = 1, off = 0; |
| 842 | ... |
| 843 | s = socket(AF_NS, SOCK_SEQPACKET, 0); |
| 844 | ... |
| 845 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); |
| 846 | setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_OUTPUT, &off, sizeof(off)); |
| 847 | setsockopt(s, NSPROTO_SPP, SO_HEADERS_ON_INPUT, &on, sizeof(on)); |
| 848 | ... |
| 849 | .DE |
| 850 | To indicate a default prototype header to be associated with |
| 851 | the outgoing packets on an IDP datagram socket, one would use: |
| 852 | .DS |
| 853 | #include <sys/types.h> |
| 854 | #include <sys/socket.h> |
| 855 | #include <netns/ns.h> |
| 856 | #include <netns/idp.h> |
| 857 | ... |
| 858 | struct sockaddr_ns sns; |
| 859 | struct idp proto_idp; /* prototype header */ |
| 860 | int s, on = 1; |
| 861 | ... |
| 862 | s = socket(AF_NS, SOCK_DGRAM, 0); |
| 863 | ... |
| 864 | bind(s, (struct sockaddr *) &sns, sizeof (sns)); |
| 865 | proto_idp.idp_pt = NSPROTO_PE; /* packet exchange */ |
| 866 | setsockopt(s, NSPROTO_IDP, SO_DEFAULT_HEADERS, (char *) &proto_idp, |
| 867 | sizeof(proto_idp)); |
| 868 | ... |
| 869 | .DE |
| 870 | .NH 2 |
| 871 | Three-way Handshake |
| 872 | .PP |
| 873 | The semantics of SPP connections indicate that a three-way |
| 874 | handshake, involving changes in the datastream type, should |
| 875 | (but is not absolutely required to) take place before an SPP |
| 876 | connection is closed. Almost all SPP connections are |
| 877 | ``well-behaved'' in this manner; when communicating with |
| 878 | any process, it is best to assume that the three-way handshake |
| 879 | is required unless it is known for certain that it is not |
| 880 | required. In a three-way close, the closing process |
| 881 | indicates that it wishes to close the connection by sending |
| 882 | a zero-length packet with end-of-message set and with |
| 883 | datastream type 254. The other side of the connection |
| 884 | indicates that it is OK to close by sending a zero-length |
| 885 | packet with end-of-message set and datastream type 255. Finally, |
| 886 | the closing process replies with a zero-length packet with |
| 887 | datastream type 255; at this point, the connection is considered |
| 888 | closed. The following code fragments are simplified examples |
| 889 | of how one might handle this three-way handshake at the user |
| 890 | level; in the future, support for this type of close will |
| 891 | probably be provided as part of the C library or as part of |
| 892 | the kernel. The first code fragment below illustrates how a process |
| 893 | might handle the three-way handshake if it sees that the process it |
| 894 | is communicating with wants to close the connection: |
| 895 | .DS |
| 896 | #include <sys/types.h> |
| 897 | #include <sys/socket.h> |
| 898 | #include <netns/ns.h> |
| 899 | #include <netns/sp.h> |
| 900 | ... |
| 901 | #ifndef SPPSST_END |
| 902 | #define SPPSST_END 254 |
| 903 | #define SPPSST_ENDREPLY 255 |
| 904 | #endif |
| 905 | struct sphdr proto_sp; |
| 906 | int s; |
| 907 | ... |
| 908 | read(s, buf, BUFSIZE); |
| 909 | if (((struct sphdr *)buf)->sp_dt == SPPSST_END) { |
| 910 | /* |
| 911 | * SPPSST_END indicates that the other side wants to |
| 912 | * close. |
| 913 | */ |
| 914 | proto_sp.sp_dt = SPPSST_ENDREPLY; |
| 915 | proto_sp.sp_cc = SP_EM; |
| 916 | setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, |
| 917 | sizeof(proto_sp)); |
| 918 | write(s, buf, 0); |
| 919 | /* |
| 920 | * Write a zero-length packet with datastream type = SPPSST_ENDREPLY |
| 921 | * to indicate that the close is OK with us. The packet that we |
| 922 | * don't see (because we don't look for it) is another packet |
| 923 | * from the other side of the connection, with SPPSST_ENDREPLY |
| 924 | * on it, too. Once that packet is sent, the connection is |
| 925 | * considered closed; note that we really ought to retransmit |
| 926 | * the close for some time if we do not get a reply. |
| 927 | */ |
| 928 | close(s); |
| 929 | } |
| 930 | ... |
| 931 | .DE |
| 932 | To indicate to another process that we would like to close the |
| 933 | connection, the following code would suffice: |
| 934 | .DS |
| 935 | #include <sys/types.h> |
| 936 | #include <sys/socket.h> |
| 937 | #include <netns/ns.h> |
| 938 | #include <netns/sp.h> |
| 939 | ... |
| 940 | #ifndef SPPSST_END |
| 941 | #define SPPSST_END 254 |
| 942 | #define SPPSST_ENDREPLY 255 |
| 943 | #endif |
| 944 | struct sphdr proto_sp; |
| 945 | int s; |
| 946 | ... |
| 947 | proto_sp.sp_dt = SPPSST_END; |
| 948 | proto_sp.sp_cc = SP_EM; |
| 949 | setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, |
| 950 | sizeof(proto_sp)); |
| 951 | write(s, buf, 0); /* send the end request */ |
| 952 | proto_sp.sp_dt = SPPSST_ENDREPLY; |
| 953 | setsockopt(s, NSPROTO_SPP, SO_DEFAULT_HEADERS, (char *)&proto_sp, |
| 954 | sizeof(proto_sp)); |
| 955 | /* |
| 956 | * We assume (perhaps unwisely) |
| 957 | * that the other side will send the |
| 958 | * ENDREPLY, so we'll just send our final ENDREPLY |
| 959 | * as if we'd seen theirs already. |
| 960 | */ |
| 961 | write(s, buf, 0); |
| 962 | close(s); |
| 963 | ... |
| 964 | .DE |
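.PP
Both fragments above skip a step: neither waits for the packet it expects
from its peer. A fuller treatment can be organized as a small state
machine driven by the datastream type of each incoming zero-length packet.
The following is a sketch only; the state names and the functions
\fIclose_begin\fP and \fIclose_input\fP are hypothetical, and the
retransmission mentioned in the comments above and the case in which both
sides request a close at the same time are not handled.

```c
#define SPPSST_END	254	/* request to close */
#define SPPSST_ENDREPLY	255	/* agreement to close */

enum close_state {
	CS_OPEN,	/* normal data transfer */
	CS_END_SENT,	/* we sent END, awaiting the peer's ENDREPLY */
	CS_REPLY_SENT,	/* we answered END, awaiting the final ENDREPLY */
	CS_CLOSED	/* handshake complete */
};

/* Start a close; returns the datastream type to send. */
int
close_begin(enum close_state *st)
{
	*st = CS_END_SENT;
	return (SPPSST_END);
}

/*
 * Advance the handshake on an incoming zero-length packet whose
 * datastream type is dt.  Returns the datastream type to send in
 * reply, or -1 if nothing should be sent.
 */
int
close_input(enum close_state *st, int dt)
{
	switch (*st) {
	case CS_OPEN:
		if (dt == SPPSST_END) {
			*st = CS_REPLY_SENT;
			return (SPPSST_ENDREPLY);
		}
		break;
	case CS_END_SENT:
		if (dt == SPPSST_ENDREPLY) {
			*st = CS_CLOSED;
			return (SPPSST_ENDREPLY);
		}
		break;
	case CS_REPLY_SENT:
		if (dt == SPPSST_ENDREPLY)
			*st = CS_CLOSED;
		break;
	case CS_CLOSED:
		break;
	}
	return (-1);
}
```

Each value returned would be placed in \fIsp_dt\fP of a prototype header,
with SP_EM set in \fIsp_cc\fP, and sent as a zero-length packet in the
manner shown above; the socket is closed once the state reaches CS_CLOSED.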
| 965 | .NH 2 |
| 966 | Packet Exchange |
| 967 | .PP |
| 968 | The Xerox standard protocols include a protocol that is both |
| 969 | reliable and datagram-oriented. This protocol is known as |
| 970 | Packet Exchange (PEX or PE) and, like SPP, is layered on top |
| 971 | of IDP. PEX is important for a number of things: Courier |
| 972 | remote procedure calls may be expedited through the use |
| 973 | of PEX, and many Xerox servers are located by doing a PEX |
| 974 | ``BroadcastForServers'' operation. Although there is no |
| 975 | implementation of PEX in the kernel, |
| 976 | it may be simulated at the user level with some clever coding |
| 977 | and the use of one peculiar \fIgetsockopt\fP. A PEX packet |
| 978 | looks like: |
| 979 | .DS |
| 980 | .if t .ta \w'struct 'u +\w" struct idp"u +2.0i |
| 981 | /* |
| 982 | * The packet-exchange header shown here is not defined |
| 983 | * as part of any of the system include files. |
| 984 | */ |
| 985 | struct pex { |
| 986 | struct idp p_idp; /* idp header */ |
| 987 | u_short ph_id[2]; /* unique transaction ID for pex */ |
| 988 | u_short ph_client; /* client type field for pex */ |
| 989 | }; |
| 990 | .DE |
| 991 | The \fIph_id\fP field is used to hold a ``unique id'' that |
| 992 | is used in duplicate suppression; the \fIph_client\fP |
| 993 | field indicates the PEX client type (similar to the packet |
| 994 | type field in the IDP header). PEX reliability stems from the |
| 995 | fact that it is an idempotent (``I send a packet to you, you |
| 996 | send a packet to me'') protocol. Processes on each side of |
| 997 | the connection may use the unique id to determine if they have |
| 998 | seen a given packet before (the unique id field differs on each |
| 999 | packet sent) so that duplicates may be detected, and to indicate |
| 1000 | which message a given packet is in response to. If a packet with |
| 1001 | a given unique id is sent and no response is received in a given |
| 1002 | amount of time, the packet is retransmitted until it is decided |
| 1003 | that no response will ever be received. To simulate PEX, one |
| 1004 | must be able to generate unique ids -- something that is hard to |
| 1005 | do at the user level with any real guarantee that the id is really |
| 1006 | unique. Therefore, a means (via \fIgetsockopt\fP) has been provided |
| 1007 | for getting unique ids from the kernel. The following code fragment |
| 1008 | indicates how to get a unique id: |
| 1009 | .DS |
| 1010 | long uniqueid; |
| 1011 | int s, idsize = sizeof(uniqueid); |
| 1012 | ... |
| 1013 | s = socket(AF_NS, SOCK_DGRAM, 0); |
| 1014 | ... |
| 1015 | /* get id from the kernel -- only on IDP sockets */ |
| 1016 | getsockopt(s, NSPROTO_PE, SO_SEQNO, (char *)&uniqueid, &idsize); |
| 1017 | ... |
| 1018 | .DE |
| 1019 | The retransmission and duplicate suppression code required to |
| 1020 | simulate PEX fully is left as an exercise for the reader. |
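.PP
As a starting point, duplicate suppression can be as simple as remembering
the last few unique ids received and discarding any packet whose id is
already on record. The sketch below is one possible approach, not a
complete PEX simulation; the structure \fIpex_seen\fP, the constant
PEX_NSEEN, and both function names are hypothetical.

```c
#include <string.h>

#define PEX_NSEEN	8	/* hypothetical: number of ids remembered */

struct pex_seen {
	unsigned long	ids[PEX_NSEEN];	/* recently seen unique ids */
	int		count;		/* valid entries, at most PEX_NSEEN */
	int		next;		/* slot to overwrite next */
};

/* Forget all recorded ids. */
void
pex_seen_init(struct pex_seen *ps)
{
	memset(ps, 0, sizeof(*ps));
}

/*
 * Return 1 if id is among the recorded ids (a duplicate).
 * Otherwise record it, replacing the oldest entry once the
 * table is full, and return 0.
 */
int
pex_is_duplicate(struct pex_seen *ps, unsigned long id)
{
	int i;

	for (i = 0; i < ps->count; i++)
		if (ps->ids[i] == id)
			return (1);
	ps->ids[ps->next] = id;
	ps->next = (ps->next + 1) % PEX_NSEEN;
	if (ps->count < PEX_NSEEN)
		ps->count++;
	return (0);
}
```

A receiver would call \fIpex_is_duplicate\fP with the id taken from each
incoming packet and process the packet only when the result is zero. A
table this small forgets old ids quickly, so its size would need to be
chosen with the retransmission interval in mind.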
| 1021 | .NH 2 |
| 1022 | Non-Blocking Sockets |
| 1023 | .PP |
| 1024 | It is occasionally convenient to make use of sockets |
| 1025 | which do not block; that is, i/o requests which |
| 1026 | cannot complete immediately and |
| 1027 | would therefore cause the process to wait are |
| 1028 | not executed, and an error code is returned instead. |
| 1029 | Once a socket has been created via |
| 1030 | the \fIsocket\fP call, it may be marked as non-blocking |
| 1031 | by \fIfcntl\fP as follows: |
| 1032 | .DS |
| 1033 | #include <fcntl.h> |
| 1034 | ... |
| 1035 | int s; |
| 1036 | ... |
| 1037 | s = socket(AF_INET, SOCK_STREAM, 0); |
| 1038 | ... |
| 1039 | if (fcntl(s, F_SETFL, FNDELAY) < 0) { |
| 1040 | perror("fcntl F_SETFL, FNDELAY"); |
| 1041 | exit(1); |
| 1042 | } |
| 1043 | ... |
| 1044 | .DE |
| 1045 | .PP |
| 1046 | When performing non-blocking i/o on sockets, one must be |
| 1047 | careful to check for the error EWOULDBLOCK (stored in the |
| 1048 | global variable \fIerrno\fP), which occurs when |
| 1049 | an operation would normally block, but the socket it |
| 1050 | was performed on is marked as non-blocking. |
| 1051 | In particular, \fIaccept\fP, \fIconnect\fP, \fIsend\fP, \fIrecv\fP, |
| 1052 | \fIread\fP, and \fIwrite\fP can |
| 1053 | all return EWOULDBLOCK, and processes should be prepared |
| 1054 | to deal with such return codes. |
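.PP
The check for EWOULDBLOCK might be packaged as shown below. This is a
sketch under stated assumptions: it uses O_NONBLOCK, the POSIX name for
the flag on present-day systems, in place of FNDELAY; it also tests for
EAGAIN, which some systems return instead of (or as a synonym for)
EWOULDBLOCK; and the names \fIset_nonblocking\fP and \fItry_recv\fP are
hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Mark descriptor s as non-blocking while preserving its other
 * status flags.  Returns 0 on success, -1 on failure.
 */
int
set_nonblocking(int s)
{
	int flags = fcntl(s, F_GETFL, 0);

	if (flags < 0)
		return (-1);
	return (fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0);
}

/*
 * Attempt a receive without waiting.  Returns the number of bytes
 * read, 0 at end of file, -1 on a real error, or -2 if the call
 * would have blocked and should be retried later.
 */
ssize_t
try_recv(int s, char *buf, size_t len)
{
	ssize_t n = recv(s, buf, len, 0);

	if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
		return (-2);
	return (n);
}
```

A caller that receives -2 would normally go on to other work, or use
\fIselect\fP to wait until the socket becomes readable, rather than
retrying in a tight loop.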
| 1055 | .NH 2 |
| 1056 | Inetd |
| 1057 | .PP |
| 1058 | One of the daemons provided with 4.3BSD is \fIinetd\fP, the |
| 1059 | so-called ``internet super-server.'' \fIInetd\fP is invoked at boot |
| 1060 | time, and determines from the file \fI/etc/inetd.conf\fP the |
| 1061 | servers for which it is to listen. Once this information has been |
| 1062 | read and a pristine environment created, \fIinetd\fP proceeds |
| 1063 | to create one socket for each service it is to listen for, |
| 1064 | binding the appropriate port number to each socket. |
| 1065 | .PP |
| 1066 | \fIInetd\fP then performs a \fIselect\fP on all these |
| 1067 | sockets for read availability, waiting for a client to request |
| 1068 | a connection to the service corresponding to |
| 1069 | that socket. When a request arrives, \fIinetd\fP performs an \fIaccept\fP on |
| 1070 | the socket in question, \fIfork\fPs, \fIdup\fPs the new |
| 1071 | socket to file descriptors 0 and 1 (stdin and |
| 1072 | stdout), closes other open file |
| 1073 | descriptors, and \fIexec\fPs the appropriate server. |
| 1074 | .PP |
| 1075 | Servers making use of \fIinetd\fP are considerably simplified, |
| 1076 | as \fIinetd\fP takes care of the majority of the IPC work |
| 1077 | required in establishing a connection. The server invoked |
| 1078 | by \fIinetd\fP expects the socket connected to its client |
| 1079 | on file descriptors 0 and 1, and may immediately perform |
| 1080 | any operations such as \fIread\fP, \fIwrite\fP, \fIsend\fP, |
| 1081 | or \fIrecv\fP. Indeed, servers may use |
| 1082 | buffered i/o as provided by the ``stdio'' conventions, as |
| 1083 | long as they remember to use \fIfflush\fP when appropriate. |
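.PP
To illustrate the point about buffered i/o, the sketch below shows the
body of a trivial line-echo service written entirely with ``stdio''. The
function name \fIecho_lines\fP is hypothetical; under \fIinetd\fP it would
be called as \fIecho_lines(stdin, stdout)\fP, since the client's socket is
already present on file descriptors 0 and 1.

```c
#include <stdio.h>

/*
 * Copy input to output one line at a time, flushing after each
 * line so that the reply is not left sitting in the stdio buffer.
 * Returns the number of fgets calls that succeeded: one per line,
 * or more for lines longer than the 511 characters that fit in
 * the buffer.
 */
int
echo_lines(FILE *in, FILE *out)
{
	char line[512];
	int n = 0;

	while (fgets(line, sizeof(line), in) != NULL) {
		fputs(line, out);
		fflush(out);
		n++;
	}
	return (n);
}
```

Without the \fIfflush\fP, output to a socket is fully buffered by default,
and an interactive client could wait indefinitely for a reply that the
server has not yet written.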
| 1084 | .PP |
| 1085 | One call which may be of interest to individuals writing |
| 1086 | servers under \fIinetd\fP is the \fIgetpeername\fP call, |
| 1087 | which returns the address of the peer (process) connected |
| 1088 | on the other end of the socket. For example, to log the |
| 1089 | Internet address in ``dot notation'' (e.g., ``128.32.0.4'') |
| 1090 | of a client connected to a server under |
| 1091 | \fIinetd\fP, the following code might be used: |
| 1092 | .DS |
| 1093 | struct sockaddr_in name; |
| 1094 | int namelen = sizeof (name); |
| 1095 | ... |
| 1096 | if (getpeername(0, (struct sockaddr *)&name, &namelen) < 0) { |
| 1097 | syslog(LOG_ERR, "getpeername: %m"); |
| 1098 | exit(1); |
| 1099 | } else |
| 1100 | syslog(LOG_INFO, "Connection from %s", inet_ntoa(name.sin_addr)); |
| 1101 | ... |
| 1102 | .DE |
| 1103 | While the \fIgetpeername\fP call is especially useful when |
| 1104 | writing programs to run with \fIinetd\fP, it can be used |
| 1105 | at any time. Be warned, however, that \fIgetpeername\fP will |
| 1106 | fail on UNIX domain sockets, as their addresses (i.e., pathnames) |
| 1107 | are inaccessible. |