Commit | Line | Data |
---|---|---|
8340f87c BJ |
1 | .NH |
2 | LOW-LEVEL I/O | |
3 | .PP | |
4 | This section describes the | |
5 | bottom level of I/O on the | |
6 | .UC UNIX | |
7 | system. | |
8 | The lowest level of I/O in | |
9 | .UC UNIX | |
10 | provides no buffering or any other services; | |
11 | it is in fact a direct entry into the operating system. | |
12 | You are entirely on your own, | |
13 | but on the other hand, | |
14 | you have the most control over what happens. | |
15 | And since the calls and usage are quite simple, | |
16 | this isn't as bad as it sounds. | |
17 | .NH 2 | |
18 | File Descriptors | |
19 | .PP | |
20 | In the | |
21 | .UC UNIX | |
22 | operating system, | |
23 | all input and output is done | |
24 | by reading or writing files, | |
25 | because all peripheral devices, even the user's terminal, | |
26 | are files in the file system. | |
27 | This means that a single, homogeneous interface | |
28 | handles all communication between a program and peripheral devices. | |
29 | .PP | |
30 | In the most general case, | |
31 | before reading or writing a file, | |
32 | it is necessary to inform the system | |
33 | of your intent to do so, | |
34 | a process called | |
35 | ``opening'' the file. | |
36 | If you are going to write on a file, | |
37 | it may also be necessary to create it. | |
38 | The system checks your right to do so | |
39 | (Does the file exist? | |
40 | Do you have permission to access it?), | |
41 | and if all is well, | |
42 | returns a small non-negative integer | |
43 | called a | |
44 | .ul | |
45 | file descriptor. | |
46 | Whenever I/O is to be done on the file, | |
47 | the file descriptor is used instead of the name to identify the file. | |
48 | (This is roughly analogous to the use of | |
49 | .UC READ(5,...) | |
50 | and | |
51 | .UC WRITE(6,...) | |
52 | in Fortran.) | |
53 | All | |
54 | information about an open file is maintained by the system; | |
55 | the user program refers to the file | |
56 | only | |
57 | by the file descriptor. | |
58 | .PP | |
59 | The file pointers discussed in section 3 | |
60 | are similar in spirit to file descriptors, | |
61 | but file descriptors are more fundamental. | |
62 | A file pointer is a pointer to a structure that contains, | |
63 | among other things, the file descriptor for the file in question. | |
64 | .PP | |
65 | Since input and output involving the user's terminal | |
66 | are so common, | |
67 | special arrangements exist to make this convenient. | |
68 | When the command interpreter (the | |
69 | ``shell'') | |
70 | runs a program, | |
71 | it opens | |
72 | three files, with file descriptors 0, 1, and 2, | |
73 | called the standard input, | |
74 | the standard output, and the standard error output. | |
75 | All of these are normally connected to the terminal, | |
76 | so if a program reads file descriptor 0 | |
77 | and writes file descriptors 1 and 2, | |
78 | it can do terminal I/O | |
79 | without worrying about opening the files. | |
80 | .PP | |
81 | If I/O is redirected | |
82 | to and from files with | |
83 | .UL < | |
84 | and | |
85 | .UL > , | |
86 | as in | |
87 | .P1 | |
88 | prog <infile >outfile | |
89 | .P2 | |
90 | the shell changes the default assignments for file descriptors | |
91 | 0 and 1 | |
92 | from the terminal to the named files. | |
93 | Similar observations hold if the input or output is associated with a pipe. | |
94 | Normally file descriptor 2 remains attached to the terminal, | |
95 | so error messages can go there. | |
96 | In all cases, | |
97 | the file assignments are changed by the shell, | |
98 | not by the program. | |
99 | The program does not need to know where its input | |
100 | comes from nor where its output goes, | |
101 | so long as it uses file 0 for input and 1 and 2 for output. | |
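.PP
A minimal sketch of this convention, written in present-day C.
The helper name
.UL say
is hypothetical and not part of the system.
Run as
.UL "prog >outfile" ,
text sent to file descriptor 1 lands in
.UL outfile ,
while text sent to file descriptor 2 still reaches the terminal.

```c
#include <string.h>
#include <unistd.h>

/* say: write the string s on file descriptor fd (hypothetical helper);
   returns the byte count reported by write, or -1 on error */
int say(int fd, const char *s)
{
    return (int) write(fd, s, strlen(s));
}
```

For example,
.UL "say(1, \"result\en\")"
produces ordinary output and
.UL "say(2, \"warning\en\")"
produces a diagnostic; the program itself never learns where either one ends up.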
102 | .NH 2 | |
103 | Read and Write | |
104 | .PP | |
105 | All input and output is done by | |
106 | two functions called | |
107 | .UL read | |
108 | and | |
109 | .UL write . | |
110 | For both, the first argument is a file descriptor. | |
111 | The second argument is a buffer in your program where the data is to | |
112 | come from or go to. | |
113 | The third argument is the number of bytes to be transferred. | |
114 | The calls are | |
115 | .P1 | |
116 | n_read = read(fd, buf, n); | |
117 | ||
118 | n_written = write(fd, buf, n); | |
119 | .P2 | |
120 | Each call returns a byte count | |
121 | which is the number of bytes actually transferred. | |
122 | On reading, | |
123 | the number of bytes returned may be less than | |
124 | the number asked for, | |
125 | because fewer than | |
126 | .UL n | |
127 | bytes remained to be read. | |
128 | (When the file is a terminal, | |
129 | .UL read | |
130 | normally reads only up to the next newline, | |
131 | which is generally less than what was requested.) | |
132 | A return value of zero bytes implies end of file, | |
133 | and | |
134 | .UL -1 | |
135 | indicates an error of some sort. | |
136 | For writing, the returned value is the number of bytes | |
137 | actually written; | |
138 | it is generally an error if this isn't equal | |
139 | to the number supposed to be written. | |
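.PP
On later systems a short count from
.UL write
is not always an error, particularly on pipes and terminals,
so a cautious caller can retry with the remainder.
The following is a sketch under that assumption;
.UL write_all
is a hypothetical name.

```c
#include <unistd.h>

/* write_all: keep calling write until all n bytes are out.
   Returns n on success, -1 on error. (hypothetical helper) */
long write_all(int fd, const char *buf, long n)
{
    long done = 0;

    while (done < n) {
        ssize_t w = write(fd, buf + done, n - done);
        if (w <= 0)          /* error, or no progress at all */
            return -1;
        done += w;
    }
    return done;
}
```

A return of zero from
.UL write
is treated as failure here so that the loop cannot spin forever.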
140 | .PP | |
141 | The number of bytes to be read or written is quite arbitrary. | |
142 | The two most common values are | |
143 | 1, | |
144 | which means one character at a time | |
145 | (``unbuffered''), | |
146 | and | |
147 | 512, | |
148 | which corresponds to a physical blocksize on many peripheral devices. | |
149 | This latter size will be most efficient, | |
150 | but even character-at-a-time I/O | |
151 | is not inordinately expensive. | |
152 | .PP | |
153 | Putting these facts together, | |
154 | we can write a simple program to copy | |
155 | its input to its output. | |
156 | This program will copy anything to anything, | |
157 | since the input and output can be redirected to any file or device. | |
158 | .P1 | |
159 | #define BUFSIZE 512 /* best size for PDP-11 UNIX */ | |
160 | ||
161 | main() /* copy input to output */ | |
162 | { | |
163 | char buf[BUFSIZE]; | |
164 | int n; | |
165 | ||
166 | while ((n = read(0, buf, BUFSIZE)) > 0) | |
167 | write(1, buf, n); | |
168 | exit(0); | |
169 | } | |
170 | .P2 | |
171 | If the file size is not a multiple of | |
172 | .UL BUFSIZE , | |
173 | some | |
174 | .UL read | |
175 | will return a smaller number of bytes | |
176 | to be written by | |
177 | .UL write ; | |
178 | the next call to | |
179 | .UL read | |
180 | after that | |
181 | will return zero. | |
182 | .PP | |
183 | It is instructive to see how | |
184 | .UL read | |
185 | and | |
186 | .UL write | |
187 | can be used to construct | |
188 | higher level routines like | |
189 | .UL getchar , | |
190 | .UL putchar , | |
191 | etc. | |
192 | For example, | |
193 | here is a version of | |
194 | .UL getchar | |
195 | which does unbuffered input. | |
196 | .P1 | |
197 | #define CMASK 0377 /* for making char's > 0 */ | |
198 | ||
199 | getchar() /* unbuffered single character input */ | |
200 | { | |
201 | char c; | |
202 | ||
203 | return((read(0, &c, 1) > 0) ? c & CMASK : EOF); | |
204 | } | |
205 | .P2 | |
206 | .UL c | |
207 | .ul | |
208 | must | |
209 | be declared | |
210 | .UL char , | |
211 | because | |
212 | .UL read | |
213 | accepts a character pointer. | |
214 | The character being returned must be masked with | |
215 | .UL 0377 | |
216 | to ensure that it is positive; | |
217 | otherwise sign extension may make it negative. | |
218 | (The constant | |
219 | .UL 0377 | |
220 | is appropriate for the | |
221 | .UC PDP -11 | |
222 | but not necessarily for other machines.) | |
223 | .PP | |
224 | The second version of | |
225 | .UL getchar | |
226 | does input in big chunks, | |
227 | and hands out the characters one at a time. | |
228 | .P1 | |
229 | #define CMASK 0377 /* for making char's > 0 */ | |
230 | #define BUFSIZE 512 | |
231 | ||
232 | getchar() /* buffered version */ | |
233 | { | |
234 | static char buf[BUFSIZE]; | |
235 | static char *bufp = buf; | |
236 | static int n = 0; | |
237 | ||
238 | if (n == 0) { /* buffer is empty */ | |
239 | n = read(0, buf, BUFSIZE); | |
240 | bufp = buf; | |
241 | } | |
242 | return((--n >= 0) ? *bufp++ & CMASK : EOF); | |
243 | } | |
244 | .P2 | |
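.PP
The same idea works in the other direction.
Here is a sketch of a buffered output routine in present-day C;
the names
.UL oputc
and
.UL oflush
are hypothetical, chosen to avoid colliding with the standard library.
Characters accumulate in a buffer and are handed to
.UL write
only when the buffer fills or the caller asks for a flush.

```c
#include <unistd.h>

#define BUFSIZE 512

static char obuf[BUFSIZE];   /* pending output */
static int  nout = 0;        /* number of bytes waiting in obuf */

/* oflush: write out whatever is pending; 0 if all went, -1 if not */
int oflush(void)
{
    int n = nout;

    nout = 0;
    if (n > 0 && write(1, obuf, n) != n)
        return -1;
    return 0;
}

/* oputc: buffered single character output; returns c, or -1 on error */
int oputc(int c)
{
    if (nout == BUFSIZE && oflush() == -1)
        return -1;
    obuf[nout++] = c;
    return c & 0377;
}
```

A program using this sketch must call
.UL oflush
before it exits, or the last partial buffer is lost.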
245 | .NH 2 | |
246 | Open, Creat, Close, Unlink | |
247 | .PP | |
248 | Other than the default | |
249 | standard input, output and error files, | |
250 | you must explicitly open files in order to | |
251 | read or write them. | |
252 | There are two system entry points for this, | |
253 | .UL open | |
254 | and | |
255 | .UL creat | |
256 | [sic]. | |
257 | .PP | |
258 | .UL open | |
259 | is rather like the | |
260 | .UL fopen | |
261 | discussed in the previous section, | |
262 | except that instead of returning a file pointer, | |
263 | it returns a file descriptor, | |
264 | which is just an | |
265 | .UL int . | |
266 | .P1 | |
267 | int fd; | |
268 | ||
269 | fd = open(name, rwmode); | |
270 | .P2 | |
271 | As with | |
272 | .UL fopen , | |
273 | the | |
274 | .UL name | |
275 | argument | |
276 | is a character string corresponding to the external file name. | |
277 | The access mode argument | |
278 | is different, however: | |
279 | .UL rwmode | |
280 | is 0 for read, 1 for write, and 2 for read and write access. | |
281 | .UL open | |
282 | returns | |
283 | .UL -1 | |
284 | if any error occurs; | |
285 | otherwise it returns a valid file descriptor. | |
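.PP
As an illustration, the following sketch opens a file for reading and counts its bytes.
It is written in present-day C and assumes the header
.UL <fcntl.h> ,
which on later systems supplies the names
.UL O_RDONLY ,
.UL O_WRONLY
and
.UL O_RDWR
for the three access modes.
The function name
.UL count_bytes
is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* count_bytes: return the size of the named file by reading it to
   the end, or -1 if it cannot be opened or read (hypothetical helper) */
long count_bytes(const char *name)
{
    char buf[512];
    long total = 0;
    int fd, n;

    if ((fd = open(name, O_RDONLY)) == -1)   /* read access */
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);
    return (n < 0) ? -1 : total;
}
```

The test against
.UL -1
right after
.UL open
is the important part: every later call depends on having a valid file descriptor.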
286 | .PP | |
287 | It is an error to | |
288 | try to | |
289 | .UL open | |
290 | a file that does not exist. | |
291 | The entry point | |
292 | .UL creat | |
293 | is provided to create new files, | |
294 | or to re-write old ones. | |
295 | .P1 | |
296 | fd = creat(name, pmode); | |
297 | .P2 | |
298 | returns a file descriptor | |
299 | if it was able to create the file | |
300 | called | |
301 | .UL name , | |
302 | and | |
303 | .UL -1 | |
304 | if not. | |
305 | If the file | |
306 | already exists, | |
307 | .UL creat | |
308 | will truncate it to zero length; | |
309 | it is not an error to | |
310 | .UL creat | |
311 | a file that already exists. | |
312 | .PP | |
313 | If the file is brand new, | |
314 | .UL creat | |
315 | creates it with the | |
316 | .ul | |
317 | protection mode | |
318 | specified by | |
319 | the | |
320 | .UL pmode | |
321 | argument. | |
322 | In the | |
323 | .UC UNIX | |
324 | file system, | |
325 | there are nine bits of protection information | |
326 | associated with a file, | |
327 | controlling read, write and execute permission for | |
328 | the owner of the file, | |
329 | for the owner's group, | |
330 | and for all others. | |
331 | Thus a three-digit octal number | |
332 | is most convenient for specifying the permissions. | |
333 | For example, | |
334 | 0755 | |
335 | specifies read, write and execute permission for the owner, | |
336 | and read and execute permission for the group and everyone else. | |
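.PP
To make the nine bits concrete, here is a small sketch that expands a
.UL pmode
value into the letter form used in directory listings.
The function name
.UL mode_string
is hypothetical.

```c
#include <string.h>

/* mode_string: expand the nine protection bits of pmode into the
   form "rwxr-xr-x"; out must hold at least 10 bytes (hypothetical helper) */
char *mode_string(int pmode, char *out)
{
    int i;

    strcpy(out, "---------");
    for (i = 0; i < 9; i++)              /* 0400 is the owner's read bit */
        if (pmode & (0400 >> i))
            out[i] = "rwxrwxrwx"[i];
    return out;
}
```

Each octal digit covers one group of three bits, so 0755 expands to
.UL rwxr-xr-x
and 0644 to
.UL rw-r--r-- .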
337 | .PP | |
338 | To illustrate, | |
339 | here is a simplified version of | |
340 | the | |
341 | .UC UNIX | |
342 | utility | |
343 | .IT cp , | |
344 | a program which copies one file to another. | |
345 | (The main simplification is that our version | |
346 | copies only one file, | |
347 | and does not permit the second argument | |
348 | to be a directory.) | |
349 | .P1 | |
350 | #define NULL 0 | |
351 | #define BUFSIZE 512 | |
352 | #define PMODE 0644 /* RW for owner, R for group, others */ | |
353 | ||
354 | main(argc, argv) /* cp: copy f1 to f2 */ | |
355 | int argc; | |
356 | char *argv[]; | |
357 | { | |
358 | int f1, f2, n; | |
359 | char buf[BUFSIZE]; | |
360 | ||
361 | if (argc != 3) | |
362 | error("Usage: cp from to", NULL); | |
363 | if ((f1 = open(argv[1], 0)) == -1) | |
364 | error("cp: can't open %s", argv[1]); | |
365 | if ((f2 = creat(argv[2], PMODE)) == -1) | |
366 | error("cp: can't create %s", argv[2]); | |
367 | ||
368 | while ((n = read(f1, buf, BUFSIZE)) > 0) | |
369 | if (write(f2, buf, n) != n) | |
370 | error("cp: write error", NULL); | |
371 | exit(0); | |
372 | } | |
373 | .P2 | |
374 | .P1 | |
375 | error(s1, s2) /* print error message and die */ | |
376 | char *s1, *s2; | |
377 | { | |
378 | printf(s1, s2); | |
379 | printf("\n"); | |
380 | exit(1); | |
381 | } | |
382 | .P2 | |
383 | .PP | |
384 | As we said earlier, | |
385 | there is a limit (typically 15-25) | |
386 | on the number of files which a program | |
387 | may have open simultaneously. | |
388 | Accordingly, any program which intends to process | |
389 | many files must be prepared to re-use | |
390 | file descriptors. | |
391 | The routine | |
392 | .UL close | |
393 | breaks the connection between a file descriptor | |
394 | and an open file, | |
395 | and frees the | |
396 | file descriptor for use with some other file. | |
397 | Termination of a program | |
398 | via | |
399 | .UL exit | |
400 | or return from the main program closes all open files. | |
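.PP
The re-use of file descriptors can be observed directly.
The sketch below, in present-day C, relies on a property of later systems that the text above does not state:
.UL open
returns the lowest-numbered descriptor not currently in use.
The function name
.UL reopen_same
is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* reopen_same: open name, close it, open it again, and report
   whether the second open was handed the descriptor just freed.
   Returns 1 if so, 0 if not, -1 if either open failed.
   (hypothetical helper) */
int reopen_same(const char *name)
{
    int first, second;

    if ((first = open(name, O_RDONLY)) == -1)
        return -1;
    close(first);                       /* frees the descriptor */
    if ((second = open(name, O_RDONLY)) == -1)
        return -1;
    close(second);
    return second == first;
}
```

A program that walks through many files in a loop and closes each one when done therefore never uses more than a few descriptors at a time.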
401 | .PP | |
402 | The function | |
403 | .UL unlink(filename) | |
404 | removes the file | |
405 | .UL filename | |
406 | from the file system. | |
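.PP
A short sketch of the life of a scratch file, in present-day C:
it is created, closed, removed, and then shown to be gone.
The function name
.UL scratch_cycle
is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* scratch_cycle: create name, remove it with unlink, and confirm
   that it can no longer be opened. Returns 0 if all went as
   described, -1 otherwise. (hypothetical helper) */
int scratch_cycle(const char *name)
{
    int fd;

    if ((fd = creat(name, 0644)) == -1)
        return -1;
    close(fd);
    if (unlink(name) == -1)
        return -1;
    if ((fd = open(name, O_RDONLY)) != -1) {   /* should not happen */
        close(fd);
        return -1;
    }
    return 0;
}
```

Like the other calls in this section,
.UL unlink
returns
.UL -1
on failure, for instance when the file does not exist.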
407 | .NH 2 | |
408 | Random Access \(em Seek and Lseek | |
409 | .PP | |
410 | File I/O is normally sequential: | |
411 | each | |
412 | .UL read | |
413 | or | |
414 | .UL write | |
415 | takes place at a position in the file | |
416 | right after the previous one. | |
417 | When necessary, however, | |
418 | a file can be read or written in any arbitrary order. | |
419 | The | |
420 | system call | |
421 | .UL lseek | |
422 | provides a way to move around in | |
423 | a file without actually reading | |
424 | or writing: | |
425 | .P1 | |
426 | lseek(fd, offset, origin); | |
427 | .P2 | |
428 | forces the current position in the file | |
429 | whose descriptor is | |
430 | .UL fd | |
431 | to move to position | |
432 | .UL offset , | |
433 | which is taken relative to the location | |
434 | specified by | |
435 | .UL origin . | |
436 | Subsequent reading or writing will begin at that position. | |
437 | .UL offset | |
438 | is | |
439 | a | |
440 | .UL long ; | |
441 | .UL fd | |
442 | and | |
443 | .UL origin | |
444 | are | |
445 | .UL int 's. | |
446 | .UL origin | |
447 | can be 0, 1, or 2 to specify that | |
448 | .UL offset | |
449 | is to be | |
450 | measured from | |
451 | the beginning, from the current position, or from the | |
452 | end of the file respectively. | |
453 | For example, | |
454 | to append to a file, | |
455 | seek to the end before writing: | |
456 | .P1 | |
457 | lseek(fd, 0L, 2); | |
458 | .P2 | |
459 | To get back to the beginning (``rewind''), | |
460 | .P1 | |
461 | lseek(fd, 0L, 0); | |
462 | .P2 | |
463 | Notice the | |
464 | .UL 0L | |
465 | argument; | |
466 | it could also be written as | |
467 | .UL (long)\ 0 . | |
468 | .PP | |
469 | With | |
470 | .UL lseek , | |
471 | it is possible to treat files more or less like large arrays, | |
472 | at the price of slower access. | |
473 | For example, the following simple function reads any number of bytes | |
474 | from any arbitrary place in a file. | |
475 | .P1 | |
476 | get(fd, pos, buf, n) /* read n bytes from position pos */ | |
477 | int fd, n; | |
478 | long pos; | |
479 | char *buf; | |
480 | { | |
481 | lseek(fd, pos, 0); /* get to pos */ | |
482 | return(read(fd, buf, n)); | |
483 | } | |
484 | .P2 | |
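.PP
The counterpart for writing follows the same pattern.
This is a sketch in present-day C, where
.UL SEEK_SET
is the later name for an
.UL origin
of 0; the function name
.UL put
is hypothetical.
Unlike
.UL get ,
it checks the result of
.UL lseek
before going on.

```c
#include <fcntl.h>
#include <unistd.h>

/* put: write n bytes from buf at position pos; returns the count
   from write, or -1 if the seek or the write fails (hypothetical helper) */
long put(int fd, long pos, const char *buf, int n)
{
    if (lseek(fd, pos, SEEK_SET) == -1)   /* origin 0: from the beginning */
        return -1;
    return write(fd, buf, n);
}
```

Writing at a position inside the file overwrites the bytes already there; it does not insert.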
485 | .PP | |
486 | In pre-version 7 | |
487 | .UC UNIX , | |
488 | the basic entry point to the I/O system | |
489 | is called | |
490 | .UL seek . | |
491 | .UL seek | |
492 | is identical to | |
493 | .UL lseek , | |
494 | except that its | |
495 | .UL offset | |
496 | argument is an | |
497 | .UL int | |
498 | rather than a | |
499 | .UL long . | |
500 | Accordingly, | |
501 | since | |
502 | .UC PDP -11 | |
503 | integers have only 16 bits, | |
504 | the | |
505 | .UL offset | |
506 | specified | |
507 | for | |
508 | .UL seek | |
509 | is limited to 65,535; | |
510 | for this reason, | |
511 | .UL origin | |
512 | values of 3, 4, 5 cause | |
513 | .UL seek | |
514 | to multiply the given offset by 512 | |
515 | (the number of bytes in one physical block) | |
516 | and then interpret | |
517 | .UL origin | |
518 | as if it were 0, 1, or 2 respectively. | |
519 | Thus to get to an arbitrary place in a large file | |
520 | requires two seeks, first one which selects | |
521 | the block, then one which | |
522 | has | |
523 | .UL origin | |
524 | equal to 1 and moves to the desired byte within the block. | |
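.PP
The arithmetic behind the two seeks is simple division by the block size.
Since
.UL seek
is not available on later systems, the sketch below shows only the split;
the function name
.UL split_pos
is hypothetical.

```c
/* split_pos: break a byte position into the two arguments for the
   two-seek scheme: a block number for origin 3 and a byte offset
   within the block for origin 1. (hypothetical helper) */
void split_pos(long pos, int *block, int *byte)
{
    *block = (int) (pos / 512);   /* whole 512-byte blocks */
    *byte  = (int) (pos % 512);   /* remainder within the block */
}
```

For position 100000, which is too large for a single 16-bit offset, this gives block 195 and byte 160, since 195 times 512 is 99840.
The old calls would then be
.UL "seek(fd, 195, 3)"
followed by
.UL "seek(fd, 160, 1)" .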
525 | .NH 2 | |
526 | Error Processing | |
527 | .PP | |
528 | The routines discussed in this section, | |
529 | and in fact all the routines which are direct entries into the system, | |
530 | can incur errors. | |
531 | Usually they indicate an error by returning a value of \-1. | |
532 | Sometimes it is nice to know what sort of error occurred; | |
533 | for this purpose all these routines, when appropriate, | |
534 | leave an error number in the external cell | |
535 | .UL errno . | |
536 | The meanings of the various error numbers are | |
537 | listed | |
538 | in the introduction to Section II | |
539 | of the | |
540 | .I | |
541 | .UC UNIX | |
542 | Programmer's Manual, | |
543 | .R | |
544 | so your program can, for example, determine if | |
545 | an attempt to open a file failed because it did not exist | |
546 | or because the user lacked permission to read it. | |
547 | Perhaps more commonly, | |
548 | you may want to print out the | |
549 | reason for failure. | |
550 | The routine | |
551 | .UL perror | |
552 | will print a message associated with the value | |
553 | of | |
554 | .UL errno ; | |
555 | more generally, | |
556 | .UL sys\_errlist | |
557 | is an array of character strings which can be indexed | |
558 | by | |
559 | .UL errno | |
560 | and printed by your program. |
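.PP
A sketch of both uses, in present-day C.
It assumes the header
.UL <errno.h> ,
which on later systems declares
.UL errno
and names the error numbers; the function name
.UL try_open
is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* try_open: open name for reading; on failure, say why on the
   standard error output and return -1. (hypothetical helper) */
int try_open(const char *name)
{
    int fd;

    if ((fd = open(name, O_RDONLY)) == -1) {
        if (errno == ENOENT)
            fprintf(stderr, "%s: does not exist\n", name);
        else if (errno == EACCES)
            fprintf(stderr, "%s: permission denied\n", name);
        else
            perror(name);   /* the system's own message for errno */
    }
    return fd;
}
```

.UL errno
is meaningful only right after a call has reported failure, so it should be examined before any other call is made.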