.NH
LOW-LEVEL I/O
.PP
This section describes the
bottom level of I/O on the
.UC UNIX
system.
The lowest level of I/O in
.UC UNIX
provides no buffering or any other services;
it is in fact a direct entry into the operating system.
You are entirely on your own,
but on the other hand,
you have the most control over what happens.
And since the calls and usage are quite simple,
this isn't as bad as it sounds.
.NH 2
File Descriptors
.PP
In the
.UC UNIX
operating system,
all input and output is done
by reading or writing files,
because all peripheral devices, even the user's terminal,
are files in the file system.
This means that a single, homogeneous interface
handles all communication between a program and peripheral devices.
.PP
In the most general case,
before reading or writing a file,
it is necessary to inform the system
of your intent to do so,
a process called
``opening'' the file.
If you are going to write on a file,
it may also be necessary to create it.
The system checks your right to do so
(Does the file exist?
Do you have permission to access it?),
and if all is well,
returns a small non-negative integer
called a
.ul
file descriptor.
Whenever I/O is to be done on the file,
the file descriptor is used instead of the name to identify the file.
(This is roughly analogous to the use of
.UC READ(5,...)
and
.UC WRITE(6,...)
in Fortran.)
All
information about an open file is maintained by the system;
the user program refers to the file
only
by the file descriptor.
.PP
The file pointers discussed in section 3
are similar in spirit to file descriptors,
but file descriptors are more fundamental.
A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
.PP
Since input and output involving the user's terminal
are so common,
special arrangements exist to make this convenient.
When the command interpreter (the
``shell'')
runs a program,
it opens
three files, with file descriptors 0, 1, and 2,
called the standard input,
the standard output, and the standard error output.
All of these are normally connected to the terminal,
so if a program reads file descriptor 0
and writes file descriptors 1 and 2,
it can do terminal I/O
without worrying about opening the files.
.PP
If I/O is redirected
to and from files with
.UL <
and
.UL > ,
as in
.P1
prog <infile >outfile
.P2
the shell changes the default assignments for file descriptors
0 and 1
from the terminal to the named files.
Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal,
so error messages can go there.
In all cases,
the file assignments are changed by the shell,
not by the program.
The program does not need to know where its input
comes from nor where its output goes,
so long as it uses file 0 for input and 1 and 2 for output.
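.PP
As a minimal sketch of this convention,
the fragment below sends ordinary output to one descriptor
and complaints to another;
a program would normally pass 1 and 2.
The names
.UL say
and
.UL greet
are hypothetical helpers,
and the header <unistd.h> assumes a current POSIX system
rather than the system described here.

```c
#include <string.h>
#include <unistd.h>

/* say: write the string s on descriptor fd; return bytes written or -1
   (hypothetical helper, not part of any library) */
long say(int fd, const char *s)
{
    return (long) write(fd, s, strlen(s));
}

/* greet: results go to descriptor out, diagnostics to descriptor err */
int greet(int out, int err, const char *name)
{
    if (name == NULL || *name == '\0') {
        say(err, "greet: no name given\n");
        return -1;
    }
    say(out, "hello, ");
    say(out, name);
    say(out, "\n");
    return 0;
}
```

Because the function only ever sees descriptor numbers,
it behaves the same whether the shell has attached them
to the terminal, a file, or a pipe.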
.NH 2
Read and Write
.PP
All input and output is done by
two functions called
.UL read
and
.UL write .
For both, the first argument is a file descriptor.
The second argument is a buffer in your program where the data is to
come from or go to.
The third argument is the number of bytes to be transferred.
The calls are
.P1
n_read = read(fd, buf, n);

n_written = write(fd, buf, n);
.P2
Each call returns a byte count
which is the number of bytes actually transferred.
On reading,
the number of bytes returned may be less than
the number asked for,
because fewer than
.UL n
bytes remained to be read.
(When the file is a terminal,
.UL read
normally reads only up to the next newline,
which is generally less than what was requested.)
A return value of zero bytes implies end of file,
and
.UL -1
indicates an error of some sort.
For writing, the returned value is the number of bytes
actually written;
it is generally an error if this isn't equal
to the number supposed to be written.
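.PP
Since a single
.UL read
may deliver fewer bytes than were asked for,
a caller that needs a fixed amount has to ask again.
The following is one possible sketch of such a loop,
not a library routine;
the name
.UL read_fully
is hypothetical and <unistd.h> assumes a current POSIX system.

```c
#include <unistd.h>

/* read_fully: keep calling read until n bytes have arrived,
   end of file is reached, or an error occurs.
   Returns the number of bytes placed in buf, or -1 on error. */
long read_fully(int fd, char *buf, long n)
{
    long got = 0;       /* bytes collected so far */
    long r;

    while (got < n) {
        r = read(fd, buf + got, n - got);
        if (r < 0)
            return -1;  /* error */
        if (r == 0)
            break;      /* end of file */
        got += r;
    }
    return got;
}
```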
.PP
The number of bytes to be read or written is quite arbitrary.
The two most common values are
1,
which means one character at a time
(``unbuffered''),
and
512,
which corresponds to a physical blocksize on many peripheral devices.
This latter size will be most efficient,
but even character at a time I/O
is not inordinately expensive.
.PP
Putting these facts together,
we can write a simple program to copy
its input to its output.
This program will copy anything to anything,
since the input and output can be redirected to any file or device.
.P1
#define BUFSIZE 512 /* best size for PDP-11 UNIX */

main() /* copy input to output */
{
    char buf[BUFSIZE];
    int n;

    while ((n = read(0, buf, BUFSIZE)) > 0)
        write(1, buf, n);
    exit(0);
}
.P2
If the file size is not a multiple of
.UL BUFSIZE ,
some
.UL read
will return a smaller number of bytes
to be written by
.UL write ;
the next call to
.UL read
after that
will return zero.
.PP
It is instructive to see how
.UL read
and
.UL write
can be used to construct
higher level routines like
.UL getchar ,
.UL putchar ,
etc.
For example,
here is a version of
.UL getchar
which does unbuffered input.
.P1
#define CMASK 0377 /* for making char's > 0 */
#define EOF (-1) /* normally supplied by <stdio.h> */

getchar() /* unbuffered single character input */
{
    char c;

    return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}
.P2
.UL c
.ul
must
be declared
.UL char ,
because
.UL read
accepts a character pointer.
The character being returned must be masked with
.UL 0377
to ensure that it is positive;
otherwise sign extension may make it negative.
(The constant
.UL 0377
is appropriate for the
.UC PDP -11
but not necessarily for other machines.)
.PP
The second version of
.UL getchar
does input in big chunks,
and hands out the characters one at a time.
.P1
#define CMASK 0377 /* for making char's > 0 */
#define BUFSIZE 512
#define EOF (-1) /* normally supplied by <stdio.h> */

getchar() /* buffered version */
{
    static char buf[BUFSIZE];
    static char *bufp = buf;
    static int n = 0;

    if (n == 0) { /* buffer is empty */
        n = read(0, buf, BUFSIZE);
        bufp = buf;
    }
    return((--n >= 0) ? *bufp++ & CMASK : EOF);
}
.P2
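.PP
The same idea works in the other direction.
As a sketch only, here is an output counterpart that collects characters
and hands them to
.UL write
a block at a time.
The names
.UL putch ,
.UL flushout
and
.UL outfd
are hypothetical, chosen to avoid colliding with the standard
.UL putchar ;
the caller is assumed to call
.UL flushout
before exiting.

```c
#include <unistd.h>

#define OBUFSIZE 512

static char obuf[OBUFSIZE]; /* pending output */
static int nout = 0;        /* characters waiting in obuf */
int outfd = 1;              /* descriptor to write on; 1 is standard output */

/* flushout: write whatever is pending; return 0, or -1 on error */
int flushout(void)
{
    int n = nout;

    nout = 0;
    if (n > 0 && write(outfd, obuf, n) != n)
        return -1;
    return 0;
}

/* putch: buffered single character output */
int putch(int c)
{
    if (nout >= OBUFSIZE && flushout() < 0)
        return -1;      /* buffer was full and could not be written */
    obuf[nout++] = c;
    return c & 0377;
}
```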
.NH 2
Open, Creat, Close, Unlink
.PP
Other than the default
standard input, output and error files,
you must explicitly open files in order to
read or write them.
There are two system entry points for this,
.UL open
and
.UL creat
[sic].
.PP
.UL open
is rather like the
.UL fopen
discussed in the previous section,
except that instead of returning a file pointer,
it returns a file descriptor,
which is just an
.UL int .
.P1
int fd;

fd = open(name, rwmode);
.P2
As with
.UL fopen ,
the
.UL name
argument
is a character string corresponding to the external file name.
The access mode argument
is different, however:
.UL rwmode
is 0 for read, 1 for write, and 2 for read and write access.
.UL open
returns
.UL -1
if any error occurs;
otherwise it returns a valid file descriptor.
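.PP
On current systems the access mode is usually spelled with the
symbolic names O_RDONLY, O_WRONLY and O_RDWR from <fcntl.h>
instead of the bare numbers;
those names do not appear in the text above and are an assumption
about the reader's system.
A minimal sketch, with the hypothetical name
.UL first_byte ,
that opens a file for reading and checks the result:

```c
#include <fcntl.h>
#include <unistd.h>

/* first_byte: open name for reading, fetch its first byte, close it.
   Returns the byte (0-255), -1 if the file can't be opened,
   -2 if it is empty or unreadable. */
int first_byte(const char *name)
{
    int fd;
    unsigned char c;
    long n;

    if ((fd = open(name, O_RDONLY)) == -1)  /* read access, rwmode 0 in the text */
        return -1;
    n = read(fd, &c, 1);
    close(fd);
    return (n == 1) ? c : -2;
}
```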
.PP
It is an error to
try to
.UL open
a file that does not exist.
The entry point
.UL creat
is provided to create new files,
or to re-write old ones.
.P1
fd = creat(name, pmode);
.P2
returns a file descriptor
if it was able to create the file
called
.UL name ,
and
.UL -1
if not.
If the file
already exists,
.UL creat
will truncate it to zero length;
it is not an error to
.UL creat
a file that already exists.
.PP
If the file is brand new,
.UL creat
creates it with the
.ul
protection mode
specified by
the
.UL pmode
argument.
In the
.UC UNIX
file system,
there are nine bits of protection information
associated with a file,
controlling read, write and execute permission for
the owner of the file,
for the owner's group,
and for all others.
Thus a three-digit octal number
is most convenient for specifying the permissions.
For example,
0755
specifies read, write and execute permission for the owner,
and read and execute permission for the group and everyone else.
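.PP
To make the correspondence between octal digits and permissions concrete,
here is a small sketch that spells out the nine bits as letters.
The function name
.UL modestr
is hypothetical;
the bit layout is the one just described
(owner, then group, then others, each in read, write, execute order).

```c
/* modestr: fill out[] (at least 10 chars) with the rwx form of the
   nine protection bits in pmode, e.g. 0755 gives "rwxr-xr-x" */
char *modestr(int pmode, char *out)
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)     /* 0400 is the owner's read bit */
        out[i] = (pmode & (0400 >> i)) ? letters[i] : '-';
    out[9] = '\0';
    return out;
}
```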
.PP
To illustrate,
here is a simplified version of
the
.UC UNIX
utility
.IT cp ,
a program which copies one file to another.
(The main simplification is that our version
copies only one file,
and does not permit the second argument
to be a directory.)
.P1
#define NULL 0
#define BUFSIZE 512
#define PMODE 0644 /* RW for owner, R for group, others */

main(argc, argv) /* cp: copy f1 to f2 */
int argc;
char *argv[];
{
    int f1, f2, n;
    char buf[BUFSIZE];

    if (argc != 3)
        error("Usage: cp from to", NULL);
    if ((f1 = open(argv[1], 0)) == -1)
        error("cp: can't open %s", argv[1]);
    if ((f2 = creat(argv[2], PMODE)) == -1)
        error("cp: can't create %s", argv[2]);

    while ((n = read(f1, buf, BUFSIZE)) > 0)
        if (write(f2, buf, n) != n)
            error("cp: write error", NULL);
    exit(0);
}
.P2
.P1
error(s1, s2) /* print error message and die */
char *s1, *s2;
{
    printf(s1, s2);
    printf("\n");
    exit(1);
}
.P2
.PP
As we said earlier,
there is a limit (typically 15-25)
on the number of files which a program
may have open simultaneously.
Accordingly, any program which intends to process
many files must be prepared to re-use
file descriptors.
The routine
.UL close
breaks the connection between a file descriptor
and an open file,
and frees the
file descriptor for use with some other file.
Termination of a program
via
.UL exit
or return from the main program closes all open files.
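.PP
A routine that is called once per file stays within the limit
as long as it closes what it opens.
The sketch below, with the hypothetical name
.UL count_bytes ,
shows the pattern;
O_RDONLY and the headers assume a current POSIX system.

```c
#include <fcntl.h>
#include <unistd.h>

/* count_bytes: return the length of the named file by reading it,
   or -1 if it can't be opened or read.  The descriptor is closed
   before returning, so the function can be called on any number
   of files in turn. */
long count_bytes(const char *name)
{
    char buf[512];
    long total = 0;
    int fd, n;

    if ((fd = open(name, O_RDONLY)) == -1)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    close(fd);          /* free the descriptor for the next file */
    return (n < 0) ? -1 : total;
}
```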
.PP
The function
.UL unlink(filename)
removes the file
.UL filename
from the file system.
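.PP
As a brief illustration,
the following sketch creates a file, removes it,
and confirms that a later
.UL open
fails.
The function name
.UL make_and_remove
is hypothetical,
and the headers assume a current POSIX system.

```c
#include <fcntl.h>
#include <unistd.h>

/* make_and_remove: create name, remove it again, and report whether
   it is really gone.  Returns 1 if gone, 0 if still there,
   -1 if the file could not be created or removed. */
int make_and_remove(const char *name)
{
    int fd;

    if ((fd = creat(name, 0644)) == -1)
        return -1;
    close(fd);
    if (unlink(name) == -1)
        return -1;
    if ((fd = open(name, O_RDONLY)) == -1)
        return 1;       /* open fails: the name no longer exists */
    close(fd);
    return 0;
}
```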
.NH 2
Random Access \(em Seek and Lseek
.PP
File I/O is normally sequential:
each
.UL read
or
.UL write
takes place at a position in the file
right after the previous one.
When necessary, however,
a file can be read or written in any arbitrary order.
The
system call
.UL lseek
provides a way to move around in
a file without actually reading
or writing:
.P1
lseek(fd, offset, origin);
.P2
forces the current position in the file
whose descriptor is
.UL fd
to move to position
.UL offset ,
which is taken relative to the location
specified by
.UL origin .
Subsequent reading or writing will begin at that position.
.UL offset
is
a
.UL long ;
.UL fd
and
.UL origin
are
.UL int 's.
.UL origin
can be 0, 1, or 2 to specify that
.UL offset
is to be
measured from
the beginning, from the current position, or from the
end of the file respectively.
For example,
to append to a file,
seek to the end before writing:
.P1
lseek(fd, 0L, 2);
.P2
To get back to the beginning (``rewind''),
.P1
lseek(fd, 0L, 0);
.P2
Notice the
.UL 0L
argument;
it could also be written as
.UL (long)\ 0 .
463Notice the
464.UL 0L
465argument;
466it could also be written as
467.UL (long)\ 0 .
468.PP
469With
470.UL lseek ,
471it is possible to treat files more or less like large arrays,
472at the price of slower access.
473For example, the following simple function reads any number of bytes
474from any arbitrary place in a file.
475.P1
476get(fd, pos, buf, n) /* read n bytes from position pos */
477int fd, n;
478long pos;
479char *buf;
480{
481 lseek(fd, pos, 0); /* get to pos */
482 return(read(fd, buf, n));
483}
484.P2
.PP
In pre-version 7
.UC UNIX ,
the basic entry point to the I/O system
is called
.UL seek .
.UL seek
is identical to
.UL lseek ,
except that its
.UL offset
argument is an
.UL int
rather than a
.UL long .
Accordingly,
since
.UC PDP -11
integers have only 16 bits,
the
.UL offset
specified
for
.UL seek
is limited to 65,535;
for this reason,
.UL origin
values of 3, 4, 5 cause
.UL seek
to multiply the given offset by 512
(the number of bytes in one physical block)
and then interpret
.UL origin
as if it were 0, 1, or 2 respectively.
Thus getting to an arbitrary place in a large file
requires two seeks, first one which selects
the block, then one which
has
.UL origin
equal to 1 and moves to the desired byte within the block.
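.PP
The arithmetic behind the two-step positioning can be sketched as follows.
Only the splitting of the position is shown,
since
.UL seek
itself is not available on current systems;
the function name
.UL split
is hypothetical.

```c
#define BLOCK 512   /* bytes per block, as stated in the text */

/* split: break byte position pos into a block number and an
   offset within that block, the two quantities the old
   two-step positioning needs */
void split(long pos, int *block, int *within)
{
    *block = (int) (pos / BLOCK);
    *within = (int) (pos % BLOCK);
}
```

For position 100000 this gives block 195 and offset 160,
since 195 times 512 is 99840.
Going by the description above,
the first seek would pass the block number with
.UL origin
3 and the second would pass the offset with
.UL origin
1.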
.NH 2
Error Processing
.PP
The routines discussed in this section,
and in fact all the routines which are direct entries into the system
can incur errors.
Usually they indicate an error by returning a value of \-1.
Sometimes it is nice to know what sort of error occurred;
for this purpose all these routines, when appropriate,
leave an error number in the external cell
.UL errno .
The meanings of the various error numbers are
listed
in the introduction to Section II
of the
.I
.UC UNIX
Programmer's Manual,
.R
so your program can, for example, determine if
an attempt to open a file failed because it did not exist
or because the user lacked permission to read it.
Perhaps more commonly,
you may want to print out the
reason for failure.
The routine
.UL perror
will print a message associated with the value
of
.UL errno ;
more generally,
.UL sys\_errlist
is an array of character strings which can be indexed
by
.UL errno
and printed by your program.
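.PP
As a hedged sketch of the distinction drawn above,
the function below tells a missing file from a permission problem.
The symbolic error names ENOENT and EACCES and the routine
.UL strerror
come from <errno.h> and <string.h> on current systems and are
assumptions not found in the text;
the function name
.UL open_or_explain
is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* open_or_explain: try to open name for reading.  On failure, copy a
   description of the error into why (size len) and return -1. */
int open_or_explain(const char *name, char *why, int len)
{
    int fd;

    if ((fd = open(name, O_RDONLY)) == -1) {
        if (errno == ENOENT)
            strncpy(why, "no such file", len);
        else if (errno == EACCES)
            strncpy(why, "permission denied", len);
        else
            strncpy(why, strerror(errno), len);
        why[len - 1] = '\0';    /* strncpy may not terminate */
    }
    return fd;
}
```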