Research V6 development
[unix-history] / usr / doc / unix / p2
CommitLineData
9ddf7a71
DR
1.s2
23.6 I/O calls
3.es
4The system calls to do I/O are designed to eliminate
5the differences between the various devices and styles of
6access.
7There is no distinction between ``random''
8and ``sequential'' I/O, nor is any logical record size imposed
9by the system.
10The size of an ordinary file is determined
11by the highest byte written on it;
12no predetermination of the size of a file is necessary
13or possible.
14.pg
15To illustrate the essentials of I/O in \*sUNIX\*n, some of the basic calls are
16summarized below
17in an anonymous language which will
18indicate the required parameters without getting into the
19complexities of machine language programming.
20Each call to the system may potentially result in an error
21return, which for simplicity is not represented
22in the calling sequence.
23.pg
24To read or write a file assumed to exist already, it must
25be opened by the following call:
26.dc
27filep = open\|(\|name, flag\|)
28.ec
29\fIName\fR indicates the name of the file.
30An arbitrary path name may be given.
31The \fIflag\fR argument indicates whether the file is to be read, written,
32or ``updated,'' that is read and written simultaneously.
33.pg
34The returned value
35.ft I
36filep
37.ft R
38is called a
39.ft I
40file descriptor.
41.ft R
42It is a small integer used to identify the file
43in subsequent calls to read, write
44or otherwise manipulate the file.
45.pg
46To create a new file or completely rewrite an old one,
47there is a \fIcreate\fR system call which
48creates the given file if it does not exist,
49or truncates it to zero length
50if it does exist.
51\fICreate\fR also opens the new file for writing
52and, like \fIopen,\fR returns a file descriptor.
53.pg
54There are no user-visible locks in the file system, nor is there any
55restriction on the number of users who may have a file
56open for reading or writing.
57Although it is possible for the contents of a file
58to become scrambled when two users write on it simultaneously,
59in practice difficulties do not arise.
60We take the view that locks are neither
61necessary nor sufficient, in our environment,
62to prevent interference between users of the same file.
63They are unnecessary because we are not
64faced with large, single-file data bases
65maintained by independent processes.
66They are insufficient because
67locks in the ordinary sense, whereby
68one user is prevented from writing on a file which another
69user is reading,
70cannot prevent confusion
71when, for example, both users are editing
72a file with an editor which makes
73a copy of the file being edited.
74.pg
75It should be said that the system has
76sufficient internal interlocks to maintain
77the logical consistency of the file system
78when two users engage simultaneously in such inconvenient
79activities as writing on
80the same file,
81creating files in the same directory,
82or deleting each other's open files.
83.pg
84Except as indicated below, reading and writing
85are sequential.
86This means that if a particular
87byte in the file was the last byte written (or read),
88the next I/O call implicitly refers to the
89first following byte.
90For each open file there is a pointer, maintained
91by the system,
92which indicates the next byte to be read
93or written.
94If \fIn\fR bytes are read or written, the pointer advances
95by \fIn\fR bytes.
96.pg
97Once a file is open, the following calls
98may be used.
99.dc
100n = read\|(\|filep, buffer, count\|)
101.dc
102n = write\|(\|filep, buffer, count\|)
103.ec
104Up to
105.ft I
106count
107.ft R
108bytes are transmitted between the file specified
109by
110.ft I
111filep
112.ft R
113and the byte array
114specified by
115.ft I
116buffer.
117.ft R
118The returned value
119.ft I
120n
121.ft R
122is the number of bytes actually transmitted.
123In the
124.ft I
125write
126.ft R
127case,
128.ft I
129n
130.ft R
131is the same as
132.ft I
133count
134.ft R
135except under exceptional conditions like I/O errors or
136end of physical medium on special files;
137in a
138.ft I
139read,
140.ft R
141however,
142.ft I
143n
144.ft R
145may without error be less than
146.ft I
147count.
148.ft R
149If the read pointer is so near the end of the
150file that reading \fIcount\fR characters
151would cause reading beyond the end, only sufficient
152bytes are transmitted to reach the end of the
153file;
154also, typewriter-like devices
155never return more than one line of input.
156When a \fIread\fR call returns with \fIn\fR equal
157to zero, it indicates the end of the file.
158For disk files this occurs when the read pointer
159becomes equal to the current
160size of the file.
161It is possible to generate an end-of-file
162from a typewriter by use of an escape
163sequence which depends on the device used.
164.pg
165Bytes written on a file affect only those implied by
166the position of the write pointer and the
167count; no other part of the file
168is changed.
169If the last byte lies beyond the end of the file, the
170file is grown as needed.
171.pg
172To do random (direct access) I/O
173it is only necessary to move the read or write pointer
174to the appropriate location in the file.
175.dc
176location = seek\|(\|filep, offset, base\|)
177.ec
178The pointer
179associated with \fIfilep\fR is moved to a position \fIoffset\fR
180bytes from the beginning of the file, from the current position
181of the pointer, or from the end of the file,
182depending on
183.ft I
184base.
185.ft R
186.ft I
187Offset
188.ft R
189may be negative.
190For some devices (e.g. paper
191tape and
192typewriters) seek calls are
193ignored.
194The actual offset from the beginning of the file
195to which the pointer was moved is returned
196in \fIlocation\fR.
197.s2
1983.6.1 Other I/O calls
199.es
200There are several additional system entries
201having to do with I/O and with the file
202system which will not be discussed.
203For example:
204close a file,
205get the status of a file,
206change the protection mode or the owner
207of a file,
208create a directory,
209make a link to an existing file,
210delete a file.
211.s1
2124. Implementation of the file system
213.es
214As mentioned in \(sc3.2 above, a directory entry contains
215only a name for the associated file and a pointer to the
216file itself.
217This pointer is an integer called the
218.ft I
219i-number
220.ft
221(for index number)
222of the file.
223When the file is accessed,
224its i-number is used as an index into
225a system table (the \fIi-list\|\fR) stored in a known
226part of the device on which
227the directory resides.
228The entry thereby found (the file's
229\fIi-node\fR\|)
230contains
231the description of the file:
232.sp
233.tr |
234.in .5i
235.ti -\w'1. 'u
2361.|its owner;
237.ti -\w'1. 'u
2382.|its protection bits;
239.ti -\w'1. 'u
2403.|the physical disk or tape addresses for the file contents;
241.ti -\w'1. 'u
2424.|its size;
243.ti -\w'1. 'u
2445.|time of last modification;
245.ti -\w'1. 'u
2466.|the number of links to the file; that is, the number
247of times it appears in a directory;
248.ti -\w'1. 'u
2497.|a bit indicating whether the
250file is a directory;
251.ti -\w'1. 'u
2528.|a bit indicating whether the file is a special file;
253.ti -\w'1. 'u
2549.|a bit indicating whether the file is ``large'' or ``small.''
255.sp
256.tr ||
257.in 0
258The purpose of an
259.ft I
260open
261.ft R
262or
263.ft I
264create
265.ft R
266system call is to turn the path name given by the user
267into an i-number
268by searching the explicitly or implicitly named directories.
269Once a file is open,
270its device, i-number, and read/write pointer are stored in a system table
271indexed by the file descriptor returned by the
272.ft I
273open
274.ft R
275or
276.ft I
277create.
278.ft R
279Thus the file descriptor supplied during a subsequent
280call to read or write the
281file
282may be easily related to the information necessary to access the file.
283.pg
284When a new file is created,
285an i-node is allocated for it and a directory entry is made
286which contains the name of the file and the i-node
287number.
288Making a link to an existing file involves
289creating a directory entry with the new name,
290copying the i-number from the original file entry,
291and incrementing the link-count field of the i-node.
292Removing (deleting) a file is done by
293decrementing the
294link-count of the i-node specified by its directory entry
295and erasing the directory entry.
296If the link-count drops to 0,
297any disk blocks in the file
298are freed and the i-node is deallocated.
299.pg
300The space on all fixed or removable disks which
301contain a file system is divided into a number of
302512-byte
303blocks logically addressed from 0 up to a limit which
304depends on the device.
305There is space in the i-node of each file for eight device addresses.
306A \fIsmall\fR (non-special) file fits into eight or fewer
307blocks; in this case the addresses of the
308blocks themselves are stored.
309For \fIlarge\fR (non-special) files, seven of the eight device addresses may point
310to indirect blocks each containing 256 addresses
311for the data blocks of the file.
312If required,
313the eighth word is the address of a double-indirect block containing 256 more addresses
314of indirect blocks.
315Thus files may conceptually grow to (7+256)\v'-.3'.\v'.3'256\v'-.3'.\v'.3'512 bytes;
316actually they are restricted to
31716,777,216
318(\|2\u\*t24\*n\d\|) bytes.
319Once opened, a small file (size 1 to 8 blocks)
320can be accessed directly.
321A large file (size 9 to 32768 blocks)
322requires one additional access
323to read below logical block 1792
324(7\v'-.3'.\v'.3'256)
325and two additional references above 1792.
326.pg
327The foregoing discussion applies to ordinary files.
328When an I/O request is made to a file whose i-node indicates that it
329is special,
330the last seven device address words are immaterial,
331and the first is interpreted as
332a pair of bytes which constitute an internal
333.ft I
334device name.
335.ft R
336These bytes specify
337respectively a device type
338and subdevice number.
339The device type indicates which
340system routine will deal with I/O on that device;
341the subdevice number selects, for example, a disk drive
342attached to a particular controller or one of several
343similar typewriter interfaces.
344.pg
345In this environment, the implementation of the
346.ft I
347mount
348.ft R
349system call (\(sc3.4) is quite straightforward.
350.ft I
351Mount
352.ft R
353maintains a system table whose
354argument is the i-number and device name of the
355ordinary file specified
356during the
357.ft I
358mount,
359.ft R
360and whose corresponding value is the
361device name of the indicated special file.
362This table is searched for each (i-number, device)-pair
363which turns up while a path name is being scanned
364during an
365.it open
366or
367.it create;
368if a match is found,
369the i-number is replaced by 1 (which is the i-number of the root
370directory on all file systems),
371and the device name is replaced by the table value.
372.pg
373To the user, both reading and writing of files appear to
374be synchronous and unbuffered.
375That is, immediately after
376return from a \fIread\fR call the data are available, and conversely
377after a \fIwrite\fR the user's workspace may be reused.
378In fact the system maintains a rather complicated
379buffering mechanism which reduces greatly the number
380of I/O operations required to access a file.
381Suppose a \fIwrite\fR call is made specifying transmission
382of a single byte.
383U\*sNIX\*n will search its buffers to see
384whether the affected disk block currently resides in core memory;
385if not, it will be read in from the device.
386Then the affected byte is replaced in the buffer and an
387entry is made in a list of blocks to be written.
388The return from the \fIwrite\fR call may then take place,
389although the actual I/O may not be completed until a later time.
390Conversely, if a single byte is read, the system determines
391whether the secondary storage block in which the byte is located is already
392in one of the system's buffers; if so, the byte can be returned immediately.
393If not, the block is read into a buffer and the byte picked out.
394.pg
395The system recognizes when
396a program has
397made accesses to
398sequential blocks of a file,
399and asynchronously
400pre-reads the next block.
401This significantly reduces
402the running time of most programs
403while adding little to
404system overhead.
405.pg
406A program which reads or writes files in units of 512 bytes
407has an advantage over a program which reads or writes
408a single byte at a time, but the gain is not immense;
409it comes mainly from the avoidance of system overhead.
410A program which is used rarely or which does
411no great volume of I/O may quite reasonably
412read and write in units as small as it wishes.
413.pg
414The notion of the i-list is an unusual feature
415of \*sUNIX\*n.
416In practice, this method of organizing the file system
417has proved quite reliable and easy to deal with.
418To the system itself, one of its strengths is
419the fact that each file has a short, unambiguous name
420which is related in a simple way to the protection, addressing,
421and other information needed to access the file.
422It also permits a quite simple and rapid
423algorithm for checking the consistency of a file system,
424for example verification
425that the portions of each device containing useful information
426and those free to be allocated are disjoint and together
427exhaust the space on the device.
428This algorithm is independent
429of the directory hierarchy, since it need only scan
430the linearly-organized i-list.
431At the same time the notion of the i-list induces certain
432peculiarities not found in other file system organizations.
433For example, there is the question of who is to be charged
434for the space a file occupies,
435since all directory entries for a file have equal status.
436Charging the owner of a file is unfair in general,
437since one user may create a file, another may link to
438it, and the first user may delete the file.
439The first user is still the owner of the
440file, but it should be charged
441to the second user.
442The simplest reasonably fair algorithm
443seems to be to spread the charges
444equally among users who have links to a file.
445The current version of \*sUNIX\*n avoids the
446issue by not charging any fees at all.
447.s2
4484.1 Efficiency of the file system
449.es
450To provide an indication of the overall efficiency
451of \*sUNIX\*n and of the file system in particular,
452timings were made of the assembly of
453a 8848-line program.
454The assembly was run alone on the machine;
455the total clock time was 32 seconds,
456for a rate of 276 lines per second.
457The time was divided as follows:
45866% assembler execution time,
45921% system overhead,
46013% disk wait time.
461We will not attempt any interpretation of these figures
462nor any comparison with other systems, but merely
463note that we are
464generally satisfied with the overall performance
465of the system.