include problem
[unix-history] / .ref-BSD-3 / usr / doc / cacm / p2
CommitLineData
8340f87c
BJ
1.SH
23.6 I/O calls
3.PP
4The system calls to do I/O are designed to eliminate
5the differences between the various devices and styles of
6access.
7There is no distinction between ``random''
8and ``sequential'' I/O, nor is any logical record size imposed
9by the system.
10The size of an ordinary file is determined
11by the number of bytes written on it;
12no predetermination of the size of a file is necessary
13or possible.
14.PP
15To illustrate the essentials of I/O,
16some of the basic calls are
17summarized below
18in an anonymous language that will
19indicate the required parameters without getting into the
20underlying
21complexities.
22Each call to the system may potentially result in an error
23return, which for simplicity is not represented
24in the calling sequence.
25.PP
26To read or write a file assumed to exist already, it must
27be opened by the following call:
28.P1
29filep = open\|(\|name, flag\|)
30.P2
31where
32.UL name
33indicates the name of the file.
34An arbitrary path name may be given.
35The
36.UL flag
37argument indicates whether the file is to be read, written,
38or ``updated,'' that is, read and written simultaneously.
39.PP
40The returned value
41.UL filep
42is called a
43.IT "file descriptor" .
44It is a small integer used to identify the file
45in subsequent calls to read, write,
46or otherwise manipulate the file.
47.PP
48To create a new file or completely rewrite an old one,
49there is a
50.UL create
51system call that
52creates the given file if it does not exist,
53or truncates it to zero length
54if it does exist;
55.UL create
56also opens the new file for writing
57and, like
58.UL open ,
59returns a file descriptor.
60.PP
61The file system maintains no locks visible to the user, nor is there any
62restriction on the number of users who may have a file
63open for reading or writing.
64Although it is possible for the contents of a file
65to become scrambled when two users write on it simultaneously,
66in practice difficulties do not arise.
67We take the view that locks are neither
68necessary nor sufficient, in our environment,
69to prevent interference between users of the same file.
70They are unnecessary because we are not
71faced with large, single-file data bases
72maintained by independent processes.
73They are insufficient because
74locks in the ordinary sense, whereby
75one user is prevented from writing on a file that another
76user is reading,
77cannot prevent confusion
78when, for example, both users are editing
79a file with an editor that makes
80a copy of the file being edited.
81.PP
82There are, however,
83sufficient internal interlocks to maintain
84the logical consistency of the file system
85when two users engage simultaneously in
86activities such as writing on
87the same file,
88creating files in the same directory,
89or deleting each other's open files.
90.PP
91Except as indicated below, reading and writing
92are sequential.
93This means that if a particular
94byte in the file was the last byte written (or read),
95the next I/O call implicitly refers to the
96immediately following byte.
97For each open file there is a pointer, maintained
98inside the system,
99that indicates the next byte to be read
100or written.
101If
102.IT n
103bytes are read or written, the pointer advances
104by
105.IT n
106bytes.
107.PP
108Once a file is open, the following calls
109may be used:
110.P1
111n = read\|(\|filep, buffer, count\|)
112n = write\|(\|filep, buffer, count\|)
113.P2
114Up to
115.UL count
116bytes are transmitted between the file specified
117by
118.UL filep
119and the byte array
120specified by
121.UL buffer .
122The returned value
123.UL n
124is the number of bytes actually transmitted.
125In the
126.UL write
127case,
128.UL n
129is the same as
130.UL count
131except under exceptional conditions, such as I/O errors or
132end of physical medium on special files;
133in a
134.UL read ,
135however,
136.UL n
137may without error be less than
138.UL count .
139If the read pointer is so near the end of the
140file that reading
141.UL count
142characters
143would cause reading beyond the end, only sufficient
144bytes are transmitted to reach the end of the
145file;
146also, typewriter-like terminals
147never return more than one line of input.
148When a
149.UL read
150call returns with
151.UL n
152equal
153to zero, the end of the file has been reached.
154For disk files this occurs when the read pointer
155becomes equal to the current
156size of the file.
157It is possible to generate an end-of-file
158from a terminal by use of an escape
159sequence that depends on the device used.
160.PP
161Bytes written affect only those parts of a file implied by
162the position of the write pointer and the
163count; no other part of the file
164is changed.
165If the last byte lies beyond the end of the file, the
166file is made to grow as needed.
167.PP
168To do random (direct-access) I/O
169it is only necessary to move the read or write pointer
170to the appropriate location in the file.
171.P1
172location = lseek\|(\|filep, offset, base\|)
173.P2
174The pointer
175associated with
176.UL filep
177is moved to a position
178.UL offset
179bytes from the beginning of the file, from the current position
180of the pointer, or from the end of the file,
181depending on
182.UL base.
183.UL \&offset
184may be negative.
185For some devices (e.g., paper
186tape and
187terminals) seek calls are
188ignored.
189The actual offset from the beginning of the file
190to which the pointer was moved is returned
191in
192.UL location .
193.PP
194There are several additional system entries
195having to do with I/O and with the file
196system that will not be discussed.
197For example:
198close a file,
199get the status of a file,
200change the protection mode or the owner
201of a file,
202create a directory,
203make a link to an existing file,
204delete a file.
205.SH
206IV. IMPLEMENTATION OF THE FILE SYSTEM
207.PP
208As mentioned in Section 3.2 above, a directory entry contains
209only a name for the associated file and a pointer to the
210file itself.
211This pointer is an integer called the
212.IT i-number
213(for index number)
214of the file.
215When the file is accessed,
216its i-number is used as an index into
217a system table (the
218.IT i-list \|)
219stored in a known
220part of the device on which
221the directory resides.
222The entry found thereby (the file's
223.IT i-node \|)
224contains
225the description of the file:
226.IP i
227the user and group-\*sID\*n of its owner
228.IP ii
229its protection bits
230.IP iii
231the physical disk or tape addresses for the file contents
232.IP iv
233its size
234.IP v
235time of creation, last use, and last modification
236.IP vi
237the number of links to the file, that is, the number of times it appears in a directory
238.IP vii
239a code indicating whether the file is a directory, an ordinary file, or a special file.
240.LP
241The purpose of an
242.UL open
243or
244.UL create
245system call is to turn the path name given by the user
246into an i-number
247by searching the explicitly or implicitly named directories.
248Once a file is open,
249its device, i-number, and read/write pointer are stored in a system table
250indexed by the file descriptor returned by the
251.UL open
252or
253.UL create .
254Thus, during a subsequent
255call to read or write the
256file,
257the descriptor
258may be easily related to the information necessary to access the file.
259.PP
260When a new file is created,
261an i-node is allocated for it and a directory entry is made
262that contains the name of the file and the i-node
263number.
264Making a link to an existing file involves
265creating a directory entry with the new name,
266copying the i-number from the original file entry,
267and incrementing the link-count field of the i-node.
268Removing (deleting) a file is done by
269decrementing the
270link-count of the i-node specified by its directory entry
271and erasing the directory entry.
272If the link-count drops to 0,
273any disk blocks in the file
274are freed and the i-node is de-allocated.
275.PP
276The space on all disks that
277contain a file system is divided into a number of
278512-byte
279blocks logically addressed from 0 up to a limit that
280depends on the device.
281There is space in the i-node of each file for 13 device addresses.
282For nonspecial files,
283the first 10 device addresses point at the first
28410 blocks of the file.
285If the file is larger than 10 blocks,
286the 11 device address points to an
287indirect block containing up to 128 addresses
288of additional blocks in the file.
289Still larger files use the twelfth device address
290of the i-node to point to
291a double-indirect block naming
292128 indirect blocks,
293each
294pointing to 128 blocks of the file.
295If required,
296the thirteenth device address is
297a triple-indirect block.
298Thus files may conceptually grow to
299[\|(10+128+128\u\s62\s0\d+128\u\s63\s0\d)\*m512\|] bytes.
300Once opened,
301bytes numbered below 5120 can be read with a single
302disk access;
303bytes in the range 5120 to 70,656
304require two accesses;
305bytes in the range 70,656
306to 8,459,264
307require three accesses;
308bytes from there to the
309largest file
310(1,082,201,088)
311require four accesses.
312In practice,
313a device cache mechanism
314(see below)
315proves effective in eliminating
316most of the indirect fetches.
317.PP
318The foregoing discussion applies to ordinary files.
319When an I/O request is made to a file whose i-node indicates that it
320is special,
321the last 12 device address words are immaterial,
322and the first specifies
323an internal
324.IT "device name" ,
325which is interpreted as a pair of numbers
326representing,
327respectively, a device type
328and subdevice number.
329The device type indicates which
330system routine will deal with I/O on that device;
331the subdevice number selects, for example, a disk drive
332attached to a particular controller or one of several
333similar terminal interfaces.
334.PP
335In this environment, the implementation of the
336.UL mount
337system call (Section 3.4) is quite straightforward.
338.UL \&mount
339maintains a system table whose
340argument is the i-number and device name of the
341ordinary file specified
342during the
343.UL mount ,
344and whose corresponding value is the
345device name of the indicated special file.
346This table is searched for each i-number/device pair
347that turns up while a path name is being scanned
348during an
349.UL open
350or
351.UL create ;
352if a match is found,
353the i-number is replaced by the i-number of the root
354directory
355and the device name is replaced by the table value.
356.PP
357To the user, both reading and writing of files appear to
358be synchronous and unbuffered.
359That is, immediately after
360return from a
361.UL read
362call the data are available; conversely,
363after a
364.UL write
365the user's workspace may be reused.
366In fact, the system maintains a rather complicated
367buffering mechanism that reduces greatly the number
368of I/O operations required to access a file.
369Suppose a
370.UL write
371call is made specifying transmission
372of a single byte.
373The system
374will search its buffers to see
375whether the affected disk block currently resides in main memory;
376if not, it will be read in from the device.
377Then the affected byte is replaced in the buffer and an
378entry is made in a list of blocks to be written.
379The return from the
380.UL write
381call may then take place,
382although the actual I/O may not be completed until a later time.
383Conversely, if a single byte is read, the system determines
384whether the secondary storage block in which the byte is located is already
385in one of the system's buffers; if so, the byte can be returned immediately.
386If not, the block is read into a buffer and the byte picked out.
387.PP
388The system recognizes when
389a program has
390made accesses to
391sequential blocks of a file,
392and asynchronously
393pre-reads the next block.
394This significantly reduces
395the running time of most programs
396while adding little to
397system overhead.
398.PP
399A program that reads or writes files in units of 512 bytes
400has an advantage over a program that reads or writes
401a single byte at a time, but the gain is not immense;
402it comes mainly from the avoidance of system overhead.
403If a program is used rarely or does
404no great volume of I/O, it may quite reasonably
405read and write in units as small as it wishes.
406.PP
407The notion of the i-list is an unusual feature
408of
409.UX .
410In practice, this method of organizing the file system
411has proved quite reliable and easy to deal with.
412To the system itself, one of its strengths is
413the fact that each file has a short, unambiguous name
414related in a simple way to the protection, addressing,
415and other information needed to access the file.
416It also permits a quite simple and rapid
417algorithm for checking the consistency of a file system,
418for example, verification
419that the portions of each device containing useful information
420and those free to be allocated are disjoint and together
421exhaust the space on the device.
422This algorithm is independent
423of the directory hierarchy, because it need only scan
424the linearly organized i-list.
425At the same time the notion of the i-list induces certain
426peculiarities not found in other file system organizations.
427For example, there is the question of who is to be charged
428for the space a file occupies,
429because all directory entries for a file have equal status.
430Charging the owner of a file is unfair in general,
431for one user may create a file, another may link to
432it, and the first user may delete the file.
433The first user is still the owner of the
434file, but it should be charged
435to the second user.
436The simplest reasonably fair algorithm
437seems to be to spread the charges
438equally among users who have links to a file.
439Many installations
440avoid the
441issue by not charging any fees at all.