Commit | Line | Data |
---|---|---|
8340f87c BJ |
1 | .SH |
2 | 3.6 I/O calls | |
3 | .PP | |
4 | The system calls to do I/O are designed to eliminate | |
5 | the differences between the various devices and styles of | |
6 | access. | |
7 | There is no distinction between ``random'' | |
8 | and ``sequential'' I/O, nor is any logical record size imposed | |
9 | by the system. | |
10 | The size of an ordinary file is determined | |
11 | by the number of bytes written on it; | |
12 | no predetermination of the size of a file is necessary | |
13 | or possible. | |
14 | .PP | |
15 | To illustrate the essentials of I/O, | |
16 | some of the basic calls are | |
17 | summarized below | |
18 | in an anonymous language that will | |
19 | indicate the required parameters without getting into the | |
20 | underlying | |
21 | complexities. | |
22 | Each call to the system may potentially result in an error | |
23 | return, which for simplicity is not represented | |
24 | in the calling sequence. | |
25 | .PP | |
26 | To read or write a file assumed to exist already, it must | |
27 | be opened by the following call: | |
28 | .P1 | |
29 | filep = open\|(\|name, flag\|) | |
30 | .P2 | |
31 | where | |
32 | .UL name | |
33 | indicates the name of the file. | |
34 | An arbitrary path name may be given. | |
35 | The | |
36 | .UL flag | |
37 | argument indicates whether the file is to be read, written, | |
38 | or ``updated,'' that is, read and written simultaneously. | |
39 | .PP | |
40 | The returned value | |
41 | .UL filep | |
42 | is called a | |
43 | .IT "file descriptor" . | |
44 | It is a small integer used to identify the file | |
45 | in subsequent calls to read, write, | |
46 | or otherwise manipulate the file. | |
47 | .PP | |
48 | To create a new file or completely rewrite an old one, | |
49 | there is a | |
50 | .UL create | |
51 | system call that | |
52 | creates the given file if it does not exist, | |
53 | or truncates it to zero length | |
54 | if it does exist; | |
55 | .UL create | |
56 | also opens the new file for writing | |
57 | and, like | |
58 | .UL open , | |
59 | returns a file descriptor. | |
60 | .PP | |
61 | The file system maintains no locks visible to the user, nor is there any | |
62 | restriction on the number of users who may have a file | |
63 | open for reading or writing. | |
64 | Although it is possible for the contents of a file | |
65 | to become scrambled when two users write on it simultaneously, | |
66 | in practice difficulties do not arise. | |
67 | We take the view that locks are neither | |
68 | necessary nor sufficient, in our environment, | |
69 | to prevent interference between users of the same file. | |
70 | They are unnecessary because we are not | |
71 | faced with large, single-file data bases | |
72 | maintained by independent processes. | |
73 | They are insufficient because | |
74 | locks in the ordinary sense, whereby | |
75 | one user is prevented from writing on a file that another | |
76 | user is reading, | |
77 | cannot prevent confusion | |
78 | when, for example, both users are editing | |
79 | a file with an editor that makes | |
80 | a copy of the file being edited. | |
81 | .PP | |
82 | There are, however, | |
83 | sufficient internal interlocks to maintain | |
84 | the logical consistency of the file system | |
85 | when two users engage simultaneously in | |
86 | activities such as writing on | |
87 | the same file, | |
88 | creating files in the same directory, | |
89 | or deleting each other's open files. | |
90 | .PP | |
91 | Except as indicated below, reading and writing | |
92 | are sequential. | |
93 | This means that if a particular | |
94 | byte in the file was the last byte written (or read), | |
95 | the next I/O call implicitly refers to the | |
96 | immediately following byte. | |
97 | For each open file there is a pointer, maintained | |
98 | inside the system, | |
99 | that indicates the next byte to be read | |
100 | or written. | |
101 | If | |
102 | .IT n | |
103 | bytes are read or written, the pointer advances | |
104 | by | |
105 | .IT n | |
106 | bytes. | |
107 | .PP | |
108 | Once a file is open, the following calls | |
109 | may be used: | |
110 | .P1 | |
111 | n = read\|(\|filep, buffer, count\|) | |
112 | n = write\|(\|filep, buffer, count\|) | |
113 | .P2 | |
114 | Up to | |
115 | .UL count | |
116 | bytes are transmitted between the file specified | |
117 | by | |
118 | .UL filep | |
119 | and the byte array | |
120 | specified by | |
121 | .UL buffer . | |
122 | The returned value | |
123 | .UL n | |
124 | is the number of bytes actually transmitted. | |
125 | In the | |
126 | .UL write | |
127 | case, | |
128 | .UL n | |
129 | is the same as | |
130 | .UL count | |
131 | except under exceptional conditions, such as I/O errors or | |
132 | end of physical medium on special files; | |
133 | in a | |
134 | .UL read , | |
135 | however, | |
136 | .UL n | |
137 | may without error be less than | |
138 | .UL count . | |
139 | If the read pointer is so near the end of the | |
140 | file that reading | |
141 | .UL count | |
142 | characters | |
143 | would cause reading beyond the end, only sufficient | |
144 | bytes are transmitted to reach the end of the | |
145 | file; | |
146 | also, typewriter-like terminals | |
147 | never return more than one line of input. | |
148 | When a | |
149 | .UL read | |
150 | call returns with | |
151 | .UL n | |
152 | equal | |
153 | to zero, the end of the file has been reached. | |
154 | For disk files this occurs when the read pointer | |
155 | becomes equal to the current | |
156 | size of the file. | |
157 | It is possible to generate an end-of-file | |
158 | from a terminal by use of an escape | |
159 | sequence that depends on the device used. | |
160 | .PP | |
161 | Bytes written affect only those parts of a file implied by | |
162 | the position of the write pointer and the | |
163 | count; no other part of the file | |
164 | is changed. | |
165 | If the last byte lies beyond the end of the file, the | |
166 | file is made to grow as needed. | |
167 | .PP | |
168 | To do random (direct-access) I/O | |
169 | it is only necessary to move the read or write pointer | |
170 | to the appropriate location in the file. | |
171 | .P1 | |
172 | location = lseek\|(\|filep, offset, base\|) | |
173 | .P2 | |
174 | The pointer | |
175 | associated with | |
176 | .UL filep | |
177 | is moved to a position | |
178 | .UL offset | |
179 | bytes from the beginning of the file, from the current position | |
180 | of the pointer, or from the end of the file, | |
181 | depending on | |
182 | .UL base. | |
183 | .UL \&offset | |
184 | may be negative. | |
185 | For some devices (e.g., paper | |
186 | tape and | |
187 | terminals) seek calls are | |
188 | ignored. | |
189 | The actual offset from the beginning of the file | |
190 | to which the pointer was moved is returned | |
191 | in | |
192 | .UL location . | |
193 | .PP | |
194 | There are several additional system entries | |
195 | having to do with I/O and with the file | |
196 | system that will not be discussed. | |
197 | For example: | |
198 | close a file, | |
199 | get the status of a file, | |
200 | change the protection mode or the owner | |
201 | of a file, | |
202 | create a directory, | |
203 | make a link to an existing file, | |
204 | delete a file. | |
205 | .SH | |
206 | IV. IMPLEMENTATION OF THE FILE SYSTEM | |
207 | .PP | |
208 | As mentioned in Section 3.2 above, a directory entry contains | |
209 | only a name for the associated file and a pointer to the | |
210 | file itself. | |
211 | This pointer is an integer called the | |
212 | .IT i-number | |
213 | (for index number) | |
214 | of the file. | |
215 | When the file is accessed, | |
216 | its i-number is used as an index into | |
217 | a system table (the | |
218 | .IT i-list \|) | |
219 | stored in a known | |
220 | part of the device on which | |
221 | the directory resides. | |
222 | The entry found thereby (the file's | |
223 | .IT i-node \|) | |
224 | contains | |
225 | the description of the file: | |
226 | .IP i | |
227 | the user and group-\*sID\*n of its owner | |
228 | .IP ii | |
229 | its protection bits | |
230 | .IP iii | |
231 | the physical disk or tape addresses for the file contents | |
232 | .IP iv | |
233 | its size | |
234 | .IP v | |
235 | time of creation, last use, and last modification | |
236 | .IP vi | |
237 | the number of links to the file, that is, the number of times it appears in a directory | |
238 | .IP vii | |
239 | a code indicating whether the file is a directory, an ordinary file, or a special file. | |
240 | .LP | |
241 | The purpose of an | |
242 | .UL open | |
243 | or | |
244 | .UL create | |
245 | system call is to turn the path name given by the user | |
246 | into an i-number | |
247 | by searching the explicitly or implicitly named directories. | |
248 | Once a file is open, | |
249 | its device, i-number, and read/write pointer are stored in a system table | |
250 | indexed by the file descriptor returned by the | |
251 | .UL open | |
252 | or | |
253 | .UL create . | |
254 | Thus, during a subsequent | |
255 | call to read or write the | |
256 | file, | |
257 | the descriptor | |
258 | may be easily related to the information necessary to access the file. | |
259 | .PP | |
260 | When a new file is created, | |
261 | an i-node is allocated for it and a directory entry is made | |
262 | that contains the name of the file and the i-node | |
263 | number. | |
264 | Making a link to an existing file involves | |
265 | creating a directory entry with the new name, | |
266 | copying the i-number from the original file entry, | |
267 | and incrementing the link-count field of the i-node. | |
268 | Removing (deleting) a file is done by | |
269 | decrementing the | |
270 | link-count of the i-node specified by its directory entry | |
271 | and erasing the directory entry. | |
272 | If the link-count drops to 0, | |
273 | any disk blocks in the file | |
274 | are freed and the i-node is de-allocated. | |
275 | .PP | |
276 | The space on all disks that | |
277 | contain a file system is divided into a number of | |
278 | 512-byte | |
279 | blocks logically addressed from 0 up to a limit that | |
280 | depends on the device. | |
281 | There is space in the i-node of each file for 13 device addresses. | |
282 | For nonspecial files, | |
283 | the first 10 device addresses point at the first | |
284 | 10 blocks of the file. | |
285 | If the file is larger than 10 blocks, | |
286 | the 11 device address points to an | |
287 | indirect block containing up to 128 addresses | |
288 | of additional blocks in the file. | |
289 | Still larger files use the twelfth device address | |
290 | of the i-node to point to | |
291 | a double-indirect block naming | |
292 | 128 indirect blocks, | |
293 | each | |
294 | pointing to 128 blocks of the file. | |
295 | If required, | |
296 | the thirteenth device address is | |
297 | a triple-indirect block. | |
298 | Thus files may conceptually grow to | |
299 | [\|(10+128+128\u\s62\s0\d+128\u\s63\s0\d)\*m512\|] bytes. | |
300 | Once opened, | |
301 | bytes numbered below 5120 can be read with a single | |
302 | disk access; | |
303 | bytes in the range 5120 to 70,656 | |
304 | require two accesses; | |
305 | bytes in the range 70,656 | |
306 | to 8,459,264 | |
307 | require three accesses; | |
308 | bytes from there to the | |
309 | largest file | |
310 | (1,082,201,088) | |
311 | require four accesses. | |
312 | In practice, | |
313 | a device cache mechanism | |
314 | (see below) | |
315 | proves effective in eliminating | |
316 | most of the indirect fetches. | |
317 | .PP | |
318 | The foregoing discussion applies to ordinary files. | |
319 | When an I/O request is made to a file whose i-node indicates that it | |
320 | is special, | |
321 | the last 12 device address words are immaterial, | |
322 | and the first specifies | |
323 | an internal | |
324 | .IT "device name" , | |
325 | which is interpreted as a pair of numbers | |
326 | representing, | |
327 | respectively, a device type | |
328 | and subdevice number. | |
329 | The device type indicates which | |
330 | system routine will deal with I/O on that device; | |
331 | the subdevice number selects, for example, a disk drive | |
332 | attached to a particular controller or one of several | |
333 | similar terminal interfaces. | |
334 | .PP | |
335 | In this environment, the implementation of the | |
336 | .UL mount | |
337 | system call (Section 3.4) is quite straightforward. | |
338 | .UL \&mount | |
339 | maintains a system table whose | |
340 | argument is the i-number and device name of the | |
341 | ordinary file specified | |
342 | during the | |
343 | .UL mount , | |
344 | and whose corresponding value is the | |
345 | device name of the indicated special file. | |
346 | This table is searched for each i-number/device pair | |
347 | that turns up while a path name is being scanned | |
348 | during an | |
349 | .UL open | |
350 | or | |
351 | .UL create ; | |
352 | if a match is found, | |
353 | the i-number is replaced by the i-number of the root | |
354 | directory | |
355 | and the device name is replaced by the table value. | |
356 | .PP | |
357 | To the user, both reading and writing of files appear to | |
358 | be synchronous and unbuffered. | |
359 | That is, immediately after | |
360 | return from a | |
361 | .UL read | |
362 | call the data are available; conversely, | |
363 | after a | |
364 | .UL write | |
365 | the user's workspace may be reused. | |
366 | In fact, the system maintains a rather complicated | |
367 | buffering mechanism that reduces greatly the number | |
368 | of I/O operations required to access a file. | |
369 | Suppose a | |
370 | .UL write | |
371 | call is made specifying transmission | |
372 | of a single byte. | |
373 | The system | |
374 | will search its buffers to see | |
375 | whether the affected disk block currently resides in main memory; | |
376 | if not, it will be read in from the device. | |
377 | Then the affected byte is replaced in the buffer and an | |
378 | entry is made in a list of blocks to be written. | |
379 | The return from the | |
380 | .UL write | |
381 | call may then take place, | |
382 | although the actual I/O may not be completed until a later time. | |
383 | Conversely, if a single byte is read, the system determines | |
384 | whether the secondary storage block in which the byte is located is already | |
385 | in one of the system's buffers; if so, the byte can be returned immediately. | |
386 | If not, the block is read into a buffer and the byte picked out. | |
387 | .PP | |
388 | The system recognizes when | |
389 | a program has | |
390 | made accesses to | |
391 | sequential blocks of a file, | |
392 | and asynchronously | |
393 | pre-reads the next block. | |
394 | This significantly reduces | |
395 | the running time of most programs | |
396 | while adding little to | |
397 | system overhead. | |
398 | .PP | |
399 | A program that reads or writes files in units of 512 bytes | |
400 | has an advantage over a program that reads or writes | |
401 | a single byte at a time, but the gain is not immense; | |
402 | it comes mainly from the avoidance of system overhead. | |
403 | If a program is used rarely or does | |
404 | no great volume of I/O, it may quite reasonably | |
405 | read and write in units as small as it wishes. | |
406 | .PP | |
407 | The notion of the i-list is an unusual feature | |
408 | of | |
409 | .UX . | |
410 | In practice, this method of organizing the file system | |
411 | has proved quite reliable and easy to deal with. | |
412 | To the system itself, one of its strengths is | |
413 | the fact that each file has a short, unambiguous name | |
414 | related in a simple way to the protection, addressing, | |
415 | and other information needed to access the file. | |
416 | It also permits a quite simple and rapid | |
417 | algorithm for checking the consistency of a file system, | |
418 | for example, verification | |
419 | that the portions of each device containing useful information | |
420 | and those free to be allocated are disjoint and together | |
421 | exhaust the space on the device. | |
422 | This algorithm is independent | |
423 | of the directory hierarchy, because it need only scan | |
424 | the linearly organized i-list. | |
425 | At the same time the notion of the i-list induces certain | |
426 | peculiarities not found in other file system organizations. | |
427 | For example, there is the question of who is to be charged | |
428 | for the space a file occupies, | |
429 | because all directory entries for a file have equal status. | |
430 | Charging the owner of a file is unfair in general, | |
431 | for one user may create a file, another may link to | |
432 | it, and the first user may delete the file. | |
433 | The first user is still the owner of the | |
434 | file, but it should be charged | |
435 | to the second user. | |
436 | The simplest reasonably fair algorithm | |
437 | seems to be to spread the charges | |
438 | equally among users who have links to a file. | |
439 | Many installations | |
440 | avoid the | |
441 | issue by not charging any fees at all. |