Commit | Line | Data |
---|---|---|
9ddf7a71 DR |
1 | .s2 |
2 | 3.6 I/O calls | |
3 | .es | |
4 | The system calls to do I/O are designed to eliminate | |
5 | the differences between the various devices and styles of | |
6 | access. | |
7 | There is no distinction between ``random'' | |
8 | and ``sequential'' I/O, nor is any logical record size imposed | |
9 | by the system. | |
10 | The size of an ordinary file is determined | |
11 | by the highest byte written on it; | |
12 | no predetermination of the size of a file is necessary | |
13 | or possible. | |
14 | .pg | |
15 | To illustrate the essentials of I/O in \*sUNIX\*n, some of the basic calls are | |
16 | summarized below | |
17 | in an anonymous language which will | |
18 | indicate the required parameters without getting into the | |
19 | complexities of machine language programming. | |
20 | Each call to the system may potentially result in an error | |
21 | return, which for simplicity is not represented | |
22 | in the calling sequence. | |
23 | .pg | |
24 | To read or write a file assumed to exist already, it must | |
25 | be opened by the following call: | |
26 | .dc | |
27 | filep = open\|(\|name, flag\|) | |
28 | .ec | |
29 | \fIName\fR indicates the name of the file. | |
30 | An arbitrary path name may be given. | |
31 | The \fIflag\fR argument indicates whether the file is to be read, written, | |
32 | or ``updated,'' that is read and written simultaneously. | |
33 | .pg | |
34 | The returned value | |
35 | .ft I | |
36 | filep | |
37 | .ft R | |
38 | is called a | |
39 | .ft I | |
40 | file descriptor. | |
41 | .ft R | |
42 | It is a small integer used to identify the file | |
43 | in subsequent calls to read, write | |
44 | or otherwise manipulate the file. | |
45 | .pg | |
46 | To create a new file or completely rewrite an old one, | |
47 | there is a \fIcreate\fR system call which | |
48 | creates the given file if it does not exist, | |
49 | or truncates it to zero length | |
50 | if it does exist. | |
51 | \fICreate\fR also opens the new file for writing | |
52 | and, like \fIopen,\fR returns a file descriptor. | |
53 | .pg | |
54 | There are no user-visible locks in the file system, nor is there any | |
55 | restriction on the number of users who may have a file | |
56 | open for reading or writing. | |
57 | Although it is possible for the contents of a file | |
58 | to become scrambled when two users write on it simultaneously, | |
59 | in practice difficulties do not arise. | |
60 | We take the view that locks are neither | |
61 | necessary nor sufficient, in our environment, | |
62 | to prevent interference between users of the same file. | |
63 | They are unnecessary because we are not | |
64 | faced with large, single-file data bases | |
65 | maintained by independent processes. | |
66 | They are insufficient because | |
67 | locks in the ordinary sense, whereby | |
68 | one user is prevented from writing on a file which another | |
69 | user is reading, | |
70 | cannot prevent confusion | |
71 | when, for example, both users are editing | |
72 | a file with an editor which makes | |
73 | a copy of the file being edited. | |
74 | .pg | |
75 | It should be said that the system has | |
76 | sufficient internal interlocks to maintain | |
77 | the logical consistency of the file system | |
78 | when two users engage simultaneously in such inconvenient | |
79 | activities as writing on | |
80 | the same file, | |
81 | creating files in the same directory, | |
82 | or deleting each other's open files. | |
83 | .pg | |
84 | Except as indicated below, reading and writing | |
85 | are sequential. | |
86 | This means that if a particular | |
87 | byte in the file was the last byte written (or read), | |
88 | the next I/O call implicitly refers to the | |
89 | first following byte. | |
90 | For each open file there is a pointer, maintained | |
91 | by the system, | |
92 | which indicates the next byte to be read | |
93 | or written. | |
94 | If \fIn\fR bytes are read or written, the pointer advances | |
95 | by \fIn\fR bytes. | |
96 | .pg | |
97 | Once a file is open, the following calls | |
98 | may be used. | |
99 | .dc | |
100 | n = read\|(\|filep, buffer, count\|) | |
101 | .dc | |
102 | n = write\|(\|filep, buffer, count\|) | |
103 | .ec | |
104 | Up to | |
105 | .ft I | |
106 | count | |
107 | .ft R | |
108 | bytes are transmitted between the file specified | |
109 | by | |
110 | .ft I | |
111 | filep | |
112 | .ft R | |
113 | and the byte array | |
114 | specified by | |
115 | .ft I | |
116 | buffer. | |
117 | .ft R | |
118 | The returned value | |
119 | .ft I | |
120 | n | |
121 | .ft R | |
122 | is the number of bytes actually transmitted. | |
123 | In the | |
124 | .ft I | |
125 | write | |
126 | .ft R | |
127 | case, | |
128 | .ft I | |
129 | n | |
130 | .ft R | |
131 | is the same as | |
132 | .ft I | |
133 | count | |
134 | .ft R | |
135 | except under exceptional conditions like I/O errors or | |
136 | end of physical medium on special files; | |
137 | in a | |
138 | .ft I | |
139 | read, | |
140 | .ft R | |
141 | however, | |
142 | .ft I | |
143 | n | |
144 | .ft R | |
145 | may without error be less than | |
146 | .ft I | |
147 | count. | |
148 | .ft R | |
149 | If the read pointer is so near the end of the | |
150 | file that reading \fIcount\fR characters | |
151 | would cause reading beyond the end, only sufficient | |
152 | bytes are transmitted to reach the end of the | |
153 | file; | |
154 | also, typewriter-like devices | |
155 | never return more than one line of input. | |
156 | When a \fIread\fR call returns with \fIn\fR equal | |
157 | to zero, it indicates the end of the file. | |
158 | For disk files this occurs when the read pointer | |
159 | becomes equal to the current | |
160 | size of the file. | |
161 | It is possible to generate an end-of-file | |
162 | from a typewriter by use of an escape | |
163 | sequence which depends on the device used. | |
164 | .pg | |
165 | Bytes written on a file affect only those implied by | |
166 | the position of the write pointer and the | |
167 | count; no other part of the file | |
168 | is changed. | |
169 | If the last byte lies beyond the end of the file, the | |
170 | file is grown as needed. | |
171 | .pg | |
172 | To do random (direct access) I/O | |
173 | it is only necessary to move the read or write pointer | |
174 | to the appropriate location in the file. | |
175 | .dc | |
176 | location = seek\|(\|filep, offset, base\|) | |
177 | .ec | |
178 | The pointer | |
179 | associated with \fIfilep\fR is moved to a position \fIoffset\fR | |
180 | bytes from the beginning of the file, from the current position | |
181 | of the pointer, or from the end of the file, | |
182 | depending on | |
183 | .ft I | |
184 | base. | |
185 | .ft R | |
186 | .ft I | |
187 | Offset | |
188 | .ft R | |
189 | may be negative. | |
190 | For some devices (e.g. paper | |
191 | tape and | |
192 | typewriters) seek calls are | |
193 | ignored. | |
194 | The actual offset from the beginning of the file | |
195 | to which the pointer was moved is returned | |
196 | in \fIlocation\fR. | |
197 | .s2 | |
198 | 3.6.1 Other I/O calls | |
199 | .es | |
200 | There are several additional system entries | |
201 | having to do with I/O and with the file | |
202 | system which will not be discussed. | |
203 | For example: | |
204 | close a file, | |
205 | get the status of a file, | |
206 | change the protection mode or the owner | |
207 | of a file, | |
208 | create a directory, | |
209 | make a link to an existing file, | |
210 | delete a file. | |
211 | .s1 | |
212 | 4. Implementation of the file system | |
213 | .es | |
214 | As mentioned in \(sc3.2 above, a directory entry contains | |
215 | only a name for the associated file and a pointer to the | |
216 | file itself. | |
217 | This pointer is an integer called the | |
218 | .ft I | |
219 | i-number | |
220 | .ft | |
221 | (for index number) | |
222 | of the file. | |
223 | When the file is accessed, | |
224 | its i-number is used as an index into | |
225 | a system table (the \fIi-list\|\fR) stored in a known | |
226 | part of the device on which | |
227 | the directory resides. | |
228 | The entry thereby found (the file's | |
229 | \fIi-node\fR\|) | |
230 | contains | |
231 | the description of the file: | |
232 | .sp | |
233 | .tr | | |
234 | .in .5i | |
235 | .ti -\w'1. 'u | |
236 | 1.|its owner; | |
237 | .ti -\w'1. 'u | |
238 | 2.|its protection bits; | |
239 | .ti -\w'1. 'u | |
240 | 3.|the physical disk or tape addresses for the file contents; | |
241 | .ti -\w'1. 'u | |
242 | 4.|its size; | |
243 | .ti -\w'1. 'u | |
244 | 5.|time of last modification; | |
245 | .ti -\w'1. 'u | |
246 | 6.|the number of links to the file; that is, the number | |
247 | of times it appears in a directory; | |
248 | .ti -\w'1. 'u | |
249 | 7.|a bit indicating whether the | |
250 | file is a directory; | |
251 | .ti -\w'1. 'u | |
252 | 8.|a bit indicating whether the file is a special file; | |
253 | .ti -\w'1. 'u | |
254 | 9.|a bit indicating whether the file is ``large'' or ``small.'' | |
255 | .sp | |
256 | .tr || | |
257 | .in 0 | |
258 | The purpose of an | |
259 | .ft I | |
260 | open | |
261 | .ft R | |
262 | or | |
263 | .ft I | |
264 | create | |
265 | .ft R | |
266 | system call is to turn the path name given by the user | |
267 | into an i-number | |
268 | by searching the explicitly or implicitly named directories. | |
269 | Once a file is open, | |
270 | its device, i-number, and read/write pointer are stored in a system table | |
271 | indexed by the file descriptor returned by the | |
272 | .ft I | |
273 | open | |
274 | .ft R | |
275 | or | |
276 | .ft I | |
277 | create. | |
278 | .ft R | |
279 | Thus the file descriptor supplied during a subsequent | |
280 | call to read or write the | |
281 | file | |
282 | may be easily related to the information necessary to access the file. | |
283 | .pg | |
284 | When a new file is created, | |
285 | an i-node is allocated for it and a directory entry is made | |
286 | which contains the name of the file and the i-node | |
287 | number. | |
288 | Making a link to an existing file involves | |
289 | creating a directory entry with the new name, | |
290 | copying the i-number from the original file entry, | |
291 | and incrementing the link-count field of the i-node. | |
292 | Removing (deleting) a file is done by | |
293 | decrementing the | |
294 | link-count of the i-node specified by its directory entry | |
295 | and erasing the directory entry. | |
296 | If the link-count drops to 0, | |
297 | any disk blocks in the file | |
298 | are freed and the i-node is deallocated. | |
299 | .pg | |
300 | The space on all fixed or removable disks which | |
301 | contain a file system is divided into a number of | |
302 | 512-byte | |
303 | blocks logically addressed from 0 up to a limit which | |
304 | depends on the device. | |
305 | There is space in the i-node of each file for eight device addresses. | |
306 | A \fIsmall\fR (non-special) file fits into eight or fewer | |
307 | blocks; in this case the addresses of the | |
308 | blocks themselves are stored. | |
309 | For \fIlarge\fR (non-special) files, seven of the eight device addresses may point | |
310 | to indirect blocks each containing 256 addresses | |
311 | for the data blocks of the file. | |
312 | If required, | |
313 | the eighth word is the address of a double-indirect block containing 256 more addresses | |
314 | of indirect blocks. | |
315 | Thus files may conceptually grow to (7+256)\v'-.3'.\v'.3'256\v'-.3'.\v'.3'512 bytes; | |
316 | actually they are restricted to | |
317 | 16,777,216 | |
318 | (\|2\u\*t24\*n\d\|) bytes. | |
319 | Once opened, a small file (size 1 to 8 blocks) | |
320 | can be accessed directly. | |
321 | A large file (size 9 to 32768 blocks) | |
322 | requires one additional access | |
323 | to read below logical block 1792 | |
324 | (7\v'-.3'.\v'.3'256) | |
325 | and two additional references above 1792. | |
326 | .pg | |
327 | The foregoing discussion applies to ordinary files. | |
328 | When an I/O request is made to a file whose i-node indicates that it | |
329 | is special, | |
330 | the last seven device address words are immaterial, | |
331 | and the first is interpreted as | |
332 | a pair of bytes which constitute an internal | |
333 | .ft I | |
334 | device name. | |
335 | .ft R | |
336 | These bytes specify | |
337 | respectively a device type | |
338 | and subdevice number. | |
339 | The device type indicates which | |
340 | system routine will deal with I/O on that device; | |
341 | the subdevice number selects, for example, a disk drive | |
342 | attached to a particular controller or one of several | |
343 | similar typewriter interfaces. | |
344 | .pg | |
345 | In this environment, the implementation of the | |
346 | .ft I | |
347 | mount | |
348 | .ft R | |
349 | system call (\(sc3.4) is quite straightforward. | |
350 | .ft I | |
351 | Mount | |
352 | .ft R | |
353 | maintains a system table whose | |
354 | argument is the i-number and device name of the | |
355 | ordinary file specified | |
356 | during the | |
357 | .ft I | |
358 | mount, | |
359 | .ft R | |
360 | and whose corresponding value is the | |
361 | device name of the indicated special file. | |
362 | This table is searched for each (i-number, device)-pair | |
363 | which turns up while a path name is being scanned | |
364 | during an | |
365 | .it open | |
366 | or | |
367 | .it create; | |
368 | if a match is found, | |
369 | the i-number is replaced by 1 (which is the i-number of the root | |
370 | directory on all file systems), | |
371 | and the device name is replaced by the table value. | |
372 | .pg | |
373 | To the user, both reading and writing of files appear to | |
374 | be synchronous and unbuffered. | |
375 | That is, immediately after | |
376 | return from a \fIread\fR call the data are available, and conversely | |
377 | after a \fIwrite\fR the user's workspace may be reused. | |
378 | In fact the system maintains a rather complicated | |
379 | buffering mechanism which reduces greatly the number | |
380 | of I/O operations required to access a file. | |
381 | Suppose a \fIwrite\fR call is made specifying transmission | |
382 | of a single byte. | |
383 | U\*sNIX\*n will search its buffers to see | |
384 | whether the affected disk block currently resides in core memory; | |
385 | if not, it will be read in from the device. | |
386 | Then the affected byte is replaced in the buffer and an | |
387 | entry is made in a list of blocks to be written. | |
388 | The return from the \fIwrite\fR call may then take place, | |
389 | although the actual I/O may not be completed until a later time. | |
390 | Conversely, if a single byte is read, the system determines | |
391 | whether the secondary storage block in which the byte is located is already | |
392 | in one of the system's buffers; if so, the byte can be returned immediately. | |
393 | If not, the block is read into a buffer and the byte picked out. | |
394 | .pg | |
395 | The system recognizes when | |
396 | a program has | |
397 | made accesses to | |
398 | sequential blocks of a file, | |
399 | and asynchronously | |
400 | pre-reads the next block. | |
401 | This significantly reduces | |
402 | the running time of most programs | |
403 | while adding little to | |
404 | system overhead. | |
405 | .pg | |
406 | A program which reads or writes files in units of 512 bytes | |
407 | has an advantage over a program which reads or writes | |
408 | a single byte at a time, but the gain is not immense; | |
409 | it comes mainly from the avoidance of system overhead. | |
410 | A program which is used rarely or which does | |
411 | no great volume of I/O may quite reasonably | |
412 | read and write in units as small as it wishes. | |
413 | .pg | |
414 | The notion of the i-list is an unusual feature | |
415 | of \*sUNIX\*n. | |
416 | In practice, this method of organizing the file system | |
417 | has proved quite reliable and easy to deal with. | |
418 | To the system itself, one of its strengths is | |
419 | the fact that each file has a short, unambiguous name | |
420 | which is related in a simple way to the protection, addressing, | |
421 | and other information needed to access the file. | |
422 | It also permits a quite simple and rapid | |
423 | algorithm for checking the consistency of a file system, | |
424 | for example verification | |
425 | that the portions of each device containing useful information | |
426 | and those free to be allocated are disjoint and together | |
427 | exhaust the space on the device. | |
428 | This algorithm is independent | |
429 | of the directory hierarchy, since it need only scan | |
430 | the linearly-organized i-list. | |
431 | At the same time the notion of the i-list induces certain | |
432 | peculiarities not found in other file system organizations. | |
433 | For example, there is the question of who is to be charged | |
434 | for the space a file occupies, | |
435 | since all directory entries for a file have equal status. | |
436 | Charging the owner of a file is unfair in general, | |
437 | since one user may create a file, another may link to | |
438 | it, and the first user may delete the file. | |
439 | The first user is still the owner of the | |
440 | file, but it should be charged | |
441 | to the second user. | |
442 | The simplest reasonably fair algorithm | |
443 | seems to be to spread the charges | |
444 | equally among users who have links to a file. | |
445 | The current version of \*sUNIX\*n avoids the | |
446 | issue by not charging any fees at all. | |
447 | .s2 | |
448 | 4.1 Efficiency of the file system | |
449 | .es | |
450 | To provide an indication of the overall efficiency | |
451 | of \*sUNIX\*n and of the file system in particular, | |
452 | timings were made of the assembly of | |
453 | a 8848-line program. | |
454 | The assembly was run alone on the machine; | |
455 | the total clock time was 32 seconds, | |
456 | for a rate of 276 lines per second. | |
457 | The time was divided as follows: | |
458 | 66% assembler execution time, | |
459 | 21% system overhead, | |
460 | 13% disk wait time. | |
461 | We will not attempt any interpretation of these figures | |
462 | nor any comparison with other systems, but merely | |
463 | note that we are | |
464 | generally satisfied with the overall performance | |
465 | of the system. |