Commit | Line | Data |
---|---|---|
5c938adf KB |
1 | .\" Copyright (c) 1982, 1993 |
2 | .\" The Regents of the University of California. All rights reserved. | |
a38b2411 | 3 | .\" |
3edcb7c8 KB |
4 | .\" %sccs.include.redist.roff% |
5 | .\" | |
5c938adf | 6 | .\" @(#)2.t 8.1 (Berkeley) %G% |
a38b2411 KM |
7 | .\" |
8 | .ds RH Overview of the file system | |
9 | .NH | |
10 | Overview of the file system | |
11 | .PP | |
6a1194d8 | 12 | The file system is discussed in detail in [Mckusick84]; |
a38b2411 KM |
13 | this section gives a brief overview. |
14 | .NH 2 | |
15 | Superblock | |
16 | .PP | |
17 | A file system is described by its | |
18 | .I "super-block" . | |
19 | The super-block is built when the file system is created (\c | |
20 | .I newfs (8)) | |
21 | and never changes. | |
22 | The super-block | |
23 | contains the basic parameters of the file system, | |
24 | such as the number of data blocks it contains | |
25 | and a count of the maximum number of files. | |
26 | Because the super-block contains critical data, | |
27 | .I newfs | |
28 | replicates it to protect against catastrophic loss. | |
29 | The | |
30 | .I "default super block" | |
31 | always resides at a fixed offset from the beginning | |
32 | of the file system's disk partition. | |
33 | The | |
34 | .I "redundant super blocks" | |
35 | are not referenced unless a head crash | |
36 | or other hard disk error causes the default super-block | |
37 | to be unusable. | |
38 | The redundant blocks are sprinkled throughout the disk partition. | |
39 | .PP | |
40 | Within the file system are files. | |
41 | Certain files are distinguished as directories and contain collections | |
42 | of pointers to files that may themselves be directories. | |
43 | Every file has a descriptor associated with it called an | |
44 | .I "inode". | |
45 | The inode contains information describing ownership of the file, | |
46 | time stamps indicating modification and access times for the file, | |
47 | and an array of indices pointing to the data blocks for the file. | |
48 | In this section, | |
49 | we assume that the first 12 blocks | |
50 | of the file are directly referenced by values stored | |
51 | in the inode structure itself\(dg. | |
52 | .FS | |
53 | \(dgThe actual number may vary from system to system, but is usually in | |
54 | the range 5-13. | |
55 | .FE | |
56 | The inode structure may also contain references to indirect blocks | |
57 | containing further data block indices. | |
58 | In a file system with a 4096 byte block size, a singly indirect | |
59 | block contains 1024 further block addresses, | |
60 | a doubly indirect block contains 1024 addresses of further single indirect | |
61 | blocks, | |
62 | and a triply indirect block contains 1024 addresses of further doubly indirect | |
145bc69d | 63 | blocks (the triple indirect block is never needed in practice). |
a38b2411 KM |
64 | .PP |
65 | In order to create files with up to | |
66 | 2\(ua32 bytes, | |
67 | using only two levels of indirection, | |
68 | the minimum size of a file system block is 4096 bytes. | |
69 | The size of file system blocks can be any power of two | |
70 | greater than or equal to 4096. | |
71 | The block size of the file system is maintained in the super-block, | |
72 | so it is possible for file systems of different block sizes | |
73 | to be accessible simultaneously on the same system. | |
74 | The block size must be decided when | |
75 | .I newfs | |
76 | creates the file system; | |
77 | the block size cannot be subsequently | |
78 | changed without rebuilding the file system. | |
79 | .NH 2 | |
80 | Summary information | |
81 | .PP | |
82 | Associated with the super block is non replicated | |
83 | .I "summary information" . | |
84 | The summary information changes | |
85 | as the file system is modified. | |
86 | The summary information contains | |
87 | the number of blocks, fragments, inodes and directories in the file system. | |
88 | .NH 2 | |
89 | Cylinder groups | |
90 | .PP | |
91 | The file system partitions the disk into one or more areas called | |
92 | .I "cylinder groups". | |
93 | A cylinder group is comprised of one or more consecutive | |
94 | cylinders on a disk. | |
95 | Each cylinder group includes inode slots for files, a | |
96 | .I "block map" | |
97 | describing available blocks in the cylinder group, | |
98 | and summary information describing the usage of data blocks | |
99 | within the cylinder group. | |
100 | A fixed number of inodes is allocated for each cylinder group | |
101 | when the file system is created. | |
102 | The current policy is to allocate one inode for each 2048 | |
103 | bytes of disk space; | |
104 | this is expected to be far more inodes than will ever be needed. | |
105 | .PP | |
106 | All the cylinder group bookkeeping information could be | |
107 | placed at the beginning of each cylinder group. | |
108 | However if this approach were used, | |
109 | all the redundant information would be on the top platter. | |
110 | A single hardware failure that destroyed the top platter | |
111 | could cause the loss of all copies of the redundant super-blocks. | |
112 | Thus the cylinder group bookkeeping information | |
113 | begins at a floating offset from the beginning of the cylinder group. | |
114 | The offset for | |
115 | the | |
116 | .I "i+1" st | |
117 | cylinder group is about one track further | |
118 | from the beginning of the cylinder group | |
119 | than it was for the | |
120 | .I "i" th | |
121 | cylinder group. | |
122 | In this way, | |
123 | the redundant | |
124 | information spirals down into the pack; | |
125 | any single track, cylinder, | |
126 | or platter can be lost without losing all copies of the super-blocks. | |
127 | Except for the first cylinder group, | |
128 | the space between the beginning of the cylinder group | |
129 | and the beginning of the cylinder group information stores data. | |
130 | .NH 2 | |
131 | Fragments | |
132 | .PP | |
133 | To avoid waste in storing small files, | |
134 | the file system space allocator divides a single | |
135 | file system block into one or more | |
136 | .I "fragments". | |
137 | The fragmentation of the file system is specified | |
138 | when the file system is created; | |
139 | each file system block can be optionally broken into | |
140 | 2, 4, or 8 addressable fragments. | |
141 | The lower bound on the size of these fragments is constrained | |
142 | by the disk sector size; | |
143 | typically 512 bytes is the lower bound on fragment size. | |
144 | The block map associated with each cylinder group | |
145 | records the space availability at the fragment level. | |
146 | Aligned fragments are examined | |
147 | to determine block availability. | |
148 | .PP | |
149 | On a file system with a block size of 4096 bytes | |
150 | and a fragment size of 1024 bytes, | |
151 | a file is represented by zero or more 4096 byte blocks of data, | |
152 | and possibly a single fragmented block. | |
153 | If a file system block must be fragmented to obtain | |
154 | space for a small amount of data, | |
155 | the remainder of the block is made available for allocation | |
156 | to other files. | |
157 | For example, | |
158 | consider an 11000 byte file stored on | |
159 | a 4096/1024 byte file system. | |
160 | This file uses two full size blocks and a 3072 byte fragment. | |
161 | If no fragments with at least 3072 bytes | |
162 | are available when the file is created, | |
163 | a full size block is split yielding the necessary 3072 byte | |
164 | fragment and an unused 1024 byte fragment. | |
165 | This remaining fragment can be allocated to another file, as needed. | |
166 | .NH 2 | |
167 | Updates to the file system | |
168 | .PP | |
169 | Every working day hundreds of files | |
170 | are created, modified, and removed. | |
171 | Every time a file is modified, | |
172 | the operating system performs a | |
173 | series of file system updates. | |
174 | These updates, when written on disk, yield a consistent file system. | |
175 | The file system stages | |
176 | all modifications of critical information; | |
177 | modification can | |
178 | either be completed or cleanly backed out after a crash. | |
179 | Knowing the information that is first written to the file system, | |
180 | deterministic procedures can be developed to | |
181 | repair a corrupted file system. | |
182 | To understand this process, | |
183 | the order that the update | |
184 | requests were being honored must first be understood. | |
185 | .PP | |
186 | When a user program does an operation to change the file system, | |
187 | such as a | |
188 | .I write , | |
189 | the data to be written is copied into an internal | |
190 | .I "in-core" | |
191 | buffer in the kernel. | |
192 | Normally, the disk update is handled asynchronously; | |
193 | the user process is allowed to proceed even though | |
194 | the data has not yet been written to the disk. | |
195 | The data, | |
196 | along with the inode information reflecting the change, | |
197 | is eventually written out to disk. | |
198 | The real disk write may not happen until long after the | |
199 | .I write | |
200 | system call has returned. | |
201 | Thus at any given time, the file system, | |
202 | as it resides on the disk, | |
203 | lags the state of the file system represented by the in-core information. | |
204 | .PP | |
205 | The disk information is updated to reflect the in-core information | |
206 | when the buffer is required for another use, | |
207 | when a | |
208 | .I sync (2) | |
209 | is done (at 30 second intervals) by | |
210 | .I "/etc/update" "(8)," | |
211 | or by manual operator intervention with the | |
212 | .I sync (8) | |
213 | command. | |
214 | If the system is halted without writing out the in-core information, | |
215 | the file system on the disk will be in an inconsistent state. | |
216 | .PP | |
217 | If all updates are done asynchronously, several serious | |
218 | inconsistencies can arise. | |
219 | One inconsistency is that a block may be claimed by two inodes. | |
220 | Such an inconsistency can occur when the system is halted before | |
221 | the pointer to the block in the old inode has been cleared | |
222 | in the copy of the old inode on the disk, | |
223 | and after the pointer to the block in the new inode has been written out | |
224 | to the copy of the new inode on the disk. | |
225 | Here, | |
226 | there is no deterministic method for deciding | |
227 | which inode should really claim the block. | |
228 | A similar problem can arise with a multiply claimed inode. | |
229 | .PP | |
230 | The problem with asynchronous inode updates | |
231 | can be avoided by doing all inode deallocations synchronously. | |
232 | Consequently, | |
233 | inodes and indirect blocks are written to the disk synchronously | |
234 | (\fIi.e.\fP the process blocks until the information is | |
235 | really written to disk) | |
236 | when they are being deallocated. | |
237 | Similarly inodes are kept consistent by synchronously | |
238 | deleting, adding, or changing directory entries. | |
239 | .ds RH Fixing corrupted file systems |