CRASH(8V)                           1990                           CRASH(8V)

NAME
     crash - what happens when the system crashes

DESCRIPTION
     This section explains what happens when the system crashes
     and (very briefly) how to analyze crash dumps.

     When the system crashes voluntarily it prints a message of
     the form

          panic: why i gave up the ghost

     on the console, takes a dump on a mass storage peripheral,
     and then invokes an automatic reboot procedure as described
     in reboot(8). (If auto-reboot is disabled on the front
     panel of the machine the system will simply halt at this
     point.) Unless some unexpected inconsistency is encountered
     in the state of the file systems due to hardware or software
     failure, the system will then resume multi-user operations.
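
     As an illustration only, a saved copy of the console output
     could be scanned for messages of the form shown above. The
     sketch below assumes each panic appears on its own line as
     "panic: " followed by the message; the function name and
     the sample lines are hypothetical and not part of the system.

```python
import re

# Assumed console format, taken from the form shown above:
#   panic: <message>
PANIC_RE = re.compile(r"^panic: (?P<reason>.+)$")


def find_panics(console_lines):
    """Return the message text of every line of the form 'panic: ...'."""
    reasons = []
    for line in console_lines:
        match = PANIC_RE.match(line.strip())
        if match:
            reasons.append(match.group("reason"))
    return reasons


# Hypothetical console capture, for illustration only.
sample_log = [
    "some earlier console output",
    "panic: init died",
    "more console output",
]
print(find_panics(sample_log))
```

     Lines that do not begin with "panic: " are ignored, so the
     function returns an empty list for a clean log.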

     The system has a large number of internal consistency
     checks; if one of these fails, then it will panic with a
     very short message indicating which one failed. In many
     instances, this will be the name of the routine which
     detected the error, or a two-word description of the
     inconsistency. A full understanding of most panic messages
     requires perusal of the source code for the system.

     The most common cause of system failures is hardware
     failure, which can manifest itself in different ways. Here
     are the messages which are most likely, with some hints as
     to causes. Left unstated in all cases is the possibility
     that hardware or software error produced the message in some
     unexpected way.

     init
          This cryptic panic message results from a failure to
          mount the root filesystem during the bootstrap process.
          Either the root filesystem has been corrupted, or the
          system is attempting to use the wrong device as root
          filesystem. Usually, an alternate copy of the system
          binary or an alternate root filesystem can be used to
          bring up the system to investigate.

     Can't exec /sbin/init
          This is not a panic message, as reboots are likely to
          be futile. Late in the bootstrap procedure, the system
          was unable to locate and execute the initialization
          process, init(8). The root filesystem is incorrect or
          has been corrupted, or the mode or type of /sbin/init
          forbids execution.

     IO err in push
     hard IO err in swap
          The system encountered an error trying to write to the
          paging device or an error in reading critical
          information from a disk drive. The offending disk should
          be fixed if it is broken or unreliable.

     realloccg: bad optim
     ialloc: dup alloc
     alloccgblk: cyl groups corrupted
     ialloccg: map corrupted
     free: freeing free block
     free: freeing free frag
     ifree: freeing free inode
     alloccg: map corrupted
          These panic messages are among those that may be
          produced when filesystem inconsistencies are detected.
          The problem generally results from a failure to repair
          damaged filesystems after a crash, hardware failures,
          or other condition that should not normally occur. A
          filesystem check will normally correct the problem.

     timeout table overflow
          This really shouldn't be a panic, but until the data
          structure involved is made to be extensible, running
          out of entries causes a crash. If this happens, make
          the timeout table bigger.

     KSP not valid
     SBI fault
     CHM? in kernel
          These indicate either a serious bug in the system or,
          more often, a glitch or failing hardware. If SBI
          faults recur, check out the hardware or call field
          service. If the other faults recur, there is likely a bug
          somewhere in the system, although these can be caused
          by a flaky processor. Run processor microdiagnostics.

     machine check %x:
         description

         machine dependent machine-check information
          Machine checks are different on each type of CPU. Most
          of the internal processor registers are saved at the
          time of the fault and are printed on the console. For
          most processors, there is one line that summarizes the
          type of machine check. Often, the nature of the
          problem is apparent from this message and/or the contents
          of key registers. The VAX Hardware Handbook should be
          consulted, and, if necessary, your friendly field
          service people should be informed of the problem.

     trap type %d, code=%x, pc=%x
          An unexpected trap has occurred within the system; the
          trap types are:

          0    reserved addressing fault
          1    privileged instruction fault
          2    reserved operand fault
          3    bpt instruction fault
          4    xfc instruction fault
          5    system call trap
          6    arithmetic trap
          7    ast delivery trap
          8    segmentation fault
          9    protection fault
          10   trace trap
          11   compatibility mode fault
          12   page fault
          13   page table fault

          The favorite trap types in system crashes are trap
          types 8 and 9, indicating a wild reference. The code
          is the referenced address, and the pc at the time of
          the fault is printed. These problems tend to be easy
          to track down if they are kernel bugs since the
          processor stops cold, but random flakiness seems to cause
          this sometimes. The debugger can be used to locate the
          instruction and subroutine corresponding to the PC
          value. If that is insufficient to suggest the nature
          of the problem, more detailed examination of the system
          status at the time of the trap usually can produce an
          explanation.
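
          The trap table above can be applied mechanically to a
          console message. The following is a minimal sketch,
          assuming the code and pc fields are printed as bare
          hexadecimal digits (as the %x in the message format
          suggests); the function name and the sample message
          are hypothetical.

```python
import re

# Trap type numbers and names, copied from the table above.
TRAP_TYPES = {
    0: "reserved addressing fault",
    1: "privileged instruction fault",
    2: "reserved operand fault",
    3: "bpt instruction fault",
    4: "xfc instruction fault",
    5: "system call trap",
    6: "arithmetic trap",
    7: "ast delivery trap",
    8: "segmentation fault",
    9: "protection fault",
    10: "trace trap",
    11: "compatibility mode fault",
    12: "page fault",
    13: "page table fault",
}

# Assumption: the fields follow "trap type %d, code=%x, pc=%x"
# with no "0x" prefix on the hexadecimal values.
TRAP_RE = re.compile(
    r"trap type (\d+), code=([0-9a-fA-F]+), pc=([0-9a-fA-F]+)"
)


def decode_trap(message):
    """Decode a trap message into a dict, or return None if it does not match."""
    match = TRAP_RE.search(message)
    if match is None:
        return None
    trap_type = int(match.group(1))
    return {
        "type": trap_type,
        "name": TRAP_TYPES.get(trap_type, "unknown trap type"),
        "code": int(match.group(2), 16),  # the referenced address
        "pc": int(match.group(3), 16),    # pc at the time of the fault
        # Types 8 and 9 indicate a wild reference.
        "wild_reference": trap_type in (8, 9),
    }


# Hypothetical message, for illustration only.
print(decode_trap("trap type 9, code=c0, pc=8000a1b2"))
```

          The decoded pc is the value to look up with the
          debugger when locating the faulting instruction.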

     init died
          The system initialization process has exited. This is
          bad news, as no new users will then be able to log in.
          Rebooting is the only fix, so the system just does it
          right away.

     out of mbufs: map full
          The network has exhausted its private page map for
          network buffers. This usually indicates that buffers are
          being lost, and rather than allow the system to slowly
          degrade, it reboots immediately. The map may be made
          larger if necessary.

     That completes the list of panic types you are likely to
     see.
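
     The hints given for each message in the list above could be
     collected into a small lookup table. This is only a sketch:
     the hint strings are condensed paraphrases of the text in
     this section, and the names EXACT_HINTS, PREFIX_HINTS and
     hint_for are hypothetical.

```python
FSCK_HINT = "filesystem inconsistency; a filesystem check normally corrects it"
DISK_HINT = "I/O error on the paging device or a disk; fix the offending disk"
CPU_HINT = "system bug or flaky processor; run processor microdiagnostics"

# Messages that are printed as fixed strings.
EXACT_HINTS = {
    "init": "root filesystem could not be mounted; try an alternate root",
    "IO err in push": DISK_HINT,
    "hard IO err in swap": DISK_HINT,
    "realloccg: bad optim": FSCK_HINT,
    "ialloc: dup alloc": FSCK_HINT,
    "alloccgblk: cyl groups corrupted": FSCK_HINT,
    "ialloccg: map corrupted": FSCK_HINT,
    "free: freeing free block": FSCK_HINT,
    "free: freeing free frag": FSCK_HINT,
    "ifree: freeing free inode": FSCK_HINT,
    "alloccg: map corrupted": FSCK_HINT,
    "timeout table overflow": "make the timeout table bigger",
    "KSP not valid": CPU_HINT,
    "CHM? in kernel": CPU_HINT,
    "SBI fault": "check out the hardware or call field service",
    "init died": "init exited; the system reboots right away",
    "out of mbufs: map full": "network buffer map exhausted; enlarge the map",
}

# Messages that carry variable fields after a fixed prefix.
PREFIX_HINTS = [
    ("machine check", "consult the VAX Hardware Handbook"),
    ("trap type", "unexpected trap; locate the pc with the debugger"),
]


def hint_for(message):
    """Return a short hint for a panic message, or None if it is not listed."""
    message = message.strip()
    if message.startswith("panic: "):
        message = message[len("panic: "):]
    if message in EXACT_HINTS:
        return EXACT_HINTS[message]
    for prefix, hint in PREFIX_HINTS:
        if message.startswith(prefix):
            return hint
    return None


print(hint_for("panic: ifree: freeing free inode"))
```

     Exact matches are tried before prefixes so that "init" and
     "init died" are never confused with one another.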

     When the system crashes it writes (or at least attempts to
     write) an image of memory into the back end of the dump
     device, usually the same as the primary swap area. After the
     system is rebooted, the program savecore(8) runs and
     preserves a copy of this core image and the current system
     in a specified directory for later perusal. See savecore(8)
     for details.

     To analyze a dump you should begin by running adb(1) with
     the -k flag on the system load image and core dump. If the
     core image is the result of a panic, the panic message is
     printed. Normally the command ``$c'' will provide a stack
     trace from the point of the crash and this will provide a
     clue as to what went wrong. A more complete discussion of
     system debugging is impossible here. See, however, ``Using
     ADB to Debug the UNIX Kernel''.

SEE ALSO
     adb(1), reboot(8)
     VAX 11/780 System Maintenance Guide and VAX Hardware
     Handbook for more information about machine checks.
     Using ADB to Debug the UNIX Kernel

Printed 7/27/90                                                         June