Commit | Line | Data |
---|---|---|
e7839a72 C |
1 | .TH CRASH 8V "1 September 1981" |
2 | .UC 4 | |
3 | .SH NAME | |
4 | crash \- what happens when the system crashes | |
5 | .SH DESCRIPTION | |
6 | This section explains what happens when the system crashes and how | |
7 | you can analyze crash dumps. | |
8 | .PP | |
9 | When the system crashes voluntarily it prints a message of the form | |
10 | .IP | |
11 | panic: why i gave up the ghost | |
12 | .LP | |
13 | on the console, takes a dump on a mass storage peripheral, | |
14 | and then invokes an automatic reboot procedure as | |
15 | described in | |
16 | .IR reboot (8). | |
17 | (If auto-reboot is disabled on the front panel of the machine, the system | |
18 | will simply halt at this point.) | |
19 | Unless some unexpected inconsistency is encountered in the state | |
20 | of the file systems due to hardware or software failure, the system | |
21 | will then resume multi-user operations. | |
22 | .PP | |
23 | The system has a large number of internal consistency checks; if one | |
24 | of these fails, then it will panic with a very short message indicating | |
25 | which one failed. | |
26 | .PP | |
27 | The most common cause of system failures is hardware failure, which | |
28 | can reflect itself in different ways. Here are the messages which | |
29 | you are likely to encounter, with some hints as to causes. | |
30 | Left unstated in all cases is the possibility that hardware or software | |
31 | error produced the message in some unexpected way. | |
32 | .TP | |
33 | .B IO err in push | |
34 | .ns | |
35 | .TP | |
36 | .B hard IO err in swap | |
37 | The system encountered an error trying to write to the paging device | |
38 | or an error in reading critical information from a disk drive. | |
39 | You should fix your disk if it is broken or unreliable. | |
40 | .TP | |
41 | .B timeout table overflow | |
42 | .ns | |
43 | This really shouldn't be a panic, but until we fix up the data structure | |
44 | involved, running out of entries causes a crash. If this happens, | |
45 | you should make the timeout table bigger. | |
46 | .TP | |
47 | .B KSP not valid | |
48 | .ns | |
49 | .TP | |
50 | .B SBI fault | |
51 | .ns | |
52 | .TP | |
53 | .B CHM? in kernel | |
54 | These indicate either a serious bug in the system or, more often, | |
55 | a glitch or failing hardware. | |
56 | If SBI faults recur, check out the hardware or call | |
57 | field service. If the other faults recur, there is likely a bug somewhere | |
58 | in the system, although these can also be caused by a flaky processor. | |
59 | Run the processor microdiagnostics to rule that out. | |
60 | .TP | |
61 | .B machine check %x: | |
62 | .I description | |
63 | .ns | |
64 | .TP | |
65 | .I \0\0\0machine dependent machine-check information | |
66 | .ns | |
67 | We should describe machine checks, and will someday. | |
68 | For now, ask someone who knows (like your friendly field service people). | |
69 | .TP | |
70 | .B trap type %d, code=%d, pc=%x | |
71 | An unexpected trap has occurred within the system; the trap types are: | |
72 | .sp | |
73 | .nf | |
74 | 0 reserved addressing fault | |
75 | 1 privileged instruction fault | |
76 | 2 reserved operand fault | |
77 | 3 bpt instruction fault | |
78 | 4 xfc instruction fault | |
79 | 5 system call trap | |
80 | 6 arithmetic trap | |
81 | 7 ast delivery trap | |
82 | 8 segmentation fault | |
83 | 9 protection fault | |
84 | 10 trace trap | |
85 | 11 compatibility mode fault | |
86 | 12 page fault | |
87 | 13 page table fault | |
88 | .fi | |
89 | .sp | |
90 | The favorite trap types in system crashes are trap types 8 and 9, | |
91 | indicating | |
92 | a wild reference. The code is the referenced address, and the pc at the | |
93 | time of the fault is printed. These problems tend to be easy to track | |
94 | down if they are kernel bugs, since the processor stops cold, but random | |
95 | hardware flakiness sometimes causes them as well. | |
96 | .TP | |
97 | .B init died | |
98 | The system initialization process has exited. This is bad news, as no new | |
99 | users will then be able to log in. Rebooting is the only fix, so the | |
100 | system just does it right away. | |
101 | .PP | |
102 | That completes the list of panic types you are likely to see. | |
103 | .PP | |
104 | When the system crashes it writes (or at least attempts to write) | |
105 | an image of memory into the back end of the primary swap | |
106 | area. After the system is rebooted, the program | |
107 | .IR savecore (8) | |
108 | runs and preserves a copy of this core image and the current | |
109 | system in a specified directory for later perusal. See | |
110 | .IR savecore (8) | |
111 | for details. | |
112 | .PP | |
113 | To analyze a dump you should begin by running | |
114 | .IR adb (1) | |
115 | with the | |
116 | .B \-k | |
117 | flag on the core dump. | |
118 | Normally the command | |
119 | ``*(intstack-4)$c'' | |
120 | will provide a stack trace from the point of | |
121 | the crash and this will provide a clue as to | |
122 | what went wrong. | |
123 | A more complete discussion | |
124 | of system debugging is impossible here. | |
125 | See, however, | |
126 | ``Using ADB to Debug the UNIX Kernel''. | |
127 | .SH "SEE ALSO" | |
128 | adb(1), | |
129 | analyze(8), | |
130 | reboot(8), savecore(8) | |
131 | .br | |
132 | .I "VAX 11/780 System Maintenance Guide" | |
133 | for more information about machine checks. | |
134 | .br | |
135 | .I "Using ADB to Debug the UNIX Kernel" |
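
To illustrate the trap type table given above under the "trap type %d, code=%d, pc=%x" message, here is a minimal C sketch that maps a trap number to its name and picks the three numbers out of a console line. The functions trap_name, parse_trap_line and is_wild_reference are hypothetical helpers written for this illustration, not part of the system. The sketch assumes the console line matches the format string shown on this page exactly.

```c
#include <stddef.h>
#include <stdio.h>

/* Trap type names, indexed by the number printed in the panic message.
 * The entries are copied from the list on this manual page. */
static const char *const trap_names[] = {
    "reserved addressing fault",    /* 0 */
    "privileged instruction fault", /* 1 */
    "reserved operand fault",       /* 2 */
    "bpt instruction fault",        /* 3 */
    "xfc instruction fault",        /* 4 */
    "system call trap",             /* 5 */
    "arithmetic trap",              /* 6 */
    "ast delivery trap",            /* 7 */
    "segmentation fault",           /* 8 */
    "protection fault",             /* 9 */
    "trace trap",                   /* 10 */
    "compatibility mode fault",     /* 11 */
    "page fault",                   /* 12 */
    "page table fault",             /* 13 */
};

#define NTRAPS (sizeof trap_names / sizeof trap_names[0])

/* Return the name of a trap type, or NULL if the number is not in the table. */
const char *trap_name(int type)
{
    if (type < 0 || (size_t)type >= NTRAPS)
        return NULL;
    return trap_names[type];
}

/* Hypothetical helper: parse a line of the assumed form
 * "trap type %d, code=%d, pc=%x". Returns 1 on success, 0 otherwise. */
int parse_trap_line(const char *line, int *type, int *code, unsigned *pc)
{
    return sscanf(line, "trap type %d, code=%d, pc=%x", type, code, pc) == 3;
}

/* Types 8 and 9 are the ones the text singles out as wild references. */
int is_wild_reference(int type)
{
    return type == 8 || type == 9;
}
```

For example, passing the line "trap type 9, code=1024, pc=80001a2c" to parse_trap_line yields type 9, which trap_name reports as "protection fault" and is_wild_reference flags as a likely wild reference.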