Commit | Line | Data |
---|---|---|
c3f80101 C |
1 | .TH CRASH 8V "1 September 1981" |
2 | .UC 4 | |
3 | .SH NAME | |
4 | crash \- what happens when the system crashes | |
5 | .SH DESCRIPTION | |
6 | This section explains what happens when the system crashes and how | |
7 | you can analyze crash dumps. | |
8 | .PP | |
9 | When the system crashes voluntarily it prints a message of the form | |
10 | .IP | |
11 | panic: why i gave up the ghost | |
12 | .LP | |
13 | on the console, takes a dump on a mass storage peripheral, | |
14 | and then invokes an automatic reboot procedure as | |
15 | described in | |
16 | .IR reboot (8). | |
17 | (If auto-reboot is disabled on the front panel of the machine the system | |
18 | will simply halt at this point.) | |
19 | Unless some unexpected inconsistency is encountered in the state | |
20 | of the file systems due to hardware or software failure the system | |
21 | will then resume multi-user operations. | |
22 | .PP | |
23 | The system has a large number of internal consistency checks; if one | |
24 | of these fails, then it will panic with a very short message indicating | |
25 | which one failed. | |
26 | .PP | |
27 | The most common cause of system failures is hardware failure, which | |
28 | can reflect itself in different ways. Here are the messages which | |
29 | you are likely to encounter, with some hints as to causes. | |
30 | Left unstated in all cases is the possibility that hardware or software | |
31 | error produced the message in some unexpected way. | |
32 | .TP | |
33 | .B IO err in push | |
34 | .ns | |
35 | .TP | |
36 | .B hard IO err in swap | |
37 | The system encountered an error trying to write to the paging device | |
38 | or an error in reading critical information from a disk drive. | |
39 | You should fix your disk if it is broken or unreliable. | |
40 | .TP | |
41 | .B timeout table overflow | |
42 | .ns | |
43 | This really shouldn't be a panic, but until we fix up the data structure | |
44 | involved, running out of entries causes a crash. If this happens, | |
45 | you should make the timeout table bigger. | |
46 | .TP | |
47 | .B KSP not valid | |
48 | .ns | |
49 | .TP | |
50 | .B SBI fault | |
51 | .ns | |
52 | .TP | |
53 | .B CHM? in kernel | |
54 | These indicate either a serious bug in the system or, more often, | |
55 | a glitch or failing hardware. | |
56 | If SBI faults recur, check out the hardware or call | |
57 | field service. If the other faults recur, there is likely a bug somewhere | |
58 | in the system, although these can be caused by a flaky processor. | |
59 | Run processor microdiagnostics. | |
60 | .TP | |
61 | .B machine check %x: | |
62 | .I description | |
63 | .ns | |
64 | .TP | |
65 | .I \0\0\0machine dependent machine-check information | |
66 | .ns | |
67 | We should describe machine checks, and will someday. | |
68 | For now, ask someone who knows (like your friendly field service people). | |
69 | .TP | |
70 | .B trap type %d, code=%d, pc=%x | |
71 | An unexpected trap has occurred within the system; the trap types are: | |
72 | .sp | |
73 | .nf | |
74 | 0 reserved addressing fault | |
75 | 1 privileged instruction fault | |
76 | 2 reserved operand fault | |
77 | 3 bpt instruction fault | |
78 | 4 xfc instruction fault | |
79 | 5 system call trap | |
80 | 6 arithmetic trap | |
81 | 7 ast delivery trap | |
82 | 8 segmentation fault | |
83 | 9 protection fault | |
84 | 10 trace trap | |
85 | 11 compatibility mode fault | |
86 | 12 page fault | |
87 | 13 page table fault | |
88 | .fi | |
89 | .sp | |
90 | The favorite trap types in system crashes are trap types 8 and 9, | |
91 | indicating | |
92 | a wild reference. The code is the referenced address, and the pc at the | |
93 | time of the fault is printed. These problems tend to be easy to track | |
94 | down if they are kernel bugs since the processor stops cold, but random | |
95 | flakiness seems to cause this sometimes. | |
96 | .TP | |
97 | .B init died | |
98 | The system initialization process has exited. This is bad news, as no new | |
99 | users will then be able to log in. Rebooting is the only fix, so the | |
100 | system just does it right away. | |
101 | .PP | |
102 | That completes the list of panic types you are likely to see. | |
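As an illustration of reading the trap message, the following is a minimal sketch, not part of the system, that turns a console line of the form "trap type %d, code=%d, pc=%x" into a description using the trap type table given above. The helper names trap_name, is_wild_reference and describe_trap are hypothetical, and the sketch assumes the console line begins exactly with the text "trap type".

```c
#include <assert.h>
#include <stdio.h>

/* Trap type names, indexed by the number printed in the panic message.
 * The names are copied from the table in this manual page. */
static const char *const trap_names[] = {
    "reserved addressing fault",    /* 0 */
    "privileged instruction fault", /* 1 */
    "reserved operand fault",       /* 2 */
    "bpt instruction fault",        /* 3 */
    "xfc instruction fault",        /* 4 */
    "system call trap",             /* 5 */
    "arithmetic trap",              /* 6 */
    "ast delivery trap",            /* 7 */
    "segmentation fault",           /* 8 */
    "protection fault",             /* 9 */
    "trace trap",                   /* 10 */
    "compatibility mode fault",     /* 11 */
    "page fault",                   /* 12 */
    "page table fault"              /* 13 */
};

#define NTRAPS (sizeof trap_names / sizeof trap_names[0])

/* Hypothetical helper: map a trap type number to its name. */
static const char *trap_name(int type)
{
    if (type < 0 || (size_t)type >= NTRAPS)
        return "unknown trap type";
    return trap_names[type];
}

/* Types 8 and 9 are the ones the text singles out as wild references. */
static int is_wild_reference(int type)
{
    return type == 8 || type == 9;
}

/* Hypothetical helper: parse a line of the form
 * "trap type %d, code=%d, pc=%x" and write a description into buf.
 * Returns 0 on success, -1 if the line does not match that form. */
static int describe_trap(const char *line, char *buf, size_t buflen)
{
    int type, code;
    unsigned int pc;

    assert(line != NULL && buf != NULL && buflen > 0);
    if (sscanf(line, "trap type %d, code=%d, pc=%x", &type, &code, &pc) != 3)
        return -1;
    if (is_wild_reference(type))
        /* For a wild reference the code is the referenced address. */
        snprintf(buf, buflen, "%s at pc=%x, referenced address %x",
                 trap_name(type), pc, (unsigned int)code);
    else
        snprintf(buf, buflen, "%s at pc=%x, code %d",
                 trap_name(type), pc, code);
    return 0;
}
```

Calling describe_trap on a line copied from the console fills buf with the trap name and pc. A return value of -1 means the line was not a trap message at all.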
103 | .PP | |
104 | When the system crashes it writes (or at least attempts to write) | |
105 | an image of memory into the back end of the primary swap | |
106 | area. After the system is rebooted, the program | |
107 | .IR savecore (8) | |
108 | runs and preserves a copy of this core image and the current | |
109 | system in a specified directory for later perusal. See | |
110 | .IR savecore (8) | |
111 | for details. | |
112 | .PP | |
113 | To analyze a dump you should begin by running | |
114 | .I "ps \-alxk" | |
115 | to print the process table at the time of the crash. | |
116 | Use | |
117 | .IR adb (1) | |
118 | to examine | |
119 | .IR /vmcore . | |
120 | The location | |
121 | .I _rpb+0t508 | |
122 | is the bottom of a stack onto which were pushed the stack pointer | |
123 | .BR sp , | |
124 | .B PCBB | |
125 | (containing the physical address of a | |
126 | .IR u_area ), | |
127 | .BR MAPEN , | |
128 | .BR IPL , | |
129 | and registers | |
130 | .BR r13 \- r0 | |
131 | (in that order). | |
132 | .BR r13 (fp) | |
133 | is the system frame pointer and the stack is used in standard | |
134 | .B calls | |
135 | format. Use | |
136 | .IR adb (1) | |
137 | to get a reverse calling order. | |
138 | In most cases this procedure will give | |
139 | an idea of what is wrong. | |
140 | A more complete discussion | |
141 | of system debugging is impossible here. | |
142 | See, however, | |
143 | .IR analyze (8) | |
144 | for some more hints. | |
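The saved-register layout described above can be worked out mechanically. The following is a rough sketch under stated assumptions, not a description of what the kernel actually does. It assumes every saved value is a 4-byte longword and that the stack grows toward lower addresses. Whether the first value (sp) sits at _rpb+0t508 itself or 4 bytes below it is not stated in this page, so the sketch takes that as a parameter. The names reg_slot and slot_offset are hypothetical.

```c
#include <assert.h>

#define STACK_BASE 508  /* decimal offset from _rpb given in the text */
#define LONGWORD   4    /* assumed size of each saved value, in bytes */

/* Slots in push order: 0 = sp, 1 = PCBB, 2 = MAPEN, 3 = IPL,
 * then r13 down to r0 in slots 4 through 17. */
enum { SLOT_SP, SLOT_PCBB, SLOT_MAPEN, SLOT_IPL, SLOT_R13, NSLOTS = 18 };

/* Slot index of general register r (0..13); r13 was pushed first. */
static int reg_slot(int r)
{
    assert(r >= 0 && r <= 13);
    return SLOT_R13 + (13 - r);
}

/* Decimal offset from _rpb of the given slot.
 * first_at_base != 0: assume sp is stored at the base itself.
 * first_at_base == 0: assume sp is stored one longword below it. */
static int slot_offset(int slot, int first_at_base)
{
    assert(slot >= 0 && slot < NSLOTS);
    return STACK_BASE - LONGWORD * slot - (first_at_base ? 0 : LONGWORD);
}
```

With either assumption the two candidate offsets for a slot differ by exactly one longword, so examining both locations of the dump with adb and checking which one holds a plausible frame pointer should settle which layout applies.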
145 | .SH "SEE ALSO" | |
146 | adb(1), | |
147 | analyze(8), | |
148 | reboot(8), savecore(8) | |
149 | .br | |
150 | .I "VAX 11/780 System Maintenance Guide" | |
151 | for more information about machine checks. | |
152 | .SH BUGS | |
153 | There should be a better program than | |
154 | .IR analyze (8) | |
155 | available which prints out more of the system | |
156 | state symbolically after a crash to lessen the tedious | |
157 | tasks involved in crash analysis. |