BSD 4_1c_2 development
.TH CRASH 8V "1 September 1981"
.UC 4
.SH NAME
crash \- what happens when the system crashes
.SH DESCRIPTION
This section explains what happens when the system crashes and how
you can analyze crash dumps.
.PP
When the system crashes voluntarily, it prints a message of the form
.IP
panic: why i gave up the ghost
.LP
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.IR reboot (8).
(If auto-reboot is disabled on the front panel of the machine, the system
will simply halt at this point.)
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.PP
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
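.PP
As a rough illustration of such a check, the user-space sketch below
builds a console line in the form shown above.
The names format_panic and check_free_count, and the message text,
are hypothetical and are not taken from the kernel source;
a real kernel would take a dump and reboot instead of returning.
.nf
```c
#include <stdio.h>
#include <string.h>

/* Build the console line "panic: <why>" into buf (hypothetical helper). */
int format_panic(char *buf, size_t len, const char *why)
{
    return snprintf(buf, len, "panic: %s", why);
}

/*
 * Hypothetical consistency check: a free-block count must lie between
 * zero and the total. Returns 0 if the state is consistent; otherwise
 * fills buf with the panic line and returns -1.
 */
int check_free_count(long nfree, long ntotal, char *buf, size_t len)
{
    if (nfree >= 0 && nfree <= ntotal)
        return 0;
    format_panic(buf, len, "bad free count");
    return -1;
}
```
.fi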
.PP
The most common cause of system failures is hardware failure, which
can manifest itself in different ways. Here are the messages that
you are likely to encounter, with some hints as to causes.
Left unstated in all cases is the possibility that a hardware or software
error produced the message in some unexpected way.
.TP
.B IO err in push
.ns
.TP
.B hard IO err in swap
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
You should fix your disk if it is broken or unreliable.
.TP
.B timeout table overflow
.ns
This really shouldn't be a panic, but until we fix up the data structure
involved, running out of entries causes a crash. If this happens,
you should make the timeout table bigger.
.TP
.B KSP not valid
.ns
.TP
.B SBI fault
.ns
.TP
.B CHM? in kernel
These indicate either a serious bug in the system or, more often,
a glitch or failing hardware.
If SBI faults recur, check out the hardware or call
field service. If the other faults recur, there is likely a bug somewhere
in the system, although these can also be caused by a flaky processor.
Run processor microdiagnostics.
.TP
.B machine check %x:
.I description
.ns
.TP
.I \0\0\0machine dependent machine-check information
.ns
We should describe machine checks, and will someday.
For now, ask someone who knows (like your friendly field service people).
.TP
.B trap type %d, code=%d, pc=%x
An unexpected trap has occurred within the system; the trap types are:
.sp
.nf
0 reserved addressing fault
1 privileged instruction fault
2 reserved operand fault
3 bpt instruction fault
4 xfc instruction fault
5 system call trap
6 arithmetic trap
7 ast delivery trap
8 segmentation fault
9 protection fault
10 trace trap
11 compatibility mode fault
12 page fault
13 page table fault
.fi
.sp
The trap types seen most often in system crashes are 8 and 9,
which indicate
a wild reference. The code is the referenced address, and the pc at the
time of the fault is printed. These problems tend to be easy to track
down if they are kernel bugs, since the processor stops cold, but random
hardware flakiness seems to cause them sometimes, too.
.TP
.B init died
The system initialization process has exited. This is bad news, as no new
users will then be able to log in. Rebooting is the only fix, so the
system just does it right away.
.PP
That completes the list of panic types you are likely to see.
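.PP
The trap type table above lends itself to a small lookup.
The sketch below parses a console line in the
.B trap type
format listed above and maps the type number to its name.
The function names trap_name and parse_trap are hypothetical, and the
sketch assumes the line appears exactly in the format shown, with no
leading text.
.nf
```c
#include <stdio.h>

/* Trap type names, indexed by the type number printed on the console. */
static const char *const trap_names[] = {
    "reserved addressing fault",    /* 0 */
    "privileged instruction fault", /* 1 */
    "reserved operand fault",       /* 2 */
    "bpt instruction fault",        /* 3 */
    "xfc instruction fault",        /* 4 */
    "system call trap",             /* 5 */
    "arithmetic trap",              /* 6 */
    "ast delivery trap",            /* 7 */
    "segmentation fault",           /* 8 */
    "protection fault",             /* 9 */
    "trace trap",                   /* 10 */
    "compatibility mode fault",     /* 11 */
    "page fault",                   /* 12 */
    "page table fault",             /* 13 */
};

#define NTRAPS ((int)(sizeof trap_names / sizeof trap_names[0]))

/* Return the name for a trap type, or a placeholder if out of range. */
const char *trap_name(int type)
{
    if (type < 0 || type >= NTRAPS)
        return "unknown trap type";
    return trap_names[type];
}

/*
 * Parse "trap type %d, code=%d, pc=%x" (hypothetical helper).
 * Returns 0 if all three fields were read, -1 otherwise.
 */
int parse_trap(const char *line, int *type, int *code, unsigned *pc)
{
    if (sscanf(line, "trap type %d, code=%d, pc=%x", type, code, pc) != 3)
        return -1;
    return 0;
}
```
.fi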
.PP
When the system crashes it writes (or at least attempts to write)
an image of memory into the back end of the primary swap
area. After the system is rebooted, the program
.IR savecore (8)
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal. See
.IR savecore (8)
for details.
.PP
To analyze a dump, begin by running
.I "ps \-alxk"
to print the process table at the time of the crash.
Use
.IR adb (1)
to examine
.IR /vmcore .
The location
.I _rpb+0t508
is the bottom of a stack onto which were pushed the stack pointer
.BR sp ,
.B PCBB
(containing the physical address of a
.IR u_area ),
.BR MAPEN ,
.BR IPL ,
and registers
.BR r13 \- r0
(in that order).
.BR r13 (fp)
is the system frame pointer and the stack is used in standard
.B calls
format. Use
.IR adb (1)
to list the calls in reverse order.
In most cases this procedure will give
an idea of what is wrong.
A more complete discussion
of system debugging is impossible here.
See, however,
.IR analyze (8)
for some more hints.
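.PP
To make the push order above concrete, the sketch below computes a
decimal offset from
.I _rpb
for each saved item.
It rests on assumptions that the text does not state: that the saved
.B sp
sits at offset 508 itself, that every item is one 4-byte longword,
and that each later push lands one longword lower.
Check these against your own kernel before relying on the numbers.
The function names are hypothetical.
.nf
```c
#include <stdio.h>

#define RPB_STACK_BOTTOM 508 /* from the text: _rpb+0t508 */
#define LONGWORD 4           /* ASSUMPTION: each item is 4 bytes */
#define NSAVED 18            /* sp, PCBB, MAPEN, IPL, r13 down to r0 */

/* Name of the i-th saved item in push order (0 = sp, 17 = r0). */
const char *saved_name(int i, char *buf, size_t len)
{
    static const char *const fixed[] = { "sp", "PCBB", "MAPEN", "IPL" };

    if (i < 0 || i >= NSAVED)
        return NULL;
    if (i < 4)
        return fixed[i];
    snprintf(buf, len, "r%d", 13 - (i - 4));
    return buf;
}

/* ASSUMPTION: item 0 is at the bottom; each later push is 4 bytes lower. */
int saved_offset(int i)
{
    return RPB_STACK_BOTTOM - LONGWORD * i;
}

/* Format one entry as "_rpb+0t<offset> <name>"; returns -1 if out of range. */
int format_saved(int i, char *buf, size_t len)
{
    char name[8];
    const char *n = saved_name(i, name, sizeof name);

    if (n == NULL)
        return -1;
    snprintf(buf, len, "_rpb+0t%d %s", saved_offset(i), n);
    return 0;
}
```
.fi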
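.SH "SEE ALSO"
adb(1),
analyze(8),
reboot(8),
savecore(8)
.br
.I "VAX 11/780 System Maintenance Guide"
for more information about machine checks.
.SH BUGS
There should be a better program than
.IR analyze (8)
that prints out more of the system
state symbolically after a crash, to lessen the tedious
tasks involved in crash analysis.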
153There should be a better program than
154.IR analyze (8)
155available which prints out more of the system
156state symbolically after a crash to lessen the tedious
157tasks involved in crash analysis.