Commit | Line | Data |
---|---|---|
b42074ab CL |
1 | .\" Copyright (c) 1990, 1991 Regents of the University of California. |
2 | .\" All rights reserved. | |
610c010b | 3 | .\" |
b42074ab | 4 | .\" %sccs.include.redist.man% |
610c010b | 5 | .\" |
b42074ab CL |
6 | .\" @(#)crash.8 5.2 (Berkeley) %G% |
7 | .\" | |
8 | .Dd | |
9 | .Dt CRASH 8 | |
10 | .Os | |
11 | .Sh NAME | |
12 | .Nm crash | |
13 | .Nd UNIX system failures | |
14 | .Sh DESCRIPTION | |
15 | This section explains a bit about system crashes | |
610c010b | 16 | and (very briefly) how to analyze crash dumps. |
b42074ab | 17 | .Pp |
610c010b | 18 | When the system crashes voluntarily it prints a message of the form |
b42074ab | 19 | .Bd -ragged -offset indent |
610c010b | 20 | panic: why i gave up the ghost |
b42074ab CL |
21 | .Ed |
22 | .Pp | |
610c010b KB |
23 | on the console, takes a dump on a mass storage peripheral, |
24 | and then invokes an automatic reboot procedure as | |
25 | described in | |
b42074ab | 26 | .Xr reboot 8 . |
610c010b KB |
27 | Unless some unexpected inconsistency is encountered in the state |
28 | of the file systems due to hardware or software failure, the system | |
29 | will then resume multi-user operations. | |
b42074ab | 30 | .Pp |
610c010b KB |
31 | The system has a large number of internal consistency checks; if one |
32 | of these fails, then it will panic with a very short message indicating | |
33 | which one failed. | |
34 | In many instances, this will be the name of the routine which detected | |
35 | the error, or a two-word description of the inconsistency. | |
36 | A full understanding of most panic messages requires perusal of the | |
37 | source code for the system. | |
b42074ab | 38 | .Pp |
610c010b KB |
39 | The most common cause of system failures is hardware failure, which |
40 | can manifest itself in different ways. Here are the messages which | |
41 | are most likely, with some hints as to causes. | |
42 | Left unstated in all cases is the possibility that hardware or software | |
43 | error produced the message in some unexpected way. | |
b42074ab CL |
44 | .Pp |
45 | .Bl -tag -width Ds -compact | |
46 | .It Sy iinit | |
610c010b KB |
47 | This cryptic panic message results from a failure to mount the root filesystem |
48 | during the bootstrap process. | |
49 | Either the root filesystem has been corrupted, | |
50 | or the system is attempting to use the wrong device as root filesystem. | |
51 | Usually, an alternate copy of the system binary or an alternate root | |
52 | filesystem can be used to bring up the system to investigate. | |
b42074ab CL |
53 | .Pp |
54 | .It Sy "Can't exec /etc/init" | |
610c010b KB |
55 | This is not a panic message, as reboots are likely to be futile. |
56 | Late in the bootstrap procedure, the system was unable to locate | |
57 | and execute the initialization process, | |
b42074ab | 58 | .Xr init 8 . |
610c010b | 59 | The root filesystem is incorrect or has been corrupted, or the mode |
b42074ab CL |
60 | or type of |
61 | .Pa /etc/init | |
62 | forbids execution. | |
63 | .Pp | |
64 | .It Sy "IO err in push" | |
65 | .It Sy "hard IO err in swap" | |
610c010b KB |
66 | The system encountered an error trying to write to the paging device |
67 | or an error in reading critical information from a disk drive. | |
68 | The offending disk should be fixed if it is broken or unreliable. | |
b42074ab CL |
69 | .Pp |
70 | .It Sy "realloccg: bad optim" | |
71 | .It Sy "ialloc: dup alloc" | |
72 | .It Sy "alloccgblk: cyl groups corrupted" | |
73 | .It Sy "ialloccg: map corrupted" | |
74 | .It Sy "free: freeing free block" | |
75 | .It Sy "free: freeing free frag" | |
76 | .It Sy "ifree: freeing free inode" | |
77 | .It Sy "alloccg: map corrupted" | |
610c010b KB |
78 | These panic messages are among those that may be produced |
79 | when filesystem inconsistencies are detected. | |
80 | The problem generally results from a failure to repair damaged filesystems | |
81 | after a crash, hardware failures, or some other condition that should not | |
82 | normally occur. | |
83 | A filesystem check will normally correct the problem. | |
b42074ab CL |
84 | .Pp |
85 | .It Sy "timeout table overflow" | |
610c010b KB |
86 | This really shouldn't be a panic, but until the data structure |
87 | involved is made to be extensible, running out of entries causes a crash. | |
88 | If this happens, make the timeout table bigger. | |
b42074ab CL |
89 | .Pp |
90 | .It Sy "trap type %d, code = %x, v = %x" | |
610c010b | 91 | An unexpected trap has occurred within the system; the trap types are: |
b42074ab | 92 | .Bl -column xxxx -offset indent |
610c010b KB |
93 | 0 bus error |
94 | 1 address error | |
95 | 2 illegal instruction | |
96 | 3 divide by zero | |
b42074ab CL |
97 | .No 4\t Em chk No instruction |
98 | .No 5\t Em trapv No instruction | |
610c010b KB |
99 | 6 privileged instruction |
100 | 7 trace trap | |
101 | 8 MMU fault | |
102 | 9 simulated software interrupt | |
103 | 10 format error | |
104 | 11 FP coprocessor fault | |
105 | 12 coprocessor fault | |
106 | 13 simulated AST | |
b42074ab CL |
107 | .El |
108 | .Pp | |
610c010b KB |
109 | The favorite trap type in system crashes is trap type 8, |
110 | indicating a wild reference. | |
b42074ab CL |
111 | ``code'' (hex) is the concatenation of the |
112 | MMU | |
113 | status register | |
610c010b KB |
114 | (see <hp300/cpu.h>) |
115 | in the high 16 bits and the 68020 special status word | |
116 | (see the 68020 manual, page 6-17) | |
117 | in the low 16. | |
118 | ``v'' (hex) is the virtual address which caused the fault. | |
119 | Additionally, the kernel will dump about a screenful of semi-useful | |
120 | information. | |
121 | ``pid'' (decimal) is the process id of the process running at the | |
122 | time of the exception. | |
123 | Note that if we panic in an interrupt routine, | |
124 | this process may not be related to the panic. | |
125 | ``ps'' (hex) is the 68020 processor status register ``ps''. | |
126 | ``pc'' (hex) is the value of the program counter saved | |
127 | on the hardware exception frame. | |
128 | It may | |
b42074ab | 129 | .Em not |
610c010b KB |
130 | be the PC of the instruction causing the fault. |
131 | ``sfc'' and ``dfc'' (hex) are the 68020 source/destination function codes. | |
132 | They should always be one. | |
b42074ab CL |
133 | ``p0'' and ``p1'' are the |
134 | VAX-like | |
135 | region registers. | |
610c010b | 136 | They are of the form: |
b42074ab CL |
137 | .Pp |
138 | .Bd -ragged -offset indent | |
139 | <length> '@' <kernel VA> | |
140 | .Ed | |
141 | .Pp | |
610c010b KB |
142 | where both are in hex. |
143 | Following these values is a dump of the processor registers (hex). | |
144 | Finally, there is a dump of the stack (user/kernel) at the time of the offense. | |
b42074ab CL |
145 | .Pp |
146 | .It Sy "init died" | |
610c010b KB |
147 | The system initialization process has exited. This is bad news, as no new |
148 | users will then be able to log in. Rebooting is the only fix, so the | |
149 | system just does it right away. | |
b42074ab CL |
150 | .Pp |
151 | .It Sy "out of mbufs: map full" | |
610c010b KB |
152 | The network has exhausted its private page map for network buffers. |
153 | This usually indicates that buffers are being lost, and rather than | |
154 | allow the system to slowly degrade, it reboots immediately. | |
155 | The map may be made larger if necessary. | |
b42074ab CL |
156 | .El |
157 | .Pp | |
610c010b | 158 | That completes the list of panic types you are likely to see. |
b42074ab | 159 | .Pp |
610c010b KB |
160 | When the system crashes it writes (or at least attempts to write) |
161 | an image of memory into the back end of the dump device, | |
162 | usually the same as the primary swap | |
163 | area. After the system is rebooted, the program | |
b42074ab | 164 | .Xr savecore 8 |
610c010b KB |
165 | runs and preserves a copy of this core image and the current |
166 | system in a specified directory for later perusal. See | |
b42074ab | 167 | .Xr savecore 8 |
610c010b | 168 | for details. |
b42074ab | 169 | .Pp |
610c010b | 170 | To analyze a dump you should begin by running |
b42074ab | 171 | .Xr adb 1 |
610c010b | 172 | with the |
b42074ab | 173 | .Fl k |
610c010b KB |
174 | flag on the system load image and core dump. |
175 | If the core image is the result of a panic, | |
176 | the panic message is printed. | |
177 | Normally the command | |
178 | ``$c'' | |
179 | will provide a stack trace from the point of | |
180 | the crash and this will provide a clue as to | |
181 | what went wrong. | |
b42074ab CL |
182 | For more details consult |
183 | .%T "Using ADB to Debug the UNIX Kernel" . | |
184 | .Sh SEE ALSO | |
185 | .Xr adb 1 , | |
186 | .Xr reboot 8 | |
187 | .Rs | |
188 | .%T "MC68020 32-bit Microprocessor User's Manual" | |
189 | .Re | |
190 | .Rs | |
191 | .%T "Using ADB to Debug the UNIX Kernel" | |
192 | .Re | |
193 | .Rs | |
194 | .%T "4.3BSD for the HP300" | |
195 | .Re | |
196 | .Sh HISTORY | |
197 | A | |
198 | .Nm | |
199 | man page appeared in Version 6 AT&T UNIX. |
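
The page describes the ``code'' value of a trap panic as the MMU status register in the high 16 bits and the 68020 special status word in the low 16, and it lists trap types 0 through 13. The following is a minimal sketch of how those fields might be picked apart when reading a panic line by hand. It is not taken from the kernel source: the function names `trap_name`, `code_mmu_status`, and `code_ssw` are hypothetical helpers introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Trap type names, copied from the table in this page (types 0..13). */
static const char *const trap_names[] = {
    "bus error",
    "address error",
    "illegal instruction",
    "divide by zero",
    "chk instruction",
    "trapv instruction",
    "privileged instruction",
    "trace trap",
    "MMU fault",
    "simulated software interrupt",
    "format error",
    "FP coprocessor fault",
    "coprocessor fault",
    "simulated AST",
};

/* Hypothetical helper: map a trap type number to its description. */
const char *trap_name(int type)
{
    int count = (int)(sizeof trap_names / sizeof trap_names[0]);

    if (type < 0 || type >= count)
        return "unknown";
    return trap_names[type];
}

/* Hypothetical helper: high 16 bits of "code", the MMU status register. */
uint16_t code_mmu_status(uint32_t code)
{
    return (uint16_t)(code >> 16);
}

/* Hypothetical helper: low 16 bits of "code", the 68020 special status word. */
uint16_t code_ssw(uint32_t code)
{
    return (uint16_t)(code & 0xffffu);
}
```

For example, given a made-up value of `0x00040145` for ``code'', `code_mmu_status` yields `0x0004` and `code_ssw` yields `0x0145`. Interpreting the individual bits of either half still requires <hp300/cpu.h> and the 68020 manual, as noted above.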