macro and text revision (-mdoc version 3)
.\" Copyright (c) 1990, 1991 Regents of the University of California.
.\" All rights reserved.
.\"
.\" %sccs.include.redist.man%
.\"
.\" @(#)crash.8 5.2 (Berkeley) %G%
.\"
.Dd
.Dt CRASH 8
.Os
.Sh NAME
.Nm crash
.Nd UNIX system failures
.Sh DESCRIPTION
This section explains a bit about system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Bd -ragged -offset indent
panic: why i gave up the ghost
.Ed
.Pp
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
described in
.Xr reboot 8 .
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.Pp
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
which one failed.
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
.Pp
The most common cause of system failures is hardware failure, which
can manifest itself in different ways.
Here are the messages which
are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that hardware or software
error produced the message in some unexpected way.
.Pp
.Bl -tag -width Ds -compact
.It Sy iinit
This cryptic panic message results from a failure to mount the root filesystem
during the bootstrap process.
Either the root filesystem has been corrupted,
or the system is attempting to use the wrong device as root filesystem.
Usually, an alternate copy of the system binary or an alternate root
filesystem can be used to bring up the system to investigate.
.Pp
.It Sy "Can't exec /etc/init"
This is not a panic message, as reboots are likely to be futile.
Late in the bootstrap procedure, the system was unable to locate
and execute the initialization process,
.Xr init 8 .
The root filesystem is incorrect or has been corrupted, or the mode
or type of
.Pa /etc/init
forbids execution.
.Pp
.It Sy "IO err in push"
.It Sy "hard IO err in swap"
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
The offending disk should be fixed if it is broken or unreliable.
.Pp
.It Sy "realloccg: bad optim"
.It Sy "ialloc: dup alloc"
.It Sy "alloccgblk:cyl groups corrupted"
.It Sy "ialloccg: map corrupted"
.It Sy "free: freeing free block"
.It Sy "free: freeing free frag"
.It Sy "ifree: freeing free inode"
.It Sy "alloccg: map corrupted"
These panic messages are among those that may be produced
when filesystem inconsistencies are detected.
The problem generally results from a failure to repair damaged filesystems
after a crash, from hardware failures, or from some other condition that
should not normally occur.
A filesystem check will normally correct the problem.
.Pp
.It Sy "timeout table overflow"
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
.Pp
.It Sy "trap type %d, code = %x, v = %x"
An unexpected trap has occurred within the system; the trap types are:
.Bl -column xxxx -offset indent
0	bus error
1	address error
2	illegal instruction
3	divide by zero
.No 4\t Em chk No instruction
.No 5\t Em trapv No instruction
6	privileged instruction
7	trace trap
8	MMU fault
9	simulated software interrupt
10	format error
11	FP coprocessor fault
12	coprocessor fault
13	simulated AST
.El
.Pp
The favorite trap type in system crashes is trap type 8,
indicating a wild reference.
``code'' (hex) is the concatenation of the MMU status register
(see <hp300/cpu.h>)
in the high 16 bits and the 68020 special status word
(see the 68020 manual, page 6-17)
in the low 16 bits.
``v'' (hex) is the virtual address which caused the fault.
Additionally, the kernel will dump about a screenful of semi-useful
information.
``pid'' (decimal) is the process id of the process running at the
time of the exception.
Note that if we panic in an interrupt routine,
this process may not be related to the panic.
``ps'' (hex) is the 68020 processor status register ``ps''.
``pc'' (hex) is the value of the program counter saved
on the hardware exception frame.
It may
.Em not
be the PC of the instruction causing the fault.
``sfc'' and ``dfc'' (hex) are the 68020 source/destination function codes.
They should always be one.
``p0'' and ``p1'' are the VAX-like region registers.
They are of the form:
.Pp
.Bd -ragged -offset indent
<length> '@' <kernel VA>
.Ed
.Pp
where both are in hex.
Following these values is a dump of the processor registers (hex).
Finally, there is a dump of the stack (user/kernel) at the time of the offense.
.Pp
.It Sy "init died"
The system initialization process has exited.
This is bad news, as no new
users will then be able to log in.
Rebooting is the only fix, so the
system just does it right away.
.Pp
.It Sy "out of mbufs: map full"
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
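.Pp
The fields of the
.Sy "trap type"
message described above can be decoded by hand; the following C sketch
shows one possible way to do it.
It maps a trap type number to its name from the table above and splits
``code'' into the MMU status register (high 16 bits) and the 68020
special status word (low 16 bits).
The names trap_name, code_mmu_status and code_ssw are hypothetical helpers
invented for this example; they are not part of the kernel or of
.Xr adb 1 .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Trap type names, indexed by the trap type number from the table. */
static const char *const trap_names[] = {
    "bus error",
    "address error",
    "illegal instruction",
    "divide by zero",
    "chk instruction",
    "trapv instruction",
    "privileged instruction",
    "trace trap",
    "MMU fault",
    "simulated software interrupt",
    "format error",
    "FP coprocessor fault",
    "coprocessor fault",
    "simulated AST",
};

/* Hypothetical helper: name of a trap type, or "unknown" if out of range. */
const char *trap_name(int type)
{
    size_t n = sizeof(trap_names) / sizeof(trap_names[0]);

    if (type < 0 || (size_t)type >= n)
        return "unknown";
    return trap_names[type];
}

/* Hypothetical helper: high 16 bits of "code", the MMU status register. */
uint16_t code_mmu_status(uint32_t code)
{
    return (uint16_t)(code >> 16);
}

/* Hypothetical helper: low 16 bits of "code", the special status word. */
uint16_t code_ssw(uint32_t code)
{
    return (uint16_t)(code & 0xffff);
}
```
.Ed
.Pp
For instance, a made-up ``code'' value of 0x00040145 would split into an
MMU status of 0x0004 and a special status word of 0x0145; interpreting the
individual bits still requires <hp300/cpu.h> and the 68020 manual.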
.Pp
When the system crashes it writes (or at least attempts to write)
an image of memory into the back end of the dump device,
usually the same as the primary swap
area.
After the system is rebooted, the program
.Xr savecore 8
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal.
See
.Xr savecore 8
for details.
.Pp
To analyze a dump you should begin by running
.Xr adb 1
with the
.Fl k
flag on the system load image and core dump.
If the core image is the result of a panic,
the panic message is printed.
Normally the command
``$c''
will provide a stack trace from the point of
the crash and this will provide a clue as to
what went wrong.
For more details consult
.%T "Using ADB to Debug the UNIX Kernel" .
184.Sh SEE ALSO
185.Xr adb 1 ,
186.Xr reboot 8
187.Rs
188.%T "MC68020 32-bit Microprocessor User's Manual"
189.Re
190.Rs
191.%T "Using ADB to Debug the UNIX Kernel
192.Re
193.Rs
194.%T "4.3BSD for the HP300"
195.Re
196.Sh HISTORY
197A
198.Nm
199man page appeared in Version 6 AT&T UNIX.