crash - what to do when the system crashes
This section gives at least a few clues about how to proceed if the
system crashes.
It can't pretend to be complete.
If the reason for the crash is not evident
(see below for guidance on `evident')
you may want to try to dump the system if you feel up to it.
At the moment a dump can be taken only on magtape.
With a tape mounted and ready,
stop the machine, load address 44, and start.
This should write a copy of all of core
on the tape with an EOF mark.
Any error is taken to mean the end of core has been reached.
This means that you must be sure the ring is in,
the tape is ready, and the tape is clean and new.
If the dump fails, you can try again,
but some of the registers will be lost.
See below for what to do with the tape.
In restarting after a crash,
always bring up the system single-user.
This is accomplished by following the directions in boot(8)
as modified for your particular installation;
a single-user system is indicated by having a particular value
in the switches (173030 unless you've changed it)
as the system starts executing.
When it is running, run icheck(1) and dcheck(1)
on all file systems which could have been in use at the time
of the crash.
If any serious file system problems are found, they should be repaired.
When you are satisfied with the health of your disks,
check and set the date if necessary,
then bring the system up multi-user.
This is most easily accomplished by changing the
single-user value in the switches to something else.
To even boot UNIX at all,
three files (and the directories leading to them)
must be intact.
First, the initialization program
must be present and executable.
If it is not,
the CPU will loop in user mode at location 6.
If either does not exist,
the symptom is best described
Shell with proper standard input and output.
If you cannot get the system to boot,
a runnable system must be obtained from a backup.
The root file system may then be doctored as
a mounted file system as described below.
If there are any problems with the root file system,
it is probably prudent to go to a
backup system to avoid working on a
The first rule to keep in mind is that an addled disk
should be treated gently;
it shouldn't be mounted unless necessary,
and if it is very valuable yet
in quite bad shape, perhaps it should be dumped before
any repair is attempted.
This is an area where experience and informed courage count for much.
File system problems
typically fall into two kinds.
The first kind is
problems with the free list:
duplicates in the free list, or free blocks also in files.
These can be cured easily with an
icheck(1) run that reconstructs the free list.
If the same block appears in more than one file
or if a file contains bad blocks,
the files should be deleted, and the free list reconstructed.
The best way to delete such a file is to use clri(1),
then remove its directory entries.
If any of the affected files is really precious,
you can try to copy it to another device first.
The dcheck(1) program may report files which
have more directory entries than links.
Such situations are potentially dangerous;
clri(1)
discusses a special case of the problem.
All the directory entries for the file should be removed.
If on the other hand there are more links than directory entries,
there is no danger of spreading infection, but merely some disk space
is lost.
It is sufficient to copy the file (if it has any entries and is useful),
then use clri(1)
on its inode and remove any directory
entries that remain.
Finally,
there may be inodes reported by dcheck(1)
that have 0 links and 0 entries.
These occur on the root device when the system is stopped
with pipes open, and on other file systems when the system
stops with files that have been deleted while still open.
Using clri(1)
will free the inode, and an
icheck(1) run that reconstructs the free list will
recover any missing blocks.
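
The three link-count cases above can be summarized in a short sketch.
It is only an illustration of the rules as stated in this section;
the function name advise and the wording of its results are hypothetical,
and it assumes the link and entry counts have already been read off
a consistency report by hand.

```python
def advise(links, entries):
    """Suggest a repair from an inode's link count and directory entry count.

    Hypothetical helper: it only restates the rules given in the text.
    """
    if links == 0 and entries == 0:
        # Left over from open pipes or files deleted while still open.
        return "free the inode, then recover any missing blocks"
    if entries > links:
        # Potentially dangerous: the inode could be freed while still named.
        return "remove all directory entries for the file"
    if links > entries:
        # No danger of spreading damage, only lost disk space.
        return "copy the file if useful, clear the inode, remove remaining entries"
    return "consistent"


if __name__ == "__main__":
    for links, entries in [(0, 0), (1, 2), (3, 1), (2, 2)]:
        print(links, entries, advise(links, entries))
```

The order of the tests matters only for the 0/0 case, which would
otherwise be reported as consistent.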
The system prints a message
on the console typewriter when it voluntarily crashes.
Here is the current list of such messages,
with enough information to provide
a hope at least of the remedy.
The message has the form `panic: ...',
possibly accompanied by other information.
Left unstated in all cases
is the possibility that hardware or software
error produced the message in some unexpected way.
routine was called with a nonexistent major device as argument.
Definitely hardware or software error.
Null device table entry for the major device used as argument to
Definitely hardware or software error.
An I/O error reading the super-block for the root file system
A mounted file system has no more i-nodes when creating a file.
Sorry, the device isn't available;
A device has disappeared from the mounted-device table.
Definitely hardware or software error.
Like `no fs', but produced elsewhere.
The in-core inode table is full.
Try increasing NINODE in param.h.
Shouldn't be a panic, just a user error.
neither the line nor programmable clock was found to exist.
An unrecoverable I/O error during a swap.
Really shouldn't be a panic,
The directory containing a file being deleted can't be found.
A program needs to be swapped out, and there is no more swap space.
This really shouldn't be a panic, but there is no easy fix.
A pure procedure program is being executed,
and the table for such things is full.
This shouldn't be a panic.
An unexpected trap has occurred within the system.
This is accompanied by three numbers:
a `ka6', which is the contents of the segmentation
register for the area in which the system's stack is kept;
`aps', which is the location where the hardware stored
the program status word during the trap;
and a `trap type' which encodes
which trap occurred; the types include
recursive system call (TRAP instruction)
11/70 cache parity, or programmed interrupt
In some of these cases it is
possible for octal 20 to be added into the trap type;
this indicates that the processor was in user mode when the trap occurred.
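
As a small illustration, the user-mode indication can be separated
from the rest of the trap type as sketched below.
The function name decode_trap_type is hypothetical,
and the sketch assumes that the trap types proper are all smaller
than octal 20, so that the added value behaves like a single flag bit.

```python
USER_MODE_BIT = 0o20  # octal 20: added when the trap came from user mode


def decode_trap_type(trap_type):
    """Split a printed trap type into (base type, came-from-user-mode).

    Assumes every base trap type is below octal 20.
    """
    from_user = bool(trap_type & USER_MODE_BIT)
    base = trap_type & ~USER_MODE_BIT
    return base, from_user


if __name__ == "__main__":
    base, from_user = decode_trap_type(0o26)
    print(oct(base), from_user)
```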
If you wish to examine the stack after such a trap,
either dump the system, or use the console switches to examine core;
the required address mapping is described below.
All file system problems
should be taken care of before attempting to look at dumps.
The dump should be read into the file
At this point, you should execute
to print the process table and the users who were on
at the time of the crash.
the registers R0, R1, R2, R3, R4, R5, SP
and KDSA6 (KISA6 for 11/40s) are stored.
If the dump had to be restarted,
Next, take the value of KA6 (location 022(8) in the dump),
multiply it by 0100(8), and dump 01000(8) words (02000(8) bytes) starting from there.
This is the per-process data associated with the process running
at the time of the crash.
Relabel
the addresses 140000 to 141776.
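
The arithmetic for locating the per-process data can be sketched as follows.
This is a rough illustration, not a tool supplied with the system:
the function names are hypothetical, the dump is assumed to have been read
byte for byte into memory with 16-bit words stored low byte first,
and the length is taken from the address range 140000 to 141776,
which covers 02000(8) bytes.

```python
import struct

KA6_LOCATION = 0o22   # where the saved KA6 word sits in the dump
UNIT = 0o100          # KA6 counts units of 0100(8) bytes
U_BASE = 0o140000     # first relabelled address
U_LAST = 0o141776     # last relabelled (word) address
U_SIZE = U_LAST + 2 - U_BASE  # 02000(8) bytes


def per_process_area(dump):
    """Return (ka6, bytes of the per-process data) from a raw core dump."""
    (ka6,) = struct.unpack_from("<H", dump, KA6_LOCATION)
    start = ka6 * UNIT
    return ka6, bytes(dump[start:start + U_SIZE])


def relabel(offset):
    """Map an offset within the extracted area to its 140000-based address."""
    return U_BASE + offset


if __name__ == "__main__":
    fake = bytearray(4096)                       # stand-in for a real dump
    struct.pack_into("<H", fake, KA6_LOCATION, 2)
    ka6, area = per_process_area(fake)
    print(oct(ka6 * UNIT), oct(len(area)), oct(relabel(len(area) - 2)))
```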
R5 is C's frame or display pointer.
Stored at (R5) is the old R5 pointing to the previous
stack frame.
At (R5)+2
is the saved PC of the calling procedure.
Trace this calling chain until
you obtain an R5 value of 141756, which
is where the user's R5 is stored.
If the chain is broken,
you have to look for a plausible
R5, PC pair and continue from there.
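
Following the chain of saved R5 values by hand is tedious,
so a sketch of the walk may help.
It assumes the per-process data has been extracted as 02000(8) bytes
relabelled from 140000, that words are stored low byte first,
and that the saved PC lies in the word just after the saved R5,
at (R5)+2; the function names are hypothetical.

```python
import struct

U_BASE = 0o140000        # address of the first byte of the extracted area
USER_R5_SLOT = 0o141756  # R5 value that marks the end of the chain


def word_at(area, addr):
    """Fetch the 16-bit word stored at a 140000-based address."""
    (value,) = struct.unpack_from("<H", area, addr - U_BASE)
    return value


def walk_frames(area, r5, limit=64):
    """Return the saved PCs, innermost call first, following old-R5 links."""
    pcs = []
    for _ in range(limit):
        if r5 == USER_R5_SLOT:
            return pcs
        if r5 % 2 or not (U_BASE <= r5 <= U_BASE + len(area) - 4):
            # Broken chain: a plausible R5, PC pair must be found by eye.
            raise ValueError("implausible R5 %o after %d frames" % (r5, len(pcs)))
        pcs.append(word_at(area, r5 + 2))  # assumed position of the saved PC
        r5 = word_at(area, r5)             # old R5: the previous frame
    raise ValueError("chain did not reach %o" % USER_R5_SLOT)
```

When the walk raises an error, the PCs gathered so far are lost;
a variant could return them along with the failing R5
so that the search can be resumed by hand.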
Each PC should be looked up in the system's name list
to get a reverse calling order.
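
Looking up each PC amounts to finding the nearest symbol at or below it.
The sketch below assumes the name list has already been turned into
a list of (address, name) pairs sorted by address;
how that list is obtained is left open,
and the symbol names in the example are made up.

```python
import bisect


def nearest_symbol(namelist, pc):
    """Return (name, offset) of the last symbol at or below pc, or None."""
    addresses = [addr for addr, _ in namelist]
    i = bisect.bisect_right(addresses, pc) - 1
    if i < 0:
        return None
    addr, name = namelist[i]
    return name, pc - addr


def reverse_calling_order(namelist, pcs):
    """Map saved PCs, innermost first, to the routines they return into."""
    return [nearest_symbol(namelist, pc) for pc in pcs]


if __name__ == "__main__":
    # Hypothetical name list; real addresses come from the system's own.
    names = [(0o1000, "_first"), (0o2000, "_second"), (0o4000, "_third")]
    print(reverse_calling_order(names, [0o4012, 0o2100]))
```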
In most cases this procedure will give
an idea of what is wrong.
A more complete discussion
of system debugging is impossible here.
clri(1), icheck(1), dcheck(1), boot(8)