new version; (Chris Torek)

author Keith Bostic <bostic@ucbvax.Berkeley.EDU>

Sat, 24 Oct 1987 03:32:51 +0000 (19:32 -0800)

committer Keith Bostic <bostic@ucbvax.Berkeley.EDU>

Sat, 24 Oct 1987 03:32:51 +0000 (19:32 -0800)
author Keith Bostic <bostic@ucbvax.Berkeley.EDU>
Sat, 24 Oct 1987 03:32:51 +0000 (19:32 -0800)
committer Keith Bostic <bostic@ucbvax.Berkeley.EDU>
Sat, 24 Oct 1987 03:32:51 +0000 (19:32 -0800)
diff --git a/usr/src/share/man/man4/man4.vax/uda.4 b/usr/src/share/man/man4/man4.vax/uda.4

index b525938..0b9756d 100644 (file)
--- a/usr/src/share/man/man4/man4.vax/uda.4
+++ b/usr/src/share/man/man4/man4.vax/uda.4
@@ -1,83 +1,58 @@
-.\" Copyright (c) 1980 Regents of the University of California.
+.\" Copyright (c) 1980, 1987 Regents of the University of California.
  .\" All rights reserved.  The Berkeley software License Agreement
  .\" specifies the terms and conditions for redistribution.
  .\"
  .\" All rights reserved.  The Berkeley software License Agreement
  .\" specifies the terms and conditions for redistribution.
  .\"
-.\"    @(#)uda.4       6.2 (Berkeley) %G%
+.\"    @(#)uda.4       6.3 (Berkeley) %G%
  .\"
  .TH UDA 4 ""
  .UC 4
  .SH NAME
  .\"
  .TH UDA 4 ""
  .UC 4
  .SH NAME
-uda \- UDA-50 disk controller interface
+uda \- UDA50 disk controller interface
  .SH SYNOPSIS
  .B "controller uda0 at uba0 csr 0172150 vector udaintr"
  .br
  .SH SYNOPSIS
  .B "controller uda0 at uba0 csr 0172150 vector udaintr"
  .br
-.B "disk ra0 at uda0 drive 0"
+.b "disk ra0 at uda0 drive 0"
  .SH DESCRIPTION
  .SH DESCRIPTION
-This is a driver for the DEC UDA-50 disk controller
-and for other compatible controllers.
-The UDA-50 communicates with the host through a packet
-oriented protocol termed the Mass Storage Control Protocol (MSCP).
+This is a driver for the DEC UDA50 disk controller and other
+compatible controllers.  The UDA50 communicates with the host through
+a packet protocol known as the Mass Storage Control Protocol (MSCP).
  Consult the file
  .RI < vax/mscp.h >
  for a detailed description of this protocol.
  .PP
  Files with minor device numbers 0 through 7 refer to various portions
  Consult the file
  .RI < vax/mscp.h >
  for a detailed description of this protocol.
  .PP
  Files with minor device numbers 0 through 7 refer to various portions
-of drive 0;
-minor devices 8 through 15 refer to drive 1, etc.
-The standard device names begin with ``ra'' followed by
-the drive number and then a letter a-h for partitions 0-7 respectively.
+of drive 0; minor devices 8 through 15 refer to drive 1, etc.  The
+standard device names begin with `ra' followed by the drive number
+and then a letter a-h for partitions 0-7 respectively.
  The character ? stands here for a drive number in the range 0-7.
  .PP
  The character ? stands here for a drive number in the range 0-7.
  .PP
-The block files access the disk via the system's normal
-buffering mechanism and may be read and written without regard to
-physical disk records.  There is also a `raw' interface
-which provides for direct transmission between the disk
-and the user's read or write buffer.
-A single read or write call results in exactly one I/O operation
-and therefore raw I/O is considerably more efficient when
-many words are transmitted.  The names of the raw files
-conventionally begin with an extra `r.'
+The block files access the disk via the system's normal buffering
+mechanism mechanism and may be read and written without regard to
+physical disk records.  There is also a `raw' interface which provides
+for direct transmission between the disk and the user's read or write
+buffer.  A single read or write call results in exactly one I/O
+operation and therefore raw I/O is considerably more efficient when
+many words are transmitted.  The names of the raw files conventionally
+begin with an extra `r'.
  .PP
  In raw I/O counts should be a multiple of 512 bytes (a disk sector).
  Likewise
  .I seek
  calls should specify a multiple of 512 bytes.
  .SH "DISK SUPPORT"
  .PP
  In raw I/O counts should be a multiple of 512 bytes (a disk sector).
  Likewise
  .I seek
  calls should specify a multiple of 512 bytes.
  .SH "DISK SUPPORT"
-This driver configures the drive type of each drive
-when it is first opened.
-A partition table in the driver is required for each type of disk.
-The origin and size (in sectors) of the pseudo-disks
-on each drive are shown below.
-Not all partitions begin on
-cylinder boundaries, as on other drives, because previous drivers
-used one partition table for all drive types.
-Variants of the partition tables are common;
-check the driver and the file
+This driver configures the type of each drive when it is first
+encountered.  A partition table in the driver is required for each type
+of disk.  The origin and size (in sectors) of the pseudo-disks on each
+drive are shown below.  Not all partitions begin on cylinder
+boundaries, as on other drives, because previous drivers used one
+partition table for all drive types.  Variants of the partition tables
+are common; check the driver and the file
  .IR /etc/disktab ( disktab (5))
  for other possibilities.
  .PP
  .nf
  .ta .5i +\w'000000    'u +\w'000000    'u +\w'000000    'u +\w'000000    'u
  .PP
  .IR /etc/disktab ( disktab (5))
  for other possibilities.
  .PP
  .nf
  .ta .5i +\w'000000    'u +\w'000000    'u +\w'000000    'u +\w'000000    'u
  .PP
-RC25 partitions
-       disk    start   length
-       ra?a    0       15884
-       ra?b    15884   10032
-       ra?c    0       50902
-       ra?g    25916   24986
-RD52 partitions
-       disk    start   length
-       ra?a    0       15884
-       ra?b    15884   9766
-       ra?c    0       60480
-       ra?g    25650   34830
-RD53 partitions
-       disk    start   length
-       ra?a    0       15884
-       ra?b    15884   33440
-       ra?c    0       138672
-       ra?g    49324   89348
-       ra?h    15884   122788
  RA60 partitions
         disk    start   length
         ra?a    0       15884
  RA60 partitions
         disk    start   length
         ra?a    0       15884
@@ -123,66 +98,353 @@ RA81 partitions with 4.2BSD-compatible partitions
  .DT
  .fi
  .PP
  .DT
  .fi
  .PP
-The ra?a partition is normally used for the root file system,
-the ra?b partition as a paging area,
-and the ra?c partition for pack-pack copying (it maps the entire disk).
+The ra?a partition is normally used for the root file system, the ra?b
+partition as a paging area, and the ra?c partition for pack-pack
+copying (it maps the entire disk).
  .SH FILES
  /dev/ra[0-9][a-f]
  .br
  /dev/rra[0-9][a-f]
  .SH DIAGNOSTICS
  .SH FILES
  /dev/ra[0-9][a-f]
  .br
  /dev/rra[0-9][a-f]
  .SH DIAGNOSTICS
-.BR "uda: ubinfo %x" .
-(VAX 11/750 only.)
-When allocating UNIBUS resources, the driver found it already
-had resources previously allocated.  This indicates a bug in
-the driver.
-.PP
-.BR "udasa %o, state %d" .
-(Additional status information given after a hard i/o error.)
-The values of the UDA-50 status register and the internal
-driver state are printed.
-.PP
-.BR "uda%d: random interrupt ignored" .
-An unexpected interrupt was received (e.g. when no i/o was
-pending).  The interrupt is ignored.
-.PP
-.BR "uda%d: interrupt in unknown state %d ignored" .
-An interrupt was received when the driver was in an unknown
-internal state.  Indicates a hardware problem or a driver bug.
-.PP
-.BR "uda%d: fatal error (%o)" .
-The UDA-50 indicated a ``fatal error'' in the status returned
-to the host.  The contents of the status register are displayed.
-.PP
-.BR OFFLINE .
-(Additional status information given after a hard i/o error.)
-A hard i/o error occurred because the drive was not on-line.
-.PP
-.BR "status %o" .
-(Additional status information given after a hard i/o error.)
-The status information returned from the UDA-50 is tacked onto
-the end of the hard error message printed on the console.
-.PP
-.BR "uda: unknown packet" .
-An MSCP packet of unknown type was received from the UDA-50.
-Check the cabling to the controller.
-.PP
-The following errors are interpretations of MSCP error messages
-returned by the UDA-50 to the host.
-.PP
-.BR "uda%d: %s error, controller error, event 0%o" .
-.PP
-.BR "uda%d: %s error, host memory access error, event 0%o, addr 0%o" .
-.PP
-.BR "uda%d: %s error, disk transfer error, unit %d" .
-.PP
-.BR "uda%d: %s error, SDI error, unit %d, event 0%o" .
-.PP
-.BR "uda%d: %s error, small disk error, unit %d, event 0%o, cyl %d" .
+.TP
+panic: udaslave
+No command packets were available while the driver was looking
+for disk drives.  The controller is not extending enough credits
+to use the drives.
+.TP
+uda%d: no response to Get Unit Status request
+A disk drive was found, but did not respond to a status request.
+This is either a hardware problem or someone pulling unit number
+plugs very fast.
+.TP
+uda%d: unit %d off line
+While searching for drives, the controller found one that
+seems to be manually disabled.  It is ignored.
+.TP
+uda%d: unable to get unit status
+Something went wrong while trying to determine the status of
+a disk drive.  This is followed by an error detail.
+.TP
+uda%d: unit %d, next %d
+This probably never happens, but I wanted to know if it did.  I
+have no idea what one should do about it.
+.TP
+uda%d: cannot handle unit number %d (max is %d)
+The controller found a drive whose unit number is too large.
+Valid unit numbers are those in the range [0..7].
+.TP
+uda%d: unit %d (media ID `%s') is of unknown type %d; ignored
+The controller found a drive whose type is not known, and thus has
+no partitioning.  The drive has been ignored.  You can add the type
+to the udatypes[] table, now that you know what it is:  The media
+ID will be something like `DU RA25'.
+.TP
+uda%d: uballoc map failed
+Unibus resource map allocation failed during initialisation.  This
+can only happen if you have 496 devices on a Unibus.
+.TP
+uda%d: timeout during init
+The controller did not initialise within ten seconds.  A hardware
+problem, but it sometimes goes away if you try again.
+.TP
+uda%d: init failed, sa=%b
+The controller refused to initalise.
+.TP
+uda%d: controller hung
+The controller never finished initialisation.  Retrying may sometimes
+fix it.
+.TP
+ra%d: drive will not come on line
+The drive will not come on line, probably because it is spun down.
+This should be preceded by a message giving details as to why the
+drive stayed off line.
+.TP
+uda%d: still hung
+When the controller hangs, the driver occasionally tries to reinitialise
+it.  This means it just tried, without success.
+.TP
+panic: udastart: bp==NULL
+A bug in the driver has put an empty drive queue on a controller queue.
+.TP
+uda%d: command ring too small
+If you increase NCMDL2, you may see a performance improvement.
+(See /sys/vaxuba/uda.c.)
+.TP
+panic: udastart
+A drive was found marked for status or on-line functions while performing
+status or on-line functions.  This indicates a bug in the driver.
+.TP
+uda%d: controller error, sa=%b
+The controller reported an error.  The driver will reset it and retry
+pending I/O.
+.TP
+uda%d: stray intr
+The controller interrupted when it should have stayed quiet.  The
+interrupt has been ignored.
+.TP
+uda%d: init step %d failed, sa=%b
+The controller reported an error during the named initialisation step.
+The driver will retry initialisation later.
+.TP
+uda%d: version %d model %d
+An informational message giving the revision level of the controller.
+.TP
+uda%d: DMA burst size set to %d
+An informational message showing the DMA burst size, in words.
+.TP
+panic: udaintr
+Indicates a bug in the generic MSCP code.
+.TP
+uda%d: driver bug, state %d
+The driver has a bogus value for the controller state.  Something
+is quite wrong.  This is immediately followed by a `panic: udastate'.
+.TP
+uda%d: purge bdp %d
+A benign message tracing BDP purges.  I have been trying to figure
+out what BDP purges are for.  You might want to comment out this
+call to log() in /sys/vaxuba/uda.c.
+.TP
+.RI "uda%d: SETCTLRC failed: " detail
+The Set Controller Characteristics command (the last part of the
+controller initialisation sequence) failed.  The
+.I detail
+message tells why.
+.TP
+.RI "uda%d: attempt to bring ra%d on line failed: " detail
+The drive could not be brought on line.  The
+.I detail
+message tells why.
+.TP
+uda%d: ra%d: unknown type %d
+The type index of the named drive is not known to the driver, so the
+drive will be ignored.
+.TP
+ra%d: changed types! was %s
+A drive somehow changed from one kind to another, e.g., from an RA80
+to an RA60.  The driver believes the new type.
+.TP
+ra%d: %s, size = %d sectors
+The named drive is of the given type, and has that many sectors of
+user-file area.  This is printed during configuration.
+.TP
+.RI "uda%d: attempt to get status for ra%d failed: " detail
+A status request failed.  The
+.I detail
+message should tell why.
+.TP
+ra%d: unit %d, nspt %d, group %d, ntpc %d, rctsize %d,
+.br
+.ti -5
+nrpt %d, nrct %d
+.br
+Information about the geometry of the named drive.  This is not
+used by the driver, but can one setting up
+.I disktab
+entries, e.g.  Note that the sectors per track, group, and tracks per
+cylinder values are those after bad blocking is accounted for, and will
+differ slightly from the actual hardware setup.  This message also
+reports the MSCP unit number for the drive.  Errors tend to include
+only the MSCP unit number, rather than the drive number, since that
+is all the driver can tell at the time.
+.TP
+ra%d: bad block report: %d
+The drive has reported the given block as bad.  If there are multiple
+bad blocks, the drive will report only the first; in this case this
+message will be followed by `+ others'.  Get DEC to forward the
+block with EVRLK.
+.TP
+ra%d: serious exception reported
+I have no idea what this really means.
+.TP
+panic: udareplace
+The controller reported completion of a REPLACE operation.  The
+driver never issues any REPLACEs, so something is wrong.
+.TP
+panic: udabb
+The controller reported completion of bad block related I/O.  The
+driver never issues any such, so something is wrong.
+.TP
+uda%d: lost interrupt
+The controller has gone out to lunch, and is being reset to try to bring
+it back.
+.TP
+panic: mscp_go: AEB_MAX_BP too small
+You defined AVOID_EMULEX_BUG and increased NCMDL2 and Emulex has
+new firmware.  Raise AEB_MAX_BP or turn off AVOID_EMULEX_BUG.
+.TP
+uda%d: unit %d: unknown message type 0x%x ignored
+The controller responded with a mysterious message type. See
+/sys/vax/mscp.h for a list of known message types.  This is probably
+a controller hardware problem.
+.TP
+uda%d: unit %d out of range
+The disk drive unit number (the unit plug) is higher than the
+maximum number the driver allows (currently 7).
+.TP
+uda%d: unit %d not configured, \fImessage\fP ignored
+The named disk drive has announced its presence to the controller,
+but was not, or cannot now be, configured into the running system.
+.I Message
+is one of `available attention' (an `I am here' message) or
+`stray response op 0x%x status 0x%x' (anything else).
+.TP
+ra%d: bad lbn (%d)?
+The drive has reported an invalid command error, probably due to an
+invalid block number.  If the lbn value is very much greater than the
+size reported by the drive, this is the problem.  It is probably due to
+an improperly configured partition table.  Other invalid commands
+indicate a bug in the driver, or hardware trouble.
+.TP
+ra%d: duplicate ONLINE ignored
+The drive has come on-line while already on-line.  This condition
+can probably be ignored (and has been).
+.TP
+ra%d: io done, but no buffer?
+Hardware trouble, or a bug; the drive has finished an I/O request,
+but the response has an invalid (zero) command reference number.
+.TP
+Emulex SC41/MS screwup: uda%d, got %d correct, then
+.br
+.ti -5
+changed 0x%x to 0x%x
+.br
+You turned on AVOID_EMULEX_BUG, and the driver successfully
+avoided the bug.  The number of correctly-handled requests is
+reported, along with the expected and actual values relating to
+the bug being avoided.
+.TP
+panic: unrecoverable Emulex screwup
+You turned on AVOID_EMULEX_BUG, but Emulex was too clever and
+avoided the avoidance.  Try turning on MSCP_PARANOIA instead.
+.TP
+uda%d: bad response packet ignored
+You turned on MSCP_PARANOIA, and the driver caught the controller in
+a lie.  The lie has been ignored, and the controller will soon be
+reset (after a `lost' interrupt).  This is followed by a hex dump of
+the offending packet.
+.TP
+ra%d: bogus REPLACE end
+The drive has reported finishing a bad sector replacement, but the
+driver never issues bad sector replacement commands.  The report
+is ignored.  This is likely a hardware problem.
+.TP
+ra%d: unknown opcode 0x%x status 0x%x ignored
+The drive has reported something that the driver cannot understand.
+Perhaps DEC has been inventive, or perhaps your hardware is ill.
+This is followed by a hex dump of the offending packet.
+.TP
+uda%d: %s error datagram
+The controller has reported some kind of error, either `hard'
+(unrecoverable) or `soft' (recoverable).  If the controller is going on
+(attempting to fix the problem), this message includes the remark
+`(continuing)'.  Emulex controllers wrongly claim that all soft errors
+are hard errors.  This message may be followed by
+one of the following 5 messages, depending on its type, and will always
+be followed by a failure detail message (also listed below).
+.RS
+.TP
+memory addr 0x%x
+A host memory access error; this is the address that could not be
+read.
+.TP
+unit %d: level %d retry %d, %s %d
+A typical disk error; the retry count and error recovery levels are
+printed, along with the block type (`lbn', or logical block; or `rbn',
+or replacement block) and number.  If the string is something else, DEC
+has been clever, or your hardware has gone to Australia for vacation
+(unless you live there; then it might be in New Zealand, or Brazil).
+.TP
+unit %d: %s %d
+Also a disk error, but an `SDI' error, whatever that is.  (I doubt
+it has anything to do with Ronald Reagan.)  This lists the block
+type (`lbn' or `rbn') and number.
+.TP
+unit %d: small disk error, cyl %d
+Yet another kind of disk error, but for small disks.  (`That's what
+it says, guv'nor.  Dunnask me what it means.')
+.TP
+unit %d: unknown error, format 0x%x
+A mysterious error: the given format code is not known.
+.RE
  .PP
  .PP
-.BR "uda%d: %s error, unknown error, unit %d, format 0%o, event 0%o" .
-.SH BUGS
-The partition tables attempt to combine compatibility
-with previous drivers and functionality; this is impossible.
-The best solution would be to read the partition tables
-off the drive.
+The detail messages are as follows:
+.RS
+.TP
+success (%s) (code 0, subcode %d)
+Everything worked, but the controller thought it would let you know
+that something went wrong.  No matter what subcode, this can probably
+be ignored.
+.TP
+invalid command (%s) (code 1, subcode %d)
+This probably cannot occur unless the hardware is out; %s should be
+`invalid msg length', meaning some command was too short or too long.
+.TP
+command aborted (unknown subcode) (code 2, subcode %d)
+This should never occur, as the driver never aborts commands.
+.TP
+unit offline (%s) (code 3, subcode %d)
+The drive is offline, either because it is not around (`unknown
+drive'), stopped (`not mounted'), out of order (`inoperative'), has the
+same unit number as some other drive (`duplicate'), or has been
+disabled for diagnostics (`in diagnosis').
+.TP
+unit available (unknown subcode) (code 4, subcode %d)
+The controller has decided to report a perfectly normal event as
+an error.  (Why?)
+.TP
+media format error (%s) (code 5, subcode %d)
+The drive cannot be used without reformatting.  The Format Control
+Table cannot be read (`fct unread - edc'), there is a bad sector
+header (`invalid sector header'), the drive is not set for 512-byte
+sectors (`not 512 sectors'), the drive is not formatted (`not formatted'),
+or the FCT has an uncorrectable ECC error (`fct ecc').
+.TP
+write protected (%s) (code 6, subcode %d)
+The drive is write protected, either by the front panel switch
+(`hardware') or via the driver (`software').  The driver never
+sets software write protect.
+.TP
+compare error (unknown subcode) (code 7, subcode %d)
+A compare operation showed some sort of difference.  The driver
+never uses compare operations.
+.TP
+data error (%s) (code 7, subcode %d)
+Something went wrong reading or writing a data sector.  A `forced
+error' is a software-asserted error used to mark a sector that contains
+suspect data.  Rewriting the sector will clear the forced error.  This
+is normally set only during bad block replacment, and the driver does
+no bad block replacement, so these should not occur.  A `header
+compare' error probably means the block is shot.  A `sync timeout'
+presumably has something to do with sector synchronisation.
+An `uncorrectable ecc' error is an ordinary data error that cannot
+be fixed via ECC logic.  A `%d symbol ecc' error is a data error
+that can be (and presumably has been) corrected by the ECC logic.
+It might indicate a sector that is imperfect but usable, or that
+is starting to go bad.  If any of these errors recur, the sector
+may need to be replaced.
+.TP
+host buffer access error (%s) (code %d, subcode %d)
+Something went wrong while trying to copy data to or from the host
+(Vax).  The subcode is one of `odd xfer addr', `odd xfer count',
+`non-exist. memory', or `memory parity'.  The first two could be a
+software glitch; the last two indicate hardware problems.
+.TP
+controller error (%s) (code %d, subcode %d)
+The controller has detected a hardware error in itself.  A
+`serdes overrun' is a serialiser / deserialiser overrun; `edc'
+probably stands for `error detection code'; and `inconsistent
+internal data struct' is obvious.
+.TP
+drive error (%s) (code %d, subcode %d)
+Either the controller or the drive has detected a hardware error
+in the drive.  I am not sure what an `sdi command timeout' is, but
+these seem to occur benignly on occasion.  A `ctlr detected protocol'
+error means that the controller and drive do not agree on a protocol;
+this could be a cabling problem, or a version mismatch.  A `positioner'
+error means the drive seek hardware is ailing; `lost rd/wr ready'
+means the drive read/write logic is sick; and `drive clock dropout'
+means that the drive clock logic is bad, or the media is hopelessly
+scrambled.  I have no idea what `lost recvr ready' means.  A `drive 
+detected error' is a catch-all for drive hardware trouble; `ctlr
+detected pulse or parity' errors are often caused by cabling problems.
+.RE
author	Keith Bostic <bostic@ucbvax.Berkeley.EDU>
	Sat, 24 Oct 1987 03:32:51 +0000 (19:32 -0800)
committer	Keith Bostic <bostic@ucbvax.Berkeley.EDU>
	Sat, 24 Oct 1987 03:32:51 +0000 (19:32 -0800)