.\"tbl ... ^ [tn]roff -ms
Assembler Reference Manual
.hc %
This document describes the usage and input syntax
of the \s8UNIX PDP\s10-11 assembler \fIas\fP.
The details
of the \s8PDP\s10-11 are not described.
The input syntax of the \s8UNIX\s10 assembler is generally
similar to that of the \s8DEC\s10 assembler \s8PAL\s10-11\s8R\s10, although
its internal workings and output format
are unrelated.
It may be useful to read the publication \s8DEC\s10-11-\s8ASDB\s10-\s8D\s10,
which describes \s8PAL\s10-11\s8R\s10, although naturally
one must use care in assuming that its rules apply
to \fIas\fP.
\fIAs\fP is a rather ordinary assembler without
macro capabilities.
It produces an output file that contains
relocation information and a complete
symbol table;
thus the output is acceptable to the \s8UNIX\s10 link-editor
\fIld\fP, which
may be used to combine the outputs of several
assembler runs and to obtain
object programs from libraries.
The output format has been designed
so that if a program contains no unresolved
ref%er%ences to external symbols, it is executable
without further processing.
\fIas\fP is used as follows:
as \fR[\fB \-u \fR] [ \fB\-o \fIoutput\fR ] \fIfile\s6\d1\u\s10 .\|.\|.
If the optional ``\-u'' argument is
given, all undefined symbols
in the current assembly will be made undefined-external.
See the \fB.globl\fR directive below.
The other arguments name files
which are concatenated and assembled.
Thus programs may be written in several
pieces and assembled together.
The output of the assembler is by default placed on
the file \fIa.out\fR in the current directory;
the ``\-o'' flag causes the output to be placed on the named file.
If there were no unresolved
external ref%er%ences, and no errors detected,
the output file is marked executable; otherwise, if it is
produced at all, it is made non-executable.
Assembler tokens include identifiers (alternatively, ``symbols'' or ``names''),
constants, and operators.
An identifier consists of a sequence of alphanumeric characters (including
period ``\|\fB.\fR\|'', underscore ``\(ul'',
and tilde ``~''
as alphanumeric)
of which the first may not
be numeric.
Only the first eight characters are significant.
When a name begins with a tilde, the tilde is discarded
and that occurrence of the identifier generates
a unique entry in the symbol table which can match
no other occurrence of the identifier.
This feature is used
by the C compiler to place names of local variables
in the output symbol table
without having to worry
about making them unique.
A temporary symbol consists of a digit followed by ``f\|'' or
``b''.
Temporary symbols are discussed fully in \(sc5.1.
An octal constant consists of a sequence of digits; ``8'' and
``9'' are taken to have octal value 10 and 11.
The constant
is truncated to 16 bits and interpreted in two's complement
notation.
A decimal constant consists of a sequence of digits terminated
by a decimal point ``\fB.\fR''. The magnitude of the constant should be
representable in 15 bits; i.e., be less than 32,768.
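The rules for numeric constants can be sketched in a few lines of Python.
This is an illustration only, not the assembler's own code: the function
names are hypothetical, and it assumes that each digit, including ``8'' and
``9'', is simply accumulated as value times eight plus digit, which is one
reading of the statement that ``8'' and ``9'' have octal value 10 and 11.

```python
MASK16 = 0xFFFF  # constants are truncated to 16 bits


def octal_constant(digits: str) -> int:
    """Value of an octal constant such as '17' (hypothetical helper)."""
    value = 0
    for ch in digits:
        # '8' and '9' contribute 8 and 9, i.e. octal 10 and 11 (assumed reading).
        value = value * 8 + int(ch)
    return value & MASK16


def decimal_constant(token: str) -> int:
    """Value of a decimal constant such as '100.'; the trailing point is required."""
    if not token.endswith("."):
        raise ValueError("a decimal constant ends with a decimal point")
    return int(token[:-1]) & MASK16


def as_signed(word: int) -> int:
    """Interpret a 16-bit word in two's complement notation."""
    return word - 0x10000 if word & 0x8000 else word
```

For instance, under these assumptions the constant 177777 has the signed
value \-1, and 18 has the same value as 20.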
A single-character constant consists of a single quote ``\|\(fm\|''
followed by an \s8ASCII\s10 character not a new-line.
Certain dual-character escape sequences
are acceptable in place of the \s8ASCII\s10 character to represent
new-line and other non-graphics (see \fIString state%ments\fP, \(sc5.5).
The constant's value has the code for the
given character in the least significant
byte of the word and is null-padded on the left.
A double-character constant consists of a double
quote ``\|"\|'' followed by a pair of \s8ASCII\s10 characters
not including new-line.
Certain dual-character escape sequences are acceptable
in place of either of the \s8ASCII\s10 characters
to represent new-line and other non-graphics
(see \fIString state%ments\fR, \(sc5.5).
The constant's value has the code for the first
given character in the least significant
byte and that for the second character in
the most significant byte.
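The byte placement of character constants may be easier to see in code.
The following Python sketch is illustrative; the function names are
hypothetical, and escape sequences are not handled.

```python
def single_char_constant(ch: str) -> int:
    """Code of the character in the low byte; the high byte is null."""
    return ord(ch) & 0xFF


def double_char_constant(pair: str) -> int:
    """First character in the low byte, second character in the high byte."""
    first, second = pair  # exactly two characters are expected
    return (ord(first) & 0xFF) | ((ord(second) & 0xFF) << 8)
```

Thus the two characters of a double-character constant appear in memory in
the order written, since the low byte of a word has the lower address on
the \s8PDP\s10-11.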
There are several single- and double-character
operators; see \(sc6.
Blank and tab characters
may be interspersed freely between tokens, but may
not be used within tokens (except character constants).
A blank or tab is required to separate adjacent
identifiers or constants not otherwise separated.
The character ``\|/\|'' introduces a comment, which extends
through the end of the line on which it appears.
Comments are ignored by the assembler.
Assembled code and data
fall into three segments: the text segment, the data segment, and the bss segment.
The text segment is the one in which the assembler begins,
and it is the one into which instructions are typically placed.
The \s8UNIX\s10 system will, if desired,
enforce the purity of the text segment of programs by
trapping write operations
into it.
Object programs produced by the assembler must be processed
by the link-editor \fIld\fR
if the text segment is to be write-protected.
A single copy of the text
segment is shared among all processes
executing such a program.
The data segment is available for placing
data or instructions which
will be modified during execution.
Anything which may go in the text segment may be put
into the data segment.
In programs with write-protected, sharable text segments,
the data segment contains the initialized but variable
parts of a program.
If the text segment is not pure, the data segment begins
immediately after the text segment;
if the text segment is pure, the data segment begins at the lowest
8K byte boundary after the text segment.
The bss segment may not contain any explicitly initialized code
or data.
The length of the bss segment (like that of text or data)
is determined by the high-water mark of the location counter
within it.
The bss segment is actually an extension of
the data segment and begins immediately after it.
At the start of execution of a program, the bss segment
is set to 0.
Typically the bss segment is set up
by state%ments exemplified by
lab\fB: .\fR = \fB.\fR+10
The advantage in using the bss segment
for storage that starts off empty is that the initialization
information need not be stored in the output file.
See also \fILocation counter\fP and \fIAssignment state%ments\fP
below.
One special symbol, ``\|\fB.\fP\|'', is the location counter.
Its value at any time is the offset
within the appropriate segment of the start of
the state%ment in which it appears.
The location counter may be assigned to,
with the restriction that the
current segment may not change;
the value of ``\|\fB.\fP\|'' may not decrease.
If the effect of the assignment is to increase the value of ``\|\fB.\fP\|'',
the required number of null bytes are generated
(but see \fISegments\fP above).
A source program is composed of a sequence of
\fIstate%ments\fP.
Statements are separated either by new-lines
or by semicolons.
There are five kinds of state%ments: null state%ments,
expression state%ments, assignment state%ments,
string state%ments,
and keyword state%ments.
Any kind of state%ment may be preceded by
one or more labels.
There are two kinds of label:
name labels and numeric labels.
A name label consists of a name followed
by a colon (\|:\|).
The effect of a name label is to assign the current
value and type of the location counter ``\|\fB.\fP\|''
to the name.
An error is indicated in pass 1 if the
name is already defined;
an error is indicated in pass 2 if the ``\|\fB.\fP\|''
value assigned changes the definition
of the label.
A numeric label consists of a digit \fI0\fR to \fI9\fR followed by a colon (\|:\|).
Such a label serves to define temporary
symbols of the form ``\fIn\fR\|b'' and ``\fIn\fR\|f\|'', where \fIn\fR is
the digit of the label.
As in the case of name labels, a numeric label assigns
the current value and type of ``\|\fB.\fP\|'' to the temporary
symbol.
However, several numeric labels with the same
digit may be used within the same assembly.
Ref%er%ences of the form ``\fIn\fR\|f\|'' refer to the first
numeric label ``\fIn\|\fR:'' \fIf\fR\|orward from the ref%er%ence;
``\fIn\|\fRb'' symbols refer to the first ``\fIn\|\fR\|:'' label
\fIb\|\fRackward from the ref%er%ence.
This sort of temporary label was introduced by Knuth
[\fIThe Art of Computer Programming, Vol I: Fundamental Algorithms\|\fR].
Such labels tend to conserve both the symbol table
space of the assembler and the
inventive powers of the programmer.
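The resolution of temporary symbols can be sketched as follows.
This Python fragment is illustrative only: the representation of a program
as a list of tuples and the name resolve_temporaries are assumptions made
for the example, not part of the assembler.

```python
def resolve_temporaries(items):
    """Resolve 'nf' and 'nb' references against numeric labels.

    items is a list in source order of either
      ("label", digit, address)   a numeric label such as 1:
      ("ref", digit, direction)   a reference such as 1f or 1b
    Returns the address chosen for each reference, in order.
    """
    resolved = []
    for index, item in enumerate(items):
        if item[0] != "ref":
            continue
        _, digit, direction = item
        if direction == "b":
            # Nearest matching label backward from the reference.
            candidates = reversed(items[:index])
        else:
            # First matching label forward from the reference.
            candidates = items[index + 1:]
        for kind, label_digit, value in candidates:
            if kind == "label" and label_digit == digit:
                resolved.append(value)
                break
        else:
            raise LookupError(f"no label {digit}: in direction {direction!r}")
    return resolved
```

With two labels ``1:'' at addresses 0 and 4 and references between and
after them, ``1f'' between the labels selects address 4 while ``1b'' at the
same point selects address 0.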
A null state%ment is an empty state%ment (which may, however,
have labels).
A null state%ment is ignored by the assembler.
Common examples of null state%ments are empty
lines or lines containing only a label.
5.3 Expression state%ments
An expression state%ment consists of an arithmetic
expression not beginning with
a keyword.
The assembler computes its (16-bit) value
and places it in the output stream, together with the
appropriate relocation bits.
5.4 Assignment state%ments
An assignment state%ment consists of an identifier, an equals sign (\|=\|),
and an expression.
The value and type of the expression are assigned to
the identifier.
It is not required that the type or value be
the same in pass 2 as in pass 1, nor is it an
error to redefine any symbol by assignment.
Any external attribute of the expression is lost across
an assignment.
This means that it is not possible to declare a global
symbol by assigning to it, and that it is impossible
to define a symbol to be offset from a non-locally
defined global symbol.
As mentioned,
it is permissible to assign to the
location counter ``\|\fB.\fP\|''.
It is required, however, that the type of
the expression assigned be of the same type
as ``\|\fB.\fP\|'',
and it is forbidden to decrease the value
of ``\|\fB.\fP\|''.
In practice, the most common assignment to ``\|\fB.\fP\|'' has the form
``.\|=\|.\|+\|\fIn\fR''
for some number \fIn;\fR this has the effect of generating
\fIn\fR null bytes.
A string state%ment generates a sequence of bytes containing \s8ASCII\s10 characters.
A string state%ment consists of a left string quote ``<''
followed by a sequence of \s8ASCII\s10 characters not including newline,
followed by a right string quote ``>''.
Any of the \s8ASCII\s10 characters may
be replaced by a two-character escape sequence to represent
certain non-graphic characters, as follows:
The last two are included so that the escape character
and the right string quote may be represented.
The same escape sequences
may also be used within single- and double-character
constants (see \(sc2.3 above).
Keyword state%ments are numerically the most common type,
since most machine instructions are of this
sort.
A keyword state%ment begins with one of the many predefined
keywords of the assembler;
the syntax of the remainder depends
on the keyword.
All the keywords are listed below with the syntax they require.
An expression is a sequence of symbols representing a value.
Its constituents are identifiers, constants, temporary symbols,
operators, and brackets.
Each expression has a type.
All operators in expressions are fundamentally binary in
nature; if an operand is missing on the left, a 0
of absolute type is assumed.
Arithmetic
is two's complement and has 16 bits of precision.
All operators have equal precedence, and expressions
are evaluated
strictly left to right except for the effect
of brackets.
.IP (blank) 8
when there is no operator between
operands, the effect is
exactly the same as if a ``+'' had appeared.
.IP \e\|/ 8
division (note that plain ``\|/\|'' starts a comment)
.IP ! 8
\fIa\fR\|!\|\fIb\fR is \fIa \fBor \fR(\|\fBnot \fIb\fR\|);
i.e., the \fBor\fR of the first operand and
the one's complement of the second; most common use is
as a unary.
.IP ^ 8
result has the value of first operand and the type of the second;
most often used to define new machine instructions
with syntax identical to existing instructions.
.PP
Expressions may be grouped by use of square brackets ``\|[\|\|]\|''.
(Round parentheses are reserved for address modes.)
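The evaluation order is unusual enough to deserve an illustration.
The following Python sketch evaluates an already tokenized expression under
the rules stated above: equal precedence, strict left-to-right order, 16-bit
two's complement arithmetic, a missing left operand taken as 0, and adjacent
operands treated as if joined by ``+''.
The token representation (integers for operands, strings for operators,
nested lists for bracketed groups) and the name evaluate are assumptions
of the example; only a few operators are included, and types are ignored.

```python
MASK16 = 0xFFFF  # 16 bits of precision

OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "!": lambda a, b: a | ~b,  # a or (not b)
}


def evaluate(tokens):
    """Evaluate tokens strictly left to right; nested lists act as brackets."""
    value = 0        # a missing left operand is taken to be 0
    operator = "+"   # no operator between operands behaves like "+"
    for token in tokens:
        if isinstance(token, str):
            operator = token
            continue
        operand = evaluate(token) if isinstance(token, list) else token & MASK16
        value = OPERATORS[operator](value, operand) & MASK16
        operator = "+"
    return value
```

So 1+2*3 yields 9 rather than 7, and the bracketed form 1+[2*3] is needed
to obtain 7.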
The assembler deals with a number of types
of expressions. Most types
are attached to keywords and used to select the
routine which treats that keyword. The types likely
to be met explicitly are:
.IP undefined 8
Upon first encounter, each symbol is undefined.
It may become undefined if it is assigned an undefined expression.
It is an error to attempt to assemble an undefined
expression in pass 2; in pass 1, it is not (except that
certain keywords require operands which are not undefined).
.IP "undefined external" 8
A symbol which is declared \fB.globl\fR but not defined
in the current assembly is an undefined
external.
If such a symbol is declared, the link editor \fIld\fR
must be used to load the assembler's output with
another routine that defines the undefined ref%er%ence.
.IP absolute 8
An absolute symbol is defined ultimately from a constant.
Its value is unaffected by any possible future applications
of the link-editor to the output file.
.IP text 8
The value of a text symbol is measured
with respect to the beginning of the text segment of the program.
If the assembler output is link-edited, its text
symbols may change in value
since the program need
not be the first in the link editor's output.
Most text symbols are defined by appearing as labels.
At the start of an assembly, the value of ``\|\fB.\fP\|'' is text 0.
.IP data 8
The value of a data symbol is measured
with respect to the origin of the data segment of a program.
Like text symbols, the value of a data symbol may change
during a subsequent link-editor run since previously
loaded programs may have data segments.
After the first \fB.data\fR state%ment, the value of ``\|\fB.\fP\|''
is data 0.
.IP bss 8
The value of a bss symbol is measured from
the beginning of the bss segment of a program.
Like text and data symbols, the value of a bss symbol
may change during a subsequent link-editor
run, since previously loaded programs may have bss segments.
After the first \fB.bss\fR state%ment, the value of ``\|\fB.\fP\|'' is bss 0.
.IP "external absolute, text, data, or bss" 8
Symbols declared \fB.globl\fR
but defined within an assembly as absolute, text, data, or bss
symbols may be used exactly as if they were not
declared \fB.globl\fR; however, their value and type are available
to the link editor so that the program may be loaded with others
that ref%er%ence these symbols.
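.IP register 8
The symbols \fBr0\fR .\|.\|. \fBr5\fR, \fBfr0\fR .\|.\|. \fBfr5\fR, \fBsp\fR, and \fBpc\fR
are predefined as register symbols.
Either they or symbols defined from them must
be used to refer to the six general-purpose,
six floating-point, and
the 2 special-purpose machine registers.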
The behavior of the floating register names
is identical to that of the corresponding
general register names; the former
are provided as a mnemonic aid.
.IP "other types" 8
Each keyword known to the assembler has a type which
is used to select the routine which processes
the associated keyword state%ment.
The behavior of such symbols
when not used as keywords is the same as if they were absolute.
6.3 Type propagation in expressions
When operands are combined by expression operators,
the result has a type which depends on the types
of the operands and on the operator.
The rules involved are complex to state but
were intended to be sensible and predictable.
For purposes of expression evaluation the
important types are
.DS
undefined
absolute
text
data
bss
undefined external
other
.DE
The combination rules are then:
If one of the operands
is undefined, the result is undefined.
If both operands are absolute, the result is absolute.
If an absolute is combined with one of the ``other types''
or with a register expression, the result
has the register or other type.
As a consequence,
one can refer to r3 as ``r0+3''.
If two operands of ``other type'' are combined,
the result has the
numerically larger type.
An ``other type'' combined with an explicitly
discussed type other than absolute
acts like an absolute.
Further rules applying to particular operators
are:
.IP + 8
If one operand is text-, data-, or bss-segment
relocatable, or is an undefined external,
the result has the postulated type and the other operand
must be absolute.
.IP \- 8
If the first operand is a relocatable
text-, data-, or bss-segment symbol, the second operand
may be absolute (in which case the result has the
type of the first operand);
or the second operand may have the same type
as the first (in which case the result is absolute).
If the first operand is external undefined, the second must be
absolute.
All other combinations are illegal.
.IP ^ 8
This operator follows no other rule than
that the result has the value
of the first operand and the type of the second.
.IP others 8
It is illegal to apply these operators to any but absolute
symbols.
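The operator-specific rules can be summarized in code.
The Python sketch below is an interpretation, not the assembler's
implementation: it assumes that the first rule above governs addition and
the second governs subtraction, it represents types as plain strings, it
assumes that an undefined external minus an absolute remains an undefined
external, and it leaves out register and ``other'' types entirely.
The names add_type and sub_type are hypothetical.

```python
RELOCATABLE = {"text", "data", "bss"}


def add_type(left: str, right: str) -> str:
    """Type of left + right under the rules as read above."""
    if "undefined" in (left, right):
        return "undefined"
    if left == right == "absolute":
        return "absolute"
    special = RELOCATABLE | {"undefined external"}
    # One operand may be relocatable or external; the other must be absolute.
    if left in special and right == "absolute":
        return left
    if right in special and left == "absolute":
        return right
    raise ValueError(f"illegal combination: {left} + {right}")


def sub_type(left: str, right: str) -> str:
    """Type of left - right under the rules as read above."""
    if "undefined" in (left, right):
        return "undefined"
    if left == right == "absolute":
        return "absolute"
    if left in RELOCATABLE:
        if right == "absolute":
            return left
        if right == left:
            return "absolute"  # difference of two addresses in one segment
    if left == "undefined external" and right == "absolute":
        return left  # assumed result type; the text states only the restriction
    raise ValueError(f"illegal combination: {left} - {right}")
```

The difference of two labels in the same segment is therefore an absolute
quantity, which is why it may be used where an absolute expression is
required.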
The keywords listed below introduce
state%ments that generate data in unusual forms or
influence the later operations of the assembler.
The metanotation ``[ stuff ] .\|.\|.''
means that 0 or more instances of the given stuff may appear.
Also, boldface tokens are literals, italic words
are substitutable.
7.1 \fB.byte \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|.
The \fIexpression\fRs in the comma-separated
list are truncated to 8 bits and assembled in successive
bytes.
The expressions must be absolute.
This state%ment and the string state%ment above are the only ones
that assemble data one byte at a time.
7.2 \fB.even\fR
If the location counter ``\|\fB.\fP\|'' is odd, it is advanced by one
so the next state%ment will be assembled
at a word boundary.
7.3 \fB.if \fIexpression\fR
The \fIexpression\fR must be absolute and defined in pass 1.
If its value is nonzero, the \fB.if\fR is ignored; if zero,
the state%ments between the \fB.if\fR and the matching \fB.endif\fR
are ignored.
\&\fB.if\fR may be nested.
The effect of \fB.if\fR cannot extend beyond
the end of the input file in which it appears.
(The state%ments are not totally ignored, in
the following
sense: \fB.if\fRs and \fB.endif\fRs are scanned for, and
moreover all names
are entered in the symbol table.
Thus names occurring only inside
an \fB.if\fR
will show up as undefined if the symbol
table is listed.)
7.4 \fB.endif\fR
This state%ment marks the end of a conditionally-assembled section of code.
7.5 \fB.globl \fIname \fR[ \fB,\fI name \fR] .\|.\|.
This state%ment makes the \fInames\fR external.
If they are otherwise defined (by assignment or
appearance as a label)
they act within the assembly exactly as if
the \fB.globl\fR state%ment were not given; however,
the link editor \fIld\fR may be used
to combine this routine with other routines that refer
to these symbols.
Conversely, if the given symbols are not defined
within the current assembly, the link editor
can combine the output of this assembly
with that of others which define the symbols.
As discussed in \(sc1, it is possible to force
the assembler to make all otherwise
undefined symbols external.
7.6 \fB.text\fR
7.7 \fB.data\fR
7.8 \fB.bss\fR
These three pseudo-operations cause the
assembler to begin assembling into the text, data, or
bss segment respectively.
Assembly starts in the text segment.
It is forbidden to assemble any
code or data into the bss segment, but symbols may
be defined and ``\|\fB.\fP\|'' moved about by assignment.
7.9 \fB.comm\fI name \fB, \fIexpression\fR
Provided the \fIname\fR is not defined elsewhere,
this state%ment is equivalent to
.DS
\&.globl  name
name = expression ^ name
.DE
That is, the type of \fIname\fR
is ``undefined external'', and its value is \fIexpression\fR.
In fact the \fIname\fR behaves
in the current assembly just like an
undefined external.
However, the link-editor \fIld\fR has been special-cased
so that all external symbols which are not
otherwise defined, and which have a non-zero
value, are defined to lie in the bss
segment, and enough space is left after the
symbol to hold \fIexpression\fR
bytes.
All symbols which become defined in this way
are located before all the explicitly defined
bss-segment locations.
Because of the rather complicated instruction and addressing
structure of the \s8PDP\s10-11, the syntax of machine instruction
state%ments is varied.
Although the following sections give the syntax
in detail, the machine handbooks should
be consulted on the semantics.
8.1 Sources and Destinations
The syntax of general source and destination
addresses is the same.
Each must have one of the following forms,
where \fIreg\fR is a register symbol, and \fIexpr\fR
is any sort of expression:
.TS
l l l.
syntax	words	mode
_
\fB(\|\fIreg\fB\|)\|+\fR	0	20+\fIreg\fR
\fB\-\|(\|\fIreg\fB\|)\fR	0	40+\fIreg\fR
\fIexpr\|\fB(\|\fIreg\fB\|)\fR	1	60+\fIreg\fR
\fB(\|\fIreg\fB\|)\fR	0	10+\fIreg\fR
\fB*\|\fIreg\fR	0	10+\fIreg\fR
\fB*\|(\|\fIreg\fB\|)\|+\fR	0	30+\fIreg\fR
\fB*\|\-\|(\|\fIreg\fB\|)\fR	0	50+\fIreg\fR
\fB*\|(\|\fIreg\fB\|)\fR	1	70+\fIreg\fR
\fB*\|\fIexpr\fB\|(\|\fIreg\fB\|)\fR	1	70+\fIreg\fR
.TE
The \fIwords\fR column gives the number of address words generated;
the \fImode\fR column gives the octal address-mode number.
The syntax of the address forms is
identical to that in \s8DEC\s10 assemblers, except that ``*'' has
been substituted for ``@''
and ``$'' for ``#''; the \s8UNIX\s10 typing conventions make ``@'' and ``#''
rather inconvenient.
Notice that mode ``*reg'' is identical to ``(reg)'';
that ``*(reg)'' generates an index word (namely, 0);
and that addresses consisting of an unadorned expression
are assembled as pc-relative ref%er%ences independent
of the type of the expression.
To force a non-relative ref%er%ence, the form ``*$expr'' can
be used, but notice that further indirection is impossible.
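As an illustration of how the \fImode\fR and \fIwords\fR columns combine
with a register number, the following Python sketch builds the six-bit
address field for the register-based forms tabulated above.
The dictionary keys and the function names are inventions of the example;
the forms based on a bare expression are not covered.

```python
# syntax form -> (octal mode base, number of address words generated)
ADDRESS_FORMS = {
    "(reg)+":     (0o20, 0),
    "-(reg)":     (0o40, 0),
    "expr(reg)":  (0o60, 1),
    "(reg)":      (0o10, 0),
    "*reg":       (0o10, 0),   # identical to (reg)
    "*(reg)+":    (0o30, 0),
    "*-(reg)":    (0o50, 0),
    "*(reg)":     (0o70, 1),   # generates an index word, namely 0
    "*expr(reg)": (0o70, 1),
}


def address_field(form: str, reg: int) -> int:
    """Six-bit field: the mode base plus the register number (0 to 7)."""
    if not 0 <= reg <= 7:
        raise ValueError("register number must be 0 through 7")
    base, _ = ADDRESS_FORMS[form]
    return base + reg


def address_words(form: str) -> int:
    """Number of extra address words the form generates."""
    return ADDRESS_FORMS[form][1]
```

For example, the form (reg)+ applied to register 1 gives the field 21
(octal) and generates no address word.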
8.3 Simple machine instructions
The following instructions
are defined as absolute symbols:
The \s8PDP\s10-11 hardware allows more than one of the ``clear''
class, or alternatively more than one of the ``set'' class
to be \fBor\fR-ed together; this may be expressed as follows:
The following instructions take an expression as operand.
The expression must lie in the same segment as the ref%er%ence,
cannot be undefined-external,
and its value cannot differ from the current location of ``\|\fB.\fP\|''
by more than 254 bytes:
blt bec \fR(=\fB bcc\fR)\fB
bmi bes \fR(=\fB bcs\fR)\fB
\fBbes\fR (``branch on error set'')
and \fBbec\fR (``branch on error clear'')
are intended to test the error bit
returned by system calls (which
is the c-bit).
8.5 Extended branch instructions
The following symbols are followed by an expression
in the same segment as ``\|\fB.\|\fP''.
If the target address is close enough,
a branch-type instruction is generated;
if the address is too far away,
a \fBjmp\fR will be used.
\fBjbr\fR turns into a plain \fBjmp\fR
if its target is too remote;
the others (whose names are constructed
by replacing the ``b'' in the branch instruction's
name by ``j'')
turn into the converse branch over a \fBjmp\fR
to the target address.
8.6 Single operand instructions
The following
symbols are names of single-operand
machine instructions.
The form
of address expected is discussed in \(sc8.1 above.
8.7 Double operand instructions
The following instructions take a general source
and destination (\(sc8.1), separated by a comma, as operands.
8.8 Miscellaneous instructions
The following instructions have
more specialized syntax.
Here \fIreg\fR is
a register name, \fIsrc\fR and \fIdst\fR a general source
or destination
(\(sc8.1), and \fIexpr\fR is an expression:
\fBash \fIsrc\|,\|reg \fR(or, \fBals\fR)\fB
\fBashc \fIsrc\|,\|reg \fR(or, \fBalsc\fR)\fB
\fBmul \fIsrc\|,\|reg \fR(or, \fBmpy\fR)\fB
\fBdiv \fIsrc\|,\|reg \fR(or, \fBdvd\fR)\fR
\fBsob \fIreg\|,\|expr\fB
\fBsys\fR is another name for the \fBtrap\fR instruction.
It is used to code system calls.
Its operand is required to be expressible in 6 bits.
The expression in \fBmark\fR must be expressible
in six bits, and the expression in \fBsob\fR must
be in the same segment as ``\fB\|.\|\fR'',
must not be external-undefined, must be less than ``\|\fB.\fR\|'',
and must be within 510 bytes of ``\|\fB.\fR\|''.
8.9 Floating-point unit instructions
The following floating-point operations are defined,
with syntax as indicated:
\fBmovf \fIfsrc,\|freg \fR(= ldf\fR\|)
\fBmovf \fIfreg,\|fdst \fR(= stf\fR\|)
\fBmovif \fIsrc,\|freg \fR(= ldcif\fR\|)
\fBmovfi \fIfreg,\|dst \fR(= stcfi\fR\|)
\fBmovof \fIfsrc,\|freg \fR(= ldcdf\fR\|)
\fBmovfo \fIfreg,\|fdst \fR(= stcfd\fR\|)
\fBmovie \fIsrc,\|freg \fR(= ldexp\fR)
\fBmovei \fIfreg,\|dst \fR(= stexp\fR)
\fIfsrc\fR, \fIfdst\fR, and \fIfreg\fR mean floating-point
source, destination, and register respectively.
Their syntax is identical to that for
their non-floating counterparts, but
note that only
floating registers 0-3 can be a \fIfreg\fR.
The names of several of the operations
have been changed to bring out an analogy with
certain fixed-point instructions.
The only strange case is \fBmovf\fR, which turns into
either \fBstf\fR or \fBldf\fR
depending respectively on whether its first operand is
or is not a register.
Warning: \fBldf\fR sets the floating condition codes,
\fBstf\fR does not.
The symbol ``\fB\|.\|.\|\fR''
is the
\fIrelocation counter\fR.
Just before each assembled word is placed in the output stream,
the current value of this symbol is added to the word
if the word refers to a text, data or bss segment location.
If the output word is a pc-relative address word
that refers to an absolute location,
the value of ``\fB\|.\|.\|\fR'' is subtracted.
Thus the value of ``\fB\|.\|.\|\fR'' can be taken to mean
the starting memory location of the program.
The initial value of ``\|\fB.\|.\fR\|'' is 0.
The value of ``\|\fB.\|.\fR\|'' may be changed by assignment.
Such a course of action is sometimes
necessary, but the consequences
should be carefully thought out.
It is particularly ticklish
to change ``\|\fB.\|.\fR\|'' midway in an assembly
or to do so in a program which will
be treated by the loader, which has
its own notions of ``\|\fB.\|.\fR\|''.
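The adjustment applied to each output word can be expressed as a small
function.
This Python sketch follows the description above and nothing more; the
function name and its boolean parameters are assumptions made for
illustration, and the real assembler derives this information from the
relocation bits of the word.

```python
MASK16 = 0xFFFF


def apply_relocation_counter(word, dotdot, refers_to_segment,
                             pc_relative_to_absolute=False):
    """Adjust one output word by the relocation counter '..' (dotdot).

    refers_to_segment: the word refers to a text, data or bss location.
    pc_relative_to_absolute: the word is a pc-relative address word
    that refers to an absolute location.
    """
    if refers_to_segment:
        word += dotdot
    elif pc_relative_to_absolute:
        word -= dotdot
    return word & MASK16
```

With ``\|\fB.\|.\fR\|'' set to 40000 (octal), a word referring to text
location 100 becomes 40100, while an ordinary absolute word is left alone.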
System call names are not predefined.
They may be found in the file
When
an input file cannot be read, its name
followed by a question mark is typed and assembly
ceases.
When syntactic or semantic errors occur, a single-character diagnostic is typed out
together with the line number and the file name in which it
occurred. Errors in pass 1 cause cancellation of pass 2.
> string not terminated properly
* indirection (\|*\|) used illegally
\&\fB.\fR illegal assignment to ``\|\fB.\fR\|''
\s8A\s10 error in address
\s8B\s10 branch address is odd or too remote
\s8E\s10 error in expression
\s8F\s10 error in local (``f\|'' or ``b'') type symbol
\s8G\s10 garbage (unknown) character
\s8I\s10 end of file inside an \fB.if\fR
\s8M\s10 multiply defined symbol as label
\s8O\s10 word quantity assembled at odd address
\s8P\s10 phase error\(em ``\|\fB.\fP\|'' different in pass 1 and 2
\s8R\s10 relocation error
\s8U\s10 undefined symbol