usr/src/old/as.vax/PSD.doc/asdocs1.me

.\"
.\"     Copyright (c) 1982 Regents of the University of California
.\"     @(#)asdocs1.me 1.3 %G%
.\"
.EQ
delim $$
.EN
.(l C
.i "\*(VX Assembler Reference Manual"
.sp 2.0v
John F. Reiser
Bell Laboratories,
Holmdel, NJ
.sp 1.0v
.i and
.sp 1.0v
Robert R. Henry\**
.(f
\**Preparation of this paper supported in part
by the National Science Foundation under grant MCS #78-07291.
.)f
Electronics Research Laboratory
University of California
Berkeley, CA  94720
.sp 1.0v
November 5, 1979
.sp 1.0v
.i Revised
August 11, 1982
.)l
.SH 1 Introduction
.pp
This document describes the usage and input syntax
of the \*(UX \*(VX-11 assembler
.i as .
.i As
is designed for assembling the code produced by the
\*(CL compiler;
certain concessions have been made to handle code written
directly by people,
but in general little sympathy has been extended.
This document is intended only for the writer of a compiler or a maintainer
of the assembler.
.SH 2 "Assembler Revisions since November 5, 1979"
.pp
These are the major changes to
.i as
since the last release:
.ip -
.i As
has been updated to assemble the new instructions and
data formats for
.q G
and
.q H
floating point numbers,
as well as the new queue instructions.
.ip -
.i As
no longer supports the colon operator for initializing fields.
.ip -
.i As
no longer supports the
.b stab
directive.
.ip -
A new keyword,
.b .global
has been added
to augment the keyword
.b .globl .
.ip -
.i As
accepts and can generate a
.i "binary symbolic"
intermediate representation of the assembly code.
This standard, efficient format is
intended to be used with assembly code optimizers
and newer compilers.
This intermediate form is documented in the appendix.
(These additional producers and consumers
have not yet been implemented.)
.SH 1 "Usage"
.pp
.i As
is invoked with these command arguments:
.br
.sp 0.25v
as
[
.b \-LVWJR
]
[
.b \-d $n$
]
[
.b \-DTS
]
[
.b \-t
.i directory
]
[
.b \-o
.i output
]
[ $name sub 1$ ] $...$
[ $name sub n$ ]
.br
.sp 0.25v
.pp
The
.b \-L
flag instructs the assembler to save labels beginning with a
.q L
in the symbol table portion of the
.i output
file.
Labels are not saved by default,
as the default action of the link editor
.i ld
is to discard them anyway.
.pp
The
.b \-V
flag tells the assembler to place its interpass temporary
file into virtual memory.
In normal circumstances,
the system manager will decide where the temporary file should lie.
Our experiments
with very large temporary files show that placing the temporary
file into virtual memory will save about 13% of the assembly time,
where the size of the temporary file is about 350K bytes.
Most assembler sources will not be this long.
.pp
The
.b \-W
turns of all warning error reporting.
.pp
The
.b \-J
flag forces \*(UX style pseudo\-branch
instructions with destinations further away than a
byte displacement to be
turned into jump instructions with 4 byte offsets.
The
.b \-J
flag buys you nothing if
.b \-d2
is set.
(See \(sc8.4, and future work described in \(sc11)
.pp
The
.b \-R
flag effectively turns
.q "\fB.data\fP $n$"
directives into
.q "\fB.text\fP $n$"
directives.
This obviates the need to run editor scripts on assembler source to
.q "read\-only"
fix initialized data segments.
Uninitialized data (via
.b .lcomm
and
.b .comm
directives)
is still assembled into the data or bss segments.
.pp
The
.b \-d
flag specifies the number of bytes
which the assembler should allow for a displacement when the value of the
displacement expression is undefined in the first pass.
The possible values of
.i n
are 1, 2, or 4;
the assembler uses 4 bytes
if
.b -d
is not specified.
See \(sc8.2.
.pp
Provided the
.b \-V
flag is not set,
the
.b \-t
flag causes the assembler to place its single temporary file
in the
.i directory
instead of in
.i /tmp .
.pp
The
.b \-o
flag causes the output to be placed on the file
.i output .
By default,
the output of the assembler is placed in the file
.i a.out
in the current directory.
.pp
The input to the assembler is normally taken from the standard input.
If file arguments occur,
then the input is taken sequentially from the files
$name sub 1$,
$name sub 2~...~name sub n$
This is not to say that the files are assembled seperately;
$name sub 1$ is effectively concatenated to $name sub 2$,
so multiple definitions cannot occur amongst the input sources.
.pp
.pp
The
.b \-D
(debug),
.b \-T
(token trace),
and the
.b \-S
(symbol table)
flags enable assembler trace information,
provided that the assembler has been compiled with
the debugging code enabled.
The information printed is long and boring,
but useful when debugging the assembler.
.SH 1 "Lexical conventions"
.pp
Assembler tokens include identifiers (alternatively,
.q symbols
or
.q names ),
constants,
and operators.
.SH 2 "Identifiers"
.pp
An identifier consists of a sequence of alphanumeric characters
(including
period
.q "\fB\|.\|\fP" ,
underscore
.q "\*(US" ,
and
dollar
.q "\*(DL" ).
The first character may not be numeric.
Identifier may be (practically) arbitrary long;
all characters are significant.
.SH 2 "Constants"
.SH 3 "Scalar constants"
.pp
All scalar (non floating point)
constants are (potentially) 128 bits wide.
Such constants are interpreted as two's complement numbers.
Note that 64 bit (quad words) and 128 bit (octal word) integers
are only partially supported by the \*(VX hardware.
In addition,
128 bit integers are only supported by the extended \*(VX architecture.
.i As
supports 64 and 128 bit integers
only so they can be used as immediate constants
or to fill intialized data space.
.i As
can not perform arithmetic on constants larger than 32 bits.
.pp
Scalar constants are initially evaluated to a full 128 bits,
but are pared down by discarding high order copies of the sign bit
and categorizing the number as a long, quad or octal integer.
Numbers with less precision than 32 bits are treated as 32 bit quantities.
.pp
The digits are
.q 0123456789abcdefABCDEF
with the obvious values.
.pp
An octal constant consists of a sequence of digits with a leading zero.
.pp
A decimal constant consists of a sequence of digits without a leading zero.
.pp
A hexadecimal constant consists of the characters
.q 0x
(or
.q 0X )
followed by a sequence of digits.
.pp
A single-character constant consists of a single quote
.q "\|\(fm\|"
followed by an \*(AC character,
including \*(AC newline.
The constant's value is the code for the
given character.
.SH 3 "Floating Point Constants"
.pp
Floating point constants are internally represented
in the \*(VX floating point format
that is specified by the lexical form of the constant.
Using the meta notation that
[dec] is a decimal digit (\c
.q "0123456789" ),
[expt] is a type specification character (\c,
.q "fFdDhHgG" ),
[expe] is a exponent delimiter and type specification character (\c,
.q "eEfFdDhHgG" ),
$x sup roman "*"$ means 0 or more occurances of $x$,
$x sup +$ means 1 or more occurances of $x$,
then the general lexical form of a floating point number is:
.ce 1
0[expe]([+-])$roman "[dec]" sup +$(.)($roman "[dec]" sup roman "*"$)([expt]([+-])($roman "dec]" sup +$))
.ce 0
The standard semantic interpretation is used for the
signed integer, fraction and signed power of 10 exponent.
If the exponent delimiter is specified,
it must be either an
.q e
or
.q E ,
or must agree with the initial type specification character that is used.
The type specification character specifies
the type and representation of the constructed number, as follows:
.(b
.TS
center;
c l c
c l n.
type character  floating representation size (bits)
_
f, F    F format floating       32
d, D    D format floating       64
g, G    G format floating       64
h, H    H format floating       128
.TE
.)b
Note that
.q G
and
.q H
format floating point numbers are not supported
by all implementations of the \*(VX architecture.
.i As
does not require the augmented architecture in order to run.
.pp
The assembler uses the library routine
.i atof()
to convert
.q F
and
.q D
numbers,
and uses its own conversion routine
(derived from
.i atof ,
and believed to be numerically accurate)
to convert
.q G
and
.q H
floating point numbers.
.pp
Collectively,
all floating point numbers,
together with quad and octal scalars are called
.i Bignums .
When
.i as
requires a Bignum,
a 32 bit scalar quantity may also be used.
.SH 3 "String Constants"
.pp
A string constant is defined using
the same syntax and semantics as the \*(CL langauge uses.
Strings begin and end with a
.q "''"
(double quote).
The \*(DM assembler conventions for flexible string quoting is
not implemented.
All \*(CL backslash conventions are observed;
the backslash conventions
peculiar to the \*(PD assembler are not observed.
Strings are known by their value and their length;
the assembler does not implicitly end strings with a null byte.
.SH 2 "Operators"
.pp
There are several single-character
operators;
see \(sc6.1.
.SH 2 "Blanks"
.pp
Blank and tab characters
may be interspersed freely between tokens,
but may not be used within tokens (except character constants).
A blank or tab is required to separate adjacent
identifiers or constants not otherwise separated.
.SH 2 "Scratch Mark Comments"
.pp
The character
.q "#"
introduces a comment,
which extends through the end of the line on which it appears.
Comments starting in column 1,
having the format
.q "# $expression~~string$"
are interpreted as an indication that the assembler is now assembling
file
.i string
at line
.i expression .
Thus, one can use the \*(CL preprocessor on an assembly language source file,
and use the
.i #include
and
.i #define
preprocessor directives.
(Note that there may not be an assembler comment starting in column
1 if the assembler source is given to the \*(CL preprocessor,
as it will be intrepreted by the preprocessor in a way not intended.)
Comments are otherwise ignored by the assembler.
.SH 2 "\*(CL Style Comments"
.pp
The assembler will recognize \*(CL style comments,
introduced with the prologue
.b "/*"
and ending with the epilogue
.b "*/" .
\*(CL style comments may extend across multiple lines,
and are the preferred comment style
to use if one chooses to use the \*(CL preprocessor.
.SH 1 "Segments and Location Counters"
.pp
Assembled code and data fall into three segments:  the text segment,
the data segment,
and the bss segment.
The \*(UX operating system makes
some assumptions about the content of these segments;
the assembler does not.
Within the text and data segments there are a number of sub-segments,
distinguished by number (\c
.q "\fBtext\fP 0" ,
.q "\fBtext\fP 1" ,
$...$
.q "\fBdata\fP 0" ,
.q "\fBdata\fP 1" ,
$...$).
Currently there are four subsegments each in text and data.
The subsegments are for programming convenience only.
.pp
Before writing the output file,
the assembler zero-pads each text subsegment to a multiple of four
bytes and then concatenates the subsegments in order to form the text segment;
an analogous operation is done for the data segment.
Requesting that the loader define symbols and storage regions is the only
action allowed by the assembler with respect to the bss segment.
Assembly begins in
.q "\fBtext\fP 0" .
.pp
Associated with each (sub)segment is an implicit location counter which
begins at zero and is incremented by 1 for each byte assembled into the
(sub)segment.
There is no way to explicitly reference a location counter.
Note that the location counters of subsegments other than
.q "\fBtext\fP 0"
and
.q "\fBdata\fP 0"
behave peculiarly due to the concatenation used to form
the text and data segments.