usr/src/old/as.vax/PSD.doc/asdocs1.me

.\" Copyright (c) 1982 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"     This product includes software developed by the University of
.\"     California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)asdocs1.me  5.1 (Berkeley) 4/17/91
.\"
.EQ
delim $$
.EN
.(l C
.i "\*(VS \*(AM"
.sp 2.0v
John F. Reiser
Bell Laboratories,
Holmdel, NJ
.sp 1.0v
.i and
.sp 1.0v
Robert R. Henry\**
.(f
\**Preparation of this paper supported in part
by the National Science Foundation under grant MCS #78-07291.
.)f
Electronics Research Laboratory
University of California
Berkeley, CA  94720
.sp 1.0v
November 5, 1979
.sp 1.0v
.i Revised
\*(TD
.)l
.SH 1 Introduction
.pp
This document describes the usage and input syntax
of the \*(UX \*(VX-11 assembler
.i as .
.i As
is designed for assembling the code produced by the
\*(CL compiler;
certain concessions have been made to handle code written
directly by people,
but in general little sympathy has been extended.
This document is intended only for the writer of a compiler or a maintainer
of the assembler.
.SH 2 "Assembler Revisions since November 5, 1979"
.pp
There has been one major change to
.i as
since the last release.
.i As
has been updated to assemble the new instructions and
data formats for
.q G
and
.q H
floating point numbers,
as well as the new queue instructions.
.SH 2 "Features Supported, but No Longer Encouraged as of \*(TD"
.pp
These feature(s) in
.i as
are supported, but no longer encouraged.
.ip -
The colon operator for field initialization is likely to disappear.
.SH 1 "Usage"
.pp
.i As
is invoked with these command arguments:
.br
.sp 0.25v
as
[
.b \-LVWJR
]
[
.b \-d $n$
]
[
.b \-DTS
]
[
.b \-t
.i directory
]
[
.b \-o
.i output
]
[ $name sub 1$ ] $...$
[ $name sub n$ ]
.br
.sp 0.25v
.pp
The
.b \-L
flag instructs the assembler to save labels beginning with a
.q L
in the symbol table portion of the
.i output
file.
Labels are not saved by default,
as the default action of the link editor
.i ld
is to discard them anyway.
.pp
The
.b \-V
flag tells the assembler to place its interpass temporary
file into virtual memory.
In normal circumstances,
the system manager will decide where the temporary file should lie.
Our experiments
with very large temporary files show that placing the temporary
file into virtual memory will save about 13% of the assembly time,
where the size of the temporary file is about 350K bytes.
Most assembler sources will not be this long.
.pp
The
.b \-W
turns of all warning error reporting.
.pp
The
.b \-J
flag forces \*(UX style pseudo\-branch
instructions with destinations further away than a
byte displacement to be
turned into jump instructions with 4 byte offsets.
The
.b \-J
flag buys you nothing if
.b \-d2
is set.
(See \(sc8.4, and future work described in \(sc11)
.pp
The
.b \-R
flag effectively turns
.q "\fB.data\fP $n$"
directives into
.q "\fB.text\fP $n$"
directives.
This obviates the need to run editor scripts on assembler source to
.q "read\-only"
fix initialized data segments.
Uninitialized data (via
.b .lcomm
and
.b .comm
directives)
is still assembled into the data or bss segments.
.pp
The
.b \-d
flag specifies the number of bytes
which the assembler should allow for a displacement when the value of the
displacement expression is undefined in the first pass.
The possible values of
.i n
are 1, 2, or 4;
the assembler uses 4 bytes
if
.b -d
is not specified.
See \(sc8.2.
.pp
Provided the
.b \-V
flag is not set,
the
.b \-t
flag causes the assembler to place its single temporary file
in the
.i directory
instead of in
.i /tmp .
.pp
The
.b \-o
flag causes the output to be placed on the file
.i output .
By default,
the output of the assembler is placed in the file
.i a.out
in the current directory.
.pp
The input to the assembler is normally taken from the standard input.
If file arguments occur,
then the input is taken sequentially from the files
$name sub 1$,
$name sub 2~...~name sub n$
This is not to say that the files are assembled separately;
$name sub 1$ is effectively concatenated to $name sub 2$,
so multiple definitions cannot occur amongst the input sources.
.pp
.pp
The
.b \-D
(debug),
.b \-T
(token trace),
and the
.b \-S
(symbol table)
flags enable assembler trace information,
provided that the assembler has been compiled with
the debugging code enabled.
The information printed is long and boring,
but useful when debugging the assembler.
.SH 1 "Lexical conventions"
.pp
Assembler tokens include identifiers (alternatively,
.q symbols
or
.q names ),
constants,
and operators.
.SH 2 "Identifiers"
.pp
An identifier consists of a sequence of alphanumeric characters
(including
period
.q "\fB\|.\|\fP" ,
underscore
.q "\*(US" ,
and
dollar
.q "\*(DL" ).
The first character may not be numeric.
Identifiers may be (practically) arbitrary long;
all characters are significant.
.SH 2 "Constants"
.SH 3 "Scalar constants"
.pp
All scalar (non floating point)
constants are (potentially) 128 bits wide.
Such constants are interpreted as two's complement numbers.
Note that 64 bit (quad words) and 128 bit (octal word) integers
are only partially supported by the \*(VX hardware.
In addition,
128 bit integers are only supported by the extended \*(VX architecture.
.i As
supports 64 and 128 bit integers
only so they can be used as immediate constants
or to fill initialized data space.
.i As
can not perform arithmetic on constants larger than 32 bits.
.pp
Scalar constants are initially evaluated to a full 128 bits,
but are pared down by discarding high order copies of the sign bit
and categorizing the number as a long, quad or octal integer.
Numbers with less precision than 32 bits are treated as 32 bit quantities.
.pp
The digits are
.q 0123456789abcdefABCDEF
with the obvious values.
.pp
An octal constant consists of a sequence of digits with a leading zero.
.pp
A decimal constant consists of a sequence of digits without a leading zero.
.pp
A hexadecimal constant consists of the characters
.q 0x
(or
.q 0X )
followed by a sequence of digits.
.pp
A single-character constant consists of a single quote
.q "\|\(fm\|"
followed by an \*(AC character,
including \*(AC newline.
The constant's value is the code for the
given character.
.SH 3 "Floating Point Constants"
.pp
Floating point constants are internally represented
in the \*(VX floating point format
that is specified by the lexical form of the constant.
Using the meta notation that
[dec] is a decimal digit (\c
.q "0123456789" ),
[expt] is a type specification character (\c,
.q "fFdDhHgG" ),
[expe] is a exponent delimiter and type specification character (\c,
.q "eEfFdDhHgG" ),
$x sup roman "*"$ means 0 or more occurences of $x$,
$x sup +$ means 1 or more occurences of $x$,
then the general lexical form of a floating point number is:
.ce 1
0[expe]([+-])$roman "[dec]" sup +$(.)($roman "[dec]" sup roman "*"$)([expt]([+-])($roman "dec]" sup +$))
.ce 0
The standard semantic interpretation is used for the
signed integer, fraction and signed power of 10 exponent.
If the exponent delimiter is specified,
it must be either an
.q e
or
.q E ,
or must agree with the initial type specification character that is used.
The type specification character specifies
the type and representation of the constructed number, as follows:
.(b
.TS
center;
c l c
c l n.
type character  floating representation size (bits)
_
f, F    F format floating       32
d, D    D format floating       64
g, G    G format floating       64
h, H    H format floating       128
.TE
.)b
Note that
.q G
and
.q H
format floating point numbers are not supported
by all implementations of the \*(VX architecture.
.i As
does not require the augmented architecture in order to run.
.pp
The assembler uses the library routine
.i atof()
to convert
.q F
and
.q D
numbers,
and uses its own conversion routine
(derived from
.i atof ,
and believed to be numerically accurate)
to convert
.q G
and
.q H
floating point numbers.
.pp
Collectively,
all floating point numbers,
together with quad and octal scalars are called
.i Bignums .
When
.i as
requires a Bignum,
a 32 bit scalar quantity may also be used.
.SH 3 "String Constants"
.pp
A string constant is defined using
the same syntax and semantics as the \*(CL language uses.
Strings begin and end with a
.q "''"
(double quote).
The \*(DM assembler conventions for flexible string quoting is
not implemented.
All \*(CL backslash conventions are observed;
the backslash conventions
peculiar to the \*(PD assembler are not observed.
Strings are known by their value and their length;
the assembler does not implicitly end strings with a null byte.
.SH 2 "Operators"
.pp
There are several single-character
operators;
see \(sc6.1.
.SH 2 "Blanks"
.pp
Blank and tab characters
may be interspersed freely between tokens,
but may not be used within tokens (except character constants).
A blank or tab is required to separate adjacent
identifiers or constants not otherwise separated.
.SH 2 "Scratch Mark Comments"
.pp
The character
.q "#"
introduces a comment,
which extends through the end of the line on which it appears.
Comments starting in column 1,
having the format
.q "# $expression~~string$" ,
are interpreted as an indication that the assembler is now assembling
file
.i string
at line
.i expression .
Thus, one can use the \*(CL preprocessor on an assembly language source file,
and use the
.i #include
and
.i #define
preprocessor directives.
(Note that there may not be an assembler comment starting in column
1 if the assembler source is given to the \*(CL preprocessor,
as it will be interpreted by the preprocessor in a way not intended.)
Comments are otherwise ignored by the assembler.
.SH 2 "\*(CL Style Comments"
.pp
The assembler will recognize \*(CL style comments,
introduced with the prologue
.b "/*"
and ending with the epilogue
.b "*/" .
\*(CL style comments may extend across multiple lines,
and are the preferred comment style
to use if one chooses to use the \*(CL preprocessor.
.SH 1 "Segments and Location Counters"
.pp
Assembled code and data fall into three segments:  the text segment,
the data segment,
and the bss segment.
The \*(UX operating system makes
some assumptions about the content of these segments;
the assembler does not.
Within the text and data segments there are a number of sub-segments,
distinguished by number (\c
.q "\fBtext\fP 0" ,
.q "\fBtext\fP 1" ,
$...$
.q "\fBdata\fP 0" ,
.q "\fBdata\fP 1" ,
$...$).
Currently there are four subsegments each in text and data.
The subsegments are for programming convenience only.
.pp
Before writing the output file,
the assembler zero-pads each text subsegment to a multiple of four
bytes and then concatenates the subsegments in order to form the text segment;
an analogous operation is done for the data segment.
Requesting that the loader define symbols and storage regions is the only
action allowed by the assembler with respect to the bss segment.
Assembly begins in
.q "\fBtext\fP 0" .
.pp
Associated with each (sub)segment is an implicit location counter which
begins at zero and is incremented by 1 for each byte assembled into the
(sub)segment.
There is no way to explicitly reference a location counter.
Note that the location counters of subsegments other than
.q "\fBtext\fP 0"
and
.q "\fBdata\fP 0"
behave peculiarly due to the concatenation used to form
the text and data segments.