.\"
.\"	Copyright (c) 1982 Regents of the University of California
.\"	@(#)asdocs1.me 1.7 %G%
.\"
.EQ
delim $$
.EN
.(l C
.i "\*(VS \*(AM"
.sp 2.0v
John F. Reiser
Bell Laboratories,
Holmdel, NJ
.sp 1.0v
.i and
.sp 1.0v
Robert R. Henry\**
.(f
\**Preparation of this paper supported in part
by the National Science Foundation under grant MCS #78-07291.
.)f
Electronics Research Laboratory
University of California
Berkeley, CA  94720
.sp 1.0v
November 5, 1979
.sp 1.0v
.i Revised
\*(TD
.)l
.SH 1 Introduction
.pp
This document describes the usage and input syntax
of the \*(UX \*(VX-11 assembler
.i as .
.i As
is designed for assembling the code produced by the
\*(CL compiler;
certain concessions have been made to handle code written
directly by people,
but in general little sympathy has been extended.
This document is intended only for the writer of a compiler or a maintainer
of the assembler.
.SH 2 "Assembler Revisions since November 5, 1979"
.pp
There has been one major change to
.i as
since the last release.
.i As
has been updated to assemble the new instructions and
data formats for
.q G
and
.q H
floating point numbers,
as well as the new queue instructions.
.SH 2 "Features Supported, but No Longer Encouraged as of \*(TD"
.pp
These feature(s) in
.i as
are supported, but no longer encouraged.
.ip -
The colon operator for field initialization is likely to disappear.
.SH 1 "Usage"
.pp
.i As
is invoked with these command arguments:
.br
.sp 0.25v
as
[
.b \-LVWJR
]
[
.b \-d $n$
]
[
.b \-DTS
]
[
.b \-t
.i directory
]
[
.b \-o
.i output
]
[ $name sub 1$ ] $...$
[ $name sub n$ ]
.br
.sp 0.25v
.pp
The
.b \-L
flag instructs the assembler to save labels beginning with a
.q L
in the symbol table portion of the
.i output
file.
Labels are not saved by default,
as the default action of the link editor
.i ld
is to discard them anyway.
.pp
The
.b \-V
flag tells the assembler to place its interpass temporary
file into virtual memory.
In normal circumstances,
the system manager will decide where the temporary file should lie.
Our experiments
with very large temporary files show that placing the temporary
file into virtual memory will save about 13% of the assembly time,
where the size of the temporary file is about 350K bytes.
Most assembler sources will not be this long.
.pp
The
.b \-W
flag turns off all reporting of warnings.
.pp
The
.b \-J
flag forces \*(UX style pseudo\-branch
instructions with destinations further away than a
byte displacement to be
turned into jump instructions with 4 byte offsets.
The
.b \-J
flag buys you nothing if
.b \-d2
is set.
(See \(sc8.4, and future work described in \(sc11.)
.pp
The
.b \-R
flag effectively turns
.q "\fB.data\fP $n$"
directives into
.q "\fB.text\fP $n$"
directives.
This obviates the need to run editor scripts on assembler source to
.q "read\-only"
fix initialized data segments.
Uninitialized data (via
.b .lcomm
and
.b .comm
directives)
is still assembled into the data or bss segments.
.pp
The
.b \-d
flag specifies the number of bytes
which the assembler should allow for a displacement when the value of the
displacement expression is undefined in the first pass.
The possible values of
.i n
are 1, 2, or 4;
the assembler uses 4 bytes
if
.b \-d
is not specified.
See \(sc8.2.
.pp
Provided the
.b \-V
flag is not set,
the
.b \-t
flag causes the assembler to place its single temporary file
in the
.i directory
instead of in
.i /tmp .
.pp
The
.b \-o
flag causes the output to be placed on the file
.i output .
By default,
the output of the assembler is placed in the file
.i a.out
in the current directory.
.pp
The input to the assembler is normally taken from the standard input.
If file arguments occur,
then the input is taken sequentially from the files
$name sub 1$,
$name sub 2~...~name sub n$.
This is not to say that the files are assembled separately;
$name sub 1$ is effectively concatenated to $name sub 2$,
so multiple definitions cannot occur amongst the input sources.
.pp
The
.b \-D
(debug),
.b \-T
(token trace),
and the
.b \-S
(symbol table)
flags enable assembler trace information,
provided that the assembler has been compiled with
the debugging code enabled.
The information printed is long and boring,
but useful when debugging the assembler.
.SH 1 "Lexical conventions"
.pp
Assembler tokens include identifiers (alternatively,
.q symbols
or
.q names ),
constants,
and operators.
.SH 2 "Identifiers"
.pp
An identifier consists of a sequence of alphanumeric characters
(including
period
.q "\fB\|.\|\fP" ,
underscore
.q "\*(US" ,
and
dollar
.q "\*(DL" ).
The first character may not be numeric.
Identifiers may be (practically) arbitrarily long;
all characters are significant.
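.pp
The identifier rule above can be sketched as a regular expression.
This is a minimal illustration, not the assembler's own scanner;
it assumes that alphanumeric means the ASCII letters and digits,
and the names IDENTIFIER_RE and is_identifier are hypothetical.

```python
import re

# Letters, digits, period, underscore, and dollar may appear anywhere,
# except that the first character may not be a digit.
# Assumption: "alphanumeric" means ASCII letters and digits only.
IDENTIFIER_RE = re.compile(r"[A-Za-z._$][A-Za-z0-9._$]*")


def is_identifier(text):
    """Return True if text is a well-formed identifier under the rule above."""
    return IDENTIFIER_RE.fullmatch(text) is not None
```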
.SH 2 "Constants"
.SH 3 "Scalar constants"
.pp
All scalar (non floating point)
constants are (potentially) 128 bits wide.
Such constants are interpreted as two's complement numbers.
Note that 64 bit (quad words) and 128 bit (octal word) integers
are only partially supported by the \*(VX hardware.
In addition,
128 bit integers are only supported by the extended \*(VX architecture.
.i As
supports 64 and 128 bit integers
only so they can be used as immediate constants
or to fill initialized data space.
.i As
cannot perform arithmetic on constants larger than 32 bits.
.pp
Scalar constants are initially evaluated to a full 128 bits,
but are pared down by discarding high order copies of the sign bit
and categorizing the number as a long, quad or octal integer.
Numbers with less precision than 32 bits are treated as 32 bit quantities.
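.pp
The paring-down step can be illustrated with a short sketch.
It reads the rule literally: a value is placed in the smallest of the
32, 64, and 128 bit sizes that holds it as a signed two's complement number.
The function name categorize_scalar is hypothetical,
and the real assembler may treat boundary cases differently.

```python
def categorize_scalar(value):
    """Classify an integer as a long, quad, or octal two's complement scalar.

    Returns (name, bits) for the narrowest size that still holds the value
    once redundant high order copies of the sign bit are discarded.
    """
    if not -(1 << 127) <= value < (1 << 127):
        raise ValueError("constant does not fit in 128 bits")
    for name, bits in (("long", 32), ("quad", 64), ("octal", 128)):
        # A signed field of the given width holds -2**(bits-1) .. 2**(bits-1)-1.
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return name, bits
```

.pp
Under this reading, small values such as 5 or \-1 are treated as 32 bit quantities,
while a value that needs a 33rd bit is promoted to a quad.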
.pp
The digits are
.q 0123456789abcdefABCDEF
with the obvious values.
.pp
An octal constant consists of a sequence of digits with a leading zero.
.pp
A decimal constant consists of a sequence of digits without a leading zero.
.pp
A hexadecimal constant consists of the characters
.q 0x
(or
.q 0X )
followed by a sequence of digits.
.pp
A single-character constant consists of a single quote
.q "\|\(fm\|"
followed by an \*(AC character,
including \*(AC newline.
The constant's value is the code for the
given character.
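.pp
The four scalar forms above can be told apart by their leading characters.
The following sketch is illustrative only.
The name parse_scalar is hypothetical,
and rejecting a digit that is too large for its base
(for example 9 in an octal constant) is an assumption;
this document does not say how the assembler handles that case.

```python
DIGITS = "0123456789abcdef"


def parse_scalar(text):
    """Return the value of an octal, decimal, hexadecimal, or character constant."""
    if text.startswith("'"):
        # Single quote followed by exactly one ASCII character.
        if len(text) != 2 or ord(text[1]) > 127:
            raise ValueError("bad character constant")
        return ord(text[1])
    if text[:2] in ("0x", "0X"):
        digits, base = text[2:], 16
    elif text.startswith("0"):
        digits, base = text, 8
    else:
        digits, base = text, 10
    if not digits:
        raise ValueError("empty constant")
    value = 0
    for ch in digits:
        digit = DIGITS.find(ch.lower())
        # Assumption: a digit outside the base is treated as an error.
        if digit < 0 or digit >= base:
            raise ValueError("bad digit in constant")
        value = value * base + digit
    return value
```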
.SH 3 "Floating Point Constants"
.pp
Floating point constants are internally represented
in the \*(VX floating point format
that is specified by the lexical form of the constant.
Using the meta notation that
[dec] is a decimal digit (\c
.q "0123456789" ),
[expt] is a type specification character (\c
.q "fFdDhHgG" ),
[expe] is an exponent delimiter and type specification character (\c
.q "eEfFdDhHgG" ),
$x sup roman "*"$ means 0 or more occurrences of $x$,
$x sup +$ means 1 or more occurrences of $x$,
then the general lexical form of a floating point number is:
.ce 1
0[expe]([+-])$roman "[dec]" sup +$(.)($roman "[dec]" sup roman "*"$)([expt]([+-])($roman "[dec]" sup +$))
.ce 0
The standard semantic interpretation is used for the
signed integer, fraction and signed power of 10 exponent.
If the exponent delimiter is specified,
it must be either an
.q e
or
.q E ,
or must agree with the initial type specification character that is used.
The type specification character specifies
the type and representation of the constructed number, as follows:
.(b
.TS
center;
c l c
c l n.
type character	floating representation	size (bits)
_
f, F	F format floating	32
d, D	D format floating	64
g, G	G format floating	64
h, H	H format floating	128
.TE
.)b
Note that
.q G
and
.q H
format floating point numbers are not supported
by all implementations of the \*(VX architecture.
.i As
does not require the augmented architecture in order to run.
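.pp
The lexical form and the type table can be sketched as a pattern match.
This sketch follows the prose rule rather than the bracket names literally:
the leading character is taken from the type table,
and the exponent delimiter may be e, E,
or the same letter as the leading type character.
Comparing the two letters without regard to case is an assumption,
and the names FLOAT_RE, SIZES, and classify_float are hypothetical.

```python
import re

# Size in bits for each type character, from the table above.
SIZES = {"f": 32, "d": 64, "g": 64, "h": 128}

FLOAT_RE = re.compile(
    r"0(?P<type>[fFdDgGhH])"        # leading zero and type character
    r"(?P<sign>[+-]?)"              # optional sign
    r"(?P<int>[0-9]+)"              # one or more integer digits
    r"(?:[.](?P<frac>[0-9]*))?"     # optional point and fraction digits
    r"(?:(?P<delim>[eEfFdDgGhH])"   # optional exponent delimiter ...
    r"(?P<esign>[+-]?)(?P<exp>[0-9]+))?"  # ... with signed power of 10
)


def classify_float(text):
    """Return (format letter, size in bits), or None if text is not well formed."""
    m = FLOAT_RE.fullmatch(text)
    if m is None:
        return None
    kind = m.group("type").lower()
    delim = m.group("delim")
    # The delimiter must be e or E, or agree with the type character.
    if delim is not None and delim.lower() not in ("e", kind):
        return None
    return kind.upper(), SIZES[kind]
```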
.pp
The assembler uses the library routine
.i atof()
to convert
.q F
and
.q D
numbers,
and uses its own conversion routine
(derived from
.i atof ,
and believed to be numerically accurate)
to convert
.q G
and
.q H
floating point numbers.
.pp
Collectively,
all floating point numbers,
together with quad and octal scalars, are called
.i Bignums .
When
.i as
requires a Bignum,
a 32 bit scalar quantity may also be used.
.SH 3 "String Constants"
.pp
A string constant is defined using
the same syntax and semantics as the \*(CL language uses.
Strings begin and end with a
.q "''"
(double quote).
The \*(DM assembler conventions for flexible string quoting are
not implemented.
All \*(CL backslash conventions are observed;
the backslash conventions
peculiar to the \*(PD assembler are not observed.
Strings are known by their value and their length;
the assembler does not implicitly end strings with a null byte.
.SH 2 "Operators"
.pp
There are several single-character
operators;
see \(sc6.1.
.SH 2 "Blanks"
.pp
Blank and tab characters
may be interspersed freely between tokens,
but may not be used within tokens (except character constants).
A blank or tab is required to separate adjacent
identifiers or constants not otherwise separated.
.SH 2 "Scratch Mark Comments"
.pp
The character
.q "#"
introduces a comment,
which extends through the end of the line on which it appears.
Comments starting in column 1,
having the format
.q "# $expression~~string$" ,
are interpreted as an indication that the assembler is now assembling
file
.i string
at line
.i expression .
Thus, one can use the \*(CL preprocessor on an assembly language source file,
and use the
.i #include
and
.i #define
preprocessor directives.
(Note that there may not be an assembler comment starting in column
1 if the assembler source is given to the \*(CL preprocessor,
as it will be interpreted by the preprocessor in a way not intended.)
Comments are otherwise ignored by the assembler.
.SH 2 "\*(CL Style Comments"
.pp
The assembler will recognize \*(CL style comments,
introduced with the prologue
.b "/*"
and ending with the epilogue
.b "*/" .
\*(CL style comments may extend across multiple lines,
and are the preferred comment style
to use if one chooses to use the \*(CL preprocessor.
.SH 1 "Segments and Location Counters"
.pp
Assembled code and data fall into three segments:  the text segment,
the data segment,
and the bss segment.
The \*(UX operating system makes
some assumptions about the content of these segments;
the assembler does not.
Within the text and data segments there are a number of subsegments,
distinguished by number (\c
.q "\fBtext\fP 0" ,
.q "\fBtext\fP 1" ,
$...$
.q "\fBdata\fP 0" ,
.q "\fBdata\fP 1" ,
$...$).
Currently there are four subsegments each in text and data.
The subsegments are for programming convenience only.
.pp
Before writing the output file,
the assembler zero-pads each text subsegment to a multiple of four
bytes and then concatenates the subsegments in order to form the text segment;
an analogous operation is done for the data segment.
Requesting that the loader define symbols and storage regions is the only
action allowed by the assembler with respect to the bss segment.
Assembly begins in
.q "\fBtext\fP 0" .
.pp
Associated with each (sub)segment is an implicit location counter which
begins at zero and is incremented by 1 for each byte assembled into the
(sub)segment.
There is no way to explicitly reference a location counter.
Note that the location counters of subsegments other than
.q "\fBtext\fP 0"
and
.q "\fBdata\fP 0"
behave peculiarly due to the concatenation used to form
the text and data segments.
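.pp
The padding and concatenation of subsegments can be sketched as follows.
This is an illustration of the description above, not the assembler's code;
the names pad_to_four and build_segment are hypothetical.
The base offsets it returns suggest one plausible reading of the
peculiar behavior: a location in a later subsegment ends up at
its own counter value plus the padded sizes of the subsegments before it.

```python
def pad_to_four(data):
    """Zero-pad a subsegment to a multiple of four bytes."""
    return data + bytes(-len(data) % 4)


def build_segment(subsegments):
    """Concatenate padded subsegments in order.

    Returns the finished segment and the offset at which each
    subsegment begins within it.
    """
    segment = b""
    bases = []
    for data in subsegments:
        bases.append(len(segment))
        segment += pad_to_four(data)
    return segment, bases
```

.pp
For example, subsegments of 3, 1, 0, and 5 bytes
are padded to 4, 4, 0, and 8 bytes,
so they begin at offsets 0, 4, 8, and 8 of a 16 byte segment.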
	444	the text and data segments.