| 1 | .\" |
| 2 | .\" Copyright (c) 1982 Regents of the University of California |
| 3 | .\" @(#)asdocs1.me 1.7 %G% |
| 4 | .\" |
| 5 | .EQ |
| 6 | delim $$ |
| 7 | .EN |
| 8 | .(l C |
| 9 | .i "\*(VS \*(AM" |
| 10 | .sp 2.0v |
| 11 | John F. Reiser |
| 12 | Bell Laboratories, |
| 13 | Holmdel, NJ |
| 14 | .sp 1.0v |
| 15 | .i and |
| 16 | .sp 1.0v |
| 17 | Robert R. Henry\** |
| 18 | .(f |
| 19 | \**Preparation of this paper supported in part |
| 20 | by the National Science Foundation under grant MCS #78-07291. |
| 21 | .)f |
| 22 | Electronics Research Laboratory |
| 23 | University of California |
| 24 | Berkeley, CA 94720 |
| 25 | .sp 1.0v |
| 26 | November 5, 1979 |
| 27 | .sp 1.0v |
| 28 | .i Revised |
| 29 | \*(TD |
| 30 | .)l |
| 31 | .SH 1 Introduction |
| 32 | .pp |
| 33 | This document describes the usage and input syntax |
| 34 | of the \*(UX \*(VX-11 assembler |
| 35 | .i as . |
| 36 | .i As |
| 37 | is designed for assembling the code produced by the |
| 38 | \*(CL compiler; |
| 39 | certain concessions have been made to handle code written |
| 40 | directly by people, |
| 41 | but in general little sympathy has been extended. |
| 42 | This document is intended only for the writer of a compiler or a maintainer |
| 43 | of the assembler. |
| 44 | .SH 2 "Assembler Revisions since November 5, 1979" |
| 45 | .pp |
| 46 | There has been one major change to |
| 47 | .i as |
| 48 | since the last release. |
| 49 | .i As |
| 50 | has been updated to assemble the new instructions and |
| 51 | data formats for |
| 52 | .q G |
| 53 | and |
| 54 | .q H |
| 55 | floating point numbers, |
| 56 | as well as the new queue instructions. |
| 57 | .SH 2 "Features Supported, but No Longer Encouraged as of \*(TD" |
| 58 | .pp |
| 59 | These feature(s) in |
| 60 | .i as |
| 61 | are supported, but no longer encouraged. |
| 62 | .ip - |
| 63 | The colon operator for field initialization is likely to disappear. |
| 64 | .SH 1 "Usage" |
| 65 | .pp |
| 66 | .i As |
| 67 | is invoked with these command arguments: |
| 68 | .br |
| 69 | .sp 0.25v |
| 70 | as |
| 71 | [ |
| 72 | .b \-LVWJR |
| 73 | ] |
| 74 | [ |
| 75 | .b \-d $n$ |
| 76 | ] |
| 77 | [ |
| 78 | .b \-DTS |
| 79 | ] |
| 80 | [ |
| 81 | .b \-t |
| 82 | .i directory |
| 83 | ] |
| 84 | [ |
| 85 | .b \-o |
| 86 | .i output |
| 87 | ] |
| 88 | [ $name sub 1$ ] $...$ |
| 89 | [ $name sub n$ ] |
| 90 | .br |
| 91 | .sp 0.25v |
| 92 | .pp |
| 93 | The |
| 94 | .b \-L |
| 95 | flag instructs the assembler to save labels beginning with a |
| 96 | .q L |
| 97 | in the symbol table portion of the |
| 98 | .i output |
| 99 | file. |
| 100 | Labels are not saved by default, |
| 101 | as the default action of the link editor |
| 102 | .i ld |
| 103 | is to discard them anyway. |
| 104 | .pp |
| 105 | The |
| 106 | .b \-V |
| 107 | flag tells the assembler to place its interpass temporary |
| 108 | file into virtual memory. |
| 109 | In normal circumstances, |
| 110 | the system manager will decide where the temporary file should lie. |
| 111 | Our experiments |
| 112 | with very large temporary files show that placing the temporary |
| 113 | file into virtual memory will save about 13% of the assembly time, |
| 114 | where the size of the temporary file is about 350K bytes. |
| 115 | Most assembler sources will not be this long. |
| 116 | .pp |
| 117 | The |
| 118 | .b \-W |
| 119 | flag turns off all warning error reporting. |
| 120 | .pp |
| 121 | The |
| 122 | .b \-J |
| 123 | flag forces \*(UX style pseudo\-branch |
| 124 | instructions with destinations further away than a |
| 125 | byte displacement to be |
| 126 | turned into jump instructions with 4 byte offsets. |
| 127 | The |
| 128 | .b \-J |
| 129 | flag buys you nothing if |
| 130 | .b \-d2 |
| 131 | is set. |
| 132 | (See \(sc8.4, and future work described in \(sc11.) |
| 133 | .pp |
| 134 | The |
| 135 | .b \-R |
| 136 | flag effectively turns |
| 137 | .q "\fB.data\fP $n$" |
| 138 | directives into |
| 139 | .q "\fB.text\fP $n$" |
| 140 | directives. |
| 141 | This obviates the need to run editor scripts on assembler source to |
| 142 | .q "read\-only" |
| 143 | fix initialized data segments. |
| 144 | Uninitialized data (via |
| 145 | .b .lcomm |
| 146 | and |
| 147 | .b .comm |
| 148 | directives) |
| 149 | is still assembled into the data or bss segments. |
| 150 | .pp |
| 151 | The |
| 152 | .b \-d |
| 153 | flag specifies the number of bytes |
| 154 | which the assembler should allow for a displacement when the value of the |
| 155 | displacement expression is undefined in the first pass. |
| 156 | The possible values of |
| 157 | .i n |
| 158 | are 1, 2, or 4; |
| 159 | the assembler uses 4 bytes |
| 160 | if |
| 161 | .b \-d |
| 162 | is not specified. |
| 163 | See \(sc8.2. |
| 164 | .pp |
| 165 | Provided the |
| 166 | .b \-V |
| 167 | flag is not set, |
| 168 | the |
| 169 | .b \-t |
| 170 | flag causes the assembler to place its single temporary file |
| 171 | in the |
| 172 | .i directory |
| 173 | instead of in |
| 174 | .i /tmp . |
| 175 | .pp |
| 176 | The |
| 177 | .b \-o |
| 178 | flag causes the output to be placed on the file |
| 179 | .i output . |
| 180 | By default, |
| 181 | the output of the assembler is placed in the file |
| 182 | .i a.out |
| 183 | in the current directory. |
| 184 | .pp |
| 185 | The input to the assembler is normally taken from the standard input. |
| 186 | If file arguments occur, |
| 187 | then the input is taken sequentially from the files |
| 188 | $name sub 1$, |
| 189 | $name sub 2~...~name sub n$. |
| 190 | This is not to say that the files are assembled separately; |
| 191 | $name sub 1$ is effectively concatenated to $name sub 2$, |
| 192 | so multiple definitions cannot occur amongst the input sources. |
| 193 | .pp |
| 195 | The |
| 196 | .b \-D |
| 197 | (debug), |
| 198 | .b \-T |
| 199 | (token trace), |
| 200 | and the |
| 201 | .b \-S |
| 202 | (symbol table) |
| 203 | flags enable assembler trace information, |
| 204 | provided that the assembler has been compiled with |
| 205 | the debugging code enabled. |
| 206 | The information printed is long and boring, |
| 207 | but useful when debugging the assembler. |
| 208 | .SH 1 "Lexical conventions" |
| 209 | .pp |
| 210 | Assembler tokens include identifiers (alternatively, |
| 211 | .q symbols |
| 212 | or |
| 213 | .q names ), |
| 214 | constants, |
| 215 | and operators. |
| 216 | .SH 2 "Identifiers" |
| 217 | .pp |
| 218 | An identifier consists of a sequence of alphanumeric characters |
| 219 | (including |
| 220 | period |
| 221 | .q "\fB\|.\|\fP" , |
| 222 | underscore |
| 223 | .q "\*(US" , |
| 224 | and |
| 225 | dollar |
| 226 | .q "\*(DL" ). |
| 227 | The first character may not be numeric. |
| 228 | Identifiers may be (practically) arbitrarily long; |
| 229 | all characters are significant. |
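The identifier rule above can be written as a single pattern. The following Python sketch is only an illustration, not code taken from the assembler; the name `is_identifier` is hypothetical, and it assumes that "alphanumeric" means the ASCII letters and digits.

```python
import re

# First character: a letter, period, underscore or dollar (never a digit).
# Remaining characters: the same set plus the digits.
IDENTIFIER = re.compile(r"[A-Za-z._$][A-Za-z0-9._$]*")


def is_identifier(name):
    """Return True if `name` is an identifier under the rule in the text.

    Hypothetical helper; no length limit is applied because all
    characters are significant.
    """
    return IDENTIFIER.fullmatch(name) is not None
```

For instance, `is_identifier("_main")` and `is_identifier("L12")` hold, while `is_identifier("1abc")` does not because of the leading digit.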
| 230 | .SH 2 "Constants" |
| 231 | .SH 3 "Scalar constants" |
| 232 | .pp |
| 233 | All scalar (non floating point) |
| 234 | constants are (potentially) 128 bits wide. |
| 235 | Such constants are interpreted as two's complement numbers. |
| 236 | Note that 64 bit (quad word) and 128 bit (octa word) integers |
| 237 | are only partially supported by the \*(VX hardware. |
| 238 | In addition, |
| 239 | 128 bit integers are only supported by the extended \*(VX architecture. |
| 240 | .i As |
| 241 | supports 64 and 128 bit integers |
| 242 | only so they can be used as immediate constants |
| 243 | or to fill initialized data space. |
| 244 | .i As |
| 245 | cannot perform arithmetic on constants larger than 32 bits. |
| 246 | .pp |
| 247 | Scalar constants are initially evaluated to a full 128 bits, |
| 248 | but are pared down by discarding high order copies of the sign bit |
| 249 | and categorizing the number as a long, quad or octa integer. |
| 250 | Numbers with less precision than 32 bits are treated as 32 bit quantities. |
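The paring rule above can be sketched in Python as follows. This is an illustration under a strict reading of the text, not the assembler's own routine; the function name `classify_scalar` and the category labels are hypothetical.

```python
def classify_scalar(value):
    """Return (category, bits) for a 128 bit two's complement constant.

    Hypothetical helper that mirrors the rule in the text: discard
    high order copies of the sign bit, then pick the narrowest of
    32, 64 or 128 bits that still holds the number.
    """
    if not -(1 << 127) <= value < (1 << 127):
        raise ValueError("constant does not fit in 128 bits")
    for category, bits in (("long", 32), ("quad", 64), ("octa", 128)):
        # A value fits in `bits` bits exactly when every discarded
        # high order bit is a copy of the sign bit.
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return category, bits
```

Under this reading `classify_scalar(-1)` gives `("long", 32)`, while `classify_scalar(1 << 31)` gives `("quad", 64)` because the value needs a 33rd bit to keep its sign. Whether the assembler treats such unsigned-looking 32 bit patterns the same way is not stated here.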
| 251 | .pp |
| 252 | The digits are |
| 253 | .q 0123456789abcdefABCDEF |
| 254 | with the obvious values. |
| 255 | .pp |
| 256 | An octal constant consists of a sequence of digits with a leading zero. |
| 257 | .pp |
| 258 | A decimal constant consists of a sequence of digits without a leading zero. |
| 259 | .pp |
| 260 | A hexadecimal constant consists of the characters |
| 261 | .q 0x |
| 262 | (or |
| 263 | .q 0X ) |
| 264 | followed by a sequence of digits. |
| 265 | .pp |
| 266 | A single-character constant consists of a single quote |
| 267 | .q "\|\(fm\|" |
| 268 | followed by an \*(AC character, |
| 269 | including \*(AC newline. |
| 270 | The constant's value is the code for the |
| 271 | given character. |
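The four lexical forms above can be summarized in a short Python sketch. It is an illustration only; the name `scalar_value` is hypothetical, and rejecting digits that are too large for the radix is an assumption of the sketch, not a documented property of the assembler.

```python
DIGITS = "0123456789abcdef"


def scalar_value(token):
    """Value of an octal, decimal, hexadecimal or single-character constant."""
    if len(token) == 2 and token[0] == "'":
        return ord(token[1])            # code of the quoted character
    lower = token.lower()
    if lower.startswith("0x"):
        base, digits = 16, lower[2:]    # 0x or 0X prefix: hexadecimal
    elif lower.startswith("0"):
        base, digits = 8, lower         # leading zero: octal
    else:
        base, digits = 10, lower        # no leading zero: decimal
    if not digits:
        raise ValueError("no digits in constant")
    value = 0
    for ch in digits:
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:  # assumption: out-of-radix digits are errors
            raise ValueError("bad digit %r in constant" % ch)
        value = value * base + digit
    return value
```

For example, `scalar_value("017")` is 15, `scalar_value("17")` is 17, `scalar_value("0x1F")` is 31, and `scalar_value("'A")` is 65.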
| 272 | .SH 3 "Floating Point Constants" |
| 273 | .pp |
| 274 | Floating point constants are internally represented |
| 275 | in the \*(VX floating point format |
| 276 | that is specified by the lexical form of the constant. |
| 277 | Using the meta notation that |
| 278 | [dec] is a decimal digit (\c |
| 279 | .q "0123456789" ), |
| 280 | [expt] is a type specification character (\c |
| 281 | .q "fFdDhHgG" ), |
| 282 | [expe] is an exponent delimiter and type specification character (\c |
| 283 | .q "eEfFdDhHgG" ), |
| 284 | $x sup roman "*"$ means 0 or more occurrences of $x$, |
| 285 | $x sup +$ means 1 or more occurrences of $x$, |
| 286 | then the general lexical form of a floating point number is: |
| 287 | .ce 1 |
| 288 | 0[expe]([+-])$roman "[dec]" sup +$(.)($roman "[dec]" sup roman "*"$)([expt]([+-])($roman "[dec]" sup +$)) |
| 289 | .ce 0 |
| 290 | The standard semantic interpretation is used for the |
| 291 | signed integer, fraction and signed power of 10 exponent. |
| 292 | If the exponent delimiter is specified, |
| 293 | it must be either an |
| 294 | .q e |
| 295 | or |
| 296 | .q E , |
| 297 | or must agree with the initial type specification character that is used. |
| 298 | The type specification character specifies |
| 299 | the type and representation of the constructed number, as follows: |
| 300 | .(b |
| 301 | .TS |
| 302 | center; |
| 303 | c l c |
| 304 | c l n. |
| 305 | type character	floating representation	size (bits) |
| 306 | _ |
| 307 | f, F	F format floating	32 |
| 308 | d, D	D format floating	64 |
| 309 | g, G	G format floating	64 |
| 310 | h, H	H format floating	128 |
| 311 | .TE |
| 312 | .)b |
| 313 | Note that |
| 314 | .q G |
| 315 | and |
| 316 | .q H |
| 317 | format floating point numbers are not supported |
| 318 | by all implementations of the \*(VX architecture. |
| 319 | .i As |
| 320 | does not require the augmented architecture in order to run. |
| 321 | .pp |
| 322 | The assembler uses the library routine |
| 323 | .i atof() |
| 324 | to convert |
| 325 | .q F |
| 326 | and |
| 327 | .q D |
| 328 | numbers, |
| 329 | and uses its own conversion routine |
| 330 | (derived from |
| 331 | .i atof , |
| 332 | and believed to be numerically accurate) |
| 333 | to convert |
| 334 | .q G |
| 335 | and |
| 336 | .q H |
| 337 | floating point numbers. |
| 338 | .pp |
| 339 | Collectively, |
| 340 | all floating point numbers, |
| 341 | together with quad and octa scalars, are called |
| 342 | .i Bignums . |
| 343 | When |
| 344 | .i as |
| 345 | requires a Bignum, |
| 346 | a 32 bit scalar quantity may also be used. |
| 347 | .SH 3 "String Constants" |
| 348 | .pp |
| 349 | A string constant is defined using |
| 350 | the same syntax and semantics as the \*(CL language uses. |
| 351 | Strings begin and end with a |
| 352 | .q "''" |
| 353 | (double quote). |
| 354 | The \*(DM assembler conventions for flexible string quoting are |
| 355 | not implemented. |
| 356 | All \*(CL backslash conventions are observed; |
| 357 | the backslash conventions |
| 358 | peculiar to the \*(PD assembler are not observed. |
| 359 | Strings are known by their value and their length; |
| 360 | the assembler does not implicitly end strings with a null byte. |
| 361 | .SH 2 "Operators" |
| 362 | .pp |
| 363 | There are several single-character |
| 364 | operators; |
| 365 | see \(sc6.1. |
| 366 | .SH 2 "Blanks" |
| 367 | .pp |
| 368 | Blank and tab characters |
| 369 | may be interspersed freely between tokens, |
| 370 | but may not be used within tokens (except character constants). |
| 371 | A blank or tab is required to separate adjacent |
| 372 | identifiers or constants not otherwise separated. |
| 373 | .SH 2 "Scratch Mark Comments" |
| 374 | .pp |
| 375 | The character |
| 376 | .q "#" |
| 377 | introduces a comment, |
| 378 | which extends through the end of the line on which it appears. |
| 379 | Comments starting in column 1, |
| 380 | having the format |
| 381 | .q "# $expression~~string$" , |
| 382 | are interpreted as an indication that the assembler is now assembling |
| 383 | file |
| 384 | .i string |
| 385 | at line |
| 386 | .i expression . |
| 387 | Thus, one can use the \*(CL preprocessor on an assembly language source file, |
| 388 | and use the |
| 389 | .i #include |
| 390 | and |
| 391 | .i #define |
| 392 | preprocessor directives. |
| 393 | (Note that there may not be an assembler comment starting in column |
| 394 | 1 if the assembler source is given to the \*(CL preprocessor, |
| 395 | as it will be interpreted by the preprocessor in a way not intended.) |
| 396 | Comments are otherwise ignored by the assembler. |
| 397 | .SH 2 "\*(CL Style Comments" |
| 398 | .pp |
| 399 | The assembler will recognize \*(CL style comments, |
| 400 | introduced with the prologue |
| 401 | .b "/*" |
| 402 | and ending with the epilogue |
| 403 | .b "*/" . |
| 404 | \*(CL style comments may extend across multiple lines, |
| 405 | and are the preferred comment style |
| 406 | to use if one chooses to use the \*(CL preprocessor. |
| 407 | .SH 1 "Segments and Location Counters" |
| 408 | .pp |
| 409 | Assembled code and data fall into three segments: the text segment, |
| 410 | the data segment, |
| 411 | and the bss segment. |
| 412 | The \*(UX operating system makes |
| 413 | some assumptions about the content of these segments; |
| 414 | the assembler does not. |
| 415 | Within the text and data segments there are a number of subsegments, |
| 416 | distinguished by number (\c |
| 417 | .q "\fBtext\fP 0" , |
| 418 | .q "\fBtext\fP 1" , |
| 419 | $...$ |
| 420 | .q "\fBdata\fP 0" , |
| 421 | .q "\fBdata\fP 1" , |
| 422 | $...$). |
| 423 | Currently there are four subsegments each in text and data. |
| 424 | The subsegments are for programming convenience only. |
| 425 | .pp |
| 426 | Before writing the output file, |
| 427 | the assembler zero-pads each text subsegment to a multiple of four |
| 428 | bytes and then concatenates the subsegments in order to form the text segment; |
| 429 | an analogous operation is done for the data segment. |
| 430 | Requesting that the loader define symbols and storage regions is the only |
| 431 | action allowed by the assembler with respect to the bss segment. |
| 432 | Assembly begins in |
| 433 | .q "\fBtext\fP 0" . |
| 434 | .pp |
| 435 | Associated with each (sub)segment is an implicit location counter which |
| 436 | begins at zero and is incremented by 1 for each byte assembled into the |
| 437 | (sub)segment. |
| 438 | There is no way to explicitly reference a location counter. |
| 439 | Note that the location counters of subsegments other than |
| 440 | .q "\fBtext\fP 0" |
| 441 | and |
| 442 | .q "\fBdata\fP 0" |
| 443 | behave peculiarly due to the concatenation used to form |
| 444 | the text and data segments. |
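The padding and concatenation described in this section can be sketched in Python as follows. This is an illustration only, not the assembler's code; the name `concatenate_subsegments` is hypothetical. It also suggests one reading of the peculiarity noted above: a location counter value is relative to the start of its own subsegment, so the final offset of anything outside subsegment 0 is only known once the earlier subsegments have been padded and laid out.

```python
def concatenate_subsegments(subsegments):
    """Zero-pad each subsegment to a multiple of 4 bytes and join them in order.

    Returns (segment, bases), where bases[i] is the offset at which
    subsegment i starts in the finished segment. Hypothetical helper.
    """
    segment = bytearray()
    bases = []
    for body in subsegments:
        bases.append(len(segment))
        segment += body
        segment += bytes(-len(body) % 4)    # zero padding up to a 4 byte boundary
    return bytes(segment), bases
```

With subsegments of 5, 0, 1 and 2 bytes the bases come out as 0, 8, 8 and 12, and the finished segment is 16 bytes long; a byte assembled at location counter value 0 in the third subsegment therefore lands at offset 8 of the segment.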