.SH
Pass Two
.PP
Code generation is far less well understood than
parsing or lexical analysis, and for this reason
the second pass is far harder to discuss in a file by file manner.
A great deal of the difficulty is in understanding the issues
and the strategies employed to meet them.
Any particular function is likely to be reasonably straightforward.
.PP
Thus, this part of the paper will concentrate a good deal on the broader
aspects of strategy in the code generator,
and will not get too intimate with the details.
.SH
Overview.
.PP
It is difficult to organize a code generator to be flexible enough to
generate code for a large number of machines,
and still be efficient for any one of them.
Flexibility is also important when it comes time to tune
the code generator to improve the output code quality.
On the other hand, too much flexibility can lead to semantically
incorrect code, and potentially a combinatorial explosion in the
number of cases to be considered in the compiler.
.PP
One goal of the code generator is to have a high degree of correctness.
It is very desirable to have the compiler detect its own inability to
generate correct code, rather than to produce incorrect code.
This goal is achieved by having a simple model of the job
to be done (e.g., an expression tree)
and a simple model of the machine state
(e.g., which registers are free).
The act of generating an instruction performs a transformation
on the tree and the machine state;
hopefully, the tree eventually gets
reduced to a single node.
If each of these instruction/transformation pairs is correct,
and if the machine state model really represents the actual machine,
and if the transformations reduce the input tree to the desired single node, then the
output code will be correct.
.PP
For most real machines, there is no definitive theory of code generation that
encompasses all the C operators.
Thus the selection of which instruction/transformations to generate, and in what
order, will have a heuristic flavor.
If, for some expression tree, no transformation applies, or, more
seriously, if the heuristics select a sequence of instruction/transformations
that do not in fact reduce the tree, the compiler
will report its inability to generate code, and abort.
.PP
A major part of the code generator is concerned with the model and the transformations,
\(em most of this is machine independent, or depends only on simple tables.
The flexibility comes from the heuristics that guide the transformations
of the trees, the selection of subgoals, and the ordering of the computation.
.SH
The Machine Model
.PP
The machine is assumed to have a number of registers, of at most two different
types:
.I A
and
.I B .
Within each register class, there may be scratch (temporary) registers and dedicated registers
(e.g., register variables, the stack pointer, etc.).
Requests to allocate and free registers involve only the temporary registers.
.PP
Each of the registers in the machine is given a name and a number
in the
.I mac2defs
file; the numbers are used as indices into various tables
that describe the registers, so they should be kept small.
One such table is the
.I rstatus
table on file
.I local2.c .
This table is indexed by register number, and
contains expressions made up from manifest constants describing the register types:
SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's,
and SBREG and SBREG\(orSTBREG similarly for BREG's.
There are macros that access this information:
.I isbreg(r)
returns true if register number
.I r
is a BREG, and
.I istreg(r)
returns true if register number
.I r
is a temporary AREG or BREG.
Another table,
.I rnames ,
contains the register names; this is used when
putting out assembler code and diagnostics.
.PP
The usage of registers is kept track of by an array called
.I busy .
.I Busy[r]
is the number of uses of register
.I r
in the current tree being processed.
The allocation and freeing of registers will be discussed later as part of the code generation
algorithm.
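.PP
The bookkeeping just described is easy to sketch in C; the helper names and the register count here are illustrative, not the compiler's own:

```c
#include <assert.h>

#define NREGS 16

/* busy[r] is the number of uses of register r in the current tree;
 * a register is free exactly when its count is zero. */
static int busy[NREGS];

/* Record one more use of register r. */
static void reg_use(int r) { busy[r]++; }

/* Release one use; freeing more times than used is a compiler bug. */
static void reg_free(int r) {
    assert(busy[r] > 0);
    busy[r]--;
}

/* Grab any free register in [lo, hi]; return -1 if all are busy. */
static int reg_alloc(int lo, int hi) {
    for (int r = lo; r <= hi; r++)
        if (busy[r] == 0) {
            reg_use(r);
            return r;
        }
    return -1;
}
```

Counting uses, rather than keeping a single busy bit, is what allows a register to appear more than once in a tree and be released only when its last use is reclaimed.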
.SH
General Organization
.PP
As mentioned above, the second pass reads lines from
the intermediate file, copying through to the output unchanged
any lines that begin with a `)', and making note of the
information about stack usage and register allocation contained on
lines beginning with `]' and `['.
The expression trees, whose beginning is indicated by a line beginning with `.',
are read and rebuilt into trees.
If the compiler is loaded as one pass, the expression trees are
immediately available to the code generator.
.PP
The actual code generation is done by a hierarchy of routines.
The routine
.I delay
is first given the tree; it attempts to delay some postfix
++ and \-\- computations that might reasonably be done after the
smoke clears.
It also attempts to handle comma (,) operators by computing the
left side expression first, and then rewriting the tree
to eliminate the operator.
.I Delay
calls
.I codgen
to control the actual code generation process.
.I Codgen
takes as arguments a pointer to the expression tree,
and a second argument that, for socio-historical reasons, is called a
.I cookie .
The cookie describes a set of goals that would be acceptable
for the code generation: these are assigned to individual bits,
so they may be logically or'ed together to form a large number of possible goals.
Among the possible goals are FOREFF (compute for side effects only;
don't worry about the value), INTEMP (compute and store value into a temporary location
in memory), INAREG (compute into an A register), INTAREG (compute into a scratch
A register), INBREG and INTBREG similarly, FORCC (compute for condition codes),
and FORARG (compute it as a function argument; e.g., stack it if appropriate).
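.PP
The cookie mechanism can be illustrated with a small sketch; the names follow the text, but the bit assignments are invented for the example:

```c
#include <assert.h>

/* The goal bits, OR-able into a cookie as described above.
 * The names follow the text; the particular bit values are
 * illustrative, not the compiler's actual definitions. */
#define FOREFF  0001  /* compute for side effects only */
#define INTEMP  0002  /* compute into a temporary memory location */
#define INAREG  0004  /* compute into an A register */
#define INTAREG 0010  /* compute into a scratch A register */
#define INBREG  0020  /* compute into a B register */
#define INTBREG 0040  /* compute into a scratch B register */
#define FORCC   0100  /* compute for the condition codes */
#define FORARG  0200  /* compute as a function argument */

/* Does what a template offers satisfy any of the requested goals? */
static int goal_ok(int request, int offered) {
    return (request & offered) != 0;
}
```

A caller content with either a scratch A register or set condition codes would pass the cookie INTAREG\(orFORCC, and a template achieving either bit would match.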
.PP
.I Codgen
first canonicalizes the tree by calling
.I canon .
This routine looks for certain transformations that might now be
applicable to the tree.
One, which is very common and very powerful, is to
fold together an indirection operator (UNARY MUL)
and a register (REG); in most machines, this combination is
addressable directly, and so is similar to a NAME in its
behavior.
The UNARY MUL and REG are folded together to make
another node type called OREG.
In fact, in many machines it is possible to directly address not just the
cell pointed to by a register, but also cells differing by a constant
offset from the cell pointed to by the register.
.I Canon
also looks for such cases, calling the
machine dependent routine
.I notoff
to decide if the offset is acceptable (for example, in the IBM 370 the offset
must be between 0 and 4095 bytes).
Another optimization is to replace bit field operations
by shifts and masks if the operation involves extracting the field.
Finally, a machine dependent routine,
.I sucomp ,
is called that computes the Sethi-Ullman numbers
for the tree (see below).
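.PP
The OREG folding might be sketched as follows; the node structure is illustrative, and the 0\-4095 displacement check mimics the IBM 370 case mentioned above:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative node and opcode definitions; the real compiler's
 * structures differ. */
enum op { REG, NAME, OREG, UMUL, PLUS, ICON };

struct node {
    enum op op;
    long lval;                  /* constant value or offset */
    int rval;                   /* register number */
    struct node *left, *right;
};

/* Return nonzero if the offset is NOT directly addressable
 * (an IBM 370-style base+displacement check). */
static int notoff(long off) { return off < 0 || off > 4095; }

/* Fold *(reg) or *(reg + const) into a single OREG node, in place. */
static void canon_oreg(struct node *p) {
    if (p->op != UMUL)
        return;
    struct node *l = p->left;
    if (l->op == REG) {                     /* *(reg) -> OREG 0(reg) */
        p->op = OREG;
        p->rval = l->rval;
        p->lval = 0;
    } else if (l->op == PLUS && l->left->op == REG &&
               l->right->op == ICON && !notoff(l->right->lval)) {
        p->op = OREG;                       /* *(reg+c) -> OREG c(reg) */
        p->rval = l->left->rval;
        p->lval = l->right->lval;
    }
}
```

When the displacement fails the notoff test, the tree is simply left alone; the indirection will then have to be matched (or rewritten) by the templates in the ordinary way.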
.PP
After the tree is canonicalized,
.I codgen
calls the routine
.I store
whose job is to select a subtree of the tree to be computed
and (usually) stored before beginning the
computation of the full tree.
.I Store
must return a tree that can be computed without need
for any temporary storage locations.
In effect, the only store operations generated while processing the subtree must be as a response
to explicit assignment operators in the tree.
This division of the job marks one of the more significant, and successful,
departures from most other compilers.
It means that the code generator can operate
under the assumption that there are enough
registers to do its job, without worrying about
temporary storage.
If a store into a temporary appears in the output, it is always
as a direct result of logic in the
.I store
routine; this makes debugging easier.
.PP
One consequence of this organization is that
code is not generated by a treewalk.
There are theoretical results that support this decision.
.[
Aho Johnson Optimal Code Trees jacm
.]
It may be desirable to compute several subtrees and store
them before tackling the whole tree;
if a subtree is to be stored, this is known before the code
generation for the subtree is begun, and the subtree is computed
when all scratch registers are available.
.PP
The
.I store
routine decides what subtrees, if any, should be
stored by making use of numbers,
called
.I "Sethi-Ullman numbers" ,
that give, for each subtree of an expression tree,
the minimum number of scratch registers required to
compile the subtree, without any stores into temporaries.
.[
Sethi Ullman jacm 1970
.]
These numbers are computed by the machine-dependent
routine
.I sucomp ,
called by
.I canon .
The basic notion is that, knowing the Sethi-Ullman numbers for
the descendants of a node, and knowing the operator of the
node and some information about the machine, the
Sethi-Ullman number of the node itself can be computed.
If the Sethi-Ullman number for a tree exceeds the number of scratch
registers available, some subtree must be stored.
Unfortunately, the theory behind the Sethi-Ullman numbers
applies only to uselessly simple machines and operators.
For the rich set of C operators, and for machines with
asymmetric registers, register pairs, different kinds of registers,
and exceptional forms of addressing,
the theory cannot be applied directly.
The basic idea of estimation is a good one, however,
and well worth applying; the application, especially
when the compiler comes to be tuned for high code
quality, goes beyond the park of theory into the
swamp of heuristics.
This topic will be taken up again later, when more of the compiler
structure has been described.
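.PP
For the uselessly simple machine of the theory, the computation is easy to sketch; a real sucomp must handle far more, as noted above:

```c
#include <assert.h>

/* Textbook Sethi-Ullman sketch for a machine with interchangeable
 * registers and binary register-to-register operators only. */
struct tnode {
    struct tnode *left, *right;  /* both null for a leaf */
    int su;                      /* computed Sethi-Ullman number */
};

static int sucomp(struct tnode *p) {
    if (p->left == 0 && p->right == 0)
        return p->su = 1;            /* a leaf needs one register */
    int l = sucomp(p->left);
    int r = sucomp(p->right);
    /* Unequal needs: do the bigger side first and reuse its spare
     * registers for the smaller side.  Equal needs: one extra
     * register is required to hold the first result. */
    return p->su = (l == r) ? l + 1 : (l > r ? l : r);
}
```

For the tree (a+b)+(c+d), each inner node gets 2 and the root gets 3: computing either sum first leaves its result tying up a register while the other sum needs two more.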
.PP
After examining the Sethi-Ullman numbers,
.I store
selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in
the external variables
.I stotree
and
.I stocook .
If a subtree has been selected, or if
the whole tree is ready to be processed, the
routine
.I order
is called, with a tree and cookie.
.I Order
generates code for trees that
do not require temporary locations.
.I Order
may make recursive calls on itself, and,
in some cases, on
.I codgen ;
for example, when processing the operators &&, \(or\(or, and comma (`,'), that have
a left to right evaluation, it is
incorrect for
.I store
to examine the right operand for subtrees to be stored.
In these cases,
.I order
will call
.I codgen
recursively when it is permissible to work on the right operand.
A similar issue arises with the ? : operator.
.PP
The
.I order
routine works by matching the current tree with
a set of code templates.
If a template is discovered that will
match the current tree and cookie, the associated assembly language
statement or statements are generated.
The tree is then rewritten,
as specified by the template, to represent the effect of the output instruction(s).
If no template match is found, first an attempt is made to find a match with a
different cookie; for example, in order to compute
an expression with cookie INTEMP (store into a temporary storage location),
it is usually necessary to compute the expression into a scratch register
first.
If all attempts to match the tree fail, the heuristic part of
the algorithm becomes dominant.
Control is typically given to one of a number of machine-dependent routines
that may in turn recursively call
.I order
to achieve a subgoal of the computation (for example, one of the
arguments may be computed into a temporary register).
After this subgoal has been achieved, the process begins again with the
modified tree.
If the machine-dependent heuristics are unable to reduce the tree further,
a number of default rewriting rules may be considered appropriate.
For example, if the left operand of a + is a scratch
register, the + can be replaced by a += operator;
the tree may then match a template.
.PP
To close this introduction, we will discuss the steps in compiling
code for the expression
.DS
\fIa\fR += \fIb\fR
.DE
where
.I a
and
.I b
are static variables.
.PP
To begin with, the whole expression tree is examined with cookie FOREFF, and
no match is found. Search with other cookies is equally fruitless, so an
attempt at rewriting is made.
Suppose we are dealing with the Interdata 8/32 for the moment.
It is recognized that the left hand and right hand sides of the += operator
are addressable, and in particular the left hand side has no
side effects, so it is permissible to rewrite this as
.DS
\fIa\fR = \fIa\fR + \fIb\fR
.DE
and this is done.
No match is found on this tree either, so a machine dependent rewrite is done; it is recognized
that the left hand side of the assignment is addressable, but the right hand side is not
in a register, so
.I order
is called recursively, being asked to put the right
hand side of the assignment into a register.
This invocation of
.I order
searches the tree for a match, and fails.
The machine dependent rule for +
notices that the right hand operand is addressable;
it decides to put the left operand into a scratch register.
Another recursive call to
.I order
is made, with the tree
consisting solely of the leaf
.I a ,
and the cookie asking that the value be placed into a scratch register.
This now matches a template, and a load instruction is emitted.
The node consisting of
.I a
is rewritten in place to represent the register into which
.I a
is loaded,
and this third call to
.I order
returns.
The second call to
.I order
now finds that it has the tree
.DS
\fBreg\fR + \fIb\fR
.DE
to consider.
Once again, there is no match, but the default rewriting rule rewrites
the + as a += operator, since the left operand is a scratch register.
When this is done, there is a match: in fact,
.DS
\fBreg\fR += \fIb\fR
.DE
simply describes the effect of the add instruction
on a typical machine.
After the add is emitted, the tree is rewritten
to consist merely of the register node, since the result of the add
is now in the register.
This agrees with the cookie passed to the second invocation of
.I order ,
so this invocation
terminates, returning to the first level.
The original tree has now
become
.DS
\fIa\fR = \fBreg\fR
.DE
which matches a template for the store instruction.
The store is output, and the tree rewritten to become
just a single register node.
At this point, since the top level call to
.I order
was
interested only in side effects, the call to
.I order
returns, and the code generation is completed;
we have generated a load, add, and store, as might have been expected.
.PP
The effect of machine architecture on this is considerable.
For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage''
instruction, so the strategy is quite different;
.I b
is loaded into
a register, and then an add to storage instruction generated
to add this register into
.I a .
The transformations, involving as they do the semantics of C,
are largely machine independent.
The decisions as to when to use them, however, are
almost totally machine dependent.
.PP
Having given a broad outline of
the code generation process, we shall next consider the
heart of it: the templates.
This leads naturally into discussions of template matching and register allocation,
and finally a discussion of the machine dependent interfaces and strategies.
.SH
The Templates
.PP
The templates describe the effect of the target machine instructions
on the model of computation around which the compiler is organized.
In effect, each template has five logical sections, and represents an assertion
of the form:
.IP
.B If
we have a subtree of a given shape (1), and we have a goal (cookie) or goals to
achieve (2), and we have sufficient free resources (3),
.B then
we may emit an instruction or instructions (4), and
rewrite the subtree in a particular manner (5),
and the rewritten tree will achieve the desired goals.
.PP
These five sections will be discussed in more
detail later. First, we give an example of a
template:
.DS
.ta 1i 2i 3i 4i 5i
ASG PLUS, INAREG,
	SAREG, TINT,
	SNAME, TINT,
	0, RLEFT,
	"	add	AL,AR\en",
.DE
The top line specifies the operator (+=) and the cookie (compute the
value of the subtree into an AREG).
The second and third lines specify the left and right descendants,
respectively,
of the += operator.
The left descendant must be a REG node, representing an
A register, and have integer type, while the right side must be a NAME node,
and also have integer type.
The fourth line contains the resource requirements (no scratch registers
or temporaries needed), and the rewriting rule (replace the subtree by the left descendant).
Finally, the quoted string on the last line represents the output to the assembler:
lower case letters, tabs, spaces, etc. are copied
.I verbatim
to the output; upper case letters trigger various macro-like expansions.
Thus,
.B AL
would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em
presumably the register number.
Similarly,
.B AR
would expand into the name of the right operand.
The
.I add
instruction of the last section might well be
emitted by this template.
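.PP
The five sections might be rendered as a C structure; the field and constant names here are assumptions made for the sake of illustration, and the final entry transcribes the example template above:

```c
#include <assert.h>

/* Hypothetical encodings; the compiler's actual values differ. */
enum { ASG_PLUS = 1 };            /* the += operator */
enum { INAREG = 04 };             /* cookie: value into an A register */
enum { SAREG = 01, SNAME = 02 };  /* shapes */
enum { TINT = 01 };               /* types */
enum { RLEFT = 01 };              /* rewrite: replace by left child */

struct template {
    int op;              /* (1) operator at the root of the subtree */
    int cookie;          /* (2) goals this template can achieve */
    int lshape, ltype;   /* (1) shape and type of the left descendant */
    int rshape, rtype;   /*     ... and of the right descendant */
    int needs;           /* (3) scratch resources required */
    int rewrite;         /* (5) how to rewrite the matched subtree */
    const char *cstring; /* (4) assembler output, with AL/AR macros */
};

/* The example template from the text, transcribed into this form. */
static const struct template t_add = {
    ASG_PLUS, INAREG,
    SAREG, TINT,
    SNAME, TINT,
    0, RLEFT,
    "	add	AL,AR\n",
};
```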
.PP
In principle, it would be possible to make separate templates
for all legal combinations of operators, cookies, types, and shapes.
In practice, the number of combinations is very large.
Thus, a considerable amount of mechanism is present to
permit a large number of subtrees to be matched
by a single template.
Most of the shape and type specifiers are individual bits, and can
be logically
or'ed
together.
There are a number of special descriptors for matching classes of
operators.
The cookies can also be combined.
As an example of the kind of template
that really arises in practice, the
actual template for the Interdata 8/32
that subsumes the above example is:
.DS
.ta 1i 2i 3i 4i 5i
ASG OPSIMP, INAREG\(orFORCC,
	SAREG, TINT\(orTUNSIGNED\(orTPOINT,
	SAREG\(orSNAME\(orSOREG\(orSCON, TINT\(orTUNSIGNED\(orTPOINT,
	0, RLEFT\(orRESCC,
	"	OI	AL,AR\en",
.DE
Here, OPSIMP represents the operators
+, \-, \(or, &, and ^.
The
.B OI
macro in the output string expands into the
appropriate \fBI\fRnteger \fBO\fRpcode for the operator.
The left and right sides can be integers, unsigned, or pointer types.
The right side can be, in addition to a name, a register,
a memory location whose address is given by a register and displacement (OREG),
or a constant.
Finally, these instructions set the condition codes,
and so can be used in condition contexts:
the cookie and rewriting rules reflect this.
.SH
The Template Matching Algorithm.
.PP
The heart of the second pass is the template matching
algorithm, in the routine
.I match .
.I Match
is called with a tree and a cookie; it attempts to match
the given tree against some template that will transform it
according to one of the goals given in the cookie.
If a match is successful, the transformation is
applied;
.I expand
is called to generate the assembly code, and then
.I reclaim
rewrites the tree, and reclaims the resources, such
as registers, that might have become free as a result
of the generated code.
.PP
This part of the compiler is among the most time critical.
There is a spectrum of implementation techniques available
for doing this matching.
The most naive algorithm simply looks at the templates one by one.
This can be considerably improved upon by restricting the search
for an acceptable template.
It would be possible to do better than this if the templates were given
to a separate program that ate them and generated a template
matching subroutine.
This would make maintenance of the compiler much more
complicated, however, so this has not been done.
.PP
The matching algorithm is actually carried out by restricting
the range in the table that must be searched for each opcode.
This introduces a number of complications, however, and needs a
bit of sympathetic help by the person constructing the
compiler in order to obtain best results.
The exact tuning of this algorithm continues; it
is best to consult the code and comments in
.I match
for the latest version.
.PP
In order to match a template to a tree,
it is necessary to match not only the cookie and the
op of the root, but also the types and shapes of the
left and right descendants (if any) of the tree.
A convention is established here that is carried out throughout
the second pass of the compiler.
If a node represents a unary operator, the single descendant
is always the ``left'' descendant.
If a node represents a unary operator or a leaf node (no descendants),
the ``right'' descendant is taken by convention to be the node itself.
This enables templates to easily match leaves and conversion operators, for example,
without any additional mechanism in the matching program.
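.PP
The convention can be sketched with a pair of accessor functions; the names and node layout are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the descendant convention described above: a unary
 * operator's single child is its "left" descendant, and for unary
 * operators and leaves the "right" descendant is the node itself. */
enum nkind { LEAF, UNARY, BINARY };

struct mnode {
    enum nkind kind;
    struct mnode *left, *right;
};

/* The left descendant; for a leaf we return the node itself
 * (an assumption made here so the matcher never sees a null). */
static struct mnode *getleft(struct mnode *p) {
    return p->kind == LEAF ? p : p->left;
}

/* The right descendant; unary operators and leaves yield themselves. */
static struct mnode *getright(struct mnode *p) {
    return p->kind == BINARY ? p->right : p;
}
```

A template's right-side shape and type fields can thus be checked against a conversion operator or a leaf directly, with no special cases in the matching loop.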
.PP
The type matching is straightforward; it is possible to specify any combination
of basic types, general pointers, and pointers to one or more of
the basic types.
The shape matching is somewhat more complicated, but still pretty simple.
Templates have a collection of possible operand shapes
on which the opcode might match.
In the simplest case, an
.I add
operation might be able to add to either a register variable
or a scratch register, and might be able (with appropriate
help from the assembler) to add an integer constant (ICON), a static
memory cell (NAME), or a stack location (OREG).
.PP
It is usually attractive to specify a number of such shapes,
and distinguish between them when the assembler output is produced.
It is possible to describe the union of many elementary
shapes such as ICON, NAME, OREG,
AREG or BREG
(both scratch and register forms), etc.
To handle at least the simple forms of indirection, one can also
match some more complicated forms of trees; STARNM and STARREG
can match more complicated trees headed by an indirection operator,
and SFLD can match certain trees headed by a FLD operator: these
patterns call machine dependent routines that match the
patterns of interest on a given machine.
The shape SWADD may be used to recognize NAME or OREG
nodes that lie on word boundaries: this may be of some importance
on word\-addressed machines.
Finally, there are some special shapes: these may not
be used in conjunction with the other shapes, but may be
defined and extended in machine dependent ways.
The special shapes SZERO, SONE, and SMONE are predefined and match
constants 0, 1, and \-1, respectively; others are easy to add
and match by using the machine dependent routine
.I special .
.PP
When a template has been found that matches the root of the tree,
the cookie, and the shapes and types of the descendants,
there is still one bar to a total match: the template may
call for some resources (for example, a scratch register).
The routine
.I allo
is called, and it attempts to allocate the resources.
If it cannot, the match fails; no resources are
allocated.
If successful, the allocated resources are given numbers
1, 2, etc. for later reference when the assembly code is
generated.
The routines
.I expand
and
.I reclaim
are then called.
The
.I match
routine then returns a special value, MDONE.
If no match was found, the value MNOPE is returned;
this is a signal to the caller to try more cookie
values, or attempt a rewriting rule.
.I Match
is also used to select rewriting rules, although
the way of doing this is pretty straightforward.
A special cookie, FORREW, is used to ask
.I match
to search for a rewriting rule.
The rewriting rules are keyed to various opcodes; most
are carried out in
.I order .
Since the question of when to rewrite is one of the key issues in
code generation, it will be taken up again later.
.SH
Register Allocation.
.PP
The register allocation routines, and the allocation strategy,
play a central role in the correctness of the code generation algorithm.
If there are bugs in the Sethi-Ullman computation that cause the
number of needed registers to be underestimated,
the compiler may run out of scratch registers;
it is essential that the allocator keep track of those registers that
are free and busy, in order to detect such conditions.
.PP
Allocation of registers takes place as the result of a template
match; the routine
.I allo
is called with a word describing the number of A registers,
B registers, and temporary locations needed.
The allocation of temporary locations on the stack is relatively
straightforward, and will not be further covered; the
bookkeeping is a bit tricky, but conceptually trivial, and requests
for temporary space on the stack will never fail.
.PP
Register allocation is less straightforward.
The two major complications are
.I pairing
and
.I sharing .
In many machines, some operations (such as multiplication
and division), and/or some types (such as longs or double precision)
require even/odd pairs of registers.
Operations of the first type are exceptionally difficult to
deal with in the compiler; in fact, their theoretical
properties are rather bad as well.
.[
Aho Johnson Ullman Multiregister
.]
The second issue is dealt with rather more successfully;
a machine dependent function called
.I szty(t)
is called that returns 1 or 2, depending on the
number of A registers required to hold an object of type
.I t .
If
.I szty
returns 2, an even/odd pair of A registers is allocated
for each request.
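.PP
The even/odd pairing rule is easy to sketch; the register file size and the free set here are illustrative:

```c
#include <assert.h>

/* Illustrative A-register free set: 1 means the register is free. */
#define NAREG 8
static int regfree[NAREG] = {1, 1, 1, 1, 1, 1, 1, 1};

/* A szty()-style size function: longs need a register pair, say. */
static int szty_long(void) { return 2; }

/* Allocate either a single register or an even/odd pair;
 * return the (even) base register number, or -1 on failure. */
static int alloc_areg(int size) {
    if (size == 1) {
        for (int r = 0; r < NAREG; r++)
            if (regfree[r]) {
                regfree[r] = 0;
                return r;
            }
    } else {
        for (int r = 0; r + 1 < NAREG; r += 2)   /* even bases only */
            if (regfree[r] && regfree[r + 1]) {
                regfree[r] = regfree[r + 1] = 0;
                return r;
            }
    }
    return -1;
}
```

Note that a pair request can fail even when two registers are free, if no free pair is aligned on an even boundary; this fragmentation is part of what makes pairing awkward for the allocator.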
.PP
The other issue, sharing, is more subtle, but
important for good code quality.
When registers are allocated, it
is possible to reuse registers that hold address
information, and use them to contain the values
computed or accessed.
For example, on the IBM 360, if register 2 has
a pointer to an integer in it, we may load the
integer into register 2 itself by saying:
.DS
L	2,0(2)
.DE
If register 2 had a byte pointer, however, the sequence for
loading a character involves clearing the target
register first, and then inserting the desired character:
.DS
SR	3,3
IC	3,0(2)
.DE
In the first case, if register 3 were used as the target,
it would lead to a larger number of registers
used for the expression than were required; the compiler would
generate inefficient code.
On the other hand, if register 2 were used as the target in the second
case, the code would simply be wrong.
In the first case, register 2 can be
.I shared
while in the second, it cannot.
.PP
In the specification of the register needs in the templates,
it is possible to indicate whether required scratch registers
may be shared with possible registers on the left or the right of the input tree.
In order that a register be shared, it must be scratch, and it must
be used only once, on the appropriate side of the tree being compiled.
| 701 | .PP |
| 702 | The |
| 703 | .I allo |
| 704 | routine thus has a bit more to do than meets the eye; |
| 705 | it calls |
| 706 | .I freereg |
| 707 | to obtain a free register for each A and B register request. |
| 708 | .I Freereg |
| 709 | makes multiple calls on the routine |
| 710 | .I usable |
| 711 | to decide if a given register can be used to satisfy |
| 712 | a given need. |
| 713 | .I Usable |
| 714 | calls |
| 715 | .I shareit |
| 716 | if the register is busy, but might be shared. |
| 717 | Finally, |
| 718 | .I shareit |
| 719 | calls |
| 720 | .I ushare |
| 721 | to decide if the desired register is actually in the appropriate |
| 722 | subtree, and can be shared. |
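The decision at the bottom of this chain reduces to a small predicate; the parameterization below is an assumption (the real routines consult the tree and the machine state directly rather than taking flags):

```c
/* Toy version of the sharing decision: a busy register may satisfy
 * a request only if it is a scratch register used exactly once on
 * the appropriate side of the tree being compiled. */
int shareit(int isscratch, int uses)
{
	return isscratch && uses == 1;
}

int usable(int isbusy, int isscratch, int uses)
{
	if (!isbusy)
		return 1;			/* free: trivially usable */
	return shareit(isscratch, uses);	/* busy: only if sharable */
}
```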
| 723 | .PP |
| 724 | Just to add additional complexity, on some machines (such as the IBM 370) it |
| 725 | is possible to have ``double indexing'' forms of |
addressing; these are represented by OREGs
| 727 | with the base and index registers encoded into the register field. |
| 728 | While the register allocation and deallocation |
| 729 | .I "per se" |
| 730 | is not made more difficult by this phenomenon, the code itself |
| 731 | is somewhat more complex. |
| 732 | .PP |
| 733 | Having allocated the registers and expanded the assembly language, |
| 734 | it is time to reclaim the resources; the routine |
| 735 | .I reclaim |
| 736 | does this. |
| 737 | Many operations produce more than one result. |
| 738 | For example, many arithmetic operations may produce |
| 739 | a value in a register, and also set the condition |
| 740 | codes. |
| 741 | Assignment operations may leave results both in a register and in memory. |
| 742 | .I Reclaim |
| 743 | is passed three parameters; the tree and cookie |
| 744 | that were matched, and the rewriting field of the template. |
| 745 | The rewriting field allows the specification of possible results; |
| 746 | the tree is rewritten to reflect the results of the operation. |
| 747 | If the tree was computed for side effects only (FOREFF), |
| 748 | the tree is freed, and all resources in it reclaimed. |
| 749 | If the tree was computed for condition codes, the resources |
| 750 | are also freed, and the tree replaced by a special |
| 751 | node type, FORCC. |
| 752 | Otherwise, the value may be found in the left |
| 753 | argument of the root, the right argument of the root, |
| 754 | or one of the temporary resources allocated. |
| 755 | In these cases, first the resources of the tree, |
| 756 | and the newly allocated resources, |
| 757 | are |
| 758 | freed; then the resources needed by the result |
| 759 | are made busy again. |
| 760 | The final result must always match the shape of the input cookie; |
| 761 | otherwise, the compiler error |
| 762 | ``cannot reclaim'' |
| 763 | is generated. |
| 764 | There are some machine dependent ways of |
| 765 | preferring results in registers or memory when |
| 766 | there are multiple results matching multiple goals in the cookie. |
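The dispatch just described can be caricatured in a few lines of C; the cookie names follow the text, but their values, and the reduction of "result shape" to a single flag, are assumptions:

```c
/* Toy dispatch in the shape of reclaim(): the cookie records what
 * the caller wanted, and the tree is freed or rewritten to match.
 * Cookie values and the return convention are assumptions. */
enum { FOREFF = 1, FORCC = 2, INAREG = 4 };

/* returns what the tree was rewritten to; -1 models the
 * ``cannot reclaim'' compiler error */
int reclaim(int cookie, int result_in_reg)
{
	if (cookie & FOREFF)
		return 0;		/* side effects only: free everything */
	if (cookie & FORCC)
		return FORCC;		/* replace tree by a FORCC node */
	if ((cookie & INAREG) && result_in_reg)
		return INAREG;		/* result shape matches the cookie */
	return -1;			/* no result matches the goal */
}
```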
| 767 | .SH |
| 768 | The Machine Dependent Interface |
| 769 | .PP |
| 770 | The files |
| 771 | .I order.c , |
| 772 | .I local2.c , |
| 773 | and |
| 774 | .I table.c , |
| 775 | as well as the header file |
| 776 | .I mac2defs , |
| 777 | represent the machine dependent portion of the second pass. |
| 778 | The machine dependent portion can be roughly divided into |
two parts: the easy portion and the hard portion.
| 780 | The easy portion |
| 781 | tells the compiler the names of the registers, and arranges that |
| 782 | the compiler generate the proper assembler formats, opcode names, location counters, etc. |
| 783 | The hard portion involves the Sethi\-Ullman computation, the |
| 784 | rewriting rules, and, to some extent, the templates. |
| 785 | It is hard because there are no real algorithms that apply; |
| 786 | most of this portion is based on heuristics. |
| 787 | This section discusses the easy portion; the next several |
| 788 | sections will discuss the hard portion. |
| 789 | .PP |
| 790 | If the compiler is adapted from a compiler for a machine |
| 791 | of similar architecture, the easy part is indeed easy. |
| 792 | In |
| 793 | .I mac2defs , |
| 794 | the register numbers are defined, as well as various parameters for |
| 795 | the stack frame, and various macros that describe the machine architecture. |
| 796 | If double indexing is to be permitted, for example, the symbol |
| 797 | R2REGS is defined. |
| 798 | Also, a number of macros that are involved in function call processing, |
| 799 | especially for unusual function call mechanisms, are defined here. |
| 800 | .PP |
| 801 | In |
| 802 | .I local2.c , |
| 803 | a large number of simple functions are defined. |
| 804 | These do things such as write out opcodes, register names, |
| 805 | and address forms for the assembler. |
| 806 | Part of the function call code is defined here; that is nontrivial |
| 807 | to design, but typically rather straightforward to implement. |
| 808 | Among the easy routines in |
| 809 | .I order.c |
| 810 | are routines for generating a created label, |
| 811 | defining a label, and generating the arguments |
| 812 | of a function call. |
| 813 | .PP |
These routines tend to have a local effect, and depend in a fairly straightforward way
| 815 | on the target assembler and the design decisions already made about |
| 816 | the compiler. |
| 817 | Thus they will not be further treated here. |
| 818 | .SH |
| 819 | The Rewriting Rules |
| 820 | .PP |
| 821 | When a tree fails to match any template, it becomes |
| 822 | a candidate for rewriting. |
| 823 | Before the tree is rewritten, |
| 824 | the machine dependent routine |
| 825 | .I nextcook |
| 826 | is called with the tree and the cookie; it suggests |
| 827 | another cookie that might be a better candidate for the |
| 828 | matching of the tree. |
| 829 | If all else fails, the templates are searched with the cookie |
| 830 | FORREW, to look for a rewriting rule. |
| 831 | The rewriting rules are of two kinds; |
| 832 | for most of the common operators, there are |
| 833 | machine dependent rewriting rules that may be applied; |
| 834 | these are handled by machine dependent functions |
| 835 | that are called and given the tree to be computed. |
| 836 | These routines may recursively call |
| 837 | .I order |
| 838 | or |
| 839 | .I codgen |
| 840 | to cause certain subgoals to be achieved; |
| 841 | if they actually call for some alteration of the tree, |
| 842 | they return 1, and the |
| 843 | code generation algorithm recanonicalizes and tries again. |
| 844 | If these routines choose not to deal with the tree, the |
| 845 | default rewriting rules are applied. |
| 846 | .PP |
| 847 | The assignment ops, when rewritten, call the routine |
| 848 | .I setasg . |
| 849 | This is assumed to rewrite the tree at least to the point where there are |
| 850 | no side effects in the left hand side. |
| 851 | If there is still no template match, |
| 852 | a default rewriting is done that causes |
| 853 | an expression such as |
| 854 | .DS |
| 855 | .I "a += b" |
| 856 | .DE |
| 857 | to be rewritten as |
| 858 | .DS |
| 859 | .I "a = a + b" |
| 860 | .DE |
| 861 | This is a useful default for certain mixtures of strange types |
| 862 | (for example, when |
| 863 | .I a |
| 864 | is a bit field and |
| 865 | .I b |
a character) that
| 867 | otherwise might need separate table entries. |
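The default rewriting can be sketched as a tree transformation in C; the node layout and opcode names are invented, and copying the left side is legitimate only because setasg has already removed its side effects:

```c
#include <stdlib.h>

/* Invented node layout and opcodes for illustration. */
enum { NAME, PLUS, ASG, ASGPLUS };

struct node {
	int op;
	struct node *left, *right;
};

struct node *mknode(int op, struct node *l, struct node *r)
{
	struct node *p = malloc(sizeof *p);

	p->op = op;
	p->left = l;
	p->right = r;
	return p;
}

/* the left side must already be free of side effects */
struct node *copy(struct node *p)
{
	return p ? mknode(p->op, copy(p->left), copy(p->right)) : NULL;
}

/* rewrite "a += b" as "a = a + b" */
struct node *rwasg(struct node *p)
{
	if (p->op != ASGPLUS)
		return p;
	return mknode(ASG, p->left, mknode(PLUS, copy(p->left), p->right));
}
```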
| 868 | .PP |
| 869 | Simple assignment, structure assignment, and all forms of calls |
| 870 | are handled completely by the machine dependent routines. |
| 871 | For historical reasons, the routines generating the calls return |
| 872 | 1 on failure, 0 on success, unlike the other routines. |
| 873 | .PP |
| 874 | The machine dependent routine |
| 875 | .I setbin |
| 876 | handles binary operators; it too must do most of the job. |
| 877 | In particular, when it returns 0, it must do so with |
| 878 | the left hand side in a temporary register. |
| 879 | The default rewriting rule in this case is to convert the |
| 880 | binary operator into the associated assignment operator; |
| 881 | since the left hand side is assumed to be a temporary register, |
| 882 | this preserves the semantics and often allows a considerable |
| 883 | saving in the template table. |
| 884 | .PP |
The increment and decrement operators may be dealt with by
| 886 | the machine dependent routine |
| 887 | .I setincr . |
| 888 | If this routine chooses not to deal with the tree, the rewriting rule replaces |
| 889 | .DS |
| 890 | .I "x ++" |
| 891 | .DE |
| 892 | by |
| 893 | .DS |
| 894 | .I "( (x += 1) \- 1)" |
| 895 | .DE |
| 896 | which preserves the semantics. |
| 897 | Once again, this is not too attractive for the most common |
| 898 | cases, but can generate close to optimal code when the |
type of
.I x
is unusual.
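That the rewriting preserves the semantics is easy to check directly in C itself:

```c
/* ((x += 1) - 1) yields the value x had before the increment,
 * exactly as x++ does; the rewriting is value-preserving. */
int postinc(int *x)
{
	return ((*x += 1) - 1);
}
```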
| 900 | .PP |
| 901 | Finally, the indirection (UNARY MUL) operator is also handled |
| 902 | in a special way. |
| 903 | The machine dependent routine |
| 904 | .I offstar |
| 905 | is extremely important for the efficient generation of code. |
| 906 | .I Offstar |
| 907 | is called with a tree that is the direct descendant of a UNARY MUL node; |
| 908 | its job is to transform this tree so that the combination of |
| 909 | UNARY MUL with the transformed tree becomes addressable. |
| 910 | On most machines, |
| 911 | .I offstar |
| 912 | can simply compute the tree into an A or B register, |
| 913 | depending on the architecture, and then |
| 914 | .I canon |
| 915 | will make the resulting tree into an OREG. |
| 916 | On many machines, |
| 917 | .I offstar |
| 918 | can profitably choose to do less work than computing |
| 919 | its entire argument into a register. |
For example, if the target machine supports OREGs
| 921 | with a constant offset from a register, and |
| 922 | .I offstar |
| 923 | is called |
| 924 | with a tree of the form |
| 925 | .DS |
| 926 | .I "expr + const" |
| 927 | .DE |
| 928 | where |
| 929 | .I const |
| 930 | is a constant, then |
| 931 | .I offstar |
| 932 | need only compute |
| 933 | .I expr |
| 934 | into the appropriate form of register. |
| 935 | On machines that support double indexing, |
| 936 | .I offstar |
| 937 | may have even more choice as to how to proceed. |
| 938 | The proper tuning of |
| 939 | .I offstar , |
which is typically not too difficult, should be one of the
first optimizations attempted by the
compiler writer.
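The opportunity offstar exploits can be sketched in C; the node layout, opcode names, and the reduction of "addressable" to a single offset are assumptions made for illustration:

```c
/* Under a UNARY MUL, a tree of the form "expr + const" needs only
 * expr in a register if the machine has register-plus-offset
 * addressing: the constant folds into the OREG offset.
 * Node layout and opcodes here are invented. */
enum { REG, ICON, PLUS };

struct node {
	int op, val;
	struct node *left, *right;
};

/* returns 1, filling *off, if the tree can become an OREG once its
 * non-constant part reaches a register; 0 means the whole tree
 * must be computed into a register first */
int oregok(struct node *p, int *off)
{
	if (p->op == REG) {
		*off = 0;			/* *(reg): offset zero */
		return 1;
	}
	if (p->op == PLUS && p->right->op == ICON) {
		*off = p->right->val;		/* const becomes the offset */
		return 1;
	}
	return 0;
}
```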
| 943 | .SH |
| 944 | The Sethi-Ullman Computation |
| 945 | .PP |
| 946 | The heart of the heuristics is the computation of the Sethi-Ullman numbers. |
| 947 | This computation is closely linked with the rewriting rules and the |
| 948 | templates. |
| 949 | As mentioned before, the Sethi-Ullman numbers are expected to |
| 950 | estimate the number of scratch registers needed to compute |
| 951 | the subtrees without using any stores. |
| 952 | However, the original theory does not apply to real machines. |
| 953 | For one thing, the theory assumes that all registers |
| 954 | are interchangeable. |
| 955 | Real machines have general purpose, floating point, and index registers, |
| 956 | register pairs, etc. |
| 957 | The theory also does not account for side effects; |
| 958 | this rules out various forms of pathology that arise |
| 959 | from assignment and assignment ops. |
| 960 | Condition codes are also undreamed of. |
| 961 | Finally, the influence of types, conversions, and the |
| 962 | various addressability restrictions and extensions of real |
| 963 | machines are also ignored. |
| 964 | .PP |
| 965 | Nevertheless, for a ``useless'' theory, |
| 966 | the basic insight of Sethi and Ullman is amazingly |
| 967 | useful in a real compiler. |
| 968 | The notion that one should attempt to estimate the |
| 969 | resource needs of trees before starting the |
| 970 | code generation provides a natural means of splitting the |
| 971 | code generation problem, and provides a bit of redundancy |
| 972 | and self checking in the compiler. |
| 973 | Moreover, if writing the |
| 974 | Sethi-Ullman routines is hard, describing, writing, and debugging the |
| 975 | alternative (routines that attempt to free up registers by stores into |
| 976 | temporaries ``on the fly'') is even worse. |
| 977 | Nevertheless, it should be clearly understood that these routines exist in a realm |
| 978 | where there is no ``right'' way to write them; |
| 979 | it is an art, the realm of heuristics, and, consequently, a major |
| 980 | source of bugs in the compiler. |
| 981 | Often, the early, crude versions of these routines give little trouble; |
| 982 | only after |
| 983 | the compiler is actually working |
| 984 | and the |
code quality is being improved do serious problems have to be faced.
| 986 | Having a simple, regular machine architecture is worth quite |
| 987 | a lot at this time. |
| 988 | .PP |
| 989 | The major problems arise from asymmetries in the registers: register pairs, |
| 990 | having different kinds of registers, and the related problem of |
| 991 | needing more than one register (frequently a pair) to store certain |
| 992 | data |
| 993 | types (such as longs or doubles). |
| 994 | There appears to be no general way of treating this problem; |
| 995 | solutions have to be fudged for each machine where the problem arises. |
| 996 | On the Honeywell 66, for example, there are only two general purpose registers, |
| 997 | so a need for a pair is the same as the need for two registers. |
| 998 | On the IBM 370, the register pair (0,1) is used to do multiplications and divisions; |
| 999 | registers 0 and 1 are not generally considered part of the scratch registers, and so |
| 1000 | do not require allocation explicitly. |
| 1001 | On the Interdata 8/32, after much consideration, the |
| 1002 | decision was made not to try to deal with the register pair issue; |
| 1003 | operations such as multiplication and division that required pairs |
| 1004 | were simply assumed to take all of the scratch registers. |
| 1005 | Several weeks of effort had failed to produce |
| 1006 | an algorithm that seemed to have much chance of running successfully |
| 1007 | without inordinate debugging effort. |
| 1008 | The difficulty of this issue should not be minimized; it represents one of the |
| 1009 | main intellectual efforts in porting the compiler. |
| 1010 | Nevertheless, this problem has been fudged with a degree of |
| 1011 | success on nearly a dozen machines, so the compiler writer should |
| 1012 | not abandon hope. |
| 1013 | .PP |
| 1014 | The Sethi-Ullman computations interact with the |
| 1015 | rest of the compiler in a number of rather subtle ways. |
| 1016 | As already discussed, the |
| 1017 | .I store |
| 1018 | routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult |
| 1019 | to compute in registers, and must be stored. |
| 1020 | There are also subtle interactions between the |
| 1021 | rewriting routines and the Sethi-Ullman numbers. |
| 1022 | Suppose we have a tree such as |
| 1023 | .DS |
| 1024 | .I "A \- B" |
| 1025 | .DE |
| 1026 | where |
| 1027 | .I A |
| 1028 | and |
| 1029 | .I B |
| 1030 | are expressions; suppose further that |
| 1031 | .I B |
| 1032 | takes two registers, and |
| 1033 | .I A |
| 1034 | one. |
| 1035 | It is possible to compute the full expression in two registers by |
| 1036 | first computing |
| 1037 | .I B , |
| 1038 | and then, using the scratch register |
| 1039 | used by |
| 1040 | .I B , |
| 1041 | but not containing the answer, compute |
| 1042 | .I A . |
| 1043 | The subtraction can then be done, computing the expression. |
| 1044 | (Note that this assumes a number of things, not the least of which |
| 1045 | are register-to-register subtraction operators and symmetric |
| 1046 | registers.) |
| 1047 | If the machine dependent routine |
| 1048 | .I setbin , |
| 1049 | however, is not prepared to recognize this case |
| 1050 | and compute the more difficult side of the expression |
| 1051 | first, the |
| 1052 | Sethi-Ullman number must be set to three. |
| 1053 | Thus, the |
| 1054 | Sethi-Ullman number for a tree should represent the code that |
| 1055 | the machine dependent routines are actually willing to generate. |
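The register count in the A \- B discussion can be written down as a tiny function, under the classical assumptions the text lists (interchangeable registers, no side effects); the canswap flag stands in for what setbin is actually willing to do:

```c
static int max(int a, int b)
{
	return a > b ? a : b;
}

/* Registers needed by a binary node whose subtrees need l and r.
 * If the machine dependent routines will compute the harder side
 * first (canswap), the classical rule applies; otherwise the left
 * side is computed first and held while the right is done. */
int needs(int l, int r, int canswap)
{
	if (l == r)
		return l + 1;		/* both sides equally hard */
	if (canswap)
		return max(l, r);	/* harder side first, reuse its scratch */
	return max(l, r + 1);	/* left held while the right is computed */
}
```

With A needing one register and B two, needs(1, 2, 1) is two, but needs(1, 2, 0) is three, as in the text.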
| 1056 | .PP |
| 1057 | The interaction can go the other way. |
| 1058 | If we take an expression such as |
| 1059 | .DS |
| 1060 | .I "* ( p + i )" |
| 1061 | .DE |
| 1062 | where |
| 1063 | .I p |
| 1064 | is a pointer and |
| 1065 | .I i |
| 1066 | an integer, |
| 1067 | this can probably be done in one register on most machines. |
| 1068 | Thus, its Sethi-Ullman number would probably be set to one. |
| 1069 | If double indexing is possible in the machine, a possible way |
| 1070 | of computing the expression is to load both |
| 1071 | .I p |
| 1072 | and |
| 1073 | .I i |
| 1074 | into registers, and then use double indexing. |
| 1075 | This would use two scratch registers; in such a case, |
| 1076 | it is possible that the scratch registers might be unobtainable, |
| 1077 | or might make some other part of the computation run out of |
| 1078 | registers. |
| 1079 | The usual solution is to cause |
| 1080 | .I offstar |
| 1081 | to ignore opportunities for double indexing that would tie up more scratch |
| 1082 | registers than the Sethi-Ullman number had reserved. |
| 1083 | .PP |
| 1084 | In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application |
| 1085 | of the portable compiler. |
| 1086 | It is also a frequent source of bugs. |
| 1087 | Algorithms are available that will produce nearly optimal code |
| 1088 | for specialized machines, but unfortunately most existing machines |
| 1089 | are far removed from these ideals. |
| 1090 | The best way of proceeding in practice is to start with a compiler |
for a machine similar to the target, and proceed very
| 1092 | carefully. |
| 1093 | .SH |
| 1094 | Register Allocation |
| 1095 | .PP |
| 1096 | After the Sethi-Ullman numbers are computed, |
| 1097 | .I order |
| 1098 | calls a routine, |
| 1099 | .I rallo , |
| 1100 | that does register allocation, if appropriate. |
| 1101 | This routine does relatively little, in general; |
| 1102 | this is especially true if the target machine |
| 1103 | is fairly regular. |
| 1104 | There are a few cases where it is assumed that |
| 1105 | the result of a computation takes place in a particular register; |
| 1106 | switch and function return are the two major places. |
| 1107 | The expression tree has a field, |
| 1108 | .I rall , |
| 1109 | that may be filled with a register number; this is taken |
| 1110 | to be a preferred register, and the first temporary |
| 1111 | register allocated by a template match will be this preferred one, if |
| 1112 | it is free. |
| 1113 | If not, no particular action is taken; this is just a heuristic. |
| 1114 | If no register preference is present, the field contains NOPREF. |
| 1115 | In some cases, the result must be placed in |
| 1116 | a given register, no matter what. |
| 1117 | The register number is placed in |
| 1118 | .I rall , |
| 1119 | and the mask MUSTDO is logically or'ed in with it. |
| 1120 | In this case, if the subtree is requested in a register, and comes |
| 1121 | back in a register other than the demanded one, it is moved |
| 1122 | by calling the routine |
| 1123 | .I rmove . |
| 1124 | If the target register for this move is busy, it is a compiler |
| 1125 | error. |
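The rall encoding can be modeled with a pair of macros; the constant values here are pure invention:

```c
/* Model of the rall field: NOPREF means no preference; a register
 * number with MUSTDO or'ed in is a demand, not a hint.  The values
 * of both constants are assumptions. */
#define NOPREF	0777
#define MUSTDO	010000

#define REGNUM(rall)	((rall) & ~MUSTDO)	/* the register itself */
#define ISMUST(rall)	(((rall) & MUSTDO) != 0)	/* demanded? */
```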
| 1126 | .PP |
| 1127 | Note that this mechanism is the only one that will ever cause a register-to-register |
| 1128 | move between scratch registers (unless such a move is buried in the depths of |
| 1129 | some template). |
| 1130 | This simplifies debugging. |
| 1131 | In some cases, there is a rather strange interaction between |
| 1132 | the register allocation and the Sethi-Ullman number; |
| 1133 | if there is an operator or situation requiring a |
| 1134 | particular register, the allocator and the Sethi-Ullman |
| 1135 | computation must conspire to ensure that the target |
| 1136 | register is not being used by some intermediate result of some far-removed computation. |
| 1137 | This is most easily done by making the special operation take |
| 1138 | all of the free registers, preventing any other partially-computed |
| 1139 | results from cluttering up the works. |
| 1140 | .SH |
| 1141 | Compiler Bugs |
| 1142 | .PP |
| 1143 | The portable compiler has an excellent record of generating correct code. |
| 1144 | The requirement for reasonable cooperation between the register allocation, |
| 1145 | Sethi-Ullman computation, rewriting rules, and templates builds quite a bit |
| 1146 | of redundancy into the compiling process. |
| 1147 | The effect of this is that, in a surprisingly short time, the compiler will |
| 1148 | start generating correct code for those |
| 1149 | programs that it can compile. |
| 1150 | The hard part of the job then becomes finding and |
| 1151 | eliminating those situations where the compiler refuses to |
| 1152 | compile a program because it knows it cannot do it right. |
| 1153 | For example, a template may simply be missing; this may either |
give a compiler error of the form ``no match for op ...'', or cause
| 1155 | the compiler to go into an infinite loop applying various rewriting rules. |
| 1156 | The compiler has a variable, |
| 1157 | .I nrecur , |
that is set to 0 at the beginning of each expression, and
| 1159 | incremented at key spots in the |
| 1160 | compilation process; if this parameter gets too large, the |
| 1161 | compiler decides that it is in a loop, and aborts. |
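The guard is trivial to model; the threshold value here is an assumption:

```c
/* nrecur is zeroed at the start of each expression and bumped at
 * key spots in compilation; past a threshold the compiler concludes
 * it is looping and aborts.  NRECUR's value is an assumption. */
#define NRECUR	1000

int nrecur;

int inloop(void)
{
	return ++nrecur > NRECUR;
}
```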
| 1162 | Loops are also characteristic of botches in the machine-dependent rewriting rules. |
| 1163 | Bad Sethi-Ullman computations usually cause the scratch registers |
| 1164 | to run out; this often means |
| 1165 | that the Sethi-Ullman number was underestimated, so |
| 1166 | .I store |
| 1167 | did not store something it should have; alternatively, |
| 1168 | it can mean that the rewriting rules were not smart enough to |
| 1169 | find the sequence that |
| 1170 | .I sucomp |
| 1171 | assumed would be used. |
| 1172 | .PP |
| 1173 | The best approach when a compiler error is detected involves several stages. |
| 1174 | First, try to get a small example program that steps on the bug. |
| 1175 | Second, turn on various debugging flags in the code generator, and follow the |
| 1176 | tree through the process of being matched and rewritten. |
| 1177 | Some flags of interest are |
| 1178 | \-e, which prints the expression tree, |
| 1179 | \-r, which gives information about the allocation of registers, |
| 1180 | \-a, which gives information about the performance of |
| 1181 | .I rallo , |
| 1182 | and \-o, which gives information about the behavior of |
| 1183 | .I order . |
| 1184 | This technique should allow most bugs to be found relatively quickly. |
| 1185 | .PP |
| 1186 | Unfortunately, finding the bug is usually not enough; it must also |
| 1187 | be fixed! |
| 1188 | The difficulty arises because a fix to the particular bug of interest tends |
| 1189 | to break other code that already works. |
| 1190 | Regression tests, tests that compare the performance of |
| 1191 | a new compiler against the performance of an older one, are very |
| 1192 | valuable in preventing major catastrophes. |
| 1193 | .SH |
| 1194 | Summary and Conclusion |
| 1195 | .PP |
| 1196 | The portable compiler has been a useful tool for providing C |
| 1197 | capability on a large number of diverse machines, |
| 1198 | and for testing a number of theoretical |
| 1199 | constructs in a practical setting. |
| 1200 | It has many blemishes, both in style and functionality. |
| 1201 | It has been applied to many more machines than first |
| 1202 | anticipated, of a much wider range than originally dreamed of. |
| 1203 | Its use has also spread much faster than expected, leaving parts of |
| 1204 | the compiler still somewhat raw in shape. |
| 1205 | .PP |
| 1206 | On the theoretical side, there is some hope that the |
| 1207 | skeleton of the |
| 1208 | .I sucomp |
| 1209 | routine could be generated for many machines directly from the |
| 1210 | templates; this would give a considerable boost |
| 1211 | to the portability and correctness of the compiler, |
| 1212 | but might affect tunability and code quality. |
| 1213 | There is also room for more optimization, both within |
| 1214 | .I optim |
| 1215 | and in the form of a portable ``peephole'' optimizer. |
| 1216 | .PP |
| 1217 | On the practical, development side, |
| 1218 | the compiler could probably be sped up and made smaller |
| 1219 | without doing too much violence to its basic structure. |
| 1220 | Parts of the compiler deserve to be rewritten; |
| 1221 | the initialization code, register allocation, and |
| 1222 | parser are prime candidates. |
| 1223 | It might be that doing some or all of the parsing |
| 1224 | with a recursive descent parser might |
| 1225 | save enough space and time to be worthwhile; |
| 1226 | it would certainly ease the problem of moving the |
| 1227 | compiler to an environment where |
| 1228 | .I Yacc |
| 1229 | is not already present. |
| 1230 | .PP |
| 1231 | Finally, I would like to thank the many people who have |
| 1232 | sympathetically, and even enthusiastically, helped me grapple |
| 1233 | with what has been a frustrating program to write, test, and install. |
| 1234 | D. M. Ritchie and E. N. Pinson provided needed early |
| 1235 | encouragement and philosophical guidance; |
| 1236 | M. E. Lesk, |
| 1237 | R. Muha, T. G. Peterson, |
| 1238 | G. Riddle, L. Rosler, |
| 1239 | R. W. Mitze, |
| 1240 | B. R. Rowland, |
| 1241 | S. I. Feldman, |
| 1242 | and |
| 1243 | T. B. London |
| 1244 | have all contributed ideas, gripes, and all, at one time or another, |
| 1245 | climbed ``into the pits'' with me to help debug. |
| 1246 | Without their help this effort would have not been possible; |
| 1247 | with it, it was often kind of fun. |
| 1248 | .sp 100 |
| 1249 | .LP |
| 1250 | .[ |
| 1251 | $LIST$ |
| 1252 | .] |
| 1253 | .LP |
| 1254 | .sp 100 |
| 1255 | .LP |