.SH
Pass Two
.PP
Code generation is far less well understood than
parsing or lexical analysis, and for this reason
the second pass is far harder to discuss in a file by file manner.
A great deal of the difficulty is in understanding the issues
and the strategies employed to meet them.
Any particular function is likely to be reasonably straightforward.
.PP
Thus, this part of the paper will concentrate a good deal on the broader
aspects of strategy in the code generator,
and will not get too intimate with the details.
.SH
Overview.
.PP
It is difficult to organize a code generator to be flexible enough to
generate code for a large number of machines,
and still be efficient for any one of them.
Flexibility is also important when it comes time to tune
the code generator to improve the output code quality.
On the other hand, too much flexibility can lead to semantically
incorrect code, and potentially a combinatorial explosion in the
number of cases to be considered in the compiler.
.PP
One goal of the code generator is to have a high degree of correctness.
It is very desirable to have the compiler detect its own inability to
generate correct code, rather than to produce incorrect code.
This goal is achieved by having a simple model of the job
to be done (e.g., an expression tree)
and a simple model of the machine state
(e.g., which registers are free).
The act of generating an instruction performs a transformation
on the tree and the machine state;
hopefully, the tree eventually gets
reduced to a single node.
If each of these instruction/transformation pairs is correct,
and if the machine state model really represents the actual machine,
and if the transformations reduce the input tree to the desired single node, then the
output code will be correct.
.PP
For most real machines, there is no definitive theory of code generation that
encompasses all the C operators.
Thus the selection of which instruction/transformations to generate, and in what
order, will have a heuristic flavor.
If, for some expression tree, no transformation applies, or, more
seriously, if the heuristics select a sequence of instruction/transformations
that do not in fact reduce the tree, the compiler
will report its inability to generate code, and abort.
.PP
A major part of the code generator is concerned with the model and the transformations
\(em most of this is machine independent, or depends only on simple tables.
The flexibility comes from the heuristics that guide the transformations
of the trees, the selection of subgoals, and the ordering of the computation.
.SH
The Machine Model
.PP
The machine is assumed to have a number of registers, of at most two different
types:
.I A
and
.I B .
Within each register class, there may be scratch (temporary) registers and dedicated registers
(e.g., register variables, the stack pointer, etc.).
Requests to allocate and free registers involve only the temporary registers.
.PP
Each of the registers in the machine is given a name and a number
in the
.I mac2defs
file; the numbers are used as indices into various tables
that describe the registers, so they should be kept small.
One such table is the
.I rstatus
table on file
.I local2.c .
This table is indexed by register number, and
contains expressions made up from manifest constants describing the register types:
SAREG for dedicated AREG's, SAREG\(orSTAREG for scratch AREG's,
and SBREG and SBREG\(orSTBREG similarly for BREG's.
There are macros that access this information:
.I isbreg(r)
returns true if register number
.I r
is a BREG, and
.I istreg(r)
returns true if register number
.I r
is a temporary AREG or BREG.
Another table,
.I rnames ,
contains the register names; this is used when
putting out assembler code and diagnostics.
.PP
The usage of registers is kept track of by an array called
.I busy .
.I Busy[r]
is the number of uses of register
.I r
in the current tree being processed.
The allocation and freeing of registers will be discussed later as part of the code generation
algorithm.
.SH
General Organization
.PP
As mentioned above, the second pass reads lines from
the intermediate file, copying through to the output unchanged
any lines that begin with a `)', and making note of the
information about stack usage and register allocation contained on
lines beginning with `]' and `['.
The expression trees, whose beginning is indicated by a line beginning with `.',
are read and rebuilt into trees.
If the compiler is loaded as one pass, the expression trees are
immediately available to the code generator.
.PP
The actual code generation is done by a hierarchy of routines.
The routine
.I delay
is first given the tree; it attempts to delay some postfix
++ and \-\- computations that might reasonably be done after the
smoke clears.
It also attempts to handle comma (,) operators by computing the
left side expression first, and then rewriting the tree
to eliminate the operator.
.I Delay
calls
.I codgen
to control the actual code generation process.
.I Codgen
takes as arguments a pointer to the expression tree,
and a second argument that, for socio-historical reasons, is called a
.I cookie .
The cookie describes a set of goals that would be acceptable
for the code generation: these are assigned to individual bits,
so they may be logically or'ed together to form a large number of possible goals.
Among the possible goals are FOREFF (compute for side effects only;
don't worry about the value), INTEMP (compute and store value into a temporary location
in memory), INAREG (compute into an A register), INTAREG (compute into a scratch
A register), INBREG and INTBREG similarly, FORCC (compute for condition codes),
and FORARG (compute it as a function argument; e.g., stack it if appropriate).
.PP
.I Codgen
first canonicalizes the tree by calling
.I canon .
This routine looks for certain transformations that might now be
applicable to the tree.
One, which is very common and very powerful, is to
fold together an indirection operator (UNARY MUL)
and a register (REG); in most machines, this combination is
addressable directly, and so is similar to a NAME in its
behavior.
The UNARY MUL and REG are folded together to make
another node type called OREG.
In fact, in many machines it is possible to directly address not just the
cell pointed to by a register, but also cells differing by a constant
offset from the cell pointed to by the register.
.I Canon
also looks for such cases, calling the
machine dependent routine
.I notoff
to decide if the offset is acceptable (for example, in the IBM 370 the offset
must be between 0 and 4095 bytes).
Another optimization is to replace bit field operations
by shifts and masks if the operation involves extracting the field.
Finally, a machine dependent routine,
.I sucomp ,
is called that computes the Sethi-Ullman numbers
for the tree (see below).
.PP
After the tree is canonicalized,
.I codgen
calls the routine
.I store
whose job is to select a subtree of the tree to be computed
and (usually) stored before beginning the
computation of the full tree.
.I Store
must return a tree that can be computed without need
for any temporary storage locations.
In effect, the only store operations generated while processing the subtree must be as a response
to explicit assignment operators in the tree.
This division of the job marks one of the more significant, and successful,
departures from most other compilers.
It means that the code generator can operate
under the assumption that there are enough
registers to do its job, without worrying about
temporary storage.
If a store into a temporary appears in the output, it is always
as a direct result of logic in the
.I store
routine; this makes debugging easier.
.PP
One consequence of this organization is that
code is not generated by a treewalk.
There are theoretical results that support this decision.
.[
Aho Johnson Optimal Code Trees jacm
.]
It may be desirable to compute several subtrees and store
them before tackling the whole tree;
if a subtree is to be stored, this is known before the code
generation for the subtree is begun, and the subtree is computed
when all scratch registers are available.
.PP
The
.I store
routine decides what subtrees, if any, should be
stored by making use of numbers,
called
.I "Sethi-Ullman numbers" ,
that give, for each subtree of an expression tree,
the minimum number of scratch registers required to
compile the subtree, without any stores into temporaries.
.[
Sethi Ullman jacm 1970
.]
These numbers are computed by the machine-dependent
routine
.I sucomp ,
called by
.I canon .
The basic notion is that, knowing the Sethi-Ullman numbers for
the descendants of a node, and knowing the operator of the
node and some information about the machine, the
Sethi-Ullman number of the node itself can be computed.
If the Sethi-Ullman number for a tree exceeds the number of scratch
registers available, some subtree must be stored.
Unfortunately, the theory behind the Sethi-Ullman numbers
applies only to uselessly simple machines and operators.
For the rich set of C operators, and for machines with
asymmetric registers, register pairs, different kinds of registers,
and exceptional forms of addressing,
the theory cannot be applied directly.
The basic idea of estimation is a good one, however,
and well worth applying; the application, especially
when the compiler comes to be tuned for high code
quality, goes beyond the park of theory into the
swamp of heuristics.
This topic will be taken up again later, when more of the compiler
structure has been described.
.PP
After examining the Sethi-Ullman numbers,
.I store
selects a subtree, if any, to be stored, and returns the subtree and the associated cookie in
the external variables
.I stotree
and
.I stocook .
If a subtree has been selected, or if
the whole tree is ready to be processed, the
routine
.I order
is called, with a tree and cookie.
.I Order
generates code for trees that
do not require temporary locations.
.I Order
may make recursive calls on itself, and,
in some cases, on
.I codgen ;
for example, when processing the operators &&, \(or\(or, and comma (`,'), that have
a left to right evaluation, it is
incorrect for
.I store
to examine the right operand for subtrees to be stored.
In these cases,
.I order
will call
.I codgen
recursively when it is permissible to work on the right operand.
A similar issue arises with the ? : operator.
.PP
The
.I order
routine works by matching the current tree with
a set of code templates.
If a template is discovered that will
match the current tree and cookie, the associated assembly language
statement or statements are generated.
The tree is then rewritten,
as specified by the template, to represent the effect of the output instruction(s).
If no template match is found, first an attempt is made to find a match with a
different cookie; for example, in order to compute
an expression with cookie INTEMP (store into a temporary storage location),
it is usually necessary to compute the expression into a scratch register
first.
If all attempts to match the tree fail, the heuristic part of
the algorithm becomes dominant.
Control is typically given to one of a number of machine-dependent routines
that may in turn recursively call
.I order
to achieve a subgoal of the computation (for example, one of the
arguments may be computed into a temporary register).
After this subgoal has been achieved, the process begins again with the
modified tree.
If the machine-dependent heuristics are unable to reduce the tree further,
a number of default rewriting rules may be considered appropriate.
For example, if the left operand of a + is a scratch
register, the + can be replaced by a += operator;
the tree may then match a template.
.PP
To close this introduction, we will discuss the steps in compiling
code for the expression
.DS
\fIa\fR += \fIb\fR
.DE
where
.I a
and
.I b
are static variables.
.PP
To begin with, the whole expression tree is examined with cookie FOREFF, and
no match is found. Search with other cookies is equally fruitless, so an
attempt at rewriting is made.
Suppose we are dealing with the Interdata 8/32 for the moment.
It is recognized that the left hand and right hand sides of the += operator
are addressable, and in particular the left hand side has no
side effects, so it is permissible to rewrite this as
.DS
\fIa\fR = \fIa\fR + \fIb\fR
.DE
and this is done.
No match is found on this tree either, so a machine dependent rewrite is done; it is recognized
that the left hand side of the assignment is addressable, but the right hand side is not
in a register, so
.I order
is called recursively, being asked to put the right
hand side of the assignment into a register.
This invocation of
.I order
searches the tree for a match, and fails.
The machine dependent rule for +
notices that the right hand operand is addressable;
it decides to put the left operand into a scratch register.
Another recursive call to
.I order
is made, with the tree
consisting solely of the leaf
.I a ,
and the cookie asking that the value be placed into a scratch register.
This now matches a template, and a load instruction is emitted.
The node consisting of
.I a
is rewritten in place to represent the register into which
.I a
is loaded,
and this third call to
.I order
returns.
The second call to
.I order
now finds that it has the tree
.DS
\fBreg\fR + \fIb\fR
.DE
to consider.
Once again, there is no match, but the default rewriting rule rewrites
the + as a += operator, since the left operand is a scratch register.
When this is done, there is a match: in fact,
.DS
\fBreg\fR += \fIb\fR
.DE
simply describes the effect of the add instruction
on a typical machine.
After the add is emitted, the tree is rewritten
to consist merely of the register node, since the result of the add
is now in the register.
This agrees with the cookie passed to the second invocation of
.I order ,
so this invocation
terminates, returning to the first level.
The original tree has now
become
.DS
\fIa\fR = \fBreg\fR
.DE
which matches a template for the store instruction.
The store is output, and the tree rewritten to become
just a single register node.
At this point, since the top level call to
.I order
was
interested only in side effects, the call to
.I order
returns, and the code generation is completed;
we have generated a load, add, and store, as might have been expected.
.PP
The effect of machine architecture on this is considerable.
For example, on the Honeywell 6000, the machine dependent heuristics recognize that there is an ``add to storage''
instruction, so the strategy is quite different;
.I b
is loaded into
a register, and then an add to storage instruction generated
to add this register into
.I a .
The transformations, involving as they do the semantics of C,
are largely machine independent.
The decisions as to when to use them, however, are
almost totally machine dependent.
400 | .PP | |
401 | Having given a broad outline of | |
402 | the code generation process, we shall next consider the | |
403 | heart of it: the templates. | |
404 | This leads naturally into discussions of template matching and register allocation, | |
405 | and finally a discussion of the machine dependent interfaces and strategies. | |
406 | .SH | |
407 | The Templates | |
408 | .PP | |
409 | The templates describe the effect of the target machine instructions | |
410 | on the model of computation around which the compiler is organized. | |
411 | In effect, each template has five logical sections, and represents an assertion | |
412 | of the form: | |
413 | .IP | |
414 | .B If | |
415 | we have a subtree of a given shape (1), and we have a goal (cookie) or goals to | |
416 | achieve (2), and we have sufficient free resources (3), | |
417 | .B then | |
418 | we may emit an instruction or instructions (4), and | |
419 | rewrite the subtree in a particular manner (5), | |
420 | and the rewritten tree will achieve the desired goals. | |
421 | .PP | |
422 | These five sections will be discussed in more | |
423 | detail later. First, we give an example of a | |
424 | template: | |
425 | .DS | |
426 | .ta 1i 2i 3i 4i 5i | |
427 | ASG PLUS, INAREG, | |
428 | SAREG, TINT, | |
429 | SNAME, TINT, | |
430 | 0, RLEFT, | |
431 | " add AL,AR\en", | |
432 | .DE | |
433 | The top line specifies the operator (+=) and the cookie (compute the | |
434 | value of the subtree into an AREG). | |
435 | The second and third lines specify the left and right descendants, | |
436 | respectively, | |
437 | of the += operator. | |
438 | The left descendant must be a REG node, representing an | |
439 | A register, and have integer type, while the right side must be a NAME node, | |
440 | and also have integer type. | |
441 | The fourth line contains the resource requirements (no scratch registers | |
442 | or temporaries needed), and the rewriting rule (replace the subtree by the left descendant). | |
443 | Finally, the quoted string on the last line represents the output to the assembler: | |
444 | lower case letters, tabs, spaces, etc. are copied | |
445 | .I verbatim . | |
446 | to the output; upper case letters trigger various macro-like expansions. | |
447 | Thus, | |
448 | .B AL | |
449 | would expand into the \fBA\fRddress form of the \fBL\fReft operand \(em | |
450 | presumably the register number. | |
451 | Similarly, | |
452 | .B AR | |
453 | would expand into the name of the right operand. | |
454 | The | |
455 | .I add | |
456 | instruction of the last section might well be | |
457 | emitted by this template. | |
.PP
In principle, it would be possible to make separate templates
for all legal combinations of operators, cookies, types, and shapes.
In practice, the number of combinations is very large.
Thus, a considerable amount of mechanism is present to
permit a large number of subtrees to be matched
by a single template.
Most of the shape and type specifiers are individual bits, and can
be logically
or'ed
together.
There are a number of special descriptors for matching classes of
operators.
The cookies can also be combined.
As an example of the kind of template
that really arises in practice, the
actual template for the Interdata 8/32
that subsumes the above example is:
.DS
.ta 1i 2i 3i 4i 5i
ASG OPSIMP,	INAREG\(orFORCC,
	SAREG,	TINT\(orTUNSIGNED\(orTPOINT,
	SAREG\(orSNAME\(orSOREG\(orSCON,	TINT\(orTUNSIGNED\(orTPOINT,
		0,	RLEFT\(orRESCC,
		"	OI	AL,AR\en",
.DE
Here, OPSIMP represents the operators
+, \-, \(or, &, and ^.
The
.B OI
macro in the output string expands into the
appropriate \fBI\fRnteger \fBO\fRpcode for the operator.
The left and right sides can be integers, unsigned, or pointer types.
The right side can be, in addition to a name, a register,
a memory location whose address is given by a register and displacement (OREG),
or a constant.
Finally, these instructions set the condition codes,
and so can be used in condition contexts:
the cookie and rewriting rules reflect this.
.SH
The Template Matching Algorithm.
.PP
The heart of the second pass is the template matching
algorithm, in the routine
.I match .
.I Match
is called with a tree and a cookie; it attempts to match
the given tree against some template that will transform it
according to one of the goals given in the cookie.
If a match is successful, the transformation is
applied;
.I expand
is called to generate the assembly code, and then
.I reclaim
rewrites the tree, and reclaims the resources, such
as registers, that might have become free as a result
of the generated code.
.PP
This part of the compiler is among the most time critical.
There is a spectrum of implementation techniques available
for doing this matching.
The most naive algorithm simply looks at the templates one by one.
This can be considerably improved upon by restricting the search
for an acceptable template.
It would be possible to do better than this if the templates were given
to a separate program that ate them and generated a template
matching subroutine.
This would make maintenance of the compiler much more
complicated, however, so this has not been done.
.PP
The matching algorithm is actually carried out by restricting
the range in the table that must be searched for each opcode.
This introduces a number of complications, however, and needs a
bit of sympathetic help by the person constructing the
compiler in order to obtain best results.
The exact tuning of this algorithm continues; it
is best to consult the code and comments in
.I match
for the latest version.
.PP
In order to match a template to a tree,
it is necessary to match not only the cookie and the
op of the root, but also the types and shapes of the
left and right descendants (if any) of the tree.
A convention is established here that is carried out throughout
the second pass of the compiler.
If a node represents a unary operator, the single descendant
is always the ``left'' descendant.
If a node represents a unary operator or a leaf node (no descendants),
the ``right'' descendant is taken by convention to be the node itself.
This enables templates to easily match leaves and conversion operators, for example,
without any additional mechanism in the matching program.
.PP
The type matching is straightforward; it is possible to specify any combination
of basic types, general pointers, and pointers to one or more of
the basic types.
The shape matching is somewhat more complicated, but still pretty simple.
Templates have a collection of possible operand shapes
on which the opcode might match.
In the simplest case, an
.I add
operation might be able to add to either a register variable
or a scratch register, and might be able (with appropriate
help from the assembler) to add an integer constant (ICON), a static
memory cell (NAME), or a stack location (OREG).
.PP
It is usually attractive to specify a number of such shapes,
and distinguish between them when the assembler output is produced.
It is possible to describe the union of many elementary
shapes such as ICON, NAME, OREG,
AREG or BREG
(both scratch and register forms), etc.
To handle at least the simple forms of indirection, one can also
match some more complicated forms of trees; STARNM and STARREG
can match more complicated trees headed by an indirection operator,
and SFLD can match certain trees headed by a FLD operator: these
patterns call machine dependent routines that match the
patterns of interest on a given machine.
The shape SWADD may be used to recognize NAME or OREG
nodes that lie on word boundaries: this may be of some importance
on word\-addressed machines.
Finally, there are some special shapes: these may not
be used in conjunction with the other shapes, but may be
defined and extended in machine dependent ways.
The special shapes SZERO, SONE, and SMONE are predefined and match
constants 0, 1, and \-1, respectively; others are easy to add
and match by using the machine dependent routine
.I special .
.PP
When a template has been found that matches the root of the tree,
the cookie, and the shapes and types of the descendants,
there is still one bar to a total match: the template may
call for some resources (for example, a scratch register).
The routine
.I allo
is called, and it attempts to allocate the resources.
If it cannot, the match fails; no resources are
allocated.
If successful, the allocated resources are given numbers
1, 2, etc. for later reference when the assembly code is
generated.
The routines
.I expand
and
.I reclaim
are then called.
The
.I match
routine then returns a special value, MDONE.
If no match was found, the value MNOPE is returned;
this is a signal to the caller to try more cookie
values, or attempt a rewriting rule.
.I Match
is also used to select rewriting rules, although
the way of doing this is pretty straightforward.
A special cookie, FORREW, is used to ask
.I match
to search for a rewriting rule.
The rewriting rules are keyed to various opcodes; most
are carried out in
.I order .
Since the question of when to rewrite is one of the key issues in
code generation, it will be taken up again later.
.SH
Register Allocation.
.PP
The register allocation routines, and the allocation strategy,
play a central role in the correctness of the code generation algorithm.
If there are bugs in the Sethi-Ullman computation that cause the
number of needed registers to be underestimated,
the compiler may run out of scratch registers;
it is essential that the allocator keep track of those registers that
are free and busy, in order to detect such conditions.
.PP
Allocation of registers takes place as the result of a template
match; the routine
.I allo
is called with a word describing the number of A registers,
B registers, and temporary locations needed.
The allocation of temporary locations on the stack is relatively
straightforward, and will not be further covered; the
bookkeeping is a bit tricky, but conceptually trivial, and requests
for temporary space on the stack will never fail.
.PP
Register allocation is less straightforward.
The two major complications are
.I pairing
and
.I sharing .
In many machines, some operations (such as multiplication
and division), and/or some types (such as longs or double precision)
require even/odd pairs of registers.
Operations of the first type are exceptionally difficult to
deal with in the compiler; in fact, their theoretical
properties are rather bad as well.
.[
Aho Johnson Ullman Multiregister
.]
The second issue is dealt with rather more successfully;
a machine dependent function called
.I szty(t)
is called that returns 1 or 2, depending on the
number of A registers required to hold an object of type
.I t .
If
.I szty
returns 2, an even/odd pair of A registers is allocated
for each request.
.PP
The other issue, sharing, is more subtle, but
important for good code quality.
When registers are allocated, it
is possible to reuse registers that hold address
information, and use them to contain the values
computed or accessed.
For example, on the IBM 360, if register 2 has
a pointer to an integer in it, we may load the
integer into register 2 itself by saying:
.DS
L	2,0(2)
.DE
If register 2 had a byte pointer, however, the sequence for
loading a character involves clearing the target
register first, and then inserting the desired character:
.DS
SR	3,3
IC	3,0(2)
.DE
In the first case, if register 3 were used as the target,
it would lead to a larger number of registers
used for the expression than were required; the compiler would
generate inefficient code.
On the other hand, if register 2 were used as the target in the second
case, the code would simply be wrong.
In the first case, register 2 can be
.I shared
while in the second, it cannot.
695 | .PP | |
696 | In the specification of the register needs in the templates, | |
697 | it is possible to indicate whether required scratch registers | |
698 | may be shared with registers used in the left or right subtree of the input tree.
699 | In order that a register be shared, it must be scratch, and it must | |
700 | be used only once, on the appropriate side of the tree being compiled. | |
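The three conditions for sharing can be stated as a predicate. The struct and the name `may_share` are hypothetical summaries for illustration; the rules themselves (scratch, used exactly once, on the permitted side) are the ones given above.

```c
#include <assert.h>

enum side { LEFT, RIGHT };

/* a hypothetical summary of how one register is used within
 * the subtree being compiled */
struct reguse {
    int is_scratch;        /* nonzero if a scratch register */
    int uses;              /* number of appearances in the tree */
    enum side where;       /* subtree in which it appears */
};

/* may this busy register double as the result register?  It must be
 * scratch, used exactly once, and that one use must lie on the side
 * the template permits sharing with. */
int may_share(const struct reguse *u, enum side permitted)
{
    return u->is_scratch && u->uses == 1 && u->where == permitted;
}
```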
701 | .PP | |
702 | The | |
703 | .I allo | |
704 | routine thus has a bit more to do than meets the eye; | |
705 | it calls | |
706 | .I freereg | |
707 | to obtain a free register for each A and B register request. | |
708 | .I Freereg | |
709 | makes multiple calls on the routine | |
710 | .I usable | |
711 | to decide if a given register can be used to satisfy | |
712 | a given need. | |
713 | .I Usable | |
714 | calls | |
715 | .I shareit | |
716 | if the register is busy, but might be shared. | |
717 | Finally, | |
718 | .I shareit | |
719 | calls | |
720 | .I ushare | |
721 | to decide if the desired register is actually in the appropriate | |
722 | subtree, and can be shared. | |
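The shape of this call chain can be sketched as follows. This is a skeleton under stated assumptions, not the compiler's code: the register count, the bitmask, and the stubbed-out sharing test are all invented for illustration; only the division of labor between *freereg*, *usable*, and *shareit* follows the text.

```c
#include <assert.h>

#define NAREG 8
static unsigned busy;              /* bit i set => register i busy */

/* stub: in the real compiler, shareit consults ushare to see whether
 * the busy register lives in the appropriate subtree; here we simply
 * refuse to share */
static int shareit(int reg) { (void)reg; return 0; }

static int usable(int reg)
{
    if (!(busy & (1u << reg)))
        return 1;                  /* free registers are always usable */
    return shareit(reg);           /* busy ones only if shareable */
}

/* skeleton of freereg: try each register in turn, asking usable
 * whether it can satisfy the request */
int freereg(void)
{
    for (int reg = 0; reg < NAREG; ++reg)
        if (usable(reg))
            return reg;
    return -1;                     /* no register satisfies the need */
}
```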
723 | .PP | |
724 | To complicate matters further, on some machines (such as the IBM 370) it
725 | is possible to have ``double indexing'' forms of | |
726 | addressing; these are represented by OREGs
727 | with the base and index registers encoded into the register field. | |
728 | While the register allocation and deallocation | |
729 | .I "per se" | |
730 | is not made more difficult by this phenomenon, the code itself | |
731 | is somewhat more complex. | |
732 | .PP | |
733 | Having allocated the registers and expanded the assembly language, | |
734 | it is time to reclaim the resources; the routine | |
735 | .I reclaim | |
736 | does this. | |
737 | Many operations produce more than one result. | |
738 | For example, many arithmetic operations may produce | |
739 | a value in a register, and also set the condition | |
740 | codes. | |
741 | Assignment operations may leave results both in a register and in memory. | |
742 | .I Reclaim | |
743 | is passed three parameters: the tree and cookie
744 | that were matched, and the rewriting field of the template. | |
745 | The rewriting field allows the specification of possible results; | |
746 | the tree is rewritten to reflect the results of the operation. | |
747 | If the tree was computed for side effects only (FOREFF), | |
748 | the tree is freed, and all resources in it reclaimed. | |
749 | If the tree was computed for condition codes, the resources | |
750 | are also freed, and the tree replaced by a special | |
751 | node type, FORCC. | |
752 | Otherwise, the value may be found in the left | |
753 | argument of the root, the right argument of the root, | |
754 | or one of the temporary resources allocated. | |
755 | In these cases, first the resources of the tree, | |
756 | and the newly allocated resources, | |
757 | are | |
758 | freed; then the resources needed by the result | |
759 | are made busy again. | |
760 | The final result must always match the shape of the input cookie; | |
761 | otherwise, the compiler error | |
762 | ``cannot reclaim'' | |
763 | is generated. | |
764 | There are some machine dependent ways of | |
765 | preferring results in registers or memory when | |
766 | there are multiple results matching multiple goals in the cookie. | |
767 | .SH | |
768 | The Machine Dependent Interface | |
769 | .PP | |
770 | The files | |
771 | .I order.c , | |
772 | .I local2.c , | |
773 | and | |
774 | .I table.c , | |
775 | as well as the header file | |
776 | .I mac2defs , | |
777 | represent the machine dependent portion of the second pass. | |
778 | The machine dependent portion can be roughly divided into | |
779 | two parts: the easy portion and the hard portion.
780 | The easy portion | |
781 | tells the compiler the names of the registers, and arranges that | |
782 | the compiler generate the proper assembler formats, opcode names, location counters, etc. | |
783 | The hard portion involves the Sethi\-Ullman computation, the | |
784 | rewriting rules, and, to some extent, the templates. | |
785 | It is hard because there are no real algorithms that apply; | |
786 | most of this portion is based on heuristics. | |
787 | This section discusses the easy portion; the next several | |
788 | sections will discuss the hard portion. | |
789 | .PP | |
790 | If the compiler is adapted from a compiler for a machine | |
791 | of similar architecture, the easy part is indeed easy. | |
792 | In | |
793 | .I mac2defs , | |
794 | the register numbers are defined, as well as various parameters for | |
795 | the stack frame, and various macros that describe the machine architecture. | |
796 | If double indexing is to be permitted, for example, the symbol | |
797 | R2REGS is defined. | |
798 | Also, a number of macros that are involved in function call processing, | |
799 | especially for unusual function call mechanisms, are defined here. | |
800 | .PP | |
801 | In | |
802 | .I local2.c , | |
803 | a large number of simple functions are defined. | |
804 | These do things such as write out opcodes, register names, | |
805 | and address forms for the assembler. | |
806 | Part of the function call code is defined here; that is nontrivial | |
807 | to design, but typically rather straightforward to implement. | |
808 | Among the easy routines in | |
809 | .I order.c | |
810 | are routines for generating a created label, | |
811 | defining a label, and generating the arguments | |
812 | of a function call. | |
813 | .PP | |
814 | These routines tend to have a local effect, and depend in a fairly straightforward way
815 | on the target assembler and the design decisions already made about | |
816 | the compiler. | |
817 | Thus they will not be further treated here. | |
818 | .SH | |
819 | The Rewriting Rules | |
820 | .PP | |
821 | When a tree fails to match any template, it becomes | |
822 | a candidate for rewriting. | |
823 | Before the tree is rewritten, | |
824 | the machine dependent routine | |
825 | .I nextcook | |
826 | is called with the tree and the cookie; it suggests | |
827 | another cookie that might be a better candidate for the | |
828 | matching of the tree. | |
829 | If all else fails, the templates are searched with the cookie | |
830 | FORREW, to look for a rewriting rule. | |
831 | The rewriting rules are of two kinds; | |
832 | for most of the common operators, there are | |
833 | machine dependent rewriting rules that may be applied; | |
834 | these are handled by machine dependent functions | |
835 | that are called and given the tree to be computed. | |
836 | These routines may recursively call | |
837 | .I order | |
838 | or | |
839 | .I codgen | |
840 | to cause certain subgoals to be achieved; | |
841 | if they actually call for some alteration of the tree, | |
842 | they return 1, and the | |
843 | code generation algorithm recanonicalizes and tries again. | |
844 | If these routines choose not to deal with the tree, the | |
845 | default rewriting rules are applied. | |
846 | .PP | |
847 | The assignment ops, when rewritten, call the routine | |
848 | .I setasg . | |
849 | This is assumed to rewrite the tree at least to the point where there are | |
850 | no side effects in the left hand side. | |
851 | If there is still no template match, | |
852 | a default rewriting is done that causes | |
853 | an expression such as | |
854 | .DS | |
855 | .I "a += b" | |
856 | .DE | |
857 | to be rewritten as | |
858 | .DS | |
859 | .I "a = a + b" | |
860 | .DE | |
861 | This is a useful default for certain mixtures of strange types | |
862 | (for example, when | |
863 | .I a | |
864 | is a bit field and | |
865 | .I b | |
866 | a character) that
867 | otherwise might need separate table entries. | |
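The default rewriting can be shown on a toy expression tree. Everything here is illustrative: the node layout, the operator names, and `setasg_default` are inventions; the real *setasg* must first guarantee that the left hand side is free of side effects, since the rewrite reuses it on both sides of the assignment.

```c
#include <assert.h>
#include <stdlib.h>

/* a toy expression tree, just enough to show the rewrite */
enum op { NAME, PLUS, ASG_PLUS, ASSIGN };

struct node {
    enum op op;
    struct node *left, *right;
};

struct node *mknode(enum op op, struct node *l, struct node *r)
{
    struct node *p = malloc(sizeof *p);
    p->op = op; p->left = l; p->right = r;
    return p;
}

/* rewrite "a += b" into "a = a + b"; the left subtree appears on
 * both sides, which is safe only once it has no side effects */
struct node *setasg_default(struct node *t)
{
    struct node *sum = mknode(PLUS, t->left, t->right);
    return mknode(ASSIGN, t->left, sum);
}
```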
868 | .PP | |
869 | Simple assignment, structure assignment, and all forms of calls | |
870 | are handled completely by the machine dependent routines. | |
871 | For historical reasons, the routines generating the calls return | |
872 | 1 on failure, 0 on success, unlike the other routines. | |
873 | .PP | |
874 | The machine dependent routine | |
875 | .I setbin | |
876 | handles binary operators; it too must do most of the job. | |
877 | In particular, when it returns 0, it must do so with | |
878 | the left hand side in a temporary register. | |
879 | The default rewriting rule in this case is to convert the | |
880 | binary operator into the associated assignment operator; | |
881 | since the left hand side is assumed to be a temporary register, | |
882 | this preserves the semantics and often allows a considerable | |
883 | saving in the template table. | |
884 | .PP | |
885 | The increment and decrement operators may be dealt with by
886 | the machine dependent routine | |
887 | .I setincr . | |
888 | If this routine chooses not to deal with the tree, the rewriting rule replaces | |
889 | .DS | |
890 | .I "x ++" | |
891 | .DE | |
892 | by | |
893 | .DS | |
894 | .I "( (x += 1) \- 1)" | |
895 | .DE | |
896 | which preserves the semantics. | |
897 | Once again, this is not too attractive for the most common | |
898 | cases, but can generate close to optimal code when the | |
899 | type of x is unusual. | |
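That the rewrite preserves the semantics is easy to check directly: the rewritten form yields the old value of x while leaving x incremented, exactly as the postfix operator does. The helper below is purely a demonstration.

```c
#include <assert.h>

/* the rewriting rule replaces "x ++" by "((x += 1) - 1)"; the
 * expression evaluates to the old value of x, and x ends up
 * incremented, matching the postfix semantics */
int postinc_rewritten(int *x)
{
    return ((*x += 1) - 1);
}
```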
900 | .PP | |
901 | Finally, the indirection (UNARY MUL) operator is also handled | |
902 | in a special way. | |
903 | The machine dependent routine | |
904 | .I offstar | |
905 | is extremely important for the efficient generation of code. | |
906 | .I Offstar | |
907 | is called with a tree that is the direct descendant of a UNARY MUL node; | |
908 | its job is to transform this tree so that the combination of | |
909 | UNARY MUL with the transformed tree becomes addressable. | |
910 | On most machines, | |
911 | .I offstar | |
912 | can simply compute the tree into an A or B register, | |
913 | depending on the architecture, and then | |
914 | .I canon | |
915 | will make the resulting tree into an OREG. | |
916 | On many machines, | |
917 | .I offstar | |
918 | can profitably choose to do less work than computing | |
919 | its entire argument into a register. | |
920 | For example, if the target machine supports OREGS | |
921 | with a constant offset from a register, and | |
922 | .I offstar | |
923 | is called | |
924 | with a tree of the form | |
925 | .DS | |
926 | .I "expr + const" | |
927 | .DE | |
928 | where | |
929 | .I const | |
930 | is a constant, then | |
931 | .I offstar | |
932 | need only compute | |
933 | .I expr | |
934 | into the appropriate form of register. | |
935 | On machines that support double indexing, | |
936 | .I offstar | |
937 | may have even more choice as to how to proceed. | |
938 | The proper tuning of | |
939 | .I offstar , | |
940 | which is not typically too difficult, should be one of the | |
941 | first tries at optimization attempted by the | |
942 | compiler writer. | |
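The constant-offset case can be sketched on a toy address tree. The node layout and the name `peel_offset` are assumptions made for illustration; "computing expr into a register" is stubbed out by carrying a register number in the leaf, where the real *offstar* would generate code.

```c
#include <assert.h>

/* a toy address tree, just enough to show the "expr + const" case */
enum op { REG, ICON, PLUS };
struct node { enum op op; long val; struct node *left, *right; };

/* illustrative offstar-like step: strip constant addends off the
 * address expression, accumulating them into the OREG offset, so
 * that only the remaining expr need be computed into a register */
long peel_offset(struct node *t, int *basereg)
{
    long off = 0;
    while (t->op == PLUS && t->right->op == ICON) {
        off += t->right->val;      /* fold the constant into the offset */
        t = t->left;
    }
    *basereg = (int)t->val;        /* pretend expr lands in this register */
    return off;
}
```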
943 | .SH | |
944 | The Sethi-Ullman Computation | |
945 | .PP | |
946 | The heart of the heuristics is the computation of the Sethi-Ullman numbers. | |
947 | This computation is closely linked with the rewriting rules and the | |
948 | templates. | |
949 | As mentioned before, the Sethi-Ullman numbers are expected to | |
950 | estimate the number of scratch registers needed to compute | |
951 | the subtrees without using any stores. | |
952 | However, the original theory does not apply to real machines. | |
953 | For one thing, the theory assumes that all registers | |
954 | are interchangeable. | |
955 | Real machines have general purpose, floating point, and index registers, | |
956 | register pairs, etc. | |
957 | The theory also does not account for side effects; | |
958 | this rules out various forms of pathology that arise | |
959 | from assignment and assignment ops. | |
960 | Condition codes are also undreamed of. | |
961 | Finally, the influence of types, conversions, and the | |
962 | various addressability restrictions and extensions of real | |
963 | machines is also ignored.
964 | .PP | |
965 | Nevertheless, for a ``useless'' theory, | |
966 | the basic insight of Sethi and Ullman is amazingly | |
967 | useful in a real compiler. | |
968 | The notion that one should attempt to estimate the | |
969 | resource needs of trees before starting the | |
970 | code generation provides a natural means of splitting the | |
971 | code generation problem, and provides a bit of redundancy | |
972 | and self checking in the compiler. | |
973 | Moreover, if writing the | |
974 | Sethi-Ullman routines is hard, describing, writing, and debugging the | |
975 | alternative (routines that attempt to free up registers by stores into | |
976 | temporaries ``on the fly'') is even worse. | |
977 | Nevertheless, it should be clearly understood that these routines exist in a realm | |
978 | where there is no ``right'' way to write them; | |
979 | it is an art, the realm of heuristics, and, consequently, a major | |
980 | source of bugs in the compiler. | |
981 | Often, the early, crude versions of these routines give little trouble; | |
982 | only after | |
983 | the compiler is actually working | |
984 | and the | |
985 | code quality is being improved do serious problems have to be faced.
986 | Having a simple, regular machine architecture is worth quite | |
987 | a lot at this time. | |
988 | .PP | |
989 | The major problems arise from asymmetries in the registers: register pairs, | |
990 | different kinds of registers, and the related problem of
991 | needing more than one register (frequently a pair) to store certain | |
992 | data | |
993 | types (such as longs or doubles). | |
994 | There appears to be no general way of treating this problem; | |
995 | solutions have to be fudged for each machine where the problem arises. | |
996 | On the Honeywell 66, for example, there are only two general purpose registers, | |
997 | so a need for a pair is the same as the need for two registers. | |
998 | On the IBM 370, the register pair (0,1) is used to do multiplications and divisions; | |
999 | registers 0 and 1 are not generally considered part of the scratch registers, and so | |
1000 | do not require allocation explicitly. | |
1001 | On the Interdata 8/32, after much consideration, the | |
1002 | decision was made not to try to deal with the register pair issue; | |
1003 | operations such as multiplication and division that required pairs | |
1004 | were simply assumed to take all of the scratch registers. | |
1005 | Several weeks of effort had failed to produce | |
1006 | an algorithm that seemed to have much chance of running successfully | |
1007 | without inordinate debugging effort. | |
1008 | The difficulty of this issue should not be minimized; it represents one of the | |
1009 | main intellectual efforts in porting the compiler. | |
1010 | Nevertheless, this problem has been fudged with a degree of | |
1011 | success on nearly a dozen machines, so the compiler writer should | |
1012 | not abandon hope. | |
1013 | .PP | |
1014 | The Sethi-Ullman computations interact with the | |
1015 | rest of the compiler in a number of rather subtle ways. | |
1016 | As already discussed, the | |
1017 | .I store | |
1018 | routine uses the Sethi-Ullman numbers to decide which subtrees are too difficult | |
1019 | to compute in registers, and must be stored. | |
1020 | There are also subtle interactions between the | |
1021 | rewriting routines and the Sethi-Ullman numbers. | |
1022 | Suppose we have a tree such as | |
1023 | .DS | |
1024 | .I "A \- B" | |
1025 | .DE | |
1026 | where | |
1027 | .I A | |
1028 | and | |
1029 | .I B | |
1030 | are expressions; suppose further that | |
1031 | .I B | |
1032 | takes two registers, and | |
1033 | .I A | |
1034 | one. | |
1035 | It is possible to compute the full expression in two registers by | |
1036 | first computing | |
1037 | .I B , | |
1038 | and then, using the scratch register | |
1039 | used by | |
1040 | .I B , | |
1041 | but not containing the answer, compute | |
1042 | .I A . | |
1043 | The subtraction can then be done, computing the expression. | |
1044 | (Note that this assumes a number of things, not the least of which | |
1045 | are register-to-register subtraction operators and symmetric | |
1046 | registers.) | |
1047 | If the machine dependent routine | |
1048 | .I setbin , | |
1049 | however, is not prepared to recognize this case | |
1050 | and compute the more difficult side of the expression | |
1051 | first, the | |
1052 | Sethi-Ullman number must be set to three. | |
1053 | Thus, the | |
1054 | Sethi-Ullman number for a tree should represent the code that | |
1055 | the machine dependent routines are actually willing to generate. | |
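The A − B example can be reduced to a small formula. This is the classical Sethi-Ullman estimate for a binary node, with one flag standing in for whether *setbin* is willing to compute the more demanding side first; the function name and the flag are illustrative, not the compiler's.

```c
#include <assert.h>

static int max(int a, int b) { return a > b ? a : b; }

/* Sethi-Ullman estimate for a binary node.  If setbin will compute
 * the harder subtree first, the classical max rule applies; if the
 * left side is always computed first, its result must be held in a
 * register while the right side is evaluated, costing one more. */
int su_binary(int su_left, int su_right, int harder_side_first)
{
    if (su_left == su_right)
        return su_left + 1;            /* both sides equally hard */
    if (harder_side_first)
        return max(su_left, su_right);
    return max(su_left, su_right + 1); /* left held across the right */
}
```

With su(A) = 1 and su(B) = 2, the estimate is two registers if the harder side can go first, and three otherwise, matching the discussion above.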
1056 | .PP | |
1057 | The interaction can go the other way. | |
1058 | If we take an expression such as | |
1059 | .DS | |
1060 | .I "* ( p + i )" | |
1061 | .DE | |
1062 | where | |
1063 | .I p | |
1064 | is a pointer and | |
1065 | .I i | |
1066 | an integer, | |
1067 | this can probably be done in one register on most machines. | |
1068 | Thus, its Sethi-Ullman number would probably be set to one. | |
1069 | If double indexing is possible in the machine, a possible way | |
1070 | of computing the expression is to load both | |
1071 | .I p | |
1072 | and | |
1073 | .I i | |
1074 | into registers, and then use double indexing. | |
1075 | This would use two scratch registers; in such a case, | |
1076 | it is possible that the scratch registers might be unobtainable, | |
1077 | or might make some other part of the computation run out of | |
1078 | registers. | |
1079 | The usual solution is to cause | |
1080 | .I offstar | |
1081 | to ignore opportunities for double indexing that would tie up more scratch | |
1082 | registers than the Sethi-Ullman number had reserved. | |
1083 | .PP | |
1084 | In summary, the Sethi-Ullman computation represents much of the craftsmanship and artistry in any application | |
1085 | of the portable compiler. | |
1086 | It is also a frequent source of bugs. | |
1087 | Algorithms are available that will produce nearly optimal code | |
1088 | for specialized machines, but unfortunately most existing machines | |
1089 | are far removed from these ideals. | |
1090 | The best way of proceeding in practice is to start with a compiler | |
1091 | for a machine similar to the target, and proceed very
1092 | carefully. | |
1093 | .SH | |
1094 | Register Allocation | |
1095 | .PP | |
1096 | After the Sethi-Ullman numbers are computed, | |
1097 | .I order | |
1098 | calls a routine, | |
1099 | .I rallo , | |
1100 | that does register allocation, if appropriate. | |
1101 | This routine does relatively little, in general; | |
1102 | this is especially true if the target machine | |
1103 | is fairly regular. | |
1104 | There are a few cases where it is assumed that | |
1105 | the result of a computation takes place in a particular register; | |
1106 | switch and function return are the two major places. | |
1107 | The expression tree has a field, | |
1108 | .I rall , | |
1109 | that may be filled with a register number; this is taken | |
1110 | to be a preferred register, and the first temporary | |
1111 | register allocated by a template match will be this preferred one, if | |
1112 | it is free. | |
1113 | If not, no particular action is taken; this is just a heuristic. | |
1114 | If no register preference is present, the field contains NOPREF. | |
1115 | In some cases, the result must be placed in | |
1116 | a given register, no matter what. | |
1117 | The register number is placed in | |
1118 | .I rall , | |
1119 | and the mask MUSTDO is logically or'ed in with it. | |
1120 | In this case, if the subtree is requested in a register, and comes | |
1121 | back in a register other than the demanded one, it is moved | |
1122 | by calling the routine | |
1123 | .I rmove . | |
1124 | If the target register for this move is busy, it is a compiler | |
1125 | error. | |
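The treatment of the rall field can be condensed into one decision. The constants and the function below are a sketch under stated assumptions (the MUSTDO bit value and the busy bitmask are invented); the outcomes, preference ignored when unavailable, a forced move via rmove, or a compiler error when the target is busy, follow the text.

```c
#include <assert.h>

#define NOPREF  (-1)
#define MUSTDO  0x100          /* or'ed into rall when mandatory */

enum action { KEEP, MOVE, CERROR };

/* given the rall field, the register the subtree actually came back
 * in, and the busy mask, decide what to do with the result */
enum action place_result(int rall, int got, unsigned busy)
{
    if (rall == NOPREF || !(rall & MUSTDO))
        return KEEP;                   /* a preference is only a hint */
    int want = rall & ~MUSTDO;
    if (want == got)
        return KEEP;
    if (busy & (1u << want))
        return CERROR;                 /* target busy: compiler error */
    return MOVE;                       /* rmove(got, want) */
}
```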
1126 | .PP | |
1127 | Note that this mechanism is the only one that will ever cause a register-to-register | |
1128 | move between scratch registers (unless such a move is buried in the depths of | |
1129 | some template). | |
1130 | This simplifies debugging. | |
1131 | In some cases, there is a rather strange interaction between | |
1132 | the register allocation and the Sethi-Ullman number; | |
1133 | if there is an operator or situation requiring a | |
1134 | particular register, the allocator and the Sethi-Ullman | |
1135 | computation must conspire to ensure that the target | |
1136 | register is not being used by some intermediate result of some far-removed computation. | |
1137 | This is most easily done by making the special operation take | |
1138 | all of the free registers, preventing any other partially-computed | |
1139 | results from cluttering up the works. | |
1140 | .SH | |
1141 | Compiler Bugs | |
1142 | .PP | |
1143 | The portable compiler has an excellent record of generating correct code. | |
1144 | The requirement for reasonable cooperation between the register allocation, | |
1145 | Sethi-Ullman computation, rewriting rules, and templates builds quite a bit | |
1146 | of redundancy into the compiling process. | |
1147 | The effect of this is that, in a surprisingly short time, the compiler will | |
1148 | start generating correct code for those | |
1149 | programs that it can compile. | |
1150 | The hard part of the job then becomes finding and | |
1151 | eliminating those situations where the compiler refuses to | |
1152 | compile a program because it knows it cannot do it right. | |
1153 | For example, a template may simply be missing; this may either | |
1154 | give a compiler error of the form ``no match for op ...'', or cause
1155 | the compiler to go into an infinite loop applying various rewriting rules. | |
1156 | The compiler has a variable, | |
1157 | .I nrecur , | |
1158 | that is set to 0 at the beginning of an expression, and
1159 | incremented at key spots in the | |
1160 | compilation process; if this parameter gets too large, the | |
1161 | compiler decides that it is in a loop, and aborts. | |
1162 | Loops are also characteristic of botches in the machine-dependent rewriting rules. | |
1163 | Bad Sethi-Ullman computations usually cause the scratch registers | |
1164 | to run out; this often means | |
1165 | that the Sethi-Ullman number was underestimated, so | |
1166 | .I store | |
1167 | did not store something it should have; alternatively, | |
1168 | it can mean that the rewriting rules were not smart enough to | |
1169 | find the sequence that | |
1170 | .I sucomp | |
1171 | assumed would be used. | |
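The nrecur guard described above amounts to a simple counter check. The limit value and the function name here are assumptions for illustration; the mechanism, increment at key spots and abort once the count grows implausibly large, is the one in the text.

```c
#include <assert.h>

#define NRECUR 1000            /* hypothetical limit */

static int nrecur;             /* reset to 0 for each expression */

/* called at key spots in code generation; returns nonzero when the
 * compiler appears to be looping on the rewriting rules and should
 * abort with a compiler error rather than run forever */
int too_deep(void)
{
    return ++nrecur > NRECUR;
}
```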
1172 | .PP | |
1173 | The best approach when a compiler error is detected involves several stages. | |
1174 | First, try to get a small example program that steps on the bug. | |
1175 | Second, turn on various debugging flags in the code generator, and follow the | |
1176 | tree through the process of being matched and rewritten. | |
1177 | Some flags of interest are | |
1178 | \-e, which prints the expression tree, | |
1179 | \-r, which gives information about the allocation of registers, | |
1180 | \-a, which gives information about the performance of | |
1181 | .I rallo , | |
1182 | and \-o, which gives information about the behavior of | |
1183 | .I order . | |
1184 | This technique should allow most bugs to be found relatively quickly. | |
1185 | .PP | |
1186 | Unfortunately, finding the bug is usually not enough; it must also | |
1187 | be fixed! | |
1188 | The difficulty arises because a fix to the particular bug of interest tends | |
1189 | to break other code that already works. | |
1190 | Regression tests, tests that compare the performance of | |
1191 | a new compiler against the performance of an older one, are very | |
1192 | valuable in preventing major catastrophes. | |
1193 | .SH | |
1194 | Summary and Conclusion | |
1195 | .PP | |
1196 | The portable compiler has been a useful tool for providing C | |
1197 | capability on a large number of diverse machines, | |
1198 | and for testing a number of theoretical | |
1199 | constructs in a practical setting. | |
1200 | It has many blemishes, both in style and functionality. | |
1201 | It has been applied to many more machines than first | |
1202 | anticipated, of a much wider range than originally dreamed of. | |
1203 | Its use has also spread much faster than expected, leaving parts of | |
1204 | the compiler still somewhat raw in shape. | |
1205 | .PP | |
1206 | On the theoretical side, there is some hope that the | |
1207 | skeleton of the | |
1208 | .I sucomp | |
1209 | routine could be generated for many machines directly from the | |
1210 | templates; this would give a considerable boost | |
1211 | to the portability and correctness of the compiler, | |
1212 | but might affect tunability and code quality. | |
1213 | There is also room for more optimization, both within | |
1214 | .I optim | |
1215 | and in the form of a portable ``peephole'' optimizer. | |
1216 | .PP | |
1217 | On the practical, development side, | |
1218 | the compiler could probably be sped up and made smaller | |
1219 | without doing too much violence to its basic structure. | |
1220 | Parts of the compiler deserve to be rewritten; | |
1221 | the initialization code, register allocation, and | |
1222 | parser are prime candidates. | |
1223 | Doing some or all of the parsing
1224 | with a recursive descent parser might
1225 | save enough space and time to be worthwhile; | |
1226 | it would certainly ease the problem of moving the | |
1227 | compiler to an environment where | |
1228 | .I Yacc | |
1229 | is not already present. | |
1230 | .PP | |
1231 | Finally, I would like to thank the many people who have | |
1232 | sympathetically, and even enthusiastically, helped me grapple | |
1233 | with what has been a frustrating program to write, test, and install. | |
1234 | D. M. Ritchie and E. N. Pinson provided needed early | |
1235 | encouragement and philosophical guidance; | |
1236 | M. E. Lesk, | |
1237 | R. Muha, T. G. Peterson, | |
1238 | G. Riddle, L. Rosler, | |
1239 | R. W. Mitze, | |
1240 | B. R. Rowland, | |
1241 | S. I. Feldman, | |
1242 | and | |
1243 | T. B. London | |
1244 | have all contributed ideas, gripes, and all, at one time or another, | |
1245 | climbed ``into the pits'' with me to help debug. | |
1246 | Without their help this effort would have not been possible; | |
1247 | with it, it was often kind of fun. | |
1248 | .sp 100 | |
1249 | .LP | |
1250 | .[ | |
1251 | $LIST$ | |
1252 | .] | |
1253 | .LP | |
1254 | .sp 100 | |
1255 | .LP |