Research V7 development
[unix-history] / usr / doc / ctour / cdoc3
CommitLineData
1b5c1e54
DR
1.SH
2Code Generation
3.PP
4The grand plan for code-generation is
5independent of any particular machine;
6it depends largely on a set of tables.
7But this fact does not necessarily make it very easy
8to modify the compiler to produce code for other machines,
9both because there is a good deal of machine-dependent structure
10in the tables, and because in any event such tables are non-trivial to
11prepare.
12.PP
13The arguments to the basic code generation routine
14.II rcexpr
15are a pointer to a tree representing an expression,
16the name of a code-generation table,
17and the number of a register in which the value of the
18expression should be placed.
19.II Rcexpr
20returns the number of the register in which the value actually
21ended up;
22its caller
23may need to produce a
24.II mov
25instruction if the value really needs to be in the given register.
26There are four code generation tables.
27.PP
28.II Regtab
29is the basic one, which actually does the job described
30above: namely,
31compile code which places the value represented by the expression
32tree in a register.
33.PP
34.II Cctab
35is used when the value of the expression is not actually needed,
36but instead the value of the condition codes resulting from
37evaluation of the expression.
38This table is used, for example, to evaluate the expression after
39.II if.
40It is clearly silly to
41calculate the value (0 or 1) of the expression
42`a==b' in the context `if (a==b) ... '
43.PP
44The
45.II sptab
46table is used when the value of an expression is to be pushed on the stack,
47for example when it is an actual argument.
48For example in the function call `f(a)' it is a bad idea to
49load
50.II a
51into a register which is then pushed on the stack,
52when there is a single instruction which does the job.
53.PP
54The
55.II efftab
56table is used when an expression is to be evaluated for its side effects,
57not its value.
58This occurs mostly for expressions which are statements, which have no
59value.
60Thus the code for the statement
61`a = b'
62need produce only the approoriate
63.II mov
64instruction, and need not leave the value of
65.II b
66in a register,
67while in the expression `a + (b = c)'
68the value of `b = c' will appear in a register.
69.PP
70All of the tables besides
71.II regtab
72are rather small, and handle only a relatively few special cases.
73If one of these subsidiary tables does not contain
74an entry applicable to the given expression tree,
75.II rcexpr
76uses
77.II regtab
78to put the value of the expression into a register
79and then fixes things up;
80nothing need be done when the table
81was
82.II efftab,
83but a
84.II tst
85instruction is produced when the table called for was
86.II cctab,
87and a
88.II mov
89instruction,
90pushing the register on the stack,
91when the table was
92.II sptab.
93.PP
94The
95.II rcexpr
96routine itself picks off some special
97cases, then calls
98.II cexpr
99to do the real work.
100.II Cexpr
101tries to find an entry applicable
102to the given tree in the given table, and returns \-1 if
103no such entry is found, letting
104.II rcexpr
105try again with a different table.
106A successful match yields a string
107containing both literal characters
108which are written out and pseudo-operations, or macros, which are expanded.
109Before studying the contents
110of these strings we will consider how table entries are matched
111against trees.
112.PP
113Recall that most non-leaf nodes in an expression tree
114contain the name of the operator,
115the type of the value represented, and pointers to the subtrees (operands).
116They also contain an estimate of the number of registers required to evaluate
117the expression, placed there by the expression-optimizer routines.
118The register counts are used to guide the code generation process,
119which is based on the Sethi-Ullman algorithm.
120.PP
121The main code generation
122tables consist of entries
123each containing an operator number and a pointer
124to a subtable for the corresponding operator.
125A subtable consists of a sequence
126of entries, each with a key describing certain properties of the
127operands of the operator involved; associated with the key is a code string.
128Once the subtable corresponding to the operator is found, the subtable
129is searched linearly until a key is found such that the properties demanded
130by the key are compatible with the operands of the tree node.
131A successful match returns the code string;
132an unsuccessful search, either for the operator in the main table
133or a compatble key in the subtable,
134returns a failure indication.
135.PP
136The tables are all contained in a file
137which must be processed to obtain an assembly language program.
138Thus they are written in a special-purpose language.
139To provided definiteness to the following discussion, here is an
140example of a subtable entry.
141.DS
142%n,aw
143 F
144 add A2,R
145.DE
146The `%' indicates the key;
147the information following (up to a blank line) specifies the code string.
148Very briefly, this entry is in the subtable
149for `+' of
150.II regtab;
151the key specifies that the left operand is any integer, character, or pointer
152expression,
153and the right operand is any word quantity which is directly addressible
154(e.g. a variable or constant).
155The code string calls for the generation of the code
156to compile the left (first) operand into the
157current register (`F')
158and then to produce an `add' instruction which adds the
159second operand (`A2') to the register (`R').
160All of the notation will be explained below.
161.PP
162Only three features of the operands are used in deciding
163whether a match has occurred.
164They are:
165.IP 1.
166Is the type of the operand compatible with that demanded?
167.RT
168.IP 2.
169Is the `degree of difficulty' (in a sense described below) compatible?
170.RT
171.IP 3.
172The table may demand that the operand have a `*'
173(indirection operator) as its highest operator.
174.PP
175As suggested above, the key for a subtable entry
176is indicated by a `%,' and a comma-separated pair
177of specifications for the operands.
178(The second specification is ignored for unary operators).
179A specification indicates
180a type requirement by including one of the following letters.
181If no type letter is present, any integer, character,
182or pointer operand will satisfy the requirement (not float, double, or long).
183.IP b
184A byte (character) operand is required.
185.RT
186.IP w
187A word (integer or pointer) operand is required.
188.RT
189.IP f
190A float or double operand is required.
191.RT
192.IP d
193A double operand is required.
194.RT
195.IP l
196A long (32-bit integer) operand is required.
197.PP
198Before discussing the `degree of difficulty' specification,
199the algorithm has to be explained more completely.
200.II Rcexpr
201(and
202.II cexpr)
203are called with a register number in which to place their result.
204Registers 0, 1, ... are used during evaluation of expressions;
205the maximum register which can be used in this way depends on the
206number of register variables, but in any event only registers
2070 through 4 are available since r5 is used as a stack frame
208header and r6 (sp) and r7 (pc) have special
209hardware properties.
210The code generation routines assume that when called with register
211.II n
212as argument, they may use
213.II n+1,
214\&...
215(up to the first register variable)
216as temporaries.
217Consider the expression `X+Y', where both
218X and Y are expressions.
219As a first approximation, there are three ways of compiling
220code to put this expression in register
221.II n.
222.IP 1.
223If Y is an addressible cell,
224(recursively) put X into register
225.II n
226and add Y to it.
227.RT
228.IP 2.
229If Y is an expression that can be calculated in
230.II k
231registers, where
232.II k
233smaller than the number of registers available,
234compile X into register
235.II n,
236Y into register
237.II n+1,
238and add register
239.II n+1
240to
241.II n.
242.RT
243.IP 3.
244Otherwise, compile Y into register
245.II n,
246save the result in a temporary (actually, on the stack)
247compile X into register
248.II n,
249then add in the temporary.
250.PP
251The distinction between cases 2 and 3 therefore depends
252on whether the right operand can be compiled in fewer than
253.II k
254registers, where
255.II k
256is the number of free registers left after registers 0 through
257.II n
258are taken:
2590 through
260.II n\-1
261are presumed to contain already computed temporary results;
262.II n
263will, in case 2,
264contain the value of the left operand while the right
265is being evaluated.
266.PP
267These considerations should make clear
268the specification codes for the degree of difficulty,
269bearing in mind that a number of special cases are also present:
270.IP z
271is satisfied when the operand is zero, so that special code
272can be produced for expressions like `x = 0'.
273.RT
274.IP 1
275is satisfied when the operand is the constant 1, to optimize
276cases like left and right shift by 1, which can be done
277efficiently on the PDP-11.
278.RT
279.IP c
280is satisfied when the operand is a positive (16-bit)
281constant; this takes care of some special cases in long arithmetic.
282.RT
283.IP a
284is satisfied when the operand is addressible;
285this occurs not only for variables and constants, but also for
286some more complicated constructions, such as indirection through
287a simple variable, `*p++' where
288.II p
289is a register variable (because of the PDP-11's auto-increment address
290mode), and `*(p+c)' where
291.II p
292is a register and
293.II c
294is a constant.
295Precisely, the requirement is that the operand refers to a cell
296whose address can be written as a source or destination of a PDP-11
297instruction.
298.RT
299.IP e
300is satisfied by an operand whose value can be generated in a register
301using no more than
302.II k
303registers, where
304.II k
305is the number of registers left (not counting the current register).
306The `e' stands for `easy.'
307.RT
308.IP n
309is satisfied by any operand.
310The `n' stands for `anything.'
311.PP
312These degrees of difficulty are considered to lie in a linear ordering
313and any operand which satisfies an earlier-mentioned requirement
314will satisfy a later one.
315Since the subtables are searched linearly,
316if a `1' specification is included, almost certainly
317a `z' must be written first to prevent
318expressions containing the constant 0 to be compiled
319as if the 0 were 1.
320.PP
321Finally,
322a key specification may contain a `*' which
323requires the operand to have an indirection as its leading operator.
324Examples below should clarify the utility of this specification.
325.PP
326Now let us consider the contents of the code string
327associated with each subtable entry.
328Conventionally, lower-case letters in this string
329represent literal information which is copied directly
330to the output.
331Upper-case letters generally introduce specific
332macro-operations, some of which may be followed
333by modifying information.
334The code strings in the tables are written with tabs and
335new-lines used freely to suggest instructions which will be generated;
336the table-compiling program compresses tabs (using the 0200 bit of the
337next character) and throws away some of the new-lines.
338For example the macro `F' is ordinarily written on a line by itself;
339but since its expansion will end with a new-line, the new-line
340after `F' itself is dispensable.
341This is all to reduce the size of the stored tables.
342.PP
343The first set of macro-operations is concerned with
344compiling subtrees.
345Recall that this is done by the
346.II cexpr
347routine.
348In the following discussion the `current register'
349is generally the argument register to
350.II cexpr;
351that is, the place where the result is desired.
352The `next register' is numbered one
353higher
354than the current register.
355(This explanation isn't fully true
356because of complications, described below, involving
357operations which require even-odd register pairs.)
358.IP F
359causes a recursive call to
360the
361.II rcexpr
362routine to compile code which places the value of the first (left)
363operand of the operator in the current register.
364.RT
365.IP F1
366generates code which places the value of the first operand in the
367next register.
368It is incorrectly used if there might be no next register;
369that is, if the degree of difficulty of the first operand is not `easy;'
370if not, another register might not be available.
371.RT
372.IP FS
373generates code which pushes the value of the first operand on the stack,
374by calling
375.II rcexpr
376specifying
377.II sptab
378as the table.
379.LP
380Analogously,
381.IP "S, S1, SS"
382compile the second (right) operand
383into the current register, the next register, or onto the stack.
384.LP
385To deal with registers, there are
386.IP R
387which expands into the name of the current register.
388.RT
389.IP R1
390which expands into the name of the next register.
391.RT
392.IP R+
393which expands into the the name of the current register plus 1.
394It was suggested above that this is the same as the next register,
395except for complications; here is one of them.
396Long integer variables have
39732 bits and require 2 registers; in such cases the next register
398is the current register plus 2.
399The code would like to talk about both halves of the
400long quantity, so R refers to the register with the high-order part
401and R+ to the low-order part.
402.RT
403.IP R\-
404This is another complication, involving division and mod.
405These operations involve a pair of registers of which the odd-numbered
406contains the left operand.
407.II Cexpr
408arranges that the current register is odd;
409the R\- notation allows the code to refer to the next lower,
410even-numbered register.
411.LP
412To refer to addressible quantities, there are the notations:
413.IP A1
414causes generation of the address specified by the first operand.
415For this to be legal, the operand must be addressible; its
416key must contain an `a'
417or a more restrictive specification.
418.RT
419.IP A2
420correspondingly generates the address of the second operand
421providing it has one.
422.PP
423We now have enough mechanism to show a complete, if suboptimal,
424table for the + operator on word or byte operands.
425.DS
426%n,z
427 F
428.sp 1
429%n,1
430 F
431 inc R
432.sp 1
433%n,aw
434 F
435 add A2,R
436.sp 1
437%n,e
438 F
439 S1
440 add R1,R
441.sp 1
442%n,n
443 SS
444 F
445 add (sp)+,R
446.DE
447The first two sequences handle some special cases.
448Actually it turns out that handling a right operand of 0
449is unnecessary since the expression-optimizer
450throws out adds of 0.
451Adding 1 by using the `increment' instruction is done next,
452and then the case where the right operand is addressible.
453It must be a word quantity, since the PDP-11 lacks an `add byte' instruction.
454Finally the cases where the right operand either can, or cannot,
455be done in the available registers are treated.
456.PP
457The next macro-instructions are conveniently
458introduced by noticing that the above table is suitable
459for subtraction as well as addition, since no use is made of the
460commutativity of addition.
461All that is needed is substitution of `sub' for `add'
462and `dec' for 'inc.'
463Considerable saving of space is achieved by factoring out
464several similar operations.
465.IP I
466is replaced by a string from another table indexed by the operator
467in the node being expanded.
468This secondary table actually contains two strings per operator.
469.RT
470.IP I\(fm
471is replaced by the second string in the side table
472entry for the current operator.
473.PP
474Thus, given that the entries for `+' and `\-' in the side table
475(which is called
476.II instab)
477are `add' and `inc,' `sub' and `dec'
478respectively,
479the middle of of the above addition table can be written
480.DS
481%n,1
482 F
483 I' R
484
485%n,aw
486 F
487 I A2,R
488.DE
489and it will be suitable for subtraction,
490and several other operators, as well.
491.PP
492Next, there is the question of character and floating-point operations.
493.IP B1
494generates the letter `b' if the first operand is a character,
495`f' if it is float or double, and nothing otherwise.
496It is used in a context like `movB1'
497which generates a `mov', `movb', or `movf'
498instruction according to the type of the operand.
499.RT
500.IP B2
501is just like B1 but applies to the second operand.
502.RT
503.IP BE
504generates `b' if either operand is a character
505and null otherwise.
506.RT
507.IP BF
508generates `f' if the type of the operator node itself is float or double,
509otherwise null.
510.PP
511For example, there is an entry in
512.II efftab
513for the `=' operator
514.DS
515%a,aw
516%ab,a
517 IBE A2,A1
518.DE
519Note first that two key specifications
520can be applied to the same code string.
521Next, observe that when a word is assigned to a byte or to a word,
522or a word is assigned to a byte,
523a single instruction,
524a
525.II mov
526or
527.II movb
528as appropriate, does the job.
529However, when a byte is assigned to a word,
530it must pass through a register to implement the sign-extension rules:
531.DS
532%a,n
533 S
534 IB1 R,A1
535.DE
536.PP
537Next, there is the question of handling indirection properly.
538Consider the expression `X + *Y', where X and Y are expressions,
539Assuming that Y is more complicated than just a variable,
540but on the other hand qualifies as `easy' in the context,
541the expression would be compiled by placing the value of X in a register,
542that of *Y in the next register, and adding the registers.
543It is easy to see that a better job can be done
544by compiling X, then Y (into the next register),
545and producing the
546instruction symbolized by `add (R1),R'.
547This scheme avoids generating
548the instruction `mov (R1),R1'
549required actually to place the value of *Y in a register.
550A related situation occurs
551with the expression `X + *(p+6)', which
552exemplifies a construction
553frequent in structure and array references.
554The addition table shown above would produce
555.DS
556[put X in register R]
557mov p,R1
558add $6,R1
559mov (R1),R1
560add R1,R
561.DE
562when the best code is
563.DS
564[put X in R]
565mov p,R1
566add 6(R1),R
567.DE
568As we said above, a key specification for a code table entry
569may require an operand to have an indirection as its highest operator.
570To make use of the requirement,
571the following macros are provided.
572.IP F*
573the first operand must have the form *X.
574If in particular it has the form *(Y + c), for some constant
575.II c,
576then code is produced which places the value of Y in
577the current register.
578Otherwise, code is produced which loads X into the current register.
579.RT
580.IP F1*
581resembles F* except that the next register is loaded.
582.RT
583.IP S*
584resembles F* except that the second operand is loaded.
585.RT
586.IP S1*
587resembles S* except that the next register is loaded.
588.RT
589.IP FS*
590The first operand must have the form `*X'.
591Push the value of X on the stack.
592.RT
593.IP SS*
594resembles FS* except that it applies to the second operand.
595.LP
596To capture the constant that may have been skipped over
597in the above macros, there are
598.IP #1
599The first operand must have the form *X;
600if in particular it has the form *(Y + c) for
601.II c
602a constant, then the constant is written out,
603otherwise a null string.
604.RT
605.IP #2
606is the same as #1 except that the second operand is used.
607.LP
608Now we can improve the addition table above.
609Just before the `%n,e' entry, put
610.DS
611%n,ew*
612 F
613 S1*
614 add #2(R1),R
615.DE
616and just before the `%n,n' put
617.DS
618%n,nw*
619 SS*
620 F
621 add *(sp)+,R
622.DE
623When using the stacking macros there is no place to use
624the constant
625as an index word, so that particular special case doesn't occur.
626.PP
627The constant mentioned above can actually be more
628general than a number.
629Any quantity acceptable to the assembler as an expression will do,
630in particular the address of a static cell, perhaps with a numeric offset.
631If
632.II x
633is an external character array,
634the expression `x[i+5] = 0' will generate
635the code
636.DS
637mov i,r0
638clrb x+5(r0)
639.DE
640via the table entry (in the `=' part of
641.II efftab)
642.DS
643%e*,z
644 F
645 I'B1 #1(R)
646.DE
647Some machine operations place restrictions on the registers
648used.
649The divide instruction, used to implement the divide and mod
650operations, requires the dividend to be placed in the odd member
651of an even-odd pair;
652other peculiarities
653of multiplication make it simplest to put the multiplicand
654in an odd-numbered register.
655There is no theory which optimally accounts for
656this kind of requirement.
657.II Cexpr
658handles it by checking for a multiply, divide, or mod operation;
659in these cases, its argument register number is incremented by
660one or two so that it is odd, and if the operation was divide or mod,
661so that it is a member of a free even-odd pair.
662The routine which determines the number of registers required
663estimates, conservatively, that
664at least two registers are required for a multiplication
665and three for the other peculiar operators.
666After the expression is compiled,
667the register where the result actually ended up is returned.
668(Divide and mod are actually the same operation except for the
669location of the result).
670.PP
671These operations are the ones which cause results to end up in
672unexpected places,
673and this possibility adds a further level of complexity.
674The simplest way of handling the problem is always to move the
675result to the place where the caller expected it,
676but this will produce unnecessary register moves in many
677simple cases; `a = b*c' would generate
678.DS
679mov b,r1
680mul c,r1
681mov r1,r0
682mov r0,a
683.DE
684The next thought is used the passed-back
685information as to where the result landed to change the notion of the current
686register.
687While compiling the `=' operation above, which comes from a
688table
689entry
690like
691.DS
692%a,e
693 S
694 mov R,A1
695.DE
696it is sufficient to redefine the meaning of `R'
697after processing the `S' which does the multiply.
698This technique is in fact used; the tables are written in such a way
699that correct code is produced.
700The trouble is that the technique cannot be used in general,
701because it invalidates the count of the number of registers
702required for an expression.
703Consider just `a*b + X' where X is some expression.
704The algorithm assumes that the value of a*b,
705once computed, requires just one register.
706If there are three registers available, and X requires two registers to
707compute, then this expression will match a key specifying
708`%n,e'.
709If a*b is computed and left in register 1, then there are, contrary
710to expectations, no longer two registers available to compute X,
711but only one, and bad code will be produced.
712To guard against this possibility,
713.II cexpr
714checks the result returned by recursive calls which implement
715F, S and their relatives.
716If the result is not in the expected register, then the number of
717registers required by the other operand is checked;
718if it can be done using those registers which remain even
719after making unavailable the unexpectedly-occupied
720register, then
721the notions of the `next register' and possibly the `current
722register' are redefined.
723Otherwise a register-copy instruction is produced.
724A register-copy is also always produced
725when the current operator is one of those which have odd-even requirements.
726.PP
727Finally, there are a few loose-end macro operations
728and facts about the tables.
729The operators:
730.IP V
731is used for long operations.
732It is written with an address like a machine instruction;
733it expands into `adc' (add carry) if the operation
734is an additive operator,
735`sbc' (subtract carry) if the operation is a subtractive
736operator, and disappears, along with the rest of the line, otherwise.
737Its purpose is to allow common treatment of logical
738operations, which have no carries, and additive and subtractive
739operations, which generate carries.
740.RT
741.IP T
742generates a `tst' instruction if the first operand
743of the tree does not set the condition codes correctly.
744It is used with divide and mod operations,
745which require a sign-extended 32-bit operand.
746The code table for the operations contains an `sxt'
747(sign-extend) instruction to generate the high-order part of the
748dividend.
749.RT
750.IP H
751is analogous to the `F' and `S' macros,
752except that it calls for the generation of code for
753the current tree
754(not one of its operands)
755using
756.II regtab.
757It is used in
758.II cctab
759for all the operators which, when executed normally,
760set the condition codes properly according to the result.
761It prevents a `tst' instruction from being generated for
762constructions like `if (a+b) ...'
763since after calculation of the value of
764`a+b' a conditional branch can be written immediately.
765.PP
766All of the discussion above is in terms of operators with operands.
767Leaves of the expression tree (variables and constants), however,
768are peculiar in that they have no operands.
769In order to regularize the matching process,
770.II cexpr
771examines its operand to determine if it is a leaf;
772if so, it creates a special `load' operator whose operand
773is the leaf, and substitutes it for the argument tree;
774this allows the table entry for the created operator
775to use the `A1' notation to load the leaf into a register.
776.PP
777Purely to save space in the tables,
778pieces of subtables can be labelled and referred to later.
779It turns out, for example,
780that rather large portions of the
781the
782.II efftab
783table for the `=' and `=+' operators are identical.
784Thus `=' has an entry
785.DS
786%[move3:]
787%a,aw
788%ab,a
789 IBE A2,A1
790.DE
791while part of the `=+' table is
792.DS
793%aw,aw
794% [move3]
795.DE
796Labels are written as `%[ ... : ]',
797before the key specifications;
798references
799are written
800with `% [ ... ]'
801after the key.
802Peculiarities in the implementation
803make it necessary that labels appear before references to them.
804.PP
805The example illustrates the utility
806of allowing separate keys
807to point to the same code string.
808The assignment code
809works properly if either the right operand is a word, or the left operand
810is a byte;
811but since there is no `add byte' instruction the addition code
812has to be restricted to word operands.