Commit | Line | Data |
---|---|---|
c9528a00 C |
1 | .SH |
2 | The Intermediate Language | |
3 | .PP | |
4 | .FS | |
5 | \(dgUNIX is a Trademark of Bell Laboratories. | |
6 | .FE | |
7 | Communication between the two phases of the compiler proper | |
8 | is carried out by means of a pair of intermediate files. | |
9 | These files are treated as having identical structure, | |
10 | although the second file contains only the code generated for strings. | |
11 | It is convenient to write strings out separately to reduce the | |
12 | need for multiple location counters in a later assembly | |
13 | phase. | |
14 | .PP | |
15 | The intermediate language is not machine-independent; | |
16 | its structure in a number of ways reflects | |
17 | the fact that C was originally a one-pass compiler | |
18 | chopped in two to reduce the maximum memory | |
19 | requirement. | |
20 | In fact, only the latest version | |
21 | of the compiler has a complete | |
22 | intermediate language at all. | |
23 | Until recently, the first phase of the compiler generated | |
24 | assembly code for those constructions it could deal with, | |
25 | and passed expression parse trees, in absolute binary | |
26 | form, | |
27 | to the second phase for code generation. | |
28 | Now, at least, all inter-phase information | |
29 | is passed in a describable form, and there are | |
30 | no absolute pointers involved, so the coupling between | |
31 | the phases is not so strong. | |
32 | .PP | |
33 | The areas in which the machine | |
34 | (and system) dependencies are most noticeable are | |
35 | .IP 1. | |
36 | Storage allocation for automatic variables and arguments | |
37 | has already been performed, | |
38 | and nodes for such variables refer to them by offset | |
39 | from a display pointer. | |
40 | Type conversion | |
41 | (for example, from integer to pointer) | |
42 | has already occurred using the assumption of | |
43 | byte addressing and 2-byte words. | |
44 | .IP 2. | |
45 | Data representations suitable to the PDP-11 are assumed; | |
46 | in particular, floating point constants are passed as | |
47 | four words in the machine representation. | |
48 | .PP | |
49 | As it happens, each intermediate file is represented as a sequence | |
50 | of binary numbers without any explicit demarcations. | |
51 | It consists of a sequence of | |
52 | conceptual lines, each headed by an operator, and possibly containing | |
53 | various operands. | |
54 | The operators are small numbers; | |
55 | to assist in recognizing failure in synchronization, | |
56 | the high-order byte of each operator word is always the | |
57 | octal number 376. | |
58 | Operands are | |
59 | either 16-bit binary numbers or strings of characters representing names. | |
60 | Each name is terminated by a null character. | |
61 | There is no alignment requirement for numerical | |
62 | operands and so there is no padding | |
63 | after a name string. | |
64 | .PP | |
65 | The binary representation was chosen to avoid the necessity | |
66 | of converting to and from character form | |
67 | and to minimize the size of the files. | |
68 | It would be very easy to make | |
69 | each operator-operand `line' in the file be | |
70 | a genuine, printable line, with the numbers in octal or decimal; | |
71 | this in fact was the representation originally used. | |
72 | .PP | |
73 | The operators fall naturally into two classes: | |
74 | those which represent part of an expression, and all others. | |
75 | Expressions are transmitted in a reverse-Polish notation; | |
76 | as they are being read, a tree is built which is isomorphic | |
77 | to the tree constructed in the first phase. | |
78 | Expressions are passed as a whole, with no non-expression operators | |
79 | intervening. | |
80 | The reader maintains a stack; each leaf of the expression tree (name, constant) | |
81 | is pushed on the stack; | |
82 | each unary operator replaces the top of the stack by a node whose | |
83 | operand is the old top-of-stack; | |
84 | each binary operator replaces the top pair on the stack with | |
85 | a single entry. | |
86 | When the expression is complete there is exactly one item on the | |
87 | stack. | |
88 | Following each expression | |
89 | is a special operator which passes the unique previous expression | |
90 | to the `optimizer' described below and then to the code | |
91 | generator. | |
92 | .PP | |
93 | Here is the list of operators not themselves part of expressions. | |
94 | .LP | |
95 | .Op EOF | |
96 | marks the end of an input file. | |
97 | .Op BDATA "flag data ..." | |
98 | specifies a sequence of bytes to be assembled | |
99 | as static data. | |
100 | It is followed by pairs of words; the first member | |
101 | of the pair is non-zero to indicate that the data continue; | |
102 | a zero flag is not followed by data and terminates | |
103 | the operator. | |
104 | The data bytes occupy the low-order part of a word. | |
105 | .Op WDATA "flag data ..." | |
106 | specifies a sequence of words to be assembled as | |
107 | static data; it is identical to the BDATA operator | |
108 | except that entire words, not just bytes, are passed. | |
109 | .Op PROG | |
110 | means that subsequent information is to be compiled as program text. | |
111 | .Op DATA | |
112 | means that subsequent information is to be compiled as static data. | |
113 | .Op BSS | |
114 | means that subsequent information is to be compiled as unitialized | |
115 | static data. | |
116 | .Op SYMDEF name | |
117 | means that | |
118 | the symbol | |
119 | .I | |
120 | name | |
121 | .R | |
122 | is an external name defined in the current program. | |
123 | It is produced for each external data or function definition. | |
124 | .Op CSPACE "name size" | |
125 | indicates that the name refers to a data area whose size is the | |
126 | specified number of bytes. | |
127 | It is produced for external data definitions without explicit initialization. | |
128 | .Op SSPACE size | |
129 | indicates that | |
130 | .I | |
131 | size | |
132 | .R | |
133 | bytes should be set aside for data storage. | |
134 | It is used to pad out short initializations of external data | |
135 | and to reserve space for static (internal) data. | |
136 | It will be preceded by an appropriate label. | |
137 | .Op EVEN | |
138 | is produced after each | |
139 | external data definition whose size is not | |
140 | an integral number of words. | |
141 | It is not produced after strings except when they initialize | |
142 | a character array. | |
143 | .Op NLABEL name | |
144 | is produced just before a BDATA or WDATA initializing | |
145 | external data, and serves as a label for the data. | |
146 | .Op RLABEL name | |
147 | is produced just before each function definition, | |
148 | and labels its entry point. | |
149 | .Op SNAME "name number" | |
150 | is produced at the start of each function for each static variable | |
151 | or label | |
152 | declared therein. | |
153 | Subsequent uses of the variable will be in terms of the given number. | |
154 | The code generator uses this only to produce a debugging symbol table. | |
155 | .Op ANAME "name number" | |
156 | Likewise, each automatic variable's name and stack offset | |
157 | is specified by this operator. | |
158 | Arguments count as automatics. | |
159 | .Op RNAME "name number" | |
160 | Each register variable is similarly named, with its register number. | |
161 | .Op SAVE number | |
162 | produces a register-save sequence at the start of each function, | |
163 | just after its label (RLABEL). | |
164 | .Op SETREG number | |
165 | is used to indicate the number of registers used | |
166 | for register variables. | |
167 | It actually gives the register number of the lowest | |
168 | free register; it is redundant because the RNAME operators could be | |
169 | counted instead. | |
170 | .Op PROFIL | |
171 | is produced before the save sequence for functions | |
172 | when the profile option is turned on. | |
173 | It produces code to count the number | |
174 | of times the function is called. | |
175 | .Op SWIT "deflab line label value ..." | |
176 | is produced for switches. | |
177 | When control flows into it, | |
178 | the value being switched on is in the register | |
179 | forced by RFORCE (below). | |
180 | The switch statement occurred on the indicated line | |
181 | of the source, and the label number of the default location | |
182 | is | |
183 | .I | |
184 | deflab. | |
185 | .R | |
186 | Then the operator is followed by a sequence of label-number and value pairs; | |
187 | the list is terminated by a 0 label. | |
188 | .Op LABEL number | |
189 | generates an internal label. | |
190 | It is referred to elsewhere using the given number. | |
191 | .Op BRANCH number | |
192 | indicates an unconditional transfer to the internal label number | |
193 | given. | |
194 | .Op RETRN | |
195 | produces the return sequence for a function. | |
196 | It occurs only once, at the end of each function. | |
197 | .Op EXPR line | |
198 | causes the expression just preceding to be compiled. | |
199 | The argument is the line number in the source where the | |
200 | expression occurred. | |
201 | .Op NAME "class type name" | |
202 | .Op NAME "class type number" | |
203 | indicates a name occurring in an expression. | |
204 | The first form is used when the name is external; | |
205 | the second when the name is automatic, static, or a register. | |
206 | Then the number indicates the stack offset, the label number, | |
207 | or the register number as appropriate. | |
208 | Class and type encoding is described elsewhere. | |
209 | .Op CON "type value" | |
210 | transmits an integer constant. | |
211 | This and the next two operators occur as part of expressions. | |
212 | .Op FCON "type 4-word-value" | |
213 | transmits a floating constant as | |
214 | four words in PDP-11 notation. | |
215 | .Op SFCON "type value" | |
216 | transmits a floating-point constant | |
217 | whose value is correctly represented by its high-order word | |
218 | in PDP-11 notation. | |
219 | .Op NULL | |
220 | indicates a null argument list of a function call in an expression; | |
221 | call is a binary operator whose second operand is the argument list. | |
222 | .Op CBRANCH "label cond" | |
223 | produces a conditional branch. | |
224 | It is an expression operator, and will be followed | |
225 | by an EXPR. | |
226 | The branch to the label number takes place if the expression's | |
227 | truth value is the same as that of | |
228 | .I | |
229 | cond. | |
230 | .R | |
231 | That is, if | |
232 | .I | |
233 | cond=1 | |
234 | .R | |
235 | and the expression evaluates to true, the branch is taken. | |
236 | .Op binary-operator type | |
237 | There are binary operators corresponding | |
238 | to each such source-language operator; | |
239 | the type of the result of each is passed as well. | |
240 | Some perhaps-unexpected ones are: | |
241 | COMMA, which is a right-associative operator designed | |
242 | to simplify right-to-left evaluation | |
243 | of function arguments; | |
244 | prefix and postfix ++ and \-\-, whose second operand | |
245 | is the increment amount, as a CON; | |
246 | QUEST and COLON, to express the conditional | |
247 | expression as `a?(b:c)'; | |
248 | and a sequence of special operators for expressing | |
249 | relations between pointers, in case pointer comparison | |
250 | is different from integer comparison | |
251 | (e.g. unsigned). | |
252 | .Op unary-operator type | |
253 | There are also numerous unary operators. | |
254 | These include ITOF, FTOI, FTOL, LTOF, ITOL, LTOI | |
255 | which convert among floating, long, and integer; | |
256 | JUMP which branches indirectly through a label expression; | |
257 | INIT, which compiles the value of a constant expression | |
258 | used as an initializer; | |
259 | RFORCE, which is used before a return sequence or | |
260 | a switch to place a value in an agreed-upon register. |