Commit | Line | Data |
---|---|---|
1662094b BJ |
1 | .if !\n(xx .so tmac.p |
2 | .nr H1 0 | |
3 | .NH | |
4 | Organization | |
5 | .PP | |
6 | Most of | |
7 | .I px | |
8 | is written in the | |
9 | \s-2PDP\s0-11 | |
10 | assembly language, using the | |
11 | .UX | |
12 | assembler | |
13 | .I as. | |
14 | Portions of | |
15 | .I px | |
16 | are also written in the | |
17 | .UX | |
18 | systems programming language C. | |
19 | .I Px | |
20 | consists of a main procedure which reads in the interpreter code, | |
21 | a main interpreter loop which transfers successively to various | |
22 | code segments implementing the abstract machine operations | |
23 | and built-in procedures and functions, | |
24 | the code segments themselves, and | |
25 | a number of routines which support the implementation of the | |
26 | Pascal input-output environment. | |
27 | .PP | |
28 | The interpreter runs at a fraction of the speed of equivalent | |
29 | compiled C code, with this fraction varying from 1/5 to 1/15. | |
30 | The fact that the interpreter implements 32 bit integer arithmetic | |
31 | on a 16 bit machine notably degrades its speed. | |
32 | In a code generated Pascal for a | |
33 | \s-2PDP\s0-11, 32 bit integers would be undesirable. | |
34 | .PP | |
35 | The interpreter occupies 14.6K bytes of instruction space, which is shared | |
36 | between all instances of the interpreter, and 5K bytes of data space | |
37 | of which there is a copy for each interpreter execution. | |
38 | .PP | |
39 | The interpreter occupies 14.6K bytes of instruction space, shared among | |
40 | all processes executing Pascal, and has 4.6K bytes of data space (constants, | |
41 | error messages, etc.) a copy of which is allocated to each executing process. | |
42 | .NH 2 | |
43 | Format of the object file | |
44 | .PP | |
45 | .I Px | |
46 | normally interprets the code left in an object file by a run of the | |
47 | Pascal translator | |
48 | .I pi. | |
49 | The file where the translator puts the object originally, and the most | |
50 | commonly interpreted file, is called | |
51 | .I obj. | |
52 | We will first describe the way the object file was prepared in version 1.0 | |
53 | of the interpreter. | |
54 | .PP | |
55 | In version 1.0 of the interpreter, the | |
56 | .I obj | |
57 | file has an extremely simple format. | |
58 | The first word of the file has the octal value 404, | |
59 | a ``magic'' number. | |
60 | This number, | |
61 | like the numbers 407, 410, and 411 which signify executable files to the | |
62 | system, | |
63 | can be recognized by a modified shell (command interpreter) | |
64 | which can then fork instances of | |
65 | .I px | |
66 | to interpret the file. | |
67 | In this way, Pascal objects can be referred to at the shell level by typing | |
68 | their names. | |
69 | The modified shell can open each file which is executable | |
70 | but does not have a magic number recognized by the operating system. | |
71 | If the first word of such a file is 404, then the shell recognizes | |
72 | the file as a Pascal object, and creates an instance of the Pascal | |
73 | interpreter with the specified file as its first argument. | |
74 | This, importantly, allows all processes executing Pascal objects to share | |
75 | the same interpreter, and allows the Pascal object files to be small | |
76 | as they do not contain a copy of the interpreter. | |
77 | .PP | |
78 | With version 1.1 of the Pascal system an option exists to have the translator | |
79 | prepare true executable files. In order that all persons using | |
80 | .I px | |
81 | share a common text image, this executable file is not an interpreter, | |
82 | but rather a small process which coordinates with the iinterpreter to start | |
83 | execution. | |
84 | The way in which this happens is somewhat complicated. | |
85 | When one of these object files is created, the interpreter code is placed | |
86 | at the end of a special ``header'' file and the size of the initialized | |
87 | data area of this header file is expanded to include this code. | |
88 | When the process executes, the interpreter code is thus available at a | |
89 | easily determined address in its data space. | |
90 | When executed, the object process creates an | |
91 | .I pipe , | |
92 | creates another process by doing a | |
93 | .I fork , | |
94 | and arranges that the resulting parent process becomes an instance of | |
95 | .I px . | |
96 | The child process then writes, through the pipe whicch it has to the parent, | |
97 | interpreter process, the interpreter code. | |
98 | When this process is complete, the child exits. | |
99 | .PP | |
100 | The real advantage of this approach is that it does not require modifications | |
101 | to the shell, and that the resultant objects are ``true objects'' not | |
102 | requiring special treatment. | |
103 | A simpler mechanism would be to determine the name of the file which was | |
104 | executed and pass this to the interpreter. | |
105 | However it is not possible to determine this name in all cases.\u\*(dg\d | |
106 | .FS | |
107 | \*(dgFor instance, if the | |
108 | .I pxref | |
109 | program is placed in the directory | |
110 | `/usr/bin' | |
111 | then when the user types | |
112 | ``pxref prog1.p'' | |
113 | the first argument to the program, nominally the programs name, is | |
114 | ``pxref.'' | |
115 | While it would be possible to search in the standard place, | |
116 | i.e. the current directory, and the system directories | |
117 | `/bin' | |
118 | and | |
119 | `/usr/bin' | |
120 | for a corresponding object file, | |
121 | this would be expensive and not guaranteed to succeed. | |
122 | Several shells exist which allow other directories to be searched | |
123 | for commands, and there is, | |
124 | in general, | |
125 | no way to determine what these directories are. | |
126 | .FE | |
127 | .PP | |
128 | After the first word containing the value 404, | |
129 | the remainder of the | |
130 | .I obj | |
131 | file contains the object code. | |
132 | .NH 2 | |
133 | General features of object code | |
134 | .PP | |
135 | Pascal object code is relocatable as all addressing references for | |
136 | control transfers within the code are relative. | |
137 | The code consists of instructions interspersed with inline data. | |
138 | All instructions have a length which is an even number of bytes, | |
139 | that is, an integral number of words. | |
140 | No variables are kept in the object code area. | |
141 | .PP | |
142 | The first byte of a Pascal interpreter instruction contains an operation | |
143 | code. | |
144 | This allows a total of 256 major operation codes, and 219 of these are | |
145 | in use in the current | |
146 | .I px. | |
147 | The second byte of each interpreter instruction is called the | |
148 | ``sub-operation code'', | |
149 | or more commonly the | |
150 | .I subop. | |
151 | It contains a small integer which may, for example, be used as a | |
152 | block-structure level for the associated operation. | |
153 | If the instruction can take a fullword constant, | |
154 | this constant is often packed into the subop | |
155 | if it fits into 8 bits and is not zero. | |
156 | A subop value of 0 indicates that the constant wouldn't | |
157 | fit and therefore follows in the next word. | |
158 | This is a space optimization, the value of 0 for flagging | |
159 | the longer case being convenient because it is easy to test. | |
160 | .PP | |
161 | Other instruction formats are used. | |
162 | The branching | |
163 | instructions take an offset in the following word, | |
164 | operators which load constants onto the stack | |
165 | take arbitrarily long inline constant values, | |
166 | and a large number of operations deal exclusively with data on the | |
167 | interpreter stack, requiring no inline data. | |
168 | .NH 2 | |
169 | Stack structure of the interpreter | |
170 | .PP | |
171 | The interpreter emulates a stack-structured Pascal machine. | |
172 | The ``load'' instructions put values onto the stack, where all | |
173 | arithmetic operations take place. | |
174 | The ``store'' instructions take values off the stack | |
175 | and place them in an address which is also contained on the stack. | |
176 | The only way to move data or to compute in the machine is with the stack. | |
177 | .PP | |
178 | To make the interpreter operations more powerful | |
179 | and to thereby increase the interpreter speed, | |
180 | the arithmetic operations in the interpreter are ``typed''. | |
181 | That is, length conversion of arithmetic values occurs when they are | |
182 | used in an operation. | |
183 | This eliminates the need for interpreter cycles for length conversion | |
184 | and the associated overhead. | |
185 | For example, when adding an integer which fits in one byte to one which | |
186 | requires four bytes to store, no ``conversion'' operators are required. | |
187 | The one byte integer is loaded onto the stack, followed by the four | |
188 | byte integer, and then an adding operator is used which has, implicit | |
189 | in its definition, the sizes of the arguments. | |
190 | .NH 2 | |
191 | Data types in the interpreter | |
192 | .PP | |
193 | The interpreter deals with several different fundamental data types. | |
194 | In the memory of the machine, 1, 2, and 4 byte integers are supported, | |
195 | with only 2 and 4 byte integers being present on the stack. | |
196 | The interpreter always converts to 4 byte integers when there is a possibility | |
197 | of overflowing the shorter formats. | |
198 | This corresponds to the Pascal language definition of overflow in | |
199 | arithmetic operations which requires that the result be correct | |
200 | if all partial values lie within the bounds of the base integer type: | |
201 | 4 byte integer values. | |
202 | .PP | |
203 | Character constants are treated similarly to 1 byte integers for | |
204 | most purposes, as are Boolean values. | |
205 | All enumerated types are, in fact, treated as integer values of | |
206 | an appropriate length, usually 1 byte. | |
207 | The interpreter also has real numbers, occupying 8 bytes of storage, | |
208 | and sets and strings of varying length. | |
209 | The appropriate operations are included for each data type, such as | |
210 | set union and intersection and an operation to write a string | |
211 | which is on top of the stack to a file. | |
212 | .PP | |
213 | No special | |
214 | .B packed | |
215 | data formats are supported by the interpreter. | |
216 | The smallest unit of storage occupied by any variable is one byte. | |
217 | The built-ins | |
218 | .I pack | |
219 | and | |
220 | .I unpack | |
221 | thus degenerate to simple memory to memory transfers with | |
222 | no special processing. | |
223 | .NH 2 | |
224 | Runtime environment | |
225 | .PP | |
226 | The interpreter runtime environment uses a stack data area and a heap | |
227 | data area, which are kept at opposite ends of memory | |
228 | and grow towards each other. | |
229 | All global variables and variables local to procedures and functions | |
230 | are kept in the stack area. | |
231 | Dynamically allocated variables and buffers for input/output are | |
232 | allocated in the heap. | |
233 | .PP | |
234 | The addressing of block structured variables is accomplished through | |
235 | a fixed display which contains, for each | |
236 | statically active block, the address of its stack frame.\*(dg | |
237 | .FS | |
238 | \*(dg Here ``block'' is being used to mean any | |
239 | .I procedure , | |
240 | .I function | |
241 | or the main program. | |
242 | .FE | |
243 | This display is referenced by instructions which load and store | |
244 | variables and maintained by the operations for | |
245 | block entry and exit, and for non-local | |
246 | .B goto | |
247 | statements. | |
248 | .NH 2 | |
249 | Dp, lc, lp | |
250 | .PP | |
251 | Three ``global'' variables in the interpreter, in addition to the | |
252 | ``display'', are the | |
253 | .I dp, | |
254 | .I lc, | |
255 | and the | |
256 | .I lp. | |
257 | The | |
258 | .I dp | |
259 | is a pointer to the display entry for the current block; | |
260 | the | |
261 | .I lc | |
262 | is the abstract machine location counter; | |
263 | and the | |
264 | .I lp | |
265 | is a register which holds the address of the main interpreter | |
266 | loop so that returning to the loop to fetch the next instruction is | |
267 | a fast operation. | |
268 | .NH 2 | |
269 | The stack frame structure | |
270 | .PP | |
271 | Each active block | |
272 | has a stack frame consisting of three parts: | |
273 | a block mark, local variables, and temporary storage for partially | |
274 | evaluated expressions. | |
275 | The stack in the interpreter grows from the high addresses in memory | |
276 | to the low addresses, | |
277 | so that those parts of the stack frame which are ``on the top'' | |
278 | of the stack have the most negative offsets from the display | |
279 | entry for the block. | |
280 | The major parts of the stack frame are represented in Figure 1.1. | |
281 | .so fig1.1.n | |
282 | Note that the local variables of each block | |
283 | have negative offsets from the corresponding display entry, | |
284 | the ``first'' local variable having offset `\-2'. | |
285 | .NH 2 | |
286 | The block mark | |
287 | .PP | |
288 | The block mark contains the saved information necessary | |
289 | to restore the environment when the current block exits. | |
290 | It consists of two parts. | |
291 | The first and top-most part is saved by the | |
292 | .SM CALL | |
293 | instruction in the interpreter. | |
294 | This information is not present for the main program | |
295 | as it is never ``called''. | |
296 | The second part of the block mark is created by the | |
297 | .SM BEG | |
298 | begin block operator which also allocates and clears the | |
299 | local variable storage. | |
300 | The format of these blocks is represented in Figure 1.2. | |
301 | .sp | |
302 | .so fig1.2.n | |
303 | .PP | |
304 | The data saved by the | |
305 | .SM CALL | |
306 | operator includes the line number | |
307 | .I lino | |
308 | of the point of call, | |
309 | which is printed if the program execution terminates abnormally; | |
310 | the location counter | |
311 | .I lc | |
312 | giving the return address; | |
313 | and the current display entry address | |
314 | .I dp | |
315 | at the time of call. | |
316 | .PP | |
317 | The | |
318 | .SM BEG | |
319 | begin operator saves the previous display contents at the level | |
320 | of this block, so that the display can be restored on block exit. | |
321 | A pointer to 10 bytes of information giving the first eight characters of the | |
322 | name of this block and its beginning line number is also saved. | |
323 | This information is stored in the intepretor object code in-line after the | |
324 | .SM BEG | |
325 | operator. | |
326 | It is used in printing a post-mortem backtrace. | |
327 | The saved file name and buffer reference are necessary because of | |
328 | the input/output structure | |
329 | (this is discussed in detail in | |
330 | sections 3.3 and 3.4). | |
331 | The top of stack reference gives the value the stack pointer should | |
332 | have when there are no expression temporaries on the stack. | |
333 | It is used for a consistency check in the | |
334 | .SM LINO | |
335 | line number operators in the interpreter, which occurs before | |
336 | each statement executed. | |
337 | This helps to catch bugs in the interpreter, which often manifest | |
338 | themselves by leaving the stack non-empty between statements. | |
339 | .PP | |
340 | Note that there is no explicit static link here. | |
341 | Thus to set up the display correctly after a non-local | |
342 | .B goto | |
343 | statement one must ``unwind'' | |
344 | through all the block marks on the stack to rebuild the display. | |
345 | .NH 2 | |
346 | Arguments and return values | |
347 | .PP | |
348 | A function returns its value into a space reserved by the calling | |
349 | block. | |
350 | Arguments to a | |
351 | .B function | |
352 | are placed on top of this return area. | |
353 | For both | |
354 | .B procedure | |
355 | and | |
356 | .B function | |
357 | calls, arguments are placed at the end of the expression evaluation area | |
358 | of the caller. | |
359 | When a | |
360 | .B function | |
361 | completes, expression evaluation can continue | |
362 | after popping the arguments to the | |
363 | .B function | |
364 | off the stack, | |
365 | exactly as if the function value had been ``loaded''. | |
366 | The arguments to a | |
367 | .B procedure | |
368 | are also popped off the stack by the caller | |
369 | after its execution terminates. | |
370 | .KS | |
371 | .PP | |
372 | As a simple example consider the following stack structure | |
373 | for a call to a function | |
374 | .I f, | |
375 | of the form ``f(a)''. | |
376 | .so fig1.3.n | |
377 | .KE | |
378 | .PP | |
379 | If we suppose that | |
380 | .I f | |
381 | returns a | |
382 | .I real | |
383 | and that | |
384 | .I a | |
385 | is an integer, | |
386 | the calling sequence for this function would be: | |
387 | .DS | |
388 | .TS | |
389 | lp-2w(8) l. | |
390 | PUSH -8 | |
391 | RV4 \fIa\fR | |
392 | CALL \fIf\fR | |
393 | POP 4 | |
394 | .TE | |
395 | .DE | |
396 | .ZP | |
397 | Here we use the operator | |
398 | .SM PUSH | |
399 | to clear space for the return value, | |
400 | load | |
401 | .I a | |
402 | on the stack with an ``rvalue'' operator, | |
403 | call the function, | |
404 | pop off the argument | |
405 | .I a , | |
406 | and can then complete evaluation of the containing expression. | |
407 | The operations used here will be explained in section 2. | |
408 | .PP | |
409 | If the function | |
410 | .I f | |
411 | were given by | |
412 | .LS | |
413 | 10 \*bfunction\fR f(i: integer): real; | |
414 | 11 \*bbegin\fR | |
415 | 12 f := i | |
416 | 13 \*bend\fR; | |
417 | .LE | |
418 | then | |
419 | .I f | |
420 | would have code sequence: | |
421 | .DS | |
422 | .TS | |
423 | lp-2w(8) l. | |
424 | BEG 0 | |
425 | "f" | |
426 | 11 | |
427 | LV \fIl\fR,20 | |
428 | RV4 \fIl\fR,16 | |
429 | AS48 | |
430 | END | |
431 | .TE | |
432 | .DE | |
433 | .ZP | |
434 | Here the | |
435 | .SM BEG | |
436 | operator takes 12 bytes of inline data. | |
437 | The first word indicates the amount of local variable storage, here none. | |
438 | The succeeding two lines give the name of the block and the line number | |
439 | of the | |
440 | .B begin | |
441 | for error traceback. | |
442 | The | |
443 | .SM BEG | |
444 | operator places a pointer to the name and line number in the block mark. | |
445 | .PP | |
446 | The body of the | |
447 | .B function | |
448 | here involved taking an address of the | |
449 | .B function | |
450 | result variable | |
451 | .I f | |
452 | using the address of operator | |
453 | .SM LV . | |
454 | .I a . | |
455 | The next operation in the interpretation of this function is the loading | |
456 | of the value of | |
457 | .I i . | |
458 | .I I | |
459 | is at the level of the | |
460 | .B function | |
461 | .I f , | |
462 | here symbolically | |
463 | .I l, | |
464 | and the first variable in the local variable area. | |
465 | .PP | |
466 | The | |
467 | .B function | |
468 | completes by assigning the 4 byte integer on the stack to the 8 byte | |
469 | return location, hence the | |
470 | .SM AS48 | |
471 | assignment operator, and then uses the | |
472 | .SM END | |
473 | operator to exit the current block. | |
474 | .NH 2 | |
475 | The main interpreter loop | |
476 | .PP | |
477 | We can now describe the main interpreter loop. | |
478 | It is actually quite short: | |
479 | .DS | |
480 | loop: | |
481 | \fBmovb\fR (lc)+,r0 | |
482 | \fBadd\fR r0,r0 | |
483 | \fBmovb\fR (lc)+,r3 | |
484 | \fBjmp\fR *optab(r0) | |
485 | .DE | |
486 | .ZP | |
487 | First the main operation code is extracted from the first byte of | |
488 | the instruction. | |
489 | The code will be a small integer in the range -128 to 127. | |
490 | It is then doubled to make a word index into the operation table. | |
491 | Note that the sub-operation code is placed in register 3, and is thus available | |
492 | for use by the individual operation sequences. | |
493 | The hardware also leaves the condition codes set based on the value of this | |
494 | subop. | |
495 | The code will be discussed in section 2.1. | |
496 | .PP | |
497 | The label | |
498 | .I optab | |
499 | is in the middle of a branch table which has one operation address | |
500 | per word. | |
501 | The table is generated automatically from an abstract machine | |
502 | instruction list. | |
503 | The address of the instruction at | |
504 | .I loop | |
505 | is always contained in the register variable | |
506 | .I lp | |
507 | so that a return to the main interpreter loop both is quick and occupies | |
508 | little space. | |
509 | The return is thus a ``jmp\ (lp)'' instruction, | |
510 | defined for mnemonic value as the operator ``return'' in the intepreter, i.e. | |
511 | .DS | |
512 | return = 115 | |
513 | .DE | |
514 | so that one can write the mnemonic ``return'' at the end of an interpreter | |
515 | code sequence. | |
516 | .NH 2 | |
517 | Errors | |
518 | .PP | |
519 | Errors during interpretation cause a subroutine call to an error routine | |
520 | in a conventional fashion. An earlier version of the interpreter | |
521 | more compactly represented the raising of these conditions by using emulator | |
522 | traps (\s-2EMT\s0s), a form of system call otherwise unused by \s-2UNIX\s0. | |
523 | Errors were assigned small integer numbers and then referred to | |
524 | symbolically in the interpreter. | |
525 | The \s-2UNIX\s0 assember, | |
526 | .I as , | |
527 | provides a mechanism for defining the opcode ``error'' to be an ``emt,'' i.e.: | |
528 | .DS | |
529 | error = 104000 ^ sys | |
530 | .DE | |
531 | Thus there were many lines like | |
532 | .DS | |
533 | .TS | |
534 | lw(8) lp-2. | |
535 | \fBerror\fR ESTKOVFLO | |
536 | .TE | |
537 | .DE | |
538 | in the interpreter. | |
539 | These cause a process fault, | |
540 | which is trapped by the system and passed to the label | |
541 | .I onemt | |
542 | in the interpreter which fetches the error code byte from the | |
543 | .SM EMT | |
544 | instruction and calls the procedure | |
545 | .I error | |
546 | with this argument. | |
547 | .I Error | |
548 | processes the error condition, | |
549 | printing an appropriate error message and, usually, a backtrace. | |
550 | .PP | |
551 | In order that the interpreter run on a standard \s-2UNIX\s0 without | |
552 | using non-standard system calls to fetch the \s-2EMT\s0 code | |
553 | when running in separated instruction and data spaces, | |
554 | the current version of the interpreter places the error code in a global | |
555 | variable, before doing an | |
556 | .SM EMT . | |
557 | Thus the | |
558 | .SM EMT | |
559 | is used to compactly transfer control, but not for argument transmission. |