BSD 3 development
[unix-history] / .ref-BSD-2 / doc / px / pxin1.n
CommitLineData
1662094b
BJ
1.if !\n(xx .so tmac.p
2.nr H1 0
3.NH
4Organization
5.PP
6Most of
7.I px
8is written in the
9\s-2PDP\s0-11
10assembly language, using the
11.UX
12assembler
13.I as.
14Portions of
15.I px
16are also written in the
17.UX
18systems programming language C.
19.I Px
20consists of a main procedure which reads in the interpreter code,
21a main interpreter loop which transfers successively to various
22code segments implementing the abstract machine operations
23and built-in procedures and functions,
24the code segments themselves, and
25a number of routines which support the implementation of the
26Pascal input-output environment.
27.PP
28The interpreter runs at a fraction of the speed of equivalent
29compiled C code, with this fraction varying from 1/5 to 1/15.
30The fact that the interpreter implements 32 bit integer arithmetic
31on a 16 bit machine notably degrades its speed.
32In a code generated Pascal for a
33\s-2PDP\s0-11, 32 bit integers would be undesirable.
34.PP
35The interpreter occupies 14.6K bytes of instruction space, which is shared
36between all instances of the interpreter, and 5K bytes of data space
37of which there is a copy for each interpreter execution.
38.PP
39The interpreter occupies 14.6K bytes of instruction space, shared among
40all processes executing Pascal, and has 4.6K bytes of data space (constants,
41error messages, etc.) a copy of which is allocated to each executing process.
42.NH 2
43Format of the object file
44.PP
45.I Px
46normally interprets the code left in an object file by a run of the
47Pascal translator
48.I pi.
49The file where the translator puts the object originally, and the most
50commonly interpreted file, is called
51.I obj.
52We will first describe the way the object file was prepared in version 1.0
53of the interpreter.
54.PP
55In version 1.0 of the interpreter, the
56.I obj
57file has an extremely simple format.
58The first word of the file has the octal value 404,
59a ``magic'' number.
60This number,
61like the numbers 407, 410, and 411 which signify executable files to the
62system,
63can be recognized by a modified shell (command interpreter)
64which can then fork instances of
65.I px
66to interpret the file.
67In this way, Pascal objects can be referred to at the shell level by typing
68their names.
69The modified shell can open each file which is executable
70but does not have a magic number recognized by the operating system.
71If the first word of such a file is 404, then the shell recognizes
72the file as a Pascal object, and creates an instance of the Pascal
73interpreter with the specified file as its first argument.
74This, importantly, allows all processes executing Pascal objects to share
75the same interpreter, and allows the Pascal object files to be small
76as they do not contain a copy of the interpreter.
77.PP
78With version 1.1 of the Pascal system an option exists to have the translator
79prepare true executable files. In order that all persons using
80.I px
81share a common text image, this executable file is not an interpreter,
82but rather a small process which coordinates with the iinterpreter to start
83execution.
84The way in which this happens is somewhat complicated.
85When one of these object files is created, the interpreter code is placed
86at the end of a special ``header'' file and the size of the initialized
87data area of this header file is expanded to include this code.
88When the process executes, the interpreter code is thus available at a
89easily determined address in its data space.
90When executed, the object process creates an
91.I pipe ,
92creates another process by doing a
93.I fork ,
94and arranges that the resulting parent process becomes an instance of
95.I px .
96The child process then writes, through the pipe whicch it has to the parent,
97interpreter process, the interpreter code.
98When this process is complete, the child exits.
99.PP
100The real advantage of this approach is that it does not require modifications
101to the shell, and that the resultant objects are ``true objects'' not
102requiring special treatment.
103A simpler mechanism would be to determine the name of the file which was
104executed and pass this to the interpreter.
105However it is not possible to determine this name in all cases.\u\*(dg\d
106.FS
107\*(dgFor instance, if the
108.I pxref
109program is placed in the directory
110`/usr/bin'
111then when the user types
112``pxref prog1.p''
113the first argument to the program, nominally the programs name, is
114``pxref.''
115While it would be possible to search in the standard place,
116i.e. the current directory, and the system directories
117`/bin'
118and
119`/usr/bin'
120for a corresponding object file,
121this would be expensive and not guaranteed to succeed.
122Several shells exist which allow other directories to be searched
123for commands, and there is,
124in general,
125no way to determine what these directories are.
126.FE
127.PP
128After the first word containing the value 404,
129the remainder of the
130.I obj
131file contains the object code.
132.NH 2
133General features of object code
134.PP
135Pascal object code is relocatable as all addressing references for
136control transfers within the code are relative.
137The code consists of instructions interspersed with inline data.
138All instructions have a length which is an even number of bytes,
139that is, an integral number of words.
140No variables are kept in the object code area.
141.PP
142The first byte of a Pascal interpreter instruction contains an operation
143code.
144This allows a total of 256 major operation codes, and 219 of these are
145in use in the current
146.I px.
147The second byte of each interpreter instruction is called the
148``sub-operation code'',
149or more commonly the
150.I subop.
151It contains a small integer which may, for example, be used as a
152block-structure level for the associated operation.
153If the instruction can take a fullword constant,
154this constant is often packed into the subop
155if it fits into 8 bits and is not zero.
156A subop value of 0 indicates that the constant wouldn't
157fit and therefore follows in the next word.
158This is a space optimization, the value of 0 for flagging
159the longer case being convenient because it is easy to test.
160.PP
161Other instruction formats are used.
162The branching
163instructions take an offset in the following word,
164operators which load constants onto the stack
165take arbitrarily long inline constant values,
166and a large number of operations deal exclusively with data on the
167interpreter stack, requiring no inline data.
168.NH 2
169Stack structure of the interpreter
170.PP
171The interpreter emulates a stack-structured Pascal machine.
172The ``load'' instructions put values onto the stack, where all
173arithmetic operations take place.
174The ``store'' instructions take values off the stack
175and place them in an address which is also contained on the stack.
176The only way to move data or to compute in the machine is with the stack.
177.PP
178To make the interpreter operations more powerful
179and to thereby increase the interpreter speed,
180the arithmetic operations in the interpreter are ``typed''.
181That is, length conversion of arithmetic values occurs when they are
182used in an operation.
183This eliminates the need for interpreter cycles for length conversion
184and the associated overhead.
185For example, when adding an integer which fits in one byte to one which
186requires four bytes to store, no ``conversion'' operators are required.
187The one byte integer is loaded onto the stack, followed by the four
188byte integer, and then an adding operator is used which has, implicit
189in its definition, the sizes of the arguments.
190.NH 2
191Data types in the interpreter
192.PP
193The interpreter deals with several different fundamental data types.
194In the memory of the machine, 1, 2, and 4 byte integers are supported,
195with only 2 and 4 byte integers being present on the stack.
196The interpreter always converts to 4 byte integers when there is a possibility
197of overflowing the shorter formats.
198This corresponds to the Pascal language definition of overflow in
199arithmetic operations which requires that the result be correct
200if all partial values lie within the bounds of the base integer type:
2014 byte integer values.
202.PP
203Character constants are treated similarly to 1 byte integers for
204most purposes, as are Boolean values.
205All enumerated types are, in fact, treated as integer values of
206an appropriate length, usually 1 byte.
207The interpreter also has real numbers, occupying 8 bytes of storage,
208and sets and strings of varying length.
209The appropriate operations are included for each data type, such as
210set union and intersection and an operation to write a string
211which is on top of the stack to a file.
212.PP
213No special
214.B packed
215data formats are supported by the interpreter.
216The smallest unit of storage occupied by any variable is one byte.
217The built-ins
218.I pack
219and
220.I unpack
221thus degenerate to simple memory to memory transfers with
222no special processing.
223.NH 2
224Runtime environment
225.PP
226The interpreter runtime environment uses a stack data area and a heap
227data area, which are kept at opposite ends of memory
228and grow towards each other.
229All global variables and variables local to procedures and functions
230are kept in the stack area.
231Dynamically allocated variables and buffers for input/output are
232allocated in the heap.
233.PP
234The addressing of block structured variables is accomplished through
235a fixed display which contains, for each
236statically active block, the address of its stack frame.\*(dg
237.FS
238\*(dg Here ``block'' is being used to mean any
239.I procedure ,
240.I function
241or the main program.
242.FE
243This display is referenced by instructions which load and store
244variables and maintained by the operations for
245block entry and exit, and for non-local
246.B goto
247statements.
248.NH 2
249Dp, lc, lp
250.PP
251Three ``global'' variables in the interpreter, in addition to the
252``display'', are the
253.I dp,
254.I lc,
255and the
256.I lp.
257The
258.I dp
259is a pointer to the display entry for the current block;
260the
261.I lc
262is the abstract machine location counter;
263and the
264.I lp
265is a register which holds the address of the main interpreter
266loop so that returning to the loop to fetch the next instruction is
267a fast operation.
268.NH 2
269The stack frame structure
270.PP
271Each active block
272has a stack frame consisting of three parts:
273a block mark, local variables, and temporary storage for partially
274evaluated expressions.
275The stack in the interpreter grows from the high addresses in memory
276to the low addresses,
277so that those parts of the stack frame which are ``on the top''
278of the stack have the most negative offsets from the display
279entry for the block.
280The major parts of the stack frame are represented in Figure 1.1.
281.so fig1.1.n
282Note that the local variables of each block
283have negative offsets from the corresponding display entry,
284the ``first'' local variable having offset `\-2'.
285.NH 2
286The block mark
287.PP
288The block mark contains the saved information necessary
289to restore the environment when the current block exits.
290It consists of two parts.
291The first and top-most part is saved by the
292.SM CALL
293instruction in the interpreter.
294This information is not present for the main program
295as it is never ``called''.
296The second part of the block mark is created by the
297.SM BEG
298begin block operator which also allocates and clears the
299local variable storage.
300The format of these blocks is represented in Figure 1.2.
301.sp
302.so fig1.2.n
303.PP
304The data saved by the
305.SM CALL
306operator includes the line number
307.I lino
308of the point of call,
309which is printed if the program execution terminates abnormally;
310the location counter
311.I lc
312giving the return address;
313and the current display entry address
314.I dp
315at the time of call.
316.PP
317The
318.SM BEG
319begin operator saves the previous display contents at the level
320of this block, so that the display can be restored on block exit.
321A pointer to 10 bytes of information giving the first eight characters of the
322name of this block and its beginning line number is also saved.
323This information is stored in the intepretor object code in-line after the
324.SM BEG
325operator.
326It is used in printing a post-mortem backtrace.
327The saved file name and buffer reference are necessary because of
328the input/output structure
329(this is discussed in detail in
330sections 3.3 and 3.4).
331The top of stack reference gives the value the stack pointer should
332have when there are no expression temporaries on the stack.
333It is used for a consistency check in the
334.SM LINO
335line number operators in the interpreter, which occurs before
336each statement executed.
337This helps to catch bugs in the interpreter, which often manifest
338themselves by leaving the stack non-empty between statements.
339.PP
340Note that there is no explicit static link here.
341Thus to set up the display correctly after a non-local
342.B goto
343statement one must ``unwind''
344through all the block marks on the stack to rebuild the display.
345.NH 2
346Arguments and return values
347.PP
348A function returns its value into a space reserved by the calling
349block.
350Arguments to a
351.B function
352are placed on top of this return area.
353For both
354.B procedure
355and
356.B function
357calls, arguments are placed at the end of the expression evaluation area
358of the caller.
359When a
360.B function
361completes, expression evaluation can continue
362after popping the arguments to the
363.B function
364off the stack,
365exactly as if the function value had been ``loaded''.
366The arguments to a
367.B procedure
368are also popped off the stack by the caller
369after its execution terminates.
370.KS
371.PP
372As a simple example consider the following stack structure
373for a call to a function
374.I f,
375of the form ``f(a)''.
376.so fig1.3.n
377.KE
378.PP
379If we suppose that
380.I f
381returns a
382.I real
383and that
384.I a
385is an integer,
386the calling sequence for this function would be:
387.DS
388.TS
389lp-2w(8) l.
390PUSH -8
391RV4 \fIa\fR
392CALL \fIf\fR
393POP 4
394.TE
395.DE
396.ZP
397Here we use the operator
398.SM PUSH
399to clear space for the return value,
400load
401.I a
402on the stack with an ``rvalue'' operator,
403call the function,
404pop off the argument
405.I a ,
406and can then complete evaluation of the containing expression.
407The operations used here will be explained in section 2.
408.PP
409If the function
410.I f
411were given by
412.LS
413 10 \*bfunction\fR f(i: integer): real;
414 11 \*bbegin\fR
415 12 f := i
416 13 \*bend\fR;
417.LE
418then
419.I f
420would have code sequence:
421.DS
422.TS
423lp-2w(8) l.
424BEG 0
425 "f"
426 11
427LV \fIl\fR,20
428RV4 \fIl\fR,16
429AS48
430END
431.TE
432.DE
433.ZP
434Here the
435.SM BEG
436operator takes 12 bytes of inline data.
437The first word indicates the amount of local variable storage, here none.
438The succeeding two lines give the name of the block and the line number
439of the
440.B begin
441for error traceback.
442The
443.SM BEG
444operator places a pointer to the name and line number in the block mark.
445.PP
446The body of the
447.B function
448here involved taking an address of the
449.B function
450result variable
451.I f
452using the address of operator
453.SM LV .
454.I a .
455The next operation in the interpretation of this function is the loading
456of the value of
457.I i .
458.I I
459is at the level of the
460.B function
461.I f ,
462here symbolically
463.I l,
464and the first variable in the local variable area.
465.PP
466The
467.B function
468completes by assigning the 4 byte integer on the stack to the 8 byte
469return location, hence the
470.SM AS48
471assignment operator, and then uses the
472.SM END
473operator to exit the current block.
474.NH 2
475The main interpreter loop
476.PP
477We can now describe the main interpreter loop.
478It is actually quite short:
479.DS
480loop:
481 \fBmovb\fR (lc)+,r0
482 \fBadd\fR r0,r0
483 \fBmovb\fR (lc)+,r3
484 \fBjmp\fR *optab(r0)
485.DE
486.ZP
487First the main operation code is extracted from the first byte of
488the instruction.
489The code will be a small integer in the range -128 to 127.
490It is then doubled to make a word index into the operation table.
491Note that the sub-operation code is placed in register 3, and is thus available
492for use by the individual operation sequences.
493The hardware also leaves the condition codes set based on the value of this
494subop.
495The code will be discussed in section 2.1.
496.PP
497The label
498.I optab
499is in the middle of a branch table which has one operation address
500per word.
501The table is generated automatically from an abstract machine
502instruction list.
503The address of the instruction at
504.I loop
505is always contained in the register variable
506.I lp
507so that a return to the main interpreter loop both is quick and occupies
508little space.
509The return is thus a ``jmp\ (lp)'' instruction,
510defined for mnemonic value as the operator ``return'' in the intepreter, i.e.
511.DS
512return = 115
513.DE
514so that one can write the mnemonic ``return'' at the end of an interpreter
515code sequence.
516.NH 2
517Errors
518.PP
519Errors during interpretation cause a subroutine call to an error routine
520in a conventional fashion. An earlier version of the interpreter
521more compactly represented the raising of these conditions by using emulator
522traps (\s-2EMT\s0s), a form of system call otherwise unused by \s-2UNIX\s0.
523Errors were assigned small integer numbers and then referred to
524symbolically in the interpreter.
525The \s-2UNIX\s0 assember,
526.I as ,
527provides a mechanism for defining the opcode ``error'' to be an ``emt,'' i.e.:
528.DS
529error = 104000 ^ sys
530.DE
531Thus there were many lines like
532.DS
533.TS
534lw(8) lp-2.
535\fBerror\fR ESTKOVFLO
536.TE
537.DE
538in the interpreter.
539These cause a process fault,
540which is trapped by the system and passed to the label
541.I onemt
542in the interpreter which fetches the error code byte from the
543.SM EMT
544instruction and calls the procedure
545.I error
546with this argument.
547.I Error
548processes the error condition,
549printing an appropriate error message and, usually, a backtrace.
550.PP
551In order that the interpreter run on a standard \s-2UNIX\s0 without
552using non-standard system calls to fetch the \s-2EMT\s0 code
553when running in separated instruction and data spaces,
554the current version of the interpreter places the error code in a global
555variable, before doing an
556.SM EMT .
557Thus the
558.SM EMT
559is used to compactly transfer control, but not for argument transmission.