[unix-history] / .ref-BSD-2 / doc / px / pxin1.n

.if !\n(xx .so tmac.p
.nr H1 0
.NH 
Organization
.PP
Most of
.I px
is written in the
\s-2PDP\s0-11
assembly language, using the
.UX
assembler
.I as.
Portions of
.I px
are also written in the
.UX
systems programming language C.
.I Px
consists of a main procedure which reads in the interpreter code,
a main interpreter loop which transfers successively to various
code segments implementing the abstract machine operations
and built-in procedures and functions,
the code segments themselves, and
a number of routines which support the implementation of the
Pascal input-output environment.
.PP
The interpreter runs at a fraction of the speed of equivalent
compiled C code, with this fraction varying from 1/5 to 1/15.
The fact that the interpreter implements 32 bit integer arithmetic
on a 16 bit machine notably degrades its speed.
In a code generated Pascal for a
\s-2PDP\s0-11, 32 bit integers would be undesirable.
.PP
The interpreter occupies 14.6K bytes of instruction space, which is shared
between all instances of the interpreter, and 5K bytes of data space
of which there is a copy for each interpreter execution.
.PP
The interpreter occupies 14.6K bytes of instruction space, shared among
all processes executing Pascal, and has 4.6K bytes of data space (constants,
error messages, etc.) a copy of which is allocated to each executing process.
.NH 2
Format of the object file
.PP
.I Px
normally interprets the code left in an object file by a run of the
Pascal translator
.I pi.
The file where the translator puts the object originally, and the most
commonly interpreted file, is called
.I obj.
We will first describe the way the object file was prepared in version 1.0
of the interpreter.
.PP
In version 1.0 of the interpreter, the
.I obj
file has an extremely simple format.
The first word of the file has the octal value 404,
a ``magic'' number.
This number,
like the numbers 407, 410, and 411 which signify executable files to the
system,
can be recognized by a modified shell (command interpreter)
which can then fork instances of
.I px
to interpret the file.
In this way, Pascal objects can be referred to at the shell level by typing
their names.
The modified shell can open each file which is executable
but does not have a magic number recognized by the operating system.
If the first word of such a file is 404, then the shell recognizes
the file as a Pascal object, and creates an instance of the Pascal
interpreter with the specified file as its first argument.
This, importantly, allows all processes executing Pascal objects to share
the same interpreter, and allows the Pascal object files to be small
as they do not contain a copy of the interpreter.
.PP
With version 1.1 of the Pascal system an option exists to have the translator
prepare true executable files.  In order that all persons using
.I px
share a common text image, this executable file is not an interpreter,
but rather a small process which coordinates with the iinterpreter to start
execution.
The way in which this happens is somewhat complicated.
When one of these object files is created, the interpreter code is placed
at the end of a special ``header'' file and the size of the initialized
data area of this header file is expanded to include this code.
When the process executes, the interpreter code is thus available at a
easily determined address in its data space.
When executed, the object process creates an
.I pipe ,
creates another process by doing a
.I fork ,
and arranges that the resulting parent process becomes an instance of
.I px .
The child process then writes, through the pipe whicch it has to the parent,
interpreter process, the interpreter code.
When this process is complete, the child exits.
.PP
The real advantage of this approach is that it does not require modifications
to the shell, and that the resultant objects are ``true objects'' not
requiring special treatment.
A simpler mechanism would be to determine the name of the file which was
executed and pass this to the interpreter.
However it is not possible to determine this name in all cases.\u\*(dg\d
.FS
\*(dgFor instance, if the
.I pxref
program is placed in the directory
`/usr/bin'
then when the user types
``pxref prog1.p''
the first argument to the program, nominally the programs name, is
``pxref.''
While it would be possible to search in the standard place,
i.e. the current directory, and the system directories
`/bin'
and
`/usr/bin'
for a corresponding object file,
this would be expensive and not guaranteed to succeed.
Several shells exist which allow other directories to be searched
for commands, and there is,
in general,
no way to determine what these directories are.
.FE
.PP
After the first word containing the value 404,
the remainder of the
.I obj
file contains the object code.
.NH 2
General features of object code
.PP
Pascal object code is relocatable as all addressing references for
control transfers within the code are relative.
The code consists of instructions interspersed with inline data.
All instructions have a length which is an even number of bytes,
that is, an integral number of words.
No variables are kept in the object code area.
.PP
The first byte of a Pascal interpreter instruction contains an operation
code.
This allows a total of 256 major operation codes, and 219 of these are
in use in the current
.I px.
The second byte of each interpreter instruction is called the
``sub-operation code'',
or more commonly the
.I subop.
It contains a small integer which may, for example, be used as a
block-structure level for the associated operation.
If the instruction can take a fullword constant,
this constant is often packed into the subop
if it fits into 8 bits and is not zero.
A subop value of 0 indicates that the constant wouldn't
fit and therefore follows in the next word.
This is a space optimization, the value of 0 for flagging
the longer case being convenient because it is easy to test.
.PP
Other instruction formats are used.
The branching
instructions take an offset in the following word,
operators which load constants onto the stack 
take arbitrarily long inline constant values,
and a large number of operations deal exclusively with data on the
interpreter stack, requiring no inline data.
.NH 2
Stack structure of the interpreter
.PP
The interpreter emulates a stack-structured Pascal machine.
The ``load'' instructions put values onto the stack, where all
arithmetic operations take place.
The ``store'' instructions take values off the stack
and place them in an address which is also contained on the stack.
The only way to move data or to compute in the machine is with the stack.
.PP
To make the interpreter operations more powerful
and to thereby increase the interpreter speed,
the arithmetic operations in the interpreter are ``typed''.
That is, length conversion of arithmetic values occurs when they are
used in an operation.
This eliminates the need for interpreter cycles for length conversion
and the associated overhead.
For example, when adding an integer which fits in one byte to one which
requires four bytes to store, no ``conversion'' operators are required.
The one byte integer is loaded onto the stack, followed by the four
byte integer, and then an adding operator is used which has, implicit
in its definition, the sizes of the arguments.
.NH 2
Data types in the interpreter
.PP
The interpreter deals with several different fundamental data types.
In the memory of the machine, 1, 2, and 4 byte integers are supported,
with only 2 and 4 byte integers being present on the stack.
The interpreter always converts to 4 byte integers when there is a possibility
of overflowing the shorter formats.
This corresponds to the Pascal language definition of overflow in
arithmetic operations which requires that the result be correct
if all partial values lie within the bounds of the base integer type:
4 byte integer values.
.PP
Character constants are treated similarly to 1 byte integers for
most purposes, as are Boolean values.
All enumerated types are, in fact, treated as integer values of
an appropriate length, usually 1 byte.
The interpreter also has real numbers, occupying 8 bytes of storage,
and sets and strings of varying length.
The appropriate operations are included for each data type, such as
set union and intersection and an operation to write a string
which is on top of the stack to a file.
.PP
No special
.B packed
data formats are supported by the interpreter.
The smallest unit of storage occupied by any variable is one byte.
The built-ins
.I pack
and
.I unpack
thus degenerate to simple memory to memory transfers with
no special processing.
.NH 2
Runtime environment
.PP
The interpreter runtime environment uses a stack data area and a heap
data area, which are kept at opposite ends of memory
and grow towards each other.
All global variables and variables local to procedures and functions
are kept in the stack area.
Dynamically allocated variables and buffers for input/output are
allocated in the heap.
.PP
The addressing of block structured variables is accomplished through
a fixed display which contains, for each 
statically active block, the address of its stack frame.\*(dg
.FS
\*(dg Here ``block'' is being used to mean any
.I procedure ,
.I function
or the main program.
.FE
This display is referenced by instructions which load and store
variables and maintained by the operations for
block entry and exit, and for non-local
.B goto
statements.
.NH 2
Dp, lc, lp
.PP
Three ``global'' variables in the interpreter, in addition to the
``display'', are the
.I dp,
.I lc,
and the
.I lp.
The
.I dp
is a pointer to the display entry for the current block;
the
.I lc
is the abstract machine location counter;
and the
.I lp
is a register which holds the address of the main interpreter
loop so that returning to the loop to fetch the next instruction is
a fast operation.
.NH 2
The stack frame structure
.PP
Each active block
has a stack frame consisting of three parts:
a block mark, local variables, and temporary storage for partially
evaluated expressions.
The stack in the interpreter grows from the high addresses in memory
to the low addresses,
so that those parts of the stack frame which are ``on the top''
of the stack have the most negative offsets from the display
entry for the block.
The major parts of the stack frame are represented in Figure 1.1.
.so fig1.1.n
Note that the local variables of each block
have negative offsets from the corresponding display entry,
the ``first'' local variable having offset `\-2'.
.NH 2
The block mark
.PP
The block mark contains the saved information necessary
to restore the environment when the current block exits.
It consists of two parts.
The first and top-most part is saved by the
.SM CALL
instruction in the interpreter.
This information is not present for the main program
as it is never ``called''.
The second part of the block mark is created by the
.SM BEG
begin block operator which also allocates and clears the
local variable storage.
The format of these blocks is represented in Figure 1.2.
.sp
.so fig1.2.n
.PP
The data saved by the
.SM CALL
operator includes the line number
.I lino
of the point of call,
which is printed if the program execution terminates abnormally;
the location counter
.I lc
giving the return address;
and the current display entry address
.I dp
at the time of call.
.PP
The
.SM BEG
begin operator saves the previous display contents at the level
of this block, so that the display can be restored on block exit.
A pointer to 10 bytes of information giving the first eight characters of the
name of this block and its beginning line number is also saved.
This information is stored in the intepretor object code in-line after the
.SM BEG
operator.
It is used in printing a post-mortem backtrace.
The saved file name and buffer reference are necessary because of
the input/output structure
(this is discussed in detail in 
sections 3.3 and 3.4).
The top of stack reference gives the value the stack pointer should
have when there are no expression temporaries on the stack.
It is used for a consistency check in the
.SM LINO
line number operators in the interpreter, which occurs before
each statement executed.
This helps to catch bugs in the interpreter, which often manifest
themselves by leaving the stack non-empty between statements.
.PP
Note that there is no explicit static link here.
Thus to set up the display correctly after a non-local
.B goto
statement one must ``unwind''
through all the block marks on the stack to rebuild the display.
.NH 2
Arguments and return values
.PP
A function returns its value into a space reserved by the calling
block.
Arguments to a
.B function
are placed on top of this return area.
For both
.B procedure
and
.B function
calls, arguments are placed at the end of the expression evaluation area
of the caller.
When a
.B function
completes, expression evaluation can continue
after popping the arguments to the
.B function
off the stack,
exactly as if the function value had been ``loaded''.
The arguments to a
.B procedure
are also popped off the stack by the caller
after its execution terminates.
.KS
.PP
As a simple example consider the following stack structure
for a call to a function
.I f,
of the form ``f(a)''.
.so fig1.3.n
.KE
.PP
If we suppose that
.I f
returns a
.I real
and that
.I a
is an integer,
the calling sequence for this function would be:
.DS
.TS
lp-2w(8) l.
PUSH	-8
RV4	\fIa\fR
CALL	\fIf\fR
POP	4
.TE
.DE
.ZP
Here we use the operator
.SM PUSH
to clear space for the return value,
load
.I a
on the stack with an ``rvalue'' operator,
call the function,
pop off the argument
.I a ,
and can then complete evaluation of the containing expression.
The operations used here will be explained in section 2.
.PP
If the function
.I f
were given by
.LS
 10 \*bfunction\fR f(i: integer): real;
 11 \*bbegin\fR
 12     f := i
 13 \*bend\fR;
.LE
then
.I f
would have code sequence:
.DS
.TS
lp-2w(8) l.
BEG	0
	"f"
	11
LV	\fIl\fR,20
RV4	\fIl\fR,16
AS48
END
.TE
.DE
.ZP
Here the
.SM BEG
operator takes 12 bytes of inline data.
The first word indicates the amount of local variable storage, here none.
The succeeding two lines give the name of the block and the line number
of the
.B begin
for error traceback.
The
.SM BEG
operator places a pointer to the name and line number in the block mark.
.PP
The body of the
.B function
here involved taking an address of the
.B function
result variable
.I f
using the address of operator
.SM LV .
.I a .
The next operation in the interpretation of this function is the loading
of the value of
.I i .
.I I
is at the level of the
.B function
.I f ,
here symbolically
.I l,
and the first variable in the local variable area.
.PP
The
.B function
completes by assigning the 4 byte integer on the stack to the 8 byte
return location, hence the
.SM AS48
assignment operator, and then uses the
.SM END
operator to exit the current block.
.NH 2
The main interpreter loop
.PP
We can now describe the main interpreter loop.
It is actually quite short:
.DS
loop:
	\fBmovb\fR	(lc)+,r0
	\fBadd\fR	r0,r0
	\fBmovb\fR	(lc)+,r3
	\fBjmp\fR	*optab(r0)
.DE
.ZP
First the main operation code is extracted from the first byte of
the instruction.
The code will be a small integer in the range -128 to 127.
It is then doubled to make a word index into the operation table.
Note that the sub-operation code is placed in register 3, and is thus available
for use by the individual operation sequences.
The hardware also leaves the condition codes set based on the value of this 
subop.
The code will be discussed in section 2.1.
.PP
The label
.I optab
is in the middle of a branch table which has one operation address
per word.
The table is generated automatically from an abstract machine
instruction list.
The address of the instruction at
.I loop
is always contained in the register variable
.I lp
so that a return to the main interpreter loop both is quick and occupies
little space.
The return is thus a ``jmp\ (lp)'' instruction,
defined for mnemonic value as the operator ``return'' in the intepreter, i.e.
.DS
return = 115
.DE
so that one can write the mnemonic ``return'' at the end of an interpreter
code sequence.
.NH 2
Errors
.PP
Errors during interpretation cause a subroutine call to an error routine
in a conventional fashion.  An earlier version of the interpreter
more compactly represented the raising of these conditions by using emulator
traps (\s-2EMT\s0s), a form of system call otherwise unused by \s-2UNIX\s0.
Errors were assigned small integer numbers and then referred to
symbolically in the interpreter.
The \s-2UNIX\s0 assember,
.I as ,
provides a mechanism for defining the opcode ``error'' to be an ``emt,'' i.e.:
.DS
error = 104000 ^ sys
.DE
Thus there were many lines like
.DS
.TS
lw(8) lp-2.
\fBerror\fR	ESTKOVFLO
.TE
.DE
in the interpreter.
These cause a process fault,
which is trapped by the system and passed to the label
.I onemt
in the interpreter which fetches the error code byte from the
.SM EMT
instruction and calls the procedure
.I error
with this argument.
.I Error
processes the error condition,
printing an appropriate error message and, usually, a backtrace.
.PP
In order that the interpreter run on a standard \s-2UNIX\s0 without
using non-standard system calls to fetch the \s-2EMT\s0 code
when running in separated instruction and data spaces,
the current version of the interpreter places the error code in a global
variable, before doing an
.SM EMT .
Thus the
.SM EMT
is used to compactly transfer control, but not for argument transmission.
Commit	Line	Data
1662094b BJ	1	.if !\n(xx .so tmac.p
	2	.nr H1 0
	3	.NH
	4	Organization
	5	.PP
	6	Most of
	7	.I px
	8	is written in the
	9	\s-2PDP\s0-11
	10	assembly language, using the
	11	.UX
	12	assembler
	13	.I as.
	14	Portions of
	15	.I px
	16	are also written in the
	17	.UX
	18	systems programming language C.
	19	.I Px
	20	consists of a main procedure which reads in the interpreter code,
	21	a main interpreter loop which transfers successively to various
	22	code segments implementing the abstract machine operations
	23	and built-in procedures and functions,
	24	the code segments themselves, and
	25	a number of routines which support the implementation of the
	26	Pascal input-output environment.
	27	.PP
	28	The interpreter runs at a fraction of the speed of equivalent
	29	compiled C code, with this fraction varying from 1/5 to 1/15.
	30	The fact that the interpreter implements 32 bit integer arithmetic
	31	on a 16 bit machine notably degrades its speed.
	32	In a code generated Pascal for a
	33	\s-2PDP\s0-11, 32 bit integers would be undesirable.
	34	.PP
	35	The interpreter occupies 14.6K bytes of instruction space, which is shared
	36	between all instances of the interpreter, and 5K bytes of data space
	37	of which there is a copy for each interpreter execution.
	38	.PP
	39	The interpreter occupies 14.6K bytes of instruction space, shared among
	40	all processes executing Pascal, and has 4.6K bytes of data space (constants,
	41	error messages, etc.) a copy of which is allocated to each executing process.
	42	.NH 2
	43	Format of the object file
	44	.PP
	45	.I Px
	46	normally interprets the code left in an object file by a run of the
	47	Pascal translator
	48	.I pi.
	49	The file where the translator puts the object originally, and the most
	50	commonly interpreted file, is called
	51	.I obj.
	52	We will first describe the way the object file was prepared in version 1.0
	53	of the interpreter.
	54	.PP
	55	In version 1.0 of the interpreter, the
	56	.I obj
	57	file has an extremely simple format.
	58	The first word of the file has the octal value 404,
	59	a ``magic'' number.
	60	This number,
	61	like the numbers 407, 410, and 411 which signify executable files to the
	62	system,
	63	can be recognized by a modified shell (command interpreter)
	64	which can then fork instances of
65	.I px
66	to interpret the file.
67	In this way, Pascal objects can be referred to at the shell level by typing
68	their names.
69	The modified shell can open each file which is executable
70	but does not have a magic number recognized by the operating system.
71	If the first word of such a file is 404, then the shell recognizes
72	the file as a Pascal object, and creates an instance of the Pascal
73	interpreter with the specified file as its first argument.
74	This, importantly, allows all processes executing Pascal objects to share
75	the same interpreter, and allows the Pascal object files to be small
76	as they do not contain a copy of the interpreter.
77	.PP
78	With version 1.1 of the Pascal system an option exists to have the translator
79	prepare true executable files. In order that all persons using
80	.I px
81	share a common text image, this executable file is not an interpreter,
82	but rather a small process which coordinates with the iinterpreter to start
83	execution.
84	The way in which this happens is somewhat complicated.
85	When one of these object files is created, the interpreter code is placed
86	at the end of a special ``header'' file and the size of the initialized
87	data area of this header file is expanded to include this code.
88	When the process executes, the interpreter code is thus available at a
89	easily determined address in its data space.
90	When executed, the object process creates an
91	.I pipe ,
92	creates another process by doing a
93	.I fork ,
94	and arranges that the resulting parent process becomes an instance of
95	.I px .
96	The child process then writes, through the pipe whicch it has to the parent,
97	interpreter process, the interpreter code.
98	When this process is complete, the child exits.
99	.PP
100	The real advantage of this approach is that it does not require modifications
101	to the shell, and that the resultant objects are ``true objects'' not
102	requiring special treatment.
103	A simpler mechanism would be to determine the name of the file which was
104	executed and pass this to the interpreter.
105	However it is not possible to determine this name in all cases.\u\*(dg\d
106	.FS
107	\*(dgFor instance, if the
108	.I pxref
109	program is placed in the directory
110	`/usr/bin'
111	then when the user types
112	``pxref prog1.p''
113	the first argument to the program, nominally the programs name, is
114	``pxref.''
115	While it would be possible to search in the standard place,
116	i.e. the current directory, and the system directories
117	`/bin'
118	and
119	`/usr/bin'
120	for a corresponding object file,
121	this would be expensive and not guaranteed to succeed.
122	Several shells exist which allow other directories to be searched
123	for commands, and there is,
124	in general,
125	no way to determine what these directories are.
126	.FE
127	.PP
128	After the first word containing the value 404,
129	the remainder of the
130	.I obj
131	file contains the object code.
132	.NH 2
133	General features of object code
134	.PP
135	Pascal object code is relocatable as all addressing references for
136	control transfers within the code are relative.
137	The code consists of instructions interspersed with inline data.
138	All instructions have a length which is an even number of bytes,
139	that is, an integral number of words.
140	No variables are kept in the object code area.
141	.PP
142	The first byte of a Pascal interpreter instruction contains an operation
143	code.
144	This allows a total of 256 major operation codes, and 219 of these are
145	in use in the current
146	.I px.
147	The second byte of each interpreter instruction is called the
148	``sub-operation code'',
149	or more commonly the
150	.I subop.
151	It contains a small integer which may, for example, be used as a
152	block-structure level for the associated operation.
153	If the instruction can take a fullword constant,
154	this constant is often packed into the subop
155	if it fits into 8 bits and is not zero.
156	A subop value of 0 indicates that the constant wouldn't
157	fit and therefore follows in the next word.
158	This is a space optimization, the value of 0 for flagging
159	the longer case being convenient because it is easy to test.
160	.PP
161	Other instruction formats are used.
162	The branching
163	instructions take an offset in the following word,
164	operators which load constants onto the stack
165	take arbitrarily long inline constant values,
166	and a large number of operations deal exclusively with data on the
167	interpreter stack, requiring no inline data.
168	.NH 2
169	Stack structure of the interpreter
170	.PP
171	The interpreter emulates a stack-structured Pascal machine.
172	The ``load'' instructions put values onto the stack, where all
173	arithmetic operations take place.
174	The ``store'' instructions take values off the stack
175	and place them in an address which is also contained on the stack.
176	The only way to move data or to compute in the machine is with the stack.
177	.PP
178	To make the interpreter operations more powerful
179	and to thereby increase the interpreter speed,
180	the arithmetic operations in the interpreter are ``typed''.
181	That is, length conversion of arithmetic values occurs when they are
182	used in an operation.
183	This eliminates the need for interpreter cycles for length conversion
184	and the associated overhead.
185	For example, when adding an integer which fits in one byte to one which
186	requires four bytes to store, no ``conversion'' operators are required.
187	The one byte integer is loaded onto the stack, followed by the four
188	byte integer, and then an adding operator is used which has, implicit
189	in its definition, the sizes of the arguments.
190	.NH 2
191	Data types in the interpreter
192	.PP
193	The interpreter deals with several different fundamental data types.
194	In the memory of the machine, 1, 2, and 4 byte integers are supported,
195	with only 2 and 4 byte integers being present on the stack.
196	The interpreter always converts to 4 byte integers when there is a possibility
197	of overflowing the shorter formats.
198	This corresponds to the Pascal language definition of overflow in
199	arithmetic operations which requires that the result be correct
200	if all partial values lie within the bounds of the base integer type:
201	4 byte integer values.
202	.PP
203	Character constants are treated similarly to 1 byte integers for
204	most purposes, as are Boolean values.
205	All enumerated types are, in fact, treated as integer values of
206	an appropriate length, usually 1 byte.
207	The interpreter also has real numbers, occupying 8 bytes of storage,
208	and sets and strings of varying length.
209	The appropriate operations are included for each data type, such as
210	set union and intersection and an operation to write a string
211	which is on top of the stack to a file.
212	.PP
213	No special
214	.B packed
215	data formats are supported by the interpreter.
216	The smallest unit of storage occupied by any variable is one byte.
217	The built-ins
218	.I pack
219	and
220	.I unpack
221	thus degenerate to simple memory to memory transfers with
222	no special processing.
223	.NH 2
224	Runtime environment
225	.PP
226	The interpreter runtime environment uses a stack data area and a heap
227	data area, which are kept at opposite ends of memory
228	and grow towards each other.
229	All global variables and variables local to procedures and functions
230	are kept in the stack area.
231	Dynamically allocated variables and buffers for input/output are
232	allocated in the heap.
233	.PP
234	The addressing of block structured variables is accomplished through
235	a fixed display which contains, for each
236	statically active block, the address of its stack frame.\*(dg
237	.FS
238	\*(dg Here ``block'' is being used to mean any
239	.I procedure ,
240	.I function
241	or the main program.
242	.FE
243	This display is referenced by instructions which load and store
244	variables and maintained by the operations for
245	block entry and exit, and for non-local
246	.B goto
247	statements.
248	.NH 2
249	Dp, lc, lp
250	.PP
251	Three ``global'' variables in the interpreter, in addition to the
252	``display'', are the
253	.I dp,
254	.I lc,
255	and the
256	.I lp.
257	The
258	.I dp
259	is a pointer to the display entry for the current block;
260	the
261	.I lc
262	is the abstract machine location counter;
263	and the
264	.I lp
265	is a register which holds the address of the main interpreter
266	loop so that returning to the loop to fetch the next instruction is
267	a fast operation.
268	.NH 2
269	The stack frame structure
270	.PP
271	Each active block
272	has a stack frame consisting of three parts:
273	a block mark, local variables, and temporary storage for partially
274	evaluated expressions.
275	The stack in the interpreter grows from the high addresses in memory
276	to the low addresses,
277	so that those parts of the stack frame which are ``on the top''
278	of the stack have the most negative offsets from the display
279	entry for the block.
280	The major parts of the stack frame are represented in Figure 1.1.
281	.so fig1.1.n
282	Note that the local variables of each block
283	have negative offsets from the corresponding display entry,
284	the ``first'' local variable having offset `\-2'.
285	.NH 2
286	The block mark
287	.PP
288	The block mark contains the saved information necessary
289	to restore the environment when the current block exits.
290	It consists of two parts.
291	The first and top-most part is saved by the
292	.SM CALL
293	instruction in the interpreter.
294	This information is not present for the main program
295	as it is never ``called''.
296	The second part of the block mark is created by the
297	.SM BEG
298	begin block operator which also allocates and clears the
299	local variable storage.
300	The format of these blocks is represented in Figure 1.2.
301	.sp
302	.so fig1.2.n
303	.PP
304	The data saved by the
305	.SM CALL
306	operator includes the line number
307	.I lino
308	of the point of call,
309	which is printed if the program execution terminates abnormally;
310	the location counter
311	.I lc
312	giving the return address;
313	and the current display entry address
314	.I dp
315	at the time of call.
316	.PP
317	The
318	.SM BEG
319	begin operator saves the previous display contents at the level
320	of this block, so that the display can be restored on block exit.
321	A pointer to 10 bytes of information giving the first eight characters of the
322	name of this block and its beginning line number is also saved.
323	This information is stored in the intepretor object code in-line after the
324	.SM BEG
325	operator.
326	It is used in printing a post-mortem backtrace.
327	The saved file name and buffer reference are necessary because of
328	the input/output structure
329	(this is discussed in detail in
330	sections 3.3 and 3.4).
331	The top of stack reference gives the value the stack pointer should
332	have when there are no expression temporaries on the stack.
333	It is used for a consistency check in the
334	.SM LINO
335	line number operators in the interpreter, which occurs before
336	each statement executed.
337	This helps to catch bugs in the interpreter, which often manifest
338	themselves by leaving the stack non-empty between statements.
339	.PP
340	Note that there is no explicit static link here.
341	Thus to set up the display correctly after a non-local
342	.B goto
343	statement one must ``unwind''
344	through all the block marks on the stack to rebuild the display.
345	.NH 2
346	Arguments and return values
347	.PP
348	A function returns its value into a space reserved by the calling
349	block.
350	Arguments to a
351	.B function
352	are placed on top of this return area.
353	For both
354	.B procedure
355	and
356	.B function
357	calls, arguments are placed at the end of the expression evaluation area
358	of the caller.
359	When a
360	.B function
361	completes, expression evaluation can continue
362	after popping the arguments to the
363	.B function
364	off the stack,
365	exactly as if the function value had been ``loaded''.
366	The arguments to a
367	.B procedure
368	are also popped off the stack by the caller
369	after its execution terminates.
370	.KS
371	.PP
372	As a simple example consider the following stack structure
373	for a call to a function
374	.I f,
375	of the form ``f(a)''.
376	.so fig1.3.n
377	.KE
378	.PP
379	If we suppose that
380	.I f
381	returns a
382	.I real
383	and that
384	.I a
385	is an integer,
386	the calling sequence for this function would be:
387	.DS
388	.TS
389	lp-2w(8) l.
390	PUSH -8
391	RV4 \fIa\fR
392	CALL \fIf\fR
393	POP 4
394	.TE
395	.DE
396	.ZP
397	Here we use the operator
398	.SM PUSH
399	to clear space for the return value,
400	load
401	.I a
402	on the stack with an ``rvalue'' operator,
403	call the function,
404	pop off the argument
405	.I a ,
406	and can then complete evaluation of the containing expression.
407	The operations used here will be explained in section 2.
408	.PP
409	If the function
410	.I f
411	were given by
412	.LS
413	10 \*bfunction\fR f(i: integer): real;
414	11 \*bbegin\fR
415	12 f := i
416	13 \*bend\fR;
417	.LE
418	then
419	.I f
420	would have code sequence:
421	.DS
422	.TS
423	lp-2w(8) l.
424	BEG 0
425	"f"
426	11
427	LV \fIl\fR,20
428	RV4 \fIl\fR,16
429	AS48
430	END
431	.TE
432	.DE
433	.ZP
434	Here the
435	.SM BEG
436	operator takes 12 bytes of inline data.
437	The first word indicates the amount of local variable storage, here none.
438	The succeeding two lines give the name of the block and the line number
439	of the
440	.B begin
441	for error traceback.
442	The
443	.SM BEG
444	operator places a pointer to the name and line number in the block mark.
445	.PP
446	The body of the
447	.B function
448	here involved taking an address of the
449	.B function
450	result variable
451	.I f
452	using the address of operator
453	.SM LV .
454	.I a .
455	The next operation in the interpretation of this function is the loading
456	of the value of
457	.I i .
458	.I I
459	is at the level of the
460	.B function
461	.I f ,
462	here symbolically
463	.I l,
464	and the first variable in the local variable area.
465	.PP
466	The
467	.B function
468	completes by assigning the 4 byte integer on the stack to the 8 byte
469	return location, hence the
470	.SM AS48
471	assignment operator, and then uses the
472	.SM END
473	operator to exit the current block.
474	.NH 2
475	The main interpreter loop
476	.PP
477	We can now describe the main interpreter loop.
478	It is actually quite short:
479	.DS
480	loop:
481	\fBmovb\fR (lc)+,r0
482	\fBadd\fR r0,r0
483	\fBmovb\fR (lc)+,r3
484	\fBjmp\fR *optab(r0)
485	.DE
486	.ZP
487	First the main operation code is extracted from the first byte of
488	the instruction.
489	The code will be a small integer in the range -128 to 127.
490	It is then doubled to make a word index into the operation table.
491	Note that the sub-operation code is placed in register 3, and is thus available
492	for use by the individual operation sequences.
493	The hardware also leaves the condition codes set based on the value of this
494	subop.
495	The code will be discussed in section 2.1.
496	.PP
497	The label
498	.I optab
499	is in the middle of a branch table which has one operation address
500	per word.
501	The table is generated automatically from an abstract machine
502	instruction list.
503	The address of the instruction at
504	.I loop
505	is always contained in the register variable
506	.I lp
507	so that a return to the main interpreter loop both is quick and occupies
508	little space.
509	The return is thus a ``jmp\ (lp)'' instruction,
510	defined for mnemonic value as the operator ``return'' in the intepreter, i.e.
511	.DS
512	return = 115
513	.DE
514	so that one can write the mnemonic ``return'' at the end of an interpreter
515	code sequence.
516	.NH 2
517	Errors
518	.PP
519	Errors during interpretation cause a subroutine call to an error routine
520	in a conventional fashion. An earlier version of the interpreter
521	more compactly represented the raising of these conditions by using emulator
522	traps (\s-2EMT\s0s), a form of system call otherwise unused by \s-2UNIX\s0.
523	Errors were assigned small integer numbers and then referred to
524	symbolically in the interpreter.
525	The \s-2UNIX\s0 assember,
526	.I as ,
527	provides a mechanism for defining the opcode ``error'' to be an ``emt,'' i.e.:
528	.DS
529	error = 104000 ^ sys
530	.DE
531	Thus there were many lines like
532	.DS
533	.TS
534	lw(8) lp-2.
535	\fBerror\fR ESTKOVFLO
536	.TE
537	.DE
538	in the interpreter.
539	These cause a process fault,
540	which is trapped by the system and passed to the label
541	.I onemt
542	in the interpreter which fetches the error code byte from the
543	.SM EMT
544	instruction and calls the procedure
545	.I error
546	with this argument.
547	.I Error
548	processes the error condition,
549	printing an appropriate error message and, usually, a backtrace.
550	.PP
551	In order that the interpreter run on a standard \s-2UNIX\s0 without
552	using non-standard system calls to fetch the \s-2EMT\s0 code
553	when running in separated instruction and data spaces,
554	the current version of the interpreter places the error code in a global
555	variable, before doing an
556	.SM EMT .
557	Thus the
558	.SM EMT
559	is used to compactly transfer control, but not for argument transmission.