Commit | Line | Data |
---|---|---|
4b9ccde7 C |
1 | ." franz.n -[Thu Jun 17 11:01:27 1982 by jkf]- |
2 | ." this file will contain a description of the franz lisp system. | |
3 | ." sort of a systems manual. | |
4 | .de Fr | |
5 | F\s-2RANZ\s0 L\s-2ISP\s0 | |
6 | .. | |
7 | .tp | |
8 | .(l C | |
9 | .sz +2 | |
10 | \fBThe Franz Lisp System\fP | |
11 | .sz -2 | |
12 | .sp 2v | |
13 | \fIby | |
14 | \fIJohn K. Foderaro\fP | |
15 | .)l | |
16 | .sp 3v | |
17 | .tl ''Abstract'' | |
18 | .br | |
19 | This document describes the | |
20 | .Fr | |
21 | system written at The University of California | |
22 | at Berkeley. | |
23 | Included are descriptions of the memory management, interpreter | |
24 | and compiler, the conventions used within the C coded kernel | |
25 | and those within the compiled code. | |
26 | This does not duplicate the information found in The Franz Lisp Manual. | |
27 | .++ C '\s-2Draft\s+2'\fBThe\ Franz\ Lisp\ System\fP'%' | |
28 | .+c Characteristics\ of\ \F\s-2RANZ\ \s0L\s-2ISP\s0 | |
29 | .ls 1 | |
30 | .pp | |
31 | There is no formal standard for lisp systems, thus each lisp | |
32 | system is almost guaranteed to be different from any other lisp system. | |
33 | In this section we will examine the design decisions which most often | |
34 | characterize a lisp system. | |
35 | This | |
36 | focus does not imply that all other parts of the language are generally | |
37 | agreed upon. | |
38 | In fact, there is nothing sacred to lisp system designers. | |
39 | For example, one usually identifies the symbol | |
40 | .i nil | |
41 | with the empty list, | |
42 | and one usually thinks of lisp as a dynamically scoped language. | |
43 | Both of these conventions are are not true in the lisp dialect | |
44 | .i NIL | |
45 | currently being written at MIT. | |
46 | As another example, one imagines that a lisp system must | |
47 | use garbage collection to reclaim storage. | |
48 | The lisp dialect | |
49 | .i ZetaLisp | |
50 | that is running on Lisp Machines doesn't | |
51 | normally use a garbage collector | |
52 | because of the way in which it allocates its address space. | |
53 | It is faster to reboot the machine than to do a garbage collection. | |
54 | In most lisp dialects the lisp expressions are written in | |
55 | parenthesised form. In | |
56 | .i Portable | |
57 | .i Standard | |
58 | .i Lisp | |
59 | (PSL) at the University of Utah, programs are written | |
60 | primarily | |
61 | in an Algol-like | |
62 | syntax. | |
63 | .pp | |
64 | Thus some of the discussion to follow indicates not so much | |
65 | .i unique | |
66 | charateristics of | |
67 | .Fr | |
68 | but instead, how decisions were made. | |
69 | .sh 2 Scoping\ and\ Binding | |
70 | .pp | |
71 | The | |
72 | .Fr | |
73 | interpreter | |
74 | uses | |
75 | .i dynamic | |
76 | .i scoping , | |
77 | that is, after A calls B, B can access all of the | |
78 | variables that A allocated | |
79 | as well | |
80 | as those that A can access, provided B doesn't allocate variables of the same | |
81 | names. | |
82 | There are two popular ways of implementing dynamic scoping: | |
83 | .i deep | |
84 | .i binding | |
85 | and | |
86 | .i shallow | |
87 | .i binding . | |
88 | Note that we will use the terms variable allocation and lambda | |
89 | binding interchangeably in this report. | |
90 | The former term is less language-specific than the latter. | |
91 | .sh +1 Deep\ Binding | |
92 | .pp | |
93 | In a deep binding implementation, when a variable is allocated, | |
94 | the name of the variable and a space for its value are pushed on the | |
95 | top of a stack. | |
96 | When a program asks for the value of a variable, | |
97 | the lisp system must search down the stack for the first occurrence of | |
98 | the variable. | |
99 | This system offers great flexibility for the | |
100 | programmer since the state of his program | |
101 | can be described by the contents of the stack. | |
102 | It is thus possible to save the context of a program by just saving | |
103 | the stack or in some cases just a pointer to | |
104 | a place in the stack. | |
105 | The problem with deep binding is that it is time-consuming to search | |
106 | down the stack for the value of a variable, and, | |
107 | as a result, most systems | |
108 | modify the deep binding model by giving variables a global value | |
109 | cell and allowing programs to set and access the global cell. | |
110 | Some implementations of Interlisp use deep binding. | |
111 | .sh +0 Shallow\ Binding | |
112 | .pp | |
113 | In a shallow binding implementation, each variable has a global value cell | |
114 | which contains the current value of the variable. | |
115 | When a variable is allocated inside a program, its name | |
116 | and old value are pushed on a stack called a | |
117 | .i bindstack . | |
118 | The variable is then given a new value. | |
119 | Throughout the lifetime of the allocation, the current value of the | |
120 | variable can be found in the global value cell associated with the variable. | |
121 | When leaving the context of the variable's allocation, the old value | |
122 | is restored from the | |
123 | .i bindstack . | |
124 | A shallow binding scheme makes it much cheaper to access | |
125 | the values of variables, however it is much more difficult to | |
126 | save the state and to restore it. | |
127 | .pp | |
128 | .Fr | |
129 | and most other lisps which are not derived from Interlisp | |
130 | use shallow binding. | |
131 | Some versions of Interlisp use shallow binding. | |
132 | .sh -1 Lisp\ Compiler | |
133 | .pp | |
134 | Dynamic scoping often is not necessary and it is never cheap, even | |
135 | in a shallow binding implementation. | |
136 | Thus the | |
137 | .Fr | |
138 | compiler prefers to do lexical scoping rather than dynamic | |
139 | scoping. | |
140 | If the user does not specify otherwise, when a | |
141 | function is compiled, all variables | |
142 | allocated will be | |
143 | put on a local stack and will | |
144 | .i not | |
145 | be accessible by any | |
146 | function that this function calls. | |
147 | This | |
148 | convention results in faster code being generated by | |
149 | the compiler, but if the | |
150 | user is not careful, it can result in compiled and interpreted code | |
151 | working differently. | |
152 | In some of the new lisp designs, such as | |
153 | .i NIL | |
154 | and | |
155 | .i Common | |
156 | .i Lisp | |
157 | the semantics of compiled and interpreted code have been | |
158 | unified by | |
159 | transferring the compiler semantics (lexical scoping) to the interpreter. | |
160 | This is a rather large step | |
161 | if dynamic scoping is no | |
162 | longer an option, and it is not clear whether the resulting | |
163 | language should be called 'lisp'. | |
164 | .sh +0 Data\ Types | |
165 | .pp | |
166 | A complete list of data types in | |
167 | .Fr | |
168 | can be found in the first chapter of the | |
169 | .Fr | |
170 | Manual. | |
171 | The most important ones are described below. | |
172 | .pp | |
173 | Lisp is a list processing language and the basic data type is the list | |
174 | cell. | |
175 | A list cell also goes by the name of cons cell, pair, or dotted pair (dtpr). | |
176 | It contains pointers to two lisp objects, and these pointers can | |
177 | be accessed via the | |
178 | .i car | |
179 | and | |
180 | .i cdr | |
181 | functions. | |
182 | Each pointer requires four (8-bit) bytes and thus the list cell is | |
183 | eight bytes large. | |
184 | The cdr pointer is stored first since this makes it easier | |
185 | to 'cdr down a list' using the addressing modes of the VAX. | |
186 | .pp | |
187 | The symbol cell is used to implement variables. It contains a pointer | |
188 | to a string that is the print name, a cell to store a value, a cell to | |
189 | store the function definition, a cell to store a pointer to the property | |
190 | list and one pointer that the list system uses if it stores | |
191 | the symbol in the oblist. | |
192 | .sh +0 Memory\ Management | |
193 | .pp | |
194 | A lisp system must be able to determine the type of an | |
195 | object at run time. | |
196 | The method used to determine the type influences the | |
197 | way storage is managed and garbage collection is done. | |
198 | Next, we will look at three possible places | |
199 | to store the type information. | |
200 | .sh +1 Typed\ data | |
201 | .pp | |
202 | The type of the data object could be placed in the object itself, much | |
203 | as Pascal stores the type of a variant record in the record itself. | |
204 | This could result is a large amount of storage being used to store | |
205 | type information. | |
206 | No lisp system that we know of uses this method exclusively. | |
207 | .sh +0 Typed\ pointers | |
208 | .pp | |
209 | Every reference to a lisp object could contain the type of the object | |
210 | referenced. | |
211 | This is a good idea in a machine like an IBM 370 where only | |
212 | part of machine word is used by the addressing hardware. | |
213 | Lisps that use typed pointers generally use a | |
214 | .i heap | |
215 | memory management scheme and a | |
216 | .i copying | |
217 | garbage collector. | |
218 | In a heap scheme, | |
219 | all memory is divided by a pointer (called the | |
220 | heap pointer) separating lisp | |
221 | data and free space. | |
222 | When a new lisp object is needed, it is formed at the | |
223 | dividing line and then | |
224 | the heap pointer is incremented | |
225 | past the new object. | |
226 | Garbage collection is done by maintaining two separate spaces for lisp | |
227 | data, only one of which is used at any one time. | |
228 | When one fills up, all active lisp objects are copied to the | |
229 | other space, leaving the first space totally free. | |
230 | Subsequent allocations are done from the second space, until it fills up, | |
231 | at which point | |
232 | all active data in the second space is copied to the first space. | |
233 | The advantage of the copying garbage collector is that the data space | |
234 | is compacted, which will improve virtual memory behavior. | |
235 | The disadvantage is that | |
236 | half the data space is always unused. | |
237 | .pp | |
238 | PSL on a PDP-10 uses this scheme, as does Lisp 370 on | |
239 | an IBM 370. | |
240 | PSL and NIL on the VAX will also use this scheme. | |
241 | Since the VAX hardware | |
242 | uses the entire word for address | |
243 | calculation, | |
244 | PSL and NIL | |
245 | must mask off the type bits | |
246 | of a pointer before using it to reference memory. | |
247 | .sh +0 Typed\ pages | |
248 | .pp | |
249 | The final alternative is to allocate data objects by pages rather than | |
250 | individually. | |
251 | Each page contains only one type of object and there is a table that | |
252 | keeps track of the type of data object in each page. | |
253 | The address of an object tells which page the object is on and | |
254 | the page number tells which type the object is. | |
255 | Since a whole page of objects is allocated when only one | |
256 | object is necessary, | |
257 | the rest of the objects in the page are put on a free list. | |
258 | This method of allocation is sometimes referred to as | |
259 | .i bibop | |
260 | for BIg Bag Of Pages. | |
261 | The garbage collector for bibop is usually | |
262 | a traditional | |
263 | .i mark | |
264 | and | |
265 | .i sweep | |
266 | algorithm. | |
267 | All active data objects are marked and | |
268 | then each page is swept linearly with | |
269 | the free cells being put on the free list and the used cells | |
270 | left alone. | |
271 | The advantage of mark and sweep over a copying garbage collector is that | |
272 | the mark phase is cheaper because data objects do not have to be copied. | |
273 | Also, all of memory can be used since there is no requirement for two spaces. | |
274 | The disadvantage is that a sweep phase is required in a mark and sweep | |
275 | but one is not required in a copying garbage collector. | |
276 | The sweep phase is expensive because it has to alter most data pages | |
277 | while building a free list. | |
278 | This operation | |
279 | can be expensive in a virtual memory environment. | |
280 | Alternatives to the sweep phase have been proposed in | |
281 | [Foderaro+Fateman Characterization of Vax Macsyma]. | |
282 | .pp | |
283 | .Fr | |
284 | uses a bibop memory allocator and a mark and sweep garbage collector, | |
285 | as does Maclisp (on a PDP-10). | |
286 | The reason that | |
287 | .Fr | |
288 | uses bibop is primarily due to the VAX architecture, | |
289 | which makes typed pointers | |
290 | expensive to implement. | |
291 | Also, typed pointers would make it difficult to pass lisp values | |
292 | to programs written in other languages, some of which may not have | |
293 | the ability to extract the address of a lisp object from a typed pointer. | |
294 | .sh -1 Construction | |
295 | .pp | |
296 | The | |
297 | .Fr | |
298 | system is built by adding lisp code to a C-coded kernel. | |
299 | The C-coded kernel is a working lisp interpreter, although it has few of | |
300 | the functions a lisp system needs. | |
301 | The lisp code provides most of the functions, the top level and error | |
302 | handlers. | |
303 | The lisp compiler is also written in lisp, | |
304 | but is usually | |
305 | run as a separate process. | |
306 | .sh +0 Unique\ features | |
307 | .pp | |
308 | .Fr | |
309 | can dynamically load and execute functions defined in the other languages | |
310 | which run on the VAX. | |
311 | It uses two different dynamic loaders. | |
312 | One is part of the lisp system and is used for lisp only, it is called | |
313 | .i fasl | |
314 | (for fast loader). | |
315 | The other is the Unix loader, ld, which is used for loading in | |
316 | C, Fortran or Pascal code. | |
317 | Loading code with fasl | |
318 | is much faster than with ld | |
319 | since fasl benefits from | |
320 | the simplicity of a lisp object file. | |
321 | .+c Data\ Structures | |
322 | .sh 0 _ | |
323 | .nr $1 \n(ch | |
324 | .pp | |
325 | In this chapter we shall examine the data structures which are most | |
326 | important to the | |
327 | .Fr | |
328 | system. | |
329 | First, though we will see how the lisp system uses the address space. | |
330 | .sh 2 Physical\ layout | |
331 | .pp | |
332 | As a Unix process, lisp has three address regions: text, data and stack. | |
333 | Text begins at location 0 and continues up to 0x3b73c (0x means hex number). | |
334 | The data segment begins on the next page boundary and continues | |
335 | up to a limit set by the configuration parameters (currently 0x2fd000). | |
336 | Lisp does not begin running with such a large data segment, rather it grows | |
337 | when necessary. | |
338 | The stack segment begins at address 0x7fffffff and grows downward to a | |
339 | maximum size of one-half megabyte. | |
340 | .pp | |
341 | The text segment stores the instructions for the C kernel as well as read-only | |
342 | data. | |
343 | The read-only data for lisp are | |
344 | the symbol nil, the i/o ports, and the small integer | |
345 | table. | |
346 | The symbol nil is stored at location 0 which makes it very easy to test | |
347 | whether a value is nil. | |
348 | The problem with storing nil in read-only space is that a special case | |
349 | must be made for nil's property list, which is the only thing in the | |
350 | nil symbol that the user is permitted to alter. | |
351 | .pp | |
352 | The fixnums -1024 through 1023 are stored sequentially, with 0 being | |
353 | stored at 0x1400. | |
354 | .Fr | |
355 | doesn't have any 'immediate' lisp data, that is data whose value is | |
356 | stored as part of the reference to the data. | |
357 | But, by storing some of the fixnums in a known place, we can achieve | |
358 | some of the benefits of immediate data: | |
359 | A program can use | |
360 | .i eq | |
361 | as a test for a fixnum in the range -1024 to 1023. | |
362 | In the majority of cases, when asked | |
363 | to allocate a fixnum, the system can return a pointer into this table | |
364 | and bypass the memory allocator. | |
365 | .sh +0 Typetable | |
366 | .pp | |
367 | .Fr | |
368 | uses the typed pages (or bibop) | |
369 | method of type determination. | |
370 | The | |
371 | .i typetable | |
372 | is an array of bytes ( | |
373 | .i chars | |
374 | in C lingo). | |
375 | This table describes the type of all pages, from page 0 where nil is stored, | |
376 | up to the end of data space. | |
377 | Thus there are many entries that describe the types of the pages which | |
378 | make up the C kernel. | |
379 | In order to reduce the wasted space in the typetable, | |
380 | we could have made the typetable begin typing pages at the start of data | |
381 | space and make a special case of the pages that contain nil and | |
382 | the small integer table. | |
383 | However, the | |
384 | effect of this change would be that type checks | |
385 | (which are done in-line in compiled code) would | |
386 | be longer and slower. Since type checking happens quite | |
387 | frequently, we felt it was better to waste the space in the | |
388 | typetable in order to save execution time and instruction space. | |
389 | .pp | |
390 | Each page on a VAX is 512 bytes, and thus to find the type of an | |
391 | object given the address of it, | |
392 | it is only necessary to shift the address right 9 bits | |
393 | and index the typetable array offset by one. | |
394 | The offset by one is required because the value -4, which is an illegal | |
395 | address, is used as a sentinel value to indicate an illegal value. | |
396 | Thus when we take the type of -4 we will be indexing the table by -1 | |
397 | and we want to retrieve the first byte in the table (which contains | |
398 | the value UNBOUND). | |
399 | The C macro which retrieves the type is this (from file h/global.h): | |
400 | .(b I | |
401 | ||
402 | #define TYPE(a1) ((typetable+1)[(int)(a1) >> 9]) | |
403 | ||
404 | .)b | |
405 | This is code generated by the lisp compiler to check if the type | |
406 | code of an argument (stored at 0(r10)) is a symbol (which is type | |
407 | code 1): | |
408 | .(b I | |
409 | ||
410 | ashl $-9,0(r10),r0 | |
411 | cmpb _typetable+1[r0],$1 | |
412 | ||
413 | .)b | |
414 | .pp | |
415 | The type codes which are stored in the typetable are these: | |
416 | .ts 2i 4i 6i | |
417 | .(b I | |
418 | UNBO -1\tSTRNG 0\tATOM 1\tINT 2 | |
419 | DTPR 3\tDOUB 4\tBCD 5\tPORT 6 | |
420 | ARRAY 7\tOTHER 8\tSDOT 9\tVALUE 10 | |
421 | HUNK2 11\tHUNK4 12\tHUNK8 13\tHUNK16 14 | |
422 | HUNK32 15\tHUNK64 16\tHUNK128 17 | |
423 | .)b | |
424 | The names given above are the C kernel internal names. | |
425 | ATOM is symbol, INT is fixnum, DTPR is list cell, | |
426 | DOUB is flonum, BCD is binary, | |
427 | SDOT is bignum and all the HUNKn types are just known as hunk to | |
428 | the user. | |
429 | .sh +0 Stacks | |
430 | .pp | |
431 | .Fr | |
432 | uses three stacks: the | |
433 | .i C | |
434 | .i runtime | |
435 | stack, the | |
436 | .i namestack | |
437 | and the | |
438 | .i bindstack . | |
439 | The C runtime stack is the stack part of the address space | |
440 | and the other two stacks are stored in the data space. | |
441 | .sh +1 C\ runtime\ stack | |
442 | The C-coded kernel uses this | |
443 | stack in the same way as a typical C program | |
444 | use the stack: | |
445 | storing return addresses | |
446 | and non-lisp-object arguments to subroutines, | |
447 | saving registers, | |
448 | and storing local variables within a function. | |
449 | The lisp system stores | |
450 | .i catch | |
451 | .i frames | |
452 | on this stack as well (to be described later). | |
453 | .pp | |
454 | The lisp system assumes that there are no lisp data on the stack and | |
455 | thus the use of this stack | |
456 | by a program is unrestricted. | |
457 | As will be discussed later on, the | |
458 | .b calls | |
459 | and | |
460 | .b ret | |
461 | instructions are used for most subroutine calls and returns. | |
462 | These instructions expect the stack to look a certain way. | |
463 | .sh +0 Namestack | |
464 | .pp | |
465 | The namestack contains only lisp data. | |
466 | It is used to pass arguments to functions and to hold | |
467 | local (lexically scoped) data within lisp functions. | |
468 | It is also used as a temporary storage spot for lisp data | |
469 | which must be protected from garbage collection. | |
470 | .pp | |
471 | A slight digression on the garbage collector: | |
472 | The person who writes code for the lisp system must always be aware of | |
473 | his greatest enemy: the garbage collector. | |
474 | Whenever a function is called that could possibly allocate more lisp | |
475 | data, one must assume that it | |
476 | when it attempts to allocate space, the garbage collector | |
477 | will be invoked and that it will take away everything that isn't protected | |
478 | from garbage collection. | |
479 | The objects that are protected from garbage collection are: the | |
480 | namestack, the bindstack, the oblist (table of interned symbols), | |
481 | and the compiler literals. Objects that are only | |
482 | referred to by values in registers or | |
483 | the C runtime stack will not be seen by the mark phase of the | |
484 | garbage collector and will be swept up during the sweep phase. | |
485 | .pp | |
486 | Back to the namestack: | |
487 | The first free element on the namestack is pointed to by the | |
488 | C variable | |
489 | .i np . | |
490 | This variable is always stored in the VAX register r6. | |
491 | Another pointer into the namestack is the C variable | |
492 | .i lbot . | |
493 | It is always stored in VAX register r7. | |
494 | Its use will be described in the section on calling conventions. | |
495 | .sh +0 Bindstack | |
496 | .pp | |
497 | The bindstack is a stack of lisp object pairs: symbol and | |
498 | value. | |
499 | It is used to implement shallow binding. | |
500 | When a symbol is lambda-bound, the symbol and its current value are put | |
501 | on this stack. | |
502 | Then the symbol can be given a new value. | |
503 | When the context of the lambda binding is over, the symbol and value pair | |
504 | are popped from the stack and the symbol is given its old value. | |
505 | The C variable | |
506 | .i bnp | |
507 | points to the first free entry on the bindstack. | |
508 | In the C code, the following macros lambda-bind a symbol to a value | |
509 | and restore the old value: | |
510 | .(b | |
511 | #define PUSHDOWN(atom,value)\e | |
512 | {bnp->atm=(atom);\e | |
513 | bnp++->val=(atom)->a.clb;\e | |
514 | (atom)->a.clb=value;\e | |
515 | if(bnp>bnplim) binderr();} | |
516 | ||
517 | #define POP\e | |
518 | {--bnp; bnp->atm->a.clb=bnp->val;} | |
519 | .)b | |
520 | .sh -1 Bitmap | |
521 | .pp | |
522 | The bitmap is used in garbage collection to hold | |
523 | the mark bits for the mark and sweep garbage collector. | |
524 | As its names implies, it is viewed as a collection of bits. | |
525 | The minimum size of a lisp data object is 4 bytes, thus | |
526 | there are 128 of those on a VAX page of 512 bytes. | |
527 | Since there are 8 bits in a byte, it takes 16 bytes to obtain 128 bits. | |
528 | Therefore the size of the bitmap in bytes is 16 | |
529 | times the maximum number of | |
530 | pages. | |
531 | Like the typetable, the bitmap keeps track of marks from the | |
532 | bottom of memory, not the bottom of data space. | |
533 | The bitmap and the typetable are static structures. | |
534 | It is their size, which is determined when the C kernel is built, | |
535 | which determines the size to which the data segment can grow. | |
536 | .sh +0 Transfer\ Tables | |
537 | .pp | |
538 | Transfer tables are used by compiled lisp code as a transfer vector to | |
539 | other functions. | |
540 | A transfer table consists of pairs of entries: symbol and pointer to | |
541 | function. | |
542 | When a compiled lisp function calls another (non-local) function, it | |
543 | calls indirectly through an entry in the transfer table. | |
544 | Depending on the setting of certain status variables, the call may | |
545 | bring control into a function linkage routine or directly to | |
546 | the function desired (if that function is a compiled lisp or C function). | |
547 | .pp | |
548 | Transfer tables serve a number of useful purposes. | |
549 | .np | |
550 | They allow compiled code to call interpreted code | |
551 | or compiled code using the same calling sequence. | |
552 | .np | |
553 | They allow the selection of which function to call to be determined | |
554 | at runtime based on the name of the function. | |
555 | In most other languages, this selection process is done at either | |
556 | compile or load time. | |
557 | .np | |
558 | They allow the user to run in a debugging mode where all calls | |
559 | from compiled code go through the interpreter. | |
560 | Once control reaches the interpreter, it is easier to debug. | |
561 | .np | |
562 | They allow the user to run in a production mode, where all calls | |
563 | from compiled to other compiled code are done without function | |
564 | lookup overhead. | |
565 | .np | |
566 | They allow the user to switch between debugging and production modes | |
567 | at runtime with one function call. | |
568 | .pp | |
569 | Transfer tables will be described further in the section on | |
570 | the lisp compiler. | |
571 | .sh +0 Catch\ frames | |
572 | .pp | |
573 | Lisp has a number of forms of non-local transfers. | |
574 | Among them are | |
575 | .i throw , | |
576 | .i error , | |
577 | .i return | |
578 | and | |
579 | .i go . | |
580 | If a program is willing to accept a non-local transfer, it puts a | |
581 | .i catch | |
582 | .i frame | |
583 | on the stack which describes which type of transfer it | |
584 | accepts. | |
585 | The catch frame describes the current state of the registers, | |
586 | the variables np, lbot, and bnp, and also contains entries that describe | |
587 | what kinds of non-local transfers the function will accept. | |
588 | After creating a catch frame, the program goes on to evaluate forms. | |
589 | Should the evaluation of one of those forms result in a non-local | |
590 | transfer to the catch frame that this program has set up, the | |
591 | system will restore the namestack and bindstack to the way | |
592 | they were when the program put the catch frame on the stack (by using | |
593 | np and bnp) and will return control to the program (setting | |
594 | the variable retval to describe why it returned). | |
595 | This is described further in the | |
596 | section on interpreter conventions. | |
597 | .pp | |
598 | The C variable | |
599 | .i errp | |
600 | points to the most recent catch frame, and then each catch frame points | |
601 | to the previous catch frame. | |
602 | .sh +0 oblist | |
603 | .pp | |
604 | Normally when symbols are created they are | |
605 | .i interned , | |
606 | that is they are put in a hash table called an oblist (or obarray). | |
607 | The purpose of the oblist is to insure that if you type a | |
608 | symbol's name to the reader, you will always get the same symbol. | |
609 | Also it protects the symbol from garbage collection. | |
610 | The oblist is simply a hash table with buckets, with a hash link being | |
611 | part of the symbol data structure. | |
612 | Currently the hash table is not accessible to a lisp program, but with | |
613 | a little work it could be. | |
614 | .+c Interpreter | |
615 | .sh 0 _ | |
616 | .nr $1 \n(ch | |
617 | .pp | |
618 | The interpreter is composed of these parts: | |
619 | .ip \fIcore:\fP | |
620 | The functions eval, apply and funcall. | |
621 | .ip \fIstack\ management:\fP | |
622 | The code to create catch frames and handle non-local transfers. | |
623 | .ip \fImemory\ management:\fP | |
624 | The code to allocate and garbage collect memory. | |
625 | .ip \fIthe\ functions:\fP | |
626 | The user callable functions that expect lisp arguments and return | |
627 | lisp values. | |
628 | .pp | |
629 | In the above list, the first three are written mainly in C (with a little | |
630 | assembler) and the last is written mainly in Lisp with a little | |
631 | bit in C | |
632 | and a very small amount in assembler. | |
633 | .pp | |
634 | The core functions are the center of activity in the interpreter. | |
635 | The | |
636 | .i eval | |
637 | function given a lisp expression to evaluate. | |
638 | It must determine if it is a simple expression such as a symbol or number | |
639 | whose value eval can determine immediately, or if it is an function | |
640 | calling expression. | |
641 | If the form is a function call, eval must determine if the arguments should | |
642 | be evaluated and if so eval must recursively call itself to evaluate the | |
643 | arguments and then stack them. | |
644 | If the function called is to be interpreted, eval must lambda-bind the | |
645 | formal variables of the function to the arguments just stacked. | |
646 | If the function being called is compiled, eval just calls the function and | |
647 | lets the function do the lambda binding if it wants to. | |
648 | .pp | |
649 | The | |
650 | .i apply | |
651 | function is passed two lisp objects: | |
652 | a function to call (either a symbol or a functional object) | |
653 | and a list of arguments already evaluated. | |
654 | It will do lambda binding if the function to call is to | |
655 | be interpreted. | |
656 | .pp | |
657 | The | |
658 | .i funcall | |
659 | function is passed any number of lisp objects, | |
660 | the first of which is the function to call, and the rest are the | |
661 | arguments to the function | |
662 | which have already been evaluated and stacked. | |
663 | Funcall will do lambda binding if the function to call is | |
664 | to be interpreted. | |
665 | .pp | |
666 | When compiled lisp code calls a function | |
667 | which must be interpreted, it enters the interpreter through the | |
668 | funcall function. | |
669 | The interpreter may go to compiled code through either eval, apply or | |
670 | funcall, though most often it goes through eval. | |
671 | .sh 2 Conventions | |
672 | .pp | |
673 | These are the conventions used in interpreted and compiled code. | |
674 | .sh +1 C\ conventions | |
675 | .pp | |
676 | Our conventions are extensions of the C calling conventions, so we | |
677 | describe here the conventions for C. | |
678 | The VAX has 16 general | |
679 | purpose registers. | |
680 | Registers r12 through r15 are reserved | |
681 | for use by the hardware because we use the | |
682 | .i calls | |
683 | subroutine call. | |
684 | Registers r0-r5 can be used by a program without saving them first. | |
685 | The result of a function call is returned in r0. | |
686 | Registers r11-r6 are allocated (from high to low) | |
687 | by the C compiler when a | |
688 | .i register | |
689 | declaration is made in the C code. | |
690 | Registers r11-r6 must be | |
691 | saved upon entry and restored upon exit if they are used within a function. | |
692 | On the VAX it is very easy to preserve registers | |
693 | since it is done automatically | |
694 | by the hardware if the | |
695 | .i calls | |
696 | (or | |
697 | .i callg ) | |
698 | instruction is used. | |
699 | The first short word (two bytes) of a subroutine is a | |
700 | register save mask which | |
701 | tells which registers should be saved (on the C runtime stack) upon | |
702 | entry and restored when a | |
703 | .i ret | |
704 | instruction is done. | |
705 | .sh +0 np\ and\ lbot | |
706 | .pp | |
707 | Earlier we mentioned that the C variables np and lbot are | |
708 | always stored in | |
709 | registers r6 and r7. | |
710 | It is very difficult to globally assign a variable to a register | |
711 | in the C language (no external register declarations | |
712 | are permitted) | |
713 | and thus we have to filter | |
714 | the output of the C compiler and convert all occurrences of 'np' to 'r6' | |
715 | and all occurrences of 'lbot' to 'r7'. | |
716 | This is only half the job, though. | |
717 | We also must modify the register save masks for those routines which | |
718 | alter the value of np or lbot to insure that those registers get | |
719 | saved and restored upon function entry and exit. | |
720 | This is done in the C code by putting | |
721 | .(b C | |
722 | Savestack(n) | |
723 | .)b | |
724 | at the beginning of a routine which will alter np or lbot and which | |
725 | also allocates n register variables. | |
726 | Also in that routine, before the function returns, we put | |
727 | .(b C | |
728 | Restorestack() | |
729 | .)b | |
730 | This is not really necessary in the VAX, but it is there for other | |
731 | machine implementations which don't have a | |
732 | .i ret | |
733 | function which restores registers. | |
734 | The calls to Savestack are recognized by a filter which processes | |
735 | the output | |
736 | of the C compiler and fixes up | |
737 | the save masks. | |
738 | .sh +0 Function\ calling | |
739 | .pp | |
740 | The following text describes what the conventions are for | |
741 | calling | |
742 | compiled lisp functions, whether they were written in lisp or C. | |
743 | We look at it from the viewpoint of the called function. | |
744 | .pp | |
745 | Upon entry | |
746 | to the compiled functon, | |
747 | lbot points to the first argument and np points to the | |
748 | first free spot on the namestack. | |
749 | If np equals lbot then there are no arguments. | |
750 | Recall that np will be in r6 and lbot in r7. | |
751 | The function is free to alter registers r0 through r5 and should return | |
752 | the result in r0. | |
753 | The function may alter registers r6 through r11 as long as their | |
754 | original values | |
755 | are restored when the function returns. | |
756 | The value of np should always | |
757 | point to the first free entry in the namestack. | |
758 | This is all that is required. | |
759 | The extra conventions followed by | |
760 | the lisp compiler | |
761 | in the code it generates are described in the next chapter. | |
762 | .sh +0 Catch\ frames | |
763 | .pp | |
764 | A catch frame saves the state of execution that a program | |
765 | might want to return to at some later time. | |
766 | A catch frame is stored on the C runtime stack, with the most recent | |
767 | frame pointed to by the C global variable errp. | |
768 | The information saved is | |
769 | .ip \fIargs\fP | |
770 | - one or two optional arguments. The lisp compiler always | |
771 | stacks two | |
772 | arguments since it must know exactly how large the frame is. | |
773 | .ip \fIclass\fP | |
774 | - the class describes what type of frame this is (described below). | |
775 | .ip \fIretaddr\fP | |
776 | - address to return to if returning from this frame | |
777 | .ip \fIolderrp\fP | |
778 | - pointer to next older catch frame on the stack | |
779 | .ip \fIbnp\fP | |
780 | - value of bnp (bindstack pointer) when the frame was created | |
781 | .ip \fInp\fP | |
782 | - value of np when the frame was created | |
783 | .ip \fIlbot\fP | |
784 | - value of lbot when the frame was created. | |
785 | .ip \fIsystem\ dependent\fP | |
786 | - the rest of the information stacked depends on the particular machine. | |
787 | In the case of the VAX, registers r13 through r8 are stacked. (r14 and | |
788 | r15 are the stack pointer and program counter; they are not saved since | |
789 | they can be calculated from the other information). | |
790 | .pp | |
791 | The information in a catch frame is put on the C runtime stack | |
792 | in the precise order given above, and the variable errp points not | |
793 | at the beginning of the entire frame, but to the lbot entry. | |
794 | Thus errp\ +\ 4 points to np. | |
795 | The classes of frames are described below. | |
796 | Each class is defined as a constant in the C code (h/frame.h) whose value | |
797 | is a small integer. | |
798 | .ip F_PROG\ [1] | |
799 | - takes no arguments. This frame marks the beginning of a piece of | |
800 | code which can accept a | |
801 | .i return | |
802 | or a | |
803 | .i go . | |
804 | The interpreter uses this to implement | |
805 | .i prog | |
806 | and | |
807 | .i do | |
808 | and for its primative break loop. | |
809 | The lisp compiler does not use catch frames for progs since it only | |
810 | permits | |
811 | .i returns | |
812 | and | |
813 | .i gos | |
814 | to occur within | |
815 | .i progs | |
816 | or | |
817 | .i dos | |
818 | and thus it can determine how to do the non-local transfer | |
819 | at compile time. | |
820 | .ip F_CATCH\ [2] | |
821 | - this takes one or two arguments and is used to implement both | |
822 | .i catch | |
823 | and | |
824 | .i errset . | |
825 | In both cases | |
826 | the first argument is the tag which is being caught, | |
827 | which in the case of an | |
828 | .i errset | |
829 | is symbol ER%all. | |
830 | An | |
831 | .i errset | |
832 | also | |
833 | supplies a second argument which is non nil if the error message | |
834 | should be printed. | |
835 | .ip F_RESET\ [3] | |
836 | - in the C kernel, the reset function | |
837 | is implemented as a non local transfer to a F_RESET catchframe. | |
838 | When the lisp system is built, the reset function is redefined to | |
839 | do a | |
840 | .i throw. | |
841 | Thus this type of catch frame is rarely used. | |
842 | .ip F_EVAL\ [4] | |
843 | - this has one argument, the form being evaluated. | |
844 | Since stacking this | |
845 | on every eval would be expensive, | |
846 | this type of catch frame is only stacked if a \fI(*rset\ t)\fP | |
847 | has been done and if the value of the symbol | |
848 | .i *rset | |
849 | is non nil. | |
850 | The form being evaluated is stacked so that | |
851 | the necessary information for the | |
852 | .i evalframe | |
853 | function is available and so that | |
854 | .i freturn | |
855 | can return from the context of any pending evaluation. | |
856 | .ip F_FUNCALL\ [5] | |
857 | - this is much like F_EVAL, | |
858 | except the one argument it takes is the | |
859 | name of the function to call. | |
860 | It has the same restrictions on when it is stacked as F_EVAL | |
861 | and for the same reasons. | |
862 | .pp | |
863 | In C, a frame is pushed on the stack with Pushframe, with a call | |
864 | of one of these forms: | |
865 | .(b | |
866 | errp = Pushframe(class); | |
867 | errp = Pushframe(class,arg1); | |
868 | errp = Pushframe(class,arg1,arg2); | |
869 | .)b | |
870 | After the call the C variable | |
871 | .i retval | |
872 | contains the value which describes why this function returned. | |
873 | You must remember that it is possible for this one function call to | |
874 | return more than once! | |
875 | The reasons it can return are these (from h/frame.h): | |
876 | .ip C_INITIAL\ [0] | |
877 | This is the initial call to set up the frame. | |
878 | .ip C_GO\ [1] | |
879 | This will only happen for F_PROG frames. | |
880 | In this case, | |
881 | the C variable lispretval contains a lisp symbol which is the tag | |
882 | to which control should be transferred. | |
883 | If the tag cannot be found in this prog or do body, the | |
884 | tag should be again thrown up the stack to a next higher prog or do. | |
885 | .ip C_RET\ [2] | |
886 | This will only happen for F_PROG frames. | |
887 | In this case, lispretval contains the value to return from this prog. | |
888 | .ip C_THROW\ [3] | |
889 | This will only happen for F_CATCH frames. | |
890 | In this case lispretval contains the value to return from this catch. | |
891 | .ip C_RESET\ [4] | |
892 | This will only happen for F_RESET frames. | |
893 | It simply means that a reset is being done. | |
894 | .ip C_FRETURN\ [5] | |
895 | This will only happen for F_EVAL and F_FRETURN frames. | |
896 | The variable lispretval contains the value to return from this | |
897 | call to | |
898 | .i eval | |
899 | or | |
900 | .i funcall . | |
901 | .pp | |
902 | The call to Pushframe is turned into a simple subroutine call (using | |
903 | the | |
904 | .i jsb | |
905 | instruction instead of the | |
906 | .i calls ) | |
907 | by the filters which alter the code coming out of the C compiler. | |
908 | Thus it may be useful to see what stacking a catch frame really looks | |
909 | like. | |
910 | Here is what the lisp compiler generates | |
911 | to stack the frame for \fI(*catch\ 'tag\ x)\fP | |
912 | .(b | |
913 | movl 0(r8),r0 #move 'tag to r0 | |
914 | pushl $0 # dummy argument | |
915 | pushl r0 # put tag argument on C runtime stack | |
916 | pushl $2 # push F_CATCH | |
917 | jsb _qpushframe # call Pushframe | |
918 | movl r0,_errp # move result into errp | |
919 | .)b | |
920 | .pp | |
921 | Every function which does a Pushframe() must also do a Popframe() | |
922 | before it returns to its caller. | |
923 | This simply removes the frame from the stack. | |
924 | In C it is written: | |
925 | .br | |
926 | .tl ''errp = Popframe()'' | |
927 | in the code generated by the lisp compiler it looks like: | |
928 | .(b | |
929 | movl _errp,r1 # put current errp in r1 | |
930 | movl 12(r1),_errp # put previous errp in errp | |
931 | addl3 $32,r1,sp # pop frame from stack | |
932 | .)b | |
933 | .pp | |
934 | Non-local transfers are done with the Inonlocalgo | |
935 | function. This function always takes three arguments. | |
936 | The | |
937 | first is the return type (one of the types mentioned | |
938 | above which begin with 'C_'). It will be assigned to retval. | |
939 | The second argument is the value to be assigned to lispretval, | |
940 | except in the case of a C_THROW, where the second argument | |
941 | is the tag to throw to and the third argument is the value | |
942 | to assign to lispretval. | |
943 | This function never returns. | |
944 | If it doesn't find a catch frame | |
945 | which matches what it is looking for, it signals an error. | |
946 | The function is called with | |
947 | .i calls | |
948 | and the arguments are stacked on the C runtime stack, not the | |
949 | namestack. | |
950 | .+c Liszt:\ The\ Lisp\ Compiler | |
951 | .sh 0 _ | |
952 | .nr $1 \n(ch | |
953 | .pp | |
954 | The purpose of compiling a lisp function is to create a representation of | |
955 | the function which can be evaluated in less time and perhaps take | |
956 | up less space. | |
957 | There are two approaches to lisp compilers. | |
958 | One is to convert the functions to a compact form, often called | |
959 | .i bytecodes | |
960 | which can be rapidly interpreted. | |
961 | Each bytecode represents a primitive operation in the lisp system. | |
962 | This approach is simple to implement but there is an time penalty | |
963 | in using an interpreter. | |
964 | The other approach is to compiled to machine code. | |
965 | In general, this is not as portable as the bytecode approach but the result | |
966 | generally runs faster. | |
967 | There are two ways of compiling to machine code. | |
968 | One is to place arguments to functions in registers. | |
969 | If there are more arguments than registers, the arguments are put on | |
970 | a stack and a special type of call is made. | |
971 | This method is used in Maclisp. | |
972 | The other method is to assume a stack model, in which | |
973 | arguments to a function are placed on a stack. | |
974 | This is what the | |
975 | .Fr | |
976 | compiler Liszt does. | |
977 | The stack model made it much easier to write parts of the lisp | |
978 | system in the C langauge. | |
979 | It also simplifies the garbage collector since the mark phase doesn't | |
980 | have to locate and peruse all saved registers looking for lisp data. | |
981 | .sh 2 File\ Compiler | |
982 | .pp | |
983 | When a file of lisp expressions is loaded, | |
984 | the | |
985 | .i load | |
986 | function repeatedly reads and evaluates the forms in the | |
987 | file. | |
988 | Some of these evaluations may result in functions being defined, | |
989 | and others may use the functions just defined or previously defined. | |
990 | The job of the lisp compiler is to create an object file, which, | |
991 | when read in | |
992 | by | |
993 | .i fasl, | |
994 | acts just like it was | |
995 | .i load ed | |
996 | in, except when a function is defined it is defined in machine | |
997 | code, not as a lisp expression. | |
998 | This is quite a bit different from what compilers do in other languages | |
999 | and it is done this way to make it easier to | |
1000 | switch between compiled and | |
1001 | interpreted code. | |
1002 | .sh +0 Differences\ between\ compiled\ and\ interpreted | |
1003 | .pp | |
1004 | There are some differences, though, between compiled and interpreted code. | |
1005 | Local variables in compiled code are lexically scoped (put on the | |
1006 | namestack and inaccessible to other functions) unless the variable | |
1007 | has been declared | |
1008 | .i special. | |
1009 | The declaration: | |
1010 | .(b | |
1011 | \fI(declare (special x y))\fP | |
1012 | .)b | |
1013 | declares both x and y to be special from this point in the file on. | |
1014 | The declaration | |
1015 | .(b | |
1016 | \fI(declare (specials t))\fP | |
1017 | .)b | |
1018 | declares all variables to be special. | |
1019 | .pp | |
1020 | Lisp has a very powerful macro definition system. | |
1021 | The compiler will macro expand all it can, whereas the interpreter | |
1022 | expands macros when necessary but never replaces a macro call with | |
1023 | its expansion. | |
1024 | Thus if you redefine a macro, the compiled code that uses it will | |
1025 | still work the same way, but the interpreted code will use the | |
1026 | new definition. | |
1027 | Also, when compiling a file, macro definitions must be | |
1028 | done before any call on the macro is encountered during compiling. | |
1029 | In the interpreter, | |
1030 | macros can be defined or redefined anytime up until | |
1031 | they are used. | |
1032 | Thus the interpreter is far freer about macro definitions than | |
1033 | the compiler. | |
1034 | This could cause programs which work interpreted to | |
1035 | fail compiled. | |
1036 | .sh +0 Top\ level\ algorithm | |
1037 | .pp | |
1038 | The top level algorithm of the compiler is simply this: | |
1039 | .np | |
1040 | read a lisp expression from the input file | |
1041 | .np | |
1042 | macro expand the top level of the form as much as possible | |
1043 | .np | |
1044 | if the form is a function definition (a list beginning with | |
1045 | the symbol 'def') | |
1046 | then compile it. | |
1047 | .np | |
1048 | if the form is not a function definition then put it on a list of | |
1049 | forms that will be evaluated when the file is | |
1050 | .i fasl ed | |
1051 | in. | |
1052 | .np | |
1053 | if not at end of file, go to step 1. | |
1054 | .pp | |
1055 | In reality, step 3 is a bit more complex. | |
1056 | If the definition is of a macro, then the form will be evaluated | |
1057 | immediately, thus adding the macro definition to the compiler's | |
1058 | environment. | |
1059 | If the lisp variable | |
1060 | .i macros | |
1061 | is t then the macro will also be compiled. | |
1062 | There are also some forms like \fI(progn\ 'compile\ ...)\fP | |
1063 | and \fI(eval-when\ ()\ )\fP which determine when the | |
1064 | forms get compiled and evaluated. | |
1065 | See the lisp manual for details. | |
1066 | .sh +0 Expression\ Compilation | |
1067 | .pp | |
1068 | Just as the interpreter is centered around the | |
1069 | .i eval | |
1070 | function, the lisp compiler is centered around the function | |
1071 | .i d-exp | |
1072 | whose job it is to compile the expression passed to it. | |
1073 | The lisp compiler is one pass. | |
1074 | Before a call to d-exp returns, | |
1075 | all the compiled code for a form has been computed and written | |
1076 | to a file. | |
1077 | One reason that the lisp compiler can be a single pass compiler | |
1078 | is that the assembler which reads the compiler's output is | |
1079 | a two pass assembler. | |
1080 | .sh +1 global\ state\ variables | |
1081 | .pp | |
1082 | There are a number of variables that maintain the state of | |
1083 | the compilation process. | |
1084 | These variables are declared special and are thus dynamically scoped | |
1085 | and visible to any function within the compiler. | |
1086 | When d-exp | |
1087 | is called their meanings are: | |
1088 | .ip \fIv-form\fP | |
1089 | - contains the expression to be compiled. | |
1090 | .ip \fIg-loc\fP | |
1091 | - tells where the result of evaluating this expression should be put. | |
1092 | If g-loc is nil, then the value returned is unimportant and shouldn't | |
1093 | be put anywhere. | |
1094 | .ip \fIg-cc\fP | |
1095 | - controls conditional branches. | |
1096 | If g-cc is non nil, then it is a list cell whose value has | |
1097 | either a non-null car or non-null cdr but not both. | |
1098 | If the car is non-nil then | |
1099 | if the value of the expression held in | |
1100 | v-form is non-nil, a branch should be done to the symbol | |
1101 | stored in \fI(car\ g-cc)\fP. | |
1102 | If the cdr is non-nil then if the value of v-form | |
1103 | is nil, a branch should be done to the symbol stored in | |
1104 | \fI(cdr\ g-cc)\fP. Since at | |
1105 | every conditional | |
1106 | branch control should either jump or continue, the car and cdr will | |
1107 | never both be specified. | |
1108 | .ip \fIg-ret\fP | |
1109 | - is non nil if the result of evaluating v-form will be returned as the | |
1110 | value of the function we are evaluating. | |
1111 | This is used to allow liszt to detect | |
1112 | when tail recursion removal is possible. | |
1113 | .ip \fIg-locs\fP | |
1114 | - maintains current information about the state of the stacks: | |
1115 | the bindstack (for specials), the namestack (for locals) and the | |
1116 | C runtime stack (for catch frames) | |
1117 | The form of g-locs is a stack of tokens stored as a list. | |
1118 | The meaning of each token is as follows: | |
1119 | .br | |
1120 | \fInil\fP - this represents an unnamed object on the namestack. This happens | |
1121 | when calling functions, when the arguments are stacked prior | |
1122 | to a function call. | |
1123 | .br | |
1124 | \fI<symbol>\fP - the given symbol is the name of a local variable on the namestack. | |
1125 | .br | |
1126 | \fI(prog . <n>)\fP - prog statement which binds <n> special variables | |
1127 | .br | |
1128 | \fI(do . <n>)\fP - head of a do statement which binds <n> special variables | |
1129 | .br | |
1130 | \fI(catcherrset . 0)\fP - catch or errset frame on stack | |
1131 | .br | |
1132 | \fI(lambda . <n>)\fP - lambda expression which binds <n> special variables | |
1133 | .ip \fIg-labs\fP | |
1134 | - this is a stack of labels. | |
1135 | There is one entry in g-labs for every entry which is a list | |
1136 | in g-locs. | |
1137 | the forms of the entries are: | |
1138 | .br | |
1139 | \fInil\fP - no labels in this form | |
1140 | .br | |
1141 | \fI(endlab (real . gen) (real2 . gen2) ...)\fP - endlab is the label to go to | |
1142 | to get out of this body. After this label will be code | |
1143 | to unbind specials and pop off locals. | |
1144 | The 'real' labels are those found in the code. the gen | |
1145 | labels are those generated and put in the assembler code. | |
1146 | .sh +0 Function\ compilation | |
1147 | .pp | |
1148 | The action d-exp takes when compiling an expression depends on the | |
1149 | type of expression. | |
1150 | For atoms (symbols and numbers) the compilation is very simple, it | |
1151 | is just a matter of putting the value where g-loc specifies and | |
1152 | then jumping if specified by g-cc. | |
1153 | When the expression is a list, d-exp first macro | |
1154 | expands the form and then looks at the first element of | |
1155 | the list (if the list has not macro expanded to an atom). | |
1156 | If the first element is a symbol then the list is | |
1157 | is a function call. | |
1158 | If the function is one of the functions which liszt knows how to open compile | |
1159 | then liszt will call the particular routine designated to open | |
1160 | compile this function. | |
1161 | There are two classes of functions within liszt that | |
1162 | do open compiling. | |
1163 | The first class, the fl-expr class, are distinguished by names which | |
1164 | begin with c-. | |
1165 | These functions always generate | |
1166 | code which returns the result in r0. | |
1167 | They are not equipped to obey the contents of g-loc and g-cc. | |
1168 | Thus d-exp, after calling one of these functions, must do what | |
1169 | is necessary to obey g-loc and g-cc. | |
1170 | The other class of open compiling functions, the fl-exprcc class (whose | |
1171 | names begin with cc-), | |
1172 | handle g-loc and g-cc. | |
1173 | As a result these are usually much more complex and generate better code. | |
1174 | .sh -1 Register\ Conventions | |
1175 | .pp | |
1176 | The register conventions used by liszt are an extension of those | |
1177 | used by the C code. | |
1178 | .ip \fIr0\fP | |
1179 | - return value placed here | |
1180 | .ip \fIr1,r2,r3,r4\fP | |
1181 | - scratch registers. | |
1182 | When long strings of | |
1183 | .i car's | |
1184 | or | |
1185 | .i cdr's | |
1186 | are done (such as | |
1187 | .i caddadr ) | |
1188 | these registers are used in a least-recently-used fashion to access down | |
1189 | the list. | |
1190 | The compiler keeps track of the values in these registers but must | |
1191 | assume that they are garbage after a function is called. | |
1192 | .ip \fIr5\fP | |
1193 | - fixnum accumulator. | |
1194 | There a number of functions which work on fixnum's only and these | |
1195 | usually put their result in r5. | |
1196 | The assembler code function | |
1197 | .i qnewint | |
1198 | which returns a pointer to a cell containing a fixnum value (after | |
1199 | checking if it is in the fixnum table), expects its argument to be | |
1200 | in r5. | |
1201 | .ip \fIr6\fP | |
1202 | np | |
1203 | .ip \fIr7\fP | |
1204 | lbot. | |
1205 | When calling a function, this register is set just before the function | |
1206 | call, and after the function call its value is used to reset the value | |
1207 | of np in order to pop the arguments off the namestack. | |
1208 | .ip \fIr8\fP | |
1209 | the literal table pointer. | |
1210 | The compiler generates a table of all the literal lisp data | |
1211 | which the compiled code might access. | |
1212 | Upon function entry a pointer to the base of this table is put in r8. | |
1213 | For example, if we compile \fI(setq\ x\ 'foo)\fP then we will | |
1214 | need a pointer to the lisp symbol | |
1215 | .i foo | |
1216 | and if the symbol | |
1217 | .i x | |
1218 | as been declared special we will also need a pointer to | |
1219 | .i x . | |
1220 | .ip \fIr10\fP | |
1221 | - upon function entry the value of lbot (r7) is put in r10. | |
1222 | This allows us to reference the arguments to our function while | |
1223 | using lbot to call other function. | |
1224 | .sh +0 Addresses | |
1225 | There are four types of addresses used internally in Franz: | |
1226 | symbolic, intermediate addresses (iadr), extended intermediate (eiadr) | |
1227 | and vax assembler format. | |
1228 | .sh +1 Symbolic | |
1229 | .pp | |
1230 | These are the names of lisp objects, such as `a', `foo', 34, | |
1231 | or 3.234234. | |
1232 | A name is either a local variable, a special variable or a | |
1233 | number. A number is either a small fixnum, large fixnum, | |
1234 | bignum or floating point number. | |
1235 | .sh +0 Intermediate\ address\ (IADR) | |
1236 | .pp | |
1237 | This type of address is generated from a symbolic address by | |
1238 | .i d-loc, | |
1239 | .i d-loclit | |
1240 | and the routines | |
1241 | .i d-simple | |
1242 | and | |
1243 | .i d-rsimple | |
1244 | which | |
1245 | call them. The forms are | |
1246 | .ip \fINil\fP | |
1247 | - the location or value of nil. | |
1248 | .ip \fIT\fP | |
1249 | - the location or value of t. | |
1250 | .ip \fIreg\fP | |
1251 | - register r0, which is where the result of function calls | |
1252 | are stored. | |
1253 | .ip \fIstack\fP | |
1254 | - the address pointed to by np with np incremented after | |
1255 | the value is stored. (i.e (r6)+) | |
1256 | .ip \fIunstack\fP | |
1257 | - the address one word below np (np is decremented before | |
1258 | accessing. (i.e. -(r6)) | |
1259 | .ip \fI<symbol>\fP | |
1260 | - this is just <symbol>. This allows strange forms to be | |
1261 | represented directly. | |
1262 | .ip \fI(stack\ n)\fP | |
1263 | - The n'th value on the namestack for this function. | |
1264 | The first value 0(r10) is (stack 1). | |
1265 | .ip \fI(vstack\ n)\fP | |
1266 | - The cdr of the n'th value on the namestack. | |
1267 | That is, (stack 1) is *0(r10) | |
1268 | .ip \fI(bind\ n)\fP | |
1269 | - The value of the n'th value in | |
1270 | the literal table. If | |
1271 | this refers to a symbol, then this is the value | |
1272 | of the symbol. | |
1273 | If this refers to a list then this | |
1274 | this is the cdr of the list (although this is a rare | |
1275 | use). (bind 1) corresponds to *0(r8). | |
1276 | .ip \fI(lbind\ n)\fP | |
1277 | - The n'th value in the literal table. If | |
1278 | this refers to | |
1279 | object foo then this is the address of foo | |
1280 | in memory. | |
1281 | .ip \fI(fixnum\ n)\fP | |
1282 | - This is the address of small fixnum n in memory. | |
1283 | A small fixnum is in the range -1024 to 1023. | |
1284 | .ip \fI(immed\ n)\fP | |
1285 | - The is the immediate constant n. | |
1286 | .sh +0 extended\ intermediate\ address\ (EIADR) | |
1287 | .pp | |
1288 | This address is generated from an IADR by e-cvt. It | |
1289 | symbolically represents a vax address. | |
1290 | .ip \fI<atom>\fP | |
1291 | - This corresponds to <atom>. | |
1292 | .ip \fI(n\ <atom>)\fP | |
1293 | - This corresponds to n(<atom>) | |
1294 | (stack n+1) and (lbind n+1) are converted to these forms. | |
1295 | .ip \fI(n\ <atom1>\ <atom2>)\fP | |
1296 | - This corresponds to n(<atom1>)[<atom2>] | |
1297 | There is currently no IADR which generates this. | |
1298 | .ip \fI(*\ n\ <atom>)\fP | |
1299 | -This corresponds to *n(<atom>) | |
1300 | (vstack n+1) and (bind n+1) are converted to these forms. | |
1301 | .ip \fI(+\ <atom>)\fP | |
1302 | - This corresponds to (<atom>)+. | |
1303 | stack is converted to this form. | |
1304 | .ip \fI(-\ <atom>)\fP | |
1305 | - This corresponds to -(<atom>) | |
1306 | unstack is converted to this form. | |
1307 | .ip \fI($\ <numb>)\fP | |
1308 | - This corresponds to $<atom> | |
1309 | (immed <numb>) is converted to this form. | |
1310 | .ip \fI(#\ <numb>)\fP | |
1311 | - This corresponds to $<loc> where <loc> equals | |
1312 | the base of the fixnums (0x1400) plus 4 * <numb> | |
1313 | (fixnum <numb>) is converted to this form | |
1314 | .ip \fI(*\ #\ <numb>)\fP | |
1315 | - This corresponds to $<numb>. It is generated | |
1316 | by d-rsimple occasionally when you ask for the | |
1317 | cdr of a number (which you do sometimes when you | |
1318 | are compiling fixnum operators). | |
1319 | .sh +0 Vax\ Unix\ assembler\ addresses | |
1320 | .pp | |
1321 | The addresses are printed from a EIADR by e-cvtas. The conversions | |
1322 | are shown in the above section. | |
1323 | .sh -1 Function\ calling\ convention | |
1324 | .sh +1 Function\ linkages | |
1325 | .pp | |
1326 | The function associated with a symbol is stored in the function | |
1327 | definition slot of the symbol. If the function slot contains a | |
1328 | list then the function is to be interpreted and its 'car' will | |
1329 | be lambda, nlambda, lexpr or macro. If the function is compiled then | |
1330 | the function definition slot will contain a binary object. | |
1331 | A binary object | |
1332 | is composed of two independent parts: the discipline and the entry. | |
1333 | The discipline is one of: | |
1334 | .ip \fIlambda\fP | |
1335 | - a function whose arguments should be evaluated. | |
1336 | .ip \fInlambda\fP | |
1337 | - a function whose arguments should not be evaluated but | |
1338 | which should be passed as a list | |
1339 | .ip \fImacro\fP | |
1340 | - a function which should be passed the unevaluated form | |
1341 | being evaluated and whose result should be evaluated. | |
1342 | .ip \fI\"subroutine\"\fP | |
1343 | - a foreign function subroutine | |
1344 | .ip \fI\"integer-function\"\fP | |
1345 | - a foreign function returning an integer | |
1346 | .ip \fI\"real-function"\fP | |
1347 | - a foreign function returning a flonum. | |
1348 | .pp | |
1349 | A lexpr is not in the list as there is no difference to the caller | |
1350 | between compiled lambda's and compiled lexprs. | |
1351 | .sh +0 Transfer\ tables | |
1352 | Most calls from compiled code to other lisp functions go through | |
1353 | transfer tables. A transfer table entry is a pair: symbol, address | |
1354 | of routine. | |
1355 | When another lisp function is called it uses the | |
1356 | .i calls | |
1357 | instruction which will | |
1358 | indirectly jump through the address in the transfer table. This | |
1359 | address may point to the desired function or it may point to the | |
1360 | interface routine. If a call ends up in the interface routine, | |
1361 | then that routine will trace back through the call stack and eventually | |
1362 | reach the tranfer table entry that it was 'called through'. Since the | |
1363 | transfer table entry contains a symbol which names the function | |
1364 | to be called, the interface routine can determine which routine | |
1365 | was to have been called. If that routine is compiled then the | |
1366 | interface routine can modify the transfer table so that a call | |
1367 | through that table entry will go directly to the desired function. | |
1368 | If the routine to be called is interpreted, or if the user has | |
1369 | requested that transfer linkages should be disabled, then the | |
1370 | interface routine will go through the 'funcall' function | |
1371 | in the interpreter to continue the call. | |
1372 | .sh +0 calling\ sequence\ in\ the\ compiler: | |
1373 | .pp | |
1374 | When transfer tables are used | |
1375 | .(b | |
1376 | \fBforeach\fP arg | |
1377 | \fBcompute\fP arg and stack result using (r6)+ | |
1378 | for example: movl r0,(r6)+ | |
1379 | movab -n(r6),r7 where n = 4*number of args to fcn | |
1380 | calls $0,*trantb+m where m is the index of the function | |
1381 | in the transfer table. | |
1382 | movl r7,r6 restore r6 | |
1383 | .)b | |
1384 | .pp | |
1385 | The compiler supports local functions, which are function accessible | |
1386 | only within one file. | |
1387 | Because they are not accessible from C code, we can use a very fast | |
1388 | call and return sequence when calling them. | |
1389 | To call a local function | |
1390 | .(b | |
1391 | \fBforeach\fP arg | |
1392 | \fBcompute\fP and stack using (r6)+ | |
1393 | jsb LOCALNAME go directly to the function, it will | |
1394 | restore r6 before it returns. | |
1395 | .)b | |
1396 | .pp | |
1397 | The beginning of each function looks as follows. | |
1398 | First for a non-lexpr function called in the standard way | |
1399 | (topsim is the label jumped to when tail merging, it will be unique | |
1400 | for each function; the brackets indicate the optional code which | |
1401 | exists if the -p switch is given to liszt); | |
1402 | .(b | |
1403 | .word 0x5c0 # save r6, r7, r8, r10 | |
1404 | [ movab mcounts,r0 # if profiling, mcounts replaced by fasl | |
1405 | jsb mcount ] | |
1406 | movab linker,r8 # set up r8 | |
1407 | movl r7,r10 # set up oldlbot | |
1408 | movab n(r10),r6 # n = 4*Number of args expected. | |
1409 | topsim: | |
1410 | .)b | |
1411 | .pp | |
1412 | Now for lexprs: | |
1413 | .(b | |
1414 | .word 0x5c0 | |
1415 | [ movab mcounts,r0 # if profiling. [mcounts replaced by fasl] | |
1416 | jsb mcount ] | |
1417 | movab linker,r8 # set up r8 | |
1418 | subl3 $4,r7,-(sp) # address one word below bottom of args | |
1419 | topsim: | |
1420 | movl r6,r10 # first free addr to arg base | |
1421 | subl3 r7,r6,r0 # number of args * 4 into r0 | |
1422 | movab 0x1400(r0),(r6)+ # stack boxed number of args | |
1423 | movl 0(r10),-(sp) # also store on stack so user can't clobber | |
1424 | .)b | |
1425 | .pp | |
1426 | And finally for local functions: | |
1427 | .(b | |
1428 | movl r10,-(sp) # save r10 | |
1429 | movab -n(r6),r10 # set up arg base using arg top | |
1430 | topsim: | |
1431 | .)b | |
1432 | .sh -1 Assembler\ file\ format | |
1433 | .pp | |
1434 | The liszt compiler generates a file which is in Unix assembler format. | |
1435 | The Unix assembler converts that file into an object file which fasl | |
1436 | then reads. | |
1437 | Although the object file generated is a standard Unix object | |
1438 | file (as defined in /usr/include/a.out.h), it is not of a | |
1439 | format that the Unix ld loader can understand. | |
1440 | This is because the requirements of a lisp object file are different | |
1441 | from an object file of other languages. | |
1442 | The | |
1443 | run time semantics of lisp and the fact that lisp data must be protected | |
1444 | from garbage collection are the principal differences. | |
1445 | The unconventional object file created by the unix assembler | |
1446 | is a result of the | |
1447 | unconventional assembler input file. | |
1448 | Next we will look at what must be put in the assembler file and how | |
1449 | it is put there. | |
1450 | .pp | |
1451 | The assembler file must contain | |
1452 | .ip \fIinstructions\fP | |
1453 | - vax assembler code for the compiled functions. | |
1454 | If there aren't any functions compiled, this can be empty. | |
1455 | .ip \fIliterals\fP | |
1456 | - lisp data which is referred to by compiled code | |
1457 | .ip \fItransfer\ table\fP | |
1458 | - the names of the functions which correspond to the calls through | |
1459 | the transfer table. | |
1460 | The instructions simply say 'call indirect through the nth transfer | |
1461 | table entry'. | |
1462 | .ip \fIfunction\ names\fP | |
1463 | - the names of the functions which are being | |
1464 | defined. | |
1465 | .ip \fIload\ time\ forms\fP | |
1466 | - in order to mimic the | |
1467 | .i load | |
1468 | function, fasl has to be able to evaluate lisp expressions at fasl | |
1469 | time, so we must be able to store lisp expressions in the assembler | |
1470 | file and indicate when they should be evaluated. | |
1471 | .pp | |
1472 | Based on the requirements above, the form of the assembler file | |
1473 | is as described below. | |
1474 | The assembler builds two segments: text and data. | |
1475 | We put all of our information in the text segment. | |
1476 | The compiler places some annotation strings in the data segment | |
1477 | so that the object file can be identified, however the data segment | |
1478 | is ignored by fasl. | |
1479 | The format is | |
1480 | .ip \fIcompiled\ instructions\fP | |
1481 | The instructions for each compiled (non-local) function begins with | |
1482 | .(b | |
1483 | .globl F00003 | |
1484 | F00003: | |
1485 | .word 0x5c0 | |
1486 | .)b | |
1487 | The globl declaration and the fact that the symbol name begins | |
1488 | with a F will tell fasl that this is the beginning of a lisp | |
1489 | function. | |
1490 | The symbols beginning with F must be unique within a file | |
1491 | but may be duplicated in other files. | |
1492 | The lisp name of the function will appear later. | |
1493 | Next the instructions for the function are given. | |
1494 | Only a small fixed set of external symbols may be referenced. | |
1495 | The list is found in the code for nfasl.c and the common | |
1496 | ones will be described below (soon). | |
1497 | Labels should be given a name beginning with L and should | |
1498 | be unique within the file. | |
1499 | .ip \fItable\ sizes\fP | |
1500 | somewhere in the file there should be a pair of 'set' assembler | |
1501 | pseudo ops which describe the sizes of the literal table and | |
1502 | transfer table. | |
1503 | The form is this | |
1504 | .(b | |
1505 | .set linker_size 4 | |
1506 | .set trans_size 3 | |
1507 | .)b | |
1508 | where linker_size is the number of entries in the literal table | |
1509 | which will be required and trans_size is the number of entries | |
1510 | in the transfer table which will be required. | |
1511 | Those tables will be built by fasl. | |
1512 | .ip \fIbinder\ table\fP | |
1513 | - this table describes the order that the functions should be | |
1514 | defined and forms evaluated. | |
1515 | It is a table of longwords beginning at the symbol bind_org | |
1516 | and ending when a -1 entry is seen. | |
1517 | The meaning of the entries will be described below. | |
1518 | .ip \fIlisp\ expression\ table\fP | |
1519 | - this is a table of null terminated strings beginning at the | |
1520 | symbol lit_org and ending at the symbol lit_end. | |
1521 | Each string is read by the lisp read function (using the | |
1522 | raw readtable). | |
1523 | The first linker_size expressions are put in the literal table. | |
1524 | The next trans_size expressions are the names of the | |
1525 | functions to put in the transfer table. | |
1526 | The remaining expressions correspond to the entries in the | |
1527 | binder table. | |
1528 | The binder entries are processed, one by one. | |
1529 | Provided that the binder entry is not -1, an expression | |
1530 | is read from the lisp expression table. | |
1531 | Then if the binder table entry is 0, that expression is the name of | |
1532 | a lambda type lisp function. | |
1533 | A binary object is created, the discipline is set to lambda and | |
1534 | the function address is set to the lexically next function | |
1535 | defined in the | |
1536 | file. | |
1537 | If the binder entry is 1 then this is an nlambda function, and | |
1538 | if the entry is 2 then this is a macro function. | |
1539 | If the entry is 99 then the expression just read should be evaluated | |
1540 | by the lisp function eval. | |
1541 | .pp | |
1542 | The lisp compiler normally puts the assembler format output file in /tmp | |
1543 | and removes it when it is done. | |
1544 | The -S switch will tell liszt to write the | |
1545 | assembler file in the same place as | |
1546 | the source file, and with the same root name but a .s extension. | |
1547 | The -C file will tell the lisp compiler to comment the file as it | |
1548 | generates it, making it easier to understand what is going on. | |
1549 | Assume that the following is file foo.l: | |
1550 | .(b | |
1551 | (defun foo (x) (bar y) (baz k)) | |
1552 | (print (foo 3)) | |
1553 | (def test (nlambda (l) (print 'hithere) (foo 3))) | |
1554 | .)b | |
1555 | we now compile it with -SC | |
1556 | .(b | |
1557 | .globl F00007 #(fcn lambda foo) | |
1558 | F00007: | |
1559 | .word 0x5c0 | |
1560 | movab linker,r8 | |
1561 | movl r7,r10 | |
1562 | movab 4(r10),r6 | |
1563 | L00008: | |
1564 | movl *0(r8),(r6)+ #(calling bar) | |
1565 | #(from y to stack) | |
1566 | movab -4(r6),r7 | |
1567 | calls $0,*trantb+0 | |
1568 | movl r7,r6 | |
1569 | movl *4(r8),(r6)+ #(calling baz) | |
1570 | #(from k to stack) | |
1571 | movab -4(r6),r7 | |
1572 | calls $0,*trantb+8 | |
1573 | movl r7,r6 | |
1574 | ret | |
1575 | .globl F00009 #(fcn nlambda test) | |
1576 | F00009: | |
1577 | .word 0x5c0 | |
1578 | movab linker,r8 | |
1579 | movl r7,r10 | |
1580 | movab 4(r10),r6 | |
1581 | L00010: | |
1582 | movl 8(r8),(r6)+ #(calling print) | |
1583 | #(from 'hithere to stack) | |
1584 | movab -4(r6),r7 | |
1585 | calls $0,*trantb+16 | |
1586 | movl r7,r6 | |
1587 | movl $5132,(r6)+ #(calling foo) | |
1588 | #(from (fixnum 3) to stack) | |
1589 | movab -4(r6),r7 | |
1590 | calls $0,*trantb+24 | |
1591 | movl r7,r6 | |
1592 | ret | |
1593 | bind_org: | |
1594 | .set linker_size, 3 | |
1595 | .set trans_size, 4 | |
1596 | .long 0 | |
1597 | .long 99 | |
1598 | .long 1 | |
1599 | .long -1 | |
1600 | lit_org: | |
1601 | .asciz "y" | |
1602 | .asciz "k" | |
1603 | .asciz "hithere" | |
1604 | .asciz "bar" | |
1605 | .asciz "baz" | |
1606 | .asciz "print" | |
1607 | .asciz "foo" | |
1608 | .asciz "foo" | |
1609 | .asciz "(print (foo 3))" | |
1610 | .asciz "test" | |
1611 | lit_end: | |
1612 | .data # this is just for documentation | |
1613 | .asciz "@(#)Compiled by Lisp Compiler 7.1 on Sun Feb 21 17:51:54 1982" | |
1614 | .asciz "@(#)decl.l 1.5 2/10/82" | |
1615 | .asciz "@(#)array.l 1.1 9/25/81" | |
1616 | .asciz "@(#)datab.l 1.1 9/25/81" | |
1617 | .asciz "@(#)expr.l 1.1 9/25/81" | |
1618 | .asciz "@(#)io.l 1.1 9/25/81" | |
1619 | .asciz "@(#)funa.l 1.3 2/10/82" | |
1620 | .asciz "@(#)funb.l 1.2 2/10/82" | |
1621 | .asciz "@(#)func.l 1.2 2/10/82" | |
1622 | .asciz "@(#)tlev.l 1.4 10/24/81" | |
1623 | .asciz "@(#)fixnum.l 1.6 10/21/81" | |
1624 | .asciz "@(#)util.l 1.2 10/7/81" | |
1625 | .)b | |
1626 | .sh +0 functions\ callable\ from\ compiled\ lisp\ code | |
1627 | .sh +0 Object\ File\ Format | |
1628 | .sh +0 Fasl | |
1629 | ||
1630 | ||
1631 | ||
1632 | ||
1633 |