BSD 4_2 release
[unix-history] / usr / lisp / franz.n
CommitLineData
4b9ccde7
C
1." franz.n -[Thu Jun 17 11:01:27 1982 by jkf]-
2." this file will contain a description of the franz lisp system.
3." sort of a systems manual.
4.de Fr
5F\s-2RANZ\s0 L\s-2ISP\s0
6..
7.tp
8.(l C
9.sz +2
10\fBThe Franz Lisp System\fP
11.sz -2
12.sp 2v
13\fIby
14\fIJohn K. Foderaro\fP
15.)l
16.sp 3v
17.tl ''Abstract''
18.br
19This document describes the
20.Fr
21system written at The University of California
22at Berkeley.
23Included are descriptions of the memory management, interpreter
24and compiler, the conventions used within the C coded kernel
25and those within the compiled code.
26This does not duplicate the information found in The Franz Lisp Manual.
27.++ C '\s-2Draft\s+2'\fBThe\ Franz\ Lisp\ System\fP'%'
28.+c Characteristics\ of\ \F\s-2RANZ\ \s0L\s-2ISP\s0
29.ls 1
30.pp
31There is no formal standard for lisp systems, thus each lisp
32system is almost guaranteed to be different from any other lisp system.
33In this section we will examine the design decisions which most often
34characterize a lisp system.
35This
36focus does not imply that all other parts of the language are generally
37agreed upon.
38In fact, there is nothing sacred to lisp system designers.
39For example, one usually identifies the symbol
40.i nil
41with the empty list,
42and one usually thinks of lisp as a dynamically scoped language.
43Both of these conventions are are not true in the lisp dialect
44.i NIL
45currently being written at MIT.
46As another example, one imagines that a lisp system must
47use garbage collection to reclaim storage.
48The lisp dialect
49.i ZetaLisp
50that is running on Lisp Machines doesn't
51normally use a garbage collector
52because of the way in which it allocates its address space.
53It is faster to reboot the machine than to do a garbage collection.
54In most lisp dialects the lisp expressions are written in
55parenthesised form. In
56.i Portable
57.i Standard
58.i Lisp
59(PSL) at the University of Utah, programs are written
60primarily
61in an Algol-like
62syntax.
63.pp
64Thus some of the discussion to follow indicates not so much
65.i unique
66charateristics of
67.Fr
68but instead, how decisions were made.
69.sh 2 Scoping\ and\ Binding
70.pp
71The
72.Fr
73interpreter
74uses
75.i dynamic
76.i scoping ,
77that is, after A calls B, B can access all of the
78variables that A allocated
79as well
80as those that A can access, provided B doesn't allocate variables of the same
81names.
82There are two popular ways of implementing dynamic scoping:
83.i deep
84.i binding
85and
86.i shallow
87.i binding .
88Note that we will use the terms variable allocation and lambda
89binding interchangeably in this report.
90The former term is less language-specific than the latter.
91.sh +1 Deep\ Binding
92.pp
93In a deep binding implementation, when a variable is allocated,
94the name of the variable and a space for its value are pushed on the
95top of a stack.
96When a program asks for the value of a variable,
97the lisp system must search down the stack for the first occurrence of
98the variable.
99This system offers great flexibility for the
100programmer since the state of his program
101can be described by the contents of the stack.
102It is thus possible to save the context of a program by just saving
103the stack or in some cases just a pointer to
104a place in the stack.
105The problem with deep binding is that it is time-consuming to search
106down the stack for the value of a variable, and,
107as a result, most systems
108modify the deep binding model by giving variables a global value
109cell and allowing programs to set and access the global cell.
110Some implementations of Interlisp use deep binding.
111.sh +0 Shallow\ Binding
112.pp
113In a shallow binding implementation, each variable has a global value cell
114which contains the current value of the variable.
115When a variable is allocated inside a program, its name
116and old value are pushed on a stack called a
117.i bindstack .
118The variable is then given a new value.
119Throughout the lifetime of the allocation, the current value of the
120variable can be found in the global value cell associated with the variable.
121When leaving the context of the variable's allocation, the old value
122is restored from the
123.i bindstack .
124A shallow binding scheme makes it much cheaper to access
125the values of variables, however it is much more difficult to
126save the state and to restore it.
127.pp
128.Fr
129and most other lisps which are not derived from Interlisp
130use shallow binding.
131Some versions of Interlisp use shallow binding.
132.sh -1 Lisp\ Compiler
133.pp
134Dynamic scoping often is not necessary and it is never cheap, even
135in a shallow binding implementation.
136Thus the
137.Fr
138compiler prefers to do lexical scoping rather than dynamic
139scoping.
140If the user does not specify otherwise, when a
141function is compiled, all variables
142allocated will be
143put on a local stack and will
144.i not
145be accessible by any
146function that this function calls.
147This
148convention results in faster code being generated by
149the compiler, but if the
150user is not careful, it can result in compiled and interpreted code
151working differently.
152In some of the new lisp designs, such as
153.i NIL
154and
155.i Common
156.i Lisp
157the semantics of compiled and interpreted code have been
158unified by
159transferring the compiler semantics (lexical scoping) to the interpreter.
160This is a rather large step
161if dynamic scoping is no
162longer an option, and it is not clear whether the resulting
163language should be called 'lisp'.
164.sh +0 Data\ Types
165.pp
166A complete list of data types in
167.Fr
168can be found in the first chapter of the
169.Fr
170Manual.
171The most important ones are described below.
172.pp
173Lisp is a list processing language and the basic data type is the list
174cell.
175A list cell also goes by the name of cons cell, pair, or dotted pair (dtpr).
176It contains pointers to two lisp objects, and these pointers can
177be accessed via the
178.i car
179and
180.i cdr
181functions.
182Each pointer requires four (8-bit) bytes and thus the list cell is
183eight bytes large.
184The cdr pointer is stored first since this makes it easier
185to 'cdr down a list' using the addressing modes of the VAX.
186.pp
187The symbol cell is used to implement variables. It contains a pointer
188to a string that is the print name, a cell to store a value, a cell to
189store the function definition, a cell to store a pointer to the property
190list and one pointer that the list system uses if it stores
191the symbol in the oblist.
192.sh +0 Memory\ Management
193.pp
194A lisp system must be able to determine the type of an
195object at run time.
196The method used to determine the type influences the
197way storage is managed and garbage collection is done.
198Next, we will look at three possible places
199to store the type information.
200.sh +1 Typed\ data
201.pp
202The type of the data object could be placed in the object itself, much
203as Pascal stores the type of a variant record in the record itself.
204This could result is a large amount of storage being used to store
205type information.
206No lisp system that we know of uses this method exclusively.
207.sh +0 Typed\ pointers
208.pp
209Every reference to a lisp object could contain the type of the object
210referenced.
211This is a good idea in a machine like an IBM 370 where only
212part of machine word is used by the addressing hardware.
213Lisps that use typed pointers generally use a
214.i heap
215memory management scheme and a
216.i copying
217garbage collector.
218In a heap scheme,
219all memory is divided by a pointer (called the
220heap pointer) separating lisp
221data and free space.
222When a new lisp object is needed, it is formed at the
223dividing line and then
224the heap pointer is incremented
225past the new object.
226Garbage collection is done by maintaining two separate spaces for lisp
227data, only one of which is used at any one time.
228When one fills up, all active lisp objects are copied to the
229other space, leaving the first space totally free.
230Subsequent allocations are done from the second space, until it fills up,
231at which point
232all active data in the second space is copied to the first space.
233The advantage of the copying garbage collector is that the data space
234is compacted, which will improve virtual memory behavior.
235The disadvantage is that
236half the data space is always unused.
237.pp
238PSL on a PDP-10 uses this scheme, as does Lisp 370 on
239an IBM 370.
240PSL and NIL on the VAX will also use this scheme.
241Since the VAX hardware
242uses the entire word for address
243calculation,
244PSL and NIL
245must mask off the type bits
246of a pointer before using it to reference memory.
247.sh +0 Typed\ pages
248.pp
249The final alternative is to allocate data objects by pages rather than
250individually.
251Each page contains only one type of object and there is a table that
252keeps track of the type of data object in each page.
253The address of an object tells which page the object is on and
254the page number tells which type the object is.
255Since a whole page of objects is allocated when only one
256object is necessary,
257the rest of the objects in the page are put on a free list.
258This method of allocation is sometimes referred to as
259.i bibop
260for BIg Bag Of Pages.
261The garbage collector for bibop is usually
262a traditional
263.i mark
264and
265.i sweep
266algorithm.
267All active data objects are marked and
268then each page is swept linearly with
269the free cells being put on the free list and the used cells
270left alone.
271The advantage of mark and sweep over a copying garbage collector is that
272the mark phase is cheaper because data objects do not have to be copied.
273Also, all of memory can be used since there is no requirement for two spaces.
274The disadvantage is that a sweep phase is required in a mark and sweep
275but one is not required in a copying garbage collector.
276The sweep phase is expensive because it has to alter most data pages
277while building a free list.
278This operation
279can be expensive in a virtual memory environment.
280Alternatives to the sweep phase have been proposed in
281[Foderaro+Fateman Characterization of Vax Macsyma].
282.pp
283.Fr
284uses a bibop memory allocator and a mark and sweep garbage collector,
285as does Maclisp (on a PDP-10).
286The reason that
287.Fr
288uses bibop is primarily due to the VAX architecture,
289which makes typed pointers
290expensive to implement.
291Also, typed pointers would make it difficult to pass lisp values
292to programs written in other languages, some of which may not have
293the ability to extract the address of a lisp object from a typed pointer.
294.sh -1 Construction
295.pp
296The
297.Fr
298system is built by adding lisp code to a C-coded kernel.
299The C-coded kernel is a working lisp interpreter, although it has few of
300the functions a lisp system needs.
301The lisp code provides most of the functions, the top level and error
302handlers.
303The lisp compiler is also written in lisp,
304but is usually
305run as a separate process.
306.sh +0 Unique\ features
307.pp
308.Fr
309can dynamically load and execute functions defined in the other languages
310which run on the VAX.
311It uses two different dynamic loaders.
312One is part of the lisp system and is used for lisp only, it is called
313.i fasl
314(for fast loader).
315The other is the Unix loader, ld, which is used for loading in
316C, Fortran or Pascal code.
317Loading code with fasl
318is much faster than with ld
319since fasl benefits from
320the simplicity of a lisp object file.
321.+c Data\ Structures
322.sh 0 _
323.nr $1 \n(ch
324.pp
325In this chapter we shall examine the data structures which are most
326important to the
327.Fr
328system.
329First, though we will see how the lisp system uses the address space.
330.sh 2 Physical\ layout
331.pp
332As a Unix process, lisp has three address regions: text, data and stack.
333Text begins at location 0 and continues up to 0x3b73c (0x means hex number).
334The data segment begins on the next page boundary and continues
335up to a limit set by the configuration parameters (currently 0x2fd000).
336Lisp does not begin running with such a large data segment, rather it grows
337when necessary.
338The stack segment begins at address 0x7fffffff and grows downward to a
339maximum size of one-half megabyte.
340.pp
341The text segment stores the instructions for the C kernel as well as read-only
342data.
343The read-only data for lisp are
344the symbol nil, the i/o ports, and the small integer
345table.
346The symbol nil is stored at location 0 which makes it very easy to test
347whether a value is nil.
348The problem with storing nil in read-only space is that a special case
349must be made for nil's property list, which is the only thing in the
350nil symbol that the user is permitted to alter.
351.pp
352The fixnums -1024 through 1023 are stored sequentially, with 0 being
353stored at 0x1400.
354.Fr
355doesn't have any 'immediate' lisp data, that is data whose value is
356stored as part of the reference to the data.
357But, by storing some of the fixnums in a known place, we can achieve
358some of the benefits of immediate data:
359A program can use
360.i eq
361as a test for a fixnum in the range -1024 to 1023.
362In the majority of cases, when asked
363to allocate a fixnum, the system can return a pointer into this table
364and bypass the memory allocator.
365.sh +0 Typetable
366.pp
367.Fr
368uses the typed pages (or bibop)
369method of type determination.
370The
371.i typetable
372is an array of bytes (
373.i chars
374in C lingo).
375This table describes the type of all pages, from page 0 where nil is stored,
376up to the end of data space.
377Thus there are many entries that describe the types of the pages which
378make up the C kernel.
379In order to reduce the wasted space in the typetable,
380we could have made the typetable begin typing pages at the start of data
381space and make a special case of the pages that contain nil and
382the small integer table.
383However, the
384effect of this change would be that type checks
385(which are done in-line in compiled code) would
386be longer and slower. Since type checking happens quite
387frequently, we felt it was better to waste the space in the
388typetable in order to save execution time and instruction space.
389.pp
390Each page on a VAX is 512 bytes, and thus to find the type of an
391object given the address of it,
392it is only necessary to shift the address right 9 bits
393and index the typetable array offset by one.
394The offset by one is required because the value -4, which is an illegal
395address, is used as a sentinel value to indicate an illegal value.
396Thus when we take the type of -4 we will be indexing the table by -1
397and we want to retrieve the first byte in the table (which contains
398the value UNBOUND).
399The C macro which retrieves the type is this (from file h/global.h):
400.(b I
401
402#define TYPE(a1) ((typetable+1)[(int)(a1) >> 9])
403
404.)b
405This is code generated by the lisp compiler to check if the type
406code of an argument (stored at 0(r10)) is a symbol (which is type
407code 1):
408.(b I
409
410ashl $-9,0(r10),r0
411cmpb _typetable+1[r0],$1
412
413.)b
414.pp
415The type codes which are stored in the typetable are these:
416.ts 2i 4i 6i
417.(b I
418UNBO -1\tSTRNG 0\tATOM 1\tINT 2
419DTPR 3\tDOUB 4\tBCD 5\tPORT 6
420ARRAY 7\tOTHER 8\tSDOT 9\tVALUE 10
421HUNK2 11\tHUNK4 12\tHUNK8 13\tHUNK16 14
422HUNK32 15\tHUNK64 16\tHUNK128 17
423.)b
424The names given above are the C kernel internal names.
425ATOM is symbol, INT is fixnum, DTPR is list cell,
426DOUB is flonum, BCD is binary,
427SDOT is bignum and all the HUNKn types are just known as hunk to
428the user.
429.sh +0 Stacks
430.pp
431.Fr
432uses three stacks: the
433.i C
434.i runtime
435stack, the
436.i namestack
437and the
438.i bindstack .
439The C runtime stack is the stack part of the address space
440and the other two stacks are stored in the data space.
441.sh +1 C\ runtime\ stack
442The C-coded kernel uses this
443stack in the same way as a typical C program
444use the stack:
445storing return addresses
446and non-lisp-object arguments to subroutines,
447saving registers,
448and storing local variables within a function.
449The lisp system stores
450.i catch
451.i frames
452on this stack as well (to be described later).
453.pp
454The lisp system assumes that there are no lisp data on the stack and
455thus the use of this stack
456by a program is unrestricted.
457As will be discussed later on, the
458.b calls
459and
460.b ret
461instructions are used for most subroutine calls and returns.
462These instructions expect the stack to look a certain way.
463.sh +0 Namestack
464.pp
465The namestack contains only lisp data.
466It is used to pass arguments to functions and to hold
467local (lexically scoped) data within lisp functions.
468It is also used as a temporary storage spot for lisp data
469which must be protected from garbage collection.
470.pp
471A slight digression on the garbage collector:
472The person who writes code for the lisp system must always be aware of
473his greatest enemy: the garbage collector.
474Whenever a function is called that could possibly allocate more lisp
475data, one must assume that it
476when it attempts to allocate space, the garbage collector
477will be invoked and that it will take away everything that isn't protected
478from garbage collection.
479The objects that are protected from garbage collection are: the
480namestack, the bindstack, the oblist (table of interned symbols),
481and the compiler literals. Objects that are only
482referred to by values in registers or
483the C runtime stack will not be seen by the mark phase of the
484garbage collector and will be swept up during the sweep phase.
485.pp
486Back to the namestack:
487The first free element on the namestack is pointed to by the
488C variable
489.i np .
490This variable is always stored in the VAX register r6.
491Another pointer into the namestack is the C variable
492.i lbot .
493It is always stored in VAX register r7.
494Its use will be described in the section on calling conventions.
495.sh +0 Bindstack
496.pp
497The bindstack is a stack of lisp object pairs: symbol and
498value.
499It is used to implement shallow binding.
500When a symbol is lambda-bound, the symbol and its current value are put
501on this stack.
502Then the symbol can be given a new value.
503When the context of the lambda binding is over, the symbol and value pair
504are popped from the stack and the symbol is given its old value.
505The C variable
506.i bnp
507points to the first free entry on the bindstack.
508In the C code, the following macros lambda-bind a symbol to a value
509and restore the old value:
510.(b
511#define PUSHDOWN(atom,value)\e
512 {bnp->atm=(atom);\e
513 bnp++->val=(atom)->a.clb;\e
514 (atom)->a.clb=value;\e
515 if(bnp>bnplim) binderr();}
516
517#define POP\e
518 {--bnp; bnp->atm->a.clb=bnp->val;}
519.)b
520.sh -1 Bitmap
521.pp
522The bitmap is used in garbage collection to hold
523the mark bits for the mark and sweep garbage collector.
524As its names implies, it is viewed as a collection of bits.
525The minimum size of a lisp data object is 4 bytes, thus
526there are 128 of those on a VAX page of 512 bytes.
527Since there are 8 bits in a byte, it takes 16 bytes to obtain 128 bits.
528Therefore the size of the bitmap in bytes is 16
529times the maximum number of
530pages.
531Like the typetable, the bitmap keeps track of marks from the
532bottom of memory, not the bottom of data space.
533The bitmap and the typetable are static structures.
534It is their size, which is determined when the C kernel is built,
535which determines the size to which the data segment can grow.
536.sh +0 Transfer\ Tables
537.pp
538Transfer tables are used by compiled lisp code as a transfer vector to
539other functions.
540A transfer table consists of pairs of entries: symbol and pointer to
541function.
542When a compiled lisp function calls another (non-local) function, it
543calls indirectly through an entry in the transfer table.
544Depending on the setting of certain status variables, the call may
545bring control into a function linkage routine or directly to
546the function desired (if that function is a compiled lisp or C function).
547.pp
548Transfer tables serve a number of useful purposes.
549.np
550They allow compiled code to call interpreted code
551or compiled code using the same calling sequence.
552.np
553They allow the selection of which function to call to be determined
554at runtime based on the name of the function.
555In most other languages, this selection process is done at either
556compile or load time.
557.np
558They allow the user to run in a debugging mode where all calls
559from compiled code go through the interpreter.
560Once control reaches the interpreter, it is easier to debug.
561.np
562They allow the user to run in a production mode, where all calls
563from compiled to other compiled code are done without function
564lookup overhead.
565.np
566They allow the user to switch between debugging and production modes
567at runtime with one function call.
568.pp
569Transfer tables will be described further in the section on
570the lisp compiler.
571.sh +0 Catch\ frames
572.pp
573Lisp has a number of forms of non-local transfers.
574Among them are
575.i throw ,
576.i error ,
577.i return
578and
579.i go .
580If a program is willing to accept a non-local transfer, it puts a
581.i catch
582.i frame
583on the stack which describes which type of transfer it
584accepts.
585The catch frame describes the current state of the registers,
586the variables np, lbot, and bnp, and also contains entries that describe
587what kinds of non-local transfers the function will accept.
588After creating a catch frame, the program goes on to evaluate forms.
589Should the evaluation of one of those forms result in a non-local
590transfer to the catch frame that this program has set up, the
591system will restore the namestack and bindstack to the way
592they were when the program put the catch frame on the stack (by using
593np and bnp) and will return control to the program (setting
594the variable retval to describe why it returned).
595This is described further in the
596section on interpreter conventions.
597.pp
598The C variable
599.i errp
600points to the most recent catch frame, and then each catch frame points
601to the previous catch frame.
602.sh +0 oblist
603.pp
604Normally when symbols are created they are
605.i interned ,
606that is they are put in a hash table called an oblist (or obarray).
607The purpose of the oblist is to insure that if you type a
608symbol's name to the reader, you will always get the same symbol.
609Also it protects the symbol from garbage collection.
610The oblist is simply a hash table with buckets, with a hash link being
611part of the symbol data structure.
612Currently the hash table is not accessible to a lisp program, but with
613a little work it could be.
614.+c Interpreter
615.sh 0 _
616.nr $1 \n(ch
617.pp
618The interpreter is composed of these parts:
619.ip \fIcore:\fP
620The functions eval, apply and funcall.
621.ip \fIstack\ management:\fP
622The code to create catch frames and handle non-local transfers.
623.ip \fImemory\ management:\fP
624The code to allocate and garbage collect memory.
625.ip \fIthe\ functions:\fP
626The user callable functions that expect lisp arguments and return
627lisp values.
628.pp
629In the above list, the first three are written mainly in C (with a little
630assembler) and the last is written mainly in Lisp with a little
631bit in C
632and a very small amount in assembler.
633.pp
634The core functions are the center of activity in the interpreter.
635The
636.i eval
637function given a lisp expression to evaluate.
638It must determine if it is a simple expression such as a symbol or number
639whose value eval can determine immediately, or if it is an function
640calling expression.
641If the form is a function call, eval must determine if the arguments should
642be evaluated and if so eval must recursively call itself to evaluate the
643arguments and then stack them.
644If the function called is to be interpreted, eval must lambda-bind the
645formal variables of the function to the arguments just stacked.
646If the function being called is compiled, eval just calls the function and
647lets the function do the lambda binding if it wants to.
648.pp
649The
650.i apply
651function is passed two lisp objects:
652a function to call (either a symbol or a functional object)
653and a list of arguments already evaluated.
654It will do lambda binding if the function to call is to
655be interpreted.
656.pp
657The
658.i funcall
659function is passed any number of lisp objects,
660the first of which is the function to call, and the rest are the
661arguments to the function
662which have already been evaluated and stacked.
663Funcall will do lambda binding if the function to call is
664to be interpreted.
665.pp
666When compiled lisp code calls a function
667which must be interpreted, it enters the interpreter through the
668funcall function.
669The interpreter may go to compiled code through either eval, apply or
670funcall, though most often it goes through eval.
671.sh 2 Conventions
672.pp
673These are the conventions used in interpreted and compiled code.
674.sh +1 C\ conventions
675.pp
676Our conventions are extensions of the C calling conventions, so we
677describe here the conventions for C.
678The VAX has 16 general
679purpose registers.
680Registers r12 through r15 are reserved
681for use by the hardware because we use the
682.i calls
683subroutine call.
684Registers r0-r5 can be used by a program without saving them first.
685The result of a function call is returned in r0.
686Registers r11-r6 are allocated (from high to low)
687by the C compiler when a
688.i register
689declaration is made in the C code.
690Registers r11-r6 must be
691saved upon entry and restored upon exit if they are used within a function.
692On the VAX it is very easy to preserve registers
693since it is done automatically
694by the hardware if the
695.i calls
696(or
697.i callg )
698instruction is used.
699The first short word (two bytes) of a subroutine is a
700register save mask which
701tells which registers should be saved (on the C runtime stack) upon
702entry and restored when a
703.i ret
704instruction is done.
705.sh +0 np\ and\ lbot
706.pp
707Earlier we mentioned that the C variables np and lbot are
708always stored in
709registers r6 and r7.
710It is very difficult to globally assign a variable to a register
711in the C language (no external register declarations
712are permitted)
713and thus we have to filter
714the output of the C compiler and convert all occurrences of 'np' to 'r6'
715and all occurrences of 'lbot' to 'r7'.
716This is only half the job, though.
717We also must modify the register save masks for those routines which
718alter the value of np or lbot to insure that those registers get
719saved and restored upon function entry and exit.
720This is done in the C code by putting
721.(b C
722Savestack(n)
723.)b
724at the beginning of a routine which will alter np or lbot and which
725also allocates n register variables.
726Also in that routine, before the function returns, we put
727.(b C
728Restorestack()
729.)b
730This is not really necessary in the VAX, but it is there for other
731machine implementations which don't have a
732.i ret
733function which restores registers.
734The calls to Savestack are recognized by a filter which processes
735the output
736of the C compiler and fixes up
737the save masks.
738.sh +0 Function\ calling
739.pp
740The following text describes what the conventions are for
741calling
742compiled lisp functions, whether they were written in lisp or C.
743We look at it from the viewpoint of the called function.
744.pp
745Upon entry
746to the compiled functon,
747lbot points to the first argument and np points to the
748first free spot on the namestack.
749If np equals lbot then there are no arguments.
750Recall that np will be in r6 and lbot in r7.
751The function is free to alter registers r0 through r5 and should return
752the result in r0.
753The function may alter registers r6 through r11 as long as their
754original values
755are restored when the function returns.
756The value of np should always
757point to the first free entry in the namestack.
758This is all that is required.
759The extra conventions followed by
760the lisp compiler
761in the code it generates are described in the next chapter.
762.sh +0 Catch\ frames
763.pp
764A catch frame saves the state of execution that a program
765might want to return to at some later time.
766A catch frame is stored on the C runtime stack, with the most recent
767frame pointed to by the C global variable errp.
768The information saved is
769.ip \fIargs\fP
770- one or two optional arguments. The lisp compiler always
771stacks two
772arguments since it must know exactly how large the frame is.
773.ip \fIclass\fP
774- the class describes what type of frame this is (described below).
775.ip \fIretaddr\fP
776- address to return to if returning from this frame
777.ip \fIolderrp\fP
778- pointer to next older catch frame on the stack
779.ip \fIbnp\fP
780- value of bnp (bindstack pointer) when the frame was created
781.ip \fInp\fP
782- value of np when the frame was created
783.ip \fIlbot\fP
784- value of lbot when the frame was created.
785.ip \fIsystem\ dependent\fP
786- the rest of the information stacked depends on the particular machine.
787In the case of the VAX, registers r13 through r8 are stacked. (r14 and
788r15 are the stack pointer and program counter; they are not saved since
789they can be calculated from the other information).
790.pp
791The information in a catch frame is put on the C runtime stack
792in the precise order given above, and the variable errp points not
793at the beginning of the entire frame, but to the lbot entry.
794Thus errp\ +\ 4 points to np.
795The classes of frames are described below.
796Each class is defined as a constant in the C code (h/frame.h) whose value
797is a small integer.
798.ip F_PROG\ [1]
799- takes no arguments. This frame marks the beginning of a piece of
800code which can accept a
801.i return
802or a
803.i go .
804The interpreter uses this to implement
805.i prog
806and
807.i do
808and for its primative break loop.
809The lisp compiler does not use catch frames for progs since it only
810permits
811.i returns
812and
813.i gos
814to occur within
815.i progs
816or
817.i dos
818and thus it can determine how to do the non-local transfer
819at compile time.
820.ip F_CATCH\ [2]
821- this takes one or two arguments and is used to implement both
822.i catch
823and
824.i errset .
825In both cases
826the first argument is the tag which is being caught,
827which in the case of an
828.i errset
829is symbol ER%all.
830An
831.i errset
832also
833supplies a second argument which is non nil if the error message
834should be printed.
835.ip F_RESET\ [3]
836- in the C kernel, the reset function
837is implemented as a non local transfer to a F_RESET catchframe.
838When the lisp system is built, the reset function is redefined to
839do a
840.i throw.
841Thus this type of catch frame is rarely used.
842.ip F_EVAL\ [4]
843- this has one argument, the form being evaluated.
844Since stacking this
845on every eval would be expensive,
846this type of catch frame is only stacked if a \fI(*rset\ t)\fP
847has been done and if the value of the symbol
848.i *rset
849is non nil.
850The form being evaluated is stacked so that
851the necessary information for the
852.i evalframe
853function is available and so that
854.i freturn
855can return from the context of any pending evaluation.
856.ip F_FUNCALL\ [5]
857- this is much like F_EVAL,
858except the one argument it takes is the
859name of the function to call.
860It has the same restrictions on when it is stacked as F_EVAL
861and for the same reasons.
862.pp
863In C, a frame is pushed on the stack with Pushframe, with a call
864of one of these forms:
865.(b
866errp = Pushframe(class);
867errp = Pushframe(class,arg1);
868errp = Pushframe(class,arg1,arg2);
869.)b
870After the call the C variable
871.i retval
872contains the value which describes why this function returned.
873You must remember that it is possible for this one function call to
874return more than once!
875The reasons it can return are these (from h/frame.h):
876.ip C_INITIAL\ [0]
877This is the initial call to set up the frame.
878.ip C_GO\ [1]
879This will only happen for F_PROG frames.
880In this case,
881the C variable lispretval contains a lisp symbol which is the tag
882to which control should be transferred.
883If the tag cannot be found in this prog or do body, the
884tag should be again thrown up the stack to a next higher prog or do.
885.ip C_RET\ [2]
886This will only happen for F_PROG frames.
887In this case, lispretval contains the value to return from this prog.
888.ip C_THROW\ [3]
889This will only happen for F_CATCH frames.
890In this case lispretval contains the value to return from this catch.
891.ip C_RESET\ [4]
892This will only happen for F_RESET frames.
893It simply means that a reset is being done.
894.ip C_FRETURN\ [5]
895This will only happen for F_EVAL and F_FRETURN frames.
896The variable lispretval contains the value to return from this
897call to
898.i eval
899or
900.i funcall .
901.pp
902The call to Pushframe is turned into a simple subroutine call (using
903the
904.i jsb
905instruction instead of the
906.i calls )
907by the filters which alter the code coming out of the C compiler.
908Thus it may be useful to see what stacking a catch frame really looks
909like.
910Here is what the lisp compiler generates
911to stack the frame for \fI(*catch\ 'tag\ x)\fP
912.(b
913movl 0(r8),r0 #move 'tag to r0
914pushl $0 # dummy argument
915pushl r0 # put tag argument on C runtime stack
916pushl $2 # push F_CATCH
917jsb _qpushframe # call Pushframe
918movl r0,_errp # move result into errp
919.)b
920.pp
921Every function which does a Pushframe() must also do a Popframe()
922before it returns to its caller.
923This simply removes the frame from the stack.
924In C it is written:
925.br
926.tl ''errp = Popframe()''
927in the code generated by the lisp compiler it looks like:
928.(b
929movl _errp,r1 # put current errp in r1
930movl 12(r1),_errp # put previous errp in errp
931addl3 $32,r1,sp # pop frame from stack
932.)b
933.pp
934Non-local transfers are done with the Inonlocalgo
935function. This function always takes three arguments.
936The
937first is the return type (one of the types mentioned
938above which begin with 'C_'). It will be assigned to retval.
939The second argument is the value to be assigned to lispretval,
940except in the case of a C_THROW, where the second argument
941is the tag to throw to and the third argument is the value
942to assign to lispretval.
943This function never returns.
944If it doesn't find a catch frame
945which matches what it is looking for, it signals an error.
946The function is called with
947.i calls
948and the arguments are stacked on the C runtime stack, not the
949namestack.
950.+c Liszt:\ The\ Lisp\ Compiler
951.sh 0 _
952.nr $1 \n(ch
953.pp
954The purpose of compiling a lisp function is to create a representation of
955the function which can be evaluated in less time and perhaps take
956up less space.
957There are two approaches to lisp compilers.
958One is to convert the functions to a compact form, often called
959.i bytecodes
960which can be rapidly interpreted.
961Each bytecode represents a primitive operation in the lisp system.
962This approach is simple to implement but there is an time penalty
963in using an interpreter.
964The other approach is to compiled to machine code.
965In general, this is not as portable as the bytecode approach but the result
966generally runs faster.
967There are two ways of compiling to machine code.
968One is to place arguments to functions in registers.
969If there are more arguments than registers, the arguments are put on
970a stack and a special type of call is made.
971This method is used in Maclisp.
972The other method is to assume a stack model, in which
973arguments to a function are placed on a stack.
974This is what the
975.Fr
976compiler Liszt does.
977The stack model made it much easier to write parts of the lisp
978system in the C langauge.
979It also simplifies the garbage collector since the mark phase doesn't
980have to locate and peruse all saved registers looking for lisp data.
981.sh 2 File\ Compiler
982.pp
983When a file of lisp expressions is loaded,
984the
985.i load
986function repeatedly reads and evaluates the forms in the
987file.
988Some of these evaluations may result in functions being defined,
989and others may use the functions just defined or previously defined.
990The job of the lisp compiler is to create an object file, which,
991when read in
992by
993.i fasl,
994acts just like it was
995.i load ed
996in, except when a function is defined it is defined in machine
997code, not as a lisp expression.
998This is quite a bit different from what compilers do in other languages
999and it is done this way to make it easier to
1000switch between compiled and
1001interpreted code.
1002.sh +0 Differences\ between\ compiled\ and\ interpreted
1003.pp
1004There are some differences, though, between compiled and interpreted code.
1005Local variables in compiled code are lexically scoped (put on the
1006namestack and inaccessible to other functions) unless the variable
1007has been declared
1008.i special.
1009The declaration:
1010.(b
1011\fI(declare (special x y))\fP
1012.)b
1013declares both x and y to be special from this point in the file on.
1014The declaration
1015.(b
1016\fI(declare (specials t))\fP
1017.)b
1018declares all variables to be special.
1019.pp
1020Lisp has a very powerful macro definition system.
1021The compiler will macro expand all it can, whereas the interpreter
1022expands macros when necessary but never replaces a macro call with
1023its expansion.
1024Thus if you redefine a macro, the compiled code that uses it will
1025still work the same way, but the interpreted code will use the
1026new definition.
1027Also, when compiling a file, macro definitions must be
1028done before any call on the macro is encountered during compiling.
1029In the interpreter,
1030macros can be defined or redefined anytime up until
1031they are used.
1032Thus the interpreter is far freer about macro definitions than
1033the compiler.
1034This could cause programs which work interpreted to
1035fail compiled.
1036.sh +0 Top\ level\ algorithm
1037.pp
1038The top level algorithm of the compiler is simply this:
1039.np
1040read a lisp expression from the input file
1041.np
1042macro expand the top level of the form as much as possible
1043.np
1044if the form is a function definition (a list beginning with
1045the symbol 'def')
1046then compile it.
1047.np
1048if the form is not a function definition then put it on a list of
1049forms that will be evaluated when the file is
1050.i fasl ed
1051in.
1052.np
1053if not at end of file, go to step 1.
1054.pp
1055In reality, step 3 is a bit more complex.
1056If the definition is of a macro, then the form will be evaluated
1057immediately, thus adding the macro definition to the compiler's
1058environment.
1059If the lisp variable
1060.i macros
1061is t then the macro will also be compiled.
1062There are also some forms like \fI(progn\ 'compile\ ...)\fP
1063and \fI(eval-when\ ()\ )\fP which determine when the
1064forms get compiled and evaluated.
1065See the lisp manual for details.
1066.sh +0 Expression\ Compilation
1067.pp
1068Just as the interpreter is centered around the
1069.i eval
1070function, the lisp compiler is centered around the function
1071.i d-exp
1072whose job it is to compile the expression passed to it.
1073The lisp compiler is one pass.
1074Before a call to d-exp returns,
1075all the compiled code for a form has been computed and written
1076to a file.
1077One reason that the lisp compiler can be a single pass compiler
1078is that the assembler which reads the compiler's output is
1079a two pass assembler.
1080.sh +1 global\ state\ variables
1081.pp
1082There are a number of variables that maintain the state of
1083the compilation process.
1084These variables are declared special and are thus dynamically scoped
1085and visible to any function within the compiler.
1086When d-exp
1087is called their meanings are:
1088.ip \fIv-form\fP
1089- contains the expression to be compiled.
1090.ip \fIg-loc\fP
1091- tells where the result of evaluating this expression should be put.
1092If g-loc is nil, then the value returned is unimportant and shouldn't
1093be put anywhere.
1094.ip \fIg-cc\fP
1095- controls conditional branches.
1096If g-cc is non nil, then it is a list cell whose value has
1097either a non-null car or non-null cdr but not both.
1098If the car is non-nil then
1099if the value of the expression held in
1100v-form is non-nil, a branch should be done to the symbol
1101stored in \fI(car\ g-cc)\fP.
1102If the cdr is non-nil then if the value of v-form
1103is nil, a branch should be done to the symbol stored in
1104\fI(cdr\ g-cc)\fP. Since at
1105every conditional
1106branch control should either jump or continue, the car and cdr will
1107never both be specified.
1108.ip \fIg-ret\fP
1109- is non nil if the result of evaluating v-form will be returned as the
1110value of the function we are evaluating.
1111This is used to allow liszt to detect
1112when tail recursion removal is possible.
1113.ip \fIg-locs\fP
1114- maintains current information about the state of the stacks:
1115the bindstack (for specials), the namestack (for locals) and the
1116C runtime stack (for catch frames)
1117The form of g-locs is a stack of tokens stored as a list.
1118The meaning of each token is as follows:
1119.br
1120\fInil\fP - this represents an unnamed object on the namestack. This happens
1121when calling functions, when the arguments are stacked prior
1122to a function call.
1123.br
1124\fI<symbol>\fP - the given symbol is the name of a local variable on the namestack.
1125.br
1126\fI(prog . <n>)\fP - prog statement which binds <n> special variables
1127.br
1128\fI(do . <n>)\fP - head of a do statement which binds <n> special variables
1129.br
1130\fI(catcherrset . 0)\fP - catch or errset frame on stack
1131.br
1132\fI(lambda . <n>)\fP - lambda expression which binds <n> special variables
1133.ip \fIg-labs\fP
1134- this is a stack of labels.
1135There is one entry in g-labs for every entry which is a list
1136in g-locs.
1137the forms of the entries are:
1138.br
1139\fInil\fP - no labels in this form
1140.br
1141\fI(endlab (real . gen) (real2 . gen2) ...)\fP - endlab is the label to go to
1142to get out of this body. After this label will be code
1143to unbind specials and pop off locals.
1144The 'real' labels are those found in the code. the gen
1145labels are those generated and put in the assembler code.
1146.sh +0 Function\ compilation
1147.pp
1148The action d-exp takes when compiling an expression depends on the
1149type of expression.
1150For atoms (symbols and numbers) the compilation is very simple, it
1151is just a matter of putting the value where g-loc specifies and
1152then jumping if specified by g-cc.
1153When the expression is a list, d-exp first macro
1154expands the form and then looks at the first element of
1155the list (if the list has not macro expanded to an atom).
1156If the first element is a symbol then the list is
1157is a function call.
1158If the function is one of the functions which liszt knows how to open compile
1159then liszt will call the particular routine designated to open
1160compile this function.
1161There are two classes of functions within liszt that
1162do open compiling.
1163The first class, the fl-expr class, are distinguished by names which
1164begin with c-.
1165These functions always generate
1166code which returns the result in r0.
1167They are not equipped to obey the contents of g-loc and g-cc.
1168Thus d-exp, after calling one of these functions, must do what
1169is necessary to obey g-loc and g-cc.
1170The other class of open compiling functions, the fl-exprcc class (whose
1171names begin with cc-),
1172handle g-loc and g-cc.
1173As a result these are usually much more complex and generate better code.
1174.sh -1 Register\ Conventions
1175.pp
1176The register conventions used by liszt are an extension of those
1177used by the C code.
1178.ip \fIr0\fP
1179- return value placed here
1180.ip \fIr1,r2,r3,r4\fP
1181- scratch registers.
1182When long strings of
1183.i car's
1184or
1185.i cdr's
1186are done (such as
1187.i caddadr )
1188these registers are used in a least-recently-used fashion to access down
1189the list.
1190The compiler keeps track of the values in these registers but must
1191assume that they are garbage after a function is called.
1192.ip \fIr5\fP
1193- fixnum accumulator.
1194There a number of functions which work on fixnum's only and these
1195usually put their result in r5.
1196The assembler code function
1197.i qnewint
1198which returns a pointer to a cell containing a fixnum value (after
1199checking if it is in the fixnum table), expects its argument to be
1200in r5.
1201.ip \fIr6\fP
1202np
1203.ip \fIr7\fP
1204lbot.
1205When calling a function, this register is set just before the function
1206call, and after the function call its value is used to reset the value
1207of np in order to pop the arguments off the namestack.
1208.ip \fIr8\fP
1209the literal table pointer.
1210The compiler generates a table of all the literal lisp data
1211which the compiled code might access.
1212Upon function entry a pointer to the base of this table is put in r8.
1213For example, if we compile \fI(setq\ x\ 'foo)\fP then we will
1214need a pointer to the lisp symbol
1215.i foo
1216and if the symbol
1217.i x
1218as been declared special we will also need a pointer to
1219.i x .
1220.ip \fIr10\fP
1221- upon function entry the value of lbot (r7) is put in r10.
1222This allows us to reference the arguments to our function while
1223using lbot to call other function.
1224.sh +0 Addresses
1225There are four types of addresses used internally in Franz:
1226symbolic, intermediate addresses (iadr), extended intermediate (eiadr)
1227and vax assembler format.
1228.sh +1 Symbolic
1229.pp
1230These are the names of lisp objects, such as `a', `foo', 34,
1231or 3.234234.
1232A name is either a local variable, a special variable or a
1233number. A number is either a small fixnum, large fixnum,
1234bignum or floating point number.
1235.sh +0 Intermediate\ address\ (IADR)
1236.pp
1237This type of address is generated from a symbolic address by
1238.i d-loc,
1239.i d-loclit
1240and the routines
1241.i d-simple
1242and
1243.i d-rsimple
1244which
1245call them. The forms are
1246.ip \fINil\fP
1247- the location or value of nil.
1248.ip \fIT\fP
1249- the location or value of t.
1250.ip \fIreg\fP
1251- register r0, which is where the result of function calls
1252are stored.
1253.ip \fIstack\fP
1254- the address pointed to by np with np incremented after
1255the value is stored. (i.e (r6)+)
1256.ip \fIunstack\fP
1257- the address one word below np (np is decremented before
1258accessing. (i.e. -(r6))
1259.ip \fI<symbol>\fP
1260- this is just <symbol>. This allows strange forms to be
1261represented directly.
1262.ip \fI(stack\ n)\fP
1263- The n'th value on the namestack for this function.
1264The first value 0(r10) is (stack 1).
1265.ip \fI(vstack\ n)\fP
1266- The cdr of the n'th value on the namestack.
1267That is, (stack 1) is *0(r10)
1268.ip \fI(bind\ n)\fP
1269- The value of the n'th value in
1270the literal table. If
1271this refers to a symbol, then this is the value
1272of the symbol.
1273If this refers to a list then this
1274this is the cdr of the list (although this is a rare
1275use). (bind 1) corresponds to *0(r8).
1276.ip \fI(lbind\ n)\fP
1277- The n'th value in the literal table. If
1278this refers to
1279object foo then this is the address of foo
1280in memory.
1281.ip \fI(fixnum\ n)\fP
1282- This is the address of small fixnum n in memory.
1283A small fixnum is in the range -1024 to 1023.
1284.ip \fI(immed\ n)\fP
1285- The is the immediate constant n.
1286.sh +0 extended\ intermediate\ address\ (EIADR)
1287.pp
1288This address is generated from an IADR by e-cvt. It
1289symbolically represents a vax address.
1290.ip \fI<atom>\fP
1291- This corresponds to <atom>.
1292.ip \fI(n\ <atom>)\fP
1293- This corresponds to n(<atom>)
1294(stack n+1) and (lbind n+1) are converted to these forms.
1295.ip \fI(n\ <atom1>\ <atom2>)\fP
1296- This corresponds to n(<atom1>)[<atom2>]
1297There is currently no IADR which generates this.
1298.ip \fI(*\ n\ <atom>)\fP
1299-This corresponds to *n(<atom>)
1300(vstack n+1) and (bind n+1) are converted to these forms.
1301.ip \fI(+\ <atom>)\fP
1302- This corresponds to (<atom>)+.
1303stack is converted to this form.
1304.ip \fI(-\ <atom>)\fP
1305- This corresponds to -(<atom>)
1306unstack is converted to this form.
1307.ip \fI($\ <numb>)\fP
1308- This corresponds to $<atom>
1309(immed <numb>) is converted to this form.
1310.ip \fI(#\ <numb>)\fP
1311- This corresponds to $<loc> where <loc> equals
1312the base of the fixnums (0x1400) plus 4 * <numb>
1313(fixnum <numb>) is converted to this form
1314.ip \fI(*\ #\ <numb>)\fP
1315- This corresponds to $<numb>. It is generated
1316by d-rsimple occasionally when you ask for the
1317cdr of a number (which you do sometimes when you
1318are compiling fixnum operators).
1319.sh +0 Vax\ Unix\ assembler\ addresses
1320.pp
1321The addresses are printed from a EIADR by e-cvtas. The conversions
1322are shown in the above section.
1323.sh -1 Function\ calling\ convention
1324.sh +1 Function\ linkages
1325.pp
1326The function associated with a symbol is stored in the function
1327definition slot of the symbol. If the function slot contains a
1328list then the function is to be interpreted and its 'car' will
1329be lambda, nlambda, lexpr or macro. If the function is compiled then
1330the function definition slot will contain a binary object.
1331A binary object
1332is composed of two independent parts: the discipline and the entry.
1333The discipline is one of:
1334.ip \fIlambda\fP
1335- a function whose arguments should be evaluated.
1336.ip \fInlambda\fP
1337- a function whose arguments should not be evaluated but
1338which should be passed as a list
1339.ip \fImacro\fP
1340- a function which should be passed the unevaluated form
1341being evaluated and whose result should be evaluated.
1342.ip \fI\"subroutine\"\fP
1343- a foreign function subroutine
1344.ip \fI\"integer-function\"\fP
1345- a foreign function returning an integer
1346.ip \fI\"real-function"\fP
1347- a foreign function returning a flonum.
1348.pp
1349A lexpr is not in the list as there is no difference to the caller
1350between compiled lambda's and compiled lexprs.
1351.sh +0 Transfer\ tables
1352Most calls from compiled code to other lisp functions go through
1353transfer tables. A transfer table entry is a pair: symbol, address
1354of routine.
1355When another lisp function is called it uses the
1356.i calls
1357instruction which will
1358indirectly jump through the address in the transfer table. This
1359address may point to the desired function or it may point to the
1360interface routine. If a call ends up in the interface routine,
1361then that routine will trace back through the call stack and eventually
1362reach the tranfer table entry that it was 'called through'. Since the
1363transfer table entry contains a symbol which names the function
1364to be called, the interface routine can determine which routine
1365was to have been called. If that routine is compiled then the
1366interface routine can modify the transfer table so that a call
1367through that table entry will go directly to the desired function.
1368If the routine to be called is interpreted, or if the user has
1369requested that transfer linkages should be disabled, then the
1370interface routine will go through the 'funcall' function
1371in the interpreter to continue the call.
1372.sh +0 calling\ sequence\ in\ the\ compiler:
1373.pp
1374When transfer tables are used
1375.(b
1376 \fBforeach\fP arg
1377 \fBcompute\fP arg and stack result using (r6)+
1378 for example: movl r0,(r6)+
1379 movab -n(r6),r7 where n = 4*number of args to fcn
1380 calls $0,*trantb+m where m is the index of the function
1381 in the transfer table.
1382 movl r7,r6 restore r6
1383.)b
1384.pp
1385The compiler supports local functions, which are function accessible
1386only within one file.
1387Because they are not accessible from C code, we can use a very fast
1388call and return sequence when calling them.
1389To call a local function
1390.(b
1391 \fBforeach\fP arg
1392 \fBcompute\fP and stack using (r6)+
1393 jsb LOCALNAME go directly to the function, it will
1394 restore r6 before it returns.
1395.)b
1396.pp
1397The beginning of each function looks as follows.
1398First for a non-lexpr function called in the standard way
1399(topsim is the label jumped to when tail merging, it will be unique
1400for each function; the brackets indicate the optional code which
1401exists if the -p switch is given to liszt);
1402.(b
1403 .word 0x5c0 # save r6, r7, r8, r10
1404 [ movab mcounts,r0 # if profiling, mcounts replaced by fasl
1405 jsb mcount ]
1406 movab linker,r8 # set up r8
1407 movl r7,r10 # set up oldlbot
1408 movab n(r10),r6 # n = 4*Number of args expected.
1409topsim:
1410.)b
1411.pp
1412Now for lexprs:
1413.(b
1414 .word 0x5c0
1415 [ movab mcounts,r0 # if profiling. [mcounts replaced by fasl]
1416 jsb mcount ]
1417 movab linker,r8 # set up r8
1418 subl3 $4,r7,-(sp) # address one word below bottom of args
1419topsim:
1420 movl r6,r10 # first free addr to arg base
1421 subl3 r7,r6,r0 # number of args * 4 into r0
1422 movab 0x1400(r0),(r6)+ # stack boxed number of args
1423 movl 0(r10),-(sp) # also store on stack so user can't clobber
1424.)b
1425.pp
1426And finally for local functions:
1427.(b
1428 movl r10,-(sp) # save r10
1429 movab -n(r6),r10 # set up arg base using arg top
1430topsim:
1431.)b
1432.sh -1 Assembler\ file\ format
1433.pp
1434The liszt compiler generates a file which is in Unix assembler format.
1435The Unix assembler converts that file into an object file which fasl
1436then reads.
1437Although the object file generated is a standard Unix object
1438file (as defined in /usr/include/a.out.h), it is not of a
1439format that the Unix ld loader can understand.
1440This is because the requirements of a lisp object file are different
1441from an object file of other languages.
1442The
1443run time semantics of lisp and the fact that lisp data must be protected
1444from garbage collection are the principal differences.
1445The unconventional object file created by the unix assembler
1446is a result of the
1447unconventional assembler input file.
1448Next we will look at what must be put in the assembler file and how
1449it is put there.
1450.pp
1451The assembler file must contain
1452.ip \fIinstructions\fP
1453- vax assembler code for the compiled functions.
1454If there aren't any functions compiled, this can be empty.
1455.ip \fIliterals\fP
1456- lisp data which is referred to by compiled code
1457.ip \fItransfer\ table\fP
1458- the names of the functions which correspond to the calls through
1459the transfer table.
1460The instructions simply say 'call indirect through the nth transfer
1461table entry'.
1462.ip \fIfunction\ names\fP
1463- the names of the functions which are being
1464defined.
1465.ip \fIload\ time\ forms\fP
1466- in order to mimic the
1467.i load
1468function, fasl has to be able to evaluate lisp expressions at fasl
1469time, so we must be able to store lisp expressions in the assembler
1470file and indicate when they should be evaluated.
1471.pp
1472Based on the requirements above, the form of the assembler file
1473is as described below.
1474The assembler builds two segments: text and data.
1475We put all of our information in the text segment.
1476The compiler places some annotation strings in the data segment
1477so that the object file can be identified, however the data segment
1478is ignored by fasl.
1479The format is
1480.ip \fIcompiled\ instructions\fP
1481The instructions for each compiled (non-local) function begins with
1482.(b
1483 .globl F00003
1484 F00003:
1485 .word 0x5c0
1486.)b
1487The globl declaration and the fact that the symbol name begins
1488with a F will tell fasl that this is the beginning of a lisp
1489function.
1490The symbols beginning with F must be unique within a file
1491but may be duplicated in other files.
1492The lisp name of the function will appear later.
1493Next the instructions for the function are given.
1494Only a small fixed set of external symbols may be referenced.
1495The list is found in the code for nfasl.c and the common
1496ones will be described below (soon).
1497Labels should be given a name beginning with L and should
1498be unique within the file.
1499.ip \fItable\ sizes\fP
1500somewhere in the file there should be a pair of 'set' assembler
1501pseudo ops which describe the sizes of the literal table and
1502transfer table.
1503The form is this
1504.(b
1505 .set linker_size 4
1506 .set trans_size 3
1507.)b
1508where linker_size is the number of entries in the literal table
1509which will be required and trans_size is the number of entries
1510in the transfer table which will be required.
1511Those tables will be built by fasl.
1512.ip \fIbinder\ table\fP
1513- this table describes the order that the functions should be
1514defined and forms evaluated.
1515It is a table of longwords beginning at the symbol bind_org
1516and ending when a -1 entry is seen.
1517The meaning of the entries will be described below.
1518.ip \fIlisp\ expression\ table\fP
1519- this is a table of null terminated strings beginning at the
1520symbol lit_org and ending at the symbol lit_end.
1521Each string is read by the lisp read function (using the
1522raw readtable).
1523The first linker_size expressions are put in the literal table.
1524The next trans_size expressions are the names of the
1525functions to put in the transfer table.
1526The remaining expressions correspond to the entries in the
1527binder table.
1528The binder entries are processed, one by one.
1529Provided that the binder entry is not -1, an expression
1530is read from the lisp expression table.
1531Then if the binder table entry is 0, that expression is the name of
1532a lambda type lisp function.
1533A binary object is created, the discipline is set to lambda and
1534the function address is set to the lexically next function
1535defined in the
1536file.
1537If the binder entry is 1 then this is an nlambda function, and
1538if the entry is 2 then this is a macro function.
1539If the entry is 99 then the expression just read should be evaluated
1540by the lisp function eval.
1541.pp
1542The lisp compiler normally puts the assembler format output file in /tmp
1543and removes it when it is done.
1544The -S switch will tell liszt to write the
1545assembler file in the same place as
1546the source file, and with the same root name but a .s extension.
1547The -C file will tell the lisp compiler to comment the file as it
1548generates it, making it easier to understand what is going on.
1549Assume that the following is file foo.l:
1550.(b
1551(defun foo (x) (bar y) (baz k))
1552(print (foo 3))
1553(def test (nlambda (l) (print 'hithere) (foo 3)))
1554.)b
1555we now compile it with -SC
1556.(b
1557 .globl F00007 #(fcn lambda foo)
1558 F00007:
1559 .word 0x5c0
1560 movab linker,r8
1561 movl r7,r10
1562 movab 4(r10),r6
1563 L00008:
1564 movl *0(r8),(r6)+ #(calling bar)
1565 #(from y to stack)
1566 movab -4(r6),r7
1567 calls $0,*trantb+0
1568 movl r7,r6
1569 movl *4(r8),(r6)+ #(calling baz)
1570 #(from k to stack)
1571 movab -4(r6),r7
1572 calls $0,*trantb+8
1573 movl r7,r6
1574 ret
1575 .globl F00009 #(fcn nlambda test)
1576 F00009:
1577 .word 0x5c0
1578 movab linker,r8
1579 movl r7,r10
1580 movab 4(r10),r6
1581 L00010:
1582 movl 8(r8),(r6)+ #(calling print)
1583 #(from 'hithere to stack)
1584 movab -4(r6),r7
1585 calls $0,*trantb+16
1586 movl r7,r6
1587 movl $5132,(r6)+ #(calling foo)
1588 #(from (fixnum 3) to stack)
1589 movab -4(r6),r7
1590 calls $0,*trantb+24
1591 movl r7,r6
1592 ret
1593 bind_org:
1594 .set linker_size, 3
1595 .set trans_size, 4
1596 .long 0
1597 .long 99
1598 .long 1
1599 .long -1
1600 lit_org:
1601 .asciz "y"
1602 .asciz "k"
1603 .asciz "hithere"
1604 .asciz "bar"
1605 .asciz "baz"
1606 .asciz "print"
1607 .asciz "foo"
1608 .asciz "foo"
1609 .asciz "(print (foo 3))"
1610 .asciz "test"
1611 lit_end:
1612 .data # this is just for documentation
1613 .asciz "@(#)Compiled by Lisp Compiler 7.1 on Sun Feb 21 17:51:54 1982"
1614 .asciz "@(#)decl.l 1.5 2/10/82"
1615 .asciz "@(#)array.l 1.1 9/25/81"
1616 .asciz "@(#)datab.l 1.1 9/25/81"
1617 .asciz "@(#)expr.l 1.1 9/25/81"
1618 .asciz "@(#)io.l 1.1 9/25/81"
1619 .asciz "@(#)funa.l 1.3 2/10/82"
1620 .asciz "@(#)funb.l 1.2 2/10/82"
1621 .asciz "@(#)func.l 1.2 2/10/82"
1622 .asciz "@(#)tlev.l 1.4 10/24/81"
1623 .asciz "@(#)fixnum.l 1.6 10/21/81"
1624 .asciz "@(#)util.l 1.2 10/7/81"
1625.)b
1626.sh +0 functions\ callable\ from\ compiled\ lisp\ code
1627.sh +0 Object\ File\ Format
1628.sh +0 Fasl
1629
1630
1631
1632
1633