document distributed with 4.1BSD
.\" Copyright (c) 1980 Regents of the University of California.
.\" All rights reserved.  The Berkeley software License Agreement
.\" specifies the terms and conditions for redistribution.
.\"
.\"	@(#)ch7.n	6.1 (Berkeley) %G%
.\"
." $Header: ch7.n,v 1.3 83/07/01 11:22:58 layer Exp $
.Lc The\ Lisp\ Reader 7
.sh 2 Introduction \n(ch 1
.pp
The
.i read
function is responsible for converting
a stream of
characters into a Lisp expression.
.i Read
is table driven and the table it uses is called a
.i readtable .
The
.i print
function does the
inverse of
.i read ;
it converts a Lisp expression into a stream of
characters.
Typically the conversion is done in such
a way that if that stream of characters were read by
.i read ,
the
result would be an expression equal to the one
.i print
was given.
.i Print
must also refer to the readtable in order to determine
how to format its output.
The
.i explode
function, which returns a list of characters rather than
printing them, must also refer to the readtable.
.pp
A readtable is created
with the
.i makereadtable
function, modified with the
.i setsyntax
function and interrogated with the
.i getsyntax
function.
The structure of a readtable is hidden from the user;
a readtable should
only be manipulated with the three functions mentioned above.
.pp
There is one distinguished readtable called the
.i current
.i readtable
whose value determines what
.i read ,
.i print
and
.i explode
do.
The current readtable is the value of the symbol
.i readtable .
Thus it is possible to rapidly change
the current syntax by lambda binding
a different readtable to the symbol
.i readtable .
When the binding is undone, the syntax reverts to its old form.
.sh +0 Syntax\ Classes
.pp
The readtable describes how each of the 128 ASCII characters should
be treated by the reader and printer.
Each character belongs to a
.i syntax
.i class
which has three properties:
.ip character\ class\ -
Tells what the reader should do when it sees this character.
There are a large number of character classes.
They are described below.
.ip separator\ -
Most types of tokens the reader constructs are one character
long.
Four token types have an arbitrary length: number (1234),
symbol print name (franz),
escaped symbol print name (|franz|), and string ("franz").
The reader can easily determine when it has
come to the
end of one of the last two types: it just looks for the
matching delimiter (| or ").
When the reader is reading a number or symbol print name, it
stops reading when it comes to a character with the
.i separator
property.
The separator character is pushed back into the input stream and will
be the first character read when the reader is called again.
.ip escape\ -
Tells the printer when to put escapes in front of, or around, a symbol
whose print name contains this character.
There are three possibilities: always escape a symbol with this character
in it, only escape a symbol if this is the only character in the symbol,
and only escape a symbol if this is the first character in the symbol.
[Note: The printer will always escape a symbol which, if printed out, would
look like a valid number.]
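.pp
The three properties can be pictured as one small record per character.
The following Python sketch is a model only, not Franz Lisp source:
the names SyntaxClass, TABLE and needs_escape are hypothetical,
the table covers just four characters,
and the rule about symbols that look like numbers is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyntaxClass:
    """Hypothetical model of one syntax class (three properties)."""
    char_class: str                # e.g. "ccharacter", "cleft-paren"
    separator: bool = False        # True if it ends a number or symbol token
    escape: Optional[str] = None   # "always", "when-unique", "when-first"


# A tiny illustrative readtable; a real one covers all 128 characters.
TABLE = {
    "(": SyntaxClass("cleft-paren", True, "always"),
    ".": SyntaxClass("cperiod", False, "when-unique"),
    "#": SyntaxClass("cmacro", False, "when-first"),   # assumed entry
    "a": SyntaxClass("ccharacter"),
}

DEFAULT = SyntaxClass("ccharacter")


def needs_escape(name, table=TABLE):
    """Decide whether the printer would escape a symbol's print name."""
    for index, ch in enumerate(name):
        rule = table.get(ch, DEFAULT).escape
        if rule == "always":
            return True
        if rule == "when-unique" and len(name) == 1:
            return True
        if rule == "when-first" and index == 0:
            return True
    return False
```

In this model a period forces escaping only when it is the whole print name,
while a left parenthesis forces it wherever it appears.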
.pp
When the Lisp system is built, Lisp code is added to a C-coded kernel
and the result becomes the standard Lisp system.
The readtable present in the C-coded kernel, called the
.i raw
.i readtable ,
contains the bare necessities for reading in Lisp code.
During the
construction of the complete Lisp system,
a copy is made of the raw readtable and
then the copy is modified by adding macro characters.
The result is what is called the
.i standard
.i readtable .
When a new readtable is created with
.i makereadtable ,
a copy is made of either the
raw readtable
or the current readtable (which is likely to be the standard readtable).
.sh +0 Reader\ Operations
.pp
The reader has a very simple algorithm.
It is either
.i scanning
for a token,
.i collecting
a token,
or
.i processing
a token.
Scanning involves reading characters and throwing
away those which don't start tokens (such as blanks and tabs).
Collecting means gathering the characters which make up a
token into a buffer.
Processing may involve creating symbols, strings, lists,
fixnums, bignums or flonums or calling a user written function called
a character macro.
.pp
The components of the syntax class determine when the reader
switches between the scanning, collecting and processing states.
The reader will continue scanning as long as the character class
of the characters it reads is
.i cseparator .
When it reads a character whose character class is not
.i cseparator
it stores that character in its buffer and begins the collecting phase.
.pp
If the character class of that first character is
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then it will continue collecting until it runs into a character whose
syntax class has the
.i separator
property.
(That last character will be pushed back into the input buffer and will
be the first character read next time.)
Now the reader goes into the processing phase, checking to see if the
token it read is a number or symbol.
It is important to note that after
the first character is collected the component of the syntax class which
tells the reader to stop
collecting is the
.i separator
property, not the character class.
.pp
If the character class of the character which stopped the scanning is not
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then the reader processes that character immediately.
The character classes
.i csingle-macro ,
.i csingle-splicing-macro ,
and
.i csingle-infix-macro
will act like
.i ccharacter
if the following character is not a
.i separator .
The processing which is done for a given character class
is described in detail in the next section.
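.pp
The scanning and collecting phases can be sketched as a small tokenizer.
This Python model is illustrative only and is not the Franz Lisp reader:
the names CSEPARATOR, IMMEDIATE, HAS_SEPARATOR and tokens are hypothetical,
only blanks, parentheses, brackets and the quote are given special treatment,
and strings, escaped symbols and macros are left out.

```python
# Characters whose character class is cseparator (skipped while scanning).
CSEPARATOR = set(" \t\n")

# Characters that are processed as soon as scanning stops on them.
IMMEDIATE = set("()[]'")

# Characters whose syntax class carries the separator property.
HAS_SEPARATOR = CSEPARATOR | IMMEDIATE


def tokens(text):
    """Split text into tokens using the scan / collect phases."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CSEPARATOR:
            i += 1                  # scanning: throw the character away
            continue
        if ch in IMMEDIATE:
            out.append(ch)          # processed immediately, one character
            i += 1
            continue
        j = i + 1                   # collecting: first character is kept
        while j < len(text) and text[j] not in HAS_SEPARATOR:
            j += 1
        out.append(text[i:j])
        i = j                       # the separator is "pushed back"
    return out
```

Because collection stops on the separator property rather than on the
character class, a period in the middle of a token does not end it:
tokens("ab.c") yields a single token in this model.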
.sh +0 Character\ Classes
.de Cc
.sp 2v
.tl '\fI\\$1\fP''raw readtable:\\$2'
.tl '''standard readtable:\\$3'
..
.pc
.Cc ccharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~
.pc %
A normal character.
.Cc cnumber 0-9 0-9
This type is a digit.
The syntax for an integer (fixnum or bignum) is a string of
.i cnumber
characters optionally followed by a
.i cperiod .
If the digits are not followed by a
.i cperiod ,
then they are interpreted in base
.i ibase
which must be eight or ten.
The syntax for a floating point number is
either zero or more
.i cnumber 's
followed by a
.i cperiod
and then followed by one or more
.i cnumber 's.
A floating point number
may also be an integer or floating point number followed
by 'e' or 'd', an optional '+' or '\-'
and then zero or more
.i cnumber 's.
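.pp
The number syntax just described can be written down as two patterns.
The Python sketch below is a reading of that description, not the reader's code:
the names INTEGER, FLONUM, classify and integer_value are hypothetical,
and allowing one leading sign in front of a number is an assumption
based on the csign class.

```python
import re

# Digits, optionally followed by a period; an optional leading sign is assumed.
INTEGER = re.compile(r"[+-]?[0-9]+[.]?")

# Either digits-period-digits (leading digits optional), or any integer or
# floating form followed by e or d, an optional sign, and zero or more digits.
MANTISSA = r"[+-]?(?:[0-9]*[.][0-9]+|[0-9]+[.]?)"
FLONUM = re.compile(MANTISSA + r"(?:[ed][+-]?[0-9]*)?")


def classify(token):
    """Return "integer", "flonum" or "symbol" for a collected token."""
    if INTEGER.fullmatch(token):
        return "integer"
    if FLONUM.fullmatch(token):
        return "flonum"
    return "symbol"


def integer_value(token, ibase=10):
    """Value of an integer token; a trailing period forces base ten."""
    if ibase not in (8, 10):
        raise ValueError("ibase must be eight or ten")
    if token.endswith("."):
        return int(token[:-1], 10)
    return int(token, ibase)
```

With ibase set to eight, integer_value("17", 8) is fifteen while
integer_value("17.", 8) is seventeen, since the trailing period selects
decimal.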
.Cc csign +\- +\-
A leading sign for a number.
No other characters should be given this class.
.Cc cleft-paren ( (
A left parenthesis.
Tells the reader to begin forming a list.
.Cc cright-paren ) )
A right parenthesis.
Tells the reader that it has reached the end of a list.
.Cc cleft-bracket [ [
A left bracket.
Tells the reader that it should begin forming a list.
See the description of
.i cright-bracket
for the difference between cleft-bracket and cleft-paren.
.Cc cright-bracket ] ]
A right bracket.
A
.i cright-bracket
finishes the formation of the current
list and all enclosing lists until it finds one which
begins with a
.i cleft-bracket
or until it reaches the
top level list.
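.pp
The effect of a right bracket can be modeled with a stack of open lists.
This Python sketch is illustrative and not taken from the reader:
the function name parse is hypothetical, input is a list of ready-made tokens,
and letting a right parenthesis close a list opened by a left bracket
is an assumption of the model.

```python
def parse(tokens):
    """Build one nested list from tokens, honoring ] as a super-paren."""
    stack = []                          # entries are (opener, items)
    for tok in tokens:
        if tok in ("(", "["):
            stack.append((tok, []))
            continue
        if tok == ")":
            if not stack:
                raise ValueError("unexpected )")
            _, value = stack.pop()
        elif tok == "]":
            if not stack:
                raise ValueError("unexpected ]")
            while True:
                opener, value = stack.pop()
                # Stop once a [ list or the top level list has been closed.
                if opener == "[" or not stack:
                    break
                stack[-1][1].append(value)
        else:
            value = tok                 # an atom
        if not stack:
            return value                # a complete top level object
        stack[-1][1].append(value)
    raise ValueError("unexpected end of input")
```

For example, the tokens of (a (b (c] produce the same structure as
(a (b (c))), and in (a [b (c] d) the bracket closes only back to the
matching left bracket.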
.Cc cperiod . .
The period is used to separate the elements of a cons cell
[e.g. (a\ .\ (b\ .\ nil)) is the same as (a\ b)].
.i cperiod
is also used in numbers as described above.
.Cc cseparator ^I-^M\ esc\ space ^I-^M\ esc\ space
Separates tokens. When the reader is scanning, these characters
are passed over.
Note: there is a difference between the
.i cseparator
character class and the
.i separator
property of a syntax class.
.Cc csingle-quote \\' \\'
This causes
.i read
to be called recursively and the list
(quote <value read>) to be returned.
.Cc csymbol-delimiter | |
This causes the reader to begin collecting characters and to stop only
when another identical
.i csymbol-delimiter
is seen.
The only way to escape a
.i csymbol-delimiter
within a symbol name is with a
.i cescape
character.
The collected characters are converted into a string which becomes
the print name of a symbol.
If a symbol with an identical print name already exists, then the
allocation is not done, rather the existing symbol is used.
.Cc cescape \e \e
This causes the next character read in to be treated as a
.b vcharacter .
A character whose syntax class is
.b vcharacter
has a character class
.i ccharacter
and does not have
the
.i separator
property so it will not separate symbols.
.Cc cstring-delimiter """" """"
This is the same as
.i csymbol-delimiter
except the result is returned as a string instead of a symbol.
.Cc csingle-character-symbol none none
This returns a symbol whose print name is the single character
which has been collected.
.Cc cmacro none `,
The reader calls the macro function associated with this character and
the current readtable, passing it no arguments.
The result of the macro is added to the structure the reader is building,
just as if that form were directly read by the reader.
More details on macros are provided below.
.Cc csplicing-macro none #;
A
.i csplicing-macro
differs from a
.i cmacro
in the way the result is incorporated in the structure the reader is
building.
A
.i csplicing-macro
must return a list of forms (possibly empty).
The reader acts as
if it read each element of
the list itself without
the surrounding parentheses.
.Cc csingle-macro none none
This causes the reader to check the next character.
If it is a
.i cseparator
then this acts like a
.i cmacro .
Otherwise, it acts like a
.i ccharacter .
.Cc csingle-splicing-macro none none
This is triggered like a
.i csingle-macro ;
however, the result is spliced in like a
.i csplicing-macro .
.Cc cinfix-macro none none
This differs from a
.i cmacro
in that the macro function is passed a form representing what the reader
has read so far.
The result of the macro replaces what the reader had read so far.
.Cc csingle-infix-macro none none
This differs from the
.i cinfix-macro
in that the macro will only be triggered if the character following the
.i csingle-infix-macro
character is a
.i cseparator .
.Cc cillegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout
These characters cause the reader to signal an error if read.
.sh +0 Syntax\ Classes
.pp
The readtable maps each character into a syntax class.
The syntax class contains three pieces of information:
the character class, whether this is a separator, and the escape
properties.
The first two properties are used by the reader, the last by
the printer (and
.i explode ).
The initial Lisp system has the following syntax classes defined.
The user may add syntax classes with
.i add-syntax-class .
For each syntax class, we list the properties of the class and
which characters have this syntax class by default.
More information about each syntax class can be found under the
description of the syntax class's character class.
.de Sy
.sp 1v
.tl '\fB\\$1\fP''raw readtable:\\$2'
.tl '\fI\\$4\fP''standard readtable:\\$3'
.tl '\fI\\$5\fP'''
.tl '\fI\\$6\fP'''
..
.pc
.Sy vcharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~ ccharacter
.pc %
.Sy vnumber 0-9 0-9 cnumber
.Sy vsign +- +- csign
.Sy vleft-paren ( ( cleft-paren escape-always separator
.Sy vright-paren ) ) cright-paren escape-always separator
.Sy vleft-bracket [ [ cleft-bracket escape-always separator
.Sy vright-bracket ] ] cright-bracket escape-always separator
.Sy vperiod . . cperiod escape-when-unique
.Sy vseparator ^I-^M\ esc\ space ^I-^M\ esc\ space cseparator escape-always separator
.Sy vsingle-quote \\' \\' csingle-quote escape-always separator
.Sy vsymbol-delimiter | | csymbol-delimiter escape-always
.Sy vescape \e \e cescape escape-always
.Sy vstring-delimiter """" """" cstring-delimiter escape-always
.Sy vsingle-character-symbol none none csingle-character-symbol separator
.Sy vmacro none `, cmacro escape-always separator
.Sy vsplicing-macro none #; csplicing-macro escape-always separator
.Sy vsingle-macro none none csingle-macro escape-when-unique
.Sy vsingle-splicing-macro none none csingle-splicing-macro escape-when-unique
.Sy vinfix-macro none none cinfix-macro escape-always separator
.Sy vsingle-infix-macro none none csingle-infix-macro escape-when-unique
.Sy villegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout cillegal escape-always separator
.sh +0 Character\ Macros
.pp
Character macros are
user written functions which are executed during the reading process.
The value returned by a character macro may or may not be used by
the reader, depending on the type of macro and the value returned.
Character macros are always attached to a single character with
the
.i setsyntax
function.
.sh +1 Types
.pp
There are three types of character macros: normal, splicing and infix.
These types differ in the arguments they are given or in what is done
with the result they return.
.sh +1 Normal
.pp
A normal macro
is passed no arguments.
The value returned by a normal macro is simply used by
the reader as if it had read the value itself.
Here is an example of a macro which returns the abbreviation
for a given state.
.Eb
\-> \fI(de\kAfun stateabbrev nil
 \h'|\nAu'(cdr (assq (read) '((california . ca) (pennsylvania . pa)))))\fP
stateabbrev
\-> \fI(setsyntax '\e! 'vmacro 'stateabbrev)\fP
t
\-> \fI'( ! california ! wyoming ! pennsylvania)\fP
(ca nil pa)
.Ee
Notice what happened to
\fI ! wyoming\fP.
Since it wasn't in the table, the associated function
returned nil.
The creator of the macro may have wanted to leave the
list alone in such a case, but couldn't with this
type of reader macro.
The splicing macro, described next, allows a character macro function
to return a value that is ignored.
.sh +0 Splicing
.pp
The value returned from a splicing macro must be a list or nil.
If the value is nil, then the value is ignored, otherwise the reader
acts as if it read each object in the list.
Usually the list only contains one element.
If the reader is reading at the top level (i.e. not collecting elements
of a list),
then it is illegal for a splicing macro to return more than one
element in the list.
The major advantage of a splicing macro over a normal macro is the
ability of the splicing macro to return nothing.
The comment character (usually ;) is a splicing macro bound to a
function which reads to the end of the line and always returns nil.
Here is the previous example written as a splicing macro.
.Eb
\-> \fI(de\kAfun stateabbrev nil
\h'|\nAu'(\kC(lam\kBbda (value)
 \h'|\nBu'(cond \kA(value (list value))
 \h'|\nAu'(t nil)))
 \h'|\nCu'(cdr (assq (read) '((california . ca) (pennsylvania . pa))))))\fP
\-> \fI(setsyntax '! 'vsplicing-macro 'stateabbrev)\fP
\-> \fI'(!pennsylvania ! foo !california)\fP
(pa ca)
\-> \fI'!foo !bar !pennsylvania\fP
pa
\->
.Ee
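.pp
The difference between the two examples lies only in how the reader
incorporates the macro's result.
The Python sketch below models that step and nothing else:
the names ABBREVS and read_elements are hypothetical,
the input is a list of words already split apart,
and None stands in for nil.

```python
# Model of the association list used by stateabbrev.
ABBREVS = {"california": "ca", "pennsylvania": "pa"}


def read_elements(words, kind):
    """Collect list elements, treating "!" as a macro of the given kind."""
    items = []
    stream = iter(words)
    for word in stream:
        if word != "!":
            items.append(word)
            continue
        # The macro function reads the next object and looks it up.
        abbrev = ABBREVS.get(next(stream))
        if kind == "normal":
            items.append(abbrev)        # the value is always used
        else:
            # Splicing: the macro returns a list, or nil to add nothing.
            result = [abbrev] if abbrev is not None else None
            if result is not None:
                items.extend(result)
    return items
```

Run on the two inputs shown above, the normal kind yields three elements
with a nil in the middle, and the splicing kind yields two.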
.sh +0 Infix
.pp
Infix macros are passed a
.i tconc
structure representing what has been read so far.
Briefly, a
tconc
structure is a single list cell whose car points to
a list and whose cdr points to the last list cell in that list.
The interpretation by the reader of the value
returned by an infix macro depends on
whether the macro is called while the reader is constructing a
list or whether it is called at the top level of the reader.
If the macro is called while a list is
being constructed, then the value returned should be a tconc
structure.
The car of that structure replaces the list of elements that the
reader has been collecting.
If the macro is called at top level, then it will be passed the
value nil, and the value it returns should either be nil
or a tconc structure.
If the macro returns nil, then the value is ignored and the reader
continues to read.
If the macro returns a tconc structure of one element (i.e. whose car
is a list of one element), then that single element is returned
as the value of
.i read .
If the macro returns a tconc structure of more than one element,
then that list of elements is returned as the value of read.
.Eb
\-> \fI(de\kAfun plusop (x)
 \h'|\nAu'(cond \kB((null x) (tconc nil '\e+))
 \h'|\nBu'(t (lconc nil (list 'plus (caar x) (read))))))\fP

plusop
\-> \fI(setsyntax '\e+ 'vinfix-macro 'plusop)\fP
t
\-> \fI'(a + b)\fP
(plus a b)
\-> \fI'+\fP
|+|
\->
.Ee
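.pp
The shape of a tconc structure, and the top level rules given above,
can be modeled with two-element Python lists standing in for list cells.
This is a sketch for illustration only:
the Python functions tconc, to_list and top_level_value are hypothetical
stand-ins, None plays the role of nil,
and the real tconc function may differ in detail.

```python
def tconc(ptr, x):
    """Add x to the end of the list held by ptr; ptr of None makes a new one."""
    cell = [x, None]                 # a list cell: [car, cdr]
    if ptr is None:
        return [cell, cell]          # car: the list, cdr: its last cell
    if ptr[0] is None:
        ptr[0] = ptr[1] = cell       # first element of an empty structure
    else:
        ptr[1][1] = cell             # link the new cell after the last one
        ptr[1] = cell                # and remember it as the new last cell
    return ptr


def to_list(ptr):
    """Return the elements of the list held in the car of ptr."""
    out = []
    cell = ptr[0]
    while cell is not None:
        out.append(cell[0])
        cell = cell[1]
    return out


def top_level_value(ptr):
    """Model how a top level result from an infix macro is interpreted."""
    if ptr is None:
        return None                  # ignored; the reader keeps reading
    items = to_list(ptr)
    return items[0] if len(items) == 1 else items
```

Keeping a pointer to the last cell is what lets an element be appended
without walking the whole list.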
.sh -1 Invocations
.pp
There are three different circumstances in which you would like
a macro function to be triggered.
.ip \fIAlways\ -\fP
Whenever the macro character is seen, the macro should be invoked.
This is accomplished by using the character classes
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro ,
and by using the
.i separator
property.
The syntax classes
.b vmacro ,
.b vsplicing-macro ,
and
.b vinfix-macro
are defined this way.
.ip \fIWhen\ first\ -\fP
The macro should only be triggered when the macro character is the first
character found after the scanning process.
A syntax class for a
.i when
.i first
macro would
be defined
using
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro
and not including the
.i separator
property.
.ip \fIWhen\ unique\ -\fP
The macro should only be triggered when the macro character is the only
character collected in the token collection
phase of the reader,
i.e. the macro character is preceded by zero or more
.i cseparator s
and followed by a
.i separator .
A syntax class for a
.i when
.i unique
macro would
be defined using
.i csingle-macro ,
.i csingle-splicing-macro ,
or
.i csingle-infix-macro
and not including the
.i separator
property.
The syntax classes so defined are
.b vsingle-macro ,
.b vsingle-splicing-macro ,
and
.b vsingle-infix-macro .
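.pp
The three circumstances can be compared by running one input through a
model of the scanner under each setting.
The Python sketch below is an illustration, not reader code:
the function name trigger_positions and its parameters are hypothetical,
only blanks act as ordinary separators,
the end of input is assumed to count as a separator,
and the macro function itself (which might read more input) is not modeled.

```python
def trigger_positions(text, macro_char, single, separator):
    """Return the indexes in text at which macro_char would trigger.

    single:    True for the csingle- character classes
    separator: True if the macro character has the separator property
    """
    blanks = set(" \t\n")
    stops = set(blanks)
    if separator:
        stops.add(macro_char)
    hits = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in blanks:
            i += 1                          # scanning
            continue
        if ch == macro_char:
            following = text[i + 1] if i + 1 < len(text) else " "
            if not single or following in stops:
                hits.append(i)              # the macro is triggered here
                i += 1
                continue
        i += 1                              # collecting an ordinary token
        while i < len(text) and text[i] not in stops:
            i += 1
    return hits
```

On the input "!a b!c ! d" the always setting triggers at all three
exclamation marks, the when first setting skips the one inside b!c,
and the when unique setting triggers only at the isolated one.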
.sh -1 Functions
.Lf setsyntax 's_symbol\ 's_synclass\ ['ls_func]
.Wh
ls_func is the name of a function or a lambda body.
.Re
t
.Se
S_symbol should be a symbol whose print name is only one character.
The syntax class for
that character is
set to s_synclass in the current readtable.
If s_synclass is a class that requires a character macro, then
ls_func must be supplied.
.No
The symbolic syntax codes are new to Opus 38.
For compatibility, s_synclass can be one of the fixnum syntax codes
which appeared in older versions of the
.Fr
Manual.
This compatibility is only temporary: existing code which uses the
fixnum syntax codes should be converted.
.Lf getsyntax 's_symbol
.Re
the syntax class of the first character
of s_symbol's print name.
s_symbol's print name must be exactly one character long.
.No
This function is new to Opus 38.
It supersedes \fI(status\ syntax)\fP which no longer exists.
.Lf add-syntax-class 's_synclass\ 'l_properties
.Re
s_synclass
.Se
Defines the syntax class s_synclass to have properties l_properties.
The list l_properties should contain one of the character classes mentioned
above.
l_properties may contain one of the escape properties:
.i escape-always ,
.i escape-when-unique ,
or
.i escape-when-first .
l_properties may contain the
.i separator
property.
After a syntax class has been defined with
.i add-syntax-class ,
the
.i setsyntax
function can be used to give characters that syntax class.
.Eb
; Define a non-separating macro character.
; This type of macro character is used in UCI-Lisp, and
; it corresponds to a FIRST MACRO in Interlisp
\-> \fI(add-syntax-class 'vuci-macro '(cmacro escape-when-first))\fP
vuci-macro
\->
.Ee
615; Define a non-separating macro character.
616; This type of macro character is used in UCI-Lisp, and
617; it corresponds to a FIRST MACRO in Interlisp
618\-> \fI(add-syntax-class 'vuci-macro '(cmacro escape-when-first))\fP
619vuci-macro
620\->
621.Ee