." $Header: ch7.n 1.3 83/07/01 11:22:58 layer Exp $
.Lc The\ Lisp\ Reader 7
.sh 2 Introduction \n(ch 1
.pp
The
.i read
function is responsible for converting
a stream of
characters into a Lisp expression.
.i Read
is table driven and the table it uses is called a
.i readtable.
The
.i print
function does the
inverse of
.i read ;
it converts a Lisp expression into a stream of
characters.
Typically the conversion is done in such
a way that if that stream of characters were read by
.i read ,
the
result would be an expression equal to the one
.i print
was given.
.i Print
must also refer to the readtable in order to determine
how to format its output.
The
.i explode
function, which returns a list of characters rather than
printing them, must also refer to the readtable.
.pp
A readtable is created
with the
.i makereadtable
function, modified with the
.i setsyntax
function and interrogated with the
.i getsyntax
function.
The structure of a readtable is hidden from the user;
a readtable should
only be manipulated with the three functions mentioned above.
.pp
There is one distinguished readtable called the
.i current
.i readtable
whose value determines what
.i read ,
.i print
and
.i explode
do.
The current readtable is the value of the symbol
.i readtable .
Thus it is possible to rapidly change
the current syntax by lambda binding
a different readtable to the symbol
.i readtable.
When the binding is undone, the syntax reverts to its old form.
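.pp
The effect of rebinding the symbol
.i readtable
can be pictured with a small Python model.
This is only a sketch: the dictionary
.i bindings ,
the helper
.i bound ,
and the strings standing in for readtables are hypothetical
and are not part of the Lisp system.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the value cell of the symbol `readtable`.
bindings = {"readtable": "standard-readtable"}


@contextmanager
def bound(symbol, value):
    """Temporarily bind symbol to value, restoring the old value on exit."""
    old = bindings[symbol]
    bindings[symbol] = value
    try:
        yield
    finally:
        bindings[symbol] = old  # the binding is undone, even after an error


def current_syntax():
    """Report which readtable read, print and explode would consult."""
    return bindings["readtable"]


seen = []
with bound("readtable", "my-readtable"):
    seen.append(current_syntax())   # inside the binding
seen.append(current_syntax())       # after the binding is undone
```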
.sh +0 Syntax\ Classes
.pp
The readtable describes how each of the 128 ASCII characters should
be treated by the reader and printer.
Each character belongs to a
.i syntax
.i class
which has three properties:
.ip character\ class\ -
Tells what the reader should do when it sees this character.
There are a large number of character classes.
They are described below.
.ip separator\ -
Most types of tokens the reader constructs are one character
long.
Four token types have an arbitrary length: number (1234),
symbol print name (franz),
escaped symbol print name (|franz|), and string ("franz").
The reader can easily determine when it has
come to the
end of one of the last two types: it just looks for the
matching delimiter (| or ").
When the reader is reading a number or symbol print name, it
stops reading when it comes to a character with the
.i separator
property.
The separator character is pushed back into the input stream and will
be the first character read when the reader is called again.
.ip escape\ -
Tells the printer when to put escapes in front of, or around, a symbol
whose print name contains this character.
There are three possibilities: always escape a symbol with this character
in it, only escape a symbol if this is the only character in the symbol,
and only escape a symbol if this is the first character in the symbol.
[note: The printer will always escape a symbol which, if printed out, would
look like a valid number.]
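.pp
The three properties of a syntax class, and the way the printer
might consult the escape property, can be modelled in Python.
This is a minimal sketch under stated assumptions: the names
.i SyntaxClass
and
.i needs_escape
are hypothetical, the table covers only four characters,
and the extra rule about symbols that look like numbers is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyntaxClass:
    """Hypothetical model of one syntax class."""
    name: str                     # e.g. "vleft-paren"
    char_class: str               # e.g. "cleft-paren", used by the reader
    separator: bool = False       # ends collection of a number or symbol
    escape: Optional[str] = None  # "always", "when-unique" or "when-first"


def needs_escape(print_name, table):
    """Return True if some character's escape property calls for escaping."""
    for index, ch in enumerate(print_name):
        rule = table[ch].escape
        if rule == "always":
            return True
        if rule == "when-unique" and len(print_name) == 1:
            return True
        if rule == "when-first" and index == 0:
            return True
    return False


# A tiny illustrative table; a real readtable covers all 128 characters.
VCHARACTER = SyntaxClass("vcharacter", "ccharacter")
VLEFT_PAREN = SyntaxClass("vleft-paren", "cleft-paren", True, "always")
VPERIOD = SyntaxClass("vperiod", "cperiod", False, "when-unique")
table = {"a": VCHARACTER, "b": VCHARACTER, "(": VLEFT_PAREN, ".": VPERIOD}
```

With this table, a symbol named "a(b" or "." would be escaped,
while "ab" and "a.b" would be printed as they are.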
.pp
When the Lisp system is built, Lisp code is added to a C-coded kernel
and the result becomes the standard lisp system.
The readtable present in the C-coded kernel, called the
.i raw
.i readtable ,
contains the bare necessities for reading in Lisp code.
During the
construction of the complete Lisp system,
a copy is made of the raw readtable and
then the copy is modified by adding macro characters.
The result is what is called the
.i standard
.i readtable .
When a new readtable is created with
.i makereadtable,
a copy is made of either the
raw readtable
or the current readtable (which is likely to be the standard readtable).
.sh +0 Reader\ operations
.pp
The reader has a very simple algorithm.
It is either
.i scanning
for a token,
.i collecting
a token,
or
.i processing
a token.
Scanning involves reading characters and throwing
away those which don't start tokens (such as blanks and tabs).
Collecting means gathering the characters which make up a
token into a buffer.
Processing may involve creating symbols, strings, lists,
fixnums, bignums or flonums or calling a user written function called
a character macro.
.pp
The components of the syntax class determine when the reader
switches between the scanning, collecting and processing states.
The reader will continue scanning as long as the character class
of the characters it reads is
.i cseparator.
When it reads a character whose character class is not
.i cseparator
it stores that character in its buffer and begins the collecting phase.
.pp
If the character class of that first character is
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then it will continue collecting until it runs into a character whose
syntax class has the
.i separator
property.
(That last character will be pushed back into the input buffer and will
be the first character read next time.)
Now the reader goes into the processing phase, checking to see if the
token it read is a number or symbol.
It is important to note that after
the first character is collected the component of the syntax class which
tells the reader to stop
collecting is the
.i separator
property, not the character class.
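.pp
The scanning and collecting phases described above can be sketched
in Python.
This is an illustration under stated assumptions, not the reader's actual code:
.i next_token ,
.i toy_char_class ,
.i toy_is_separator
and
.i tokenize
are hypothetical names, and the toy table knows only a handful of characters.

```python
COLLECTING_CLASSES = {"ccharacter", "cnumber", "cperiod", "csign"}


def next_token(chars, pos, char_class, is_separator):
    """Return (token, new_pos), or (None, new_pos) at end of input."""
    n = len(chars)
    # Scanning: throw away characters whose character class is cseparator.
    while pos < n and char_class(chars[pos]) == "cseparator":
        pos += 1
    if pos == n:
        return None, pos
    start = pos
    pos += 1
    if char_class(chars[start]) in COLLECTING_CLASSES:
        # Collecting: only the separator property stops collection.
        # The stopping character is not consumed ("pushed back").
        while pos < n and not is_separator(chars[pos]):
            pos += 1
    return chars[start:pos], pos


def toy_char_class(ch):
    """Hypothetical character-class lookup for a few characters."""
    if ch in " \t\n":
        return "cseparator"
    if ch == "(":
        return "cleft-paren"
    if ch == ")":
        return "cright-paren"
    if ch.isdigit():
        return "cnumber"
    if ch == ".":
        return "cperiod"
    if ch in "+-":
        return "csign"
    return "ccharacter"


def toy_is_separator(ch):
    """Hypothetical separator-property lookup."""
    return ch in " \t\n()"


def tokenize(text):
    """Collect every token in text using the toy table."""
    tokens, pos = [], 0
    while True:
        token, pos = next_token(text, pos, toy_char_class, toy_is_separator)
        if token is None:
            return tokens
        tokens.append(token)
```

In this model the input "a+b" comes back as a single token,
because the sign character does not have the separator property.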
.pp
If the character class of the character which stopped the scanning is not
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then the reader processes that character immediately.
The character classes
.i csingle-macro ,
.i csingle-splicing-macro ,
and
.i csingle-infix-macro
will act like
.i ccharacter
if the following character is not a
.i separator.
The processing which is done for a given character class
is described in detail in the next section.
.sh +0 Character\ classes
.de Cc
.sp 2v
.tl '\fI\\$1\fP''raw readtable:\\$2'
.tl '''standard readtable:\\$3'
..
.pc
.Cc ccharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~
.pc %
A normal character.
.Cc cnumber 0-9 0-9
This type is a digit.
The syntax for an integer (fixnum or bignum) is a string of
.i cnumber
characters optionally followed by a
.i cperiod.
If the digits are not followed by a
.i cperiod ,
then they are interpreted in base
.i ibase
which must be eight or ten.
The syntax for a floating point number is
zero or more
.i cnumber 's
followed by a
.i cperiod
and then one or more
.i cnumber 's.
A floating point number
may also be an integer or floating point number followed
by 'e' or 'd', an optional '+' or '\-'
and then zero or more
.i cnumber 's.
.Cc csign +\- +\-
A leading sign for a number.
No other characters should be given this class.
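.pp
The number syntax given under
.i cnumber
and
.i csign
can be written down as regular expressions.
The Python sketch below takes the text literally and is not the reader's own code.
The names
.i classify
and
.i integer_value
are hypothetical, and treating an integer with a trailing period
as decimal is an assumption drawn from the wording above.

```python
import re

DIGITS = r"[0-9]"                   # the cnumber characters
INTEGER = rf"{DIGITS}+\.?"          # digits, optional trailing period
FLOAT = rf"{DIGITS}*\.{DIGITS}+"    # zero or more digits, period, digits
EXPONENT = rf"[ed][+-]?{DIGITS}*"   # 'e' or 'd', optional sign, digits
NUMBER = re.compile(
    rf"[+-]?(?:(?:{FLOAT}|{INTEGER}){EXPONENT}|{FLOAT}|{INTEGER})"
)


def classify(token):
    """Return 'integer', 'float' or 'symbol' for a collected token."""
    if not NUMBER.fullmatch(token):
        return "symbol"
    body = token.lstrip("+-")
    if re.fullmatch(INTEGER, body):
        return "integer"
    return "float"


def integer_value(token, ibase=10):
    """Convert an integer token; ibase applies only without a period."""
    if classify(token) != "integer":
        raise ValueError("not an integer token: " + token)
    if token.endswith("."):
        return int(token[:-1], 10)   # assumed: trailing period means decimal
    return int(token, ibase)         # raises ValueError for 8 or 9 in base 8
```

For example, with ibase set to eight this model reads "17" as fifteen
and "17." as seventeen.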
.Cc cleft-paren ( (
A left parenthesis.
Tells the reader to begin forming a list.
.Cc cright-paren ) )
A right parenthesis.
Tells the reader that it has reached the end of a list.
.Cc cleft-bracket [ [
A left bracket.
Tells the reader that it should begin forming a list.
See the description of
.i cright-bracket
for the difference between cleft-bracket and cleft-paren.
.Cc cright-bracket ] ]
A right bracket.
A
.i cright-bracket
finishes the formation of the current
list and all enclosing lists until it finds one which
begins with a
.i cleft-bracket
or until it reaches the
top level list.
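.pp
The difference between a right parenthesis and a right bracket
can be shown with a short Python model that builds nested lists from
a sequence of tokens.
This is a sketch only:
.i read_form
is a hypothetical name, tokens are plain strings,
and Python lists stand in for Lisp lists.

```python
def read_form(tokens):
    """Build one form from tokens; ']' closes lists back to the last '['."""
    stack = []  # open lists, innermost last, as (opener, items) pairs
    for tok in tokens:
        if tok in ("(", "["):
            stack.append((tok, []))
        elif tok == ")":
            if not stack:
                raise ValueError("unmatched )")
            _, items = stack.pop()
            if not stack:
                return items            # the top level list is finished
            stack[-1][1].append(items)
        elif tok == "]":
            if not stack:
                raise ValueError("unmatched ]")
            while True:
                opener, items = stack.pop()
                if not stack:
                    return items        # reached the top level list
                stack[-1][1].append(items)
                if opener == "[":
                    break               # stop at the list begun by '['
        else:
            if not stack:
                return tok              # an atom read at top level
            stack[-1][1].append(tok)
    raise ValueError("unexpected end of input")
```

In this model the tokens of "(a(b(c]" produce the same structure as
"(a(b(c)))", while in "(a[b(c]d)" the bracket closes only the two
inner lists.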
.Cc cperiod . .
The period is used to separate the elements of a cons cell
[e.g. (a\ .\ (b\ .\ nil)) is the same as (a\ b)].
.i cperiod
is also used in numbers as described above.
.Cc cseparator ^I-^M\ esc\ space ^I-^M\ esc\ space
Separates tokens. When the reader is scanning, these characters
are passed over.
Note: there is a difference between the
.i cseparator
character class and the
.i separator
property of a syntax class.
.Cc csingle-quote \\' \\'
This causes
.i read
to be called recursively and the list
(quote <value read>) to be returned.
.Cc csymbol-delimiter | |
This causes the reader to begin collecting characters and to stop only
when another identical
.i csymbol-delimiter
is seen.
The only way to escape a
.i csymbol-delimiter
within a symbol name is with a
.i cescape
character.
The collected characters are converted into a string which becomes
the print name of a symbol.
If a symbol with an identical print name already exists, then the
allocation is not done, rather the existing symbol is used.
.Cc cescape \e \e
This causes the next character read in to be treated as a
.b vcharacter .
A character whose syntax class is
.b vcharacter
has a character class
.i ccharacter
and does not have
the
.i separator
property so it will not separate symbols.
.Cc cstring-delimiter """" """"
This is the same as
.i csymbol-delimiter
except the result is returned as a string instead of a symbol.
.Cc csingle-character-symbol none none
This returns a symbol whose print name is the single character
which has been collected.
.Cc cmacro none `,
The reader calls the macro function associated with this character and
the current readtable, passing it no arguments.
The result of the macro is added to the structure the reader is building,
just as if that form were directly read by the reader.
More details on macros are provided below.
.Cc csplicing-macro none #;
A
.i csplicing-macro
differs from a
.i cmacro
in the way the result is incorporated in the structure the reader is
building.
A
.i csplicing-macro
must return a list of forms (possibly empty).
The reader acts as
if it read each element of
the list itself without
the surrounding parentheses.
.Cc csingle-macro none none
This causes the reader to check the next character.
If it is a
.i cseparator
then this acts like a
.i cmacro.
Otherwise, it acts like a
.i ccharacter.
.Cc csingle-splicing-macro none none
This is triggered like a
.i csingle-macro
however the result is spliced in like a
.i csplicing-macro.
.Cc cinfix-macro none none
This differs from a
.i cmacro
in that the macro function is passed a form representing what the reader
has read so far.
The result of the macro replaces what the reader had read so far.
.Cc csingle-infix-macro none none
This differs from the
.i cinfix-macro
in that the macro will only be triggered if the character following the
.i csingle-infix-macro
character is a
.i cseparator .
.Cc cillegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout
These characters cause the reader to signal an error if read.
.sh +0 Syntax\ classes
.pp
The readtable maps each character into a syntax class.
The syntax class contains three pieces of information:
the character class, whether this is a separator, and the escape
properties.
The first two properties are used by the reader, the last by
the printer (and
.i explode ).
The initial lisp system has the following syntax classes defined.
The user may add syntax classes with
.i add-syntax-class .
For each syntax class, we list the properties of the class and
which characters have this syntax class by default.
More information about each syntax class can be found under the
description of the syntax class's character class.
.de Sy
.sp 1v
.tl '\fB\\$1\fP''raw readtable:\\$2'
.tl '\fI\\$4\fP''standard readtable:\\$3'
.tl '\fI\\$5\fP'''
.tl '\fI\\$6\fP'''
..
.pc
.Sy vcharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~ ccharacter
.pc %
.Sy vnumber 0-9 0-9 cnumber
.Sy vsign +\- +\- csign
.Sy vleft-paren ( ( cleft-paren escape-always separator
.Sy vright-paren ) ) cright-paren escape-always separator
.Sy vleft-bracket [ [ cleft-bracket escape-always separator
.Sy vright-bracket ] ] cright-bracket escape-always separator
.Sy vperiod . . cperiod escape-when-unique
.Sy vseparator ^I-^M\ esc\ space ^I-^M\ esc\ space cseparator escape-always separator
.Sy vsingle-quote \\' \\' csingle-quote escape-always separator
.Sy vsymbol-delimiter | | csymbol-delimiter escape-always
.Sy vescape \e \e cescape escape-always
.Sy vstring-delimiter """" """" cstring-delimiter escape-always
.Sy vsingle-character-symbol none none csingle-character-symbol separator
.Sy vmacro none `, cmacro escape-always separator
.Sy vsplicing-macro none #; csplicing-macro escape-always separator
.Sy vsingle-macro none none csingle-macro escape-when-unique
.Sy vsingle-splicing-macro none none csingle-splicing-macro escape-when-unique
.Sy vinfix-macro none none cinfix-macro escape-always separator
.Sy vsingle-infix-macro none none csingle-infix-macro escape-when-unique
.Sy villegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout cillegal escape-always separator
.sh +0 Character\ Macros
.pp
Character macros are
user written functions which are executed during the reading process.
The value returned by a character macro may or may not be used by
the reader, depending on the type of macro and the value returned.
Character macros are always attached to a single character with
the
.i setsyntax
function.
.sh +1 Types
.pp
There are three types of character macros: normal, splicing and infix.
These types differ in the arguments they are given or in what is done
with the result they return.
.sh +1 Normal
.pp
A normal macro
is passed no arguments.
The value returned by a normal macro is simply used by
the reader as if it had read the value itself.
Here is an example of a macro which returns the abbreviation
for a given state.
.Eb
\-> \fI(de\kAfun stateabbrev nil
 \h'|\nAu'(cdr (assq (read) '((california . ca) (pennsylvania . pa)))))\fP
stateabbrev
\-> \fI(setsyntax '\e! 'vmacro 'stateabbrev)\fP
t
\-> \fI'( ! california ! wyoming ! pennsylvania)\fP
(ca nil pa)
.Ee
Notice what happened to
\fI ! wyoming\fP.
Since it wasn't in the table, the macro probably didn't
want to return anything at all, but it had to return
something, and whatever it returned was put in the list.
The splicing macro, described next, allows a character macro function
to return a value that is ignored.
.sh +0 Splicing
.pp
The value returned from a splicing macro must be a list or nil.
If the value is nil, then the value is ignored, otherwise the reader
acts as if it read each object in the list.
Usually the list only contains one element.
If the reader is reading at the top level (i.e. not collecting elements
of a list),
then it is illegal for a splicing macro to return more than one
element in the list.
The major advantage of a splicing macro over a normal macro is the
ability of the splicing macro to return nothing.
The comment character (usually ;) is a splicing macro bound to a
function which reads to the end of the line and always returns nil.
Here is the previous example written as a splicing macro:
.Eb
\-> \fI(de\kAfun stateabbrev nil
\h'|\nAu'(\kC(lam\kBbda (value)
 \h'|\nBu'(cond \kA(value (list value))
 \h'|\nAu'(t nil)))
 \h'|\nCu'(cdr (assq (read) '((california . ca) (pennsylvania . pa))))))\fP
\-> \fI(setsyntax '! 'vsplicing-macro 'stateabbrev)\fP
\-> \fI'(!pennsylvania ! foo !california)\fP
(pa ca)
\-> \fI'!foo !bar !pennsylvania\fP
pa
\->
.Ee
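.pp
The way the reader incorporates the result of a normal macro and of a
splicing macro can be compared in a small Python model.
It is a sketch under stated assumptions:
.i incorporate ,
.i normal_stateabbrev
and
.i splicing_stateabbrev
are hypothetical names, Python's None stands in for nil as a value,
and an empty Python list stands in for nil as an empty list.

```python
ABBREVS = {"california": "ca", "pennsylvania": "pa"}


def incorporate(items, kind, value):
    """Add a macro's return value to the list the reader is building."""
    if kind == "normal":
        items.append(value)        # the value itself becomes one element
    elif kind == "splicing":
        if not isinstance(value, list):
            raise TypeError("a splicing macro must return a list")
        items.extend(value)        # each element is added; [] adds nothing
    else:
        raise ValueError("unknown macro kind: " + kind)
    return items


def normal_stateabbrev(name):
    """Model of the normal macro: always returns something."""
    return ABBREVS.get(name)       # None when the state is unknown


def splicing_stateabbrev(name):
    """Model of the splicing macro: returns a list, possibly empty."""
    value = ABBREVS.get(name)
    return [value] if value is not None else []


normal_result, splicing_result = [], []
for state in ("pennsylvania", "foo", "california"):
    incorporate(normal_result, "normal", normal_stateabbrev(state))
    incorporate(splicing_result, "splicing", splicing_stateabbrev(state))
```

In this model the unknown state leaves a None in the list built with
the normal macro and leaves no trace in the list built with the
splicing macro.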
.sh +0 Infix
.pp
Infix macros are passed a
.i tconc
structure representing what has been read so far.
Briefly, a
tconc
structure is a single list cell whose car points to
a list and whose cdr points to the last list cell in that list.
The interpretation by the reader of the value
returned by an infix macro depends on
whether the macro is called while the reader is constructing a
list or whether it is called at the top level of the reader.
If the macro is called while a list is
being constructed, then the value returned should be a tconc
structure.
The car of that structure replaces the list of elements that the
reader has been collecting.
If the macro is called at top level, then it will be passed the
value nil, and the value it returns should either be nil
or a tconc structure.
If the macro returns nil, then the value is ignored and the reader
continues to read.
If the macro returns a tconc structure of one element (i.e. whose car
is a list of one element), then that single element is returned
as the value of
.i read.
If the macro returns a tconc structure of more than one element,
then that list of elements is returned as the value of read.
.Eb
\-> \fI(de\kAfun plusop (x)
 \h'|\nAu'(cond \kB((null x) (tconc nil '\e+))
 \h'|\nBu'(t (lconc nil (list 'plus (caar x) (read))))))\fP

plusop
\-> \fI(setsyntax '\e+ 'vinfix-macro 'plusop)\fP
t
\-> \fI'(a + b)\fP
(plus a b)
\-> \fI'+\fP
|+|
\->
.Ee
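.pp
The tconc structure described above can be modelled in Python to show
why it makes adding to the end of a list cheap.
This is a sketch, not the Lisp implementation: the class
.i Cons
and the helper
.i to_list
are hypothetical, and the Python function
.i tconc
only imitates the behavior described in the text.

```python
class Cons:
    """A list cell with a car and a cdr."""

    def __init__(self, car=None, cdr=None):
        self.car = car
        self.cdr = cdr


def tconc(header, value):
    """Append value to the list held by header without walking the list.

    header.car points to the first cell of the list and header.cdr to
    the last cell. Passing None creates a new header.
    """
    cell = Cons(value)
    if header is None:
        header = Cons()
    if header.car is None:
        header.car = cell          # first element of an empty list
    else:
        header.cdr.cdr = cell      # link after the current last cell
    header.cdr = cell              # the new cell is now the last one
    return header


def to_list(header):
    """Copy the elements held by a header into a Python list."""
    out, cell = [], header.car
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


h = tconc(None, "a")
tconc(h, "b")
tconc(h, "c")
```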
.sh -1 Invocations
.pp
There are three different circumstances in which you would like
a macro function to be triggered.
.ip \fIAlways\ -\fP
Whenever the macro character is seen, the macro should be invoked.
This is accomplished by using the character classes
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro ,
and by using the
.i separator
property.
The syntax classes
.b vmacro ,
.b vsplicing-macro ,
and
.b vinfix-macro
are defined this way.
.ip \fIWhen\ first\ -\fP
The macro should only be triggered when the macro character is the first
character found after the scanning process.
A syntax class for a
.i when
.i first
macro would
be defined
using
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro
and not including the
.i separator
property.
.ip \fIWhen\ unique\ -\fP
The macro should only be triggered when the macro character is the only
character collected in the token collection
phase of the reader,
i.e. the macro character is preceded by zero or more
.i cseparator s
and followed by a
.i separator.
A syntax class for a
.i when
.i unique
macro would
be defined using
.i csingle-macro ,
.i csingle-splicing-macro ,
or
.i csingle-infix-macro
and not including the
.i separator
property.
The syntax classes so defined are
.b vsingle-macro ,
.b vsingle-splicing-macro ,
and
.b vsingle-infix-macro .
.sh -1 Functions
.Lf setsyntax 's_symbol\ 's_synclass\ ['ls_func]
.Wh
ls_func is the name of a function or a lambda body.
.Re
t
.Se
S_symbol should be a symbol whose print name is only one character.
The syntax class for
that character is
set to s_synclass in the current readtable.
If s_synclass is a class that requires a character macro, then
ls_func must be supplied.
.No
The symbolic syntax codes are new to Opus 38.
For compatibility, s_synclass can be one of the fixnum syntax codes
which appeared in older versions of the
.Fr
Manual.
This compatibility is only temporary: existing code which uses the
fixnum syntax codes should be converted.
.Lf getsyntax 's_symbol
.Re
the syntax class of the first character
of s_symbol's print name.
s_symbol's print name must be exactly one character long.
.No
This function is new to Opus 38.
It supersedes \fI(status\ syntax)\fP which no longer exists.
.Lf add-syntax-class 's_synclass\ 'l_properties
.Re
s_synclass
.Se
Defines the syntax class s_synclass to have properties l_properties.
The list l_properties should contain one of the character classes mentioned
above.
l_properties may contain one of the escape properties:
.i escape-always ,
.i escape-when-unique ,
or
.i escape-when-first .
l_properties may contain the
.i separator
property.
After a syntax class has been defined with
.i add-syntax-class ,
the
.i setsyntax
function can be used to give characters that syntax class.
.Eb
; Define a non-separating macro character.
; This type of macro character is used in UCI-Lisp, and
; it corresponds to a FIRST MACRO in Interlisp
\-> \fI(add-syntax-class 'vuci-macro '(cmacro escape-when-first))\fP
vuci-macro
\->
.Ee