.\" Copyright (c) 1980 Regents of the University of California.
.\" All rights reserved. The Berkeley software License Agreement
.\" specifies the terms and conditions for redistribution.
.\" @(#)ch7.n 6.2 (Berkeley) 5/14/86
.\" $Header: ch7.n,v 1.3 83/07/01 11:22:58 layer Exp $
.sh 2 Introduction \n(ch 1
.pp
The
.i read
function is responsible for converting
a stream of characters into a Lisp expression.
.i Read
is table driven and the table it uses is called a
.i readtable .
The
.i print
function does the inverse of
.i read ;
it converts a Lisp expression into a stream of
characters.
Typically the conversion is done in such
a way that if that stream of characters were read by
.i read ,
the result would be an expression equal to the one
.i print
was given.
.i Print
must also refer to the readtable in order to determine
how to format its output.
The
.i explode
function, which returns a list of characters rather than
printing them, must also refer to the readtable.
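.pp
A readtable is created with the
.i makereadtable
function, modified with the
.i setsyntax
function and interrogated with the
.i getsyntax
function.
The structure of a readtable is hidden from the user: a
readtable should
only be manipulated with the three functions mentioned above.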
.pp
There is one distinguished readtable called the
.i current
.i readtable
whose value determines what
.i read ,
.i print
and
.i explode
do.
The current readtable is the value of the symbol
.i readtable .
Thus it is possible to rapidly change
the current syntax by lambda binding
a different readtable to the symbol
.i readtable .
When the binding is undone, the syntax reverts to its old form.
.sh +0 Syntax\ Classes
.pp
The readtable describes how each of the 128 ASCII characters should
be treated by the reader and printer.
Each character belongs to a
.i syntax
.i class
which has three properties:
.ip \fIcharacter\ class\ -\fP
Tells what the reader should do when it sees this character.
There are a large number of character classes.
They are described below.
.ip \fIseparator\ -\fP
Most types of tokens the reader constructs are one character
long.
Four token types have an arbitrary length: number (1234),
symbol print name (franz),
escaped symbol print name (|franz|), and string ("franz").
The reader can easily determine when it has
come to the
end of one of the last two types: it just looks for the
matching delimiter (| or ").
When the reader is reading a number or symbol print name, it
stops reading when it comes to a character with the
.i separator
property.
The separator character is pushed back into the input stream and will
be the first character read when the reader is called again.
.ip \fIescape\ -\fP
Tells the printer when to put escapes in front of, or around, a symbol
whose print name contains this character.
There are three possibilities: always escape a symbol with this character
in it, only escape a symbol if this is the only character in the symbol,
and only escape a symbol if this is the first character in the symbol.
[note: The printer will always escape a symbol which, if printed out, would
look like a valid number.]
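.pp
The three escape possibilities can be pictured with a small model.
This is a Python sketch, not Franz Lisp code: the function name, the lookup interface and the sample table are assumptions made for illustration, and only the property names escape-always, escape-when-unique and escape-when-first come from this chapter.
The rule about symbols that look like numbers is left out of the sketch.

```python
def needs_escape(print_name, escape_property):
    """Return True if this model of the printer would escape the symbol.

    escape_property maps a character to "escape-always",
    "escape-when-unique", "escape-when-first" or None (assumed interface).
    """
    for position, ch in enumerate(print_name):
        prop = escape_property(ch)
        if prop == "escape-always":
            return True
        if prop == "escape-when-unique" and len(print_name) == 1:
            return True
        if prop == "escape-when-first" and position == 0:
            return True
    return False


# Hypothetical sample table: "!" is given escape-when-first purely as an example.
SAMPLE = {"(": "escape-always", ".": "escape-when-unique", "!": "escape-when-first"}

print(needs_escape("a(b", SAMPLE.get))   # contains an escape-always character
print(needs_escape("a.b", SAMPLE.get))   # the period is not the only character
```

In this model a period forces an escape only when it is the whole print name, which is why a symbol such as a.b needs none.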
.pp
When the Lisp system is built, Lisp code is added to a C-coded kernel
and the result becomes the standard Lisp system.
The readtable present in the C-coded kernel, called the
.i raw
.i readtable ,
contains the bare necessities for reading in Lisp code.
During the
construction of the complete Lisp system,
a copy is made of the raw readtable and
then the copy is modified by adding macro characters.
The result is what is called the
.i standard
.i readtable .
When a new readtable is created with
.i makereadtable ,
a copy is made of either the
raw readtable
or the current readtable (which is likely to be the standard readtable).
.sh +0 Reader\ Operations
.pp
The reader has a very simple algorithm.
It is either
.i scanning
for a token,
.i collecting
a token, or
.i processing
a token.
Scanning involves reading characters and throwing
away those which don't start tokens (such as blanks and tabs).
Collecting means gathering the characters which make up a
token into a buffer.
Processing may involve creating symbols, strings, lists,
fixnums, bignums or flonums or calling a user written function called
a character macro.
.pp
The components of the syntax class determine when the reader
switches between the scanning, collecting and processing states.
The reader will continue scanning as long as the character class
of the characters it reads is
.i cseparator .
When it reads a character whose character class is not
.i cseparator
it stores that character in its buffer and begins the collecting phase.
If the character class of that first character is
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then it will continue collecting until it runs into a character whose
syntax class has the
.i separator
property.
(That last character will be pushed back into the input buffer and will
be the first character read next time.)
Now the reader goes into the processing phase, checking to see if the
token it read is a number or symbol.
It is important to note that after
the first character is collected the component of the syntax class which
tells the reader to stop collecting is the
.i separator
property, not the character class.
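.pp
If the character class of the character which stopped the scanning is not
.i ccharacter ,
.i cnumber ,
.i cperiod ,
or
.i csign ,
then the reader processes that character immediately.
The character classes
.i csingle-macro ,
.i csingle-splicing-macro ,
and
.i csingle-infix-macro
will act like
.i ccharacter
if the following token is not a
.i separator .
The processing which is done for a given character class
is described in detail in the next section.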
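.pp
The scanning and collecting phases can be sketched as a small tokenizer.
This is a Python model, not the reader's actual code: the function names and the tiny syntax table are assumptions, the sign class is written csign here, and processing is reduced to returning the collected text.

```python
# Character classes whose first character starts a multi-character token.
COLLECTING = {"ccharacter", "cnumber", "cperiod", "csign"}


def syntax(ch):
    """Return (character class, has separator property) for ch (assumed table)."""
    if ch in " \t\n":
        return ("cseparator", True)
    if ch == "(":
        return ("cleft-paren", True)
    if ch == ")":
        return ("cright-paren", True)
    if ch.isdigit():
        return ("cnumber", False)
    if ch in "+-":
        return ("csign", False)
    if ch == ".":
        return ("cperiod", False)
    return ("ccharacter", False)


def tokens(text):
    """Split text into tokens using the scan / collect phases."""
    i, out = 0, []
    while i < len(text):
        char_class, _ = syntax(text[i])
        if char_class == "cseparator":      # scanning: skip separators
            i += 1
            continue
        start = i
        i += 1
        if char_class in COLLECTING:        # collecting: stop on the separator property
            while i < len(text) and not syntax(text[i])[1]:
                i += 1
        # The stopping character is not consumed, which models the push back.
        out.append(text[start:i])           # processing would happen here
    return out


print(tokens("(foo 12 a.b)"))
```

Note that the inner loop tests the separator property and not the character class, so the period in a.b does not end the token.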
.sh +0 Character\ Classes
.de Cc
.sp 2v
.tl '\fI\\$1\fP''raw readtable:\\$2'
.tl '''standard readtable:\\$3'
..
.Cc ccharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~
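A normal character.
.Cc cnumber 0-9 0-9
This type is a digit.
The syntax for an integer (fixnum or bignum) is a string of
.i cnumber
characters optionally followed by a
.i cperiod .
If the digits are not followed by a
.i cperiod ,
then they are interpreted in base
.i ibase
which must be eight or ten.
The syntax for a floating point number is
either zero or more
.i cnumber 's
followed by a
.i cperiod
and then followed by one or more
.i cnumber 's.
A floating point number
may also be an integer or floating point number followed
by 'e' or 'd', an optional '+' or '\-'
and then zero or more
.i cnumber 's.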
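.pp
The number syntax above can be written as a pattern.
This is a Python sketch under stated assumptions, not the reader's implementation: the names are hypothetical, an optional leading sign is allowed, and a trailing period is taken to mean base ten, which the text implies but does not state outright.

```python
import re

INTEGER = r"[0-9]+\.?"            # digits, optionally followed by a period
FLOAT = r"[0-9]*\.[0-9]+"         # zero or more digits, a period, one or more digits
EXPONENT = r"[ed][+-]?[0-9]*"     # 'e' or 'd', optional sign, zero or more digits
NUMBER = re.compile(
    rf"[+-]?(?:(?P<float>{FLOAT})|(?P<int>{INTEGER}))(?P<exp>{EXPONENT})?"
)


def classify(token):
    """Return "integer", "flonum" or "symbol" for a collected token."""
    match = NUMBER.fullmatch(token)
    if match is None:
        return "symbol"
    if match.group("float") or match.group("exp"):
        return "flonum"
    return "integer"


def integer_value(token, ibase):
    """Value of an integer token; ibase applies only without a trailing period."""
    base = 10 if token.endswith(".") else ibase   # assumption: period forces base ten
    return int(token.rstrip("."), base)


print(classify("1234"), classify("12.5"), classify("1e5"), classify("franz"))
print(integer_value("17", 8), integer_value("17.", 8))
```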
.Cc csign +\- +\-
A leading sign for a number.
No other characters should be given this class.
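.Cc cleft-paren ( (
A left parenthesis.
Tells the reader to begin forming a list.
.Cc cright-paren ) )
A right parenthesis.
Tells the reader that it has reached the end of a list.
.Cc cleft-bracket [ [
A left bracket.
Tells the reader that it should begin forming a list.
See the description of
.i cright-bracket
for the difference between cleft-bracket and cleft-paren.
.Cc cright-bracket ] ]
A right bracket.
A
.i cright-bracket
finishes the formation of the current
list and all enclosing lists until it finds one which
begins with a
.i cleft-bracket
or until it reaches the top level list.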
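.pp
The way a right bracket closes several lists at once can be modelled with a stack.
This is a Python sketch, not the reader's code: the function name is hypothetical, the input is a list of already separated tokens, and it is assumed to start with an opening parenthesis or bracket.

```python
def read_list(tokens):
    """Build one nested list; ']' closes lists back to the matching '['."""
    stack = []                      # entries are (opening token, list being built)
    for tok in tokens:
        if tok in ("(", "["):
            stack.append((tok, []))
        elif tok == ")":
            _, done = stack.pop()
            if not stack:           # closed the top level list
                return done
            stack[-1][1].append(done)
        elif tok == "]":
            while True:
                opener, done = stack.pop()
                if not stack:       # reached the top level list
                    return done
                stack[-1][1].append(done)
                if opener == "[":   # stop at the list begun by a left bracket
                    break
        else:
            stack[-1][1].append(tok)
    raise ValueError("input ended before the list was closed")


print(read_list(list("(a(b(c]")))
print(read_list(list("(a[b(c]d)")))
```

In the first call the bracket closes all three lists; in the second it closes only back to the list that the left bracket began, so d still belongs to the outer list.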
.Cc cperiod . .
The period is used to separate the elements of a cons cell
[e.g. (a\ .\ (b\ .\ nil)) is the same as (a\ b)].
.i cperiod
is also used in numbers as described above.
.Cc cseparator ^I-^M\ esc\ space ^I-^M\ esc\ space
Separates tokens.
When the reader is scanning, these characters
are passed over.
Note: there is a difference between the
.i cseparator
character class and the
.i separator
property of a syntax class.
.Cc csingle-quote \\' \\'
This causes
.i read
to be called recursively and the list
(quote <value read>) to be returned.
.Cc csymbol-delimiter | |
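This causes the reader to begin collecting characters and to stop only
when another identical
.i csymbol-delimiter
is seen.
The only way to escape a
.i csymbol-delimiter
within a symbol name is with a
.i cescape
character.
The collected characters are converted into a string which becomes
the print name of a symbol.
If a symbol with an identical print name already exists, then the
allocation is not done, rather the existing symbol is used.
.Cc cescape \e \e
This causes the next character read in to be treated as a
.b vcharacter .
A character whose syntax class is
.b vcharacter
has a character class
.i ccharacter
and does not have the
.i separator
property so it will not separate symbols.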
.Cc cstring-delimiter """" """"
This is the same as
.i csymbol-delimiter
except the result is returned as a string instead of a symbol.
.Cc csingle-character-symbol none none
This returns a symbol whose print name is the single character
which has been collected.
.Cc cmacro none `,
The reader calls the macro function associated with this character and
the current readtable, passing it no arguments.
The result of the macro is added to the structure the reader is building,
just as if that form were directly read by the reader.
More details on macros are provided below.
.Cc csplicing-macro none #;
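A
.i csplicing-macro
differs from a
.i cmacro
in the way the result is incorporated in the structure the reader is
building.
A
.i csplicing-macro
must return a list of forms (possibly empty).
The reader acts as
if it read each element of
the list itself without
the surrounding parentheses.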
.Cc csingle-macro none none
This causes the reader to check the next character.
If it is a
.i cseparator
then this acts like a
.i cmacro .
Otherwise, it acts like a
.i ccharacter .
.Cc csingle-splicing-macro none none
This is triggered like a
.i csingle-macro ,
however the result is spliced in like a
.i csplicing-macro .
.Cc cinfix-macro none none
This differs from a
.i cmacro
in that the macro function is passed a form representing what the reader
has read so far.
The result of the macro replaces what the reader had read so far.
.Cc csingle-infix-macro none none
This differs from the
.i cinfix-macro
in that the macro will only be triggered if the character following the
.i csingle-infix-macro
character is a
.i cseparator .
.Cc cillegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout
These characters cause the reader to signal an error if read.
.sh +0 Syntax\ Classes
.pp
The readtable maps each character into a syntax class.
The syntax class contains three pieces of information:
the character class, whether this is a separator, and the escape
properties.
The first two properties are used by the reader, the last by
the printer (and
.i explode ).
The initial Lisp system has the following syntax classes defined.
The user may add syntax classes with
.i add-syntax-class .
For each syntax class, we list the properties of the class and
which characters have this syntax class by default.
More information about each syntax class can be found under the
description of the syntax class's character class.
.de Sy
.sp 1v
.tl '\fB\\$1\fP''raw readtable:\\$2'
.tl '\fI\\$4\fP''standard readtable:\\$3'
.if \n(.$>4 .tl '\fI\\$5\fP'''
.if \n(.$>5 .tl '\fI\\$6\fP'''
..
.Sy vcharacter A-Z\ a-z\ ^H\ !#$%&*,/:;<=>?@^_`{}~ A-Z\ a-z\ ^H\ !$%&*/:;<=>?@^_{}~ ccharacter
.Sy vnumber 0-9 0-9 cnumber
.Sy vleft-paren ( ( cleft-paren escape-always separator
.Sy vright-paren ) ) cright-paren escape-always separator
.Sy vleft-bracket [ [ cleft-bracket escape-always separator
.Sy vright-bracket ] ] cright-bracket escape-always separator
.Sy vperiod . . cperiod escape-when-unique
.Sy vseparator ^I-^M\ esc\ space ^I-^M\ esc\ space cseparator escape-always separator
.Sy vsingle-quote \\' \\' csingle-quote escape-always separator
.Sy vsymbol-delimiter | | csymbol-delimiter escape-always
.Sy vescape \e \e cescape escape-always
.Sy vstring-delimiter """" """" cstring-delimiter escape-always
.Sy vsingle-character-symbol none none csingle-character-symbol separator
.Sy vmacro none `, cmacro escape-always separator
.Sy vsplicing-macro none #; csplicing-macro escape-always separator
.Sy vsingle-macro none none csingle-macro escape-when-unique
.Sy vsingle-splicing-macro none none csingle-splicing-macro escape-when-unique
.Sy vinfix-macro none none cinfix-macro escape-always separator
.Sy vsingle-infix-macro none none csingle-infix-macro escape-when-unique
.Sy villegal ^@-^G^N-^Z^\e-^_rubout ^@-^G^N-^Z^\e-^_rubout cillegal escape-always separator
.sh +0 Character\ Macros
.pp
Character macros are
user written functions which are executed during the reading process.
The value returned by a character macro may or may not be used by
the reader, depending on the type of macro and the value returned.
Character macros are always attached to a single character with
the
.i setsyntax
function.
.pp
There are three types of character macros: normal, splicing and infix.
These types differ in the arguments they are given or in what is done
with the result they return.
.pp
A normal macro is passed no arguments.
The value returned by a normal macro is simply used by
the reader as if it had read the value itself.
Here is an example of a macro which returns the abbreviation
for a given state.
.nf
\-> \fI(de\kAfun stateabbrev nil
\h'|\nAu'(cdr (assq (read) '((california . ca) (pennsylvania . pa)))))\fP
\-> \fI(setsyntax '\e! 'vmacro 'stateabbrev)\fP
\-> \fI'( ! california ! wyoming ! pennsylvania)\fP
(ca nil pa)
.fi
Notice what happened to
\fI! wyoming\fP.
Since it wasn't in the table, the associated function
returned nil.
The creator of the macro may have wanted to leave the
list alone in such a case, but couldn't with this
type of reader macro.
The splicing macro, described next, allows a character macro function
to return a value that is ignored.
.pp
The value returned from a splicing macro must be a list or nil.
If the value is nil, then the value is ignored, otherwise the reader
acts as if it read each object in the list.
Usually the list only contains one element.
If the reader is reading at the top level (i.e. not collecting elements
of a list),
then it is illegal for a splicing macro to return more than one
element in the list.
The major advantage of a splicing macro over a normal macro is the
ability of the splicing macro to return nothing.
The comment character (usually ;) is a splicing macro bound to a
function which reads to the end of the line and always returns nil.
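.pp
The difference between the two ways of incorporating a macro result can be shown with the state abbreviation table from the earlier example.
This is a Python model, not Franz Lisp: the function name is hypothetical, None plays the role of nil, and each word stands for one macro character followed by the symbol it reads.

```python
ABBREVIATIONS = {"california": "ca", "pennsylvania": "pa"}


def read_elements(words, splicing):
    """Collect list elements, treating every word as a macro invocation."""
    elements = []
    for word in words:
        value = ABBREVIATIONS.get(word)          # None stands for nil
        if splicing:
            # A splicing macro returns a list; an empty result adds nothing.
            result = [value] if value is not None else []
            elements.extend(result)
        else:
            # A normal macro's value is always used, even when it is nil.
            elements.append(value)
    return elements


states = ["california", "wyoming", "pennsylvania"]
print(read_elements(states, splicing=False))
print(read_elements(states, splicing=True))
```

With the normal macro the unknown state leaves a nil in the list; with the splicing macro it leaves no trace.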
Here is the previous example written as a splicing macro.
.nf
\-> \fI(de\kAfun stateabbrev nil
\h'|\nAu'(\kC(lam\kBbda (value)
\h'|\nBu'(cond \kA(value (list value))
\h'|\nAu'(t nil)))
\h'|\nCu'(cdr (assq (read) '((california . ca) (pennsylvania . pa))))))\fP
\-> \fI(setsyntax '! 'vsplicing-macro 'stateabbrev)\fP
\-> \fI'(!pennsylvania ! foo !california)\fP
(pa ca)
\-> \fI'!foo !bar !pennsylvania\fP
pa
.fi
.pp
Infix macros are passed a
.i tconc
structure representing what has been read so far.
Briefly, a
tconc
structure is a single list cell whose car points to
a list and whose cdr points to the last list cell in that list.
The interpretation by the reader of the value
returned by an infix macro depends on
whether the macro is called while the reader is constructing a
list or whether it is called at the top level of the reader.
If the macro is called while a list is
being constructed, then the value returned should be a tconc
structure.
The car of that structure replaces the list of elements that the
reader has been collecting.
If the macro is called at top level, then it will be passed the
value nil, and the value it returns should either be nil
or a tconc structure.
If the macro returns nil, then the value is ignored and the reader
continues to read.
If the macro returns a tconc structure of one element (i.e. whose car
is a list of one element), then that single element is returned
as the value of
.i read .
If the macro returns a tconc structure of more than one element,
then that list of elements is returned as the value of read.
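.pp
The tconc structure and the top level rules can be pictured as follows.
This is a Python model, not Franz Lisp: the Cons class and the function names are assumptions made for illustration, None stands for nil, and only the shape of the structure (car is the list, cdr is its last cell) is taken from the text.

```python
class Cons:
    """A list cell with a car and a cdr; None stands for nil."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def tconc(ptr, item):
    """Add item to the end of a tconc structure; a nil ptr starts a new one."""
    cell = Cons(item)
    if ptr is None:
        return Cons(cell, cell)     # car: the list, cdr: its last cell
    ptr.cdr.cdr = cell              # link the new cell after the old last cell
    ptr.cdr = cell                  # the new cell is now the last cell
    return ptr


def elements(ptr):
    """Return the items of the list held in the car of the structure."""
    out, cell = [], ptr.car
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def top_level_value(ptr):
    """Model of how a top level infix macro result is interpreted."""
    if ptr is None:
        return ("ignored", None)            # the reader continues to read
    items = elements(ptr)
    if len(items) == 1:
        return ("value", items[0])          # the single element itself
    return ("value", items)                 # the whole list of elements


p = tconc(None, "plus")
tconc(p, "a")
tconc(p, "b")
print(elements(p), p.cdr.car)
print(top_level_value(tconc(None, "+")))
```

Keeping a pointer to the last cell is what lets an element be added at the end without walking the list.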
.nf
\-> \fI(de\kAfun plusop (x)
\h'|\nAu'(cond \kB((null x) (tconc nil '\e+))
\h'|\nBu'(t (lconc nil (list 'plus (caar x) (read))))))\fP
\-> \fI(setsyntax '\e+ 'vinfix-macro 'plusop)\fP
.fi
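.pp
There are three different circumstances in which you would like
a macro function to be triggered.
.ip \fIAlways\ -\fP
Whenever the macro character is seen, the macro should be invoked.
This is accomplished by using the character classes
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro ,
and by using the
.i separator
property.
The syntax classes
.b vmacro ,
.b vsplicing-macro ,
and
.b vinfix-macro
are defined this way.
.ip \fIWhen\ first\ -\fP
The macro should only be triggered when the macro character is the first
character found after the scanning process.
A syntax class for such a macro would be defined using
.i cmacro ,
.i csplicing-macro ,
or
.i cinfix-macro
and not including the
.i separator
property.
.ip \fIWhen\ unique\ -\fP
The macro should only be triggered when the macro character is the only
character collected in the token collection
phase of the reader,
i.e. the macro character is preceded by zero or more
.i cseparator 's
and followed by a
.i separator .
A syntax class for such a macro would be defined using
.i csingle-macro ,
.i csingle-splicing-macro ,
or
.i csingle-infix-macro
and not including the
.i separator
property.
The syntax classes so defined are
.b vsingle-macro ,
.b vsingle-splicing-macro ,
and
.b vsingle-infix-macro .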
.Lf setsyntax 's_symbol\ 's_synclass\ ['ls_func]
ls_func is the name of a function or a lambda body.
S_symbol should be a symbol whose print name is only one character.
The syntax class for that character is
set to s_synclass in the current readtable.
If s_synclass is a class that requires a character macro, then
ls_func must be supplied.
The symbolic syntax codes are new to Opus 38.
For compatibility, s_synclass can be one of the fixnum syntax codes
which appeared in older versions of the
Franz Lisp Manual.
This compatibility is only temporary: existing code which uses the
fixnum syntax codes should be converted.
.Lf getsyntax 's_symbol
Returns
the syntax class of the first character
of s_symbol's print name.
s_symbol's print name must be exactly one character long.
This function is new to Opus 38.
It supersedes \fI(status\ syntax)\fP which no longer exists.
.Lf add-syntax-class 's_synclass\ 'l_properties
Defines the syntax class s_synclass to have properties l_properties.
The list l_properties should contain one of the character classes mentioned
above.
l_properties may contain one of the escape properties:
.i escape-always ,
.i escape-when-unique ,
or
.i escape-when-first .
l_properties may contain the
.i separator
property.
After a syntax class has been defined with
.i add-syntax-class ,
the
.i setsyntax
function can be used to give characters that syntax class.
.nf
; Define a non-separating macro character.
; This type of macro character is used in UCI-Lisp, and
; it corresponds to a FIRST MACRO in Interlisp
\-> \fI(add-syntax-class 'vuci-macro '(cmacro escape-when-first))\fP
.fi