.tr *\(**
.tr |\(or
.SH
1: Basic Specifications
.PP
Names refer to either tokens or nonterminal symbols.
Yacc requires
token names to be declared as such.
In addition, for reasons discussed in Section 3, it is often desirable
to include the lexical analyzer as part of the specification file;
it may be useful to include other programs as well.
Thus, every specification file consists of three sections:
the
.I declarations ,
.I "(grammar) rules" ,
and
.I programs .
The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
.PP
In other words, a full specification file looks like
.DS
declarations
%%
rules
%%
programs
.DE
.PP
The declarations section may be empty.
Moreover, if the programs section is omitted, the second %% mark may be omitted also;
thus, the smallest legal Yacc specification is
.DS
%%
rules
.DE
.PP
Blanks, tabs, and newlines are ignored except
that they may not appear in names or multi-character reserved symbols.
Comments may appear wherever a name is legal; they are enclosed
in /* . . . */, as in C and PL/I.
.PP
The rules section is made up of one or more grammar rules.
A grammar rule has the form:
.DS
A  :  BODY  ;
.DE
A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals.
The colon and the semicolon are Yacc punctuation.
.PP
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and
non-initial digits.
Upper and lower case letters are distinct.
The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
.PP
A literal consists of a character enclosed in single quotes ``\'''.
As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes
are recognized.
Thus
.DS
\'\en\'	newline
\'\er\'	return
\'\e\'\'	single quote ``\'''
\'\e\e\'	backslash ``\e''
\'\et\'	tab
\'\eb\'	backspace
\'\ef\'	form feed
\'\exxx\'	``xxx'' in octal
.DE
For a number of technical reasons, the
\s-2NUL\s0
character (\'\e0\' or 0) should never
be used in grammar rules.
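.PP
As a small illustration, a rule whose body mixes a name with two literals might be written as shown below.
The names
.I line
and
.I expr
are invented for this sketch, and
.I expr
would need rules of its own elsewhere in the rules section.
.DS
line	:	expr  \';\'  \'\en\'   ;
.DE
Here the semicolon inside quotes is a literal that must appear in the input, while the final, unquoted semicolon is the Yacc punctuation that ends the rule.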
.PP
If there are several grammar rules with the same left hand side, the vertical bar ``|''
can be used to avoid rewriting the left hand side.
In addition,
the semicolon at the end of a rule can be dropped before a vertical bar.
Thus the grammar rules
.DS
A	:	B  C  D   ;
A	:	E  F   ;
A	:	G   ;
.DE
can be given to Yacc as
.DS
A	:	B  C  D
	|	E  F
	|	G
	;
.DE
It is not necessary that all grammar rules with the same left hand side appear together in the grammar rules section,
although keeping them together makes the input much more readable and easier to change.
.PP
If a nonterminal symbol matches the empty string, this can be indicated by leaving the body of the rule empty:
.DS
empty :   ;
.DE
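.PP
A common use of an empty body is to make part of a construct optional.
The following sketch uses invented names
.I value
and
.I opt\_sign ,
and assumes that NUMBER has been declared as a token; the comment merely marks the empty alternative for the reader.
.DS
value	:	opt\_sign  NUMBER   ;
opt\_sign	:	/* empty */
	|	\'\-\'
	;
.DE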
.PP
Names representing tokens must be declared; this is most simply done by writing
.DS
%token   name1  name2 . . .
.DE
in the declarations section.
(See Sections 3, 5, and 6 for much more discussion.)
Every name not defined in the declarations section is assumed to represent a nonterminal symbol.
Every nonterminal symbol must appear on the left side of at least one rule.
.PP
Of all the nonterminal symbols, one, called the
.I "start symbol" ,
has particular importance.
The parser is designed to recognize the start symbol; thus,
this symbol represents the largest,
most general structure described by the grammar rules.
By default,
the start symbol is taken to be the left hand side of the first
grammar rule in the rules section.
It is possible, and in fact desirable, to declare the start
symbol explicitly in the declarations section using the %start keyword:
.DS
%start   symbol
.DE
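.PP
For instance, a declarations section that names two tokens and an explicit start symbol might read as follows.
All three names are hypothetical.
.DS
%token   NUMBER  IDENT
%start   program
.DE
With these declarations, NUMBER and IDENT are tokens, and every other name used in the rules, including
.I program ,
is taken to be a nonterminal symbol and so must appear on the left side of at least one rule.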
.PP
The end of the input to the parser is signaled by a special token, called the
.I endmarker .
If the tokens up to, but not including, the endmarker form a structure
which matches the start symbol, the parser function returns to its caller
after the endmarker is seen; it
.I accepts
the input.
If the endmarker is seen in any other context, it is an error.
.PP
It is the job of the user-supplied lexical analyzer
to return the endmarker when appropriate; see Section 3, below.
Usually the endmarker represents some reasonably obvious 
I/O status, such as ``end-of-file'' or ``end-of-record''.
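.PP
The pieces described in this section can be put together in a small, complete specification.
The following is only a sketch under stated assumptions, not a definitive example: the names
.I list ,
.I item ,
and DIGIT are invented for the illustration,
and the actions that a useful parser would need are omitted.
The lexical analyzer assumes, as discussed in Section 3, that a token number of zero is taken as the endmarker
and that a literal character is represented by its own character value.
A main program and an error routine are assumed to be supplied separately.
.DS
%token   DIGIT
%start   list
%%
list	:	/* empty */
	|	list  item  \'\en\'
	;
item	:	DIGIT
	|	item  DIGIT
	;
%%
#include <stdio.h>
#include <ctype.h>

/* hypothetical lexical analyzer: each digit becomes a DIGIT token */
int yylex()
{
	int c;

	c = getchar();
	if( c == EOF ) return( 0 );	/* endmarker, assumed to be 0 */
	if( isdigit( c ) ) return( DIGIT );
	return( c );	/* any other character stands for itself */
}
.DE
With this input, the parser would accept zero or more lines, each made up of one or more digits;
end-of-file plays the role of the endmarker.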