.SH
0: Introduction
.PP
Yacc provides a general tool for imposing structure on the input to a computer program.
The Yacc user prepares a
specification of the input process; this includes rules
describing the input structure, code to be invoked when these
rules are recognized, and a low-level routine to do the
basic input.
Yacc then generates a function to control the input process.
This function, called a
.I parser ,
calls the user-supplied low-level input routine
(the
.I "lexical analyzer" )
to pick up the basic items
(called
.I tokens )
from the input stream.
These tokens are organized according to the input structure rules,
called
.I "grammar rules" \|;
when one of these rules has been recognized,
then user code supplied for this rule, an
.I action ,
is invoked; actions have the ability to return values and
make use of the values of other actions.
.PP
Yacc is written in a portable dialect of C
.[
Ritchie Kernighan Language Prentice
.]
and the actions, and output subroutine, are in C as well.
Moreover, many of the syntactic conventions of Yacc follow C.
.PP
The heart of the input specification is a collection of grammar rules.
Each rule describes an allowable structure and gives it a name.
For example, one grammar rule might be
.DS
date : month\_name day \',\' year ;
.DE
Here,
.I date ,
.I month\_name ,
.I day ,
and
.I year
represent structures of interest in the input process;
presumably,
.I month\_name ,
.I day ,
and
.I year
are defined elsewhere.
The comma ``,'' is enclosed in single quotes; this implies that the
comma is to appear literally in the input.
The colon and semicolon merely serve as punctuation in the rule, and have
no significance in controlling the input.
Thus, with proper definitions, the input
.DS
July 4, 1776
.DE
might be matched by the above rule.
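.PP
As a rough illustration only, the following C sketch checks by hand whether a
sequence of tokens has the shape required by the rule above.
It is not the code Yacc generates.
The token codes (MONTH\_NAME, DAY, YEAR) and the function match\_date are
hypothetical names chosen for this sketch, and day and year are assumed
here to arrive as single tokens.

```c
#include <stddef.h>

/* Hypothetical token codes for this sketch; literal characters
   such as ',' stand for themselves. */
enum { MONTH_NAME = 257, DAY = 258, YEAR = 259 };

/* Return 1 if the n tokens in toks have the shape
       month_name day ',' year
   and 0 otherwise. */
int match_date(const int *toks, size_t n)
{
    static const int shape[] = { MONTH_NAME, DAY, ',', YEAR };
    size_t i;

    if (n != sizeof shape / sizeof shape[0])
        return 0;
    for (i = 0; i < n; i++)
        if (toks[i] != shape[i])
            return 0;
    return 1;
}
```

Under these assumptions, the input July 4, 1776 would reach match\_date as the
four tokens MONTH\_NAME, DAY, the literal comma, and YEAR, and would be accepted.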
.PP
An important part of the input process is carried out by the
lexical analyzer.
This user routine reads the input stream, recognizing the lower level structures,
and communicates these tokens
to the parser.
For historical reasons, a structure recognized by the lexical analyzer is called a
.I "terminal symbol" ,
while the structure recognized by the parser is called a
.I "nonterminal symbol" .
To avoid confusion, terminal symbols will usually be referred to as
.I tokens .
.PP
There is considerable leeway in deciding whether to recognize structures using the lexical
analyzer or grammar rules.
For example, the rules
.DS
month\_name : \'J\' \'a\' \'n\' ;
month\_name : \'F\' \'e\' \'b\' ;

 . . .

month\_name : \'D\' \'e\' \'c\' ;
.DE
might be used in the above example.
The lexical analyzer would only need to recognize individual letters, and
.I month\_name
would be a nonterminal symbol.
Such low-level rules tend to waste time and space, and may
complicate the specification beyond Yacc's ability to deal with it.
Usually, the lexical analyzer would
recognize the month names,
and return an indication that a
.I month\_name
was seen; in this case,
.I month\_name
would be a token.
.PP
Literal characters such as ``,'' must also be passed through the lexical
analyzer, and are also considered tokens.
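.PP
A minimal sketch, in C, of a lexical analyzer of the second kind described above:
it recognizes whole month names and numbers, and passes literal characters such
as the comma through as tokens.
The function name next\_token, its arguments, and the token codes are assumptions
made for this sketch; they are not the interface Yacc expects, which is
described in Section 3.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { END = 0, MONTH_NAME = 257, NUMBER = 258 };

/* Scan one token starting at *p, advance *p past it, and return
   its code.  For MONTH_NAME, *value is the month (1-12); for
   NUMBER, *value is the number read.  Any other character, such
   as ',', is returned as itself.  A word that is not a month
   name is returned one character at a time. */
int next_token(const char **p, int *value)
{
    static const char *months[] = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };
    const char *s = *p;
    size_t len;
    int i;

    while (*s == ' ' || *s == '\t' || *s == '\n')
        s++;                        /* skip white space */
    if (*s == '\0') {
        *p = s;
        return END;
    }
    if (isdigit((unsigned char)*s)) {
        *value = 0;
        while (isdigit((unsigned char)*s))
            *value = *value * 10 + (*s++ - '0');
        *p = s;
        return NUMBER;
    }
    if (isalpha((unsigned char)*s)) {
        for (len = 0; isalpha((unsigned char)s[len]); len++)
            ;                       /* measure the word */
        for (i = 0; i < 12; i++)
            if (strlen(months[i]) == len &&
                strncmp(s, months[i], len) == 0) {
                *value = i + 1;
                *p = s + len;
                return MONTH_NAME;
            }
    }
    *p = s + 1;                     /* literal character token */
    return (unsigned char)*s;
}
```

Called repeatedly on the input July 4, 1776, this sketch would yield a month
name token, a number, the literal comma, and a second number, followed by END.
Telling a day from a year is left to the grammar rules.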
.PP
Specification files are very flexible.
It is relatively easy to add to the above example the rule
.DS
date : month \'/\' day \'/\' year ;
.DE
allowing
.DS
7 / 4 / 1776
.DE
as a synonym for
.DS
July 4, 1776
.DE
In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
and little danger of disrupting existing input.
.PP
The input being read may not conform to the
specifications.
These input errors are detected as early as is theoretically possible with a
left-to-right scan;
thus, not only is the chance of reading and computing with bad
input data substantially reduced, but the bad data can usually be quickly found.
Error handling,
provided as part of the input specifications,
permits the reentry of bad data,
or the continuation of the input process after skipping over the bad data.
.PP
In some cases, Yacc fails to produce a parser when given a set of
specifications.
For example, the specifications may be self-contradictory, or they may
require a more powerful recognition mechanism than that available to Yacc.
The former cases represent design errors;
the latter cases
can often be corrected
by making
the lexical analyzer
more powerful, or by rewriting some of the grammar rules.
While Yacc cannot handle all possible specifications, its power
compares favorably with similar systems;
moreover, the
constructions which are difficult for Yacc to handle are
also frequently difficult for human beings to handle.
Some users have reported that the discipline of formulating valid
Yacc specifications for their input revealed errors of
conception or design early in the program development.
.PP
The theory underlying Yacc has been described elsewhere.
.[
Aho Johnson Surveys LR Parsing
.]
.[
Aho Johnson Ullman Ambiguous Grammars
.]
.[
Aho Ullman Principles Compiler Design
.]
Yacc has been extensively used in numerous practical applications,
including
.I lint ,
.[
Johnson Lint
.]
the Portable C Compiler,
.[
Johnson Portable Compiler Theory
.]
and a system for typesetting mathematics.
.[
Kernighan Cherry typesetting system CACM
.]
.PP
The next several sections describe the
basic process of preparing a Yacc specification;
Section 1 describes the preparation of grammar rules,
Section 2 the preparation of the user-supplied actions associated with these rules,
and Section 3 the preparation of lexical analyzers.
Section 4 describes the operation of the parser.
Section 5 discusses various reasons why Yacc may be unable to produce a
parser from a specification, and what to do about it.
Section 6 describes a simple mechanism for
handling operator precedences in arithmetic expressions.
Section 7 discusses error detection and recovery.
Section 8 discusses the operating environment and special features
of the parsers Yacc produces.
Section 9 gives some suggestions which should improve the
style and efficiency of the specifications.
Section 10 discusses some advanced topics, and Section 11 gives
acknowledgements.
Appendix A has a brief example, and Appendix B gives a
summary of the Yacc input syntax.
Appendix C gives an example using some of the more advanced
features of Yacc, and, finally,
Appendix D describes mechanisms and syntax
no longer actively supported, but
provided for historical continuity with older versions of Yacc.