Commit | Line | Data |
---|---|---|
2240a03d TL |
1 | .SH |
2 | 0: Introduction | |
3 | .PP | |
4 | Yacc provides a general tool for imposing structure on the input to a computer program. | |
5 | The Yacc user prepares a | |
6 | specification of the input process; this includes rules | |
7 | describing the input structure, code to be invoked when these | |
8 | rules are recognized, and a low-level routine to do the | |
9 | basic input. | |
10 | Yacc then generates a function to control the input process. | |
11 | This function, called a | |
12 | .I parser , | |
13 | calls the user-supplied low-level input routine | |
14 | (the | |
15 | .I "lexical analyzer" ) | |
16 | to pick up the basic items | |
17 | (called | |
18 | .I tokens ) | |
19 | from the input stream. | |
20 | These tokens are organized according to the input structure rules, | |
21 | called | |
22 | .I "grammar rules" \|; | |
23 | when one of these rules has been recognized, | |
24 | then user code supplied for this rule, an | |
25 | .I action , | |
26 | is invoked; actions have the ability to return values and | |
27 | make use of the values of other actions. | |
28 | .PP | |
29 | Yacc is written in a portable dialect of C | |
30 | .[ | |
31 | Ritchie Kernighan Language Prentice | |
32 | .] | |
33 | and the actions, and output subroutine, are in C as well. | |
34 | Moreover, many of the syntactic conventions of Yacc follow C. | |
35 | .PP | |
36 | The heart of the input specification is a collection of grammar rules. | |
37 | Each rule describes an allowable structure and gives it a name. | |
38 | For example, one grammar rule might be | |
39 | .DS | |
40 | date : month\_name day \',\' year ; | |
41 | .DE | |
42 | Here, | |
43 | .I date , | |
44 | .I month\_name , | |
45 | .I day , | |
46 | and | |
47 | .I year | |
48 | represent structures of interest in the input process; | |
49 | presumably, | |
50 | .I month\_name , | |
51 | .I day , | |
52 | and | |
53 | .I year | |
54 | are defined elsewhere. | |
55 | The comma ``,'' is enclosed in single quotes; this implies that the | |
56 | comma is to appear literally in the input. | |
57 | The colon and semicolon merely serve as punctuation in the rule, and have | |
58 | no significance in controlling the input. | |
59 | Thus, with proper definitions, the input | |
60 | .DS | |
61 | July 4, 1776 | |
62 | .DE | |
63 | might be matched by the above rule. | |
64 | .PP | |
65 | An important part of the input process is carried out by the | |
66 | lexical analyzer. | |
67 | This user routine reads the input stream, recognizing the lower level structures, | |
68 | and communicates these tokens | |
69 | to the parser. | |
70 | For historical reasons, a structure recognized by the lexical analyzer is called a | |
71 | .I "terminal symbol" , | |
72 | while the structure recognized by the parser is called a | |
73 | .I "nonterminal symbol" . | |
74 | To avoid confusion, terminal symbols will usually be referred to as | |
75 | .I tokens . | |
76 | .PP | |
77 | There is considerable leeway in deciding whether to recognize structures using the lexical | |
78 | analyzer or grammar rules. | |
79 | For example, the rules | |
80 | .DS | |
81 | month\_name : \'J\' \'a\' \'n\' ; | |
82 | month\_name : \'F\' \'e\' \'b\' ; | |
83 | ||
84 | . . . | |
85 | ||
86 | month\_name : \'D\' \'e\' \'c\' ; | |
87 | .DE | |
88 | might be used in the above example. | |
89 | The lexical analyzer would only need to recognize individual letters, and | |
90 | .I month\_name | |
91 | would be a nonterminal symbol. | |
92 | Such low-level rules tend to waste time and space, and may | |
93 | complicate the specification beyond Yacc's ability to deal with it. | |
94 | Usually, the lexical analyzer would | |
95 | recognize the month names, | |
96 | and return an indication that a | |
97 | .I month\_name | |
98 | was seen; in this case, | |
99 | .I month\_name | |
100 | would be a token. | |
101 | .PP | |
102 | Literal characters such as ``,'' must also be passed through the lexical | |
103 | analyzer, and are also considered tokens. | |
104 | .PP | |
105 | Specification files are very flexible. | |
106 | It is realively easy to add to the above example the rule | |
107 | .DS | |
108 | date : month \'/\' day \'/\' year ; | |
109 | .DE | |
110 | allowing | |
111 | .DS | |
112 | 7 / 4 / 1776 | |
113 | .DE | |
114 | as a synonym for | |
115 | .DS | |
116 | July 4, 1776 | |
117 | .DE | |
118 | In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, | |
119 | and little danger of disrupting existing input. | |
120 | .PP | |
121 | The input being read may not conform to the | |
122 | specifications. | |
123 | These input errors are detected as early as is theoretically possible with a | |
124 | left-to-right scan; | |
125 | thus, not only is the chance of reading and computing with bad | |
126 | input data substantially reduced, but the bad data can usually be quickly found. | |
127 | Error handling, | |
128 | provided as part of the input specifications, | |
129 | permits the reentry of bad data, | |
130 | or the continuation of the input process after skipping over the bad data. | |
131 | .PP | |
132 | In some cases, Yacc fails to produce a parser when given a set of | |
133 | specifications. | |
134 | For example, the specifications may be self contradictory, or they may | |
135 | require a more powerful recognition mechanism than that available to Yacc. | |
136 | The former cases represent design errors; | |
137 | the latter cases | |
138 | can often be corrected | |
139 | by making | |
140 | the lexical analyzer | |
141 | more powerful, or by rewriting some of the grammar rules. | |
142 | While Yacc cannot handle all possible specifications, its power | |
143 | compares favorably with similar systems; | |
144 | moreover, the | |
145 | constructions which are difficult for Yacc to handle are | |
146 | also frequently difficult for human beings to handle. | |
147 | Some users have reported that the discipline of formulating valid | |
148 | Yacc specifications for their input revealed errors of | |
149 | conception or design early in the program development. | |
150 | .PP | |
151 | The theory underlying Yacc has been described elsewhere. | |
152 | .[ | |
153 | Aho Johnson Surveys LR Parsing | |
154 | .] | |
155 | .[ | |
156 | Aho Johnson Ullman Ambiguous Grammars | |
157 | .] | |
158 | .[ | |
159 | Aho Ullman Principles Compiler Design | |
160 | .] | |
161 | Yacc has been extensively used in numerous practical applications, | |
162 | including | |
163 | .I lint , | |
164 | .[ | |
165 | Johnson Lint | |
166 | .] | |
167 | the Portable C Compiler, | |
168 | .[ | |
169 | Johnson Portable Compiler Theory | |
170 | .] | |
171 | and a system for typesetting mathematics. | |
172 | .[ | |
173 | Kernighan Cherry typesetting system CACM | |
174 | .] | |
175 | .PP | |
176 | The next several sections describe the | |
177 | basic process of preparing a Yacc specification; | |
178 | Section 1 describes the preparation of grammar rules, | |
179 | Section 2 the preparation of the user supplied actions associated with these rules, | |
180 | and Section 3 the preparation of lexical analyzers. | |
181 | Section 4 describes the operation of the parser. | |
182 | Section 5 discusses various reasons why Yacc may be unable to produce a | |
183 | parser from a specification, and what to do about it. | |
184 | Section 6 describes a simple mechanism for | |
185 | handling operator precedences in arithmetic expressions. | |
186 | Section 7 discusses error detection and recovery. | |
187 | Section 8 discusses the operating environment and special features | |
188 | of the parsers Yacc produces. | |
189 | Section 9 gives some suggestions which should improve the | |
190 | style and efficiency of the specifications. | |
191 | Section 10 discusses some advanced topics, and Section 11 gives | |
192 | acknowledgements. | |
193 | Appendix A has a brief example, and Appendix B gives a | |
194 | summary of the Yacc input syntax. | |
195 | Appendix C gives an example using some of the more advanced | |
196 | features of Yacc, and, finally, | |
197 | Appendix D describes mechanisms and syntax | |
198 | no longer actively supported, but | |
199 | provided for historical continuity with older versions of Yacc. |