Commit | Line | Data |
---|---|---|
c9528a00 C |
1 | .SH |
2 | 5: Ambiguity and Conflicts | |
3 | .PP | |
4 | A set of grammar rules is | |
5 | .I ambiguous | |
6 | if there is some input string that can be structured in two or more different ways. | |
7 | For example, the grammar rule | |
8 | .DS | |
9 | expr : expr \'\-\' expr | |
10 | .DE | |
11 | is a natural way of expressing the fact that one way of forming an arithmetic expression | |
12 | is to put two other expressions together with a minus sign between them. | |
13 | Unfortunately, this grammar rule does not | |
14 | completely specify the way that all complex inputs | |
15 | should be structured. | |
16 | For example, if the input is | |
17 | .DS | |
18 | expr \- expr \- expr | |
19 | .DE | |
20 | the rule allows this input to be structured as either | |
21 | .DS | |
22 | ( expr \- expr ) \- expr | |
23 | .DE | |
24 | or as | |
25 | .DS | |
26 | expr \- ( expr \- expr ) | |
27 | .DE | |
28 | (The first is called | |
29 | .I "left association" , | |
30 | the second | |
31 | .I "right association" ). | |
32 | .PP | |
33 | Yacc detects such ambiguities when it is attempting to build the parser. | |
34 | It is instructive to consider the problem that confronts the parser when it is | |
35 | given an input such as | |
36 | .DS | |
37 | expr \- expr \- expr | |
38 | .DE | |
39 | When the parser has read the second expr, the input that it has seen: | |
40 | .DS | |
41 | expr \- expr | |
42 | .DE | |
43 | matches the right side of the grammar rule above. | |
44 | The parser could | |
45 | .I reduce | |
46 | the input by applying this rule; | |
47 | after applying the rule; | |
48 | the input is reduced to | |
49 | .I expr (the | |
50 | left side of the rule). | |
51 | The parser would then read the final part of the input: | |
52 | .DS | |
53 | \- expr | |
54 | .DE | |
55 | and again reduce. | |
56 | The effect of this is to take the left associative interpretation. | |
57 | .PP | |
58 | Alternatively, when the parser has seen | |
59 | .DS | |
60 | expr \- expr | |
61 | .DE | |
62 | it could defer the immediate application of the rule, and continue reading | |
63 | the input until it had seen | |
64 | .DS | |
65 | expr \- expr \- expr | |
66 | .DE | |
67 | It could then apply the rule to the rightmost three symbols, reducing them to | |
68 | .I expr | |
69 | and leaving | |
70 | .DS | |
71 | expr \- expr | |
72 | .DE | |
73 | Now the rule can be reduced once more; the effect is to | |
74 | take the right associative interpretation. | |
75 | Thus, having read | |
76 | .DS | |
77 | expr \- expr | |
78 | .DE | |
79 | the parser can do two legal things, a shift or a reduction, and has no way of | |
80 | deciding between them. | |
81 | This is called a | |
82 | .I "shift / reduce conflict" . | |
83 | It may also happen that the parser has a choice of two legal reductions; | |
84 | this is called a | |
85 | .I "reduce / reduce conflict" . | |
86 | Note that there are never any ``Shift/shift'' conflicts. | |
87 | .PP | |
88 | When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser. | |
89 | It does this by selecting one of the valid steps wherever it has a choice. | |
90 | A rule describing which choice to make in a given situation is called | |
91 | a | |
92 | .I "disambiguating rule" . | |
93 | .PP | |
94 | Yacc invokes two disambiguating rules by default: | |
95 | .IP 1. | |
96 | In a shift/reduce conflict, the default is to do the shift. | |
97 | .IP 2. | |
98 | In a reduce/reduce conflict, the default is to reduce by the | |
99 | .I earlier | |
100 | grammar rule (in the input sequence). | |
101 | .PP | |
102 | Rule 1 implies that reductions are deferred whenever there is a choice, | |
103 | in favor of shifts. | |
104 | Rule 2 gives the user rather crude control over the behavior of the parser | |
105 | in this situation, but reduce/reduce conflicts should be avoided whenever possible. | |
106 | .PP | |
107 | Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent, | |
108 | require a more complex parser than Yacc can construct. | |
109 | The use of actions within rules can also cause conflicts, if the action must | |
110 | be done before the parser can be sure which rule is being recognized. | |
111 | In these cases, the application of disambiguating rules is inappropriate, | |
112 | and leads to an incorrect parser. | |
113 | For this reason, Yacc | |
114 | always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2. | |
115 | .PP | |
116 | In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also | |
117 | possible to rewrite the grammar rules so that the same inputs are read but there are no | |
118 | conflicts. | |
119 | For this reason, most previous parser generators | |
120 | have considered conflicts to be fatal errors. | |
121 | Our experience has suggested that this rewriting is somewhat unnatural, | |
122 | and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts. | |
123 | .PP | |
124 | As an example of the power of disambiguating rules, consider a fragment from a programming | |
125 | language involving an ``if-then-else'' construction: | |
126 | .DS | |
127 | stat : IF \'(\' cond \')\' stat | |
128 | | IF \'(\' cond \')\' stat ELSE stat | |
129 | ; | |
130 | .DE | |
131 | In these rules, | |
132 | .I IF | |
133 | and | |
134 | .I ELSE | |
135 | are tokens, | |
136 | .I cond | |
137 | is a nonterminal symbol describing | |
138 | conditional (logical) expressions, and | |
139 | .I stat | |
140 | is a nonterminal symbol describing statements. | |
141 | The first rule will be called the | |
142 | .ul | |
143 | simple-if | |
144 | rule, and the | |
145 | second the | |
146 | .ul | |
147 | if-else | |
148 | rule. | |
149 | .PP | |
150 | These two rules form an ambiguous construction, since input of the form | |
151 | .DS | |
152 | IF ( C1 ) IF ( C2 ) S1 ELSE S2 | |
153 | .DE | |
154 | can be structured according to these rules in two ways: | |
155 | .DS | |
156 | IF ( C1 ) { | |
157 | IF ( C2 ) S1 | |
158 | } | |
159 | ELSE S2 | |
160 | .DE | |
161 | or | |
162 | .DS | |
163 | IF ( C1 ) { | |
164 | IF ( C2 ) S1 | |
165 | ELSE S2 | |
166 | } | |
167 | .DE | |
168 | The second interpretation is the one given in most programming languages | |
169 | having this construct. | |
170 | Each | |
171 | .I ELSE | |
172 | is associated with the last preceding | |
173 | ``un-\fIELSE'\fRd'' | |
174 | .I IF . | |
175 | In this example, consider the situation where the parser has seen | |
176 | .DS | |
177 | IF ( C1 ) IF ( C2 ) S1 | |
178 | .DE | |
179 | and is looking at the | |
180 | .I ELSE . | |
181 | It can immediately | |
182 | reduce | |
183 | by the simple-if rule to get | |
184 | .DS | |
185 | IF ( C1 ) stat | |
186 | .DE | |
187 | and then read the remaining input, | |
188 | .DS | |
189 | ELSE S2 | |
190 | .DE | |
191 | and reduce | |
192 | .DS | |
193 | IF ( C1 ) stat ELSE S2 | |
194 | .DE | |
195 | by the if-else rule. | |
196 | This leads to the first of the above groupings of the input. | |
197 | .PP | |
198 | On the other hand, the | |
199 | .I ELSE | |
200 | may be shifted, | |
201 | .I S2 | |
202 | read, and then the right hand portion of | |
203 | .DS | |
204 | IF ( C1 ) IF ( C2 ) S1 ELSE S2 | |
205 | .DE | |
206 | can be reduced by the if-else rule to get | |
207 | .DS | |
208 | IF ( C1 ) stat | |
209 | .DE | |
210 | which can be reduced by the simple-if rule. | |
211 | This leads to the second of the above groupings of the input, which | |
212 | is usually desired. | |
213 | .PP | |
214 | Once again the parser can do two valid things \- there is a shift/reduce conflict. | |
215 | The application of disambiguating rule 1 tells the parser to shift in this case, | |
216 | which leads to the desired grouping. | |
217 | .PP | |
218 | This shift/reduce conflict arises only when there is a particular current input symbol, | |
219 | .I ELSE , | |
220 | and particular inputs already seen, such as | |
221 | .DS | |
222 | IF ( C1 ) IF ( C2 ) S1 | |
223 | .DE | |
224 | In general, there may be many conflicts, and each one | |
225 | will be associated with an input symbol and | |
226 | a set of previously read inputs. | |
227 | The previously read inputs are characterized by the | |
228 | state | |
229 | of the parser. | |
230 | .PP | |
231 | The conflict messages of Yacc are best understood | |
232 | by examining the verbose (\fB\-v\fR) option output file. | |
233 | For example, the output corresponding to the above | |
234 | conflict state might be: | |
235 | .DS L | |
236 | 23: shift/reduce conflict (shift 45, reduce 18) on ELSE | |
237 | ||
238 | state 23 | |
239 | ||
240 | stat : IF ( cond ) stat\_ (18) | |
241 | stat : IF ( cond ) stat\_ELSE stat | |
242 | ||
243 | ELSE shift 45 | |
244 | . reduce 18 | |
245 | ||
246 | .DE | |
247 | The first line describes the conflict, giving the state and the input symbol. | |
248 | The ordinary state description follows, giving | |
249 | the grammar rules active in the state, and the parser actions. | |
250 | Recall that the underline marks the | |
251 | portion of the grammar rules which has been seen. | |
252 | Thus in the example, in state 23 the parser has seen input corresponding | |
253 | to | |
254 | .DS | |
255 | IF ( cond ) stat | |
256 | .DE | |
257 | and the two grammar rules shown are active at this time. | |
258 | The parser can do two possible things. | |
259 | If the input symbol is | |
260 | .I ELSE , | |
261 | it is possible to shift into state | |
262 | 45. | |
263 | State 45 will have, as part of its description, the line | |
264 | .DS | |
265 | stat : IF ( cond ) stat ELSE\_stat | |
266 | .DE | |
267 | since the | |
268 | .I ELSE | |
269 | will have been shifted in this state. | |
270 | Back in state 23, the alternative action, described by ``\fB.\fR'', | |
271 | is to be done if the input symbol is not mentioned explicitly in the above actions; thus, | |
272 | in this case, if the input symbol is not | |
273 | .I ELSE , | |
274 | the parser reduces by grammar rule 18: | |
275 | .DS | |
276 | stat : IF \'(\' cond \')\' stat | |
277 | .DE | |
278 | Once again, notice that the numbers following ``shift'' commands refer to other states, | |
279 | while the numbers following ``reduce'' commands refer to grammar | |
280 | rule numbers. | |
281 | In the | |
282 | .I y.output | |
283 | file, the rule numbers are printed after those rules which can be reduced. | |
284 | In most one states, there will be at most reduce action possible in the | |
285 | state, and this will be the default command. | |
286 | The user who encounters unexpected shift/reduce conflicts will probably want to | |
287 | look at the verbose output to decide whether the default actions are appropriate. | |
288 | In really tough cases, the user might need to know more about | |
289 | the behavior and construction of the parser than can be covered here. | |
290 | In this case, one of the theoretical references | |
291 | .[ | |
292 | Aho Johnson Surveys Parsing | |
293 | .] | |
294 | .[ | |
295 | Aho Johnson Ullman Deterministic Ambiguous | |
296 | .] | |
297 | .[ | |
298 | Aho Ullman Principles Design | |
299 | .] | |
300 | might be consulted; the services of a local guru might also be appropriate. |