BSD 4_2 development
[unix-history] / usr / doc / yacc / ss5
CommitLineData
c9528a00
C
1.SH
25: Ambiguity and Conflicts
3.PP
4A set of grammar rules is
5.I ambiguous
6if there is some input string that can be structured in two or more different ways.
7For example, the grammar rule
8.DS
9expr : expr \'\-\' expr
10.DE
11is a natural way of expressing the fact that one way of forming an arithmetic expression
12is to put two other expressions together with a minus sign between them.
13Unfortunately, this grammar rule does not
14completely specify the way that all complex inputs
15should be structured.
16For example, if the input is
17.DS
18expr \- expr \- expr
19.DE
20the rule allows this input to be structured as either
21.DS
22( expr \- expr ) \- expr
23.DE
24or as
25.DS
26expr \- ( expr \- expr )
27.DE
28(The first is called
29.I "left association" ,
30the second
31.I "right association" ).
32.PP
33Yacc detects such ambiguities when it is attempting to build the parser.
34It is instructive to consider the problem that confronts the parser when it is
35given an input such as
36.DS
37expr \- expr \- expr
38.DE
39When the parser has read the second expr, the input that it has seen:
40.DS
41expr \- expr
42.DE
43matches the right side of the grammar rule above.
44The parser could
45.I reduce
46the input by applying this rule;
47after applying the rule;
48the input is reduced to
49.I expr (the
50left side of the rule).
51The parser would then read the final part of the input:
52.DS
53\- expr
54.DE
55and again reduce.
56The effect of this is to take the left associative interpretation.
57.PP
58Alternatively, when the parser has seen
59.DS
60expr \- expr
61.DE
62it could defer the immediate application of the rule, and continue reading
63the input until it had seen
64.DS
65expr \- expr \- expr
66.DE
67It could then apply the rule to the rightmost three symbols, reducing them to
68.I expr
69and leaving
70.DS
71expr \- expr
72.DE
73Now the rule can be reduced once more; the effect is to
74take the right associative interpretation.
75Thus, having read
76.DS
77expr \- expr
78.DE
79the parser can do two legal things, a shift or a reduction, and has no way of
80deciding between them.
81This is called a
82.I "shift / reduce conflict" .
83It may also happen that the parser has a choice of two legal reductions;
84this is called a
85.I "reduce / reduce conflict" .
86Note that there are never any ``Shift/shift'' conflicts.
87.PP
88When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
89It does this by selecting one of the valid steps wherever it has a choice.
90A rule describing which choice to make in a given situation is called
91a
92.I "disambiguating rule" .
93.PP
94Yacc invokes two disambiguating rules by default:
95.IP 1.
96In a shift/reduce conflict, the default is to do the shift.
97.IP 2.
98In a reduce/reduce conflict, the default is to reduce by the
99.I earlier
100grammar rule (in the input sequence).
101.PP
102Rule 1 implies that reductions are deferred whenever there is a choice,
103in favor of shifts.
104Rule 2 gives the user rather crude control over the behavior of the parser
105in this situation, but reduce/reduce conflicts should be avoided whenever possible.
106.PP
107Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
108require a more complex parser than Yacc can construct.
109The use of actions within rules can also cause conflicts, if the action must
110be done before the parser can be sure which rule is being recognized.
111In these cases, the application of disambiguating rules is inappropriate,
112and leads to an incorrect parser.
113For this reason, Yacc
114always reports the number of shift/reduce and reduce/reduce conflicts resolved by Rule 1 and Rule 2.
115.PP
116In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
117possible to rewrite the grammar rules so that the same inputs are read but there are no
118conflicts.
119For this reason, most previous parser generators
120have considered conflicts to be fatal errors.
121Our experience has suggested that this rewriting is somewhat unnatural,
122and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
123.PP
124As an example of the power of disambiguating rules, consider a fragment from a programming
125language involving an ``if-then-else'' construction:
126.DS
127stat : IF \'(\' cond \')\' stat
128 | IF \'(\' cond \')\' stat ELSE stat
129 ;
130.DE
131In these rules,
132.I IF
133and
134.I ELSE
135are tokens,
136.I cond
137is a nonterminal symbol describing
138conditional (logical) expressions, and
139.I stat
140is a nonterminal symbol describing statements.
141The first rule will be called the
142.ul
143simple-if
144rule, and the
145second the
146.ul
147if-else
148rule.
149.PP
150These two rules form an ambiguous construction, since input of the form
151.DS
152IF ( C1 ) IF ( C2 ) S1 ELSE S2
153.DE
154can be structured according to these rules in two ways:
155.DS
156IF ( C1 ) {
157 IF ( C2 ) S1
158 }
159ELSE S2
160.DE
161or
162.DS
163IF ( C1 ) {
164 IF ( C2 ) S1
165 ELSE S2
166 }
167.DE
168The second interpretation is the one given in most programming languages
169having this construct.
170Each
171.I ELSE
172is associated with the last preceding
173``un-\fIELSE'\fRd''
174.I IF .
175In this example, consider the situation where the parser has seen
176.DS
177IF ( C1 ) IF ( C2 ) S1
178.DE
179and is looking at the
180.I ELSE .
181It can immediately
182reduce
183by the simple-if rule to get
184.DS
185IF ( C1 ) stat
186.DE
187and then read the remaining input,
188.DS
189ELSE S2
190.DE
191and reduce
192.DS
193IF ( C1 ) stat ELSE S2
194.DE
195by the if-else rule.
196This leads to the first of the above groupings of the input.
197.PP
198On the other hand, the
199.I ELSE
200may be shifted,
201.I S2
202read, and then the right hand portion of
203.DS
204IF ( C1 ) IF ( C2 ) S1 ELSE S2
205.DE
206can be reduced by the if-else rule to get
207.DS
208IF ( C1 ) stat
209.DE
210which can be reduced by the simple-if rule.
211This leads to the second of the above groupings of the input, which
212is usually desired.
213.PP
214Once again the parser can do two valid things \- there is a shift/reduce conflict.
215The application of disambiguating rule 1 tells the parser to shift in this case,
216which leads to the desired grouping.
217.PP
218This shift/reduce conflict arises only when there is a particular current input symbol,
219.I ELSE ,
220and particular inputs already seen, such as
221.DS
222IF ( C1 ) IF ( C2 ) S1
223.DE
224In general, there may be many conflicts, and each one
225will be associated with an input symbol and
226a set of previously read inputs.
227The previously read inputs are characterized by the
228state
229of the parser.
230.PP
231The conflict messages of Yacc are best understood
232by examining the verbose (\fB\-v\fR) option output file.
233For example, the output corresponding to the above
234conflict state might be:
235.DS L
23623: shift/reduce conflict (shift 45, reduce 18) on ELSE
237
238state 23
239
240 stat : IF ( cond ) stat\_ (18)
241 stat : IF ( cond ) stat\_ELSE stat
242
243 ELSE shift 45
244 . reduce 18
245
246.DE
247The first line describes the conflict, giving the state and the input symbol.
248The ordinary state description follows, giving
249the grammar rules active in the state, and the parser actions.
250Recall that the underline marks the
251portion of the grammar rules which has been seen.
252Thus in the example, in state 23 the parser has seen input corresponding
253to
254.DS
255IF ( cond ) stat
256.DE
257and the two grammar rules shown are active at this time.
258The parser can do two possible things.
259If the input symbol is
260.I ELSE ,
261it is possible to shift into state
26245.
263State 45 will have, as part of its description, the line
264.DS
265stat : IF ( cond ) stat ELSE\_stat
266.DE
267since the
268.I ELSE
269will have been shifted in this state.
270Back in state 23, the alternative action, described by ``\fB.\fR'',
271is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
272in this case, if the input symbol is not
273.I ELSE ,
274the parser reduces by grammar rule 18:
275.DS
276stat : IF \'(\' cond \')\' stat
277.DE
278Once again, notice that the numbers following ``shift'' commands refer to other states,
279while the numbers following ``reduce'' commands refer to grammar
280rule numbers.
281In the
282.I y.output
283file, the rule numbers are printed after those rules which can be reduced.
284In most one states, there will be at most reduce action possible in the
285state, and this will be the default command.
286The user who encounters unexpected shift/reduce conflicts will probably want to
287look at the verbose output to decide whether the default actions are appropriate.
288In really tough cases, the user might need to know more about
289the behavior and construction of the parser than can be covered here.
290In this case, one of the theoretical references
291.[
292Aho Johnson Surveys Parsing
293.]
294.[
295Aho Johnson Ullman Deterministic Ambiguous
296.]
297.[
298Aho Ullman Principles Design
299.]
300might be consulted; the services of a local guru might also be appropriate.