.SH
7: Error Handling
.PP
Error handling is an extremely difficult area, and many of the problems are semantic ones.
When an error is found, for example, it may be necessary to reclaim parse tree storage,
delete or alter symbol table entries, and, typically, set switches to avoid generating any further output.
.PP
It is seldom acceptable to stop all processing when an error is found; it is more useful to continue
scanning the input to find further syntax errors.
This leads to the problem of getting the parser ``restarted'' after an error.
A general class of algorithms to do this involves discarding a number of tokens
from the input string, and attempting to adjust the parser so that input can continue.
.PP
To allow the user some control over this process,
Yacc provides a simple, but reasonably general, feature.
The token name ``error'' is reserved for error handling.
This name can be used in grammar rules;
in effect, it suggests places where errors are expected, and recovery might take place.
When an error is detected, the parser pops its stack until it enters a state where the token ``error'' is legal.
It then behaves as if the token ``error'' were the current lookahead token,
and performs the action encountered.
The lookahead token is then reset to the token that caused the error.
If no special error rules have been specified, the processing halts when an error is detected.
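.PP
The stack-popping step can be pictured with a small C model.
This is only a sketch under stated assumptions: the state numbers, the table error_ok
(marking the states in which ``error'' is legal), and the function pop_to_error_state
are hypothetical, and are not taken from the parser that Yacc generates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy table (an assumption, not Yacc's own): error_ok[s] is nonzero
 * when the token "error" is legal in state s.
 */
static const int error_ok[] = { 1, 0, 0, 1, 0 };

/*
 * Pop states from the top of the stack until the top state accepts
 * "error".  Returns the new stack depth, or 0 if the stack emptied,
 * in which case processing would halt.
 */
static size_t pop_to_error_state(const int *stack, size_t depth)
{
    assert(stack != NULL);
    while (depth > 0 && !error_ok[stack[depth - 1]])
        depth--;
    return depth;
}
```

With the stack 0, 3, 1, 4 (top last), the model pops states 4 and 1 and stops with state 3 on top.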
.PP
In order to prevent a cascade of error messages, the parser, after
detecting an error, remains in error state until three tokens have been successfully
read and shifted.
If an error is detected when the parser is already in error state,
no message is given, and the input token is quietly deleted.
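.PP
The three-token rule amounts to a small counter.
The C sketch below models it with hypothetical names (struct recovery, on_shift, on_error);
restarting the count at three when a further error arrives in error state is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

enum { RESYNC_TOKENS = 3 };   /* tokens to shift before leaving error state */

struct recovery {
    int errflag;    /* shifts still needed; 0 means normal mode */
    int messages;   /* errors reported to the user */
    int deleted;    /* tokens quietly deleted */
};

/* A token was read and shifted successfully. */
static void on_shift(struct recovery *r)
{
    assert(r != NULL);
    if (r->errflag > 0)
        r->errflag--;
}

/* The current token cannot be accepted. */
static void on_error(struct recovery *r)
{
    assert(r != NULL);
    if (r->errflag == 0)
        r->messages++;      /* normal mode: report the error */
    else
        r->deleted++;       /* error state: no message, drop the token */
    r->errflag = RESYNC_TOKENS;   /* assumption: the count restarts */
}
```

An error that arrives after only two successful shifts is counted in deleted, not in messages;
after three successful shifts the next error is reported again.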
.PP
As an example, a rule of the form
.DS
stat : error
.DE
would, in effect, mean that on a syntax error the parser would attempt to skip over the statement
in which the error was seen.
More precisely, the parser will
scan ahead, looking for three tokens that might legally follow
a statement, and start processing at the first of these; if
the beginnings of statements are not sufficiently distinctive, it may make a
false start in the middle of a statement, and end up reporting a
second error where there is in fact no error.
.PP
Actions may be used with these special error rules.
These actions might attempt to reinitialize tables, reclaim symbol table space, etc.
.PP
Error rules such as the above are very general, but difficult to control.
Somewhat easier are rules such as
.DS
stat : error \';\'
.DE
Here, when there is an error, the parser attempts to skip over the statement, but
will do so by skipping to the next \';\'.
All tokens after the error and before the next \';\' cannot be shifted, and are discarded.
When the \';\' is seen, this rule will be reduced, and any ``cleanup''
action associated with it performed.
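.PP
The skipping behavior can be illustrated outside the parser.
The C function below is a hypothetical helper, not part of Yacc; it assumes each token is a single
character of a string, and discards tokens from the point of the error up to the next semicolon.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: tokens are single characters in a string.
 * Starting at the token where the error was found, discard tokens up
 * to the next ';'.  Returns the index of that ';', or the index of the
 * terminating '\0' if no ';' remains.  *discarded receives the number
 * of tokens thrown away.
 */
static size_t skip_to_semicolon(const char *tokens, size_t pos, size_t *discarded)
{
    size_t start = pos;

    assert(tokens != NULL && discarded != NULL);
    while (tokens[pos] != '\0' && tokens[pos] != ';')
        pos++;
    *discarded = pos - start;
    return pos;
}
```

For the input a=+b;c=d; with the error found at the + (index 2), two tokens are discarded
and scanning resumes at the semicolon at index 4.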
.PP
Another form of error rule arises in interactive applications, where
it may be desirable to permit a line to be reentered after an error.
A possible error rule might be
.DS
input : error \'\en\' { printf( "Reenter last line: " ); } input
	{ $$ = $4; }
.DE
There is one potential difficulty with this approach;
the parser must correctly process three input tokens before it
admits that it has correctly resynchronized after the error.
If the reentered line contains an error
in the first two tokens, the parser deletes the offending tokens,
and gives no message; this is clearly unacceptable.
For this reason, there is a mechanism that
can be used to force the parser
to believe that an error has been fully recovered from.
The statement
.DS
yyerrok ;
.DE
in an action
resets the parser to its normal mode.
The last example is better written
.DS
input : error \'\en\'
	{ yyerrok;
	printf( "Reenter last line: " ); }
	input
	{ $$ = $4; }
	;
.DE
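.PP
The difference that yyerrok makes can be replayed with a toy counter in C.
The names errflag, shift_token, syntax_error, and replay are hypothetical; the sketch assumes that
shifting the newline counts toward the three tokens, and it models yyerrok as setting the counter to zero.

```c
#include <assert.h>

/*
 * Toy model with hypothetical names.  errflag counts the tokens that
 * must still be shifted before errors are reported again.
 */
static int errflag;
static int messages;

static void shift_token(void)
{
    if (errflag > 0)
        errflag--;
}

static void syntax_error(void)
{
    if (errflag == 0)
        messages++;          /* reported only in normal mode */
    errflag = 3;
}

/*
 * Replay the interactive example: a bad line, the newline that ends
 * it, then a reentered line whose second token is also bad.  Returns
 * the number of messages the user would see.
 */
static int replay(int call_yyerrok)
{
    errflag = 0;
    messages = 0;
    syntax_error();          /* error in the original line */
    shift_token();           /* the newline after "error" is shifted */
    if (call_yyerrok)
        errflag = 0;         /* models the effect of "yyerrok ;" */
    shift_token();           /* first token of the reentered line */
    syntax_error();          /* second token is bad */
    assert(messages >= 1);
    return messages;
}
```

In this model replay(0) yields one message, so the second error passes silently, while replay(1) yields two.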
.PP
As mentioned above, the token seen immediately
after the ``error'' symbol is the input token at which the
error was discovered.
Sometimes, this is inappropriate; for example, an
error recovery action might
take upon itself the job of finding the correct place to resume input.
In this case,
the previous lookahead token must be cleared.
The statement
.DS
yyclearin ;
.DE
in an action will have this effect.
For example, suppose the action after error
were to call some sophisticated resynchronization routine,
supplied by the user, that attempted to advance the input to the
beginning of the next valid statement.
After this routine was called, the next token returned by yylex would presumably
be the first token in a legal statement;
the old, illegal token must be discarded, and the error state reset.
This could be done by a rule like
.DS
stat : error
	{ resynch();
	yyerrok ;
	yyclearin ; }
	;
.DE
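.PP
Why the old lookahead token must be cleared can be seen in a small C model.
Everything here is an assumption made for illustration: tokens are single characters,
EMPTY marks the absence of a lookahead token, this resynch() merely skips past the next semicolon,
and the do_clear argument stands in for the yyclearin statement.

```c
#include <assert.h>
#include <stddef.h>

#define EMPTY (-1)            /* no lookahead token is held */

static const char *input;     /* remaining input; one char per token */
static int lookahead = EMPTY; /* token held by the parser, if any */

/* Return the current token, reading a new one only when none is held. */
static int current_token(void)
{
    assert(input != NULL);
    if (lookahead == EMPTY && *input != '\0')
        lookahead = (unsigned char)*input++;
    return lookahead;
}

/* Hypothetical resynch(): advance the input past the next ';'. */
static void resynch(void)
{
    assert(input != NULL);
    while (*input != '\0' && *input != ';')
        input++;
    if (*input == ';')
        input++;
}

/* Recover from an error at the first token; returns the next token seen. */
static int recover(const char *text, int do_clear)
{
    input = text;
    lookahead = EMPTY;
    (void)current_token();    /* the parser reads the offending token */
    resynch();
    if (do_clear)
        lookahead = EMPTY;    /* discard the old, illegal token */
    return current_token();
}
```

For the input ?x;ab the model resumes at a when the lookahead is cleared;
without clearing, the stale ? is presented to the parser again.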
.PP
These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser
from many errors;
moreover, the user can get control to deal with
the error actions required by other portions of the program.