.\" document distributed with 4.2BSD
.\" @(#)ss7 5.1 (Berkeley) %G%
.\"
.SH
7: Error Handling
.PP
Error handling is an extremely difficult area, and many of the problems are semantic ones.
When an error is found, for example, it may be necessary to reclaim parse tree storage,
delete or alter symbol table entries, and, typically, set switches to avoid generating any further output.
.PP
It is seldom acceptable to stop all processing when an error is found; it is more useful to continue
scanning the input to find further syntax errors.
This leads to the problem of getting the parser ``restarted'' after an error.
A general class of algorithms to do this involves discarding a number of tokens
from the input string, and attempting to adjust the parser so that input can continue.
.PP
To allow the user some control over this process,
Yacc provides a simple, but reasonably general, feature.
The token name ``error'' is reserved for error handling.
This name can be used in grammar rules;
in effect, it suggests places where errors are expected, and recovery might take place.
The parser pops its stack until it enters a state where the token ``error'' is legal.
It then behaves as if the token ``error'' were the current lookahead token,
and performs the action encountered.
The lookahead token is then reset to the token that caused the error.
If no special error rules have been specified, the processing halts when an error is detected.
.PP
In order to prevent a cascade of error messages, the parser, after
detecting an error, remains in error state until three tokens have been successfully
read and shifted.
If an error is detected when the parser is already in error state,
no message is given, and the input token is quietly deleted.
.PP
As an example, a rule of the form
.DS
stat : error
.DE
would, in effect, mean that on a syntax error the parser would attempt to skip over the statement
in which the error was seen.
More precisely, the parser will
scan ahead, looking for three tokens that might legally follow
a statement, and start processing at the first of these; if
the beginnings of statements are not sufficiently distinctive, it may make a
false start in the middle of a statement, and end up reporting a
second error where there is in fact no error.
.PP
Actions may be used with these special error rules.
These actions might attempt to reinitialize tables, reclaim symbol table space, etc.
.PP
Error rules such as the above are very general, but difficult to control.
Somewhat easier are rules such as
.DS
stat : error \';\'
.DE
Here, when there is an error, the parser attempts to skip over the statement, but
will do so by skipping to the next \';\'.
All tokens after the error and before the next \';\' cannot be shifted, and are discarded.
When the \';\' is seen, this rule will be reduced, and any ``cleanup''
action associated with it performed.
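.PP
As a sketch of what such a cleanup action might look like,
the rule could be written as shown below.
The names free_partial_tree and suppress_output are hypothetical;
they stand for a user-supplied routine and a user-supplied switch,
and are not part of Yacc.
.DS
stat : error \';\'
        { free_partial_tree();   /* hypothetical: reclaim parse tree storage */
          suppress_output = 1;   /* hypothetical switch: generate no further output */
        }
        ;
.DE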
.PP
Another form of error rule arises in interactive applications, where
it may be desirable to permit a line to be reentered after an error.
A possible error rule might be
.DS
input : error \'\en\' { printf( "Reenter last line: " ); } input
        { $$ = $4; }
.DE
There is one potential difficulty with this approach;
the parser must correctly process three input tokens before it
admits that it has correctly resynchronized after the error.
If the reentered line contains an error
in the first two tokens, the parser deletes the offending tokens,
and gives no message; this is clearly unacceptable.
For this reason, there is a mechanism that
can be used to force the parser
to believe that an error has been fully recovered from.
The statement
.DS
yyerrok ;
.DE
in an action
resets the parser to its normal mode.
The last example is better written
.DS
input : error \'\en\'
        { yyerrok;
          printf( "Reenter last line: " ); }
        input
        { $$ = $4; }
        ;
.DE
.PP
As mentioned above, the token seen immediately
after the ``error'' symbol is the input token at which the
error was discovered.
Sometimes, this is inappropriate; for example, an
error recovery action might
take upon itself the job of finding the correct place to resume input.
In this case,
the previous lookahead token must be cleared.
The statement
.DS
yyclearin ;
.DE
in an action will have this effect.
For example, suppose the action after error
were to call some sophisticated resynchronization routine,
supplied by the user, that attempted to advance the input to the
beginning of the next valid statement.
After this routine was called, the next token returned by yylex would presumably
be the first token in a legal statement;
the old, illegal token must be discarded, and the error state reset.
This could be done by a rule like
.DS
stat : error
        { resynch();
          yyerrok ;
          yyclearin ; }
        ;
.DE
.PP
These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser
from many errors;
moreover, the user can get control to deal with
the error actions required by other portions of the program.
120.PP
121These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser
122from many errors;
123moreover, the user can get control to deal with
124the error actions required by other portions of the program.