Commit | Line | Data |
---|---|---|
2240a03d TL |
1 | .SH |
2 | 7: Error Handling | |
3 | .PP | |
4 | Error handling is an extremely difficult area, and many of the problems are semantic ones. | |
5 | When an error is found, for example, it may be necessary to reclaim parse tree storage, | |
6 | delete or alter symbol table entries, and, typically, set switches to avoid generating any further output. | |
7 | .PP | |
8 | It is seldom acceptable to stop all processing when an error is found; it is more useful to continue | |
9 | scanning the input to find further syntax errors. | |
10 | This leads to the problem of getting the parser ``restarted'' after an error. | |
11 | A general class of algorithms to do this involves discarding a number of tokens | |
12 | from the input string, and attempting to adjust the parser so that input can continue. | |
13 | .PP | |
14 | To allow the user some control over this process, | |
15 | Yacc provides a simple, but reasonably general, feature. | |
16 | The token name ``error'' is reserved for error handling. | |
17 | This name can be used in grammar rules; | |
18 | in effect, it suggests places where errors are expected, and recovery might take place. | |
19 | When an error is detected, the parser pops its stack until it enters a state where the token ``error'' is legal. | |
20 | It then behaves as if the token ``error'' were the current lookahead token, | |
21 | and performs the action encountered. | |
22 | The lookahead token is then reset to the token that caused the error. | |
23 | If no special error rules have been specified, processing halts when an error is detected. | |
24 | .PP | |
25 | In order to prevent a cascade of error messages, the parser, after | |
26 | detecting an error, remains in error state until three tokens have been successfully | |
27 | read and shifted. | |
28 | If an error is detected when the parser is already in error state, | |
29 | no message is given, and the input token is quietly deleted. | |
30 | .PP | |
31 | As an example, a rule of the form | |
32 | .DS | |
33 | stat : error | |
34 | .DE | |
35 | would, in effect, mean that on a syntax error the parser would attempt to skip over the statement | |
36 | in which the error was seen. | |
37 | More precisely, the parser will | |
38 | scan ahead, looking for three tokens that might legally follow | |
39 | a statement, and start processing at the first of these; if | |
40 | the beginnings of statements are not sufficiently distinctive, it may make a | |
41 | false start in the middle of a statement, and end up reporting a | |
42 | second error where there is in fact no error. | |
43 | .PP | |
44 | Actions may be used with these special error rules. | |
45 | These actions might attempt to reinitialize tables, reclaim symbol table space, etc. | |
46 | .PP | |
47 | Error rules such as the above are very general, but difficult to control. | |
48 | Somewhat easier are rules such as | |
49 | .DS | |
50 | stat : error \';\' | |
51 | .DE | |
52 | Here, when there is an error, the parser attempts to skip over the statement, but | |
53 | will do so by skipping to the next \';\'. | |
54 | All tokens after the error and before the next \';\' cannot be shifted, and are discarded. | |
55 | When the \';\' is seen, this rule will be reduced, and any ``cleanup'' | |
56 | action associated with it performed. | |
57 | .PP | |
58 | Another form of error rule arises in interactive applications, where | |
59 | it may be desirable to permit a line to be reentered after an error. | |
60 | A possible error rule might be | |
61 | .DS | |
62 | input : error \'\en\' { printf( "Reenter last line: " ); } input | |
63 | { $$ = $4; } | |
64 | .DE | |
65 | There is one potential difficulty with this approach; | |
66 | the parser must correctly process three input tokens before it | |
67 | admits that it has correctly resynchronized after the error. | |
68 | If the reentered line contains an error | |
69 | in the first two tokens, the parser deletes the offending tokens, | |
70 | and gives no message; this is clearly unacceptable. | |
71 | For this reason, there is a mechanism that | |
72 | can be used to force the parser | |
73 | to believe that an error has been fully recovered from. | |
74 | The statement | |
75 | .DS | |
76 | yyerrok ; | |
77 | .DE | |
78 | in an action | |
79 | resets the parser to its normal mode. | |
80 | The last example is better written | |
81 | .DS | |
82 | input : error \'\en\' | |
83 | { yyerrok; | |
84 | printf( "Reenter last line: " ); } | |
85 | input | |
86 | { $$ = $4; } | |
87 | ; | |
88 | .DE | |
89 | .PP | |
90 | As mentioned above, the token seen immediately | |
91 | after the ``error'' symbol is the input token at which the | |
92 | error was discovered. | |
93 | Sometimes, this is inappropriate; for example, an | |
94 | error recovery action might | |
95 | take upon itself the job of finding the correct place to resume input. | |
96 | In this case, | |
97 | the previous lookahead token must be cleared. | |
98 | The statement | |
99 | .DS | |
100 | yyclearin ; | |
101 | .DE | |
102 | in an action will have this effect. | |
103 | For example, suppose the action after the ``error'' token | |
104 | were to call some sophisticated resynchronization routine, | |
105 | supplied by the user, that attempted to advance the input to the | |
106 | beginning of the next valid statement. | |
107 | After this routine was called, the next token returned by yylex would presumably | |
108 | be the first token in a legal statement; | |
109 | the old, illegal token must be discarded, and the error state reset. | |
110 | This could be done by a rule like | |
111 | .DS | |
112 | stat : error | |
113 | { resynch(); | |
114 | yyerrok ; | |
115 | yyclearin ; } | |
116 | ; | |
117 | .DE | |
118 | .PP | |
119 | These mechanisms are admittedly crude, but do allow for a simple, fairly effective recovery of the parser | |
120 | from many errors; | |
121 | moreover, the user can get control to deal with | |
122 | the error actions required by other portions of the program. |
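
The three-token rule and the effect of yyerrok can be illustrated with a small model in C. This is only a sketch of the behavior described in this section, not the code Yacc generates; the names ErrState, on_error, on_shift, and err_ok are hypothetical, and restarting the count when an error occurs during recovery is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the error state described above; it is not
   Yacc's generated code, and every name here is illustrative. */
enum { RECOVERY_TOKENS = 3 };

typedef struct {
    int pending;   /* tokens still to be shifted before leaving error state */
    int reported;  /* error messages issued so far */
} ErrState;

/* A syntax error was detected. Returns true when a message should be
   issued, false when the parser is already in error state and the
   error is handled silently. Restarting the count on a repeated error
   is an assumption of this model. */
bool on_error(ErrState *s) {
    assert(s != NULL);
    bool report = (s->pending == 0);
    if (report)
        s->reported++;
    s->pending = RECOVERY_TOKENS;
    return report;
}

/* A token was successfully read and shifted. */
void on_shift(ErrState *s) {
    assert(s != NULL);
    if (s->pending > 0)
        s->pending--;
}

/* Counterpart of yyerrok: return to normal mode immediately. */
void err_ok(ErrState *s) {
    assert(s != NULL);
    s->pending = 0;
}
```

In this model, a second error that arrives before three tokens have been shifted produces no message, which is the silent behavior that makes the interactive ``reenter line'' rule unsatisfactory without yyerrok. Calling err_ok in the position where the rule calls yyerrok makes the next error reportable again at once.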