Commit | Line | Data |
---|---|---|
2b4fa0ec | 1 | .\" @(#)ss7 5.1 (Berkeley) %G% |
dc5a33cf KM | 2 | .\" |
3 | .SH | |
4 | 7: Error Handling | |
5 | .PP | |
6 | Error handling is an extremely difficult area, and many of the problems are semantic ones. | |
7 | When an error is found, for example, it may be necessary to reclaim parse tree storage, | |
8 | delete or alter symbol table entries, and, typically, set switches to avoid generating any further output. | |
9 | .PP | |
10 | It is seldom acceptable to stop all processing when an error is found; it is more useful to continue | |
11 | scanning the input to find further syntax errors. | |
12 | This leads to the problem of getting the parser ``restarted'' after an error. | |
13 | A general class of algorithms to do this involves discarding a number of tokens | |
14 | from the input string, and attempting to adjust the parser so that input can continue. | |
15 | .PP | |
16 | To allow the user some control over this process, | |
17 | Yacc provides a simple, but reasonably general, feature. | |
18 | The token name ``error'' is reserved for error handling. | |
19 | This name can be used in grammar rules; | |
20 | in effect, it suggests places where errors are expected, and recovery might take place. | |
21 | The parser pops its stack until it enters a state where the token ``error'' is legal. | |
22 | It then behaves as if the token ``error'' were the current lookahead token, | |
23 | and performs the parsing action found for it in that state. | |
24 | The lookahead token is then reset to the token that caused the error. | |
25 | If no special error rules have been specified, the processing halts when an error is detected. | |
26 | .PP | |
27 | In order to prevent a cascade of error messages, the parser, after | |
28 | detecting an error, remains in error state until three tokens have been successfully | |
29 | read and shifted. | |
30 | If an error is detected when the parser is already in error state, | |
31 | no message is given, and the input token is quietly deleted. | |
32 | .PP | |
33 | As an example, a rule of the form | |
34 | .DS | |
35 | stat : error | |
36 | .DE | |
37 | would, in effect, mean that on a syntax error the parser would attempt to skip over the statement | |
38 | in which the error was seen. | |
39 | More precisely, the parser will | |
40 | scan ahead, looking for three tokens that might legally follow | |
41 | a statement, and start processing at the first of these; if | |
42 | the beginnings of statements are not sufficiently distinctive, it may make a | |
43 | false start in the middle of a statement, and end up reporting a | |
44 | second error where there is in fact no error. | |
45 | .PP | |
46 | Actions may be used with these special error rules. | |
47 | These actions might attempt to reinitialize tables, reclaim symbol table space, etc. | |
48 | .PP | |
49 | Error rules such as the above are very general, but difficult to control. | |
50 | Somewhat easier are rules such as | |
51 | .DS | |
52 | stat : error \';\' | |
53 | .DE | |
54 | Here, when there is an error, the parser attempts to skip over the statement, but | |
55 | will do so by skipping to the next \';\'. | |
56 | All tokens after the error and before the next \';\' cannot be shifted, and are discarded. | |
57 | When the \';\' is seen, this rule will be reduced, and any ``cleanup'' | |
58 | action associated with it performed. | |
59 | .PP | |
60 | Another form of error rule arises in interactive applications, where | |
61 | it may be desirable to permit a line to be reentered after an error. | |
62 | A possible error rule might be | |
63 | .DS | |
64 | input : error \'\en\' { printf( "Reenter last line: " ); } input | |
65 | { $$ = $4; } | |
66 | .DE | |
67 | There is one potential difficulty with this approach; | |
68 | the parser must correctly process three input tokens before it | |
69 | admits that it has correctly resynchronized after the error. | |
70 | If the reentered line contains an error | |
71 | in the first two tokens, the parser deletes the offending tokens, | |
72 | and gives no message; this is clearly unacceptable. | |
73 | For this reason, there is a mechanism that | |
74 | can be used to force the parser | |
75 | to believe that an error has been fully recovered from. | |
76 | The statement | |
77 | .DS | |
78 | yyerrok ; | |
79 | .DE | |
80 | in an action | |
81 | resets the parser to its normal mode. | |
82 | The last example is better written | |
83 | .DS | |
84 | input : error \'\en\' | |
85 | { yyerrok; | |
86 | printf( "Reenter last line: " ); } | |
87 | input | |
88 | { $$ = $4; } | |
89 | ; | |
90 | .DE | |
91 | .PP | |
92 | As mentioned above, the token seen immediately | |
93 | after the ``error'' symbol is the input token at which the | |
94 | error was discovered. | |
95 | Sometimes, this is inappropriate; for example, an | |
96 | error recovery action might | |
97 | take upon itself the job of finding the correct place to resume input. | |
98 | In this case, | |
99 | the previous lookahead token must be cleared. | |
100 | The statement | |
101 | .DS | |
102 | yyclearin ; | |
103 | .DE | |
104 | in an action will have this effect. | |
105 | For example, suppose the action after error | |
106 | were to call some sophisticated resynchronization routine, | |
107 | supplied by the user, that attempted to advance the input to the | |
108 | beginning of the next valid statement. | |
109 | After this routine was called, the next token returned by yylex would presumably | |
110 | be the first token in a legal statement; | |
111 | the old, illegal token must be discarded, and the error state reset. | |
112 | This could be done by a rule like | |
113 | .DS | |
114 | stat : error | |
115 | { resynch(); | |
116 | yyerrok ; | |
117 | yyclearin ; } | |
118 | ; | |
119 | .DE | |
120 | .PP | |
121 | These mechanisms are admittedly crude, but they do allow simple, fairly effective recovery of the parser | |
122 | from many errors; | |
123 | moreover, the user can gain control to carry out | |
124 | the error actions required by other portions of the program. |
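
The interplay between the three-token error state and yyerrok described in this section can be sketched with a small C model. This is a hypothetical hand-written toy, not the code that Yacc generates. The function parse, the struct result, the single-character token encoding (i for an identifier, n for a number), and the one-statement grammar stat : 'i' '=' 'n' ';' | error ';' are all assumptions made for illustration. The model also simplifies recovery: every error, reported or not, skips ahead to the next ';'.

```c
#include <stddef.h>

/* Hypothetical toy model of the error state; not Yacc's generated parser. */
struct result {
    int statements;   /* statements parsed without error */
    int messages;     /* error messages that would be printed */
};

/*
 * Parse a string of one-character tokens against the assumed grammar
 *     stat : 'i' '=' 'n' ';'  |  error ';'
 * If use_yyerrok is nonzero, the action of the error rule is assumed
 * to execute "yyerrok;", which leaves error state immediately.
 */
struct result parse(const char *in, int use_yyerrok)
{
    static const char stat[] = "i=n;";  /* the only valid statement */
    struct result r = {0, 0};
    int errflag = 0;   /* shifts still needed before leaving error state */
    size_t pos = 0;    /* how much of the statement has been matched */

    while (*in != '\0') {
        if (*in == stat[pos]) {          /* token can be shifted */
            if (errflag > 0)
                errflag--;
            pos++;
            if (stat[pos] == '\0') {     /* whole statement recognized */
                r.statements++;
                pos = 0;
            }
            in++;
            continue;
        }

        /* Syntax error: report it only when not already in error state. */
        if (errflag == 0)
            r.messages++;
        errflag = 3;

        /* stat : error ';'  -- discard tokens up to the next ';'. */
        while (*in != '\0' && *in != ';')
            in++;
        if (*in == '\0')
            break;                       /* input ended during recovery */
        in++;                            /* shift the ';' ... */
        errflag--;                       /* ... which counts as one shift */
        pos = 0;                         /* the error rule is reduced */

        if (use_yyerrok)
            errflag = 0;                 /* effect of yyerrok in the action */
    }
    return r;
}
```

Under this model, the input "i=;in;i=n;" contains two bad statements. With use_yyerrok set to 0, only the first is reported, because the second error occurs before three tokens have been shifted. With use_yyerrok set to 1, both are reported.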