.\"	@(#)ss7	5.1 (Berkeley) %G%
.\"
.SH
7: Error Handling
.PP
Error handling is an extremely difficult area, and many of the problems are semantic ones.
When an error is found, for example, it may be necessary to reclaim parse tree storage,
delete or alter symbol table entries, and, typically, set switches to avoid generating any further output.
.PP
It is seldom acceptable to stop all processing when an error is found; it is more useful to continue
scanning the input to find further syntax errors.
This leads to the problem of getting the parser ``restarted'' after an error.
A general class of algorithms to do this involves discarding a number of tokens
from the input string, and attempting to adjust the parser so that input can continue.
.PP
To allow the user some control over this process,
Yacc provides a simple, but reasonably general, feature.
The token name ``error'' is reserved for error handling.
This name can be used in grammar rules;
in effect, it suggests places where errors are expected, and recovery might take place.
When an error occurs, the parser pops its stack until it enters a state where the token ``error'' is legal.
It then behaves as if the token ``error'' were the current lookahead token,
and performs the action encountered.
The lookahead token is then reset to the token that caused the error.
If no special error rules have been specified, the processing halts when an error is detected.
.PP
In order to prevent a cascade of error messages, the parser, after
detecting an error, remains in error state until three tokens have been successfully
read and shifted.
If an error is detected when the parser is already in error state,
no message is given, and the input token is quietly deleted.
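.PP
The three-token rule can be pictured as a small counter.
The following C fragment is only a sketch of the behavior described above, not the code Yacc generates;
the structure and function names are hypothetical, and an actual parser may differ in detail.

```c
#include <assert.h>

/* Hypothetical model of the parser's error state; the names are
   illustrative and are not taken from the Yacc source. */
struct err_state {
    int shifts_needed;  /* 0 means normal mode; otherwise shifts still required */
    int messages;       /* number of error messages issued so far */
    int deleted;        /* number of tokens quietly deleted */
};

/* Called whenever the parser detects a syntax error. */
void on_error(struct err_state *s)
{
    assert(s != 0);
    if (s->shifts_needed == 0)
        s->messages++;      /* normal mode: report the error */
    else
        s->deleted++;       /* already in error state: drop the token, no message */
    s->shifts_needed = 3;   /* three clean shifts are required from here on */
}

/* Called whenever a token is successfully read and shifted. */
void on_shift(struct err_state *s)
{
    assert(s != 0);
    if (s->shifts_needed > 0)
        s->shifts_needed--; /* one step closer to normal mode */
}
```

.PP
In this model a second error that arrives after only two shifts produces no message and restarts the count,
which is why a burst of related errors is reported once.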
.PP
As an example, a rule of the form
.DS
stat	:	error
.DE
would, in effect, mean that on a syntax error the parser would attempt to skip over the statement
in which the error was seen.
More precisely, the parser will
scan ahead, looking for three tokens that might legally follow
a statement, and start processing at the first of these; if
the beginnings of statements are not sufficiently distinctive, it may make a
false start in the middle of a statement, and end up reporting a
second error where there is in fact no error.
.PP
Actions may be used with these special error rules.
These actions might attempt to reinitialize tables, reclaim symbol table space, etc.
.PP
Error rules such as the above are very general, but difficult to control.
Somewhat easier are rules such as
.DS
stat	:	error  \';\'
.DE
Here, when there is an error, the parser attempts to skip over the statement, but
will do so by skipping to the next \';\'.
All tokens after the error and before the next \';\' cannot be shifted, and are discarded.
When the \';\' is seen, this rule will be reduced, and any ``cleanup''
action associated with it performed.
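.PP
The effect of such a rule on the input can be sketched in C as follows.
This is a simplified illustration that assumes the tokens are held in an array of character codes;
the function name and its arguments are hypothetical and do not appear in Yacc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tokens[0..n-1] is the input, err_pos is the index
   of the token at which the error was detected.  Returns the index of the
   first token after the next ';', or n if no ';' remains.  *discarded
   receives the number of tokens thrown away before the ';'. */
size_t skip_to_semicolon(const int *tokens, size_t n, size_t err_pos,
                         size_t *discarded)
{
    size_t i = err_pos;

    assert(tokens != NULL && discarded != NULL);
    *discarded = 0;
    while (i < n && tokens[i] != ';') {
        i++;                /* this token cannot be shifted: discard it */
        (*discarded)++;
    }
    if (i < n)
        i++;                /* shift the ';' so that the rule can be reduced */
    return i;
}
```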
.PP
Another form of error rule arises in interactive applications, where
it may be desirable to permit a line to be reentered after an error.
A possible error rule might be
.DS
input	:	error  \'\en\'  {  printf( "Reenter last line: " );  }  input
			{	$$  =  $4;  }
.DE
There is one potential difficulty with this approach;
the parser must correctly process three input tokens before it
admits that it has correctly resynchronized after the error.
If the reentered line contains an error
in the first two tokens, the parser deletes the offending tokens,
and gives no message; this is clearly unacceptable.
For this reason, there is a mechanism that
can be used to force the parser
to believe that an error has been fully recovered from.
The statement
.DS
yyerrok ;
.DE
in an action
resets the parser to its normal mode.
The last example is better written
.DS
input	:	error  \'\en\'
			{	yyerrok;
				printf( "Reenter last line: " );   }
		input
			{	$$  =  $4;  }
	;
.DE
.PP
As mentioned above, the token seen immediately
after the ``error'' symbol is the input token at which the
error was discovered.
Sometimes, this is inappropriate; for example, an
error recovery action might
take upon itself the job of finding the correct place to resume input.
In this case,
the previous lookahead token must be cleared.
The statement
.DS
yyclearin ;
.DE
in an action will have this effect.
For example, suppose the action after ``error''
were to call some sophisticated resynchronization routine,
supplied by the user, that attempted to advance the input to the
beginning of the next valid statement.
After this routine was called, the next token returned by yylex would presumably
be the first token in a legal statement;
the old, illegal token must be discarded, and the error state reset.
This could be done by a rule like
.DS
stat	:	error 
			{	resynch();
				yyerrok ;
				yyclearin ;   }
	;
.DE
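.PP
What a resynchronization routine does is left entirely to the user.
As one possibility, the sketch below assumes that the input is held in a character buffer
and that every statement ends with a semicolon or a newline;
the name resynch_pos and both of these assumptions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resynchronization helper.  buf is a NUL-terminated input
   buffer and pos is the position at which the error was noticed.  Returns
   the position of the first character of the next statement, or the
   position of the terminating NUL if the input is exhausted. */
size_t resynch_pos(const char *buf, size_t pos)
{
    assert(buf != NULL);
    while (buf[pos] != '\0' && buf[pos] != ';' && buf[pos] != '\n')
        pos++;              /* skip the rest of the bad statement */
    if (buf[pos] != '\0')
        pos++;              /* step past the terminator itself */
    while (buf[pos] == ' ' || buf[pos] == '\t')
        pos++;              /* skip blanks before the next statement */
    return pos;
}
```

.PP
A routine called from an action would update the lexical analyzer's own input position in the same way,
so that the next call to yylex returns the first token of the following statement.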
.PP
These mechanisms are admittedly crude, but they do allow a simple, fairly effective recovery of the parser
from many errors;
moreover, they let the user gain control to carry out
the error actions required by other portions of the program.
123	moreover, the user can get control to deal with
124	the error actions required by other portions of the program.