.SH
9: Hints for Preparing Specifications
.PP
This section contains miscellaneous hints on preparing efficient, easy to change,
and clear specifications.
The individual subsections are more or less
independent.
.SH
Input Style
.PP
It is difficult to
provide rules with substantial actions
and still have a readable specification file.
The following style hints owe much to Brian Kernighan.
.IP a.
Use all capital letters for token names, all lower case letters for
nonterminal names.
This rule comes under the heading of ``knowing who to blame when
things go wrong.''
.IP b.
Put grammar rules and actions on separate lines.
This allows either to be changed without
an automatic need to change the other.
.IP c.
Put all rules with the same left hand side together.
Put the left hand side in only once, and let all
following rules begin with a vertical bar.
.IP d.
Put a semicolon only after the last rule with a given left hand side,
and put the semicolon on a separate line.
This allows new rules to be easily added.
.IP e.
Indent rule bodies by two tab stops, and action bodies by three
tab stops.
.PP
The example in Appendix A is written following this style, as are
the examples in the text of this paper (where space permits).
The user must make up his own mind about these stylistic questions;
the central problem, however, is to make the rules visible through
the morass of action code.
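.PP
As a rough illustration of hints a through e,
a fragment laid out in this style might look as follows.
The token NUMBER, the nonterminal names, and the routines
.I newlist
and
.I append
are hypothetical, invented only for this sketch:
.DS
%token	NUMBER

%%

list	:	item
			{  /* newlist is a hypothetical user routine */
			   $$ = newlist( $1 );  }
	|	list  \',\'  item
			{  /* append is a hypothetical user routine */
			   $$ = append( $1, $3 );  }
	;

item	:	NUMBER
	;
.DE
A new alternative for list can then be added by inserting two lines
above the semicolon, without touching the existing rules.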
.SH
Left Recursion
.PP
The algorithm used by the Yacc parser encourages so-called ``left recursive''
grammar rules: rules of the form
.DS
name	:	name  rest_of_rule  ;
.DE
These rules frequently arise when
writing specifications of sequences and lists:
.DS
list	:	item
	|	list  \',\'  item
	;
.DE
and
.DS
seq	:	item
	|	seq  item
	;
.DE
In each of these cases, the first rule
will be reduced for the first item only, and the second rule
will be reduced for the second and all succeeding items.
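.PP
To make the order of these reductions concrete,
the following sketch attaches actions that count the items in a sequence.
It assumes the default integer value type;
the counting itself is only an illustration:
.DS
seq	:	item
			{  /* reduced once, for the first item */
			   $$ = 1;  }
	|	seq  item
			{  /* reduced once for each later item */
			   $$ = $1 + 1;  }
	;
.DE
When the whole sequence has been recognized, the value of seq
is the number of items read.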
.PP
With right recursive rules, such as
.DS
seq	:	item
	|	item  seq
	;
.DE
the parser would be a bit bigger, and the items would be seen, and reduced,
from right to left.
More seriously, an internal stack in the parser
would be in danger of overflowing if a very long sequence were read,
since neither rule for seq can be reduced until the last item has been seen,
and every earlier item must be held on the stack until then.
Thus, the user should use left recursion wherever reasonable.
.PP
It is worth considering whether a sequence with zero
elements has any meaning and, if so, writing
the sequence specification with an empty rule:
.DS
seq	:	/* empty */
	|	seq  item
	;
.DE
Once again, the first rule would always be reduced exactly once, before the
first item was read,
and then the second rule would be reduced once for each item read.
Permitting empty sequences
often leads to increased generality.
However, conflicts might arise if Yacc is asked to decide
which empty sequence it has seen, when it hasn't seen enough to
know!
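.PP
A small sketch of how such a conflict might arise is given below;
the token names A, B, X, Y, and Z and the nonterminal names are hypothetical.
Both alist and blist may be empty, and in either case the next token can be X,
so with one token of lookahead the parser cannot tell
which empty sequence to reduce.
Yacc would be expected to report a reduce/reduce conflict for this fragment,
even though the grammar itself is unambiguous:
.DS
%token	A  B  X  Y  Z

%%

thing	:	alist  X  Y
	|	blist  X  Z
	;

alist	:	/* empty */
	|	alist  A
	;

blist	:	/* empty */
	|	blist  B
	;
.DE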
.SH
Lexical Tie-ins
.PP
Some lexical decisions depend on context.
For example, the lexical analyzer might want to
delete blanks normally, but not within quoted strings.
Or names might be entered into a symbol table in declarations,
but not in expressions.
.PP
One way of handling this situation is
to create a global flag that is
examined by the lexical analyzer, and set by actions.
For example, suppose a program
consists of 0 or more declarations, followed by 0 or more statements.
Consider:
.DS
%{
	int dflag;
%}
  ... other declarations ...

%%

prog	:	decls  stats
	;

decls	:	/* empty */
			{	dflag = 1;  }
	|	decls  declaration
	;

stats	:	/* empty */
			{	dflag = 0;  }
	|	stats  statement
	;

    ... other rules ...
.DE
The flag
.I dflag
is now 0 when reading statements, and 1 when reading declarations,
.ul
except for the first token in the first statement.
This token must be seen by the parser before it can tell that
the declaration section has ended and the statements have
begun.
In many cases, this single token exception does not
affect the lexical scan.
.PP
This kind of ``backdoor'' approach can be elaborated
to a noxious degree.
Nevertheless, it represents a way of doing some things
that are difficult, if not impossible, to
do otherwise.
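.PP
A lexical analyzer that consults the flag might be sketched as follows,
placed in the programs section of the same specification file.
This is only an outline under stated assumptions:
the token NAME, the buffer size NAMESIZE, and the routines
.I enter
and
.I lookup
are hypothetical and would have to be supplied by the user.
.DS
#include <stdio.h>
#include <ctype.h>

#define NAMESIZE 32		/* hypothetical limit on name length */

char namebuf[NAMESIZE];

int
yylex()
{
	int c, n;

	while( isspace( c = getchar() ) )
		;			/* skip blanks, tabs, and newlines */
	if( c == EOF )
		return( 0 );		/* endmarker */
	if( !isalpha( c ) )
		return( c );		/* any other character is its own token */

	n = 0;
	do {				/* collect the name, truncating if too long */
		if( n < NAMESIZE-1 )
			namebuf[n++] = c;
		c = getchar();
	} while( isalnum( c ) );
	ungetc( c, stdin );		/* put back the character that ended the name */
	namebuf[n] = 0;

	if( dflag )
		enter( namebuf );	/* hypothetical: declarations add to the symbol table */
	else
		lookup( namebuf );	/* hypothetical: statements only look names up */
	return( NAME );
}
.DE
Because of the single token exception, a name that begins the first statement
would be read while the flag is still 1, and so would be passed to
.I enter
in this sketch.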
.SH
Reserved Words
.PP
Some programming languages
permit the user to
use words like ``if'', which are normally reserved,
as label or variable names, provided that such use does not
conflict with the legal use of these names in the programming language.
This is extremely hard to do in the framework of Yacc:
it is difficult to pass information to the lexical analyzer
telling it ``this instance of `if' is a keyword, and that instance is a variable''.
The user can make a stab at it, using the
flag mechanism described in the previous subsection,
but it is difficult to get right.
.PP
A number of ways of making this easier are under advisement.
Until then, it is better that the keywords be
.I reserved \|;
that is, be forbidden for use as variable names.
There are powerful stylistic reasons for preferring this, anyway.
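.PP
With reserved keywords, the lexical analyzer needs only a table lookup
to tell keywords from names.
A minimal sketch, assuming hypothetical tokens IF, ELSE, WHILE, and NAME
declared in the specification, and a hypothetical routine
.I toktype
called by the lexical analyzer once a word has been collected:
.DS
#include <string.h>

struct keyword {
	char	*text;
	int	token;
} keywords[] = {
	{ "if",		IF },
	{ "else",	ELSE },
	{ "while",	WHILE },
	{ 0,		0 }		/* end of table */
};

/* return the keyword token if s is reserved, and NAME otherwise */
int
toktype( char *s )
{
	struct keyword *k;

	for( k = keywords; k->text != 0; k++ )
		if( strcmp( s, k->text ) == 0 )
			return( k->token );
	return( NAME );
}
.DE
Since the decision depends only on the spelling of the word,
no information need be passed from the parser to the lexical analyzer.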