Commit | Line | Data |
---|---|---|
c9528a00 C |
1 | .tr *\(** |
2 | .tr |\(or | |
3 | .SH | |
4 | 1: Basic Specifications | |
5 | .PP | |
6 | Names refer to either tokens or nonterminal symbols. | |
7 | Yacc requires | |
8 | token names to be declared as such. | |
9 | In addition, for reasons discussed in Section 3, it is often desirable | |
10 | to include the lexical analyzer as part of the specification file; | |
11 | it may be useful to include other programs as well. | |
12 | Thus, every specification file consists of three sections: | |
13 | the | |
14 | .I declarations , | |
15 | .I "(grammar) rules" , | |
16 | and | |
17 | .I programs . | |
18 | The sections are separated by double percent ``%%'' marks. | |
19 | (The percent ``%'' is generally used in Yacc specifications as an escape character.) | |
20 | .PP | |
21 | In other words, a full specification file looks like | |
22 | .DS | |
23 | declarations | |
24 | %% | |
25 | rules | |
26 | %% | |
27 | programs | |
28 | .DE | |
29 | .PP | |
30 | The declaration section may be empty. | |
31 | Moreover, if the programs section is omitted, the second %% mark may be omitted also; | |
32 | thus, the smallest legal Yacc specification is | |
33 | .DS | |
34 | %% | |
35 | rules | |
36 | .DE | |
37 | .PP | |
38 | Blanks, tabs, and newlines are ignored except | |
39 | that they may not appear in names or multi-character reserved symbols. | |
40 | Comments may appear wherever a name is legal; they are enclosed | |
41 | in /* . . . */, as in C and PL/I. | |
42 | .PP | |
43 | The rules section is made up of one or more grammar rules. | |
44 | A grammar rule has the form: | |
45 | .DS | |
46 | A : BODY ; | |
47 | .DE | |
48 | A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals. | |
49 | The colon and the semicolon are Yacc punctuation. | |
50 | .PP | |
51 | Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and | |
52 | non-initial digits. | |
53 | Upper and lower case letters are distinct. | |
54 | The names used in the body of a grammar rule may represent tokens or nonterminal symbols. | |
55 | .PP | |
56 | A literal consists of a character enclosed in single quotes ``\'''. | |
57 | As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes | |
58 | are recognized. | |
59 | Thus | |
60 | .DS | |
61 | \'\en\' newline | |
62 | \'\er\' return | |
63 | \'\e\'\' single quote ``\''' | |
64 | \'\e\e\' backslash ``\e'' | |
65 | \'\et\' tab | |
66 | \'\eb\' backspace | |
67 | \'\ef\' form feed | |
68 | \'\exxx\' ``xxx'' in octal | |
69 | .DE | |
70 | For a number of technical reasons, the | |
71 | \s-2NUL\s0 | |
72 | character (\'\e0\' or 0) should never | |
73 | be used in grammar rules. | |
74 | .PP | |
75 | If there are several grammar rules with the same left hand side, the vertical bar ``|'' | |
76 | can be used to avoid rewriting the left hand side. | |
77 | In addition, | |
78 | the semicolon at the end of a rule can be dropped before a vertical bar. | |
79 | Thus the grammar rules | |
80 | .DS | |
81 | A : B C D ; | |
82 | A : E F ; | |
83 | A : G ; | |
84 | .DE | |
85 | can be given to Yacc as | |
86 | .DS | |
87 | A : B C D | |
88 | | E F | |
89 | | G | |
90 | ; | |
91 | .DE | |
92 | It is not necessary that all grammar rules with the same left hand side appear together in the grammar rules section, | |
93 | although doing so makes the input much more readable and easier to change. | |
94 | .PP | |
95 | If a nonterminal symbol matches the empty string, this is indicated by a rule with an empty body: | |
96 | .DS | |
97 | empty : ; | |
98 | .DE | |
99 | .PP | |
100 | Names representing tokens must be declared; this is most simply done by writing | |
101 | .DS | |
102 | %token name1 name2 . . . | |
103 | .DE | |
104 | in the declarations section. | |
105 | (See Sections 3, 5, and 6 for much more discussion.) | |
106 | Every name not defined in the declarations section is assumed to represent a nonterminal symbol. | |
107 | Every nonterminal symbol must appear on the left side of at least one rule. | |
108 | .PP | |
109 | Of all the nonterminal symbols, one, called the | |
110 | .I "start symbol" , | |
111 | has particular importance. | |
112 | The parser is designed to recognize the start symbol; thus, | |
113 | this symbol represents the largest, | |
114 | most general structure described by the grammar rules. | |
115 | By default, | |
116 | the start symbol is taken to be the left hand side of the first | |
117 | grammar rule in the rules section. | |
118 | It is possible, and in fact desirable, to declare the start | |
119 | symbol explicitly in the declarations section using the %start keyword: | |
120 | .DS | |
121 | %start symbol | |
122 | .DE | |
123 | .PP | |
124 | The end of the input to the parser is signaled by a special token, called the | |
125 | .I endmarker . | |
126 | If the tokens up to, but not including, the endmarker form a structure | |
127 | which matches the start symbol, the parser function returns to its caller | |
128 | after the endmarker is seen; it | |
129 | .I accepts | |
130 | the input. | |
131 | If the endmarker is seen in any other context, it is an error. | |
132 | .PP | |
133 | It is the job of the user-supplied lexical analyzer | |
134 | to return the endmarker when appropriate; see Section 3, below. | |
135 | Usually the endmarker represents some reasonably obvious | |
136 | I/O status, such as ``end-of-file'' or ``end-of-record''. |
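
The pieces described in this section can be put together in one small specification. The following is only an illustrative sketch, not an example from the original text: the token names NUMBER and NAME and the nonterminals list and item are hypothetical. It shows a declarations section with %token and %start, the %% separator, alternatives joined by a vertical bar, a rule with an empty body, and character literals. The programs section is left out, so the second %% mark is omitted as well.

```yacc
/* declarations section: token names must be declared here */
%token NUMBER NAME

/* declare the start symbol explicitly rather than rely on the default */
%start list

%%

/* rules section */

list    :   /* empty body: list matches the empty string */
        |   list item '\n'
        ;

item    :   NAME '=' NUMBER
        |   NUMBER
        ;
```

Because list and item are not declared as tokens, they are taken to be nonterminal symbols, and each appears on the left hand side of at least one rule. A user-supplied lexical analyzer, not shown here, would still be needed to return NUMBER, NAME, the literals, and the endmarker.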