Commit | Line | Data |
---|---|---|
755c0b0d TL |
1 | .NH |
2 | Language Theory | |
3 | .PP | |
4 | The basic structure of the language is | |
5 | not a particularly original one. | |
6 | Equations are pictured as a set of ``boxes,'' | |
7 | pieced together in various ways. | |
8 | For example, something with a subscript is | |
9 | just a box followed by another box moved downward | |
10 | and shrunk | |
11 | by an appropriate amount. | |
12 | A fraction is just a box centered above another box, | |
13 | at the right altitude, | |
14 | with a line of correct length drawn between them. | |
15 | .PP | |
16 | The grammar for the language is shown below. | |
17 | For purposes of exposition, we have collapsed | |
18 | some productions. In the original grammar, there | |
19 | are about 70 productions, but many of these | |
20 | are simple ones used only to guarantee | |
21 | that some keyword is recognized early enough in the parsing process. | |
22 | Symbols in | |
23 | capital letters | |
24 | are terminal symbols; | |
25 | lower case | |
26 | symbols are non-terminals, i.e., syntactic categories. | |
27 | The vertical bar \(bv indicates an alternative; | |
28 | the brackets [ ] indicate optional material. | |
29 | A | |
30 | .UC TEXT | |
31 | is a string of non-blank characters or | |
32 | any string inside double quotes; | |
33 | the other terminal symbols represent literal occurrences | |
34 | of the corresponding keyword. | |
35 | .P1 | |
36 | .ce 0 | |
37 | .ta .3i | |
38 | .ps 9 | |
39 | .ne 17 | |
40 | .in 1 | |
41 | eqn : box | eqn box | |
42 | .sp 5p | |
43 | box : text | |
44 | | { eqn } | |
45 | | box OVER box | |
46 | | SQRT box | |
47 | | box SUB box | box SUP box | |
48 | | [ L | C | R ]PILE { list } | |
49 | | LEFT text eqn [ RIGHT text ] | |
50 | | box [ FROM box ] [ TO box ] | |
51 | | SIZE text box | |
52 | | [ROMAN | BOLD | ITALIC] box | |
53 | | box [HAT | BAR | DOT | DOTDOT | TILDE] | |
54 | | DEFINE text text | |
55 | .sp 5p | |
56 | list : eqn | list ABOVE eqn | |
57 | .sp 5p | |
58 | text : TEXT | |
59 | .ps 10 | |
60 | .in 0 | |
61 | .P2 | |
62 | .PP | |
63 | The grammar makes it obvious why there are few exceptions. | |
64 | For example, the observation that something can be replaced by a more complicated something | |
65 | in braces is implicit in the productions: | |
66 | .P1 | |
67 | .ce 0 | |
68 | eqn : box | eqn box | |
69 | box : text | { eqn } | |
70 | .P2 | |
71 | Anywhere a single character could be used, | |
72 | .ul | |
73 | any | |
74 | legal construction can be used. | |
75 | .PP | |
76 | Clearly, our grammar is highly ambiguous. | |
77 | What, for instance, do we do with the input | |
78 | .P1 | |
79 | a over b over c ? | |
80 | .P2 | |
81 | Is it | |
82 | .P1 | |
83 | {a over b} over c | |
84 | .P2 | |
85 | or is it | |
86 | .P1 | |
87 | a over {b over c} ? | |
88 | .P2 | |
89 | .PP | |
90 | To answer questions like this, the grammar | |
91 | is supplemented with a small set of rules that describe the precedence | |
92 | and associativity | |
93 | of operators. | |
94 | In particular, we specify (more or less arbitrarily) | |
95 | that | |
96 | .ul | |
97 | over | |
98 | associates to the left, | |
99 | so the first alternative above is the one chosen. | |
100 | On the other hand, | |
101 | .ul | |
102 | sub | |
103 | and | |
104 | .ul | |
105 | sup | |
106 | bind to the right, | |
107 | because this is closer to standard mathematical practice. | |
108 | That is, we assume $x sup a sup b$ is $x sup {(a sup b )}$, | |
109 | not $(x sup a ) sup b$. | |
110 | .PP | |
111 | The precedence rules resolve the ambiguity in a construction like | |
112 | .P1 | |
113 | a sup 2 over b | |
114 | .P2 | |
115 | We define | |
116 | .ul | |
117 | sup | |
118 | to have a higher precedence than | |
119 | .ul | |
120 | over, | |
121 | so this construction is parsed as | |
122 | $a sup 2 over b$ instead of $a sup {2 over b}$. | |
123 | .PP | |
124 | Naturally, a user can always | |
125 | force a particular parsing | |
126 | by placing braces around expressions. | |
127 | .PP | |
128 | The ambiguous grammar approach seems to be quite useful. | |
129 | The grammar we use is small enough to be easily understood, | |
130 | for it contains none of the productions that would be | |
131 | normally used for resolving ambiguity. | |
132 | Instead the supplemental information about | |
133 | precedence and associativity (also small enough to be understood) | |
134 | provides the compiler-compiler | |
135 | with the information it needs | |
136 | to make a fast, deterministic parser for | |
137 | the specific language we want. | |
138 | When the language is supplemented by the disambiguating rules, | |
139 | it is in fact | |
140 | .UC LR(1) | |
141 | and thus easy to parse[5]. | |
142 | .PP | |
143 | The output code is generated as the input is scanned. | |
144 | Any time a production | |
145 | of the grammar is recognized, | |
146 | (potentially) some | |
147 | .UC TROFF | |
148 | commands are output. | |
149 | For example, when the lexical analyzer | |
150 | reports that it has found a | |
151 | .UC TEXT | |
152 | (i.e., a string of contiguous characters), | |
153 | we have recognized the production: | |
154 | .P1 | |
155 | text : TEXT | |
156 | .P2 | |
157 | The translation of this is simple. | |
158 | We generate a local name for the string, | |
159 | then hand the name and the string to | |
160 | .UC TROFF, | |
161 | and let | |
162 | .UC TROFF | |
163 | perform the storage management. | |
164 | All we save is the name of the string, | |
165 | its height, and its baseline. | |
166 | .PP | |
167 | As another example, | |
168 | the translation associated with the production | |
169 | .P1 | |
170 | box : box OVER box | |
171 | .P2 | |
172 | is: | |
173 | .P1 | |
174 | .ce 0 | |
175 | .in 1 | |
176 | .ne 14 | |
177 | Width of output box = | |
178 | slightly more than largest input width | |
179 | Height of output box = | |
180 | slightly more than sum of input heights | |
181 | Base of output box = | |
182 | slightly more than height of bottom input box | |
183 | String describing output box = | |
184 | move down; | |
185 | move right enough to center bottom box; | |
186 | draw bottom box (i.e., copy string for bottom box); | |
187 | move up; move left enough to center top box; | |
188 | draw top box (i.e., copy string for top box); | |
189 | move down and left; draw line full width; | |
190 | return to proper base line. | |
191 | .in 0 | |
192 | .P2 | |
193 | Most of the other productions have | |
194 | equally simple semantic actions. | |
195 | Picturing the output as a set of properly placed boxes | |
196 | makes the right sequence of positioning commands | |
197 | quite obvious. | |
198 | The main difficulty is in finding the right numbers to use | |
199 | for esthetically pleasing positioning. | |
200 | .PP | |
201 | With a grammar, it is usually clear how to extend the language. | |
202 | For instance, one of our users | |
203 | suggested a | |
204 | .UC TENSOR | |
205 | operator, to make constructions like | |
206 | .EQ | |
207 | ~ sub size 7 m sup size 7 l | |
208 | {bold T from n to k} sub size 7 i sup size 7 j | |
209 | .EN | |
210 | Grammatically, this is easy: | |
211 | it is sufficient to add a production like | |
212 | .P1 | |
213 | box : TENSOR { list } | |
214 | .P2 | |
215 | Semantically, we need only juggle the boxes to the right places. |