Commit | Line | Data |
---|---|---|
1662094b BJ |
1 | .if !\n(xx .so tmac.p |
2 | .ND | |
3 | 'nr H1 2 | |
4 | .NH | |
5 | Automatic program formatting | |
6 | .NH 2 | |
7 | Motivation | |
8 | .PP | |
9 | Just as software packages to automatically format documents | |
10 | aid the construction of readable and accurate documents, | |
11 | packages which aid the construction of programs by helping prepare | |
12 | and maintain the text in a standard and easily readable format | |
13 | can aid the entire programming process. | |
14 | With an automatically structured listing, the reader can trust the | |
15 | textual display of the program to be accurate and can concentrate | |
16 | on understanding the program, rather than deciphering the style in | |
17 | which it is written. | |
18 | Even when | |
19 | .I "programming secretaries" | |
20 | are available, an automated preparation system can improve the | |
21 | programming process by defining a standard format for programs, | |
22 | making them easier to read. | |
23 | After programs are completed, and modifications to provide new features | |
24 | or to correct bugs are required, an automatic formatting system | |
25 | can work with an intelligently designed source code control | |
26 | system to help maintain clean programs with their histories. | |
27 | .PP | |
28 | The first version of | |
29 | .I pxp | |
30 | took a step toward the goal of machine aided program formatting | |
31 | by formatting the code of the program. | |
32 | It did not, however, in any way help with the annotation of | |
33 | the program with comments. | |
34 | The following sections describe a comment processing | |
35 | facility design which was added to | |
36 | .I pxp. | |
37 | .NH 2 | |
38 | Implementation | |
39 | .PP | |
40 | When parsing the program information is saved in the parse tree which | |
41 | tells the source text coordinates (sequence numbers and columns) | |
42 | for the input tokens at the boundaries of chosen productions. | |
43 | The comments from the source program are saved in a separate data | |
44 | structure together with information about their source text locations and a | |
45 | classification of their ``type.'' | |
46 | The comment reformatting process proceeds by printing out the parsed program | |
47 | and periodically flushing out comments. | |
48 | .NH 3 | |
49 | The kinds of comments | |
50 | .PP | |
51 | .I Pxp | |
52 | distinguishes several types of comments in the input thereby varying | |
53 | their placement in the output. | |
54 | The four basic kinds of comments are: | |
55 | .IP | |
56 | .B | |
57 | Left marginal. | |
58 | .R | |
59 | At the left margin on input these comments are retained at the | |
60 | left margin in the output. | |
61 | .IP | |
62 | .B | |
63 | Aligned. | |
64 | .R | |
65 | Comments which are not at the left margin on input but which have no tokens | |
66 | preceding them on the input line are aligned with the program text on output. | |
67 | .IP | |
68 | .B | |
69 | Trailing. | |
70 | .R | |
71 | Comments appearing after tokens in the input line but separated from the | |
72 | last token on the line by only a small amount of white space are placed | |
73 | similarly in the output. | |
74 | .IP | |
75 | .B | |
76 | Right marginal. | |
77 | .R | |
78 | Comments which are more than a small amount of white space past the last | |
79 | token on a line are aligned about 35 spaces to the right of the running | |
80 | program text. | |
81 | .PP | |
82 | In addition to these comments, other formatting features of the input | |
83 | are preserved to the output. | |
84 | These include blank lines, form ejects, and | |
85 | .B include | |
86 | directives. | |
87 | .NH 2 | |
88 | Examples of comment types | |
89 | .PP | |
90 | Consider the following example: | |
91 | .LS | |
92 | { Factorial program - Assigment 1 | |
93 | John Jones, CS 153, Fall 1977 } | |
94 | ||
95 | \fBprogram\fP fact(output); | |
96 | \fBconst\fP maxfact = 20; {last factorial to be computed} | |
97 | ||
98 | \fBfunction\fP rfact(i: integer): integer; | |
99 | \fBbegin\fP | |
100 | \fBif\fP i <= 1 \fBthen\fP {terminate} | |
101 | fact := 1 | |
102 | \fBelse\fP {recurse} | |
103 | fact := i * fact(i - 1) | |
104 | \fBend\fP; | |
105 | ||
106 | \fBbegin\fP | |
107 | i := 1; | |
108 | j := 1; | |
109 | {iterative factorial} | |
110 | \fBrepeat\fP | |
111 | writeln(i, j); | |
112 | j := j * i; | |
113 | i := i + 1; | |
114 | \fBuntil\fP i > maxfact; | |
115 | ||
116 | {recursive factorial} | |
117 | \fBfor\fP i := 1 \fBto\fP maxfact \fBdo\fP | |
118 | writeln(i, rfact(i)) | |
119 | \fBend\fP. | |
120 | .LE | |
121 | .PP | |
122 | This program has all four basic comment types. | |
123 | The first comment is | |
124 | .I marginal | |
125 | (and is followed by a blank line which is also preserved in a | |
126 | reformatted listing.) | |
127 | The comments in the text ``iterative factorial'' | |
128 | and ``recursive factorial'' are | |
129 | .I aligned | |
130 | comments. | |
131 | The comment on the declaration of | |
132 | .I maxfact | |
133 | is a | |
134 | .I trailing | |
135 | comment while the comments in | |
136 | .I rfact | |
137 | are | |
138 | .I "right marginal." | |
139 | .PP | |
140 | Since the objective of the program reformatting is to not require the | |
141 | saving of the raw programs which produced the restructured programs, | |
142 | it is necessary for the reformatting to produce programs which are | |
143 | .I "fixed points" | |
144 | with respect to the comment processing; | |
145 | that is the form of restructured programs must be preserved under | |
146 | repeated reformatting. | |
147 | The above types of comments have been carefully chosen so that | |
148 | this occurs. | |
149 | .NH 3 | |
150 | Data structures (overview) | |
151 | .PP | |
152 | The following sections provide a brief descriptions of the data structures | |
153 | used by the comment formatting cluster and the method used by the cluster | |
154 | itself. | |
155 | The actual reformatting process involves a number of complications not | |
156 | detailed here, necessary to discern as much as possible from the source | |
157 | text in order to reasonably classify comments into one of the four available | |
158 | types, and in order to live with the existing structure of the parser of | |
159 | .I pi . | |
160 | .PP | |
161 | As each comment is encountered in the source text it is chained | |
162 | onto a singly linked list recording the kind of comment involved; | |
163 | the comment delimiter, either `{' or `(*'; | |
164 | the source text coordinates of the comment; | |
165 | and a linked list of lines of comment text. | |
166 | .PP | |
167 | The other data structure used to gather information is a stack parallel to the | |
168 | parse stack. | |
169 | For each element of the parse stack this stack contains the source text | |
170 | coordinates of the leftmost token shifted over in creating the associated | |
171 | state. | |
172 | Thus, when a reduction occurs, it is possible to identify the portion of the | |
173 | input text which was reduced. | |
174 | At numerous points in the parse tree where comment text is to be processed | |
175 | we save the source coordinates of the first and last token in the reduced | |
176 | production as well as the coordinates of the following (lookahead) token. | |
177 | .NH 3 | |
178 | Comment processing (overview) | |
179 | .PP | |
180 | Formatting of the comments back into the program uses the source text | |
181 | coordinates embedded in the parse tree and attached to comments to construct | |
182 | a formatted listing. | |
183 | The ideas involved are quite simple. | |
184 | When we begin to print out a statement level subtree we first print out | |
185 | comments which precede this statement. | |
186 | We then print out the statement itself, | |
187 | possible invoking this process recursively. | |
188 | In recursive declarations provisions are made for embedded comments also. | |
189 | .PP | |
190 | The most important complication is that comments which appear after the last | |
191 | token of a | |
192 | .B procedure | |
193 | or | |
194 | .B function | |
195 | must be associated with this routine rather than being printed before the | |
196 | next. | |
197 | This requires a special ``to end of line'' kind of comment flushing. | |
198 | Other complications arise because of the pre-existing data structures in | |
199 | .I pxp . | |
200 | The code for | |
201 | .I pxp | |
202 | contains a number of comments detailing the resolution of these and other | |
203 | problems. | |
204 | .bp | |
205 | .NH | |
206 | Conclusions | |
207 | .NH 2 | |
208 | Design | |
209 | .PP | |
210 | In retrospect, most of the design decisions were well-made. | |
211 | The counter placement rules resulted in a small number of counters. | |
212 | The reparsing of the program runs at a reasonable rate, | |
213 | approximately twice the speed of compilation. | |
214 | The inaccurate counts which may be generated with non-local | |
215 | .B goto | |
216 | statements have as yet caused no problems. | |
217 | The biggest deficiency in the design is, in fact, the lack of a debugger | |
218 | implementation to complement the profiling facilities. | |
219 | .NH 2 | |
220 | Implementation | |
221 | .PP | |
222 | The implementation proved to be quite simple. | |
223 | The design choices allowing | |
224 | .I pxp | |
225 | to use the framework of | |
226 | .I pi | |
227 | were well taken. | |
228 | The largest weakness of the implementation may be the fact that | |
229 | the print cluster structure may not necessarily be the best one for | |
230 | doing long statement folding across line boundaries, and for | |
231 | processing the placement of multiple statements per line. | |
232 | Whether or not this is true would be seen if such a implementation | |
233 | modification were attempted. | |
234 | .NH 2 | |
235 | Profiling | |
236 | .PP | |
237 | The format of the profile worked out quite well. | |
238 | Basing the format on [9] was an excellent choice. | |
239 | For initialization procedures and some others, multiple | |
240 | statement per line placements is noticeably needed. | |
241 | It is felt that languages which offer initialization facilities | |
242 | for structured statements will likewise need more complex format | |
243 | processing in such declarations. | |
244 | With comment reformatting a profile can substitute | |
245 | for a program listing. | |
246 | In this case the philosophy of suppressing unexecuted | |
247 | .B procedure | |
248 | and | |
249 | .B function | |
250 | bodies as well as declaration parts | |
251 | might be re-examined. | |
252 | .NH 2 | |
253 | Reformatting | |
254 | .PP | |
255 | For program formatting, a comment formatting facility is a must. | |
256 | The author feels that the basic format of programs is well chosen, | |
257 | but most persons would prefer an (at least slightly) different format. | |
258 | Again, long statement folding and the placement of multiple statements | |
259 | per line would be a plus here. | |
260 | Even in its present state, the formatting facilities are judged to be | |
261 | useful, especially for showing students with no perceivable style | |
262 | one plausible way of formatting Pascal programs. | |
263 | .NH 2 | |
264 | Future systems | |
265 | .PP | |
266 | Execution profiling is an important tool because it provides feedback at the | |
267 | source program level. | |
268 | A number of systems including such facilities exist or are proposed | |
269 | [10] [11] [12] [13]. | |
270 | The author expects the following to be components of future systems: | |
271 | .HP | |
272 | .RS | |
273 | .IP 1) | |
274 | Source language editors [16] [17] | |
275 | .IP 2) | |
276 | Symbolic source language debuggers [8] [16] | |
277 | .IP 3) | |
278 | Source code control systems [18] | |
279 | .RE | |
280 | .PP | |
281 | A well-designed programming language system with these components could provide | |
282 | systems programmers with a powerful set of system construction tools, | |
283 | similar to those available in excellent \s-2LISP\s0 systems such as [16]. | |
284 | .bp | |
285 | .SH | |
286 | References | |
287 | .PP | |
288 | .IP [1] | |
289 | Charles B. Haley | |
290 | .br | |
291 | Master's Project Report | |
292 | .br | |
293 | U.C. Berkeley, June, 1977. | |
294 | .IP [2] | |
295 | William N. Joy | |
296 | .br | |
297 | .I "PX 1.1 Implementation Notes" | |
298 | .br | |
299 | October, 1978. | |
300 | .IP [3] | |
301 | William N. Joy, Susan L. Graham, and Charles B. Haley | |
302 | .br | |
303 | .I "Berkeley Pascal User's Manual" | |
304 | .br | |
305 | Version 1.0 \- November, 1977. | |
306 | .IP [4] | |
307 | K. Thompson and D. M. Ritchie | |
308 | .br | |
309 | .I | |
310 | UNIX | |
311 | Programmers Manual | |
312 | .R | |
313 | .br | |
314 | Version 6.7 | |
315 | (revised at University of California) | |
316 | .br | |
317 | June 1977. | |
318 | .IP [5] | |
319 | Kathleen Jensen and Niklaus Wirth | |
320 | .br | |
321 | .I | |
322 | Pascal \- User Manual and Report | |
323 | .R | |
324 | .br | |
325 | Springer-Verlag, New York | |
326 | .br | |
327 | 1975. | |
328 | .IP [6] | |
329 | William N. Joy | |
330 | .br | |
331 | .I "PDB Design notes and draft manual" | |
332 | .br | |
333 | January, 1977. | |
334 | .IP [7] | |
335 | D. E. Knuth and F. R. Stevenson | |
336 | .br | |
337 | .I "Optimal measurement points for program frequency counts" | |
338 | .br | |
339 | BIT 13 (1973) 313-322. | |
340 | .IP [8] | |
341 | Edwin H. Satterthwaite | |
342 | .br | |
343 | .I "Source Language Debugging Tools" | |
344 | .br | |
345 | STAN-CS-75-494, May, 1975. | |
346 | .IP [9] | |
347 | Edwin H. Satterthwaite | |
348 | .br | |
349 | .I "Debugging Tools for High Level Languages" | |
350 | .br | |
351 | Software, Practice and Experience | |
352 | .br | |
353 | Vol. 2, 197-217, 1972. | |
354 | .IP [10] | |
355 | J. D. Ichbiah, J, C. H\*'eliard, J. P. Rissen, P. Cousot | |
356 | .br | |
357 | .I "The Systems Implementation Language LIS - Reference Manual" | |
358 | .br | |
359 | Technical Report 4549 E1/EN. | |
360 | .br | |
361 | Compagnie Internationale pour l\(aaInformatique | |
362 | .br | |
363 | 68, route de Versailles \- 78430 LOUVECIENNES | |
364 | .br | |
365 | December 1974. Revised January 1976. | |
366 | .IP [11] | |
367 | B. W. Lampson, J. J. Horning, R. L. London, J. G. Mitchell, G. L. Popek | |
368 | .br | |
369 | .I "Report on the Programming Language Euclid" | |
370 | .br | |
371 | Sigplan Notices, Volume 12, Number 2. | |
372 | .br | |
373 | February 1977. | |
374 | .br | |
375 | Pp. 64-65. | |
376 | .IP [12] | |
377 | D. T. Barnard | |
378 | .br | |
379 | .I "Automatic generation of syntax-repairing and paragraphing parsers" | |
380 | .br | |
381 | Technical Report CSRG-52. | |
382 | .br | |
383 | Computer Systems Research Group | |
384 | .br | |
385 | University of Toronto, Toronto, Ontario. | |
386 | .br | |
387 | April 1975. | |
388 | .IP [13] | |
389 | R. C. Holt and D. T. Barnard | |
390 | .br | |
391 | .I "Syntax directed error repair and paragraphing" | |
392 | .br | |
393 | Computer Systems Research Group. | |
394 | .br | |
395 | University of Toronto, Toronto, Ontario. | |
396 | .br | |
397 | June, 1976. | |
398 | .IP [14] | |
399 | A. van Wijngaarden, et. al. | |
400 | .br | |
401 | .I "Revised Report on the algorithmic language ALGOL 68" | |
402 | .br | |
403 | Sigplan Notices, Volume 12, Number 5. | |
404 | .br | |
405 | May 1977. | |
406 | .IP [15] | |
407 | Niklaus Wirth | |
408 | .br | |
409 | .I "Modula: A language for modular multiprogramming" | |
410 | .br | |
411 | Institut f\*:ur Informatik | |
412 | .br | |
413 | ETH, CH 8092 Z\*:urich | |
414 | .br | |
415 | March, 1976. | |
416 | .IP [16] | |
417 | Warren Teitleman | |
418 | .br | |
419 | .I "Interlisp Reference Manual" | |
420 | .br | |
421 | Xerox Palo Alto Research Center | |
422 | .br | |
423 | Palo Alto, California | |
424 | .br | |
425 | December 1974. | |
426 | .IP [17] | |
427 | Steve German | |
428 | .br | |
429 | .I | |
430 | ECL in-core editor | |
431 | .R | |
432 | .br | |
433 | Documentation dated 10/20/1973. | |
434 | .IP [18] | |
435 | Rochkind, Marc J. | |
436 | .br | |
437 | The Source Code Control System | |
438 | .br | |
439 | IEEE TOSE Vol SE-1 #4 | |
440 | .br | |
441 | Dec. 1975, 364-370 |