BSD 3 development
[unix-history] / .ref-BSD-2 / doc / pxp / pxpin3.n
CommitLineData
1662094b
BJ
1.if !\n(xx .so tmac.p
2.ND
3'nr H1 2
4.NH
5Automatic program formatting
6.NH 2
7Motivation
8.PP
9Just as software packages to automatically format documents
10aid the construction of readable and accurate documents,
11packages which aid the construction of programs by helping prepare
12and maintain the text in a standard and easily readable format
13can aid the entire programming process.
14With an automatically structured listing, the reader can trust the
15textual display of the program to be accurate and can concentrate
16on understanding the program, rather than deciphering the style in
17which it is written.
18Even when
19.I "programming secretaries"
20are available, an automated preparation system can improve the
21programming process by defining a standard format for programs,
22making them easier to read.
23After programs are completed, and modifications to provide new features
24or to correct bugs are required, an automatic formatting system
25can work with an intelligently designed source code control
26system to help maintain clean programs with their histories.
27.PP
28The first version of
29.I pxp
30took a step toward the goal of machine aided program formatting
31by formatting the code of the program.
32It did not, however, in any way help with the annotation of
33the program with comments.
34The following sections describe a comment processing
35facility design which was added to
36.I pxp.
37.NH 2
38Implementation
39.PP
40When parsing the program information is saved in the parse tree which
41tells the source text coordinates (sequence numbers and columns)
42for the input tokens at the boundaries of chosen productions.
43The comments from the source program are saved in a separate data
44structure together with information about their source text locations and a
45classification of their ``type.''
46The comment reformatting process proceeds by printing out the parsed program
47and periodically flushing out comments.
48.NH 3
49The kinds of comments
50.PP
51.I Pxp
52distinguishes several types of comments in the input thereby varying
53their placement in the output.
54The four basic kinds of comments are:
55.IP
56.B
57Left marginal.
58.R
59At the left margin on input these comments are retained at the
60left margin in the output.
61.IP
62.B
63Aligned.
64.R
65Comments which are not at the left margin on input but which have no tokens
66preceding them on the input line are aligned with the program text on output.
67.IP
68.B
69Trailing.
70.R
71Comments appearing after tokens in the input line but separated from the
72last token on the line by only a small amount of white space are placed
73similarly in the output.
74.IP
75.B
76Right marginal.
77.R
78Comments which are more than a small amount of white space past the last
79token on a line are aligned about 35 spaces to the right of the running
80program text.
81.PP
82In addition to these comments, other formatting features of the input
83are preserved to the output.
84These include blank lines, form ejects, and
85.B include
86directives.
87.NH 2
88Examples of comment types
89.PP
90Consider the following example:
91.LS
92{ Factorial program - Assigment 1
93 John Jones, CS 153, Fall 1977 }
94
95\fBprogram\fP fact(output);
96\fBconst\fP maxfact = 20; {last factorial to be computed}
97
98 \fBfunction\fP rfact(i: integer): integer;
99 \fBbegin\fP
100 \fBif\fP i <= 1 \fBthen\fP {terminate}
101 fact := 1
102 \fBelse\fP {recurse}
103 fact := i * fact(i - 1)
104 \fBend\fP;
105
106\fBbegin\fP
107 i := 1;
108 j := 1;
109 {iterative factorial}
110 \fBrepeat\fP
111 writeln(i, j);
112 j := j * i;
113 i := i + 1;
114 \fBuntil\fP i > maxfact;
115
116 {recursive factorial}
117 \fBfor\fP i := 1 \fBto\fP maxfact \fBdo\fP
118 writeln(i, rfact(i))
119\fBend\fP.
120.LE
121.PP
122This program has all four basic comment types.
123The first comment is
124.I marginal
125(and is followed by a blank line which is also preserved in a
126reformatted listing.)
127The comments in the text ``iterative factorial''
128and ``recursive factorial'' are
129.I aligned
130comments.
131The comment on the declaration of
132.I maxfact
133is a
134.I trailing
135comment while the comments in
136.I rfact
137are
138.I "right marginal."
139.PP
140Since the objective of the program reformatting is to not require the
141saving of the raw programs which produced the restructured programs,
142it is necessary for the reformatting to produce programs which are
143.I "fixed points"
144with respect to the comment processing;
145that is the form of restructured programs must be preserved under
146repeated reformatting.
147The above types of comments have been carefully chosen so that
148this occurs.
149.NH 3
150Data structures (overview)
151.PP
152The following sections provide a brief descriptions of the data structures
153used by the comment formatting cluster and the method used by the cluster
154itself.
155The actual reformatting process involves a number of complications not
156detailed here, necessary to discern as much as possible from the source
157text in order to reasonably classify comments into one of the four available
158types, and in order to live with the existing structure of the parser of
159.I pi .
160.PP
161As each comment is encountered in the source text it is chained
162onto a singly linked list recording the kind of comment involved;
163the comment delimiter, either `{' or `(*';
164the source text coordinates of the comment;
165and a linked list of lines of comment text.
166.PP
167The other data structure used to gather information is a stack parallel to the
168parse stack.
169For each element of the parse stack this stack contains the source text
170coordinates of the leftmost token shifted over in creating the associated
171state.
172Thus, when a reduction occurs, it is possible to identify the portion of the
173input text which was reduced.
174At numerous points in the parse tree where comment text is to be processed
175we save the source coordinates of the first and last token in the reduced
176production as well as the coordinates of the following (lookahead) token.
177.NH 3
178Comment processing (overview)
179.PP
180Formatting of the comments back into the program uses the source text
181coordinates embedded in the parse tree and attached to comments to construct
182a formatted listing.
183The ideas involved are quite simple.
184When we begin to print out a statement level subtree we first print out
185comments which precede this statement.
186We then print out the statement itself,
187possible invoking this process recursively.
188In recursive declarations provisions are made for embedded comments also.
189.PP
190The most important complication is that comments which appear after the last
191token of a
192.B procedure
193or
194.B function
195must be associated with this routine rather than being printed before the
196next.
197This requires a special ``to end of line'' kind of comment flushing.
198Other complications arise because of the pre-existing data structures in
199.I pxp .
200The code for
201.I pxp
202contains a number of comments detailing the resolution of these and other
203problems.
204.bp
205.NH
206Conclusions
207.NH 2
208Design
209.PP
210In retrospect, most of the design decisions were well-made.
211The counter placement rules resulted in a small number of counters.
212The reparsing of the program runs at a reasonable rate,
213approximately twice the speed of compilation.
214The inaccurate counts which may be generated with non-local
215.B goto
216statements have as yet caused no problems.
217The biggest deficiency in the design is, in fact, the lack of a debugger
218implementation to complement the profiling facilities.
219.NH 2
220Implementation
221.PP
222The implementation proved to be quite simple.
223The design choices allowing
224.I pxp
225to use the framework of
226.I pi
227were well taken.
228The largest weakness of the implementation may be the fact that
229the print cluster structure may not necessarily be the best one for
230doing long statement folding across line boundaries, and for
231processing the placement of multiple statements per line.
232Whether or not this is true would be seen if such a implementation
233modification were attempted.
234.NH 2
235Profiling
236.PP
237The format of the profile worked out quite well.
238Basing the format on [9] was an excellent choice.
239For initialization procedures and some others, multiple
240statement per line placements is noticeably needed.
241It is felt that languages which offer initialization facilities
242for structured statements will likewise need more complex format
243processing in such declarations.
244With comment reformatting a profile can substitute
245for a program listing.
246In this case the philosophy of suppressing unexecuted
247.B procedure
248and
249.B function
250bodies as well as declaration parts
251might be re-examined.
252.NH 2
253Reformatting
254.PP
255For program formatting, a comment formatting facility is a must.
256The author feels that the basic format of programs is well chosen,
257but most persons would prefer an (at least slightly) different format.
258Again, long statement folding and the placement of multiple statements
259per line would be a plus here.
260Even in its present state, the formatting facilities are judged to be
261useful, especially for showing students with no perceivable style
262one plausible way of formatting Pascal programs.
263.NH 2
264Future systems
265.PP
266Execution profiling is an important tool because it provides feedback at the
267source program level.
268A number of systems including such facilities exist or are proposed
269[10] [11] [12] [13].
270The author expects the following to be components of future systems:
271.HP
272.RS
273.IP 1)
274Source language editors [16] [17]
275.IP 2)
276Symbolic source language debuggers [8] [16]
277.IP 3)
278Source code control systems [18]
279.RE
280.PP
281A well-designed programming language system with these components could provide
282systems programmers with a powerful set of system construction tools,
283similar to those available in excellent \s-2LISP\s0 systems such as [16].
284.bp
285.SH
286References
287.PP
288.IP [1]
289Charles B. Haley
290.br
291Master's Project Report
292.br
293U.C. Berkeley, June, 1977.
294.IP [2]
295William N. Joy
296.br
297.I "PX 1.1 Implementation Notes"
298.br
299October, 1978.
300.IP [3]
301William N. Joy, Susan L. Graham, and Charles B. Haley
302.br
303.I "Berkeley Pascal User's Manual"
304.br
305Version 1.0 \- November, 1977.
306.IP [4]
307K. Thompson and D. M. Ritchie
308.br
309.I
310UNIX
311Programmers Manual
312.R
313.br
314Version 6.7
315(revised at University of California)
316.br
317June 1977.
318.IP [5]
319Kathleen Jensen and Niklaus Wirth
320.br
321.I
322Pascal \- User Manual and Report
323.R
324.br
325Springer-Verlag, New York
326.br
3271975.
328.IP [6]
329William N. Joy
330.br
331.I "PDB Design notes and draft manual"
332.br
333January, 1977.
334.IP [7]
335D. E. Knuth and F. R. Stevenson
336.br
337.I "Optimal measurement points for program frequency counts"
338.br
339BIT 13 (1973) 313-322.
340.IP [8]
341Edwin H. Satterthwaite
342.br
343.I "Source Language Debugging Tools"
344.br
345STAN-CS-75-494, May, 1975.
346.IP [9]
347Edwin H. Satterthwaite
348.br
349.I "Debugging Tools for High Level Languages"
350.br
351Software, Practice and Experience
352.br
353Vol. 2, 197-217, 1972.
354.IP [10]
355J. D. Ichbiah, J, C. H\*'eliard, J. P. Rissen, P. Cousot
356.br
357.I "The Systems Implementation Language LIS - Reference Manual"
358.br
359Technical Report 4549 E1/EN.
360.br
361Compagnie Internationale pour l\(aaInformatique
362.br
36368, route de Versailles \- 78430 LOUVECIENNES
364.br
365December 1974. Revised January 1976.
366.IP [11]
367B. W. Lampson, J. J. Horning, R. L. London, J. G. Mitchell, G. L. Popek
368.br
369.I "Report on the Programming Language Euclid"
370.br
371Sigplan Notices, Volume 12, Number 2.
372.br
373February 1977.
374.br
375Pp. 64-65.
376.IP [12]
377D. T. Barnard
378.br
379.I "Automatic generation of syntax-repairing and paragraphing parsers"
380.br
381Technical Report CSRG-52.
382.br
383Computer Systems Research Group
384.br
385University of Toronto, Toronto, Ontario.
386.br
387April 1975.
388.IP [13]
389R. C. Holt and D. T. Barnard
390.br
391.I "Syntax directed error repair and paragraphing"
392.br
393Computer Systems Research Group.
394.br
395University of Toronto, Toronto, Ontario.
396.br
397June, 1976.
398.IP [14]
399A. van Wijngaarden, et. al.
400.br
401.I "Revised Report on the algorithmic language ALGOL 68"
402.br
403Sigplan Notices, Volume 12, Number 5.
404.br
405May 1977.
406.IP [15]
407Niklaus Wirth
408.br
409.I "Modula: A language for modular multiprogramming"
410.br
411Institut f\*:ur Informatik
412.br
413ETH, CH 8092 Z\*:urich
414.br
415March, 1976.
416.IP [16]
417Warren Teitleman
418.br
419.I "Interlisp Reference Manual"
420.br
421Xerox Palo Alto Research Center
422.br
423Palo Alto, California
424.br
425December 1974.
426.IP [17]
427Steve German
428.br
429.I
430ECL in-core editor
431.R
432.br
433Documentation dated 10/20/1973.
434.IP [18]
435Rochkind, Marc J.
436.br
437The Source Code Control System
438.br
439IEEE TOSE Vol SE-1 #4
440.br
441Dec. 1975, 364-370