[unix-history] / .ref-BSD-2 / doc / pxp / pxpin3.n

.if !\n(xx .so tmac.p
.ND
'nr H1 2
.NH
Automatic program formatting
.NH 2
Motivation
.PP
Just as software packages to automatically format documents 
aid the construction of readable and accurate documents,
packages which aid the construction of programs by helping prepare
and maintain the text in a standard and easily readable format
can aid the entire programming process.
With an automatically structured listing, the reader can trust the
textual display of the program to be accurate and can concentrate
on understanding the program, rather than deciphering the style in
which it is written.
Even when
.I "programming secretaries"
are available, an automated preparation system can improve the
programming process by defining a standard format for programs,
making them easier to read.
After programs are completed, and modifications to provide new features
or to correct bugs are required, an automatic formatting system
can work with an intelligently designed source code control
system to help maintain clean programs with their histories.
.PP
The first version of
.I pxp
took a step toward the goal of machine aided program formatting
by formatting the code of the program.
It did not, however, in any way help with the annotation of
the program with comments.
The following sections describe a comment processing
facility design which was added to
.I pxp.
.NH 2
Implementation
.PP
When parsing the program information is saved in the parse tree which
tells the source text coordinates (sequence numbers and columns)
for the input tokens at the boundaries of chosen productions.
The comments from the source program are saved in a separate data
structure together with information about their source text locations and a
classification of their ``type.''
The comment reformatting process proceeds by printing out the parsed program
and periodically flushing out comments.
.NH 3
The kinds of comments
.PP
.I Pxp
distinguishes several types of comments in the input thereby varying
their placement in the output.
The four basic kinds of comments are:
.IP
.B
Left marginal.
.R
At the left margin on input these comments are retained at the
left margin in the output.
.IP
.B
Aligned.
.R
Comments which are not at the left margin on input but which have no tokens
preceding them on the input line are aligned with the program text on output.
.IP
.B
Trailing.
.R
Comments appearing after tokens in the input line but separated from the
last token on the line by only a small amount of white space are placed
similarly in the output.
.IP
.B
Right marginal.
.R
Comments which are more than a small amount of white space past the last
token on a line are aligned about 35 spaces to the right of the running
program text.
.PP
In addition to these comments, other formatting features of the input
are preserved to the output.
These include blank lines, form ejects, and
.B include
directives.
.NH 2
Examples of comment types
.PP
Consider the following example:
.LS
{ Factorial program - Assigment 1
  John Jones, CS 153, Fall 1977 }

\fBprogram\fP fact(output);
\fBconst\fP maxfact = 20; {last factorial to be computed}

        \fBfunction\fP rfact(i: integer): integer;
        \fBbegin\fP
               \fBif\fP i <= 1 \fBthen\fP         {terminate}
                      fact := 1
               \fBelse\fP                   {recurse}
                      fact := i * fact(i - 1)
        \fBend\fP;

\fBbegin\fP
        i := 1;
        j := 1;
        {iterative factorial}
        \fBrepeat\fP
               writeln(i, j);
               j := j * i;
               i := i + 1;
        \fBuntil\fP i > maxfact;

        {recursive factorial}
        \fBfor\fP i := 1 \fBto\fP maxfact \fBdo\fP
               writeln(i, rfact(i))
\fBend\fP.
.LE
.PP
This program has all four basic comment types.
The first comment is
.I marginal
(and is followed by a blank line which is also preserved in a
reformatted listing.)
The comments in the text ``iterative factorial''
and ``recursive factorial'' are 
.I aligned
comments.
The comment on the declaration of
.I maxfact
is a
.I trailing
comment while the comments in
.I rfact
are
.I "right marginal."
.PP
Since the objective of the program reformatting is to not require the
saving of the raw programs which produced the restructured programs,
it is necessary for the reformatting to produce programs which are
.I "fixed points"
with respect to the comment processing;
that is the form of restructured programs must be preserved under
repeated reformatting.
The above types of comments have been carefully chosen so that
this occurs.
.NH 3
Data structures (overview)
.PP
The following sections provide a brief descriptions of the data structures
used by the comment formatting cluster and the method used by the cluster
itself.
The actual reformatting process involves a number of complications not
detailed here, necessary to discern as much as possible from the source
text in order to reasonably classify comments into one of the four available
types, and in order to live with the existing structure of the parser of
.I pi .
.PP
As each comment is encountered in the source text it is chained 
onto a singly linked list recording the kind of comment involved;
the comment delimiter, either `{' or `(*';
the source text coordinates of the comment;
and a linked list of lines of comment text.
.PP
The other data structure used to gather information is a stack parallel to the
parse stack.
For each element of the parse stack this stack contains the source text
coordinates of the leftmost token shifted over in creating the associated
state.
Thus, when a reduction occurs, it is possible to identify the portion of the
input text which was reduced.
At numerous points in the parse tree where comment text is to be processed
we save the source coordinates of the first and last token in the reduced
production as well as the coordinates of the following (lookahead) token.
.NH 3
Comment processing (overview)
.PP
Formatting of the comments back into the program uses the source text
coordinates embedded in the parse tree and attached to comments to construct
a formatted listing.
The ideas involved are quite simple.
When we begin to print out a statement level subtree we first print out
comments which precede this statement.
We then print out the statement itself,
possible invoking this process recursively.
In recursive declarations provisions are made for embedded comments also.
.PP
The most important complication is that comments which appear after the last
token of a
.B procedure
or
.B function
must be associated with this routine rather than being printed before the
next.
This requires a special ``to end of line'' kind of comment flushing.
Other complications arise because of the pre-existing data structures in
.I pxp .
The code for
.I pxp
contains a number of comments detailing the resolution of these and other
problems.
.bp
.NH
Conclusions
.NH 2
Design
.PP
In retrospect, most of the design decisions were well-made.
The counter placement rules resulted in a small number of counters.
The reparsing of the program runs at a reasonable rate,
approximately twice the speed of compilation.
The inaccurate counts which may be generated with non-local
.B goto
statements have as yet caused no problems.
The biggest deficiency in the design is, in fact, the lack of a debugger
implementation to complement the profiling facilities.
.NH 2
Implementation
.PP
The implementation proved to be quite simple.
The design choices allowing
.I pxp
to use the framework of
.I pi
were well taken.
The largest weakness of the implementation may be the fact that
the print cluster structure may not necessarily be the best one for
doing long statement folding across line boundaries, and for
processing the placement of multiple statements per line.
Whether or not this is true would be seen if such a implementation
modification were attempted.
.NH 2
Profiling
.PP
The format of the profile worked out quite well.
Basing the format on [9] was an excellent choice.
For initialization procedures and some others, multiple
statement per line placements is noticeably needed.
It is felt that languages which offer initialization facilities
for structured statements will likewise need more complex format
processing in such declarations.
With comment reformatting a profile can substitute
for a program listing.
In this case the philosophy of suppressing unexecuted
.B procedure
and
.B function
bodies as well as declaration parts
might be re-examined.
.NH 2
Reformatting
.PP
For program formatting, a comment formatting facility is a must.
The author feels that the basic format of programs is well chosen,
but most persons would prefer an (at least slightly) different format.
Again, long statement folding and the placement of multiple statements
per line would be a plus here.
Even in its present state, the formatting facilities are judged to be
useful, especially for showing students with no perceivable style
one plausible way of formatting Pascal programs.
.NH 2
Future systems
.PP
Execution profiling is an important tool because it provides feedback at the
source program level.
A number of systems including such facilities exist or are proposed
[10] [11] [12] [13].
The author expects the following to be components of future systems:
.HP
.RS
.IP 1)
Source language editors [16] [17]
.IP 2)
Symbolic source language debuggers [8] [16]
.IP 3)
Source code control systems [18]
.RE
.PP
A well-designed programming language system with these components could provide
systems programmers with a powerful set of system construction tools,
similar to those available in excellent \s-2LISP\s0 systems such as [16].
.bp
.SH
References
.PP
.IP [1]
Charles B. Haley
.br
Master's Project Report
.br
U.C. Berkeley, June, 1977.
.IP [2]
William N. Joy
.br
.I "PX 1.1 Implementation Notes"
.br
October, 1978.
.IP [3]
William N. Joy, Susan L. Graham, and Charles B. Haley
.br
.I "Berkeley Pascal User's Manual"
.br
Version 1.0 \- November, 1977.
.IP [4]
K. Thompson and D. M. Ritchie
.br
.I
UNIX
Programmers Manual
.R
.br
Version 6.7
(revised at University of California)
.br
June 1977.
.IP [5]
Kathleen Jensen and Niklaus Wirth
.br
.I
Pascal \- User Manual and Report
.R
.br
Springer-Verlag, New York
.br
1975.
.IP [6]
William N. Joy
.br
.I "PDB Design notes and draft manual"
.br
January, 1977.
.IP [7]
D. E. Knuth and F. R. Stevenson
.br
.I "Optimal measurement points for program frequency counts"
.br
BIT 13 (1973) 313-322.
.IP [8]
Edwin H. Satterthwaite
.br
.I "Source Language Debugging Tools"
.br
STAN-CS-75-494, May, 1975.
.IP [9]
Edwin H. Satterthwaite
.br
.I "Debugging Tools for High Level Languages"
.br
Software, Practice and Experience
.br
Vol. 2, 197-217, 1972.
.IP [10]
J. D. Ichbiah, J, C. H\*'eliard, J. P. Rissen, P. Cousot
.br
.I "The Systems Implementation Language LIS - Reference Manual"
.br
Technical Report 4549 E1/EN.
.br
Compagnie Internationale pour l\(aaInformatique
.br
68, route de Versailles \- 78430 LOUVECIENNES
.br
December 1974. Revised January 1976.
.IP [11]
B. W. Lampson, J. J. Horning, R. L. London, J. G. Mitchell, G. L. Popek
.br
.I "Report on the Programming Language Euclid"
.br
Sigplan Notices, Volume 12, Number 2.
.br
February 1977.
.br
Pp. 64-65.
.IP [12]
D. T. Barnard
.br
.I "Automatic generation of syntax-repairing and paragraphing parsers"
.br
Technical Report CSRG-52.
.br
Computer Systems Research Group
.br
University of Toronto, Toronto, Ontario.
.br
April 1975.
.IP [13]
R. C. Holt and D. T. Barnard
.br
.I "Syntax directed error repair and paragraphing"
.br
Computer Systems Research Group.
.br
University of Toronto, Toronto, Ontario.
.br
June, 1976.
.IP [14]
A. van Wijngaarden, et. al.
.br
.I "Revised Report on the algorithmic language ALGOL 68"
.br
Sigplan Notices, Volume 12, Number 5.
.br
May 1977.
.IP [15]
Niklaus Wirth
.br
.I "Modula: A language for modular multiprogramming"
.br
Institut f\*:ur Informatik
.br
ETH, CH 8092 Z\*:urich
.br
March, 1976.
.IP [16]
Warren Teitleman
.br
.I "Interlisp Reference Manual"
.br
Xerox Palo Alto Research Center
.br
Palo Alto, California
.br
December 1974.
.IP [17]
Steve German
.br
.I
ECL in-core editor
.R
.br
Documentation dated 10/20/1973.
.IP [18]
Rochkind, Marc J.
.br
The Source Code Control System
.br
IEEE TOSE Vol SE-1 #4
.br
Dec. 1975, 364-370
Commit	Line	Data
1662094b BJ	1	.if !\n(xx .so tmac.p
	2	.ND
	3	'nr H1 2
	4	.NH
	5	Automatic program formatting
	6	.NH 2
	7	Motivation
	8	.PP
	9	Just as software packages to automatically format documents
	10	aid the construction of readable and accurate documents,
	11	packages which aid the construction of programs by helping prepare
	12	and maintain the text in a standard and easily readable format
	13	can aid the entire programming process.
	14	With an automatically structured listing, the reader can trust the
	15	textual display of the program to be accurate and can concentrate
	16	on understanding the program, rather than deciphering the style in
	17	which it is written.
	18	Even when
	19	.I "programming secretaries"
	20	are available, an automated preparation system can improve the
	21	programming process by defining a standard format for programs,
	22	making them easier to read.
	23	After programs are completed, and modifications to provide new features
	24	or to correct bugs are required, an automatic formatting system
	25	can work with an intelligently designed source code control
	26	system to help maintain clean programs with their histories.
	27	.PP
	28	The first version of
	29	.I pxp
	30	took a step toward the goal of machine aided program formatting
	31	by formatting the code of the program.
	32	It did not, however, in any way help with the annotation of
	33	the program with comments.
	34	The following sections describe a comment processing
	35	facility design which was added to
	36	.I pxp.
	37	.NH 2
	38	Implementation
	39	.PP
	40	When parsing the program information is saved in the parse tree which
	41	tells the source text coordinates (sequence numbers and columns)
	42	for the input tokens at the boundaries of chosen productions.
	43	The comments from the source program are saved in a separate data
	44	structure together with information about their source text locations and a
	45	classification of their ``type.''
	46	The comment reformatting process proceeds by printing out the parsed program
	47	and periodically flushing out comments.
	48	.NH 3
	49	The kinds of comments
	50	.PP
	51	.I Pxp
	52	distinguishes several types of comments in the input thereby varying
	53	their placement in the output.
	54	The four basic kinds of comments are:
	55	.IP
	56	.B
	57	Left marginal.
	58	.R
	59	At the left margin on input these comments are retained at the
	60	left margin in the output.
	61	.IP
	62	.B
	63	Aligned.
	64	.R
65	Comments which are not at the left margin on input but which have no tokens
66	preceding them on the input line are aligned with the program text on output.
67	.IP
68	.B
69	Trailing.
70	.R
71	Comments appearing after tokens in the input line but separated from the
72	last token on the line by only a small amount of white space are placed
73	similarly in the output.
74	.IP
75	.B
76	Right marginal.
77	.R
78	Comments which are more than a small amount of white space past the last
79	token on a line are aligned about 35 spaces to the right of the running
80	program text.
81	.PP
82	In addition to these comments, other formatting features of the input
83	are preserved to the output.
84	These include blank lines, form ejects, and
85	.B include
86	directives.
87	.NH 2
88	Examples of comment types
89	.PP
90	Consider the following example:
91	.LS
92	{ Factorial program - Assigment 1
93	John Jones, CS 153, Fall 1977 }
94
95	\fBprogram\fP fact(output);
96	\fBconst\fP maxfact = 20; {last factorial to be computed}
97
98	\fBfunction\fP rfact(i: integer): integer;
99	\fBbegin\fP
100	\fBif\fP i <= 1 \fBthen\fP {terminate}
101	fact := 1
102	\fBelse\fP {recurse}
103	fact := i * fact(i - 1)
104	\fBend\fP;
105
106	\fBbegin\fP
107	i := 1;
108	j := 1;
109	{iterative factorial}
110	\fBrepeat\fP
111	writeln(i, j);
112	j := j * i;
113	i := i + 1;
114	\fBuntil\fP i > maxfact;
115
116	{recursive factorial}
117	\fBfor\fP i := 1 \fBto\fP maxfact \fBdo\fP
118	writeln(i, rfact(i))
119	\fBend\fP.
120	.LE
121	.PP
122	This program has all four basic comment types.
123	The first comment is
124	.I marginal
125	(and is followed by a blank line which is also preserved in a
126	reformatted listing.)
127	The comments in the text ``iterative factorial''
128	and ``recursive factorial'' are
129	.I aligned
130	comments.
131	The comment on the declaration of
132	.I maxfact
133	is a
134	.I trailing
135	comment while the comments in
136	.I rfact
137	are
138	.I "right marginal."
139	.PP
140	Since the objective of the program reformatting is to not require the
141	saving of the raw programs which produced the restructured programs,
142	it is necessary for the reformatting to produce programs which are
143	.I "fixed points"
144	with respect to the comment processing;
145	that is the form of restructured programs must be preserved under
146	repeated reformatting.
147	The above types of comments have been carefully chosen so that
148	this occurs.
149	.NH 3
150	Data structures (overview)
151	.PP
152	The following sections provide a brief descriptions of the data structures
153	used by the comment formatting cluster and the method used by the cluster
154	itself.
155	The actual reformatting process involves a number of complications not
156	detailed here, necessary to discern as much as possible from the source
157	text in order to reasonably classify comments into one of the four available
158	types, and in order to live with the existing structure of the parser of
159	.I pi .
160	.PP
161	As each comment is encountered in the source text it is chained
162	onto a singly linked list recording the kind of comment involved;
163	the comment delimiter, either `{' or `(*';
164	the source text coordinates of the comment;
165	and a linked list of lines of comment text.
166	.PP
167	The other data structure used to gather information is a stack parallel to the
168	parse stack.
169	For each element of the parse stack this stack contains the source text
170	coordinates of the leftmost token shifted over in creating the associated
171	state.
172	Thus, when a reduction occurs, it is possible to identify the portion of the
173	input text which was reduced.
174	At numerous points in the parse tree where comment text is to be processed
175	we save the source coordinates of the first and last token in the reduced
176	production as well as the coordinates of the following (lookahead) token.
177	.NH 3
178	Comment processing (overview)
179	.PP
180	Formatting of the comments back into the program uses the source text
181	coordinates embedded in the parse tree and attached to comments to construct
182	a formatted listing.
183	The ideas involved are quite simple.
184	When we begin to print out a statement level subtree we first print out
185	comments which precede this statement.
186	We then print out the statement itself,
187	possible invoking this process recursively.
188	In recursive declarations provisions are made for embedded comments also.
189	.PP
190	The most important complication is that comments which appear after the last
191	token of a
192	.B procedure
193	or
194	.B function
195	must be associated with this routine rather than being printed before the
196	next.
197	This requires a special ``to end of line'' kind of comment flushing.
198	Other complications arise because of the pre-existing data structures in
199	.I pxp .
200	The code for
201	.I pxp
202	contains a number of comments detailing the resolution of these and other
203	problems.
204	.bp
205	.NH
206	Conclusions
207	.NH 2
208	Design
209	.PP
210	In retrospect, most of the design decisions were well-made.
211	The counter placement rules resulted in a small number of counters.
212	The reparsing of the program runs at a reasonable rate,
213	approximately twice the speed of compilation.
214	The inaccurate counts which may be generated with non-local
215	.B goto
216	statements have as yet caused no problems.
217	The biggest deficiency in the design is, in fact, the lack of a debugger
218	implementation to complement the profiling facilities.
219	.NH 2
220	Implementation
221	.PP
222	The implementation proved to be quite simple.
223	The design choices allowing
224	.I pxp
225	to use the framework of
226	.I pi
227	were well taken.
228	The largest weakness of the implementation may be the fact that
229	the print cluster structure may not necessarily be the best one for
230	doing long statement folding across line boundaries, and for
231	processing the placement of multiple statements per line.
232	Whether or not this is true would be seen if such a implementation
233	modification were attempted.
234	.NH 2
235	Profiling
236	.PP
237	The format of the profile worked out quite well.
238	Basing the format on [9] was an excellent choice.
239	For initialization procedures and some others, multiple
240	statement per line placements is noticeably needed.
241	It is felt that languages which offer initialization facilities
242	for structured statements will likewise need more complex format
243	processing in such declarations.
244	With comment reformatting a profile can substitute
245	for a program listing.
246	In this case the philosophy of suppressing unexecuted
247	.B procedure
248	and
249	.B function
250	bodies as well as declaration parts
251	might be re-examined.
252	.NH 2
253	Reformatting
254	.PP
255	For program formatting, a comment formatting facility is a must.
256	The author feels that the basic format of programs is well chosen,
257	but most persons would prefer an (at least slightly) different format.
258	Again, long statement folding and the placement of multiple statements
259	per line would be a plus here.
260	Even in its present state, the formatting facilities are judged to be
261	useful, especially for showing students with no perceivable style
262	one plausible way of formatting Pascal programs.
263	.NH 2
264	Future systems
265	.PP
266	Execution profiling is an important tool because it provides feedback at the
267	source program level.
268	A number of systems including such facilities exist or are proposed
269	[10] [11] [12] [13].
270	The author expects the following to be components of future systems:
271	.HP
272	.RS
273	.IP 1)
274	Source language editors [16] [17]
275	.IP 2)
276	Symbolic source language debuggers [8] [16]
277	.IP 3)
278	Source code control systems [18]
279	.RE
280	.PP
281	A well-designed programming language system with these components could provide
282	systems programmers with a powerful set of system construction tools,
283	similar to those available in excellent \s-2LISP\s0 systems such as [16].
284	.bp
285	.SH
286	References
287	.PP
288	.IP [1]
289	Charles B. Haley
290	.br
291	Master's Project Report
292	.br
293	U.C. Berkeley, June, 1977.
294	.IP [2]
295	William N. Joy
296	.br
297	.I "PX 1.1 Implementation Notes"
298	.br
299	October, 1978.
300	.IP [3]
301	William N. Joy, Susan L. Graham, and Charles B. Haley
302	.br
303	.I "Berkeley Pascal User's Manual"
304	.br
305	Version 1.0 \- November, 1977.
306	.IP [4]
307	K. Thompson and D. M. Ritchie
308	.br
309	.I
310	UNIX
311	Programmers Manual
312	.R
313	.br
314	Version 6.7
315	(revised at University of California)
316	.br
317	June 1977.
318	.IP [5]
319	Kathleen Jensen and Niklaus Wirth
320	.br
321	.I
322	Pascal \- User Manual and Report
323	.R
324	.br
325	Springer-Verlag, New York
326	.br
327	1975.
328	.IP [6]
329	William N. Joy
330	.br
331	.I "PDB Design notes and draft manual"
332	.br
333	January, 1977.
334	.IP [7]
335	D. E. Knuth and F. R. Stevenson
336	.br
337	.I "Optimal measurement points for program frequency counts"
338	.br
339	BIT 13 (1973) 313-322.
340	.IP [8]
341	Edwin H. Satterthwaite
342	.br
343	.I "Source Language Debugging Tools"
344	.br
345	STAN-CS-75-494, May, 1975.
346	.IP [9]
347	Edwin H. Satterthwaite
348	.br
349	.I "Debugging Tools for High Level Languages"
350	.br
351	Software, Practice and Experience
352	.br
353	Vol. 2, 197-217, 1972.
354	.IP [10]
355	J. D. Ichbiah, J, C. H\*'eliard, J. P. Rissen, P. Cousot
356	.br
357	.I "The Systems Implementation Language LIS - Reference Manual"
358	.br
359	Technical Report 4549 E1/EN.
360	.br
361	Compagnie Internationale pour l\(aaInformatique
362	.br
363	68, route de Versailles \- 78430 LOUVECIENNES
364	.br
365	December 1974. Revised January 1976.
366	.IP [11]
367	B. W. Lampson, J. J. Horning, R. L. London, J. G. Mitchell, G. L. Popek
368	.br
369	.I "Report on the Programming Language Euclid"
370	.br
371	Sigplan Notices, Volume 12, Number 2.
372	.br
373	February 1977.
374	.br
375	Pp. 64-65.
376	.IP [12]
377	D. T. Barnard
378	.br
379	.I "Automatic generation of syntax-repairing and paragraphing parsers"
380	.br
381	Technical Report CSRG-52.
382	.br
383	Computer Systems Research Group
384	.br
385	University of Toronto, Toronto, Ontario.
386	.br
387	April 1975.
388	.IP [13]
389	R. C. Holt and D. T. Barnard
390	.br
391	.I "Syntax directed error repair and paragraphing"
392	.br
393	Computer Systems Research Group.
394	.br
395	University of Toronto, Toronto, Ontario.
396	.br
397	June, 1976.
398	.IP [14]
399	A. van Wijngaarden, et. al.
400	.br
401	.I "Revised Report on the algorithmic language ALGOL 68"
402	.br
403	Sigplan Notices, Volume 12, Number 5.
404	.br
405	May 1977.
406	.IP [15]
407	Niklaus Wirth
408	.br
409	.I "Modula: A language for modular multiprogramming"
410	.br
411	Institut f\*:ur Informatik
412	.br
413	ETH, CH 8092 Z\*:urich
414	.br
415	March, 1976.
416	.IP [16]
417	Warren Teitleman
418	.br
419	.I "Interlisp Reference Manual"
420	.br
421	Xerox Palo Alto Research Center
422	.br
423	Palo Alto, California
424	.br
425	December 1974.
426	.IP [17]
427	Steve German
428	.br
429	.I
430	ECL in-core editor
431	.R
432	.br
433	Documentation dated 10/20/1973.
434	.IP [18]
435	Rochkind, Marc J.
436	.br
437	The Source Code Control System
438	.br
439	IEEE TOSE Vol SE-1 #4
440	.br
441	Dec. 1975, 364-370