polle changes.
[unix-history] / usr / src / usr.bin / gprof / PSD.doc / present.me
CommitLineData
c962f264 1\" @(#)present.me 1.2 %G%
443c8968
PK
2.sh 1 "Data Presentation"
3.pp
4The data is presented to the user in two different formats.
5The first presentation simply lists the routines
6without regard to the amount of time their descendants use.
7The second presentation incorporates the call graph of the
8program.
9.sh 2 "The Flat Profile
10.pp
11The flat profile consists of a list of all the routines
12that are called during execution of the program,
13with the count of the number of times they are called
14and the number of seconds of execution time for which they
15are themselves accountable.
16The routines are listed in decreasing order of execution time.
17A list of the routines that are never called during execution of
18the program is also available
19to verify that nothing important is omitted by
20this execution.
21The flat profile gives a quick overview of the routines that are used,
22and shows the routines that are themselves responsible
23for large fractions of the execution time.
24In practice,
25this profile usually indicates that no single function
26is overwhelmingly responsible for
27the total time of the program.
28Notice that for this profile,
29the individual times sum to the total execution time.
30.sh 2 "The Call Graph Profile"
31.pp
32Ideally, we would like to print the call graph of the program,
33but we are limited by the two-dimensional nature of our output
34devices.
35We cannot assume that a call graph is planar,
36and even if it is, that we can print a planar version of it.
37Instead, we choose to print each routine,
38together with information about
39the routines that are its direct parents and children.
40This listing presents a window into the call graph.
41Based on our experience,
42both parent information and child information
43is important,
44and should be available without searching
45through the output.
46.pp
47The major entries of the call graph profile are the entries from the
48flat profile, augmented by the time propagated to each
49routine from its descendants.
50This profile is sorted by the sum of the time for the routine
51itself plus the time inherited from its descendants.
52The profile shows which of the higher level routines
53spend large portions of the total execution time
54in the routines that they call.
55For each routine, we show the amount of time passed by each child
56to the routine, which includes time for the child itself
57and for the descendants of the child
58(and thus the descendants of the routine).
59We also show the percentage these times represent of the total time
60accounted to the child.
61Similarly, the parents of each routine are listed,
62along with time,
63and percentage of total routine time,
64propagated to each one.
65.pp
66Cycles are handled as single entities.
67The cycle as a whole is shown as though it were a single routine,
68except that members of the cycle are listed in place of the children.
69Although the number of calls of each member
70from within the cycle are shown,
71they do not affect time propagation.
72When a child is a member of a cycle,
73the time shown is the appropriate fraction of the time
74for the whole cycle.
75Self-recursive routines have their calls broken
c962f264 76down into calls from the outside and self-recursive calls.
443c8968
PK
77Only the outside calls affect the propagation of time.
78.pp
79The following example a typical fragment of a call graph.
80.(b
81.TS
82center;
83c c c.
84Caller1 Caller2
85
86 Example
87
88Sub1 Sub2 Sub3
89.TE
90.)b
91This example would have the following entry
92in the profile listing.
93.(b
94.TS
95box center;
96c c c c c l l
97c c c c c l l
98c c c c c l l
99l n n n c l l.
100 called/total \ \ parents
101index %time self descendants called+self name index
102 called/total \ \ children
103_
104 0.20 1.20 4/10 \ \ Caller1 [7]
105 0.30 1.80 6/10 \ \ Caller2 [1]
106[2] 41.5 0.50 3.00 10+4 Example [2]
107 1.50 1.00 20/40 \ \ Sub1 <cycle1> [4]
108 0.00 0.50 1/5 \ \ Sub2 [9]
109 0.00 0.00 0/5 \ \ Sub3 [11]
110.TE
111.)b
112.pp
113The entry is for routine Example, which has
114the Caller routines as its parents,
115and the Sub routines as its children.
116The reader should keep in mind that all information
117is given \fIwith respect to Example\fP.
118The index in the first column indicates that Example
119is the second entry in the profile listing.
120The Example routine is called ten times, four times by Caller1,
121and six times by Caller2.
122Consequently 40% of Example's time is propagated to Caller1,
123and 60% of Example's time is propagated to Caller2.
124The self and descendant fields of the parents
125show the amount of self and descendant time Example
126propagates to them (but not the time used by
127the parents directly).
128Note that Example calls itself recursively four times.
129The routine Example calls routine Sub1 twenty times, Sub2 once,
130and never calls Sub3.
131Since Sub2 is called a total of five times,
13220% of its self and descendant time is propagated to Example's
133descendant time field.
134Because Sub1 is a member of \fIcycle 1\fR,
135the self and descendant times are those for the cycle as a whole.
136Since Sub1 is called a total of forty times from outside
137of its cycle, it propagates 50% of the cycle's self and descendant
138time to Example's descendant time field.
139Finally each name is followed by an index that indicates
140where on the listing to find the entry for that routine.
141.sh 1 "Using the Profiles"
142.pp
143The profiler is a useful tool for improving
144a set of routines that implement an abstraction.
145It can be helpful in identifying poorly coded routines,
146and in evaluating the new algorithms and code that replace them.
147Taking full advantage of the profiler
148requires a careful examination of the call graph profile,
149and a thorough knowledge of the abstractions underlying
150the program.
151.pp
c962f264 152One of the easiest optimizations that can be performed
443c8968
PK
153is a small change
154to a control construct or data structure that improves the
155running time of the program.
156An obvious starting point
157is a routine that is called many times.
158For example, suppose an output
159routine is the only parent
160of a routine that formats the data.
161If this format routine is expanded inline in the
162output routine, the overhead of a function call and
163return can be saved for each datum that needs to be formatted.
164.pp
165The drawback to inline expansion is that the data abstractions
166in the program may become less parameterized,
167hence less clearly defined.
168The profiling will also become less useful since the loss of
169routines will make its output more granular.
170For example,
171if the symbol table functions ``lookup'', ``insert'', and ``delete''
172are all merged into a single parameterized routine,
173it will be impossible to determine the costs
174of any one of these individual functions from the profile.
175.pp
176Further potential for optimization lies in routines that
177implement data abstractions whose total execution
178time is long.
179For example, a lookup routine might be called only a few
180times, but use an inefficient linear search algorithm,
181that might be replaced with a binary search.
182Alternately, the discovery that a rehashing function is being
183called excessively, can lead to a different
184hash function or a larger hash table.
185If the data abstraction function cannot easily be speeded up,
186it may be advantageous to cache its results,
187and eliminate the need to rerun
188it for identical inputs.
189.pp
c962f264 190This tool is best used in an iterative approach:
443c8968
PK
191profiling the program,
192eliminating one bottleneck,
193then finding some other part of the program
194that begins to dominate execution time.
195For instance, we have used \fBgprof\fR on itself;
196eliminating, rewriting, and inline expanding routines,
197until reading
198data files (hardly a target for optimization!)
199represents the dominating factor in its execution time.
200.pp
201Certain types of programs are not easily analyzed by \fBgprof\fR.
202They are typified by programs that exhibit a large degree of
203recursion, such as recursive descent compilers.
204The problem is that most of the major routines are grouped
205into a single monolithic cycle.
206As in the case of the symbol table abstraction that is placed
207in one routine,
208it is impossible to distinguish which members of the cycle are
209responsible for the execution time.
c962f264 210Unfortunately there are no easy modifications to these programs that
443c8968
PK
211make them amenable to analysis.
212.pp
213A completely different use of the profiler is to analyze the control
214flow of an unfamiliar program.
215If you receive a program from another user that you need to modify
216in some small way,
217it is often unclear where the changes need to be made.
218By running the program on an example and then using \fBgprof\fR,
219you can get a view of the structure of the program.
220Consider an example in which you need to change the output format
221of the program.
222For purposes of this example suppose that the call graph
223of the output portion of the program has the following structure:
224.(b
225.TS
226center;
227c c c.
228calc1 calc2 calc3
229
230format1 format2
231
232 ``write''
233.TE
234.)b
235Initially you look through the \fBgprof\fR
236output for the system call ``write''.
237The format routine you will need to change is probably
238among the parents of the ``write'' procedure.
239The next step is to look at the profile entry for each
240of parents of ``write'',
241in this example either ``format1'' or ``format2'',
242to determine which one to change.
243Each format routine will have one or more parents,
244in this example ``calc1'', ``calc2'', and ``calc3''.
245By inspecting the source code for each of these routines
246you can determine which format routine generates the output that
247you wish to modify.
248Since the \fBgprof\fR entry indicates all the
249potential calls to the format routine you intend to change,
250you can determine if your modifications will affect output that
251should be left alone.
252If you desire to change the output of ``calc2'', but not ``calc3'',
253then formatting routine ``format2'' needs to be split
254into two separate routines,
255one of which implements the new format.
256You can then retarget just the call by ``calc2''
257that needs the new format.
258It should be noted that the static call information is particularly
259useful in this case since the test case you run probably will not
260exercise the entire program.
261.sh 1 "Conclusions"
262.pp
263We have created a profiler that aids in the evaluation
264of modular programs.
265For each routine in the program,
266the profile shows the extent to which that routine
267helps support various abstractions,
268and how that routine uses other abstractions.
269The profile accurately assesses the cost of routines
270at all levels of the program decomposition.
271The profiler is easily used,
272and does not require any prior planning on the part
273of the programmer.
274It does not add excessively to either the execution
275or the I/O of the program being measured,
276and allows programs to be measured in their actual
277environments.
278Finally, the profiler runs on a time-sharing system
279using only the normal services provided by the operating system.
c962f264
PK
280.(q
281polle says: make sure each of these points is clearly shown by the
282paper! this is a nice clear paragraph and maybe should also appear
283(in a different form) earlier to motivate or describe what the paper
284will say.
285.)q