add pathnames.h
[unix-history] / usr / src / usr.bin / gprof / PSD.doc / present.me
CommitLineData
73772756
KM
1.\" @(#)present.me 1.12 (Berkeley) %G%
2.\"
443c8968
PK
3.sh 1 "Data Presentation"
4.pp
5The data is presented to the user in two different formats.
6The first presentation simply lists the routines
7without regard to the amount of time their descendants use.
8The second presentation incorporates the call graph of the
9program.
10.sh 2 "The Flat Profile
11.pp
12The flat profile consists of a list of all the routines
13that are called during execution of the program,
14with the count of the number of times they are called
15and the number of seconds of execution time for which they
16are themselves accountable.
17The routines are listed in decreasing order of execution time.
18A list of the routines that are never called during execution of
19the program is also available
20to verify that nothing important is omitted by
21this execution.
22The flat profile gives a quick overview of the routines that are used,
23and shows the routines that are themselves responsible
24for large fractions of the execution time.
25In practice,
cde52728 26this profile usually shows that no single function
443c8968
PK
27is overwhelmingly responsible for
28the total time of the program.
29Notice that for this profile,
30the individual times sum to the total execution time.
31.sh 2 "The Call Graph Profile"
627116d0
PK
32.sz 10
33.(z
34.TS
35box center;
36c c c c c l l
37c c c c c l l
38c c c c c l l
39l n n n c l l.
40 called/total \ \ parents
41index %time self descendants called+self name index
42 called/total \ \ children
43_
44 0.20 1.20 4/10 \ \ \s-1CALLER1\s+1 [7]
45 0.30 1.80 6/10 \ \ \s-1CALLER2\s+1 [1]
46[2] 41.5 0.50 3.00 10+4 \s-1EXAMPLE\s+1 [2]
47 1.50 1.00 20/40 \ \ \s-1SUB1\s+1 <cycle1> [4]
48 0.00 0.50 1/5 \ \ \s-1SUB2\s+1 [9]
49 0.00 0.00 0/5 \ \ \s-1SUB3\s+1 [11]
50.TE
51.ce 2
52Profile entry for \s-1EXAMPLE\s+1.
53Figure 4.
54.)z
443c8968
PK
55.pp
56Ideally, we would like to print the call graph of the program,
57but we are limited by the two-dimensional nature of our output
58devices.
59We cannot assume that a call graph is planar,
60and even if it is, that we can print a planar version of it.
4fc97ef0 61Instead, we choose to list each routine,
443c8968
PK
62together with information about
63the routines that are its direct parents and children.
64This listing presents a window into the call graph.
65Based on our experience,
66both parent information and child information
67is important,
68and should be available without searching
69through the output.
70.pp
71The major entries of the call graph profile are the entries from the
72flat profile, augmented by the time propagated to each
73routine from its descendants.
74This profile is sorted by the sum of the time for the routine
75itself plus the time inherited from its descendants.
76The profile shows which of the higher level routines
77spend large portions of the total execution time
78in the routines that they call.
79For each routine, we show the amount of time passed by each child
80to the routine, which includes time for the child itself
81and for the descendants of the child
82(and thus the descendants of the routine).
83We also show the percentage these times represent of the total time
84accounted to the child.
85Similarly, the parents of each routine are listed,
86along with time,
87and percentage of total routine time,
88propagated to each one.
89.pp
90Cycles are handled as single entities.
91The cycle as a whole is shown as though it were a single routine,
92except that members of the cycle are listed in place of the children.
93Although the number of calls of each member
94from within the cycle are shown,
95they do not affect time propagation.
96When a child is a member of a cycle,
97the time shown is the appropriate fraction of the time
98for the whole cycle.
99Self-recursive routines have their calls broken
c962f264 100down into calls from the outside and self-recursive calls.
443c8968
PK
101Only the outside calls affect the propagation of time.
102.pp
ceba7594 103The following example is a typical fragment of a call graph.
443c8968 104.(b
73772756 105.so pres1.pic
443c8968 106.)b
19435cbf 107The entry in the call graph profile listing for this example is
627116d0 108shown in Figure 4.
443c8968 109.pp
627116d0 110The entry is for routine \s-1EXAMPLE\s+1, which has
443c8968
PK
111the Caller routines as its parents,
112and the Sub routines as its children.
113The reader should keep in mind that all information
627116d0
PK
114is given \fIwith respect to \s-1EXAMPLE\s+1\fP.
115The index in the first column shows that \s-1EXAMPLE\s+1
443c8968 116is the second entry in the profile listing.
627116d0
PK
117The \s-1EXAMPLE\s+1 routine is called ten times, four times by \s-1CALLER1\s+1,
118and six times by \s-1CALLER2\s+1.
119Consequently 40% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER1\s+1,
120and 60% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER2\s+1.
443c8968 121The self and descendant fields of the parents
627116d0 122show the amount of self and descendant time \s-1EXAMPLE\s+1
443c8968
PK
123propagates to them (but not the time used by
124the parents directly).
627116d0
PK
125Note that \s-1EXAMPLE\s+1 calls itself recursively four times.
126The routine \s-1EXAMPLE\s+1 calls routine \s-1SUB1\s+1 twenty times, \s-1SUB2\s+1 once,
127and never calls \s-1SUB3\s+1.
128Since \s-1SUB2\s+1 is called a total of five times,
12920% of its self and descendant time is propagated to \s-1EXAMPLE\s+1's
443c8968 130descendant time field.
627116d0 131Because \s-1SUB1\s+1 is a member of \fIcycle 1\fR,
4fc97ef0
PK
132the self and descendant times
133and call count fraction
134are those for the cycle as a whole.
135Since cycle 1 is called a total of forty times
136(not counting calls among members of the cycle),
137it propagates 50% of the cycle's self and descendant
627116d0 138time to \s-1EXAMPLE\s+1's descendant time field.
cde52728 139Finally each name is followed by an index that shows
443c8968
PK
140where on the listing to find the entry for that routine.
141.sh 1 "Using the Profiles"
142.pp
143The profiler is a useful tool for improving
144a set of routines that implement an abstraction.
145It can be helpful in identifying poorly coded routines,
146and in evaluating the new algorithms and code that replace them.
147Taking full advantage of the profiler
148requires a careful examination of the call graph profile,
149and a thorough knowledge of the abstractions underlying
150the program.
151.pp
cde52728 152The easiest optimization that can be performed
443c8968
PK
153is a small change
154to a control construct or data structure that improves the
155running time of the program.
156An obvious starting point
157is a routine that is called many times.
158For example, suppose an output
159routine is the only parent
160of a routine that formats the data.
161If this format routine is expanded inline in the
162output routine, the overhead of a function call and
163return can be saved for each datum that needs to be formatted.
164.pp
165The drawback to inline expansion is that the data abstractions
166in the program may become less parameterized,
167hence less clearly defined.
168The profiling will also become less useful since the loss of
169routines will make its output more granular.
170For example,
171if the symbol table functions ``lookup'', ``insert'', and ``delete''
172are all merged into a single parameterized routine,
173it will be impossible to determine the costs
174of any one of these individual functions from the profile.
175.pp
176Further potential for optimization lies in routines that
177implement data abstractions whose total execution
178time is long.
179For example, a lookup routine might be called only a few
180times, but use an inefficient linear search algorithm,
181that might be replaced with a binary search.
182Alternately, the discovery that a rehashing function is being
183called excessively, can lead to a different
184hash function or a larger hash table.
185If the data abstraction function cannot easily be speeded up,
186it may be advantageous to cache its results,
187and eliminate the need to rerun
188it for identical inputs.
dc2d6ac1
PK
189These and other ideas for program improvement are discussed in
190[Bentley81].
443c8968 191.pp
c962f264 192This tool is best used in an iterative approach:
443c8968
PK
193profiling the program,
194eliminating one bottleneck,
195then finding some other part of the program
196that begins to dominate execution time.
197For instance, we have used \fBgprof\fR on itself;
198eliminating, rewriting, and inline expanding routines,
199until reading
200data files (hardly a target for optimization!)
201represents the dominating factor in its execution time.
202.pp
203Certain types of programs are not easily analyzed by \fBgprof\fR.
204They are typified by programs that exhibit a large degree of
205recursion, such as recursive descent compilers.
206The problem is that most of the major routines are grouped
207into a single monolithic cycle.
cde52728 208As in the symbol table abstraction that is placed
443c8968
PK
209in one routine,
210it is impossible to distinguish which members of the cycle are
211responsible for the execution time.
c962f264 212Unfortunately there are no easy modifications to these programs that
443c8968
PK
213make them amenable to analysis.
214.pp
215A completely different use of the profiler is to analyze the control
216flow of an unfamiliar program.
217If you receive a program from another user that you need to modify
218in some small way,
219it is often unclear where the changes need to be made.
220By running the program on an example and then using \fBgprof\fR,
221you can get a view of the structure of the program.
ceba7594 222.pp
443c8968
PK
223Consider an example in which you need to change the output format
224of the program.
225For purposes of this example suppose that the call graph
226of the output portion of the program has the following structure:
227.(b
73772756 228.so pres2.pic
443c8968
PK
229.)b
230Initially you look through the \fBgprof\fR
627116d0 231output for the system call ``\s-1WRITE\s+1''.
443c8968 232The format routine you will need to change is probably
627116d0 233among the parents of the ``\s-1WRITE\s+1'' procedure.
443c8968 234The next step is to look at the profile entry for each
627116d0
PK
235of parents of ``\s-1WRITE\s+1'',
236in this example either ``\s-1FORMAT1\s+1'' or ``\s-1FORMAT2\s+1'',
443c8968
PK
237to determine which one to change.
238Each format routine will have one or more parents,
627116d0 239in this example ``\s-1CALC1\s+1'', ``\s-1CALC2\s+1'', and ``\s-1CALC3\s+1''.
443c8968
PK
240By inspecting the source code for each of these routines
241you can determine which format routine generates the output that
242you wish to modify.
cde52728 243Since the \fBgprof\fR entry shows all the
443c8968
PK
244potential calls to the format routine you intend to change,
245you can determine if your modifications will affect output that
246should be left alone.
627116d0
PK
247If you desire to change the output of ``\s-1CALC2\s+1'', but not ``\s-1CALC3\s+1'',
248then formatting routine ``\s-1FORMAT2\s+1'' needs to be split
443c8968
PK
249into two separate routines,
250one of which implements the new format.
627116d0 251You can then retarget just the call by ``\s-1CALC2\s+1''
443c8968
PK
252that needs the new format.
253It should be noted that the static call information is particularly
cde52728 254useful here since the test case you run probably will not
443c8968
PK
255exercise the entire program.
256.sh 1 "Conclusions"
257.pp
258We have created a profiler that aids in the evaluation
259of modular programs.
260For each routine in the program,
261the profile shows the extent to which that routine
262helps support various abstractions,
263and how that routine uses other abstractions.
264The profile accurately assesses the cost of routines
265at all levels of the program decomposition.
266The profiler is easily used,
cde52728
PK
267and can be compiled into the program without any prior planning by
268the programmer.
bfb7a486 269It adds only five to thirty percent execution overhead to the program
ceba7594 270being profiled,
bfb7a486
KM
271produces no additional output until after the program finishes,
272and allows the program to be measured in its actual environment.
443c8968 273Finally, the profiler runs on a time-sharing system
6b7f5073
KM
274using only the normal services provided by the operating system
275and compilers.