Commit | Line | Data |
---|---|---|
73772756 KM |
1 | .\" @(#)present.me 1.12 (Berkeley) %G% |
2 | .\" | |
443c8968 PK |
3 | .sh 1 "Data Presentation" |
4 | .pp | |
5 | The data is presented to the user in two different formats. | |
6 | The first presentation simply lists the routines | |
7 | without regard to the amount of time their descendants use. | |
8 | The second presentation incorporates the call graph of the | |
9 | program. | |
10 | .sh 2 "The Flat Profile | |
11 | .pp | |
12 | The flat profile consists of a list of all the routines | |
13 | that are called during execution of the program, | |
14 | with the count of the number of times they are called | |
15 | and the number of seconds of execution time for which they | |
16 | are themselves accountable. | |
17 | The routines are listed in decreasing order of execution time. | |
18 | A list of the routines that are never called during execution of | |
19 | the program is also available | |
20 | to verify that nothing important is omitted by | |
21 | this execution. | |
22 | The flat profile gives a quick overview of the routines that are used, | |
23 | and shows the routines that are themselves responsible | |
24 | for large fractions of the execution time. | |
25 | In practice, | |
cde52728 | 26 | this profile usually shows that no single function |
443c8968 PK |
27 | is overwhelmingly responsible for |
28 | the total time of the program. | |
29 | Notice that for this profile, | |
30 | the individual times sum to the total execution time. | |
31 | .sh 2 "The Call Graph Profile" | |
627116d0 PK |
32 | .sz 10 |
33 | .(z | |
34 | .TS | |
35 | box center; | |
36 | c c c c c l l | |
37 | c c c c c l l | |
38 | c c c c c l l | |
39 | l n n n c l l. | |
40 | called/total \ \ parents | |
41 | index %time self descendants called+self name index | |
42 | called/total \ \ children | |
43 | _ | |
44 | 0.20 1.20 4/10 \ \ \s-1CALLER1\s+1 [7] | |
45 | 0.30 1.80 6/10 \ \ \s-1CALLER2\s+1 [1] | |
46 | [2] 41.5 0.50 3.00 10+4 \s-1EXAMPLE\s+1 [2] | |
47 | 1.50 1.00 20/40 \ \ \s-1SUB1\s+1 <cycle1> [4] | |
48 | 0.00 0.50 1/5 \ \ \s-1SUB2\s+1 [9] | |
49 | 0.00 0.00 0/5 \ \ \s-1SUB3\s+1 [11] | |
50 | .TE | |
51 | .ce 2 | |
52 | Profile entry for \s-1EXAMPLE\s+1. | |
53 | Figure 4. | |
54 | .)z | |
443c8968 PK |
55 | .pp |
56 | Ideally, we would like to print the call graph of the program, | |
57 | but we are limited by the two-dimensional nature of our output | |
58 | devices. | |
59 | We cannot assume that a call graph is planar, | |
60 | and even if it is, that we can print a planar version of it. | |
4fc97ef0 | 61 | Instead, we choose to list each routine, |
443c8968 PK |
62 | together with information about |
63 | the routines that are its direct parents and children. | |
64 | This listing presents a window into the call graph. | |
65 | Based on our experience, | |
66 | both parent information and child information | |
67 | is important, | |
68 | and should be available without searching | |
69 | through the output. | |
70 | .pp | |
71 | The major entries of the call graph profile are the entries from the | |
72 | flat profile, augmented by the time propagated to each | |
73 | routine from its descendants. | |
74 | This profile is sorted by the sum of the time for the routine | |
75 | itself plus the time inherited from its descendants. | |
76 | The profile shows which of the higher level routines | |
77 | spend large portions of the total execution time | |
78 | in the routines that they call. | |
79 | For each routine, we show the amount of time passed by each child | |
80 | to the routine, which includes time for the child itself | |
81 | and for the descendants of the child | |
82 | (and thus the descendants of the routine). | |
83 | We also show the percentage these times represent of the total time | |
84 | accounted to the child. | |
85 | Similarly, the parents of each routine are listed, | |
86 | along with time, | |
87 | and percentage of total routine time, | |
88 | propagated to each one. | |
89 | .pp | |
90 | Cycles are handled as single entities. | |
91 | The cycle as a whole is shown as though it were a single routine, | |
92 | except that members of the cycle are listed in place of the children. | |
93 | Although the number of calls of each member | |
94 | from within the cycle are shown, | |
95 | they do not affect time propagation. | |
96 | When a child is a member of a cycle, | |
97 | the time shown is the appropriate fraction of the time | |
98 | for the whole cycle. | |
99 | Self-recursive routines have their calls broken | |
c962f264 | 100 | down into calls from the outside and self-recursive calls. |
443c8968 PK |
101 | Only the outside calls affect the propagation of time. |
102 | .pp | |
ceba7594 | 103 | The following example is a typical fragment of a call graph. |
443c8968 | 104 | .(b |
73772756 | 105 | .so pres1.pic |
443c8968 | 106 | .)b |
19435cbf | 107 | The entry in the call graph profile listing for this example is |
627116d0 | 108 | shown in Figure 4. |
443c8968 | 109 | .pp |
627116d0 | 110 | The entry is for routine \s-1EXAMPLE\s+1, which has |
443c8968 PK |
111 | the Caller routines as its parents, |
112 | and the Sub routines as its children. | |
113 | The reader should keep in mind that all information | |
627116d0 PK |
114 | is given \fIwith respect to \s-1EXAMPLE\s+1\fP. |
115 | The index in the first column shows that \s-1EXAMPLE\s+1 | |
443c8968 | 116 | is the second entry in the profile listing. |
627116d0 PK |
117 | The \s-1EXAMPLE\s+1 routine is called ten times, four times by \s-1CALLER1\s+1, |
118 | and six times by \s-1CALLER2\s+1. | |
119 | Consequently 40% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER1\s+1, | |
120 | and 60% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER2\s+1. | |
443c8968 | 121 | The self and descendant fields of the parents |
627116d0 | 122 | show the amount of self and descendant time \s-1EXAMPLE\s+1 |
443c8968 PK |
123 | propagates to them (but not the time used by |
124 | the parents directly). | |
627116d0 PK |
125 | Note that \s-1EXAMPLE\s+1 calls itself recursively four times. |
126 | The routine \s-1EXAMPLE\s+1 calls routine \s-1SUB1\s+1 twenty times, \s-1SUB2\s+1 once, | |
127 | and never calls \s-1SUB3\s+1. | |
128 | Since \s-1SUB2\s+1 is called a total of five times, | |
129 | 20% of its self and descendant time is propagated to \s-1EXAMPLE\s+1's | |
443c8968 | 130 | descendant time field. |
627116d0 | 131 | Because \s-1SUB1\s+1 is a member of \fIcycle 1\fR, |
4fc97ef0 PK |
132 | the self and descendant times |
133 | and call count fraction | |
134 | are those for the cycle as a whole. | |
135 | Since cycle 1 is called a total of forty times | |
136 | (not counting calls among members of the cycle), | |
137 | it propagates 50% of the cycle's self and descendant | |
627116d0 | 138 | time to \s-1EXAMPLE\s+1's descendant time field. |
cde52728 | 139 | Finally each name is followed by an index that shows |
443c8968 PK |
140 | where on the listing to find the entry for that routine. |
141 | .sh 1 "Using the Profiles" | |
142 | .pp | |
143 | The profiler is a useful tool for improving | |
144 | a set of routines that implement an abstraction. | |
145 | It can be helpful in identifying poorly coded routines, | |
146 | and in evaluating the new algorithms and code that replace them. | |
147 | Taking full advantage of the profiler | |
148 | requires a careful examination of the call graph profile, | |
149 | and a thorough knowledge of the abstractions underlying | |
150 | the program. | |
151 | .pp | |
cde52728 | 152 | The easiest optimization that can be performed |
443c8968 PK |
153 | is a small change |
154 | to a control construct or data structure that improves the | |
155 | running time of the program. | |
156 | An obvious starting point | |
157 | is a routine that is called many times. | |
158 | For example, suppose an output | |
159 | routine is the only parent | |
160 | of a routine that formats the data. | |
161 | If this format routine is expanded inline in the | |
162 | output routine, the overhead of a function call and | |
163 | return can be saved for each datum that needs to be formatted. | |
164 | .pp | |
165 | The drawback to inline expansion is that the data abstractions | |
166 | in the program may become less parameterized, | |
167 | hence less clearly defined. | |
168 | The profiling will also become less useful since the loss of | |
169 | routines will make its output more granular. | |
170 | For example, | |
171 | if the symbol table functions ``lookup'', ``insert'', and ``delete'' | |
172 | are all merged into a single parameterized routine, | |
173 | it will be impossible to determine the costs | |
174 | of any one of these individual functions from the profile. | |
175 | .pp | |
176 | Further potential for optimization lies in routines that | |
177 | implement data abstractions whose total execution | |
178 | time is long. | |
179 | For example, a lookup routine might be called only a few | |
180 | times, but use an inefficient linear search algorithm, | |
181 | that might be replaced with a binary search. | |
182 | Alternately, the discovery that a rehashing function is being | |
183 | called excessively, can lead to a different | |
184 | hash function or a larger hash table. | |
185 | If the data abstraction function cannot easily be speeded up, | |
186 | it may be advantageous to cache its results, | |
187 | and eliminate the need to rerun | |
188 | it for identical inputs. | |
dc2d6ac1 PK |
189 | These and other ideas for program improvement are discussed in |
190 | [Bentley81]. | |
443c8968 | 191 | .pp |
c962f264 | 192 | This tool is best used in an iterative approach: |
443c8968 PK |
193 | profiling the program, |
194 | eliminating one bottleneck, | |
195 | then finding some other part of the program | |
196 | that begins to dominate execution time. | |
197 | For instance, we have used \fBgprof\fR on itself; | |
198 | eliminating, rewriting, and inline expanding routines, | |
199 | until reading | |
200 | data files (hardly a target for optimization!) | |
201 | represents the dominating factor in its execution time. | |
202 | .pp | |
203 | Certain types of programs are not easily analyzed by \fBgprof\fR. | |
204 | They are typified by programs that exhibit a large degree of | |
205 | recursion, such as recursive descent compilers. | |
206 | The problem is that most of the major routines are grouped | |
207 | into a single monolithic cycle. | |
cde52728 | 208 | As in the symbol table abstraction that is placed |
443c8968 PK |
209 | in one routine, |
210 | it is impossible to distinguish which members of the cycle are | |
211 | responsible for the execution time. | |
c962f264 | 212 | Unfortunately there are no easy modifications to these programs that |
443c8968 PK |
213 | make them amenable to analysis. |
214 | .pp | |
215 | A completely different use of the profiler is to analyze the control | |
216 | flow of an unfamiliar program. | |
217 | If you receive a program from another user that you need to modify | |
218 | in some small way, | |
219 | it is often unclear where the changes need to be made. | |
220 | By running the program on an example and then using \fBgprof\fR, | |
221 | you can get a view of the structure of the program. | |
ceba7594 | 222 | .pp |
443c8968 PK |
223 | Consider an example in which you need to change the output format |
224 | of the program. | |
225 | For purposes of this example suppose that the call graph | |
226 | of the output portion of the program has the following structure: | |
227 | .(b | |
73772756 | 228 | .so pres2.pic |
443c8968 PK |
229 | .)b |
230 | Initially you look through the \fBgprof\fR | |
627116d0 | 231 | output for the system call ``\s-1WRITE\s+1''. |
443c8968 | 232 | The format routine you will need to change is probably |
627116d0 | 233 | among the parents of the ``\s-1WRITE\s+1'' procedure. |
443c8968 | 234 | The next step is to look at the profile entry for each |
627116d0 PK |
235 | of parents of ``\s-1WRITE\s+1'', |
236 | in this example either ``\s-1FORMAT1\s+1'' or ``\s-1FORMAT2\s+1'', | |
443c8968 PK |
237 | to determine which one to change. |
238 | Each format routine will have one or more parents, | |
627116d0 | 239 | in this example ``\s-1CALC1\s+1'', ``\s-1CALC2\s+1'', and ``\s-1CALC3\s+1''. |
443c8968 PK |
240 | By inspecting the source code for each of these routines |
241 | you can determine which format routine generates the output that | |
242 | you wish to modify. | |
cde52728 | 243 | Since the \fBgprof\fR entry shows all the |
443c8968 PK |
244 | potential calls to the format routine you intend to change, |
245 | you can determine if your modifications will affect output that | |
246 | should be left alone. | |
627116d0 PK |
247 | If you desire to change the output of ``\s-1CALC2\s+1'', but not ``\s-1CALC3\s+1'', |
248 | then formatting routine ``\s-1FORMAT2\s+1'' needs to be split | |
443c8968 PK |
249 | into two separate routines, |
250 | one of which implements the new format. | |
627116d0 | 251 | You can then retarget just the call by ``\s-1CALC2\s+1'' |
443c8968 PK |
252 | that needs the new format. |
253 | It should be noted that the static call information is particularly | |
cde52728 | 254 | useful here since the test case you run probably will not |
443c8968 PK |
255 | exercise the entire program. |
256 | .sh 1 "Conclusions" | |
257 | .pp | |
258 | We have created a profiler that aids in the evaluation | |
259 | of modular programs. | |
260 | For each routine in the program, | |
261 | the profile shows the extent to which that routine | |
262 | helps support various abstractions, | |
263 | and how that routine uses other abstractions. | |
264 | The profile accurately assesses the cost of routines | |
265 | at all levels of the program decomposition. | |
266 | The profiler is easily used, | |
cde52728 PK |
267 | and can be compiled into the program without any prior planning by |
268 | the programmer. | |
bfb7a486 | 269 | It adds only five to thirty percent execution overhead to the program |
ceba7594 | 270 | being profiled, |
bfb7a486 KM |
271 | produces no additional output until after the program finishes, |
272 | and allows the program to be measured in its actual environment. | |
443c8968 | 273 | Finally, the profiler runs on a time-sharing system |
6b7f5073 KM |
274 | using only the normal services provided by the operating system |
275 | and compilers. |