Commit | Line | Data |
---|---|---|
cde52728 | 1 | \" @(#)present.me 1.6 %G% |
443c8968 PK |
2 | .sh 1 "Data Presentation" |
3 | .pp | |
4 | The data is presented to the user in two different formats. | |
5 | The first presentation simply lists the routines | |
6 | without regard to the amount of time their descendants use. | |
7 | The second presentation incorporates the call graph of the | |
8 | program. | |
9 | .sh 2 "The Flat Profile | |
10 | .pp | |
11 | The flat profile consists of a list of all the routines | |
12 | that are called during execution of the program, | |
13 | with the count of the number of times they are called | |
14 | and the number of seconds of execution time for which they | |
15 | are themselves accountable. | |
16 | The routines are listed in decreasing order of execution time. | |
17 | A list of the routines that are never called during execution of | |
18 | the program is also available | |
19 | to verify that nothing important is omitted by | |
20 | this execution. | |
21 | The flat profile gives a quick overview of the routines that are used, | |
22 | and shows the routines that are themselves responsible | |
23 | for large fractions of the execution time. | |
24 | In practice, | |
cde52728 | 25 | this profile usually shows that no single function |
443c8968 PK |
26 | is overwhelmingly responsible for |
27 | the total time of the program. | |
28 | Notice that for this profile, | |
29 | the individual times sum to the total execution time. | |
30 | .sh 2 "The Call Graph Profile" | |
31 | .pp | |
32 | Ideally, we would like to print the call graph of the program, | |
33 | but we are limited by the two-dimensional nature of our output | |
34 | devices. | |
35 | We cannot assume that a call graph is planar, | |
36 | and even if it is, that we can print a planar version of it. | |
37 | Instead, we choose to print each routine, | |
38 | together with information about | |
39 | the routines that are its direct parents and children. | |
40 | This listing presents a window into the call graph. | |
41 | Based on our experience, | |
42 | both parent information and child information | |
43 | is important, | |
44 | and should be available without searching | |
45 | through the output. | |
46 | .pp | |
47 | The major entries of the call graph profile are the entries from the | |
48 | flat profile, augmented by the time propagated to each | |
49 | routine from its descendants. | |
50 | This profile is sorted by the sum of the time for the routine | |
51 | itself plus the time inherited from its descendants. | |
52 | The profile shows which of the higher level routines | |
53 | spend large portions of the total execution time | |
54 | in the routines that they call. | |
55 | For each routine, we show the amount of time passed by each child | |
56 | to the routine, which includes time for the child itself | |
57 | and for the descendants of the child | |
58 | (and thus the descendants of the routine). | |
59 | We also show the percentage these times represent of the total time | |
60 | accounted to the child. | |
61 | Similarly, the parents of each routine are listed, | |
62 | along with time, | |
63 | and percentage of total routine time, | |
64 | propagated to each one. | |
65 | .pp | |
66 | Cycles are handled as single entities. | |
67 | The cycle as a whole is shown as though it were a single routine, | |
68 | except that members of the cycle are listed in place of the children. | |
69 | Although the number of calls of each member | |
70 | from within the cycle are shown, | |
71 | they do not affect time propagation. | |
72 | When a child is a member of a cycle, | |
73 | the time shown is the appropriate fraction of the time | |
74 | for the whole cycle. | |
75 | Self-recursive routines have their calls broken | |
c962f264 | 76 | down into calls from the outside and self-recursive calls. |
443c8968 PK |
77 | Only the outside calls affect the propagation of time. |
78 | .pp | |
ceba7594 | 79 | The following example is a typical fragment of a call graph. |
443c8968 PK |
80 | .(b |
81 | .TS | |
82 | center; | |
83 | c c c. | |
ceba7594 | 84 | |
443c8968 PK |
85 | Caller1 Caller2 |
86 | ||
ceba7594 | 87 | |
443c8968 PK |
88 | Example |
89 | ||
ceba7594 | 90 | |
443c8968 | 91 | Sub1 Sub2 Sub3 |
ceba7594 | 92 | |
443c8968 PK |
93 | .TE |
94 | .)b | |
ceba7594 | 95 | This example would have the following form of entry |
443c8968 | 96 | in the profile listing. |
6b7f5073 | 97 | .sz 10 |
443c8968 PK |
98 | .(b |
99 | .TS | |
100 | box center; | |
101 | c c c c c l l | |
102 | c c c c c l l | |
103 | c c c c c l l | |
104 | l n n n c l l. | |
105 | called/total \ \ parents | |
106 | index %time self descendants called+self name index | |
107 | called/total \ \ children | |
108 | _ | |
109 | 0.20 1.20 4/10 \ \ Caller1 [7] | |
110 | 0.30 1.80 6/10 \ \ Caller2 [1] | |
111 | [2] 41.5 0.50 3.00 10+4 Example [2] | |
112 | 1.50 1.00 20/40 \ \ Sub1 <cycle1> [4] | |
113 | 0.00 0.50 1/5 \ \ Sub2 [9] | |
114 | 0.00 0.00 0/5 \ \ Sub3 [11] | |
115 | .TE | |
116 | .)b | |
6b7f5073 | 117 | .sz 12 |
443c8968 PK |
118 | .pp |
119 | The entry is for routine Example, which has | |
120 | the Caller routines as its parents, | |
121 | and the Sub routines as its children. | |
122 | The reader should keep in mind that all information | |
123 | is given \fIwith respect to Example\fP. | |
cde52728 | 124 | The index in the first column shows that Example |
443c8968 PK |
125 | is the second entry in the profile listing. |
126 | The Example routine is called ten times, four times by Caller1, | |
127 | and six times by Caller2. | |
128 | Consequently 40% of Example's time is propagated to Caller1, | |
129 | and 60% of Example's time is propagated to Caller2. | |
130 | The self and descendant fields of the parents | |
131 | show the amount of self and descendant time Example | |
132 | propagates to them (but not the time used by | |
133 | the parents directly). | |
134 | Note that Example calls itself recursively four times. | |
135 | The routine Example calls routine Sub1 twenty times, Sub2 once, | |
136 | and never calls Sub3. | |
137 | Since Sub2 is called a total of five times, | |
138 | 20% of its self and descendant time is propagated to Example's | |
139 | descendant time field. | |
140 | Because Sub1 is a member of \fIcycle 1\fR, | |
141 | the self and descendant times are those for the cycle as a whole. | |
142 | Since Sub1 is called a total of forty times from outside | |
143 | of its cycle, it propagates 50% of the cycle's self and descendant | |
144 | time to Example's descendant time field. | |
cde52728 | 145 | Finally each name is followed by an index that shows |
443c8968 PK |
146 | where on the listing to find the entry for that routine. |
147 | .sh 1 "Using the Profiles" | |
148 | .pp | |
149 | The profiler is a useful tool for improving | |
150 | a set of routines that implement an abstraction. | |
151 | It can be helpful in identifying poorly coded routines, | |
152 | and in evaluating the new algorithms and code that replace them. | |
153 | Taking full advantage of the profiler | |
154 | requires a careful examination of the call graph profile, | |
155 | and a thorough knowledge of the abstractions underlying | |
156 | the program. | |
157 | .pp | |
cde52728 | 158 | The easiest optimization that can be performed |
443c8968 PK |
159 | is a small change |
160 | to a control construct or data structure that improves the | |
161 | running time of the program. | |
162 | An obvious starting point | |
163 | is a routine that is called many times. | |
164 | For example, suppose an output | |
165 | routine is the only parent | |
166 | of a routine that formats the data. | |
167 | If this format routine is expanded inline in the | |
168 | output routine, the overhead of a function call and | |
169 | return can be saved for each datum that needs to be formatted. | |
170 | .pp | |
171 | The drawback to inline expansion is that the data abstractions | |
172 | in the program may become less parameterized, | |
173 | hence less clearly defined. | |
174 | The profiling will also become less useful since the loss of | |
175 | routines will make its output more granular. | |
176 | For example, | |
177 | if the symbol table functions ``lookup'', ``insert'', and ``delete'' | |
178 | are all merged into a single parameterized routine, | |
179 | it will be impossible to determine the costs | |
180 | of any one of these individual functions from the profile. | |
181 | .pp | |
182 | Further potential for optimization lies in routines that | |
183 | implement data abstractions whose total execution | |
184 | time is long. | |
185 | For example, a lookup routine might be called only a few | |
186 | times, but use an inefficient linear search algorithm, | |
187 | that might be replaced with a binary search. | |
188 | Alternately, the discovery that a rehashing function is being | |
189 | called excessively, can lead to a different | |
190 | hash function or a larger hash table. | |
191 | If the data abstraction function cannot easily be speeded up, | |
192 | it may be advantageous to cache its results, | |
193 | and eliminate the need to rerun | |
194 | it for identical inputs. | |
195 | .pp | |
c962f264 | 196 | This tool is best used in an iterative approach: |
443c8968 PK |
197 | profiling the program, |
198 | eliminating one bottleneck, | |
199 | then finding some other part of the program | |
200 | that begins to dominate execution time. | |
201 | For instance, we have used \fBgprof\fR on itself; | |
202 | eliminating, rewriting, and inline expanding routines, | |
203 | until reading | |
204 | data files (hardly a target for optimization!) | |
205 | represents the dominating factor in its execution time. | |
206 | .pp | |
207 | Certain types of programs are not easily analyzed by \fBgprof\fR. | |
208 | They are typified by programs that exhibit a large degree of | |
209 | recursion, such as recursive descent compilers. | |
210 | The problem is that most of the major routines are grouped | |
211 | into a single monolithic cycle. | |
cde52728 | 212 | As in the symbol table abstraction that is placed |
443c8968 PK |
213 | in one routine, |
214 | it is impossible to distinguish which members of the cycle are | |
215 | responsible for the execution time. | |
c962f264 | 216 | Unfortunately there are no easy modifications to these programs that |
443c8968 PK |
217 | make them amenable to analysis. |
218 | .pp | |
219 | A completely different use of the profiler is to analyze the control | |
220 | flow of an unfamiliar program. | |
221 | If you receive a program from another user that you need to modify | |
222 | in some small way, | |
223 | it is often unclear where the changes need to be made. | |
224 | By running the program on an example and then using \fBgprof\fR, | |
225 | you can get a view of the structure of the program. | |
ceba7594 | 226 | .pp |
443c8968 PK |
227 | Consider an example in which you need to change the output format |
228 | of the program. | |
229 | For purposes of this example suppose that the call graph | |
230 | of the output portion of the program has the following structure: | |
231 | .(b | |
232 | .TS | |
233 | center; | |
234 | c c c. | |
ceba7594 | 235 | |
443c8968 PK |
236 | calc1 calc2 calc3 |
237 | ||
ceba7594 | 238 | |
443c8968 PK |
239 | format1 format2 |
240 | ||
ceba7594 | 241 | |
443c8968 | 242 | ``write'' |
ceba7594 | 243 | |
443c8968 PK |
244 | .TE |
245 | .)b | |
246 | Initially you look through the \fBgprof\fR | |
247 | output for the system call ``write''. | |
248 | The format routine you will need to change is probably | |
249 | among the parents of the ``write'' procedure. | |
250 | The next step is to look at the profile entry for each | |
251 | of parents of ``write'', | |
252 | in this example either ``format1'' or ``format2'', | |
253 | to determine which one to change. | |
254 | Each format routine will have one or more parents, | |
255 | in this example ``calc1'', ``calc2'', and ``calc3''. | |
256 | By inspecting the source code for each of these routines | |
257 | you can determine which format routine generates the output that | |
258 | you wish to modify. | |
cde52728 | 259 | Since the \fBgprof\fR entry shows all the |
443c8968 PK |
260 | potential calls to the format routine you intend to change, |
261 | you can determine if your modifications will affect output that | |
262 | should be left alone. | |
263 | If you desire to change the output of ``calc2'', but not ``calc3'', | |
264 | then formatting routine ``format2'' needs to be split | |
265 | into two separate routines, | |
266 | one of which implements the new format. | |
267 | You can then retarget just the call by ``calc2'' | |
268 | that needs the new format. | |
269 | It should be noted that the static call information is particularly | |
cde52728 | 270 | useful here since the test case you run probably will not |
443c8968 PK |
271 | exercise the entire program. |
272 | .sh 1 "Conclusions" | |
273 | .pp | |
274 | We have created a profiler that aids in the evaluation | |
275 | of modular programs. | |
276 | For each routine in the program, | |
277 | the profile shows the extent to which that routine | |
278 | helps support various abstractions, | |
279 | and how that routine uses other abstractions. | |
280 | The profile accurately assesses the cost of routines | |
281 | at all levels of the program decomposition. | |
282 | The profiler is easily used, | |
cde52728 PK |
283 | and can be compiled into the program without any prior planning by |
284 | the programmer. | |
bfb7a486 | 285 | It adds only five to thirty percent execution overhead to the program |
ceba7594 | 286 | being profiled, |
bfb7a486 KM |
287 | produces no additional output until after the program finishes, |
288 | and allows the program to be measured in its actual environment. | |
443c8968 | 289 | Finally, the profiler runs on a time-sharing system |
6b7f5073 KM |
290 | using only the normal services provided by the operating system |
291 | and compilers. |