Commit | Line | Data |
---|---|---|
607a525f | 1 | \" @(#)gathering.me 1.7 %G% |
9107ec01 PK |
2 | .sh 1 "Gathering Profile Data" |
3 | .pp | |
4 | Routine calls or statement executions can be measured by having a | |
5 | compiler augment the code at strategic points. | |
6 | The additions can be inline increments to counters [Knuth71] | |
7 | [Satterthwaite72] [Joy79] or calls to | |
8 | monitoring routines [Unix]. | |
9 | The counter increment overhead is low, and is suitable for | |
602c58f5 | 10 | profiling statements. |
ed1b79b0 PK |
11 | A call of the monitoring routine has an overhead comparable with a |
12 | call of a regular routine, and is therefore only suited | |
602c58f5 | 13 | to profiling on a routine by routine basis. |
9107ec01 PK |
14 | However, the monitoring routine solution has certain advantages. |
15 | Whatever counters are needed by the monitoring routine can be | |
16 | managed by the monitoring routine itself, rather than being | |
17 | distributed around the code. | |
ed1b79b0 PK |
18 | In particular, a monitoring routine can easily be called from separately |
19 | compiled programs. | |
9107ec01 | 20 | In addition, different monitoring routines can be linked into the |
607a525f PK |
21 | program |
22 | being measured | |
23 | to assemble different profiling data without having to | |
9107ec01 PK |
24 | change the compiler or recompile the program. |
25 | We have exploited this approach; | |
26 | our compilers for C, Fortran77, and Pascal can insert calls to a | |
27 | monitoring routine in the prologue for each routine. | |
130ed122 PK |
28 | Use of the monitoring routine requires no planning on part of a |
29 | programmer other than to request that augmented routine | |
30 | prologues be produced during compilation. | |
9107ec01 PK |
31 | .pp |
32 | We are interested in gathering three pieces of information during | |
33 | program execution: call counts and execution times for | |
34 | each profiled routine, and the arcs of the dynamic call graph | |
35 | traversed by this execution of the program. | |
bded3f9b | 36 | By post-processing of this data we can build the dynamic call |
9107ec01 PK |
37 | graph for this execution of the program and propagate times along |
38 | the edges of this graph to attribute times for routines to the | |
39 | routines that invoke them. | |
40 | .pp | |
602c58f5 | 41 | Gathering of the profiling information should not greatly |
9107ec01 PK |
42 | interfere with the running of the program. |
43 | Thus, the monitoring routine must not produce trace output each | |
44 | time it is invoked. | |
45 | The volume of data thus produced would be unmanageably large, | |
46 | and the time required to record it would overwhelm the running | |
47 | time of most programs. | |
bded3f9b | 48 | Similarly, the monitoring routine can not do the analysis of |
9107ec01 PK |
49 | the profiling data (e.g. assembling the call graph, propagating |
50 | times around it, discovering cycles, etc.) during program | |
51 | execution. | |
52 | Our solution is to gather profiling data in memory during program | |
53 | execution and to condense it to a file as the profiled | |
54 | program exits. | |
55 | This file is then processed by a separate program to produce the | |
56 | listing of the profile data. | |
57 | An advantage of this approach is that the profile data for | |
58 | several executions of a program can be combined by the | |
bded3f9b | 59 | post-processing to provide a profile of many |
ed1b79b0 | 60 | executions. |
9107ec01 PK |
61 | .pp |
62 | The execution time monitoring consists of three parts. | |
63 | The first part allocates and initializes the runtime monitoring data | |
64 | structures before the program begins execution. | |
9107ec01 PK |
65 | The second part is the monitoring routine invoked from the |
66 | prologue of each profiled routine. | |
ed1b79b0 PK |
67 | The third part condenses the data structures and writes them |
68 | to a file as the program terminates. | |
9107ec01 PK |
69 | The monitoring routine is discussed in detail in the following sections. |
70 | .sh 2 "Execution Counts" | |
71 | .pp | |
602c58f5 PK |
72 | The \fBgprof\fP monitoring routine counts the number of times |
73 | each profiled routine is called. | |
9107ec01 PK |
74 | The monitoring routine also records the arc in the call graph |
75 | that activated the profiled routine. | |
bded3f9b | 76 | The count is associated with the arc in the call graph |
9107ec01 PK |
77 | rather than with the routine. |
78 | Call counts for routines can then be determined by summing the counts | |
79 | on arcs directed into that routine. | |
80 | In a machine-dependent fashion, the monitoring routine notes its | |
81 | own return address. | |
82 | This address is in the prologue of some profiled routine that is | |
602c58f5 | 83 | the destination of an arc in the dynamic call graph. |
9107ec01 | 84 | The monitoring routine also discovers the return address for that |
602c58f5 PK |
85 | routine, thus identifying the call site, or source of the arc. |
86 | The source of the arc is in the \fIcaller\fP, and the destination is in | |
9107ec01 PK |
87 | the \fIcallee\fP. |
88 | For example, if a routine A calls a routine B, A is the caller, | |
89 | and B is the callee. | |
90 | The prologue of B will include a call to the monitoring routine | |
91 | that will note the arc from A to B and either initialize or | |
92 | increment a counter for that arc. | |
93 | .pp | |
94 | One can not afford to have the monitoring routine output tracing | |
95 | information as each arc is identified. | |
96 | Therefore, the monitoring routine maintains a table of all the | |
97 | arcs discovered, with counts of the numbers of times each is | |
98 | traversed during execution. | |
607a525f PK |
99 | This table is accessed once per routine call. |
100 | Access to it | |
9107ec01 PK |
101 | must be as fast as possible so as not to overwhelm the time |
102 | required to execute the program. | |
103 | .pp | |
104 | Our solution is to access the table through a hash table. | |
105 | We use the call site as the primary key with the callee | |
106 | address being the secondary key. | |
107 | Since each call site typically calls only one callee, we can | |
108 | reduce (usually to one) the number of minor lookups based on the callee. | |
ed1b79b0 | 109 | Another alternative would use the callee as the primary key and the |
9107ec01 PK |
110 | call site as the secondary key. |
111 | Such an organization has the advantage of associating callers with | |
112 | callees, at the expense of longer lookups in the monitoring | |
113 | routine. | |
114 | We are fortunate to be running in a virtual memory environment, | |
115 | and (for the sake of speed) were able to allocate enough space | |
116 | for the primary hash table to allow a one-to-one mapping from | |
117 | call site addresses to the primary hash table. | |
607a525f PK |
118 | Thus our hash function is trivial to calculate and collisions |
119 | occur only for call sites that call multiple | |
120 | destinations (e.g. functional parameters and functional variables). | |
9107ec01 PK |
121 | A one level hash function using both call site and callee would |
122 | result in an unreasonably large hash table. | |
0020feb2 | 123 | Further, the number of dynamic call sites and callees is not known during |
9107ec01 PK |
124 | execution of the profiled program. |
125 | .pp | |
126 | Not all callers and callees can be identified by the monitoring | |
127 | routine. | |
128 | Routines that were compiled without the profiling augmentations | |
129 | will not call the monitoring routine as part of their prologue, | |
607a525f | 130 | and thus no arcs will be recorded whose destinations are in these |
9107ec01 PK |
131 | routines. |
132 | One need not profile all the routines in a program. | |
133 | Routines that are not profiled run at full speed. | |
134 | Certain routines, notably exception handlers, are invoked by | |
135 | non-standard calling sequences. | |
607a525f PK |
136 | Thus the monitoring routine may know the destination of an arc |
137 | (the callee), | |
ed1b79b0 | 138 | but find it difficult or |
607a525f PK |
139 | impossible to determine the source of the arc (the caller). |
140 | Often in these cases the apparent source of the arc is not a call | |
9107ec01 | 141 | site at all. |
ed1b79b0 | 142 | Such anomalous invocations are declared ``spontaneous''. |
9107ec01 PK |
143 | .sh 2 "Execution Times" |
144 | .pp | |
145 | The execution times for routines can be gathered in at least two | |
146 | ways. | |
147 | One method measures the execution time of a routine by measuring | |
148 | the elapsed time from routine entry to routine exit. | |
ed1b79b0 | 149 | Unfortunately, time measurement is complicated on time-sharing |
602c58f5 | 150 | systems by the time-slicing of the program. |
0020feb2 | 151 | A second method samples the value of the program counter at some |
9107ec01 PK |
152 | interval, and infers execution time from the distribution of the |
153 | samples within the program. | |
ed1b79b0 | 154 | This technique is particularly suited to time-sharing systems, |
602c58f5 | 155 | where the time-slicing can serve as the basis for sampling |
9107ec01 PK |
156 | the program counter. |
157 | Notice that, whereas the first method could provide exact timings, | |
158 | the second is inherently a statistical approximation. | |
159 | .pp | |
160 | The sampling method need not require support from the operating | |
161 | system: all that is needed is the ability to set and respond to | |
162 | ``alarm clock'' interrupts that run relative to program time. | |
163 | It is imperative that the intervals be uniform since the | |
164 | sampling of the program counter rather than the duration of the | |
165 | interval is the basis of the distribution. | |
166 | If sampling is done too often, the interruptions to sample the | |
167 | program counter will overwhelm the running of the profiled program. | |
168 | On the other hand, the program must run for enough sampled | |
169 | intervals that the distribution of the samples accurately | |
170 | represents the distribution of time for the execution of the | |
171 | program. | |
172 | As with routine call tracing, the monitoring routine can not | |
173 | afford to output information for each program counter | |
174 | sample. | |
175 | In our computing environment, the operating system can provide a | |
176 | histogram of the location of the program counter at the end of | |
177 | each clock tick (1/60th of a second) in which a program runs. | |
178 | The histogram is assembled in memory as the program runs. | |
179 | This facility is enabled by our monitoring routine. | |
180 | We have adjusted the granularity of the histogram so that | |
181 | program counter values map one-to-one onto the histogram. | |
602c58f5 PK |
182 | We make the simplifying assumption that all calls to a specific |
183 | routine require the same amount of time to execute. | |
bded3f9b | 184 | This assumption may disguise that some calls |
602c58f5 PK |
185 | (or worse, some call sites) always invoke a routine |
186 | such that its execution is faster (or slower) | |
187 | than the average time for that routine. | |
9107ec01 | 188 | .pp |
607a525f | 189 | When the profiled program terminates, |
9107ec01 PK |
190 | the arc table and the histogram of |
191 | program counter samples is written to a file. | |
607a525f | 192 | The arc table is condensed to consist of the source and destination |
9107ec01 PK |
193 | addresses of the arc and the count of the number of times the arc |
194 | was traversed by this execution of the program. | |
195 | The recorded histogram consists of counters of the number of | |
196 | times the program counter was found to be in each of the ranges covered | |
197 | by the histogram. | |
ed1b79b0 PK |
198 | The ranges themselves are summarized as a |
199 | lower and upper bound and a step size. |