Commit | Line | Data |
---|---|---|
602c58f5 | 1 | \" @(#)gathering.me 1.3 %G% |
9107ec01 PK |
2 | .sh 1 "Gathering Profile Data" |
3 | .pp | |
4 | Routine calls or statement executions can be measured by having a | |
5 | compiler augment the code at strategic points. | |
6 | The additions can be inline increments to counters [Knuth71] | |
7 | [Satterthwaite72] [Joy79] or calls to | |
8 | monitoring routines [Unix]. | |
9 | The counter increment overhead is low, and is suitable for | |
602c58f5 | 10 | profiling statements. |
ed1b79b0 PK |
11 | A call of the monitoring routine has an overhead comparable with a |
12 | call of a regular routine, and is therefore only suited | |
602c58f5 | 13 | to profiling on a routine by routine basis. |
9107ec01 PK |
14 | However, the monitoring routine solution has certain advantages. |
15 | Whatever counters are needed by the monitoring routine can be | |
16 | managed by the monitoring routine itself, rather than being | |
17 | distributed around the code. | |
ed1b79b0 PK |
18 | In particular, a monitoring routine can easily be called from separately |
19 | compiled programs. | |
9107ec01 PK |
20 | In addition, different monitoring routines can be linked into the |
21 | program to assemble different profiling data without having to | |
22 | change the compiler or recompile the program. | |
23 | We have exploited this approach; | |
24 | our compilers for C, Fortran77, and Pascal can insert calls to a | |
25 | monitoring routine in the prologue for each routine. | |
26 | .pp | |
27 | We are interested in gathering three pieces of information during | |
28 | program execution: call counts and execution times for | |
29 | each profiled routine, and the arcs of the dynamic call graph | |
30 | traversed by this execution of the program. | |
31 | By post-processing of this data we can construct the dynamic call | |
32 | graph for this execution of the program and propagate times along | |
33 | the edges of this graph to attribute times for routines to the | |
34 | routines that invoke them. | |
35 | .pp | |
602c58f5 | 36 | Gathering of the profiling information should not greatly |
9107ec01 PK |
37 | interfere with the running of the program. |
38 | Thus, the monitoring routine must not produce trace output each | |
39 | time it is invoked. | |
40 | The volume of data thus produced would be unmanageably large, | |
41 | and the time required to record it would overwhelm the running | |
42 | time of most programs. | |
43 | Similarly, the monitoring routine can not perform the analysis of | |
44 | the profiling data (e.g. assembling the call graph, propagating | |
45 | times around it, discovering cycles, etc.) during program | |
46 | execution. | |
47 | Our solution is to gather profiling data in memory during program | |
48 | execution and to condense it to a file as the profiled | |
49 | program exits. | |
50 | This file is then processed by a separate program to produce the | |
51 | listing of the profile data. | |
52 | An advantage of this approach is that the profile data for | |
53 | several executions of a program can be combined by the | |
54 | post-processing to provide a profile of a large number of | |
ed1b79b0 | 55 | executions. |
9107ec01 PK |
56 | .pp |
57 | The execution time monitoring consists of three parts. | |
58 | The first part allocates and initializes the runtime monitoring data | |
59 | structures before the program begins execution. | |
9107ec01 PK |
60 | The second part is the monitoring routine invoked from the |
61 | prologue of each profiled routine. | |
ed1b79b0 PK |
62 | The third part condenses the data structures and writes them |
63 | to a file as the program terminates. | |
9107ec01 PK |
64 | The monitoring routine is discussed in detail in the following sections. |
65 | .sh 2 "Execution Counts" | |
66 | .pp | |
602c58f5 PK |
67 | The \fBgprof\fP monitoring routine counts the number of times |
68 | each profiled routine is called. | |
9107ec01 PK |
69 | The monitoring routine also records the arc in the call graph |
70 | that activated the profiled routine. | |
71 | In fact, the count is associated with the arc in the call graph | |
72 | rather than with the routine. | |
73 | Call counts for routines can then be determined by summing the counts | |
74 | on arcs directed into that routine. | |
75 | In a machine-dependent fashion, the monitoring routine notes its | |
76 | own return address. | |
77 | This address is in the prologue of some profiled routine that is | |
602c58f5 | 78 | the destination of an arc in the dynamic call graph. |
9107ec01 | 79 | The monitoring routine also discovers the return address for that |
602c58f5 PK |
80 | routine, thus identifying the call site, or source of the arc. |
81 | The source of the arc is in the \fIcaller\fP, and the destination is in | |
9107ec01 PK |
82 | the \fIcallee\fP. |
83 | For example, if a routine A calls a routine B, A is the caller, | |
84 | and B is the callee. | |
85 | The prologue of B will include a call to the monitoring routine | |
86 | that will note the arc from A to B and either initialize or | |
87 | increment a counter for that arc. | |
88 | .pp | |
89 | One can not afford to have the monitoring routine output tracing | |
90 | information as each arc is identified. | |
91 | Therefore, the monitoring routine maintains a table of all the | |
92 | arcs discovered, with counts of the numbers of times each is | |
93 | traversed during execution. | |
94 | This table is accessed once per routine call, so access to it | |
95 | must be as fast as possible so as not to overwhelm the time | |
96 | required to execute the program. | |
97 | .pp | |
98 | Our solution is to access the table through a hash table. | |
99 | We use the call site as the primary key with the callee | |
100 | address being the secondary key. | |
101 | Since each call site typically calls only one callee, we can | |
102 | reduce (usually to one) the number of minor lookups based on the callee. | |
ed1b79b0 | 103 | Another alternative would use the callee as the primary key and the |
9107ec01 PK |
104 | call site as the secondary key. |
105 | Such an organization has the advantage of associating callers with | |
106 | callees, at the expense of longer lookups in the monitoring | |
107 | routine. | |
108 | We are fortunate to be running in a virtual memory environment, | |
109 | and (for the sake of speed) were able to allocate enough space | |
110 | for the primary hash table to allow a one-to-one mapping from | |
111 | call site addresses to the primary hash table. | |
112 | Thus our hash function was trivial to calculate and collisions | |
113 | occurred only for call sites that called to multiple | |
114 | destinations (e.g. functional parameters and variables). | |
115 | A one level hash function using both call site and callee would | |
116 | result in an unreasonably large hash table. | |
117 | Further, the number of call sites and callees is not known during | |
118 | execution of the profiled program. | |
119 | .pp | |
120 | Not all callers and callees can be identified by the monitoring | |
121 | routine. | |
122 | Routines that were compiled without the profiling augmentations | |
123 | will not call the monitoring routine as part of their prologue, | |
ed1b79b0 | 124 | and thus no arcs will be recorded whose heads are in these |
9107ec01 PK |
125 | routines. |
126 | One need not profile all the routines in a program. | |
127 | Routines that are not profiled run at full speed. | |
128 | Certain routines, notably exception handlers, are invoked by | |
129 | non-standard calling sequences. | |
ed1b79b0 PK |
130 | Thus the monitoring routine may know the head of an arc, |
131 | but find it difficult or | |
9107ec01 PK |
132 | impossible to determine the tail of the arc. |
133 | Often in these cases the apparent tail of the arc is not a call | |
134 | site at all. | |
ed1b79b0 | 135 | Such anomalous invocations are declared ``spontaneous''. |
9107ec01 PK |
136 | .sh 2 "Execution Times" |
137 | .pp | |
138 | The execution times for routines can be gathered in at least two | |
139 | ways. | |
140 | One method measures the execution time of a routine by measuring | |
141 | the elapsed time from routine entry to routine exit. | |
ed1b79b0 | 142 | Unfortunately, time measurement is complicated on time-sharing |
602c58f5 | 143 | systems by the time-slicing of the program. |
9107ec01 PK |
144 | A second method records the value of the program counter at some |
145 | interval, and infers execution time from the distribution of the | |
146 | samples within the program. | |
ed1b79b0 | 147 | This technique is particularly suited to time-sharing systems, |
602c58f5 | 148 | where the time-slicing can serve as the basis for sampling |
9107ec01 PK |
149 | the program counter. |
150 | Notice that, whereas the first method could provide exact timings, | |
151 | the second is inherently a statistical approximation. | |
152 | .pp | |
153 | The sampling method need not require support from the operating | |
154 | system: all that is needed is the ability to set and respond to | |
155 | ``alarm clock'' interrupts that run relative to program time. | |
156 | It is imperative that the intervals be uniform since the | |
157 | sampling of the program counter rather than the duration of the | |
158 | interval is the basis of the distribution. | |
159 | If sampling is done too often, the interruptions to sample the | |
160 | program counter will overwhelm the running of the profiled program. | |
161 | On the other hand, the program must run for enough sampled | |
162 | intervals that the distribution of the samples accurately | |
163 | represents the distribution of time for the execution of the | |
164 | program. | |
165 | As with routine call tracing, the monitoring routine can not | |
166 | afford to output information for each program counter | |
167 | sample. | |
168 | In our computing environment, the operating system can provide a | |
169 | histogram of the location of the program counter at the end of | |
170 | each clock tick (1/60th of a second) in which a program runs. | |
171 | The histogram is assembled in memory as the program runs. | |
172 | This facility is enabled by our monitoring routine. | |
173 | We have adjusted the granularity of the histogram so that | |
174 | program counter values map one-to-one onto the histogram. | |
602c58f5 PK |
175 | We make the simplifying assumption that all calls to a specific |
176 | routine require the same amount of time to execute. | |
177 | This assumption may disguise the fact that some calls | |
178 | (or worse, some call sites) always invoke a routine | |
179 | such that its execution is faster (or slower) | |
180 | than the average time for that routine. | |
9107ec01 PK |
181 | .pp |
182 | As the profiled program terminates, | |
183 | the arc table and the histogram of | |
184 | program counter samples is written to a file. | |
185 | The arc table is condensed to consist of the head and tail | |
186 | addresses of the arc and the count of the number of times the arc | |
187 | was traversed by this execution of the program. | |
188 | The recorded histogram consists of counters of the number of | |
189 | times the program counter was found to be in each of the ranges covered | |
190 | by the histogram. | |
ed1b79b0 PK |
191 | The ranges themselves are summarized as a |
192 | lower and upper bound and a step size. |