602c58f5 1\" @(#)gathering.me 1.3 %G%
2.sh 1 "Gathering Profile Data"
3.pp
4Routine calls or statement executions can be measured by having a
5compiler augment the code at strategic points.
6The additions can be inline increments to counters [Knuth71]
7[Satterthwaite72] [Joy79] or calls to
8monitoring routines [Unix].
9The counter increment overhead is low, and is suitable for
602c58f5 10profiling statements.
ed1b79b0
PK
11A call of the monitoring routine has an overhead comparable with a
12call of a regular routine, and is therefore only suited
602c58f5 13to profiling on a routine by routine basis.
9107ec01
PK
14However, the monitoring routine solution has certain advantages.
15Whatever counters are needed by the monitoring routine can be
16managed by the monitoring routine itself, rather than being
17distributed around the code.
ed1b79b0
PK
18In particular, a monitoring routine can easily be called from separately
19compiled programs.
9107ec01
PK
20In addition, different monitoring routines can be linked into the
21program to assemble different profiling data without having to
22change the compiler or recompile the program.
23We have exploited this approach;
24our compilers for C, Fortran77, and Pascal can insert calls to a
25monitoring routine in the prologue for each routine.
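.pp
The effect of this augmentation can be sketched in C.
The name \fImcount\fP used below is illustrative only;
the actual entry sequence is machine- and compiler-dependent
and is emitted by the compiler rather than written by the programmer.
.(l
/* A routine compiled with profiling enabled conceptually becomes: */
extern void mcount(void);      /* the monitoring routine */

int
example(int x)
{
        mcount();              /* call inserted in the prologue by the compiler */
        /* ... the body of the routine as written by the programmer ... */
        return x + 1;
}
.)l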
.pp
We are interested in gathering three pieces of information during
program execution: call counts and execution times for
each profiled routine, and the arcs of the dynamic call graph
traversed by this execution of the program.
By post-processing of this data we can construct the dynamic call
graph for this execution of the program and propagate times along
the edges of this graph to attribute times for routines to the
routines that invoke them.
.pp
Gathering of the profiling information should not greatly
interfere with the running of the program.
Thus, the monitoring routine must not produce trace output each
time it is invoked.
The volume of data thus produced would be unmanageably large,
and the time required to record it would overwhelm the running
time of most programs.
Similarly, the monitoring routine can not perform the analysis of
the profiling data (e.g. assembling the call graph, propagating
times around it, discovering cycles, etc.) during program
execution.
Our solution is to gather profiling data in memory during program
execution and to condense it to a file as the profiled
program exits.
This file is then processed by a separate program to produce the
listing of the profile data.
An advantage of this approach is that the profile data for
several executions of a program can be combined by the
post-processing to provide a profile of a large number of
executions.
.pp
The execution time monitoring consists of three parts.
The first part allocates and initializes the runtime monitoring data
structures before the program begins execution.
The second part is the monitoring routine invoked from the
prologue of each profiled routine.
The third part condenses the data structures and writes them
to a file as the program terminates.
The monitoring routine is discussed in detail in the following sections.
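.pp
Schematically, and with hypothetical names, the three parts can be
declared as follows; the actual entry points and the name of the data
file are implementation details not specified here.
.(l
/* Part 1: run once before the program begins execution, to allocate
 * and clear the arc table and the program counter histogram.        */
void monstartup(void);

/* Part 2: called from the prologue of every profiled routine;
 * finds (or creates) the arc for the current call and counts it.    */
void mcount(void);

/* Part 3: run as the program terminates; condenses the data
 * structures and writes them to the profile data file.              */
void moncleanup(void);
.)l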
65.sh 2 "Execution Counts"
66.pp
602c58f5
PK
67The \fBgprof\fP monitoring routine counts the number of times
68each profiled routine is called.
9107ec01
PK
69The monitoring routine also records the arc in the call graph
70that activated the profiled routine.
71In fact, the count is associated with the arc in the call graph
72rather than with the routine.
73Call counts for routines can then be determined by summing the counts
74on arcs directed into that routine.
75In a machine-dependent fashion, the monitoring routine notes its
76own return address.
77This address is in the prologue of some profiled routine that is
602c58f5 78the destination of an arc in the dynamic call graph.
9107ec01 79The monitoring routine also discovers the return address for that
602c58f5
PK
80routine, thus identifying the call site, or source of the arc.
81The source of the arc is in the \fIcaller\fP, and the destination is in
9107ec01
PK
82the \fIcallee\fP.
83For example, if a routine A calls a routine B, A is the caller,
84and B is the callee.
85The prologue of B will include a call to the monitoring routine
86that will note the arc from A to B and either initialize or
87increment a counter for that arc.
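.pp
In outline, and with hypothetical helper names, each invocation of the
monitoring routine reduces to the sketch below; how the two addresses
are found is machine-dependent and not shown.
.(l
struct arc {
        unsigned long   from;   /* source of the arc: the call site */
        unsigned long   self;   /* destination of the arc: the callee */
        long            count;  /* times the arc has been traversed */
};

/* Hypothetical helpers; the table they search is described below. */
struct arc *find_arc(unsigned long frompc, unsigned long selfpc);
struct arc *add_arc(unsigned long frompc, unsigned long selfpc);

/* frompc: the callee's return address, i.e. the call site in the caller;
 * selfpc: the monitoring routine's own return address, which lies in the
 *         prologue of the callee.                                        */
void
count_arc(unsigned long frompc, unsigned long selfpc)
{
        struct arc *a;

        a = find_arc(frompc, selfpc);           /* look the arc up */
        if (a == 0)
                a = add_arc(frompc, selfpc);    /* new arc, count of zero */
        a->count++;
}
.)l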
.pp
One can not afford to have the monitoring routine output tracing
information as each arc is identified.
Therefore, the monitoring routine maintains a table of all the
arcs discovered, with counts of the numbers of times each is
traversed during execution.
This table is accessed once per routine call, so access to it
must be as fast as possible so as not to overwhelm the time
required to execute the program.
.pp
Our solution is to access the table through a hash table.
We use the call site as the primary key with the callee
address being the secondary key.
Since each call site typically calls only one callee, we can
reduce (usually to one) the number of minor lookups based on the callee.
Another alternative would use the callee as the primary key and the
call site as the secondary key.
Such an organization has the advantage of associating callers with
callees, at the expense of longer lookups in the monitoring
routine.
We are fortunate to be running in a virtual memory environment,
and (for the sake of speed) were able to allocate enough space
for the primary hash table to allow a one-to-one mapping from
call site addresses to the primary hash table.
Thus our hash function was trivial to calculate and collisions
occurred only for call sites that called to multiple
destinations (e.g. functional parameters and variables).
A one level hash function using both call site and callee would
result in an unreasonably large hash table.
Further, the number of call sites and callees is not known during
execution of the profiled program.
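.pp
Under these assumptions a minimal sketch of the lookup is given below.
The names \fIfroms\fP and \fItos\fP for the primary table and the pool
of arc records, the allocator, and the granularity constant are all
illustrative; the scaling of call site addresses onto the primary table
is machine-dependent.
.(l
#define CALLSITE_GRAIN  4       /* assumed spacing of distinct call sites */

extern unsigned long    lowpc;  /* lowest profiled text address */

unsigned short  *froms;         /* primary table: one slot per call site */
struct tostruct {
        unsigned long   selfpc; /* callee address, the secondary key */
        long            count;  /* traversal count for this arc */
        unsigned short  link;   /* next callee seen from this call site */
} *tos;                         /* pool of arc records; slot 0 unused */

unsigned short next_free_tos(void);     /* hypothetical allocator */

/* Return the counter for the arc frompc -> selfpc, creating it if new. */
long *
arc_counter(unsigned long frompc, unsigned long selfpc)
{
        unsigned short *fromp, toindex;

        fromp = &froms[(frompc - lowpc) / CALLSITE_GRAIN];  /* trivial hash */
        for (toindex = *fromp; toindex != 0; toindex = tos[toindex].link)
                if (tos[toindex].selfpc == selfpc)  /* usually the first */
                        return &tos[toindex].count;
        toindex = next_free_tos();      /* new callee for this call site */
        tos[toindex].selfpc = selfpc;
        tos[toindex].count = 0;
        tos[toindex].link = *fromp;
        *fromp = toindex;
        return &tos[toindex].count;
}
.)l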
.pp
Not all callers and callees can be identified by the monitoring
routine.
Routines that were compiled without the profiling augmentations
will not call the monitoring routine as part of their prologue,
and thus no arcs will be recorded whose heads are in these
routines.
One need not profile all the routines in a program.
Routines that are not profiled run at full speed.
Certain routines, notably exception handlers, are invoked by
non-standard calling sequences.
Thus the monitoring routine may know the head of an arc,
but find it difficult or
impossible to determine the tail of the arc.
Often in these cases the apparent tail of the arc is not a call
site at all.
Such anomalous invocations are declared ``spontaneous''.
136.sh 2 "Execution Times"
137.pp
138The execution times for routines can be gathered in at least two
139ways.
140One method measures the execution time of a routine by measuring
141the elapsed time from routine entry to routine exit.
ed1b79b0 142Unfortunately, time measurement is complicated on time-sharing
602c58f5 143systems by the time-slicing of the program.
9107ec01
PK
144A second method records the value of the program counter at some
145interval, and infers execution time from the distribution of the
146samples within the program.
ed1b79b0 147This technique is particularly suited to time-sharing systems,
602c58f5 148where the time-slicing can serve as the basis for sampling
9107ec01
PK
149the program counter.
150Notice that, whereas the first method could provide exact timings,
151the second is inherently a statistical approximation.
152.pp
153The sampling method need not require support from the operating
154system: all that is needed is the ability to set and respond to
155``alarm clock'' interrupts that run relative to program time.
156It is imperative that the intervals be uniform since the
157sampling of the program counter rather than the duration of the
158interval is the basis of the distribution.
159If sampling is done too often, the interruptions to sample the
160program counter will overwhelm the running of the profiled program.
161On the other hand, the program must run for enough sampled
162intervals that the distribution of the samples accurately
163represents the distribution of time for the execution of the
164program.
165As with routine call tracing, the monitoring routine can not
166afford to output information for each program counter
167sample.
168In our computing environment, the operating system can provide a
169histogram of the location of the program counter at the end of
170each clock tick (1/60th of a second) in which a program runs.
171The histogram is assembled in memory as the program runs.
172This facility is enabled by our monitoring routine.
173We have adjusted the granularity of the histogram so that
174program counter values map one-to-one onto the histogram.
602c58f5
PK
175We make the simplifying assumption that all calls to a specific
176routine require the same amount of time to execute.
177This assumption may disguise the fact that some calls
178(or worse, some call sites) always invoke a routine
179such that its execution is faster (or slower)
180than the average time for that routine.
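.pp
Whatever the source of the samples, the recording step is simply a
histogram update.
The sketch below assumes the profiled text lies between \fIlowpc\fP and
\fIhighpc\fP; those names and the granularity constant are illustrative
only.
.(l
#define HISTOGRAM_GRAIN 1       /* assumed: one cell per program counter value */

unsigned short  *kcount;        /* one counter per histogram cell */
unsigned long   lowpc, highpc;  /* bounds of the profiled text */

/* Credit one clock tick to the cell covering the sampled program counter. */
void
record_sample(unsigned long pc)
{
        if (pc >= lowpc && pc < highpc)
                kcount[(pc - lowpc) / HISTOGRAM_GRAIN]++;
}
.)l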
.pp
As the profiled program terminates,
the arc table and the histogram of
program counter samples are written to a file.
The arc table is condensed to consist of the head and tail
addresses of the arc and the count of the number of times the arc
was traversed by this execution of the program.
The recorded histogram consists of counters of the number of
times the program counter was found to be in each of the ranges covered
by the histogram.
The ranges themselves are summarized as a
lower and upper bound and a step size.
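.pp
A minimal sketch of the records in that file, with hypothetical names
and with no particular byte order or alignment implied, is:
.(l
/* Header summarizing the histogram of program counter samples;
 * it is followed in the file by one counter per range.          */
struct histhdr {
        unsigned long   lowpc;  /* lower bound of the sampled range */
        unsigned long   highpc; /* upper bound of the sampled range */
        unsigned long   step;   /* width of each range in the histogram */
};

/* One record per arc of the dynamic call graph. */
struct rawarc {
        unsigned long   frompc; /* tail of the arc: the call site */
        unsigned long   selfpc; /* head of the arc: entry of the callee */
        long            count;  /* traversals during this execution */
};
.)l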