Flush out the last dregs in the terminal before quitting when
[unix-history] / usr / src / usr.bin / gprof / PSD.doc / gathering.me
CommitLineData
607a525f 1\" @(#)gathering.me 1.7 %G%
9107ec01
PK
2.sh 1 "Gathering Profile Data"
3.pp
4Routine calls or statement executions can be measured by having a
5compiler augment the code at strategic points.
6The additions can be inline increments to counters [Knuth71]
7[Satterthwaite72] [Joy79] or calls to
8monitoring routines [Unix].
9The counter increment overhead is low, and is suitable for
602c58f5 10profiling statements.
ed1b79b0
PK
11A call of the monitoring routine has an overhead comparable with a
12call of a regular routine, and is therefore only suited
602c58f5 13to profiling on a routine by routine basis.
9107ec01
PK
14However, the monitoring routine solution has certain advantages.
15Whatever counters are needed by the monitoring routine can be
16managed by the monitoring routine itself, rather than being
17distributed around the code.
ed1b79b0
PK
18In particular, a monitoring routine can easily be called from separately
19compiled programs.
9107ec01 20In addition, different monitoring routines can be linked into the
607a525f
PK
21program
22being measured
23to assemble different profiling data without having to
9107ec01
PK
24change the compiler or recompile the program.
25We have exploited this approach;
26our compilers for C, Fortran77, and Pascal can insert calls to a
27monitoring routine in the prologue for each routine.
130ed122
PK
28Use of the monitoring routine requires no planning on part of a
29programmer other than to request that augmented routine
30prologues be produced during compilation.
9107ec01
PK
31.pp
32We are interested in gathering three pieces of information during
33program execution: call counts and execution times for
34each profiled routine, and the arcs of the dynamic call graph
35traversed by this execution of the program.
bded3f9b 36By post-processing of this data we can build the dynamic call
9107ec01
PK
37graph for this execution of the program and propagate times along
38the edges of this graph to attribute times for routines to the
39routines that invoke them.
40.pp
602c58f5 41Gathering of the profiling information should not greatly
9107ec01
PK
42interfere with the running of the program.
43Thus, the monitoring routine must not produce trace output each
44time it is invoked.
45The volume of data thus produced would be unmanageably large,
46and the time required to record it would overwhelm the running
47time of most programs.
bded3f9b 48Similarly, the monitoring routine can not do the analysis of
9107ec01
PK
49the profiling data (e.g. assembling the call graph, propagating
50times around it, discovering cycles, etc.) during program
51execution.
52Our solution is to gather profiling data in memory during program
53execution and to condense it to a file as the profiled
54program exits.
55This file is then processed by a separate program to produce the
56listing of the profile data.
57An advantage of this approach is that the profile data for
58several executions of a program can be combined by the
bded3f9b 59post-processing to provide a profile of many
ed1b79b0 60executions.
9107ec01
PK
61.pp
62The execution time monitoring consists of three parts.
63The first part allocates and initializes the runtime monitoring data
64structures before the program begins execution.
9107ec01
PK
65The second part is the monitoring routine invoked from the
66prologue of each profiled routine.
ed1b79b0
PK
67The third part condenses the data structures and writes them
68to a file as the program terminates.
9107ec01
PK
69The monitoring routine is discussed in detail in the following sections.
70.sh 2 "Execution Counts"
71.pp
602c58f5
PK
72The \fBgprof\fP monitoring routine counts the number of times
73each profiled routine is called.
9107ec01
PK
74The monitoring routine also records the arc in the call graph
75that activated the profiled routine.
bded3f9b 76The count is associated with the arc in the call graph
9107ec01
PK
77rather than with the routine.
78Call counts for routines can then be determined by summing the counts
79on arcs directed into that routine.
80In a machine-dependent fashion, the monitoring routine notes its
81own return address.
82This address is in the prologue of some profiled routine that is
602c58f5 83the destination of an arc in the dynamic call graph.
9107ec01 84The monitoring routine also discovers the return address for that
602c58f5
PK
85routine, thus identifying the call site, or source of the arc.
86The source of the arc is in the \fIcaller\fP, and the destination is in
9107ec01
PK
87the \fIcallee\fP.
88For example, if a routine A calls a routine B, A is the caller,
89and B is the callee.
90The prologue of B will include a call to the monitoring routine
91that will note the arc from A to B and either initialize or
92increment a counter for that arc.
93.pp
94One can not afford to have the monitoring routine output tracing
95information as each arc is identified.
96Therefore, the monitoring routine maintains a table of all the
97arcs discovered, with counts of the numbers of times each is
98traversed during execution.
607a525f
PK
99This table is accessed once per routine call.
100Access to it
9107ec01
PK
101must be as fast as possible so as not to overwhelm the time
102required to execute the program.
103.pp
104Our solution is to access the table through a hash table.
105We use the call site as the primary key with the callee
106address being the secondary key.
107Since each call site typically calls only one callee, we can
108reduce (usually to one) the number of minor lookups based on the callee.
ed1b79b0 109Another alternative would use the callee as the primary key and the
9107ec01
PK
110call site as the secondary key.
111Such an organization has the advantage of associating callers with
112callees, at the expense of longer lookups in the monitoring
113routine.
114We are fortunate to be running in a virtual memory environment,
115and (for the sake of speed) were able to allocate enough space
116for the primary hash table to allow a one-to-one mapping from
117call site addresses to the primary hash table.
607a525f
PK
118Thus our hash function is trivial to calculate and collisions
119occur only for call sites that call multiple
120destinations (e.g. functional parameters and functional variables).
9107ec01
PK
121A one level hash function using both call site and callee would
122result in an unreasonably large hash table.
0020feb2 123Further, the number of dynamic call sites and callees is not known during
9107ec01
PK
124execution of the profiled program.
125.pp
126Not all callers and callees can be identified by the monitoring
127routine.
128Routines that were compiled without the profiling augmentations
129will not call the monitoring routine as part of their prologue,
607a525f 130and thus no arcs will be recorded whose destinations are in these
9107ec01
PK
131routines.
132One need not profile all the routines in a program.
133Routines that are not profiled run at full speed.
134Certain routines, notably exception handlers, are invoked by
135non-standard calling sequences.
607a525f
PK
136Thus the monitoring routine may know the destination of an arc
137(the callee),
ed1b79b0 138but find it difficult or
607a525f
PK
139impossible to determine the source of the arc (the caller).
140Often in these cases the apparent source of the arc is not a call
9107ec01 141site at all.
ed1b79b0 142Such anomalous invocations are declared ``spontaneous''.
9107ec01
PK
143.sh 2 "Execution Times"
144.pp
145The execution times for routines can be gathered in at least two
146ways.
147One method measures the execution time of a routine by measuring
148the elapsed time from routine entry to routine exit.
ed1b79b0 149Unfortunately, time measurement is complicated on time-sharing
602c58f5 150systems by the time-slicing of the program.
0020feb2 151A second method samples the value of the program counter at some
9107ec01
PK
152interval, and infers execution time from the distribution of the
153samples within the program.
ed1b79b0 154This technique is particularly suited to time-sharing systems,
602c58f5 155where the time-slicing can serve as the basis for sampling
9107ec01
PK
156the program counter.
157Notice that, whereas the first method could provide exact timings,
158the second is inherently a statistical approximation.
159.pp
160The sampling method need not require support from the operating
161system: all that is needed is the ability to set and respond to
162``alarm clock'' interrupts that run relative to program time.
163It is imperative that the intervals be uniform since the
164sampling of the program counter rather than the duration of the
165interval is the basis of the distribution.
166If sampling is done too often, the interruptions to sample the
167program counter will overwhelm the running of the profiled program.
168On the other hand, the program must run for enough sampled
169intervals that the distribution of the samples accurately
170represents the distribution of time for the execution of the
171program.
172As with routine call tracing, the monitoring routine can not
173afford to output information for each program counter
174sample.
175In our computing environment, the operating system can provide a
176histogram of the location of the program counter at the end of
177each clock tick (1/60th of a second) in which a program runs.
178The histogram is assembled in memory as the program runs.
179This facility is enabled by our monitoring routine.
180We have adjusted the granularity of the histogram so that
181program counter values map one-to-one onto the histogram.
602c58f5
PK
182We make the simplifying assumption that all calls to a specific
183routine require the same amount of time to execute.
bded3f9b 184This assumption may disguise that some calls
602c58f5
PK
185(or worse, some call sites) always invoke a routine
186such that its execution is faster (or slower)
187than the average time for that routine.
9107ec01 188.pp
607a525f 189When the profiled program terminates,
9107ec01
PK
190the arc table and the histogram of
191program counter samples is written to a file.
607a525f 192The arc table is condensed to consist of the source and destination
9107ec01
PK
193addresses of the arc and the count of the number of times the arc
194was traversed by this execution of the program.
195The recorded histogram consists of counters of the number of
196times the program counter was found to be in each of the ranges covered
197by the histogram.
ed1b79b0
PK
198The ranges themselves are summarized as a
199lower and upper bound and a step size.