Commit | Line | Data |
---|---|---|
98e01bad | 1 | \" @(#)postp.me 1.6 %G% |
30a96129 PK |
2 | .EQ |
3 | delim ## | |
98e01bad | 4 | gsize 12 |
30a96129 | 5 | .EN |
819c1fed PK |
6 | .sh 1 "Post Processing" |
7 | .pp | |
8 | Having gathered the arcs of the call graph and timing information | |
9 | for an execution of the program, | |
10 | we are interested in attributing the time for each routine to the | |
11 | routines that call it. | |
12 | We construct a dynamic call graph with arcs from caller to callee, | |
13 | and propagate time from descendants to ancestors | |
14 | by topologically sorting the call graph. | |
28f2ab56 PK |
15 | Time propagation is performed from the leaves of the |
16 | call graph toward the roots, according to the order | |
17 | assigned by a topological numbering algorithm. | |
18 | The topological numbering ensures that | |
19 | all edges in the graph go from higher numbered nodes to lower | |
20 | numbered nodes. | |
21 | If we propagate time from nodes in the | |
22 | reverse order assigned by the algorithm, | |
23 | execution time can be propagated from descendants to ancestors | |
24 | after a single traversal of each arc in the call graph. | |
25 | Each parent receives some fraction of a child's time. | |
26 | Thus time is charged to the | |
27 | caller in addition to being charged to the callee. | |
28 | .(b | |
30a96129 PK |
29 | .TS |
30 | center; | |
31 | c s c c s. | |
28f2ab56 | 32 | 8 9 |
30a96129 PK |
33 | |
34 | ||
35 | .T& | |
36 | c c c c c. | |
37 | 3 7 | |
38 | ||
39 | ||
40 | 2 5 6 | |
41 | ||
42 | ||
43 | 1 4 | |
44 | .TE | |
b8b92e41 PK |
45 | .ce 1 |
46 | Topological ordering | |
47 | .ce 0 | |
28f2ab56 | 48 | .)b |
bf3fc937 | 49 | .pp |
30a96129 PK |
50 | Let #C sub e# be the number of calls to some routine, |
51 | #e#, and #C sub e sup r# be the number of | |
52 | calls from a caller #r# to a callee #e#. | |
53 | Since we are assuming each call to a routine takes the | |
54 | average amount of time for all calls to that routine, | |
55 | the caller is accountable for | |
56 | #C sub e sup r# \(sl #C sub e# | |
57 | of the time spent by the callee. | |
58 | Let the #S sub e# be the #selftime# of a routine, #e#. | |
59 | The selftime of a routine can be determined from the | |
60 | timing information gathered during profiled program execution. | |
61 | The total time, #T sub r#, we wish to account to a routine | |
62 | #r#, is then given by the recurence equation: | |
63 | .EQ | |
64 | T sub r ~ = ~ {S sub r} ~ + ~ | |
65 | sum from {r ~ roman CALLS ~ e} | |
66 | {T sub e \(mu {{C sub e sup r} over {C sub e}}} | |
67 | .EN | |
68 | where #r ~ roman CALLS ~ e# is a relation indicating all routines | |
69 | #e# called by a routine #r#. | |
70 | This relation is easily available from the call graph. | |
71 | .pp | |
819c1fed PK |
72 | However, if the execution contains recursive calls, |
73 | the call graph has cycles that | |
74 | cannot be topologically sorted. | |
bf3fc937 | 75 | In these cases, we discover strongly-connected |
819c1fed PK |
76 | components in the call graph, |
77 | treat each such component as a single node, | |
78 | and then sort the resulting graph. | |
79 | We use a variation of Tarjan's strongly-connected | |
80 | components algorithm | |
81 | that discovers strongly-connected components as it is assigning | |
82 | topological order numbers [Tarjan72]. | |
819c1fed PK |
83 | .pp |
84 | Time propagation within strongly connected | |
85 | components is a problem. | |
86 | For example, a self-recursive routine | |
87 | (a trivial cycle in the call graph) | |
88 | is accountable for all the time it | |
89 | uses in all its recursive instantiations. | |
90 | In our scheme, this time should be | |
91 | shared among its call graph parents. | |
92 | The arcs from a routine to itself are of interest, | |
93 | but do not participate in time propagation. | |
28f2ab56 PK |
94 | Thus the simple equation for time propagation |
95 | does not work within strongly connected components. | |
819c1fed PK |
96 | Time is not propagated from one member of a cycle to another, |
97 | since, by definition, this involves propagating time from a routine | |
98 | to itself. | |
99 | In addition, children of one member of a cycle | |
100 | must be considered children of all members of the cycle. | |
101 | Similarly, parents of one member of the cycle must inherit | |
102 | all members of the cycle as descendants. | |
103 | It is for these reasons that we collapse connected components. | |
104 | Our solution collects all members of a cycle together, | |
105 | summing the time and call counts for all members. | |
106 | All calls into the cycle are made to share the total | |
107 | time of the cycle, and all descendants of the cycle | |
108 | propagate time into the cycle as a whole. | |
109 | Calls among the members of the cycle | |
110 | do not propagate any time, | |
111 | though they are listed in the call graph profile. | |
28f2ab56 | 112 | .(b |
30a96129 PK |
113 | .TS |
114 | center; | |
115 | c s c c s. | |
28f2ab56 | 116 | o o |
30a96129 PK |
117 | |
118 | ||
119 | .T& | |
120 | c c c c c. | |
b8b92e41 | 121 | o o |
30a96129 PK |
122 | |
123 | ||
b8b92e41 | 124 | o o o |
30a96129 PK |
125 | |
126 | ||
b8b92e41 PK |
127 | o o |
128 | .TE | |
129 | .ce 1 | |
130 | Cycle to be collapsed. | |
28f2ab56 PK |
131 | .)b |
132 | .(b | |
b8b92e41 PK |
133 | .TS |
134 | center; | |
135 | c s c c s. | |
28f2ab56 | 136 | 7 8 |
b8b92e41 PK |
137 | |
138 | ||
139 | .T& | |
140 | c c c c c. | |
141 | 6 6 | |
142 | ||
143 | ||
144 | 2 4 5 | |
145 | ||
146 | ||
147 | 1 3 | |
30a96129 | 148 | .TE |
b8b92e41 PK |
149 | .ce 1 |
150 | Topological numbering after cycle collapsing. | |
151 | .ce 0 | |
28f2ab56 | 152 | .)b |
819c1fed PK |
153 | .pp |
154 | Since the technique described above only collects the | |
155 | dynamic call graph, | |
156 | and the program typically does not call every routine | |
157 | on each execution, | |
158 | different executions can introduce different cycles in the | |
159 | dynamic call graph. | |
160 | Since cycles often have a significant effect on time propagation, | |
161 | it is desirable to incorporate the static call graph so that cycles | |
162 | will have the same members regardless of how the program runs. | |
163 | .pp | |
164 | The static call graph can be constructed from the source text | |
165 | of the program. | |
166 | However, discovering the static call graph from the source text | |
167 | would require two moderately difficult steps: | |
168 | finding the source text for the program | |
169 | (which may not be available), | |
170 | and scanning and parsing that text, | |
171 | which may be in any one of a number of languages. | |
172 | In our programming system, | |
173 | the static calling information is also contained in the | |
174 | executable version of the program, | |
175 | which we already have available, | |
176 | and which is in language-independent form. | |
177 | One can examine the instructions | |
178 | in the object program, | |
179 | looking for calls to routines, and note which | |
180 | routines can be called. | |
181 | This technique allows us to add arcs to those already in the | |
182 | dynamic call graph. | |
183 | If a statically discovered arc already exists in the dynamic call | |
184 | graph, no action is required. | |
185 | Statically discovered arcs that do not exist in the dynamic call | |
186 | graph are added to the graph with a traversal count of zero. | |
187 | Thus they are never responsible for any time propagation. | |
188 | However, they may affect the structure of the graph. | |
189 | Since they may complete strongly connected components, | |
190 | the static call graph construction is | |
191 | done before topological ordering. |