[unix-history] / usr / src / usr.bin / gprof / PSD.doc / postp.me

\"	@(#)postp.me	1.6 %G%
.EQ
delim ##
gsize 12
.EN
.sh 1 "Post Processing"
.pp
Having gathered the arcs of the call graph and timing information
for an execution of the program,
we are interested in attributing the time for each routine to the
routines that call it.
We construct a dynamic call graph with arcs from caller to callee,
and propagate time from descendants to ancestors
by topologically sorting the call graph.
Time propagation is performed from the leaves of the
call graph toward the roots, according to the order
assigned by a topological numbering algorithm.
The topological numbering ensures that
all edges in the graph go from higher numbered nodes to lower
numbered nodes.
If we propagate time from nodes in the
reverse order assigned by the algorithm,
execution time can be propagated from descendants to ancestors
after a single traversal of each arc in the call graph.
Each parent receives some fraction of a child's time.
Thus time is charged to the
caller in addition to being charged to the callee.
.(b
.TS
center;
c s c c s.
8		9	


.T&
c c c c c.
	3		7


2		5		6


	1		4
.TE
.ce 1
Topological ordering
.ce 0
.)b
.pp
Let #C sub e# be the number of calls to some routine,
#e#, and #C sub e sup r# be the number of
calls from a caller #r# to a callee #e#.
Since we are assuming each call to a routine takes the
average amount of time for all calls to that routine,
the caller is accountable for
#C sub e sup r# \(sl #C sub e#
of the time spent by the callee.
Let the #S sub e# be the #selftime# of a routine, #e#.
The selftime of a routine can be determined from the
timing information gathered during profiled program execution.
The total time, #T sub r#, we wish to account to a routine
#r#, is then given by the recurence equation:
.EQ
T sub r ~ = ~ {S sub r} ~ + ~
                   sum from {r ~ roman CALLS ~ e}
                   {T sub e \(mu {{C sub e sup r} over {C sub e}}}
.EN
where #r ~ roman CALLS ~ e# is a relation indicating all routines
#e# called by a routine #r#.
This relation is easily available from the call graph.
.pp
However, if the execution contains recursive calls,
the call graph has cycles that
cannot be topologically sorted.
In these cases, we discover strongly-connected
components in the call graph,
treat each such component as a single node,
and then sort the resulting graph.
We use a variation of Tarjan's strongly-connected
components algorithm
that discovers strongly-connected components as it is assigning
topological order numbers [Tarjan72].
.pp
Time propagation within strongly connected
components is a problem.
For example, a self-recursive routine
(a trivial cycle in the call graph)
is accountable for all the time it
uses in all its recursive instantiations.
In our scheme, this time should be
shared among its call graph parents.
The arcs from a routine to itself are of interest,
but do not participate in time propagation.
Thus the simple equation for time propagation
does not work within strongly connected components.
Time is not propagated from one member of a cycle to another,
since, by definition, this involves propagating time from a routine
to itself.
In addition, children of one member of a cycle
must be considered children of all members of the cycle.
Similarly, parents of one member of the cycle must inherit
all members of the cycle as descendants.
It is for these reasons that we collapse connected components.
Our solution collects all members of a cycle together,
summing the time and call counts for all members.
All calls into the cycle are made to share the total 
time of the cycle, and all descendants of the cycle
propagate time into the cycle as a whole.
Calls among the members of the cycle 
do not propagate any time,
though they are listed in the call graph profile.
.(b
.TS
center;
c s c c s.
o		o	


.T&
c c c c c.
	o		o


o		o		o


	o		o
.TE
.ce 1
Cycle to be collapsed.
.)b
.(b
.TS
center;
c s c c s.
7		8	


.T&
c c c c c.
	6		6


2		4		5


	1		3
.TE
.ce 1
Topological numbering after cycle collapsing.
.ce 0
.)b
.pp
Since the technique described above only collects the
dynamic call graph,
and the program typically does not call every routine
on each execution,
different executions can introduce different cycles in the
dynamic call graph.
Since cycles often have a significant effect on time propagation,
it is desirable to incorporate the static call graph so that cycles
will have the same members regardless of how the program runs.
.pp
The static call graph can be constructed from the source text
of the program.
However, discovering the static call graph from the source text
would require two moderately difficult steps:
finding the source text for the program
(which may not be available),
and scanning and parsing that text,
which may be in any one of a number of languages.
In our programming system,
the static calling information is also contained in the 
executable version of the program,
which we already have available,
and which is in language-independent form.
One can examine the instructions
in the object program,
looking for calls to routines, and note which
routines can be called.
This technique allows us to add arcs to those already in the
dynamic call graph.
If a statically discovered arc already exists in the dynamic call
graph, no action is required.
Statically discovered arcs that do not exist in the dynamic call
graph are added to the graph with a traversal count of zero.
Thus they are never responsible for any time propagation.
However, they may affect the structure of the graph.
Since they may complete strongly connected components,
the static call graph construction is
done before topological ordering.
Commit	Line	Data
98e01bad	1	\" @(#)postp.me 1.6 %G%
30a96129 PK	2	.EQ
30a96129 PK	3	delim ##
98e01bad	4	gsize 12
30a96129	5	.EN
819c1fed PK	6	.sh 1 "Post Processing"
	7	.pp
	8	Having gathered the arcs of the call graph and timing information
	9	for an execution of the program,
	10	we are interested in attributing the time for each routine to the
	11	routines that call it.
	12	We construct a dynamic call graph with arcs from caller to callee,
	13	and propagate time from descendants to ancestors
	14	by topologically sorting the call graph.
28f2ab56 PK	15	Time propagation is performed from the leaves of the
	16	call graph toward the roots, according to the order
	17	assigned by a topological numbering algorithm.
	18	The topological numbering ensures that
	19	all edges in the graph go from higher numbered nodes to lower
	20	numbered nodes.
	21	If we propagate time from nodes in the
	22	reverse order assigned by the algorithm,
	23	execution time can be propagated from descendants to ancestors
	24	after a single traversal of each arc in the call graph.
	25	Each parent receives some fraction of a child's time.
	26	Thus time is charged to the
	27	caller in addition to being charged to the callee.
	28	.(b
30a96129 PK	29	.TS
	30	center;
	31	c s c c s.
28f2ab56	32	8 9
30a96129 PK	33
	34
	35	.T&
	36	c c c c c.
	37	3 7
	38
	39
	40	2 5 6
	41
	42
	43	1 4
	44	.TE
b8b92e41 PK	45	.ce 1
	46	Topological ordering
	47	.ce 0
28f2ab56	48	.)b
bf3fc937	49	.pp
30a96129 PK	50	Let #C sub e# be the number of calls to some routine,
	51	#e#, and #C sub e sup r# be the number of
	52	calls from a caller #r# to a callee #e#.
	53	Since we are assuming each call to a routine takes the
	54	average amount of time for all calls to that routine,
	55	the caller is accountable for
	56	#C sub e sup r# \(sl #C sub e#
	57	of the time spent by the callee.
	58	Let the #S sub e# be the #selftime# of a routine, #e#.
	59	The selftime of a routine can be determined from the
	60	timing information gathered during profiled program execution.
	61	The total time, #T sub r#, we wish to account to a routine
	62	#r#, is then given by the recurence equation:
	63	.EQ
	64	T sub r ~ = ~ {S sub r} ~ + ~
	65	sum from {r ~ roman CALLS ~ e}
	66	{T sub e \(mu {{C sub e sup r} over {C sub e}}}
	67	.EN
	68	where #r ~ roman CALLS ~ e# is a relation indicating all routines
	69	#e# called by a routine #r#.
	70	This relation is easily available from the call graph.
	71	.pp
819c1fed PK	72	However, if the execution contains recursive calls,
	73	the call graph has cycles that
	74	cannot be topologically sorted.
bf3fc937	75	In these cases, we discover strongly-connected
819c1fed PK	76	components in the call graph,
	77	treat each such component as a single node,
	78	and then sort the resulting graph.
	79	We use a variation of Tarjan's strongly-connected
	80	components algorithm
	81	that discovers strongly-connected components as it is assigning
	82	topological order numbers [Tarjan72].
819c1fed PK	83	.pp
	84	Time propagation within strongly connected
	85	components is a problem.
	86	For example, a self-recursive routine
	87	(a trivial cycle in the call graph)
	88	is accountable for all the time it
	89	uses in all its recursive instantiations.
	90	In our scheme, this time should be
	91	shared among its call graph parents.
	92	The arcs from a routine to itself are of interest,
	93	but do not participate in time propagation.
28f2ab56 PK	94	Thus the simple equation for time propagation
28f2ab56 PK	95	does not work within strongly connected components.
819c1fed PK	96	Time is not propagated from one member of a cycle to another,
	97	since, by definition, this involves propagating time from a routine
	98	to itself.
	99	In addition, children of one member of a cycle
	100	must be considered children of all members of the cycle.
	101	Similarly, parents of one member of the cycle must inherit
	102	all members of the cycle as descendants.
	103	It is for these reasons that we collapse connected components.
	104	Our solution collects all members of a cycle together,
	105	summing the time and call counts for all members.
	106	All calls into the cycle are made to share the total
	107	time of the cycle, and all descendants of the cycle
	108	propagate time into the cycle as a whole.
	109	Calls among the members of the cycle
	110	do not propagate any time,
	111	though they are listed in the call graph profile.
28f2ab56	112	.(b
30a96129 PK	113	.TS
	114	center;
	115	c s c c s.
28f2ab56	116	o o
30a96129 PK	117
	118
	119	.T&
	120	c c c c c.
b8b92e41	121	o o
30a96129 PK	122
30a96129 PK	123
b8b92e41	124	o o o
30a96129 PK	125
30a96129 PK	126
b8b92e41 PK	127	o o
	128	.TE
	129	.ce 1
	130	Cycle to be collapsed.
28f2ab56 PK	131	.)b
28f2ab56 PK	132	.(b
b8b92e41 PK	133	.TS
	134	center;
	135	c s c c s.
28f2ab56	136	7 8
b8b92e41 PK	137
	138
	139	.T&
	140	c c c c c.
	141	6 6
	142
	143
	144	2 4 5
	145
	146
	147	1 3
30a96129	148	.TE
b8b92e41 PK	149	.ce 1
	150	Topological numbering after cycle collapsing.
	151	.ce 0
28f2ab56	152	.)b
819c1fed PK	153	.pp
	154	Since the technique described above only collects the
	155	dynamic call graph,
	156	and the program typically does not call every routine
	157	on each execution,
	158	different executions can introduce different cycles in the
	159	dynamic call graph.
	160	Since cycles often have a significant effect on time propagation,
	161	it is desirable to incorporate the static call graph so that cycles
	162	will have the same members regardless of how the program runs.
	163	.pp
	164	The static call graph can be constructed from the source text
	165	of the program.
	166	However, discovering the static call graph from the source text
	167	would require two moderately difficult steps:
	168	finding the source text for the program
	169	(which may not be available),
	170	and scanning and parsing that text,
	171	which may be in any one of a number of languages.
	172	In our programming system,
	173	the static calling information is also contained in the
	174	executable version of the program,
	175	which we already have available,
	176	and which is in language-independent form.
	177	One can examine the instructions
	178	in the object program,
	179	looking for calls to routines, and note which
	180	routines can be called.
	181	This technique allows us to add arcs to those already in the
	182	dynamic call graph.
	183	If a statically discovered arc already exists in the dynamic call
	184	graph, no action is required.
	185	Statically discovered arcs that do not exist in the dynamic call
	186	graph are added to the graph with a traversal count of zero.
	187	Thus they are never responsible for any time propagation.
	188	However, they may affect the structure of the graph.
	189	Since they may complete strongly connected components,
	190	the static call graph construction is
	191	done before topological ordering.