profile.doc last updated 6/23/94 [by Guido]

PROFILER DOCUMENTATION and (mini) USER'S MANUAL

Copyright 1994, by InfoSeek Corporation, all rights reserved.
Written by James Roskind

Permission to use, copy, modify, and distribute this Python software
and its associated documentation for any purpose (subject to the
restriction in the following sentence) without fee is hereby granted,
provided that the above copyright notice appears in all copies, and
that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of InfoSeek not be used in
advertising or publicity pertaining to distribution of the software
without specific, written prior permission.  This permission is
explicitly restricted to the copying and modification of the software
to remain in Python, compiled Python, or other languages (such as C)
wherein the modified or derived code is exclusively imported into a
Python module.

INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS.  IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

The profiler was written after only programming in Python for 3 weeks.
As a result, it is probably clumsy code, but I don't know for sure yet
'cause I'm a beginner :-).  I did work hard to make the code run fast,
so that profiling would be a reasonable thing to do.  I tried not to
repeat code fragments, but I'm sure I did some stuff in really awkward
ways at times.  Please send suggestions for improvements to:
jar@infoseek.com.  I won't promise *any* support.  ...but I'd
appreciate the feedback.


SECTION HEADING LIST:
  INTRODUCTION
  HOW IS THIS profile DIFFERENT FROM THE OLD profile MODULE?
  INSTANT USERS MANUAL
  WHAT IS DETERMINISTIC PROFILING?
  REFERENCE MANUAL
      FUNCTION profile.run(string, filename_opt)
      CLASS Stats(filename, ...)
      METHOD strip_dirs()
      METHOD add(filename, ...)
      METHOD sort_stats(key, ...)
      METHOD reverse_order()
      METHOD print_stats(restriction, ...)
      METHOD print_callers(restrictions, ...)
      METHOD print_callees(restrictions, ...)
      METHOD ignore()
  LIMITATIONS
  CALIBRATION
  EXTENSIONS: Deriving Better Profilers


INTRODUCTION

A "profiler" is a program that describes the run time performance of a
program, providing a variety of statistics.  This documentation
describes the profiler functionality provided in the modules
"profile" and "pstats".  This profiler provides "deterministic
profiling" of any Python program.  It also provides a series of
report generation tools to allow users to rapidly examine the results
of a profile operation.


HOW IS THIS profile DIFFERENT FROM THE OLD profile MODULE?

The big changes from the standard profiling module are that you get
more information, and you pay less CPU time.  It's not a trade-off,
it's a trade-up.

To be specific:

bugs removed: local stack frame is no longer molested, execution time
      is now charged to correct functions, ....

accuracy increased: profiler execution time is no longer charged to
      user's code, calibration for the platform is supported, file reads
      are not done *by* the profiler *during* profiling (and charged to
      user's code!), ...

speed increased: overhead CPU cost was reduced by more than a factor of
      two (perhaps a factor of five), the lightweight profiler module is
      all that must be loaded, and the report generating module
      (pstats) is not needed during profiling.

recursive functions supported: cumulative times in recursive functions
      are correctly calculated; recursive entries are counted; ...

large growth in report generating UI: distinct profile runs can be added
      together forming a comprehensive report; functions that import
      statistics take arbitrary lists of files; sorting criteria are now
      based on keywords (instead of 4 integer options); reports show
      what functions were profiled as well as what profile file was
      referenced; and the output format has been improved, ...


INSTANT USERS MANUAL

This section is provided for users who "don't want to read the
manual."  It provides a very brief overview, and allows a user to
rapidly perform profiling on an existing application.

To profile an application with a main entry point of "foo()", you
would add the following to your module:

      import profile
      profile.run("foo()")

The above action would cause "foo()" to be run, and a series of
informative lines (the profile) to be printed.  The above approach is
most useful when working with the interpreter.  If you would like to
save the results of a profile into a file for later examination, you
can supply a file name as the second argument to the run() function:

      import profile
      profile.run("foo()", 'fooprof')

When you wish to review the profile, you should use the methods in the
pstats module.  Typically you would load the statistics data as
follows:

      import pstats
      p = pstats.Stats('fooprof')

The class "Stats" (the above code just created an instance of this
class) has a variety of methods for manipulating and printing the data
that was just read into "p".  When you ran profile.run() above, what
was printed was the result of three method calls:

      p.strip_dirs().sort_stats(-1).print_stats()

The first method removed the extraneous path from all the module
names.  The second method sorted all the entries according to the
standard module/line/name string that is printed (this is to comply
with the semantics of the old profiler).  The third method printed out
all the statistics.  You might try the following sort calls:

      p.sort_stats('name')
      p.print_stats()

The first call will actually sort the list by function name, and the
second call will print out the statistics.  The following are some
interesting calls to experiment with:

      p.sort_stats('cumulative').print_stats(10)

This sorts the profile by cumulative time in a function, and then only
prints the ten most significant lines.  If you want to understand what
algorithms are taking time, the above line is what you would use.

If you were looking to see what functions were looping a lot, and
taking a lot of time, you would do:

      p.sort_stats('time').print_stats(10)

to sort according to time spent within each function, and then print
the statistics for the top ten functions.

You might also try:

      p.sort_stats('file').print_stats('__init__')

This will sort all the statistics by file name, and then print out
statistics for only the class init methods ('cause they are spelled
with "__init__" in them).  As one final example, you could try:

      p.sort_stats('time', 'cum').print_stats(.5, 'init')

This line sorts stats with a primary key of time, and a secondary key
of cumulative time, and then prints out some of the statistics.  To be
specific, the list is first culled down to 50% (re: .5) of its
original size, then only lines containing "init" are maintained, and
that sub-sub-list is printed.

If you wondered what functions called the above functions, you could
now (p is still sorted according to the last criteria) do:

      p.print_callers(.5, 'init')

and you would get a list of callers for each of the listed functions.

If you want more functionality, you're going to have to read the
manual, or guess what the following functions do:

      p.print_callees()
      p.add('fooprof')


WHAT IS DETERMINISTIC PROFILING?

"Deterministic profiling" is meant to reflect the fact that all
"function call", "function return", and "exception" events are
monitored, and precise timings are made for the intervals between
these events (during which time the user's code is executing).  In
contrast, "statistical profiling" (which is not done by this module)
randomly samples the effective instruction pointer, and deduces where
time is being spent.  The latter technique traditionally involves less
overhead (as the code does not need to be instrumented), but provides
only relative indications of where time is being spent.

In Python, since there is an interpreter active during execution, the
presence of instrumented code is not required to do deterministic
profiling.  Python automatically provides a hook (optional callback)
for each event.  In addition, the interpreted nature of Python adds
so much overhead to execution that deterministic profiling contributes
only a small additional processing overhead in typical applications.
The result is that deterministic profiling is not that expensive, yet
it provides extensive run time statistics about the execution of a
Python program.

Call count statistics can be used to identify bugs in code (surprising
counts), and to identify possible inline-expansion points (high call
counts).  Internal time statistics can be used to identify hot loops
that should be carefully optimized.  Cumulative time statistics should
be used to identify high level errors in the selection of algorithms.
Note that the unusual handling of cumulative times in this profiler
allows statistics for recursive implementations of algorithms to be
directly compared to iterative implementations.


REFERENCE MANUAL

The primary entry point for the profiler is the global function
profile.run().  It is typically used to create profile information.
The reports are formatted and printed using methods of the class
pstats.Stats.  The following is a description of all of these
standard entry points and functions.  For a more in-depth view of
some of the code, consider reading the later section on "Profiler
Extensions," which includes discussion of how to derive "better"
profilers from the classes presented, or reading the source code for
these modules.


FUNCTION profile.run(string, filename_opt)

This function takes a string argument that can be passed to the
"exec" statement, and an optional file name.  In all cases this
routine attempts to "exec" its first argument, and gather profiling
statistics from the execution.  If no file name is present, then this
function automatically prints a simple profiling report, sorted by the
standard name string (file/line/function-name) that is presented in
each line.  The following is a typical output from such a call:

      cut here----

      main()
            2706 function calls (2004 primitive calls) in 4.504 CPU seconds

      Ordered by: standard name

      ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
        43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
       ...

      cut here----

The first line indicates that this profile was generated by the call:
profile.run('main()'), and hence the exec'ed string is 'main()'.  The
second line indicates that 2706 calls were monitored.  Of those calls,
2004 were "primitive."  We define "primitive" to mean that the call
was not induced via recursion.  The next line: "Ordered by: standard
name", indicates that the text string in the far right column was used
to sort the output.  The column headings include:

      "ncalls" for the number of calls,
      "tottime" for the total time spent in the given function
            (excluding time spent in calls to sub-functions),
      "percall" is the quotient of "tottime" divided by "ncalls",
      "cumtime" is the total time spent in this and all subfunctions
            (i.e., from invocation till exit).  This figure is
            accurate *even* for recursive functions.
      "percall" is the quotient of "cumtime" divided by primitive
            calls, and
      "filename:lineno(function)" provides the respective data of
            each function.

When there are two numbers in the first column (e.g., 43/3), then the
latter is the number of primitive calls, and the former is the actual
number of calls.  Note that when the function does not recurse, these
two values are the same, and only the single figure is printed.


CLASS Stats(filename, ...)

This class constructor creates an instance of a statistics object from
a filename (or set of filenames).  Stats objects are manipulated by
methods, in order to print useful reports.

The file selected by the above constructor must have been created by
the corresponding version of profile.  To be specific, there is *NO*
file compatibility guaranteed with future versions of this profiler,
and there is no compatibility with files produced by other profilers
(e.g., the standard system profiler).

If several files are provided, all the statistics for identical
functions will be coalesced, so that an overall view of several
processes can be considered in a single report.  If additional files
need to be combined with data in an existing Stats object, the add()
method can be used.


METHOD strip_dirs()

This method for the Stats class removes all leading path information
from file names.  It is very useful in reducing the size of the
printout to fit within (close to) 80 columns.  This method modifies
the object, and the stripped information is lost.  After performing a
strip operation, the object is considered to have its entries in a
"random" order, as it was just after object initialization and
loading.  If strip_dirs() causes two function names to be
indistinguishable (i.e., they are on the same line of the same
filename, and have the same function name), then the statistics for
these two entries are accumulated into a single entry.


METHOD add(filename, ...)

This method of the Stats class accumulates additional profiling
information into the current profiling object.  Its arguments should
refer to filenames created by the corresponding version of
profile.run().  Statistics for identically named (re: file, line,
name) functions are automatically accumulated into single function
statistics.
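
As a sketch of how two separate runs might be merged, where the
workload functions and file names are hypothetical, and runctx()
(run() with explicit namespaces) is an assumed convenience of current
versions of the module:

```python
import os
import tempfile
import profile
import pstats

def step_one():
    return sum(range(2000))

def step_two():
    return max(i % 7 for i in range(2000))

tmp = tempfile.mkdtemp()
first = os.path.join(tmp, 'prof1')
second = os.path.join(tmp, 'prof2')

# Two independent profile runs, each saved to its own file.
profile.runctx('step_one()', globals(), locals(), first)
profile.runctx('step_two()', globals(), locals(), second)

# Start from the first file, then fold in the second; statistics for
# identically named (file, line, name) functions are coalesced.
p = pstats.Stats(first)
p.add(second)
p.strip_dirs().sort_stats('time').print_stats(5)
```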


METHOD sort_stats(key, ...)

This method modifies the Stats object by sorting it according to the
supplied criteria.  The argument is typically a string identifying the
basis of a sort (for example: "time" or "name").

When more than one key is provided, then additional keys are used as
secondary criteria when there is equality in all keys selected
before them.  For example, sort_stats('name', 'file') will sort all
the entries according to their function name, and resolve all ties
(identical function names) by sorting by file name.

Abbreviations can be used for any key names, as long as the
abbreviation is unambiguous.  The following are the keys currently
defined:

      Valid Arg       Meaning
      "calls"         call count
      "cumulative"    cumulative time
      "file"          file name
      "module"        file name
      "pcalls"        primitive call count
      "line"          line number
      "name"          function name
      "nfl"           name/file/line
      "stdname"       standard name
      "time"          internal time

Note that all sorts on statistics are in descending order (placing most
time consuming items first), whereas name, file, and line number
sorts are in ascending order (i.e., alphabetical).  The subtle
distinction between "nfl" and "stdname" is that the standard name is a
sort of the name as printed, which means that the embedded line
numbers get compared in an odd way.  For example, lines 3, 20, and 40
would (if the file names were the same) appear in the string order
"20", "3" and "40".  In contrast, "nfl" does a numeric compare of the
line numbers.  In fact, sort_stats("nfl") is the same as
sort_stats("name", "file", "line").

For compatibility with the standard profiler, the numeric arguments -1,
0, 1, and 2 are permitted.  They are interpreted as "stdname",
"calls", "time", and "cumulative" respectively.  If this old style
(numeric) format is used, only one sort key (the numeric key) will be
used, and additional arguments will be silently ignored.


METHOD reverse_order()

This method for the Stats class reverses the ordering of the basic
list within the object.  This method is provided primarily for
compatibility with the standard profiler.  Its utility is questionable
now that ascending vs descending order is properly selected based on
the sort key of choice.


METHOD print_stats(restriction, ...)

This method for the Stats class prints out a report as described in
the profile.run() definition.

The order of the printing is based on the last sort_stats() operation
done on the object (subject to caveats in add() and strip_dirs()).

The arguments provided (if any) can be used to limit the list down to
the significant entries.  Initially, the list is taken to be the
complete set of profiled functions.  Each restriction is either an
integer (to select a count of lines), or a decimal fraction between
0.0 and 1.0 inclusive (to select a percentage of lines), or a regular
expression (to pattern match the standard name that is printed).  If
several restrictions are provided, then they are applied sequentially.
For example:

      print_stats(.1, "foo:")

would first limit the printing to the first 10% of the list, and then
only print functions that were part of filename ".*foo:".  In
contrast, the command:

      print_stats("foo:", .1)

would limit the list to all functions having file names ".*foo:", and
then proceed to only print the first 10% of them.
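
The sequential application of restrictions can be sketched as follows;
the helper names and file name are hypothetical, and runctx() (run()
with explicit namespaces) is an assumed convenience of current
versions:

```python
import os
import tempfile
import profile
import pstats

def helper():
    return list(range(200))

def main():
    for _ in range(10):
        helper()

statfile = os.path.join(tempfile.mkdtemp(), 'restrictdemo')
profile.runctx('main()', globals(), locals(), statfile)

p = pstats.Stats(statfile).strip_dirs().sort_stats('time')

# First keep at most 3 lines, then keep only those whose standard
# name matches the regular expression 'helper'.
p.print_stats(3, 'helper')

# A fraction keeps that percentage of the list: here, the top half.
p.print_stats(.5)
```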


METHOD print_callers(restrictions, ...)

This method for the Stats class prints a list of all functions that
called each function in the profiled database.  The ordering is
identical to that provided by print_stats(), and the definition of the
restricting argument is also identical.  For convenience, a number is
shown in parentheses after each caller to show how many times this
specific call was made.  A second non-parenthesized number is the
cumulative time spent in the function at the right.


METHOD print_callees(restrictions, ...)

This method for the Stats class prints a list of all functions that
were called by the indicated function.  Aside from this reversal of
direction of calls (re: called vs was called by), the arguments and
ordering are identical to the print_callers() method.


METHOD ignore()

This method of the Stats class is used to dispose of the value
returned by earlier methods.  All standard methods in this class
return the instance that is being processed, so that the commands can
be strung together.  For example:

      pstats.Stats('foofile').strip_dirs().sort_stats('cum')\
            .print_stats().ignore()

would perform all the indicated functions, but it would not return
the final reference to the Stats instance.



LIMITATIONS

There are two fundamental limitations on this profiler.  The first is
that it relies on the Python interpreter to dispatch "call", "return",
and "exception" events.  Compiled C code does not get interpreted,
and hence is "invisible" to the profiler.  All time spent in C code
(including builtin functions) will be charged to the Python function
that invoked the C code.  If the C code calls out to some native
Python code, then those calls will be profiled properly.

The second limitation has to do with the accuracy of timing
information.  There is a fundamental problem with deterministic
profilers involving accuracy.  The most obvious restriction is that
the underlying "clock" is only ticking at a rate (typically) of about
.001 seconds.  Hence no measurements will be more accurate than that
underlying clock.  If enough measurements are taken, then the "error"
will tend to average out.  Unfortunately, removing this first error
induces a second source of error...

The second problem is that it "takes a while" from when an event is
dispatched until the profiler's call to get the time actually *gets*
the state of the clock.  Similarly, there is a certain lag when
exiting the profiler event handler from the time that the clock's
value was obtained (and then squirreled away), until the user's code
is once again executing.  As a result, functions that are called many
times, or call many functions, will typically accumulate this error.
The error that accumulates in this fashion is typically less than the
accuracy of the clock (i.e., less than one clock tick), but it *can*
accumulate and become very significant.  This profiler provides a
means of calibrating itself for a given platform so that this error
can be probabilistically (i.e., on the average) removed.  After the
profiler is calibrated, it will be more accurate (in a least squares
sense), but it will sometimes produce negative numbers (when call
counts are exceptionally low, and the gods of probability work against
you :-) ).  Do *NOT* be alarmed by negative numbers in the profile.
They should *only* appear if you have calibrated your profiler, and
the results are actually better than without calibration.


CALIBRATION

The profiler class has a hard coded constant that is added to each
event handling time to compensate for the overhead of calling the time
function, and socking away the results.  The following procedure can
be used to obtain this constant for a given platform (see the
discussion in LIMITATIONS above):

      import profile
      pr = profile.Profile()
      pr.calibrate(100)
      pr.calibrate(100)
      pr.calibrate(100)

The argument to calibrate() is the number of times to try to do the
sample calls to get the CPU times.  If your computer is *very* fast,
you might have to do:

      pr.calibrate(1000)

or even:

      pr.calibrate(10000)

The object of this exercise is to get a fairly consistent result.
When you have a consistent answer, you are ready to use that number in
the source code.  For a Sun Sparcstation 1000 running Solaris 2.3, the
magical number is about .00053.  If you have a choice, you are better
off with a smaller constant, and your results will "less often" show
up as negative in profile statistics.

The following shows how the trace_dispatch() method in the Profile
class should be modified to install the calibration constant on a Sun
Sparcstation 1000:

      def trace_dispatch(self, frame, event, arg):
          t = self.timer()
          t = t[0] + t[1] - self.t - .00053  # Calibration constant

          if self.dispatch[event](frame, t):
              t = self.timer()
              self.t = t[0] + t[1]
          else:
              r = self.timer()
              self.t = r[0] + r[1] - t  # put back unrecorded delta
          return

Note that if there is no calibration constant, then the line
containing the calibration constant should simply say:

      t = t[0] + t[1] - self.t  # no calibration constant

You can also achieve the same results using a derived class (and the
profiler will actually run equally fast!!), but the above method is
the simplest to use.  I could have made the profiler "self
calibrating", but it would have made the initialization of the
profiler class slower, and would have required some *very* fancy
coding, or else the use of a variable where the constant .00053 was
placed in the code shown.  This is a ****VERY**** critical performance
section, and there is no reason to use a variable lookup at this
point, when a constant can be used.
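
For reference, current versions of the module return the measured
constant from calibrate() and accept it directly as a Profile()
argument, so the source edit above can be avoided.  This newer
interface is an assumption beyond the 1994 text:

```python
import profile

pr = profile.Profile()

# Each call measures the per-event timing overhead; repeat it and
# look for a consistent value, as described above.  (The return
# value of calibrate() is an assumed newer behavior.)
bias = pr.calibrate(10000)
print("measured bias:", bias)

# Newer versions accept the constant directly (assumed bias= keyword),
# instead of requiring an edit to trace_dispatch().
calibrated = profile.Profile(bias=bias)
```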


EXTENSIONS: Deriving Better Profilers

The Profile class of profile was written so that derived classes
could be developed to extend the profiler.  Rather than describing all
the details of such an effort, I'll just present the following two
examples of derived classes that can be used to do profiling.  If the
reader is an avid Python programmer, then it should be possible to use
these as a model and create similar (and perchance better) profile
classes.

If all you want to do is change how the timer is called, or which
timer function is used, then the basic class has an option for that in
the constructor for the class.  Consider passing a replacement timer
function to the constructor:

      pr = profile.Profile(your_time_func)

The resulting profiler will call your time function instead of
os.times().  The function should return either a single number, or a
list of numbers (like what os.times() returns).  If the function
returns a single time number, or the list of returned numbers has
length 2, then you will get an especially fast version of the dispatch
routine.
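
A sketch of the idea, using time.process_time() as the substitute
timer.  Profile.runcall() and handing a live Profile instance to
pstats.Stats() are assumed conveniences of current versions of these
modules, not features described in this manual:

```python
import time
import profile
import pstats

def work():
    return sum(i * i for i in range(10000))

def my_timer():
    # Returning a single number selects the fast dispatch routine.
    return time.process_time()

pr = profile.Profile(my_timer)
result = pr.runcall(work)  # runcall() is assumed: profile one call

# Current versions of pstats can read a Profile instance directly
# (assumed); with the original module, write to a file first.
stats = pstats.Stats(pr)
stats.sort_stats('time').print_stats(5)
```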

Be warned that you *should* calibrate the profiler class for the
timer function that you choose.  For most machines, a timer that
returns a lone integer value will provide the best results in terms of
low overhead during profiling.  (os.times is *pretty* bad, 'cause it
returns a tuple of floating point values, so all arithmetic is
floating point in the profiler!)  If you want to substitute a
better timer in the cleanest fashion, you should derive a class, and
simply put in a replacement dispatch method that better handles your
timer call, along with the appropriate calibration constant :-).


cut here------------------------------------------------------------------
#****************************************************************************
# OldProfile class documentation
#****************************************************************************
#
# The following derived profiler simulates the old style profile,
# providing errant results on recursive functions.  The reason for the
# usefulness of this profiler is that it runs faster (i.e., less
# overhead) than the old profiler.  It still creates all the caller
# stats, and is quite useful when there is *no* recursion in the user's
# code.  It is also a lot more accurate than the old profiler, as it
# does not charge all its overhead time to the user's code.
#****************************************************************************
class OldProfile(Profile):
    def trace_dispatch_exception(self, frame, t):
        rt, rtt, rct, rfn, rframe, rcur = self.cur
        if rcur and rframe is not frame:
            return self.trace_dispatch_return(rframe, t)
        return 0

    def trace_dispatch_call(self, frame, t):
        fn = `frame.f_code`

        self.cur = (t, 0, 0, fn, frame, self.cur)
        if not self.timings.has_key(fn):
            # make sure there is an entry for this function
            self.timings[fn] = 0, 0, {}
        return 1

    def trace_dispatch_return(self, frame, t):
        rt, rtt, rct, rfn, frame, rcur = self.cur
        rtt = rtt + t
        sft = rtt + rct

        pt, ptt, pct, pfn, pframe, pcur = rcur
        self.cur = pt, ptt + rt, pct + sft, pfn, pframe, pcur

        tt, ct, callers = self.timings[rfn]
        if callers.has_key(pfn):
            callers[pfn] = callers[pfn] + 1
        else:
            callers[pfn] = 1
        self.timings[rfn] = tt + rtt, ct + sft, callers

        return 1

    def snapshot_stats(self):
        self.stats = {}
        for func in self.timings.keys():
            tt, ct, callers = self.timings[func]
            nor_func = self.func_normalize(func)
            nor_callers = {}
            nc = 0
            for func_caller in callers.keys():
                nor_callers[self.func_normalize(func_caller)] = \
                    callers[func_caller]
                nc = nc + callers[func_caller]
            self.stats[nor_func] = nc, nc, tt, ct, nor_callers


#****************************************************************************
# HotProfile class documentation
#****************************************************************************
#
# This profiler is the fastest derived profile example.  It does not
# calculate caller-callee relationships, and does not calculate
# cumulative time under a function.  It only calculates time spent in a
# function, so it runs very quickly (re: very low overhead).  In truth,
# the basic profiler is so fast, that it is probably not worth the
# savings to give up the data, but this class still provides a nice
# example.
#****************************************************************************
class HotProfile(Profile):
    def trace_dispatch_exception(self, frame, t):
        # self.cur holds 4 values here (see trace_dispatch_call)
        rt, rtt, rframe, rcur = self.cur
        if rcur and rframe is not frame:
            return self.trace_dispatch_return(rframe, t)
        return 0

    def trace_dispatch_call(self, frame, t):
        self.cur = (t, 0, frame, self.cur)
        return 1

    def trace_dispatch_return(self, frame, t):
        rt, rtt, frame, rcur = self.cur

        rfn = `frame.f_code`

        pt, ptt, pframe, pcur = rcur
        self.cur = pt, ptt + rt, pframe, pcur

        if self.timings.has_key(rfn):
            nc, tt = self.timings[rfn]
            self.timings[rfn] = nc + 1, rt + rtt + tt
        else:
            self.timings[rfn] = 1, rt + rtt

        return 1

    def snapshot_stats(self):
        self.stats = {}
        for func in self.timings.keys():
            nc, tt = self.timings[func]
            nor_func = self.func_normalize(func)
            self.stats[nor_func] = nc, nc, tt, 0, {}


cut here------------------------------------------------------------------