\documentclass[10pt]{article}

\usepackage{rqdefs} \usepackage{rqfullpage} \usepackage{utopia}
\usepackage{rqcode}

%% Example ltoh commands (start with %-ltoh-)
%-ltoh- title := The RS trace format, aka RST
%-ltoh- :{}:\gold:<font color=gold>:</>:
%-ltoh- :comm:\salsa:<strong>salsa</strong>::

\rqfullpageD

\begin{document}

\rqtitle{The RST Trace format}
\rqsubtitle{AAD Tools, last updated \today}

\toc

An HTML version of this document is available at
\rqlink{http://ppgweb.eng/archperf/rstFormat.html}.

The master workspace for RST is \textff{/import/archperf/ws/rstf}. This
workspace contains the source used to generate the PS and PDF versions
of this document. A (possibly out-of-date) PostScript version of this
document is available at \texttt{/home/quong/proj/rstf/rstFormat.ps}.

\section{Other useful links}

\begin{tabular}{|l|l|} \hline
What & URL \\ \hline
List of known traces &
  \rqlink{http://traces.eng/} \\ \hline
ArchTools Trace FAQ &
  \rqlink{http://ppgweb.eng/archperf/traceFAQ.html} \\ \hline
The RST trace format &
  \rqlink{http://ppgweb.eng/archperf/rstFormat.html} \\ \hline
The atrace tool/format &
  \rqlink{http://muskoka.eng/\rqtilde{}bmc/atrace/} \\ \hline
Converting atrace to RST &
  \rqlink{http://ppgweb.eng/\rqtilde{}quong/atrace2rst.html} \\ \hline
Converting bustraces to RST &
  \rqlink{http://ppgweb.eng/\rqtilde{}quong/bustrace.html} \\ \hline
Instruction Trace validation &
  \rqlink{http://ppgweb.eng/archperf/trace-validation2002.html} \\ \hline
Blaze web page &
  \rqlink{http://ppgweb.eng/archperf/blaze.html} \\ \hline
Blaze user guide &
  \rqlink{http://ppgweb.eng/\rqtilde{}quong/blaze-userguide.html} \\ \hline
Blaze TPCC trace &
  \rqlink{http://ppgweb.eng/\rqtilde{}quong/blaze-tpcc-try500.html} \\ \hline
\end{tabular}

The trace FAQ answers many questions on how to analyze and process RST
traces. Many issues could reasonably belong on either the trace FAQ
page or this page. We (RQ) have tried to limit this page to RST format
definition issues, but the need for examples causes some overlap with
the trace FAQ.

\section{What is RST?}

RST is short for RS Trace format, a format for computer architecture
traces. RS stands for ``Really Simple'' or ``Russell's Simple'',
depending on who you talk to (or ``to whom you talk'' if you hate
dangling prepositions). An RST trace consists of fixed-length records
(24 bytes) in which the first byte of each record, known as the
\textsl{rtype}, specifies the record type. These two properties ensure
that an RST trace is easy to decode both now and in the future. In
particular, if your old analyzer sees a new \textsl{rtype} it does not
understand, your code can simply skip that record, ensuring forward
compatibility.
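
Because every record has the same size and announces its type in byte
0, a forward-compatible reader is little more than a loop around a
\texttt{switch}. The following C sketch is not the reader in
\textff{/import/archperf/ws/rstf}; it is a minimal illustration that
assumes an uncompressed trace already in memory, and its rtype values
\texttt{MY\_INSTR\_T} and \texttt{MY\_PAVADIFF\_T} are hypothetical
stand-ins for the real constants in \textff{rstf.h}. It counts the
record types it knows and steps over everything else.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

#define RST_RECSIZE 24   /* every RST record is 24 bytes */

/* Hypothetical rtype values: use the real constants from rstf.h. */
enum { MY_INSTR_T = 1, MY_PAVADIFF_T = 2 };

typedef struct {
    long ninstr;     /* INSTR records seen */
    long npavadiff;  /* PAVADIFF records seen */
    long nskipped;   /* records whose rtype this reader does not know */
} rst_counts;

/* Scan a buffer of raw (uncompressed) RST records. Byte 0 of each
 * record is the rtype; unknown rtypes are skipped, so the reader keeps
 * working when new record types are added. A trailing partial record
 * is ignored. */
rst_counts rst_scan(const uint8_t *buf, size_t nbytes)
{
    rst_counts c = {0, 0, 0};
    for (size_t off = 0; off + RST_RECSIZE <= nbytes; off += RST_RECSIZE) {
        switch (buf[off]) {
        case MY_INSTR_T:    c.ninstr++;    break;
        case MY_PAVADIFF_T: c.npavadiff++; break;
        default:            c.nskipped++;  break;   /* forward compatible */
        }
    }
    return c;
}
\end{verbatim}

A real reader would also handle decompression and the fields inside
each record; both are omitted here.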

There are many different kinds of RST record types, including
\begin{rqitemize}{0em}
\item instructions
\item events (traps, interrupts)
\item MMU state (changes to the TLB, PA-VA diffs)
\item string data
\item internal processor state (register dumps)
\item high-level events (process/context switches, thread switches)
\item markers (timestamp, current CPU)
\item state (cache/memory state)
\item define your own
\end{rqitemize}

Here is an example of 4 records in an RST trace: pavadiff, instr, instr,
and trap. Again, each record is the same size and the \textsl{rtype}
indicates the record type.

\begin{rqcode}{\small}
+================+
|rtype=PAVADIFF\_T|
| i/d contexts   |
| PA-VA for PC   |
| PA-VA for EA   |
+================+
|rtype=INSTR\_T   |
| flags+instr    |
| PC (VA)        |
| EA (VA)        |
+================+
|rtype=INSTR\_T   |
| flags+instr    |
| PC (VA)        |
| EA (VA)        |
+================+
|rtype=TRAP\_T    |
| trap type      |
| trap level     |
|                |
+================+
\end{rqcode}


\section{Why should I use RST?}

Because RST is simple, flexible, extensible and supported. It has
provisions for MP traces, traps, events (snooping, DMA, etc.), VA/PA
translation, time stamps, descriptive strings, and trace patching.
Adding new types of data to an RST trace is as simple as defining new
rtypes. In short, RST was designed to handle all kinds of trace data for
the next 5 years.

Numerous tools are based on RST. The RST compressor, \textss{rstzip},
used together with gzip, typically achieves compression ratios of
18-40X; we have seen ratios as high as 200X.

The \textss{rst-snapper} reads RST. \textss{rstgen} is a shade analyzer
that generates RST. The RST tracer module (\textss{rstracer.so}) for
\rqhttp{blaze}{} generates rich RST traces. Additionally, snaps for the
new (4/2000) version of Aztecs, the cycle-accurate simulator for
Cheetah/Jubatus, use RST instructions.

\section{What tools exist for RST?}

The following tools exist for handling RST traces. All binaries are in
\textss{/import/archperf/bin}.

\begin{tabular}{|l|l|} \hline
Tool & Description \\ \hline
\textss{trv.sh} & view (un/compressed) RST trace \\ \hline
\textss{trconv} & view RST trace in ASCII format \\ \hline
\textss{rstFilter} & process an RST trace producing a new RST trace \\ \hline
\textss{atr2rst.sh} & script to \rqhttp{convert atrace to
RST}{atrace2rst.html} \\ \hline
\textss{rstzip2} & De/Compressor tailored for MP RST instr traces \\ \hline
\textss{rstzip} & De/Compressor tailored for RST instr traces (deprecated) \\ \hline
\textss{rstsnap} & snap an RST trace for Aztecs \\ \hline
\textss{rstgen} & generate an RST trace from shade (used for Spec2K) \\ \hline
\textss{rstracer.so} & Blaze module to dump/generate RST traces \\ \hline
\end{tabular}

\subsection{RST Compression}

A significant, but often overlooked, benefit of RST is an excellent
compression algorithm. Typically, RST traces compress to about 1 byte
per instruction. The compressor was implemented by Kelvin Fong.
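
As a rough consistency check: every RST record is 24 bytes, so an
uncompressed trace consisting only of \textss{INSTR\_T} records costs 24
bytes per instruction, and compressing it to about 1 byte per
instruction corresponds to a ratio of roughly
\[
  \frac{24 \mbox{ bytes/instr}}{1 \mbox{ byte/instr}} = 24,
\]
which falls within the 18-40X range quoted earlier for \textss{rstzip}
plus gzip. Interleaved \textss{PAVADIFF\_T} and other records change the
exact figure.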

Unfortunately, there are two incompatible RST compression formats, V1
and V2, requiring different de/compressors, \textss{rstzip} and
\textss{rstzip2}, respectively. Version 2 RST compression supports MP
traces and is the preferred compression method for all RST instruction
traces after Aug 2001. Most RST traces before Jun 2001 were compressed
with V1 compression (\textss{rstzip}).

\begin{tabular}{|l|l|l|} \hline
Format & Suffix & Description \\ \hline
V1 & \textss{rz.gz}, \textss{rsz}, \textss{rsz.gz} & Original 1P format \\ \hline
V2 & \textss{rz2.gz} & Support for MP and value tracing \\ \hline
\end{tabular}
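
Because V1 and V2 files need different decompressors, a script that
accepts both must first classify each file. Here is a minimal C sketch
that does this from the suffixes in the table above; the function name
\texttt{rst\_zip\_version} is hypothetical, and a suffix is only a
naming convention, not proof of a file's contents.

\begin{verbatim}
#include <string.h>

/* Return 1 if s ends with suffix, else 0. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Guess the RST compression format from a file name:
 * 2 = V2 (use rstzip2), 1 = V1 (use rstzip), 0 = not recognized. */
int rst_zip_version(const char *path)
{
    if (ends_with(path, ".rz2.gz"))
        return 2;
    if (ends_with(path, ".rz.gz") || ends_with(path, ".rsz.gz")
        || ends_with(path, ".rsz"))
        return 1;
    return 0;
}
\end{verbatim}

A wrapper script could then hand V2 files to \textss{rstzip2} and V1
files to \textss{rstzip}.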

\subsection{Viewing an RST trace}

Use \textss{trv.sh} or \textss{trconv}. This code was updated on
6/28/2000, so some flags may have changed.

\begin{rqenumerate}{0em}
\item An example run is \textss{trv.sh -n 1000 -s 200 trace45.rz2.gz | less}.

\item An example run is \textss{trconv -n 1000 -s 20 file.rst}.

\item There is on-line help for \textss{trconv}.

\item In the \texttt{INSTR}, \texttt{PAVADIFF} and \texttt{PHYSADDR}
records, there is an \textss{ea\_valid} field that indicates whether the
corresponding \textss{ea\_xxxx} field contains valid data. If
\textss{ea\_valid} is false (0), then the \textss{ea\_xxxx} field must
be ignored.

Most of my programs that generate RST traces (atr2rst, the blaze
rstracer) put a bogus, easily recognizable value in \textss{ea\_xxxx} if
\textss{ea\_valid} is false.
\end{rqenumerate}

\section{Getting started analyzing a trace}

In a typical trace analyzer (e.g., a cache simulator), two record types
suffice for most of what you want to do. The \textss{INSTR\_T} record
gives you the instruction word, PC and optional EA. The
\textss{PAVADIFF\_T} record lets you translate the VAs into PAs. For
both the I and D references, the last seen \textss{PAVADIFF\_T} contains
the current (PA-VA) difference values and the effective context used.

Consider the following example output from \textss{trconv}:

\begin{verbatim}
8896 pavadiff: cpuid=0icontext=0 dcontext=0 pc_pa_va=0xffffffffff400000 ea_pa_va=0xffffffffff400000 ea_valid=1
8897 instr : cpuid=0 p [0x0000000001085be0] lduw [%g2 + 0xf0], %g2 [0x00000000014000f0]
8898 instr : cpuid=0 p [0x0000000001085be4] srl %g2, 0xb, %g2
8899 instr : cpuid=0 p [0x0000000001085be8] subcc %g2, 0, %g0
8900 instr : cpuid=0 p [0x0000000001085bec] bpe,a,pt %icc, 0x1085c00 T [0x0000000001085c00]
8901 pavadiff: cpuid=0icontext=0 dcontext=0 pc_pa_va=0xffffffffff400000 ea_pa_va=0xfffffd5fe5c08000 ea_valid=1
8902 instr : cpuid=0 p [0x0000000001085bf0] lduh [%g7 + 0x188], %g2 [0x000002a100225ec8]
8903 instr : cpuid=0 p [0x0000000001085c00] sra %g2, 0, %i1
8904 instr : cpuid=0 p [0x0000000001085c04] call 0x1032420 T [0x0000000001032420]
8905 instr : cpuid=0 p [0x0000000001085c08] restore %g0, %g0, %g0
\end{verbatim}

At record 8897, we have an LDUW instruction with PC=0x0000000001085be0.
Thus the PC PA is 0x0000000001085be0 + 0xffffffffff400000 =
0x0000000000485be0 (the sum is taken modulo $2^{64}$, so the large
difference acts as $-$0xc00000). The PAVADIFF also indicates the
I-context is 0, meaning we are in privileged mode, hence executing
kernel code. (We could also determine this because the PC VA is in the
kernel text region.) The EA of the load, 0x00000000014000f0, is also in
record 8897. To derive the PA of the LDUW's EA, we use the
\textss{PAVADIFF\_T::ea\_pa\_va} field: 0x00000000014000f0 +
0xffffffffff400000 = 0x00000000008000f0.

At record 8898, we have an SRL instruction with PC=0x0000000001085be4.
We continue to use the values from the last \textss{PAVADIFF\_T} to get
the PC PA.

At record 8901, we see another \textss{PAVADIFF\_T} record, because the
LDUH load in the next record, 8902, accesses data on a different page.
This \textss{PAVADIFF\_T} record contains the necessary
\textss{ea\_pa\_va} value for the following LDUH instruction. In
this case, the PC (PA-VA) value has not changed.
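
The bookkeeping in this walkthrough fits in a few lines of C. The sketch
below is illustrative only: the type \texttt{pava\_state} and its
functions are hypothetical names, not part of \textff{rstf.h}. A
\textss{PAVADIFF\_T} record always replaces the PC difference but
replaces the EA difference only when \textss{ea\_valid} is set, and
translation is plain unsigned 64-bit addition. Because that addition
wraps modulo $2^{64}$, a ``negative'' difference such as
0xffffffffff400000 needs no special handling.

\begin{verbatim}
#include <stdint.h>

/* (PA-VA) differences from the most recent PAVADIFF_T record. */
typedef struct {
    uint64_t pc_pa_va;   /* applies to the PC of following INSTR records */
    uint64_t ea_pa_va;   /* applies to the EA of following loads/stores */
} pava_state;

/* Apply a PAVADIFF_T record. The PC difference is always replaced; the
 * EA difference only when ea_valid is set, otherwise the old one stays. */
void pava_update(pava_state *s, uint64_t pc_pa_va, uint64_t ea_pa_va,
                 int ea_valid)
{
    s->pc_pa_va = pc_pa_va;
    if (ea_valid)
        s->ea_pa_va = ea_pa_va;
}

/* PA of an instruction's PC; unsigned addition wraps modulo 2^64. */
uint64_t pava_pc(const pava_state *s, uint64_t pc_va)
{
    return pc_va + s->pc_pa_va;
}

/* PA of a load/store EA. Do not use this for a CTI target. */
uint64_t pava_ea(const pava_state *s, uint64_t ea_va)
{
    return ea_va + s->ea_pa_va;
}
\end{verbatim}

Fed record 8896 and then the PC of record 8897, \texttt{pava\_pc}
returns 0x0000000000485be0. CTI targets deliberately get no helper:
their PA comes from the target's own \textss{INSTR\_T} record.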

The dependence on only two record types and the use of the last
PAVADIFF record data make processing simple. C and C++ starter source
code for RST readers is in \textff{/import/archperf/ws/rstf}. Here is
the main loop of \textff{readRST.C}, which prints out the VA and PA of
each instruction.

\begin{verbatim}
  ...
  for (long long ix = 0; ix < nrecords; ix++) {
    long long recidx = ix + skip;        // index of this record in the trace
    rstf_unionT * up = &rec[ix];
    uint8_t rtype = up->context.rtype;   // byte 0 of every record is the rtype
    if (rtype == CONTEXT_T) {
      // CONTEXT_T (now PREG_T) must not be used to detect context
      // switches; a later PAVADIFF_T overrides these values.
      icontext = (up->context.traplevel > 0) ? 0 : up->context.primD;
      dcontext = up->context.primD;
    } else if (rtype == PAVADIFF_T) {
      rstf_pavadiffT * pv = & up->pavadiff;
      icontext = pv->icontext;
      dcontext = pv->dcontext;
      pava_pc = pv->pc_pa_va;
      if (pv->ea_valid) {                // else keep the previous EA diff
        pava_ea = pv->ea_pa_va;
      }
    } else if (rtype == INSTR_T) {
      rstf_instrT * ip = & up->instr;
      short ih = ip->ihash;
      if ( ih == IH_LDX ) {              // shade V5 ihash values
        // instr is a LDX
      }
      iw = ip->instr;
      pc = ip->pc_va;
      pc_pa = pc + pava_pc;              // 64-bit add, wraps mod 2^64
      fprintf(out, "%4lld PC V/P %llx / %llx (IW=0x%08x)",
              recidx, (unsigned long long) pc,
              (unsigned long long) pc_pa, (unsigned) iw);
      if (ip->ea_valid) {
        ea = ip->ea_va;
        ea_pa = ea + pava_ea;            // meaningful only for loads/stores
        fprintf(out, " (EA V/P %llx / %llx)",
                (unsigned long long) ea, (unsigned long long) ea_pa);
      }
      fprintf(out, "\n");
    }
  }
\end{verbatim}


\section{Versions of RST}

\begin{verbatim}
2.0   6/??/2001  Mostly the same as V1.10. Has official MP support
1.10  5/09/2001  MP support via cpuid field; better utility fn API
1.9   4/20/2001  PREG_T: add cpuid, rename asiReg
1.8   3/27/2001  unixcommand(), rstf_snprintf(), stdized rstf_headerT,magic
1.7   3/26/2001  Add RECNUM_T for rst-snapper
1.6   3/15/2001  Add support for MP (cpu-id to pavadiff, more TLB info)
1.5   9/18/2000  Fixed Shade V6 record types (thanks Kelvin)
1.4   9/9/2000   Added icontext and dcontext to PAVADIFF_T rec
1.4   9/?/2000   Added major, minor numbers to HEADER_T rec
1.3   8/25/2000  Added PATCH_T type.
1.2   8/22/2000  Added STATUS_T type.
\end{verbatim}

\subsection{Where do I get RST code?}

The \textff{rstf.h} header file is at
\textff{/import/archperf/include/rstf/rstf.h}.
Various other RST binaries exist in \textff{/import/archperf/bin/}.

Source code for various RST utilities is in the Code Manager WS
\textff{/import/archperf/ws/rstf/}.

\labsec{Canonical}
\section{Detailed specification of the RST trace type}

We present a precise specification for common record types, as a
reference for various analyzers. Complicating matters, different trace
sources produce slightly different values even for common cases. In
particular, the \textss{ea\_va} field for branches has several
interpretations. Historical note: when I defined the \textss{INSTR\_T}
record, I believed there was no room for ambiguity. I was wrong.

\subsection{Analyzing a canonical RST trace}

The following table shows where and how to get information from an RST
trace. The notation \textss{ttt::fff} means to look at field
\textss{fff} in the \textem{last} record of type \textss{ttt}. The
notation \textss{ppp?xxx:yyy} means use value \textss{xxx} if predicate
\textss{ppp} is true, else use \textss{yyy}; if \textss{yyy} is missing,
then there is no valid data value.

\begin{tabular}{|l|l|} \hline
Value of interest & Where / how \\ \hline
IW (instr) & \textss{I-TLB miss} ? 0x0 : \textss{INSTR\_T::instr} \\ \hline
PC VA & \textss{INSTR\_T::pc\_va} \\ \hline
PC PA & \textss{INSTR\_T::pc\_va} + \textss{PAVADIFF\_T::pc\_pa\_va}
(known wrong if I-TLB miss) \\ \hline
ld/st VA & \textss{INSTR\_T::ea\_valid ? INSTR\_T::ea\_va} \\ \hline
ld/st PA & \textss{INSTR\_T::ea\_valid ? (INSTR\_T::ea\_va +
PAVADIFF\_T::ea\_pa\_va)} (PA invalid on D-TLB miss or non-translating ASI) \\ \hline
PC I context & \textss{PAVADIFF\_T::icontext} \\ \hline
ld/st D context & \textss{PAVADIFF\_T::dcontext} \\ \hline
ld/st ASI goes to mem ? & examine ASI; translating and bypass ASI's go to memory \\ \hline
instr is CTI ? & decode IW or look at \textss{INSTR\_T::ihash} \\ \hline
CTI is taken ? & \textss{INSTR\_T::bt} \\ \hline
taken CTI target & \textss{INSTR\_T::bt ? INSTR\_T::ea\_va} \\ \hline
instr is annulled ? & \textss{INSTR\_T::an} \\ \hline
\hline
ld/st ASI & \textss{Immed-ASI ? IW : PREG\_T::asireg} (Decode IW to
determine Immed-ASI)\\ \hline
trap level & \textss{saw TRAP\_T ? (TRAP\_T::tl + 1) :
PREG\_T::trap\_lvl} \\ \hline
enter trap & Get a \textss{TRAP\_T} record and/or
\textss{INSTR\_T::tr} \\ \hline
trap type & \textss{TRAP\_T::ttype} \\ \hline
exit trap & Get DONE / RETRY IW and/or get \textss{PREG\_T} \\ \hline
system call & \textss{TRAP\_T::syscall} \\ \hline
TLB demap & \textss{TLB\_T::demap} is 1 \\ \hline
\end{tabular}

The \textss{PREG\_T} (privileged register) record is the new name for
the badly-named \textss{CONTEXT\_T} record. The \textss{PREG\_T} record
encodes various hardware values. Making things all the more galling,
the \textss{CONTEXT\_T/PREG\_T} record \textem{should not be used to
detect context switches}. \textss{CONTEXT\_T} is deprecated as of RST
format V2.05 and has been renamed \textss{PREG\_T}.

\begin{tabular}{|l|l|} \hline
Register & Where / how \\ \hline
PSTATE & \textss{CONTEXT\_T::pstate} \\ \hline
ASI reg & \textss{CONTEXT\_T::asireg} \\ \hline
I-MMU primary context & \textss{CONTEXT\_T::primA} (non-existent for
SPARC) \\ \hline
I-MMU secondary context & \textss{CONTEXT\_T::secA} (non-existent for
SPARC) \\ \hline
D-MMU primary context & \textss{CONTEXT\_T::primD} \\ \hline
D-MMU secondary context & \textss{CONTEXT\_T::secD} \\ \hline
\end{tabular}

\subsection{Table of common cases for INSTR and PAVADIFF records}

To help clarify the above information, the following table lists the
values of the \textss{INSTR\_T} and \textss{PAVADIFF\_T} fields for
various common cases. We use the nomenclature \texttt{undef}=undefined,
\textss{valid}=expected value, N/A=not used or not applicable,
\texttt{norec}=no record is emitted, and
\texttt{impdep}=implementation dependent. In particular, pavadiff
records are emitted on demand, and an N/A entry will not trigger a
pavadiff record.

\begin{tabular}{|l|l|l|l|l|l|} \hline
 & \mc{3}{c|}{\textss{INSTR\_T}} & \mc{2}{c|}{\textss{PAVADIFF\_T}} \\ \hline
Case & \texttt{IW} & \texttt{pc\_va} & \texttt{ea\_va} & \texttt{pavaPC} & \texttt{pavaEA} \\ \hline
ITLB miss & norec & norec & norec & norec & norec \\ \hline
non-mem, non-CTI & valid & PC VA & undef & valid & valid \\ \hline
memop good & valid & PC VA & EA VA & valid & valid \\ \hline
memop DTLB miss & valid & PC VA & EA VA & valid & N/A \\ \hline
CTI taken & valid & PC VA & target PC & valid & N/A \\ \hline
CTI non-taken & valid & PC VA & \textem{impdep} & valid & N/A \\ \hline
annul ITLB miss & undef & PC VA & norec & norec & norec \\ \hline
annul instr & valid & PC VA & N/A & valid & N/A \\ \hline
\end{tabular}

\subsection{Ordering of simultaneous records}

When several RST records apply to a given event, we recommend the
following ordering. Records in the same group can be ordered
arbitrarily. Our guiding strategy is to make it easy for an RST trace
consumer to process the information. Generally, we try to put the
\textss{INSTR\_T} record last, unless there is a \textss{REGVAL\_T}
record containing values produced by the instr.

\begin{tabular}{|l|l|} \hline
Group 1 & \textss{CPU\_T} \\ \hline
Group 2 & \textss{PREG\_T}, \textss{REGVAL\_T} with postInstr=0 \\ \hline
Group 3 & \textss{TRAP\_T}, \textss{TRAPEXIT\_T} \\ \hline
Group 4 & \textss{PAVADIFF\_T} \\ \hline
Group 5 & \textss{INSTR\_T} \\ \hline
Group last & \textss{REGVAL\_T} with postInstr=1 \\ \hline
\end{tabular}

If \textss{REGVAL\_T::postInstr} is set, then the values are those
present after the instruction has executed.
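
A trace writer, or a consumer that wants to sanity-check its input, can
encode this ordering as a group number per record kind and check that
group numbers never decrease within the records of one event. Here is a
minimal C sketch; the \texttt{rec\_kind} names are hypothetical
stand-ins for the real rtypes, with \textss{REGVAL\_T} split into two
kinds according to postInstr.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical record kinds standing in for the real rtypes. */
typedef enum {
    REC_CPU, REC_PREG, REC_REGVAL_PRE, REC_TRAP, REC_TRAPEXIT,
    REC_PAVADIFF, REC_INSTR, REC_REGVAL_POST
} rec_kind;

/* Recommended group for each kind; lower groups come first. */
static int rec_group(rec_kind k)
{
    switch (k) {
    case REC_CPU:                       return 1;
    case REC_PREG: case REC_REGVAL_PRE: return 2;  /* postInstr=0 */
    case REC_TRAP: case REC_TRAPEXIT:   return 3;
    case REC_PAVADIFF:                  return 4;
    case REC_INSTR:                     return 5;
    case REC_REGVAL_POST:               return 6;  /* postInstr=1 */
    }
    return 0;
}

/* Return 1 if the records of one event follow the recommended order.
 * Records within the same group may appear in any order. */
int rec_order_ok(const rec_kind *recs, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (rec_group(recs[i]) < rec_group(recs[i - 1]))
            return 0;
    return 1;
}
\end{verbatim}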

\swallow{
arch specific
algorithm for I context changes in MM
TLB field
ihash fn
}

\subsection{Common errors}

To detect a context switch, examine \textss{PAVADIFF\_T::icontext};
\textsl{do not use a \textss{CONTEXT\_T} record}.
\textss{PAVADIFF\_T::icontext} and \textss{PAVADIFF\_T::dcontext} give
the effective I and D contexts in use, which is correct to the best
of my knowledge.
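
In code, this amounts to remembering the last \textss{icontext} per CPU
(as of RST V1.9, PAVADIFF records are per CPU) and reporting a change.
Here is a minimal C sketch with hypothetical names
(\texttt{ctx\_tracker}, \texttt{MAX\_CPUS}). Note that entering the
kernel typically changes \textss{icontext} to 0 as well, so an analyzer
that only wants process switches may need to filter those changes.

\begin{verbatim}
#include <stdint.h>

#define MAX_CPUS 64   /* hypothetical bound on PAVADIFF_T::cpuid */

typedef struct {
    int      seen[MAX_CPUS];      /* has this CPU had a PAVADIFF yet? */
    uint32_t icontext[MAX_CPUS];  /* last icontext seen on each CPU */
} ctx_tracker;

/* Call for every PAVADIFF_T record. Returns 1 when the I context differs
 * from the previous PAVADIFF_T of the same CPU, else 0. A switch into
 * the kernel (icontext 0) also counts as a change. */
int ctx_on_pavadiff(ctx_tracker *t, unsigned cpuid, uint32_t icontext)
{
    if (cpuid >= MAX_CPUS)
        return 0;                 /* out of range: ignored in this sketch */
    int changed = t->seen[cpuid] && t->icontext[cpuid] != icontext;
    t->seen[cpuid] = 1;
    t->icontext[cpuid] = icontext;
    return changed;
}
\end{verbatim}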

On an ASI LD/ST using an immediate ASI, do not use the
\textss{CONTEXT\_T::asireg} field, as this field contains the contents
of the ASI register, not the immediate ASI encoded in the instruction.
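
On SPARC V9, bit 13 (the \texttt{i} bit) of an alternate-space
load/store instruction word selects where the ASI comes from: with
\texttt{i}=0 it is the 8-bit \texttt{imm\_asi} field in bits 12:5, and
with \texttt{i}=1 it is the ASI register. Here is a minimal C sketch
(the function name is hypothetical) that assumes the caller has already
decoded the IW as an alternate-space load/store; \texttt{asireg} would
come from \textss{PREG\_T::asireg}.

\begin{verbatim}
#include <stdint.h>

/* Effective ASI of a SPARC V9 alternate-space load/store (e.g. LDXA).
 * iw:     instruction word from INSTR_T::instr
 * asireg: ASI register value (PREG_T::asireg), used only when i=1 */
unsigned effective_asi(uint32_t iw, unsigned asireg)
{
    unsigned i_bit = (iw >> 13) & 1u;   /* bit 13 selects the form */
    if (i_bit == 0)
        return (iw >> 5) & 0xffu;       /* imm_asi in bits 12:5 */
    return asireg & 0xffu;              /* [rs1 + simm13] %asi form */
}
\end{verbatim}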

The \textss{TLB\_T::valid} bit is meaningless. To determine if a TLB
line is valid, look at the valid bit in the TLB TTE data instead.
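
On UltraSPARC (sun4u-style) TTEs, the valid bit is the most significant
bit, bit 63, of the TTE data word; we assume that layout here, so check
the processor manual for other implementations. The blaze sample later
in this document illustrates the problem: its record 9 shows
\texttt{valid=0}, yet its \texttt{data=0xa000000000e00064} has bit 63
set. Here is a minimal C sketch with hypothetical helper names.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Valid bit of a TTE data word, assumed to be bit 63 (sun4u layout).
 * Use this instead of TLB_T::valid, which is meaningless. */
int tte_valid(uint64_t tte_data)
{
    return (int)(tte_data >> 63);
}

/* Count valid entries in an array of TTE data words, for example the
 * data= fields of the initial TLB dump in a blaze trace. */
size_t tte_count_valid(const uint64_t *tte_data, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)tte_valid(tte_data[i]);
    return count;
}
\end{verbatim}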

\subsection{Clarification and the blaze RST tracer}

We cover several ambiguous corner cases in this section. We also
describe what the blaze V1.x and V2.x RST tracer does in these cases.

\begin{rqitemize}{0em}
\item {}[Annulled instr - I-TLB hit]
The VA PC, PA PC and IW are all valid. The blaze RST tracer emits an
instruction record with the annulled bit set.

\item {}[Annulled instr - with ITLB miss] If there is an I-TLB miss on
an annulled instruction, \textsl{only the VA PC is valid}. Because we
cannot determine the PC PA, we cannot even fetch the IW.

(The blaze RST tracer) On an annulled instr that misses the I-TLB, we
emit an IW=0 (illegal trap) and blithely let the previous PAVADIFF
remain. While this PC PAVADIFF is technically wrong, we deemed it
less intrusive than generating a PAVADIFF just to flag that we have an
unknown PC PA. (I had tried emitting a special PAVADIFF, but have since
retracted this approach.) Additionally, trace analyzers that do care
about the PC PA of an annulled instruction should be smart enough to
suppress the I-TLB miss.

\item {}[Memory ops with a D-TLB miss]

A memop that has a D-TLB miss will appear twice in the trace. The
sequence will be $(i)$ memop try 1 + D-TLB miss, $(ii)$ D-TLB miss
handler with possible complications like TSB miss and/or page fault,
$(iii)$ memop try 2, which succeeds. On a D-TLB miss, the PA EA will be
unknown on the first try.

(The blaze RST tracer) On a D-TLB miss, we first emit an RST trap
record indicating the D-TLB miss and then an instr record for the
memop. The RST instruction record for the memop has its trap bit set.
That's it. In particular, we do not emit a PAVADIFF record, as we rely
on the trace consumer to detect the D-TLB miss and squash the memop.

\item {}[EA for untaken branches]
If a CTI instr is not taken (an untaken branch), execution falls
through to the next PC. In this case, \textss{INSTR\_T::ea\_valid} is 0,
and \textss{INSTR\_T::ea\_va} \textsl{is unspecified}. For blaze and
atrace-based RST traces, \textss{INSTR\_T::ea\_va = PC + 8}; for
shade-based RST traces, \textss{INSTR\_T::ea\_va = taken-target-PC}.

\item {}[PSTATE.AM = 1] If the AM bit is one, all virtual addresses
are limited to 32 bits. The \textss{INSTR\_T::ea\_va} field must
contain a 32-bit value; namely, the upper 32 bits of the 64-bit
\textss{ea\_va} field must be zero.

\item {}[PA EA only applies to mem ops]
Although \textss{INSTR\_T::ea\_va} holds both $(i)$ memory addresses and
$(ii)$ CTI targets, you must use the \textss{PAVADIFF\_T::ea\_pa\_va}
value only for memory operations. The RST spec forbids using PAVADIFF to
get the PA of a CTI/branch target, because $(a)$ you can get the PA PC
from the actual target instruction itself and $(b)$ at the time of the
CTI/branch, the target PA may not be known, as fetching the target may
incur an I-MMU miss.

\item {}[TLB demap operation] On a TLB demap operation, we record the
VA and context of the TLB entry that is demapped. We do not record
the TTE\_data of the entry being demapped. (There was a bug where
\textss{TLB\_T::demap} was not being set.)

\item {}[LD/ST ASI to non-memory (e.g. to an MMU register)] For all
loads and stores, even load/store ASI instructions,
\textss{INSTR\_T::ea\_valid = 1} and \textss{INSTR\_T::ea\_va} holds
the virtual address. Some ld/st ASI instructions have meaningful virtual
addresses, e.g. \textss{ASI\_UDB\_INTR\_W} or
\textss{ASI\_DTLB\_DATA\_TAG\_REG}, so \textss{INSTR\_T::ea\_va}
must contain the effective address.

For non-translating ASI's, the downstream trace analyzer must not
generate a PA, which puts the burden of knowing whether to generate a PA
on the trace analyzer. The ASI's obey the following breakdown, where
[aa,bb] is the range inclusive of aa and bb, namely $aa \le x \le bb$.

\begin{tabular}{|l|l|} \hline
Range & How to get PA from VA \\ \hline
{}[0x04,0x11] [0x18,0x19] [0x24,0x2c] & Translate via MMU \\
{}[0x70,0x73] [0x78,0x79] [0x80,0xff] & Translate via MMU \\ \hline
{}[0x14,0x15] [0x1c,0x1d] & Bypass. PA=VA \\ \hline
{}[0x45,0x6f] [0x76,0x77] [0x7e,0x7f] & Non-translating (no PA) \\ \hline
\end{tabular}

The following table lists some common translating ASI's.

\begin{tabular}{|l|l|} \hline
Value & ASI name \\ \hline
0x04 & NUCLEUS \\
0x0c & NUCLEUS\_LITTLE \\
0x10 & AS\_IF\_USER\_PRIMARY \\
0x11 & AS\_IF\_USER\_SECONDARY \\
0x80 & PRIMARY (the default ASI for all loads) \\
0x81 & SECONDARY \\
0x82 & PRIMARY\_NO\_FAULT \\
0x83 & SECONDARY\_NO\_FAULT \\
0x88 & PRIMARY\_LITTLE \\
0x89 & SECONDARY\_LITTLE \\
0xe0 & BLK\_COMMIT\_PRIMARY \\
0xe1 & BLK\_COMMIT\_SECONDARY \\
0xf0 & BLK\_PRIMARY \\
0xf1 & BLK\_SECONDARY \\ \hline
\end{tabular}

\end{rqitemize}
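
Since the burden of deciding whether a load/store has a PA falls on the
trace analyzer, the ASI breakdown above can be encoded as a small
lookup. Here is a minimal C sketch (all names hypothetical): translating
ASI's use \textss{PAVADIFF\_T::ea\_pa\_va}, bypass ASI's use PA=VA, and
non-translating ASI's, as well as any ASI outside the listed ranges,
yield no PA. It does not detect D-TLB misses, for which the PA is also
invalid.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* How a load/store ASI relates VA to PA (ranges from the table above). */
typedef enum {
    ASI_CLASS_UNKNOWN,    /* not listed in the table: do not guess */
    ASI_CLASS_TRANSLATE,  /* translate via MMU: PA = VA + ea_pa_va */
    ASI_CLASS_BYPASS,     /* bypass: PA = VA */
    ASI_CLASS_NOPA        /* non-translating: no PA */
} asi_class;

typedef struct { unsigned lo, hi; asi_class cls; } asi_range;

static const asi_range asi_ranges[] = {   /* inclusive: lo <= asi <= hi */
    {0x04, 0x11, ASI_CLASS_TRANSLATE}, {0x18, 0x19, ASI_CLASS_TRANSLATE},
    {0x24, 0x2c, ASI_CLASS_TRANSLATE}, {0x70, 0x73, ASI_CLASS_TRANSLATE},
    {0x78, 0x79, ASI_CLASS_TRANSLATE}, {0x80, 0xff, ASI_CLASS_TRANSLATE},
    {0x14, 0x15, ASI_CLASS_BYPASS},    {0x1c, 0x1d, ASI_CLASS_BYPASS},
    {0x45, 0x6f, ASI_CLASS_NOPA},      {0x76, 0x77, ASI_CLASS_NOPA},
    {0x7e, 0x7f, ASI_CLASS_NOPA},
};

asi_class classify_asi(unsigned asi)
{
    for (size_t i = 0; i < sizeof asi_ranges / sizeof asi_ranges[0]; i++)
        if (asi_ranges[i].lo <= asi && asi <= asi_ranges[i].hi)
            return asi_ranges[i].cls;
    return ASI_CLASS_UNKNOWN;
}

/* Store the PA of a load/store EA in *pa and return 1, or return 0 when
 * the ASI gives no PA. ea_pa_va comes from the last PAVADIFF_T record. */
int asi_ea_pa(unsigned asi, uint64_t ea_va, uint64_t ea_pa_va, uint64_t *pa)
{
    switch (classify_asi(asi)) {
    case ASI_CLASS_TRANSLATE: *pa = ea_va + ea_pa_va; return 1;
    case ASI_CLASS_BYPASS:    *pa = ea_va;            return 1;
    default:                  return 0;   /* non-translating or unknown */
    }
}
\end{verbatim}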

\labsec{PAVA}
\subsection{VA to PA translation}

The RST format is designed to capture both VAs and PAs. There are
several overlapping ways to specify the necessary information. As of
11/2001, the de facto standard is the \texttt{PAVADIFF} method, and as
such, you may safely assume \textss{PAVADIFF\_T} records always exist.
(As of 3/2001, blaze, atrace and shade based RST traces all use
\textss{PAVADIFF\_T} records.)

\textsl{PAVADIFF method:} A standard method is to use \texttt{PAVADIFF}
records, which capture the (PA-VA) values for the I-TLB and D-TLB. In
a \texttt{PAVADIFF} record, the \textss{pc\_pa\_va} field contains the
difference (PA-VA) for the PC of the next \texttt{INSTR} record. If
the INSTR is a \texttt{load} or \texttt{store} (but not a
\texttt{branch/call/jump}), then the \textss{ea\_pa\_va} field of the
\texttt{PAVADIFF} record holds the (PA-VA) for the EA, which is how the
D-TLB would translate that EA.

As of RST V1.9 (4/2001), there is a separate \texttt{PAVADIFF} record
for each CPU. The CPU ID is contained in \texttt{PAVADIFF\_T::cpuid}.
Here is how the trace might look.

{\footnotesize
\begin{verbatim}
3 pavadiff: context=571 cpu=0 pc_pa_va=0x00000002c0800000 ((ea_pa_va=0xffffffffffffffff)) ea_valid=0
4 instr : u [0x000000010050f144] srl %i1, 0, %i0
5 instr : u [0x000000010050f148] or %g3, %g2, %g2
6 instr : u [0x000000010050f14c] sll %i0, 2, %g3
7 pavadiff: context=571 pc_pa_va=0x00000002c0800000 ea_pa_va=0x00000002c0800000 ea_valid=1
8 instr : u [0x000000010050f150] ldsw [%g3 + %g2], %g3 [0x000000010050e390]
9 instr : u [0x000000010050f154] jmpl %g3 + %g2, %g0 T
10 instr : u [0x000000010050f158] nop
\end{verbatim}
}

Note that for a control-transfer instruction (e.g.,
\texttt{br/call/jmpl}) that jumps to a target PC, \texttt{targPC}, you
must ``wait'' until you see the target INSTR record to determine the PA
for \texttt{targPC}. If \texttt{targPC} is on a different page with a
different (PA-VA) value than the current PC, there will be a PAVADIFF
record before the target instruction.

\begin{rqcode}{ }
PAVADIFF  pc\_pa\_va=diffaa
...
PCaa      br targPC           // targPC is on a different page
PCaa+4    delay slot instr

PAVADIFF  pc\_pa\_va=diffbb     // new value for PA-VA for targPC
targPC    target instruction  // PA of targPC = (targPC + diffbb)
\end{rqcode}

There are two common ways of using PAVADIFF records. With
\textsl{on-change}, I generate PAVADIFF records only when the (PA-VA)
values change for either the PC or the EA. If the \textss{ea\_valid}
field is false (0), then the previous \textss{ea\_pa\_va} value is still
assumed to be correct. One caveat: the (PA-VA) values can be the same
for different (TLB) pages, in which case you may not see a PAVADIFF
record even when the trace crosses pages.
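
On the generating side, the on-change policy needs only the last values
written. Here is a minimal C sketch with hypothetical names: the PC
difference is always checked, the EA difference only for memory
operations (a CTI target never needs one), and a record emitted for a
non-memop is assumed to carry \textss{ea\_valid}=0, so the previous EA
difference stays in force. As the caveat above says, a page crossing
with an unchanged (PA-VA) produces no record.

\begin{verbatim}
#include <stdint.h>

/* What the trace consumer currently believes: the last values written. */
typedef struct {
    int      pc_known, ea_known;  /* 0 until the first value is written */
    uint64_t pc_pa_va, ea_pa_va;
} pava_writer;

/* Return 1 if a PAVADIFF record must precede this instruction, updating
 * the writer state as if it had been emitted. is_memop is 1 only for
 * loads/stores; CTIs and other instructions pass 0. */
int pava_need_record(pava_writer *w, uint64_t pc_pa_va,
                     int is_memop, uint64_t ea_pa_va)
{
    int pc_change = !w->pc_known || w->pc_pa_va != pc_pa_va;
    int ea_change = is_memop && (!w->ea_known || w->ea_pa_va != ea_pa_va);
    if (pc_change || ea_change) {
        w->pc_known = 1;
        w->pc_pa_va = pc_pa_va;
        if (is_memop) {           /* record carries ea_valid=1 */
            w->ea_known = 1;
            w->ea_pa_va = ea_pa_va;
        }                         /* else ea_valid=0: old EA diff stands */
    }
    return pc_change || ea_change;
}
\end{verbatim}

Fed the instructions of the sample trace above (records 4-10), it
requests a record before the first \texttt{srl} and before the
\texttt{ldsw}, and none for the others, matching records 3 and 7.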
607 | ||
608 | Another possible use of PAVADIFF records is on a \textsl{every-instr} | |
609 | basis, in which PAVADIFF record precedes every instr, which nearly | |
610 | doubles the trace size. Nobody in their right mind does this as of | |
611 | 1/2001. | |
612 | ||
613 | \textsl{TLB method:} The conceptually preferred (but practically | |
614 | difficult) method is to have \texttt{TLB} records which describe all the | |
615 | necessary mappings before they are used. Whenever a TLB line is | |
616 | changed, a corresponding TLB records appears in the trace. Also, in an | |
617 | RST trace from blaze, the entire TLB is dumped at the beginning of the | |
618 | trace. As of RST V1.9, a TLB record contians the TLB unit (e.g. Cheetah | |
619 | has two I-TLBS units) and the CPU to which it applies. | |
620 | ||
TLB records are the most compact way to support VA-to-PA translation,
because TLB records should be relatively infrequent in a trace. Despite
this compactness, they cannot be used universally. To get PAs, your
analyzer program must simulate a TLB, which has proven to be difficult
and slow, and hence extremely unpopular. And in many cases (e.g. when
the trace is from atrace or shade), TLB information is unavailable, so
TLB records will be missing.
628 | ||
Here is sample output from an RST trace from the blaze \textss{rstracer}
module. Records 9-4104 (2048 I-TLB + 2048 D-TLB entries) are the
initial TLB values. At record 4649, we replace I-TLB entry 1090. Here
\texttt{type=0} means I-TLB. The \texttt{demap=0} field means that this
entry is being added to the TLB, which also replaces any previous entry.
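To see what the TLB method asks of an analyzer, here is a deliberately
naive C sketch of a software TLB image driven by TLB records. It
assumes the caller has already decoded each record's virtual page, PA,
context and page size from the \texttt{tag} and \texttt{data} fields;
that decoding is MMU specific and not shown. (In the sample below, the
low 13 bits of \texttt{tag} appear to match the \texttt{context}
field.) The names \textss{tlb\_image}, \textss{tlb\_apply} and
\textss{tlb\_translate} are hypothetical, and treating a nonzero
\texttt{demap} as ``invalidate this index'' is an assumption. Everything
the sketch ignores (global pages, multiple TLB units per CPU, multiple
CPUs) is part of why real TLB simulation proved so hard.

\begin{verbatim}
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_TYPES   2     /* type=0 I-TLB, type=1 D-TLB          */
#define TLB_ENTRIES 2048  /* assumption: from "tlbsize=2048"      */

typedef struct {
    bool     present;
    uint16_t context;
    uint64_t va_base;     /* page-aligned VA                      */
    uint64_t pa_base;     /* page-aligned PA                      */
    uint64_t page_size;   /* power of two, decoded by the caller  */
} tlb_line;

typedef struct {
    tlb_line line[TLB_TYPES][TLB_ENTRIES];   /* [type][index] */
} tlb_image;

/* Apply one TLB record: demap=0 installs the entry at [type][index],
   replacing whatever was there.  demap!=0 => invalidate (assumption). */
static void tlb_apply(tlb_image *t, int type, int index, int demap,
                      uint16_t context, uint64_t va, uint64_t pa,
                      uint64_t page_size)
{
    assert(type >= 0 && type < TLB_TYPES);
    assert(index >= 0 && index < TLB_ENTRIES);
    tlb_line *l = &t->line[type][index];
    if (demap) {
        l->present = false;
        return;
    }
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    l->present   = true;
    l->context   = context;
    l->va_base   = va & ~(page_size - 1);
    l->pa_base   = pa & ~(page_size - 1);
    l->page_size = page_size;
}

/* Look up va; returns false on a miss.  A linear scan ignores
   associativity, global pages and replacement policy. */
static bool tlb_translate(const tlb_image *t, int type, uint16_t context,
                          uint64_t va, uint64_t *pa)
{
    assert(type >= 0 && type < TLB_TYPES);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        const tlb_line *l = &t->line[type][i];
        if (l->present && l->context == context &&
            (va & ~(l->page_size - 1)) == l->va_base) {
            *pa = l->pa_base | (va & (l->page_size - 1));
            return true;
        }
    }
    return false;
}
\end{verbatim}

For record 4649 below, the tag yields context 0x120 and page VA
0x100b2a000. Assuming an 8~KB page, \textss{tlb\_translate} would map
the \texttt{retry} target 0x100b2bed0 at record 4652 to PA 0x3757ed0.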
634 | ||
635 | {\footnotesize | |
636 | \begin{verbatim} | |
637 | 1 string : string=date=00-06-26 | |
638 | 2 string : string=host=bigc | |
639 | 3 string : string=ramsize=1024M | |
640 | ||
641 | 4 string : string=tlbsize=2048 | |
642 | 5 string : string=nwins=8 | |
643 | 6 string : string=cpufreq=600000000 | |
644 | 7 string : string=mpsteps=200 | |
645 | 8 cpu : cpu=0 timestamp=0x45a318a1df | |
646 | 9 tlb : demap=0 type=0 valid=0 index=0 state=0x0000 context=0x0000 tag=0x00000000ffd00000 data=0xa000000000e00064 pa=0xe00000 | |
647 | 10 tlb : demap=0 type=0 valid=0 index=1 state=0x0000 context=0x0000 tag=0x00000000ffd10000 data=0xa000000000e10064 pa=0xe10000 | |
648 | ... | |
649 | ... | |
4103 tlb : demap=0 type=1 valid=1 index=2046 state=0x0000 context=0x110f tag=0x00000003a68b310f data=0xe00010000c000032 pa=0xc000000
4104 tlb : demap=0 type=1 valid=0 index=2047 state=0x0000 context=0x1115 tag=0x00000003a9fc7115 data=0xe000000008c00032 pa=0x8c00000
4105 cpu : cpu=1077781320 timestamp=0x45a318a1df
4106 context : asi=0x0082 last_context=0x0120 trap_lvl=0x00 trap_type=0x00 pstate=0x0012 primA=0x0000 secA=0x0000 primD=0x0120 secD=0x0120
4107 instr : u [0x000000010087ec18] add %i1, %o2, %o0
4108 instr : u [0x000000010087ec1c] ldub [%o2 + %o4], %g3 [0x0000000101b7a7ba]
4109 instr : u [0x000000010087ec20] subcc %g3, %g2, %g0
4110 instr : u [0x000000010087ec24] bple,a,pn %icc, 0x10087ec30 T [0x000000010087ec30]
...
...
4648 instr : p [0x0000000010000cb4] nop an
4649 tlb : demap=0 type=0 valid=0 index=1090 state=0x0000 context=0x0120 tag=0x0000000100b2a120 data=0x8000000003756020 pa=0x3756000
4650 instr : p [0x0000000010000cb8] stxa %g5, [%g0 + %g0]0x54
4651 context : asi=0x0082 last_context=0x0000 trap_lvl=0x00 trap_type=0x00 pstate=0x0012 primA=0x0000 secA=0x0000 primD=0x0120 secD=0x0120
669 | 4652 instr : p [0x0000000010000cbc] retry T [0x0000000100b2bed0] | |
670 | 4653 instr : p [0x0000000100b2bed0] or %g0, %o0, %g2 | |
671 | \end{verbatim} | |
672 | } | |
673 | ||
\textsl{PHYSADDR method:} The last, brute-force method is to put a
PHYSADDR record before every instr record. The PHYSADDR record contains
the PA for the following PC and, if appropriate, the EA.
677 | ||
678 | \subsection{VA to PA translation historical notes} | |
679 | ||
680 | For many months, I (RQ) was convinced the TLB method was the correct way | |
681 | to handle VA-PA translation. How difficult could simulating a TLB be? | |
682 | The PAVADIFF was meant to be a stop-gap, until correct TLB simulators | |
683 | were written. I was wrong. Very wrong. | |
684 | ||
In retrospect, the use of PAVADIFF records has greatly simplified RST
trace processing. Even now (03/2002), two years after the initial
discussion, finding a correct TLB simulator (i.e. one that agrees with
the PAVADIFF records) remains elusive. Special ``thanks'' to the MM team,
especially Sudi K, for steadfastly being unable to use TLB records,
forcing PAVADIFF records to become the standard.
691 | ||
692 | \subsection{Underlying philosophy} | |
693 | ||
694 | You should be able to glean most of what you want to know from just | |
695 | \textss{INSTR\_T}, \textss{PAVADIFF\_T} and \textss{TRAP\_T} records. | |
696 | ||
697 | ((to be finished)) | |
698 | ||
The \textss{TLB\_T} records, which let you do your own VA-to-PA
translation, were extremely unpopular and have been superseded in
practice by \textss{PAVADIFF\_T} records.
702 | ||
703 | \textss{CONTEXT\_T} records were originally meant to be much more useful. | |
704 | ||
705 | \section{The blaze RST trace} | |
706 | ||
While each individual RST record type is fairly unambiguous, how the
records are put together is implementation dependent.
709 | ||
710 | In a \textss{TRAP\_T} record, the HW state is that \textbf{before} the | |
711 | trap is taken. Thus, a trap from user code will show \textss{TL=0}. | |
712 | ||
If executing an instr causes a trap, say a D-TLB miss, you will see the
instr record (with the \textss{tr} bit set) followed by the trap record.
If fetching an instr causes a trap (e.g. an IMMU miss), you will not see
the instr record until after the trap handler returns.
717 | ||
718 | If there is a write to the \textss{PSTATE} or the \textss{TL} registers, | |
719 | the new values are shown in a \textss{CONTEXT\_T} record. | |
720 | ||
721 | \subsection{Information in the RST header} | |
722 | ||
As of 4/2001 (V1.4), the blaze \textss{rstracer} module spits out
copious information about the configuration; earlier versions spit out
a more modest amount. As of V1.4, we get the following series of
records at the start of a trace.
727 | ||
728 | \begin{verbatim} | |
729 | $ trv.sh -n 24 /import/arch-trace03/blaze/tpcc-try8/try8-t6.rsz | |
730 | RST trace format (stdin) | |
731 | ================ | |
732 | User/ Branch | |
733 | Rec # Type Priv PC Disassembly Taken EA | |
734 | 0 header : majorVer=1 minorVer=8 RST Header v1.8 | |
735 | 1 strdesc : "Blaze [ rstracer.so ]" | |
736 | 2 strdesc : "rstracer=V1.4" | |
737 | 6 strdesc : "rstracer [compiled against Blz3.48 - Excal 5.8 RW MP ||Disk API=[Trace,Timing]]" | |
738 | 8 strdesc : "date=2001-04-05_01:51:56" | |
739 | 9 strdesc : "host=bigc" | |
740 | 10 strdesc : "<blazeinfo>" | |
741 | 13 strdesc : "blz::version=3.49 - Excal 5.8 RW MP ||Disk API=[Trace,Timing]" | |
742 | 14 strdesc : "blz::ncpus=1" | |
743 | 15 strdesc : "blz::ram=1024M " | |
744 | 16 strdesc : "blz::tlbsize=2048" | |
745 | 17 strdesc : "blz::mmutype=spitfire" | |
746 | 18 strdesc : "blz::cpufreq=200000000" | |
747 | 19 strdesc : "blz::sysfreq=10000000" | |
748 | 20 strdesc : "blz::diskdelay=800000" | |
749 | 21 strdesc : "blz::nwins=8" | |
750 | 22 strdesc : "blz::mpsteps=2" | |
751 | 23 strdesc : "</blazeinfo>" | |
752 | \end{verbatim} | |
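Since V1.4 the configuration arrives as \texttt{blz::KEY=VALUE} strings
between \texttt{<blazeinfo>} and \texttt{</blazeinfo>}. A minimal C
sketch for pulling out one value follows. It assumes you already have
the text of a \textss{strdesc} record as a NUL-terminated string
(extracting it from the record is not shown), and \textss{blz\_value}
is a hypothetical helper. Older traces, such as the TLB sample above,
use unprefixed strings like \texttt{tlbsize=2048}, which this helper
deliberately rejects.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* If s has the form "blz::KEY=VALUE", return a pointer to VALUE
   (inside s); otherwise return NULL.  Hypothetical helper. */
static const char *blz_value(const char *s, const char *key)
{
    static const char prefix[] = "blz::";
    const size_t plen = sizeof prefix - 1;
    const size_t klen = strlen(key);

    assert(s != NULL && key != NULL);
    if (strncmp(s, prefix, plen) != 0)
        return NULL;                 /* not a blz:: string          */
    s += plen;
    if (strncmp(s, key, klen) != 0 || s[klen] != '=')
        return NULL;                 /* other key, or only a prefix */
    return s + klen + 1;
}
\end{verbatim}

Values may carry trailing blanks (e.g. \texttt{blz::ram=1024M }), so
numeric fields are best read with \texttt{strtoull}, which stops at the
first non-digit.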
753 | ||
754 | \section{Where is the source for RST?} | |
755 | ||
756 | The current source is in \textss{/import/archperf/pkgs/rstf/latest/}. | |
757 | ||
758 | \begin{tabular}{|l|l|} \hline | |
759 | file & Description \\ \hline | |
760 | \rqhttp{\textss{rstf.h}}{file:/import/archperf/pkgs/rstf/latest/rstf.h} & the RST | |
761 | format \\ \hline | |
762 | \rqhttp{\textss{rstf.c}}{file:/import/archperf/pkgs/rstf/latest/rstf.c} & a few utility routines and some test code \\ \hline | |
763 | \end{tabular} | |
764 | ||
765 | \section{I want to process an RST trace, where do I start?} | |
766 | ||
A simple sample C++ skeleton for reading an RST trace file is in
\textss{/import/archperf/pkgs/rstf/latest/readRST.C}.

A simple sample ANSI C skeleton for reading an RST trace file is in
\textss{/import/archperf/pkgs/rstf/latest/readRST-ansiC.c}. I ``thank''
Anders, who found the C++ skeleton impenetrable, for the several hours
spent doing numerous moronic things to get this code to work.
774 | ||
Finally, the file \textss{/import/archperf/ws/rstf/rstFilter.h}
contains a more realistic (i.e. complicated) example of RST processing,
in which we read an RST trace, add/modify/delete records, and
generate a new RST trace. This code double buffers both input and
output to guarantee that we can always access/modify the previous K
records at both the input and output. (In contrast, with a single
buffer, if you just happen to refill the input buffer or flush the
output buffer, you cannot access or modify the previous record.)
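The double-buffering idea can be sketched in C for the output side.
This is not the \textff{rstFilter.h} code: the names \textss{dbl\_out},
\textss{dbl\_put}, \textss{dbl\_back} and \textss{dbl\_flush} are
hypothetical, records are treated as opaque 24-byte blocks, and
\texttt{HALF} plays the role of K. While one half fills, the other
still holds the previous \texttt{HALF} records, so they can be read or
patched before being written.

\begin{verbatim}
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define REC_BYTES 24   /* every RST record is 24 bytes        */
#define HALF      4    /* records per half; tiny for the demo */

typedef struct { uint8_t b[REC_BYTES]; } rst_rec;

typedef struct {
    rst_rec buf[2][HALF];
    int     active;     /* half currently being filled           */
    int     n;          /* records in the active half            */
    int     old_full;   /* other half holds HALF unwritten recs? */
    FILE   *out;
} dbl_out;

/* Append a record.  The older half is written only when we wrap
   around to reuse it.  (Error checks on fwrite omitted.) */
static void dbl_put(dbl_out *d, const rst_rec *r)
{
    assert(d->out != NULL);
    if (d->n == HALF) {
        int other = 1 - d->active;
        if (d->old_full)
            fwrite(d->buf[other], sizeof(rst_rec), HALF, d->out);
        d->active = other;
        d->n = 0;
        d->old_full = 1;
    }
    d->buf[d->active][d->n++] = *r;
}

/* k = 1 is the most recent record; at least HALF records back stay
   available.  Returns NULL if record k was already written. */
static rst_rec *dbl_back(dbl_out *d, int k)
{
    int avail = d->n + (d->old_full ? HALF : 0);
    if (k < 1 || k > avail)
        return NULL;
    if (k <= d->n)
        return &d->buf[d->active][d->n - k];
    return &d->buf[1 - d->active][HALF - (k - d->n)];
}

/* Write everything still buffered, oldest first. */
static void dbl_flush(dbl_out *d)
{
    if (d->old_full)
        fwrite(d->buf[1 - d->active], sizeof(rst_rec), HALF, d->out);
    fwrite(d->buf[d->active], sizeof(rst_rec), (size_t)d->n, d->out);
    d->old_full = 0;
    d->n = 0;
}
\end{verbatim}

Writing a half only when the writer wraps around to it is what
guarantees that at least \texttt{HALF} earlier records remain
available; the input side can be handled symmetrically.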
783 | ||
784 | \subsection{The actual RST code} | |
785 | ||
Here are the corresponding record definitions directly from
\textss{rstf.h}. The code on this web page may be a bit out of date, so
check the source \rqlink{/import/archperf/pkgs/rstf/latest/rstf.h}.
789 | ||
790 | \begin{verbatim} | |
791 | typedef struct { | |
792 | uint8_t rtype; /* value = INSTR_T */ | |
793 | unsigned notused : 1; /* not used */ | |
794 | unsigned ea_valid : 1; /* ea_va field is valid */ | |
    unsigned tr : 1;        /* trap occurred 1=yes */
    unsigned notused2 : 1;  /* not used */
    unsigned pr : 1;        /* privileged or user 1=priv */
798 | unsigned bt : 1; /* branch/trap taken, cond-move/st done, like Shade6 */ | |
799 | unsigned an : 1; /* 1=annulled (instr was not executed) */ | |
800 | unsigned reservedCompress : 1; /* used by rstzip compression */ | |
801 | uint16_t ihash; /* ihash value (optional) */ | |
802 | uint32_t instr; /* instruction word (opcode, src, dest) */ | |
803 | uint64_t pc_va; /* VA */ | |
804 | uint64_t ea_va; /* Eff addr VA */ | |
805 | } rstf_instrT; | |
806 | ||
807 | typedef struct { | |
808 | uint8_t rtype; /* value = PAVADIFF_T */ | |
809 | unsigned ea_valid : 1; /* does ea_pa contain a valid address */ | |
810 | unsigned cpuid : 7; | |
811 | uint16_t notused16; /* (deprecated) context used for these diffs */ | |
812 | uint16_t icontext; /* I-context used for these diffs */ | |
813 | uint16_t dcontext; /* only valid if ea_valid is true, */ | |
814 | uint64_t pc_pa_va; /* (PA-VA) of PC */ | |
815 | uint64_t ea_pa_va; /* (PA-VA) of EA for ld/st (not branches), if ea_valid is true */ | |
816 | } rstf_pavadiffT; | |
817 | ||
818 | typedef struct { | |
819 | uint8_t rtype; /* value = TRAP_T */ | |
820 | unsigned is_async : 1 ; /* asynchronous trap ? */ | |
821 | unsigned unused : 3 ; /* unused */ | |
822 | unsigned tl : 4 ; /* trap level in the trap handler */ | |
823 | uint16_t ttype; /* trap type for V9, only 9 bits matter */ | |
824 | ||
825 | uint16_t pstate; /* Pstate register in the trap, only 9 bits */ | |
826 | uint16_t syscall; /* If a system call, the syscall # */ | |
827 | ||
828 | uint64_t pc; | |
829 | uint64_t npc; | |
830 | } rstf_trapT; | |
831 | \end{verbatim} | |
832 | ||
833 | \section{System calls} | |
834 | ||
Depending on the tracing harness, system call information may be present
in the trace. E.g. in \textss{RST/blaze}, system call information is
present.
838 | ||
839 | A system call consists of a (software) trap instruction to trap TRNUM, | |
840 | with the \texttt{\%g1} register containing the system call number. | |
841 | There is one trap number for 32-bit and a second trap for 64-bit system | |
842 | calls. | |
843 | ||
\begin{tabular}{|l|l|} \hline
TRNUM & system call \\ \hline
\texttt{0x108} & 32-bit system call \\ \hline
\texttt{0x140} & 64-bit system call \\ \hline
\end{tabular}
849 | ||
See the C header file \textff{/usr/include/sys/syscall.h} for the system
call numbers; for example, \textss{2=fork}, \textss{5=open}, and
\textss{173=pread}. The header file \textff{/usr/include/sys/trap.h}
853 | has the 32-bit trap number. (I forgot where I found the 64-bit system | |
854 | call trap.) | |
855 | ||
Thus, a system call will appear as a \textss{TRAP\_T} record with the
\textss{ttype} field set to either \textss{0x108} or \textss{0x140} and
the \textss{syscall} field holding the system call number. For example,
in this blaze TPCC trace snippet (\textss{t5sds}), the trap record at
86034 indicates that a system call (\textss{syscall=0x00ad}, i.e. 173,
pread) is being made at instruction record 86035.
862 | ||
863 | \begin{verbatim} | |
864 | 86032 instr : cpuid=0 u [0xffffffff7dfa34d8] stx %o0, [%sp + 0x87f] [0xffffffff7fff0910] | |
865 | 86033 instr : cpuid=0 u [0xffffffff7dfa34dc] or %g0, 0xad, %g1 | |
866 | 86034 trap : cpuid=0 is_async=0 async==0 tl=0 ttype=0x140 pstate=0x012 syscall=0x00ad | |
867 | 86035 instr : cpuid=0 p [0xffffffff7dfa34e0] ta %icc, %g0 + 0x40 T [0x0000000001002800] tr | |
868 | \end{verbatim} | |
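A consumer can recognize system calls from the \textss{ttype} field
alone. The C sketch below uses only the two trap types from the table
above; the helpers \textss{syscall\_width} and \textss{syscall\_name}
are hypothetical, and the name table covers just the three calls
mentioned here, so a real tool would build it from
\textff{/usr/include/sys/syscall.h}. In the snippet,
\texttt{ta \%icc, \%g0 + 0x40} yields \texttt{ttype=0x140}, consistent
with the SPARC V9 rule that software trap number $n$ produces trap type
0x100 + $n$.

\begin{verbatim}
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trap types for system calls (table above). */
enum { TT_SYSCALL32 = 0x108, TT_SYSCALL64 = 0x140 };

/* Returns 32 or 64 if a TRAP_T ttype denotes a system call, else 0.
   Only the low 9 bits of ttype are significant (see rstf_trapT). */
static int syscall_width(uint16_t ttype)
{
    switch (ttype & 0x1ff) {
    case TT_SYSCALL32: return 32;
    case TT_SYSCALL64: return 64;
    default:           return 0;
    }
}

/* Tiny illustrative table for the syscall field; a real tool should
   generate it from syscall.h on the traced system. */
static const char *syscall_name(uint16_t num)
{
    switch (num) {
    case 2:   return "fork";
    case 5:   return "open";
    case 173: return "pread";
    default:  return "unknown";
    }
}
\end{verbatim}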
869 | ||
870 | \section{Trace format design} | |
871 | ||
872 | \subsection{How do I encode state (such as warmed cache state) in RST?} | |
873 | ||
In short, do not do this. RST is designed for capturing a dynamic
sequence of events (instructions, TLB activity, etc.) from a computer
system.
877 | ||
If you need to store heterogeneous information in a single trace, create a
\rqhttp{unatrace}{http://smeeng.eng/\rqtilde{}quong/unawrap.html}, which
is a general-purpose trace \textsl{wrapper} format. Aztecs snaps, which
consist of [cache + TLB + branch predictor warming + RST instruction
traces], use the unawrap format.
883 | ||
884 | \subsection{Design tradeoffs in RST} | |
885 | ||
Any trace format must balance the following design tradeoffs,
because not all properties can be achieved simultaneously. We evaluate
RST against several criteria.
889 | ||
890 | \begin{tabularx}{\linewidth}{|l|l|l|X|} \hline | |
891 | Goal & RST grade & Conflicts with & Description \\ \hline | |
892 | Simple & A & Size & Trace format should be easy to use. RST uses a | |
893 | fixed size record so it is easy to skip N records. RST has a common | |
894 | rtype byte so decoding a record is very easy. \\ \hline | |
895 | Size & D & Simplicity & Information density should be high, as traces | |
896 | are often very large. RST requires about 30 bytes per instruction | |
(PA+VA, PC+EA, TLB, trap events). We believe a separate compression
phase can be used to reduce the RST size (a beta-quality
compressor plus gzip reduced the size of RST by approximately 5-10X). \\ \hline
Flexible & A & Size & A trace should be able to hold different types
of data. A trace format that uses a single fixed record type severely
restricts flexibility, because every record must have a field for every
type of data. RST avoids this by having different record types.
\\ \hline
905 | \end{tabularx} | |
906 | ||
Other RST design notes. (1) The RST instruction record was
designed to hold an instruction word (32-bit), the instruction address
(64-bit PC), the memory effective address (64-bit EA), and other
overhead such as the \textss{rtype} byte. This led to the 24-byte
record size: 4 + 8 + 8 = 20 bytes of payload, plus the \textss{rtype}
byte, one byte of flag bits and the 16-bit \textss{ihash}.
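The fixed 24-byte record size is what makes it cheap to skip N
records. A minimal C sketch, assuming an uncompressed RST file
(compressed \texttt{.rsz} traces must be decompressed first);
\textss{rst\_rtype\_at} is a hypothetical helper, and the byte budget
restates the field sizes of \textss{rstf\_instrT}.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>

#define RST_REC_BYTES 24   /* every record, whatever its rtype */

/* Byte budget of an instr record, following rstf_instrT. */
enum {
    B_RTYPE = 1,   /* rtype                          */
    B_FLAGS = 1,   /* ea_valid, tr, pr, bt, an, ...  */
    B_IHASH = 2,   /* ihash                          */
    B_INSTR = 4,   /* 32-bit instruction word        */
    B_PC    = 8,   /* 64-bit PC (VA)                 */
    B_EA    = 8    /* 64-bit effective address (VA)  */
};
_Static_assert(B_RTYPE + B_FLAGS + B_IHASH + B_INSTR + B_PC + B_EA
               == RST_REC_BYTES, "instr record must fill 24 bytes");

/* Return the rtype byte of record n, or -1 on a seek error or if
   record n does not exist.  Fixed-size records mean no earlier
   record has to be parsed. */
static int rst_rtype_at(FILE *f, long n)
{
    if (n < 0 || fseek(f, n * RST_REC_BYTES, SEEK_SET) != 0)
        return -1;
    int c = fgetc(f);
    return c == EOF ? -1 : c;
}
\end{verbatim}

Record $n$ starts at byte offset $24n$; on systems with a 32-bit
\texttt{long}, very large traces would need a 64-bit seek such as
POSIX \texttt{fseeko}.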
911 | ||
912 | \section{Patching for Aztecs} | |
913 | ||
914 | \begin{verbatim} | |
915 | 1418 instr : u [0x000000010048e9b8] add %g4, 1, %g4 tr | |
916 | 1419 patch : isbegin=1 rewindrecs=0 id=1 length=2 descr=atrPCdAZ | |
917 | 1420 instr : u [0x000000010048e9bc] jmpl %g2 + 0, %g1 T [0x0000000078404780] | |
918 | 1421 instr : u [0x000000010048e9c0] nop | |
919 | 1422 patch : isbegin=0 rewindrecs=0 id=1 length=2 descr=atrPCdAZ | |
1423 context : asi=0x0000 last_context=0x0000 trap_lvl=0x00 trap_type=0x00 pstate=0x0000 primA=0x0000 secA=0x0000 primD=0x0000 secD=0x0000
922 | 1424 pavadiff: context=0 pc_pa_va=0x00000003673c0000 ((ea_pa_va=0xffffffffffffffff)) ea_valid=0 | |
923 | 1425 instr : u [0x0000000078404780] save %sp, -0xb0, %sp | |
924 | \end{verbatim} | |
925 | ||
926 | \section{FAQ} | |
927 | ||
928 | \subsection{There is a TRAP record and a tr bit in the instruction record. What is the difference?} | |
929 | ||
The trap record contains many values, including the trap type, trap
level, PC, NPC, pstate register and, on a syscall trap, the system call
number (\%g1 register).

The \textss{tr} bit in the instruction record simply indicates whether a
trap occurred during this instruction. The tr bit is necessary to mark
unambiguously where in the instruction stream a trap occurs.
937 | ||
938 | \section{The rstf workspace} | |
939 | ||
940 | \subsection{Purpose} | |
941 | ||
942 | \begin{rqenumerate}{0em} | |
943 | \item The main purpose of this WS is to define the RST file format in | |
944 | \textff{rstf.h}. | |
  Some secondary and/or deprecated definitions are in
  \textff{rstf\_*.h}.
947 | ||
948 | \item A secondary purpose is to define common RST utilities/code, including | |
949 | starter code and RST-to-RST filters. | |
950 | \end{rqenumerate} | |
951 | ||
952 | \subsection{Guidance on updating this workspace} | |
953 | ||
The file \textff{rstf.h} defines the RST file format. The file format
consists of the rtype definitions and the fields within each record.
\textbf{Many} other programs use \textff{rstf.h}. So....
957 | ||
958 | \begin{rqitemize}{0em} | |
959 | \item Try to avoid changing this file if possible. | |
960 | In the last 12 months (10/2001-10/2002), I have bumped the minor | |
961 | version once. | |
962 | \item Avoid breaking backward compatibility \textbf{AT ALL COSTS}. | |
963 | There is considerable data in \texttt{rstf} 2.04-2.06 format. | |
\item The safest changes involve adding new rtypes or adding more constants
  to existing enumerations, e.g. filling out the register constants
  in \textss{REGVAL\_T:regtype[]}.
\item There is a Java port of \texttt{rstf} in the \textss{jrst}
  workspace (to be released 12/2002). A Perl script in \textss{jrst}
  ``parses'' \textff{rstf.h} and makes undocumented assumptions about
  the way \textff{rstf.h} looks. Please try to conform to the
  existing style in the typedefs and enums.
972 | ||
973 | \item I (RQ) have tried to be stingy in using \textss{rtype} values. | |
974 | I have unofficially reserved bits 7 and 6 of the rtype as a hedge for | |
975 | (two rounds of) sweeping changes to RST in the distant future if it | |
  comes to that. Thus, I strongly recommend only using rtypes from
  2-63.
978 | \end{rqitemize} | |
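The recommended range follows directly from keeping bits 7 and 6 clear:
with both clear, an rtype is at most 63. A one-line C check, with
\textss{rtype\_recommended} and \texttt{RTYPE\_RESERVED\_BITS} as
hypothetical names; note that it does not detect collisions with rtypes
already defined in \textff{rstf.h}.

\begin{verbatim}
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RTYPE_RESERVED_BITS 0xC0u   /* bits 7 and 6, held in reserve */

/* True if rtype lies in the recommended range 2..63: neither reserved
   bit is set and the value is at least 2. */
static bool rtype_recommended(uint8_t rtype)
{
    return (rtype & RTYPE_RESERVED_BITS) == 0 && rtype >= 2;
}
\end{verbatim}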
979 | ||
If you must change \textff{rstf.h}, bump the version number in
\textff{rstf.h}.
982 | ||
983 | \subsection{Version numbers} | |
984 | ||
Many programs or code snippets have version numbers. The big rule about
version numbers is that, given an RST trace and full knowledge of the
history of the programs involved in producing the trace, you must (or
should) be able to determine what idiosyncrasies exist in that trace.
Note that you do \textem{not} know in advance which versions of the
programs were involved in producing the trace.
991 | ||
992 | As an example, you are given the trace \textss{try8-t24.rz.gz} from | |
993 | 6/2001, which was produced by \textss{blaze V3} and \textss{rstracer}. | |
994 | You are given the phone numbers of all the developers involved in | |
995 | tracing at Sun, so you can obtain the history of all programs involved. | |
996 | What are the issues, if any, of this trace from a data format and | |
997 | correctness standpoint? First you have to determine which components | |
998 | (or programs) were involved in this trace. Running \textss{trv.sh -n | |
999 | 40} on this trace we see | |
1000 | ||
\begin{verbatim}
0 header : majorVer=1 minorVer=10 RST Header v1.10
4 strdesc : " rstracer=V1.8"
8 strdesc : " compiled against Blz 3.64 - Excal 5.8 LL RW MP ||Disk API=[Trace,Timing]"
24 strdesc : "blz::version=3.65 - Excal 5.8 LL RW MP ||Disk API=[Trace,Timing]"
\end{verbatim}
1007 | ||
Thus this is an RSTF v1.10 trace, and \textss{blaze V3.65} and
\textss{rstracer V1.8} were involved. You call up their developers,
get the details of these programs from the dawn of time until now, and
gain an understanding of the trace issues.
1012 | ||
1013 | Thus, here are the strong recommendations regarding version numbers and | |
1014 | traces. | |
1015 | ||
1016 | \begin{rqenumerate}{0em} | |
1017 | \item An RST trace must contain the version numbers of all programs | |
1018 | involved in producing the trace. In the case of | |
1019 | \textss{try8-t24.rz.gz}, this trace has the version numbers of | |
1020 | \textss{rstf}, \textss{rstracer} and \textss{blaze}. | |
1021 | ||
\item The version number of each component (or program) must indicate
  the state of that component, i.e. if something is changed, the
  version number of that component must be changed.
1025 | ||
1026 | \item There must be a record of known bugs for each component for each | |
1027 | version number. | |
1028 | \end{rqenumerate} | |
1029 | ||
1030 | Here are some examples of version numbers. | |
1031 | ||
\begin{tabularx}{\linewidth}{|l|X|} \hline
Code & Description/philosophy of version numbers \\ \hline
rstf & Version of the RST Format records. Should not change often.
  The first record in an RSTF trace must define the version number.
  If a new version of RSTF breaks backward compatibility (e.g. the
  format for PAVADIFF changes), increment the major version. And this
  should happen once every never. \\ \hline
rstFilter & Updated when a filter is added or updated. Update freely. \\ \hline
rstracer & (in rstracer WS)
  Identifies the version of the RST tracer. The version indicates
  what bugs/idiosyncrasies exist. Note that the RST trace produced
  by \textss{rstracer} contains both the rstracer version number and
  the RSTF version number. \\ \hline
\end{tabularx}
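Minor versions must be compared as numbers: RSTF 1.10 is newer than
1.8, even though \texttt{1.10} sorts before \texttt{1.8} as a string.
A small C sketch for tools that check trace versions follows. It parses
the \texttt{majorVer=M minorVer=N} text that \textss{trv.sh} prints for
record 0 (a program reading the binary header would use the fields
directly), and \textss{parse\_rst\_version} and
\textss{rst\_version\_cmp} are hypothetical names.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>

/* Parse "majorVer=M minorVer=N".  Returns 0 on success, -1 otherwise. */
static int parse_rst_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "majorVer=%d minorVer=%d", major, minor) == 2 ? 0 : -1;
}

/* Compare versions numerically.  Returns <0, 0 or >0. */
static int rst_version_cmp(int amaj, int amin, int bmaj, int bmin)
{
    if (amaj != bmaj)
        return amaj < bmaj ? -1 : 1;
    if (amin != bmin)
        return amin < bmin ? -1 : 1;
    return 0;
}
\end{verbatim}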
1046 | ||
1047 | \subsection{Basic programs and scripts in the rstf workspace} | |
1048 | ||
1049 | The master workspace for RST is \textff{/import/archperf/ws/rstf}. | |
1050 | It should be open to all to do a bringover, aka world bringover-able. | |
1051 | If you need to do a putback to this workspace, talk to someone in Arch | |
1052 | Tools, say \textss{lren@eng}. | |
1053 | ||
1054 | \subsubsection{trv.sh} | |
1055 | ||
Look at RST files (compressed or not) in ASCII. (Replaces rstunzip and
trconv.) Runs a PAGER (\textss{more} or \textss{less}) if output is a
terminal. \textem{Use this program}.
1059 | ||
1060 | \subsubsection{rstFilter.C} | |
1061 | ||
Implements many (30+) RST-to-RST filters (read stdin/file, write
stdout). Typically you need to use several filters in a row. All error
messages go to stderr. This code offers generic double-buffering on
both the input and the output, making it ``easier'' (hah) to do
transformations that must look at several records.
1067 | ||
1068 | \subsubsection{runRSTFilt.sh} | |
1069 | ||
1070 | Convenient shell script driver for running \textss{rstFilter}. Use | |
1071 | this. | |
1072 | ||
1073 | \begin{rqcode}{ } | |
// by hand
rstFilter -a filter1 input-file | rstFilter -a filter2 | rstFilter -a filter3 > output

// using runRSTFilt.sh
runRSTFilt.sh -a 'filter1 filter2 filter3' input-file > output
// Same as above, but also generate ASCII dumps of all intermediate files
runRSTFilt.sh -u -a 'filter1 filter2 filter3' input-file > output

// E.g. to clean up the raw output from atrace2rst [Atrace->RST] files
runRSTFilt.sh -a 'ihash addBrTarg' [-u] raw.rst > clean.rst
1085 | \end{rqcode} | |
1086 | ||
1087 | \subsubsection{atr2rst.sh, atrace2rst.C and dumpatr} | |
1088 | ||
The script \textss{atr2rst.sh} runs \texttt{atrace2rst} and does some
post-processing to clean up the RST. The 64-bit executable
\textss{atrace2rst} converts an atrace to raw RST. The post-processing
adds ihash values and branch targets, among other things.
\textss{dumpatr} is a hard link to atrace2rst; running it is the same as
running \textss{atrace2rst -a}.
1095 | ||
1096 | \subsubsection{snapForAztecs.sh} | |
1097 | ||
1098 | Generate snaps suitable for aztecs. Snaps the RST file and then runs a | |
1099 | horrific combination of RST filters on the result and then compresses | |
1100 | the results. Even the author does not want to look at this script. | |
1101 | ||
1102 | \section{History} | |
1103 | ||
The RST format and this document were started and then maintained by
R. Quong through 11/2002.
1106 | ||
1107 | \end{document} |