| 1 | \documentclass[10pt]{article} |
| 2 | |
| 3 | \usepackage{rqdefs} \usepackage{rqfullpage} \usepackage{utopia} |
| 4 | \usepackage{rqcode} |
| 5 | |
| 6 | %% Example ltoh commands (start with %-ltoh-) |
| 7 | %-ltoh- title := The RS trace format, aka RST |
| 8 | %-ltoh- :{}:\gold:<font color=gold>:</>: |
| 9 | %-ltoh- :comm:\salsa:<strong>salsa</strong>:: |
| 10 | |
| 11 | \rqfullpageD |
| 12 | |
| 13 | \begin{document} |
| 14 | |
| 15 | \rqtitle{The RST Trace format} |
| 16 | \rqsubtitle{AAD Tools, last updated \today} |
| 17 | |
| 18 | \toc |
| 19 | |
| 20 | An HTML version of this document is available at |
| 21 | \rqlink{http://ppgweb.eng/archperf/rstFormat.html}. |
| 22 | |
| 23 | The master workspace for RST is \textff{/import/archperf/ws/rstf}. This |
| 24 | workspace contains source to generate PS and PDF versions of this |
| 25 | document. A (possibly out of date) PostScript version of this document |
| 26 | is available at \texttt{/home/quong/proj/rstf/rstFormat.ps}. |
| 27 | |
| 28 | \section{Other Useful links} |
| 29 | |
| 30 | \begin{tabular}{|l|l|} \hline |
| 31 | What & URL \\ \hline |
| 32 | List of known traces & |
| 33 | \rqlink{http://traces.eng/} \\ \hline |
| 34 | ArchTools Trace FAQ & |
| 35 | \rqlink{http://ppgweb.eng/archperf/traceFAQ.html} \\ \hline |
| 36 | The RST trace format & |
| 37 | \rqlink{http://ppgweb.eng/archperf/rstFormat.html} \\ \hline |
| 38 | The atrace tool/format & |
| 39 | \rqlink{http://muskoka.eng/\rqtilde{}bmc/atrace/} \\ \hline |
| 40 | Converting atrace to RST & |
| 41 | \rqlink{http://ppgweb.eng/\rqtilde{}quong/atrace2rst.html} \\ \hline |
| 42 | Converting bustraces to RST & |
| 43 | \rqlink{http://ppgweb.eng/\rqtilde{}quong/bustrace.html} \\ \hline |
| 44 | Instruction Trace validation & |
| 45 | \rqlink{http://ppgweb.eng/archperf/trace-validation2002.html} \\ \hline |
| 46 | Blaze web page & |
| 47 | \rqlink{http://ppgweb.eng/archperf/blaze.html} \\ \hline |
| 48 | Blaze user guide & |
| 49 | \rqlink{http://ppgweb.eng/\rqtilde{}quong/blaze-userguide.html} \\ \hline |
| 50 | Blaze TPCC trace & |
| 51 | \rqlink{http://ppgweb.eng/\rqtilde{}quong/blaze-tpcc-try500.html} \\ \hline |
| 52 | \end{tabular} |
| 53 | |
| 54 | The trace FAQ answers many questions on how to analyze/process RST |
| 55 | traces. Unfortunately, there are many issues that could reasonably |
| 56 | belong in the trace FAQ page or this page. We (RQ) have tried to limit |
| 57 | this web page to RST format definition issues, but the need for examples |
| 58 | causes this web page to overlap with the trace FAQ. |
| 59 | |
| 60 | \section{What is RST?} |
| 61 | |
| 62 | RST is short for RS Trace format, which is a format for computer |
| 63 | architecture traces. RS stands for ``Really Simple'' or ``Russell's Simple'', |
| 64 | depending on who you talk to (or ``to whom you talk'' if you hate dangling |
| 65 | participles). An RST trace consists of fixed-length records (24 bytes) |
| 66 | in which the first byte of each record, known as the \textsl{rtype}, |
| 67 | specifies the type. These two properties ensure that an RST trace is |
| 68 | easy to decode both now and in the future. In particular if your old |
| 69 | analyzer sees a new \textsl{rtype} it does not understand, your code can |
| 70 | simply skip that record, ensuring forward compatibility. |
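The skip-what-you-do-not-understand rule can be sketched in a few lines of C. This is only an illustration: the names \texttt{demo\_scan}, \texttt{demo\_counts}, \texttt{DEMO\_INSTR\_T} and \texttt{DEMO\_PAVADIFF\_T} are hypothetical, and the numeric rtype codes are made up for the example (the real values come from \texttt{rstf.h}). The only facts taken from the text above are the 24-byte record size and the rtype living in the first byte.

```c
#include <stddef.h>
#include <stdint.h>

enum { RST_RECORD_SIZE = 24 };  /* fixed record length stated in the text */

/* Hypothetical rtype codes, for illustration only.
   The real values are defined in rstf.h. */
enum { DEMO_INSTR_T = 1, DEMO_PAVADIFF_T = 2 };

/* Per-type record counts gathered by demo_scan(). */
typedef struct {
    size_t instr;
    size_t pavadiff;
    size_t skipped;   /* records whose rtype we do not understand */
} demo_counts;

/* Walk a raw buffer one fixed-size record at a time and dispatch on
   the first byte. A trailing partial record is ignored. */
demo_counts demo_scan(const uint8_t *buf, size_t nbytes)
{
    demo_counts c = {0, 0, 0};
    for (size_t off = 0; off + RST_RECORD_SIZE <= nbytes; off += RST_RECORD_SIZE) {
        uint8_t rtype = buf[off];
        switch (rtype) {
        case DEMO_INSTR_T:    c.instr++;    break;
        case DEMO_PAVADIFF_T: c.pavadiff++; break;
        default:              c.skipped++;  break;  /* unknown type: step over it */
        }
    }
    return c;
}
```

Because every record has the same length, an unknown rtype costs nothing more than advancing to the next 24-byte boundary.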
| 71 | |
| 72 | There are many different kinds of RST record types, including |
| 73 | \begin{rqitemize}{0em} |
| 74 | \item instructions |
| 75 | \item events (traps, interrupts) |
| 76 | \item MMU state (changes to the TLB, PA-VA diffs) |
| 77 | \item string data |
| 78 | \item internal processor state (register dumps) |
| 79 | \item high-level events (process/context switches, thread switch) |
| 80 | \item markers (timestamp, current CPU) |
| 81 | \item state (cache/memory state) |
| 82 | \item define your own |
| 83 | \end{rqitemize} |
| 84 | |
| 85 | Here is an example of 4 records in an RST trace: pavadiff, instr, instr, |
| 86 | and trap. Again, each record is the same size and the \textsl{rtype} |
| 87 | indicates the record type. |
| 88 | |
| 89 | \begin{rqcode}{\small} |
| 90 | +================+ |
| 91 | |rtype=PAVADIFF\_T| |
| 92 | | i/d contexts | |
| 93 | | PA-VA for PC | |
| 94 | | PA-VA for EA | |
| 95 | +================+ |
| 96 | |rtype=INSTR\_T | |
| 97 | | flags+instr | |
| 98 | | PC (VA) | |
| 99 | | EA (VA) | |
| 100 | +================+ |
| 101 | |rtype=INSTR\_T | |
| 102 | | flags+instr | |
| 103 | | PC (VA) | |
| 104 | | EA (VA) | |
| 105 | +================+ |
| 106 | |rtype=TRAP\_T | |
| 107 | | trap type | |
| 108 | | trap level | |
| 109 | | | |
| 110 | +================+ |
| 111 | \end{rqcode} |
| 112 | |
| 113 | |
| 114 | \section{Why should I use RST?} |
| 115 | |
| 116 | Because RST is simple, flexible, extensible, and supported. It has |
| 117 | provisions for MP traces, traps, events (snooping, DMA, etc.), VA/PA, |
| 118 | time stamps, descriptive strings, and trace patching. Adding new types |
| 119 | of data to an RST trace is as simple as defining new rtypes. In short, |
| 120 | RST was designed to hold all kinds of trace data for the next 5 years. |
| 121 | |
| 122 | There are numerous tools based on RST. There is an RST compressor, |
| 123 | rstzip; used with gzip, we typically find compression rates of 18-40X |
| 124 | and have seen compression of 200X. |
| 125 | |
| 126 | The \textss{rst-snapper} reads RST. The \textss{rstgen} is a shade |
| 127 | analyzer which generates RST. The RST tracer module ( |
| 128 | \textss{rstracer.so} ) for \rqhttp{blaze}{} generates rich RST traces. |
| 129 | Additionally, snaps for the new (4/2000) version of Aztecs, the |
| 130 | cycle-accurate simulator for Cheetah/Jubatus, use RST instructions. |
| 131 | |
| 132 | \section{What tools exist for RST?} |
| 133 | |
| 134 | The following tools exist for handling RST traces. All binaries exist |
| 135 | in \textss{/import/archperf/bin}. |
| 136 | |
| 137 | \begin{tabular}{|l|l|} \hline |
| 138 | Tool & Description \\ \hline |
| 139 | \textss{trv.sh} & view (un/compressed) RST trace \\ \hline |
| 140 | \textss{trconv} & view RST trace in ASCII format \\ \hline |
| 141 | \textss{rstFilter} & process an RST trace producing a new RST trace \\ \hline |
| 142 | \textss{atr2rst.sh} & script to \rqhttp{convert atrace to |
| 143 | RST}{atrace2rst.html} \\ \hline |
| 144 | \textss{rstzip2} & De/Compressor tailored for MP RST instr traces \\ \hline |
| 145 | \textss{rstzip} & De/Compressor tailored for RST instr traces (deprecated) \\ \hline |
| 146 | \textss{rstsnap} & snap an RST trace for Aztecs \\ \hline |
| 147 | \textss{rstgen} & generate an RST trace from shade (used for Spec2K) |
| 148 | \\ \hline |
| 149 | \textss{rstracer.so} & Blaze module to dump/generate RST traces \\ \hline |
| 150 | \end{tabular} |
| 151 | |
| 152 | \subsection{RST Compression} |
| 153 | |
| 154 | A significant, but often overlooked, benefit of RST is an excellent |
| 155 | compression algorithm. Typically, RST traces compress to about 1 byte |
| 156 | per instruction. The compressor was implemented by Kelvin Fong. |
| 157 | |
| 158 | Unfortunately, there are two incompatible RST compression formats, V1 |
| 159 | and V2, requiring different de/compressors, \textss{rstzip} and |
| 160 | \textss{rstzip2}, respectively. Version 2 RST compression supports MP |
| 161 | traces and is the preferred compression method for all RST instruction |
| 162 | traces after Aug 2001. Most RST traces before Jun 2001 were compressed |
| 163 | with V1 compression (rstzip). |
| 164 | |
| 165 | \begin{tabular}{|l|l|l|} |
| 166 | Format & Suffix & Description \\ |
| 167 | V1 & \textss{rz.gz} \textss{rsz} \textss{rsz.gz} & Original 1P format \\ |
| 168 | V2 & \textss{rz2.gz} & Support for MP and value tracing \\ |
| 169 | \end{tabular} |
| 170 | |
| 171 | \subsection{Viewing an RST trace} |
| 172 | |
| 173 | Use \textss{trv.sh} or \textss{trconv}. This code was updated |
| 174 | on 6/28/2000, so some flags may have changed. |
| 175 | |
| 176 | \begin{rqenumerate}{0em} |
| 177 | \item An example run is \textss{trv.sh -n 1000 -s 200 trace45.rz2.gz | less}. |
| 178 | |
| 179 | \item An example run is \textss{trconv -n 1000 -s 20 file.rst}. |
| 180 | |
| 181 | \item There is on-line help for \textss{trconv}. |
| 182 | |
| 183 | \item In the \texttt{INSTR}, \texttt{PAVADIFF} and \texttt{PHYSADDR} |
| 184 | records, there is a \textss{ea\_valid} field which indicates if the |
| 185 | corresponding \textss{ea\_xxxx} field contains valid data. If |
| 186 | \textss{ea\_valid} is false (0) then the \textss{ea\_xxxx} field must be |
| 187 | ignored. |
| 188 | |
| 189 | Most of my programs (atr2rst, the blaze rstracer) which generate RST |
| 190 | traces put a bogus, easily recognizable value in \textss{ea\_xxxx} if |
| 191 | \textss{ea\_valid} is false. |
| 192 | \end{rqenumerate} |
| 193 | |
| 194 | \section{Getting started analyzing a trace} |
| 195 | |
| 196 | In a typical trace analyzer (e.g. cache simulator) two record types |
| 197 | suffice for most of what you want to do. The \textss{INSTR\_T} record |
| 198 | gives you the instruction word, PC and optional EA. The |
| 199 | \textss{PAVADIFF\_T} record lets you translate the VAs into PAs. For |
| 200 | both the I and D references, the last seen \textss{PAVADIFF\_T} contains |
| 201 | the current (PA-VA) difference values and the effective context used. |
| 202 | |
| 203 | Let's consider an example. Consider the following output from |
| 204 | \textss{trconv}: |
| 205 | |
| 206 | \begin{verbatim} |
| 207 | 8896 pavadiff: cpuid=0icontext=0 dcontext=0 pc_pa_va=0xffffffffff400000 ea_pa_va=0xffffffffff400000 ea_valid=1 |
| 208 | 8897 instr : cpuid=0 p [0x0000000001085be0] lduw [%g2 + 0xf0], %g2 [0x00000000014000f0] |
| 209 | 8898 instr : cpuid=0 p [0x0000000001085be4] srl %g2, 0xb, %g2 |
| 210 | 8899 instr : cpuid=0 p [0x0000000001085be8] subcc %g2, 0, %g0 |
| 211 | 8900 instr : cpuid=0 p [0x0000000001085bec] bpe,a,pt %icc, 0x1085c00 T [0x0000000001085c00] |
| 212 | 8901 pavadiff: cpuid=0icontext=0 dcontext=0 pc_pa_va=0xffffffffff400000 ea_pa_va=0xfffffd5fe5c08000 ea_valid=1 |
| 213 | 8902 instr : cpuid=0 p [0x0000000001085bf0] lduh [%g7 + 0x188], %g2 [0x000002a100225ec8] |
| 214 | 8903 instr : cpuid=0 p [0x0000000001085c00] sra %g2, 0, %i1 |
| 215 | 8904 instr : cpuid=0 p [0x0000000001085c04] call 0x1032420 T [0x0000000001032420] |
| 216 | 8905 instr : cpuid=0 p [0x0000000001085c08] restore %g0, %g0, %g0 |
| 217 | \end{verbatim} |
| 218 | |
| 219 | At record 8897, we have an LDUW instruction with PC=0x0000000001085be0. |
| 220 | Thus the PC PA is 0x0000000001085be0 + 0xffffffffff400000 = 0x0000000000485be0 (the sum wraps modulo $2^{64}$). The |
| 221 | PAVADIFF also indicates the I-context is 0, meaning we are in privileged |
| 222 | mode and hence executing kernel code. (We could also determine this as the |
| 223 | PC VA is in the kernel text region.) The EA of the load is also in record |
| 224 | 8897. To derive the PA of the LDUW's EA, we use the |
| 225 | \textss{PAVADIFF\_T::ea\_pa\_va} field. |
| 226 | |
| 227 | At record 8898, we have an SRL instruction with PC=0x0000000001085be4. |
| 228 | We continue to use the values from the last \textss{PAVADIFF\_T} to get |
| 229 | the PC PA. |
| 230 | |
| 231 | At record 8901, we see another \textss{PAVADIFF\_T} record, because the |
| 232 | LDUH load in the next record, 8902, accesses data on a different page. |
| 233 | This \textss{PAVADIFF\_T} record contains the necessary |
| 234 | \textss{ea\_pa\_va} value for the following LDUH instruction. In |
| 235 | this case, the PC (PA-VA) value has not changed. |
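The address arithmetic in this walk-through can be checked with a small C sketch. The helper name \texttt{demo\_va\_to\_pa} is hypothetical; the input values are copied from records 8896, 8897, 8901 and 8902 of the \textss{trconv} output above. The one point worth noting is that the (PA-VA) difference is stored as an unsigned 64-bit value, so a difference such as 0xffffffffff400000 is really a negative offset and the addition relies on wrap-around modulo $2^{64}$.

```c
#include <stdint.h>

/* PA = VA + (PA-VA). Unsigned 64-bit addition wraps modulo 2^64,
   so a "negative" difference such as 0xffffffffff400000 works
   without any special handling. */
uint64_t demo_va_to_pa(uint64_t va, uint64_t pa_va_diff)
{
    return va + pa_va_diff;
}
```

With the values from the listing, the LDUW at PC VA 0x1085be0 has PC PA 0x485be0, its EA VA 0x14000f0 maps to PA 0x8000f0, and the LDUH EA VA 0x000002a100225ec8 combined with the new difference 0xfffffd5fe5c08000 maps to PA 0xe5e2dec8.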
| 236 | |
| 237 | The dependence on only two record types and the use of the last PAVADIFF |
| 238 | record data makes processing simple. There is C and C++ starter source |
| 239 | code for RST readers in \textff{/import/archperf/ws/rstf}. |
| 240 | |
| 241 | Here is the main loop of \textff{readRST.C}, which prints out the VA and |
| 242 | PA of each instruction. |
| 243 | |
| 244 | \begin{verbatim} |
| 245 | ... |
| 246 | for (long long ix = 0; ix < nrecords; ix++) { |
| 247 | long long recidx = ix + skip; |
| 248 | rstf_unionT * up = &rec[ix]; |
| 249 | uint8_t rtype = up->context.rtype; |
| 250 | if (rtype == CONTEXT_T) { |
| 251 | icontext = (up->context.traplevel > 0) ? 0 : up->context.primD; |
| 252 | dcontext = up->context.primD; |
| 253 | } else if (rtype == PAVADIFF_T) { |
| 254 | rstf_pavadiffT * pv = & up->pavadiff; |
| 255 | icontext = pv->icontext; |
| 256 | dcontext = pv->dcontext; |
| 257 | pava_pc = pv->pc_pa_va; |
| 258 | if (pv->ea_valid) { |
| 259 | pava_ea = pv->ea_pa_va; |
| 260 | } |
| 261 | } else if (rtype == INSTR_T) { |
| 262 | rstf_instrT * ip = & up->instr; |
| 263 | short ih = ip->ihash; |
| 264 | if ( ih == IH_LDX ) { // shade V5 ihash values |
| 265 | // instr is a LDX |
| 266 | } |
| 267 | iw = ip->instr; |
| 268 | pc = ip->pc_va; |
| 269 | pc_pa = pc + pava_pc; |
| 270 | fprintf(out, "%4lld PC V/P %llx / %llx (IW=0x%8x)", |
| 271 | recidx, pc, pc_pa, iw); |
| 272 | if (ip->ea_valid) { |
| 273 | ea = ip->ea_va; |
| 274 | ea_pa = ea + pava_ea; |
| 275 | fprintf(out, "(EA V/P %llx / %llx)", |
| 276 | ea, ea_pa); |
| 277 | } |
| 278 | fprintf(out, "\n"); |
| 279 | } |
| 280 | } |
| 281 | \end{verbatim} |
| 282 | |
| 283 | |
| 284 | \section{Versions of RST} |
| 285 | |
| 286 | \begin{verbatim} |
| 287 | 2.0 6/??/2001 Mostly the same as V1.10. Has official MP support |
| 288 | 1.10 5/09/2001 MP support via cpuid field; better utility fn API |
| 289 | 1.9 4/20/2001 PREG_T: add cpuid, rename asiReg |
| 290 | 1.8 3/27/2001 unixcommand(), rstf_snprintf(), stdized rstf_headerT,magic |
| 291 | 1.7 3/26/2001 Add RECNUM_T for rst-snapper |
| 292 | 1.6 3/15/2001 Add support for MP (cpu-id to pavadiff, more TLB info) |
| 293 | 1.5 9/18/2000 Fixed Shade V6 record types (thanks Kelvin) |
| 294 | 1.4 9/9/2000 Added icontext and dcontext to PAVADIFF_T rec |
| 295 | 1.4 9/?/2000 Added major, minor numbers to HEADER_T rec |
| 296 | 1.3 8/25/2000 Added PATCH_T type. |
| 297 | 1.2 8/22/2000 Added STATUS_T type. |
| 298 | \end{verbatim} |
| 299 | |
| 300 | \subsection{Where do I get RST code?} |
| 301 | |
| 302 | The \textff{rstf.h} header file is at |
| 303 | \textff{/import/archperf/include/rstf/rstf.h}. |
| 304 | Various other RST binaries exist in \textff{/import/archperf/bin/}. |
| 305 | |
| 306 | Source code for various RST utilities is in the Code Manager WS |
| 307 | \textff{/import/archperf/ws/rstf/}. |
| 308 | |
| 309 | \labsec{Canonical} |
| 310 | \section{Detailed specification of the RST trace type} |
| 311 | |
| 312 | We present a precise specification for common record types, as a |
| 313 | reference for various analyzers. Such a reference is needed because different trace |
| 314 | sources produce slightly differing values even for common cases. In |
| 315 | particular, the \textss{ea\_va} field for branches has several |
| 316 | interpretations. Historical note: In my mind, when defining the |
| 317 | \textss{INSTR\_T} record, there was no chance for ambiguity. I was |
| 318 | wrong. |
| 319 | |
| 320 | \subsection{Analyzing a canonical RST trace} |
| 321 | |
| 322 | The following table shows where/how to get information from an RST |
| 323 | trace. The notation \textss{ttt::fff} means to look at field |
| 324 | \textss{fff} in the \textem{last} record of type \textss{ttt}. The |
| 325 | notation \textss{ppp?xxx:yyy} means use value \textss{xxx} if predicate |
| 326 | \textss{ppp} is true, else use \textss{yyy}; if \textss{yyy} is missing |
| 327 | then there is no valid data value. |
| 328 | |
| 329 | \begin{tabular}{|l|l|} \hline |
| 330 | Value of interest & Where / how \\ \hline |
| 331 | IW (instr) & \textss{I-TLB miss} ? 0x0 : \textss{INSTR\_T::instr} \\ \hline |
| 332 | PC VA & \textss{INSTR\_T::pc\_va} \\ \hline |
| 333 | PC PA & \textss{INSTR\_T::pc\_va} + \textss{PAVADIFF\_T::pc\_pa\_va} |
| 334 | (known wrong if I-TLB miss) \\ \hline |
| 335 | ld/st VA & \textss{INSTR\_T::ea\_valid ? INSTR\_T::ea\_va} \\ \hline |
| 336 | ld/st PA & \textss{INSTR\_T::ea\_valid ? (INSTR\_T::ea\_va + |
| 337 | PAVADIFF\_T::ea\_pa\_va)} (PA invalid on D-TLB miss or non-translating ASI) \\ \hline |
| 338 | PC I context & \textss{PAVADIFF\_T::icontext} \\ \hline |
| 339 | ld/st D context & \textss{PAVADIFF\_T::dcontext} \\ \hline |
| 340 | ld/st ASI goes to mem ? & examine ASI; translating+bypass ASI's go to memory \\ \hline |
| 341 | instr is CTI ? & decode IW or look at \textss{INSTR\_T::ihash} \\ \hline |
| 342 | CTI is taken ? & \textss{INSTR\_T::bt} \\ \hline |
| 343 | taken CTI target & \textss{INSTR\_T::bt ? INSTR\_T::ea\_va} \\ \hline |
| 344 | instr is annulled ? & \textss{INSTR\_T::an} \\ \hline |
| 345 | \hline |
| 346 | ld/st ASI & \textss{Immed-ASI ? IW : PREG\_T::asireg} (Decode IW to |
| 347 | determine Immed-ASI)\\ \hline |
| 348 | trap level & \textss{saw TRAP\_T ? (TRAP\_T::tl + 1) : |
| 349 | PREG\_T::trap\_lvl} \\ \hline |
| 350 | enter trap & Get a \textss{TRAP\_T} record and/or |
| 351 | \textss{INSTR\_T::tr} \\ \hline |
| 352 | trap type & \textss{TRAP\_T::ttype} \\ \hline |
| 353 | exit trap & Get DONE / RETRY IW and/or get \textss{PREG\_T} \\ \hline |
| 354 | system call & \textss{TRAP\_T::syscall} \\ \hline |
| 355 | TLB demap & \textss{TLB\_T::demap} is 1. \\ \hline |
| 356 | \end{tabular} |
| 357 | |
| 358 | The \textss{PREG\_T} (privileged register) record is the new name for the |
| 359 | badly named \textss{CONTEXT\_T} record. The \textss{PREG\_T} record |
| 360 | encodes various hardware values. Making things all the more |
| 361 | galling, the \textss{CONTEXT\_T/PREG\_T} record \textem{should not be used to |
| 362 | detect context switches}. The \textss{CONTEXT\_T} name is deprecated as of |
| 363 | RST format V2.05 in favor of \textss{PREG\_T}. |
| 364 | |
| 365 | \begin{tabular}{|l|l|} \hline |
| 366 | Register & Where / how \\ \hline |
| 367 | PSTATE & \textss{CONTEXT\_T::pstate} \\ \hline |
| 368 | ASI reg & \textss{CONTEXT\_T::asireg} \\ \hline |
| 369 | I-MMU primary context & \textss{CONTEXT\_T::primA} (non-existent for |
| 370 | SPARC) \\ \hline |
| 371 | I-MMU secondary context & \textss{CONTEXT\_T::secA} (non-existent for |
| 372 | SPARC) \\ \hline |
| 373 | D-MMU primary context & \textss{CONTEXT\_T::primD} \\ \hline |
| 374 | D-MMU secondary context & \textss{CONTEXT\_T::secD} \\ \hline |
| 375 | \end{tabular} |
| 376 | |
| 377 | \subsection{Table of common cases for INSTR and PAVADIFF records} |
| 378 | |
| 379 | To help clarify the above information, the following table lists the |
| 380 | value from \textss{INSTR\_T} and \textss{PAVADIFF\_T} fields for the |
| 381 | various common cases. We use the nomenclature \texttt{undef}=undefined, |
| 382 | \textss{valid}=expected value, N/A=not used or not applicable, and |
| 383 | \texttt{impdep}=implementation dependent. In particular, pavadiff |
| 384 | records are emitted on demand and a N/A entry will not trigger a |
| 385 | pavadiff record. |
| 386 | |
| 387 | \begin{tabular}{|l|l|l|l|l|l|} |
| 388 | & \mc{3}{|c|}{\textss{INSTR\_T}} & \mc{2}{|c|}{\textss{PAVADIFF\_T}} \\ |
| 389 | Case & \texttt{IW} & \texttt{pc\_va} & \texttt{ea\_va} & \texttt{pavaPC} & \texttt{pavaEA} \\ |
| 390 | ITLB miss & norec & norec & norec & norec & norec \\ |
| 391 | non-mem, non-CTI & valid & PC VA & undef & valid & valid \\ |
| 392 | memop good & valid & PC VA & EA VA & valid & valid \\ |
| 393 | memop DTLB miss & valid & PC VA & EA VA & valid & N/A \\ |
| 394 | CTI taken & valid & PC VA & target PC & valid & N/A \\ |
| 395 | CTI non-taken & valid & PC VA & \textem{impdep} & valid & N/A \\ |
| 396 | annul ITLB miss & undef & PC VA & norec & norec & norec \\ |
| 397 | annul instr & valid & PC VA & N/A & valid & N/A \\ |
| 398 | \end{tabular} |
| 399 | |
| 400 | \subsection{Ordering of simultaneous records} |
| 401 | |
| 402 | When several RST records apply to a given event, we recommend the |
| 403 | following ordering. Records in the same group can be ordered |
| 404 | arbitrarily. Our guiding strategy is to make it easy |
| 405 | for an RST trace consumer to process the information. Generally, we try |
| 406 | to put the \textss{INSTR\_T} record last, unless there is a |
| 407 | \textss{REGVAL\_T} record containing values produced by the instr. |
| 408 | |
| 409 | \begin{tabular}{|l|l|} |
| 410 | Group 1 & \textss{CPU\_T} \\ |
| 411 | Group 2 & \textss{PREG\_T}, \textss{REGVAL\_T} with postInstr=0 \\ |
| 412 | Group 3 & \textss{TRAP\_T}, \textss{TRAPEXIT\_T} \\ |
| 413 | Group 4 & \textss{PAVADIFF\_T} \\ |
| 414 | Group 5 & \textss{INSTR\_T} \\ |
| 415 | Group last & \textss{REGVAL\_T} with postInstr=1 \\ |
| 416 | \end{tabular} |
| 417 | |
| 418 | If \textss{REGVAL\_T::postInstr} is set, then the values are those |
| 419 | present after the instruction has executed. |
| 420 | |
| 421 | \swallow{ |
| 422 | arch specific |
| 423 | algorithm for I context changes in MM |
| 424 | TLB field |
| 425 | ihash fn |
| 426 | } |
| 427 | |
| 428 | \subsection{Common errors} |
| 429 | |
| 430 | To detect a context switch, examine \textss{PAVADIFF\_T::icontext}, |
| 431 | \textsl{do not use a \textss{CONTEXT\_T} record}. |
| 432 | \textss{PAVADIFF\_T::icontext} and \textss{PAVADIFF\_T::dcontext} give |
| 433 | the effective I and D contexts being used, which is correct to the best |
| 434 | of my knowledge. |
| 435 | |
| 436 | On an ASI LD/ST using an immediate ASI, do not use the |
| 437 | \textss{CONTEXT\_T::asireg} field, as this field contains the contents |
| 438 | of the ASI register, not the immediate ASI encoded in the instruction word. |
| 439 | |
| 440 | The \textss{TLB\_T::valid} bit is meaningless. To determine if a TLB |
| 441 | line is valid, instead, look at the valid bit in the TLB TTE data. |
| 442 | |
| 443 | \subsection{Clarification and the blaze RST tracer} |
| 444 | |
| 445 | We cover several ambiguous corner cases in this section. We also |
| 446 | describe what the blaze V1.x and V2.x RST tracer does in these cases. |
| 447 | |
| 448 | \begin{rqitemize}{0em} |
| 449 | \item {}[Annulled instr - I-TLB hit] |
| 450 | The VA PC, PA PC and IW are all valid. The blaze RST tracer emits an |
| 451 | instruction record with the annulled bit set. |
| 452 | |
| 453 | \item {}[Annulled instr - with ITLB miss] If there is an I-TLB miss on |
| 454 | an annulled instruction, \textsl{only the VA PC is valid}. Because we |
| 455 | cannot determine the PC PA, we cannot even fetch the IW. |
| 456 | |
| 457 | (The blaze RST tracer) On an annulled instr that misses the I-TLB, we |
| 458 | emit an IW=0 (illegal trap) and blithely let the previous PAVADIFF |
| 459 | remain. While this PC PAVADIFF is technically wrong, we deemed it |
| 460 | less intrusive than generating a PAVADIFF just to flag that we have an |
| 461 | unknown PC PA. (I had tried emitting a special PAVADIFF, but have since |
| 462 | retracted this approach.) Additionally, trace analyzers that do care |
| 463 | about the PC PA of an annulled instruction should be smart enough to |
| 464 | suppress the I-TLB miss. |
| 465 | |
| 466 | \item {}[Memory ops with a D-TLB miss] |
| 467 | |
| 468 | A memop that has a D-TLB miss will appear twice in the trace. The |
| 469 | sequence will be $(i)$ memop-try-1 + DTLB miss, $(ii)$ D-TLB miss handler |
| 470 | with possible complications like TSB miss and/or page fault, $(iii)$ |
| 471 | memop try-2 which succeeds. On a D-TLB miss, the |
| 472 | PA EA will be unknown on the first try. |
| 473 | |
| 474 | (The blaze RST tracer) On a D-TLB miss, we first emit an RST trap |
| 475 | record indicating the D-TLB miss and then emit an instr record for the |
| 476 | memop. The RST instruction record for the memop has its trap bit set. |
| 477 | That's it. In particular, we do not emit a PAVADIFF record, as we rely |
| 478 | on the trace consumer to detect the DTLB miss and squash the memop. |
| 479 | |
| 480 | \item {}[EA for untaken branches] |
| 481 | If a CTI instr is not taken (an untaken branch), we know we shall fall |
| 482 | through to the next PC. In this case, \textss{INSTR\_T::ea\_valid} is 0, |
| 483 | and \textss{INSTR\_T::ea\_va} \textsl{is unspecified}. For blaze and |
| 484 | atrace-based RST traces, \textss{INSTR\_T::ea\_va = PC + 8}; for |
| 485 | shade-based RST traces, \textss{INSTR\_T::ea\_va = taken-target-PC}. |
| 486 | |
| 487 | \item {} [PSTATE.AM = 1] If the AM bit is one, all virtual addresses |
| 488 | are limited to 32 bits. The \textss{INSTR\_T::ea\_va} field must |
| 489 | contain a 32-bit value; namely the upper 32 bits of the 64-bit |
| 490 | \textss{ea\_va} field must be zero. |
| 491 | |
| 492 | \item {}[PA EA only applies to mem ops] |
| 493 | Although \textss{INSTR\_T::ea\_va} holds both $(i)$ memory addresses and |
| 494 | $(ii)$ CTI target, you must use \textss{PAVADIFF\_T::ea\_pa\_va} value |
| 495 | only for memory operations. The RST spec forbids using PAVADIFF to get |
| 496 | the PA of a CTI/branch target, because $(a)$ you can get the PA PC from |
| 497 | the actual target instruction itself and $(b)$ at the time of the |
| 498 | CTI/branch, the PA PC may not be known as we may incur an I-MMU miss. |
| 499 | |
| 500 | \item {}[TLB demap operation] On a TLB demap operation, we record the |
| 501 | VA and context of the TLB entry that is demapped. We do not record |
| 502 | the TTE\_data of the entry being demapped. (There was a bug where the |
| 503 | \textss{TLB\_T::demap} was not being set.) |
| 504 | |
| 505 | \item {}[LD/ST ASI to non-memory (e.g. to an MMU register)] For all |
| 506 | loads and stores, even load/store ASI instructions, |
| 507 | \textss{INSTR\_T::ea\_valid = 1} and \textss{INSTR\_T::ea\_va} holds |
| 508 | the virtual address. Some ld/st ASIs have meaningful virtual |
| 509 | addresses, e.g. \textss{ASI\_UDB\_INTR\_W} or |
| 510 | \textss{ASI\_DTLB\_DATA\_TAG\_REG}, so the \textss{INSTR\_T::ea\_va} |
| 511 | must contain the effective address. |
| 512 | |
| 513 | For non-translating ASI's, the downstream trace analyzer must not |
| 514 | generate a PA, which puts the burden of knowing whether to generate a PA |
| 515 | on the trace analyzer. The ASI's obey the following breakdown, where |
| 516 | [aa,bb] is the range inclusive of aa and bb, namely $aa \le x \le bb$. |
| 517 | |
| 518 | \begin{tabular}{|l|l|} \hline |
| 519 | Range & How to get PA from VA \\ \hline |
| 520 | {}[0x04,0x11] [0x18,0x19] [0x24,0x2c] & Translate via MMU \\ |
| 521 | {}[0x70,0x73] [0x78,0x79] [0x80,0xff] & Translate via MMU \\ \hline |
| 522 | {}[0x14,0x15] [0x1c,0x1d] & Bypass. PA=VA \\ \hline |
| 523 | {}[0x45,0x6f] [0x76,0x77] [0x7e,0x7f] & Non-translating (no PA) \\ \hline |
| 524 | \end{tabular} |
| 525 | |
| 526 | The following table lists some common translating ASI's. |
| 527 | |
| 528 | \begin{tabular}{|l|l|} |
| 529 | Value & ASI name \\ |
| 530 | 0X04 & NUCLEUS \\ |
| 531 | 0X0C & NUCLEUS\_LITTLE \\ |
| 532 | 0X10 & AS\_IF\_USER\_PRIMARY \\ |
| 533 | 0X11 & AS\_IF\_USER\_SECONDARY \\ |
| 534 | 0X80 & PRIMARY (the default ASI for all loads) \\ |
| 535 | 0X81 & SECONDARY \\ |
| 536 | 0X82 & PRIMARY\_NO\_FAULT \\ |
| 537 | 0X83 & SECONDARY\_NO\_FAULT \\ |
| 538 | 0X88 & PRIMARY\_LITTLE \\ |
| 539 | 0X89 & SECONDARY\_LITTLE \\ |
| 540 | 0XE0 & BLK\_COMMIT\_PRIMARY \\ |
| 541 | 0XE1 & BLK\_COMMIT\_SECONDARY \\ |
| 542 | 0XF0 & BLK\_PRIMARY \\ |
| 543 | 0XF1 & BLK\_SECONDARY \\ |
| 544 | \end{tabular} |
| 545 | |
| 546 | \end{rqitemize} |
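The ASI range table in the last item above translates directly into a lookup that an analyzer can call before deciding whether to compute a PA. The following C sketch assumes nothing beyond the ranges printed in that table; the names \texttt{demo\_classify\_asi} and \texttt{demo\_asi\_kind} are hypothetical, and ASI values that fall outside every listed range are reported as unlisted rather than guessed at.

```c
#include <stdint.h>

typedef enum {
    DEMO_ASI_TRANSLATE,  /* PA comes from the MMU (VA + PAVADIFF ea_pa_va) */
    DEMO_ASI_BYPASS,     /* PA = VA */
    DEMO_ASI_NO_PA,      /* non-translating: do not generate a PA */
    DEMO_ASI_UNLISTED    /* not covered by the range table */
} demo_asi_kind;

/* Inclusive range test: lo <= x <= hi. */
static int demo_in(uint8_t x, uint8_t lo, uint8_t hi)
{
    return lo <= x && x <= hi;
}

/* Classify an ASI using the ranges from the table above. */
demo_asi_kind demo_classify_asi(uint8_t asi)
{
    if (demo_in(asi, 0x04, 0x11) || demo_in(asi, 0x18, 0x19) ||
        demo_in(asi, 0x24, 0x2c) || demo_in(asi, 0x70, 0x73) ||
        demo_in(asi, 0x78, 0x79) || demo_in(asi, 0x80, 0xff))
        return DEMO_ASI_TRANSLATE;
    if (demo_in(asi, 0x14, 0x15) || demo_in(asi, 0x1c, 0x1d))
        return DEMO_ASI_BYPASS;
    if (demo_in(asi, 0x45, 0x6f) || demo_in(asi, 0x76, 0x77) ||
        demo_in(asi, 0x7e, 0x7f))
        return DEMO_ASI_NO_PA;
    return DEMO_ASI_UNLISTED;
}
```

For example, the default load ASI 0x80 (PRIMARY) classifies as translating, 0x14 as bypass, and 0x45 as non-translating.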
| 547 | |
| 548 | \labsec{PAVA} |
| 549 | \subsection{VA to PA translation} |
| 550 | |
| 551 | The RST format is designed to capture both VA and PAs. There are |
| 552 | several overlapping ways to specify the necessary information. As of |
| 553 | 11/2001, the de facto standard is the \texttt{PAVADIFF} method and as |
| 554 | such, you may safely assume \textss{PAVADIFF\_T} records always exist. |
| 555 | (As of 3/2001, blaze, atrace and shade based RST traces all use |
| 556 | PAVADIFF\_T records). |
| 557 | |
| 558 | \textsl{PAVADIFF method:} A standard method is to use \texttt{PAVADIFF} |
| 559 | records, which capture the (PA-VA) values for the I-TLB and D-TLB. In |
| 560 | a \texttt{PAVADIFF} record, the \textss{pc\_pa\_va} field contains the |
| 561 | difference of (PA-VA) for the PC of the next \texttt{INSTR} record. If |
| 562 | the INSTR is a \texttt{load} or \texttt{store} (but not a |
| 563 | \texttt{branch/call/jump} ), then the \textss{ea\_pa\_va} field of the |
| 564 | \texttt{PAVADIFF} record holds the (PA-VA) for the EA, which is how the |
| 565 | D-TLB would translate that EA. |
| 566 | |
| 567 | As of RST V1.9 (4/2001), there is a separate \texttt{PAVADIFF} record |
| 568 | for each CPU. The CPU ID is contained in \texttt{PAVADIFF\_T::cpuid}. Here is how the trace might look. |
| 569 | |
| 570 | {\footnotesize |
| 571 | \begin{verbatim} |
| 572 | 3 pavadiff: context=571 cpu=0 pc_pa_va=0x00000002c0800000 ((ea_pa_va=0xffffffffffffffff)) ea_valid=0 |
| 573 | 4 instr : u [0x000000010050f144] srl %i1, 0, %i0 |
| 574 | 5 instr : u [0x000000010050f148] or %g3, %g2, %g2 |
| 575 | 6 instr : u [0x000000010050f14c] sll %i0, 2, %g3 |
| 576 | 7 pavadiff: context=571 pc_pa_va=0x00000002c0800000 ea_pa_va=0x00000002c0800000 ea_valid=1 |
| 577 | 8 instr : u [0x000000010050f150] ldsw [%g3 + %g2], %g3 [0x000000010050e390] |
| 578 | 9 instr : u [0x000000010050f154] jmpl %g3 + %g2, %g0 T |
| 579 | 10 instr : u [0x000000010050f158] nop |
| 580 | \end{verbatim} |
| 581 | } |
| 582 | |
| 583 | Note that for a control-transfer instruction (e.g. \texttt{br/call/jmpl} etc.) |
| 584 | which jumps to a target PC, \texttt{targPC}, you must ``wait'' until you |
| 585 | see the target INSTR record to determine the PA for \texttt{targPC}. If |
| 586 | \texttt{targPC} is on a different page with a different (PA-VA) value |
| 587 | than the current PC, there will be a PAVADIFF record before the target |
| 588 | instruction. |
| 589 | |
| 590 | \begin{rqcode}{ } |
| 591 | PAVADIFF pc\_pa\_va=diffaa |
| 592 | ... |
| 593 | PCaa br targPC // targPC is on a different page |
| 594 | PCaa+4 delay slot instr |
| 595 | |
| 596 | PAVADIFF pc\_pa\_va=diffbb // new value for PA-VA for targPC. |
| 597 | targPC target instruction // PA of targPC = (targPC + diffbb) |
| 598 | \end{rqcode} |
| 599 | |
| 600 | There are two common ways of using PAVADIFF records. With |
| 601 | \textsl{on-change}, I generate PAVADIFF records only when the (PA-VA) values |
| 602 | change for either the PC or the EA. If the \textss{ea\_valid} field is |
| 603 | false (0), then the previous \textss{ea\_pa\_va} value is still assumed |
| 604 | to be correct. One caveat: it is possible for the (PA-VA) values to be the same for |
| 605 | different (TLB) pages, in which case you may not see a PAVADIFF record |
| 606 | even when we cross pages. |
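On the consumer side, the on-change convention means keeping the most recent (PA-VA) values and overwriting the EA value only when \textss{ea\_valid} is set. The C sketch below shows one way to do that per CPU. It is an assumption-laden illustration: \texttt{demo\_pavadiff} is a simplified stand-in for the real \textss{PAVADIFF\_T} layout in \texttt{rstf.h}, and \texttt{DEMO\_MAX\_CPUS}, \texttt{demo\_pava\_state} and \texttt{demo\_apply\_pavadiff} are hypothetical names.

```c
#include <stdint.h>

enum { DEMO_MAX_CPUS = 4 };  /* hypothetical limit for this sketch */

/* Last-seen (PA-VA) differences for one CPU. */
typedef struct {
    uint64_t pc_pa_va;
    uint64_t ea_pa_va;
} demo_pava_state;

/* Simplified stand-in for the PAVADIFF_T fields used here;
   the real record layout is defined in rstf.h. */
typedef struct {
    unsigned cpuid;
    int      ea_valid;
    uint64_t pc_pa_va;
    uint64_t ea_pa_va;
} demo_pavadiff;

/* Fold one PAVADIFF record into the per-CPU state.
   The PC difference is always taken; the EA difference is taken
   only when ea_valid is set, so the previous value stays in force
   otherwise. Records for out-of-range CPUs are ignored. */
void demo_apply_pavadiff(demo_pava_state state[], const demo_pavadiff *pv)
{
    if (pv->cpuid >= DEMO_MAX_CPUS)
        return;
    demo_pava_state *s = &state[pv->cpuid];
    s->pc_pa_va = pv->pc_pa_va;
    if (pv->ea_valid)
        s->ea_pa_va = pv->ea_pa_va;
}
```

Keeping one state slot per CPU matters for MP traces, where PAVADIFF records from different CPUs are interleaved.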
| 607 | |
| 608 | Another possible use of PAVADIFF records is on an \textsl{every-instr} |
| 609 | basis, in which a PAVADIFF record precedes every instr, which nearly |
| 610 | doubles the trace size. Nobody in their right mind does this as of |
| 611 | 1/2001. |
| 612 | |
| 613 | \textsl{TLB method:} The conceptually preferred (but practically |
| 614 | difficult) method is to have \texttt{TLB} records which describe all the |
| 615 | necessary mappings before they are used. Whenever a TLB line is |
| 616 | changed, a corresponding TLB record appears in the trace. Also, in an |
| 617 | RST trace from blaze, the entire TLB is dumped at the beginning of the |
| 618 | trace. As of RST V1.9, a TLB record contains the TLB unit (e.g. Cheetah |
| 619 | has two I-TLB units) and the CPU to which it applies. |
| 620 | |
| 621 | Despite their compactness, TLB records cannot be used universally for |
| 622 | VA-to-PA translation. The TLB method of translation is the most |
| 623 | compact, because TLB records should be relatively infrequent in a |
| 624 | trace. However, to get PAs, your analyzer program must simulate a TLB, |
| 625 | which has proven to be difficult, slow and hence extremely unpopular. |
| 626 | And in many cases (e.g. when the trace is from atrace or shade), TLB |
| 627 | information is unavailable, so TLB records will be missing. |
| 628 | |
| 629 | Here is sample output from an RST trace from the blaze \textss{rstracer} |
| 630 | module. Records 9-4104 (roughly 2048 I-TLB + 2048 D-TLB) are the |
| 631 | initial TLB values. At record 4649, we replace I-TLB entry 1090. Here |
| 632 | \texttt{type=0} means I-TLB. The \texttt{demap=0} field means that this |
| 633 | entry is being added to the TLB, which also replaces any previous entry. |
| 634 | |
| 635 | {\footnotesize |
| 636 | \begin{verbatim} |
| 637 | 1 string : string=date=00-06-26 |
| 638 | 2 string : string=host=bigc |
| 639 | 3 string : string=ramsize=1024M |
| 640 | |
| 641 | 4 string : string=tlbsize=2048 |
| 642 | 5 string : string=nwins=8 |
| 643 | 6 string : string=cpufreq=600000000 |
| 644 | 7 string : string=mpsteps=200 |
| 645 | 8 cpu : cpu=0 timestamp=0x45a318a1df |
| 646 | 9 tlb : demap=0 type=0 valid=0 index=0 state=0x0000 context=0x0000 tag=0x00000000ffd00000 data=0xa000000000e00064 pa=0xe00000 |
| 647 | 10 tlb : demap=0 type=0 valid=0 index=1 state=0x0000 context=0x0000 tag=0x00000000ffd10000 data=0xa000000000e10064 pa=0xe10000 |
| 648 | ... |
| 649 | ... |
| 650 | 4103 tlb : demap=0 type=1 valid=1 index=2046 state=0x0000 context=0x110f tag=0x00000003a68b310f data=0xe00010000c000032 pa=0x |
| 651 | c000000 |
| 652 | 4104 tlb : demap=0 type=1 valid=0 index=2047 state=0x0000 context=0x1115 tag=0x00000003a9fc7115 data=0xe000000008c00032 pa=0x |
| 653 | 8c00000 |
| 654 | 4105 cpu : cpu=1077781320 timestamp=0x45a318a1df |
| 655 | 4106 context : asi=0x0082 last_context=0x0120 trap_lvl=0x00 trap_type=0x00 pstate=0x0012 primA=0x0000 secA=0x0000 primD=0x0120 se |
| 656 | cD=0x0120 |
| 657 | 4107 instr : u [0x000000010087ec18] add %i1, %o2, %o0 |
| 658 | 4108 instr : u [0x000000010087ec1c] ldub [%o2 + %o4], %g3 [0x0000000101b7a7ba] |
| 659 | 4109 instr : u [0x000000010087ec20] subcc %g3, %g2, %g0 |
| 660 | 4110 instr : u [0x000000010087ec24] bple,a,pn %icc, 0x10087ec30 T [0x000000010087ec30] |
| 661 | ... |
| 662 | ... |
| 663 | 4648 instr : p [0x0000000010000cb4] nop an |
| 664 | 4649 tlb : demap=0 type=0 valid=0 index=1090 state=0x0000 context=0x0120 tag=0x0000000100b2a120 data=0x8000000003756020 pa=0x |
| 665 | 3756000 |
| 666 | 4650 instr : p [0x0000000010000cb8] stxa %g5, [%g0 + %g0]0x54 |
| 667 | 4651 context : asi=0x0082 last_context=0x0000 trap_lvl=0x00 trap_type=0x00 pstate=0x0012 primA=0x0000 secA=0x0000 primD=0x0120 se |
| 668 | cD=0x0120 |
| 669 | 4652 instr : p [0x0000000010000cbc] retry T [0x0000000100b2bed0] |
| 670 | 4653 instr : p [0x0000000100b2bed0] or %g0, %o0, %g2 |
| 671 | \end{verbatim} |
| 672 | } |
| 673 | |
| 674 | \textsl{PHYSADDR method:} The last, brute-force method is to put a |
| 675 | PHYSADDR record before every instr record. The PHYSADDR record contains |
| 676 | the PA for the following PC and the EA, if appropriate. |
| 677 | |
| 678 | \subsection{VA to PA translation historical notes} |
| 679 | |
| 680 | For many months, I (RQ) was convinced the TLB method was the correct way |
| 681 | to handle VA-PA translation. How difficult could simulating a TLB be? |
| 682 | The PAVADIFF was meant to be a stop-gap, until correct TLB simulators |
| 683 | were written. I was wrong. Very wrong. |
| 684 | |
| 685 | In retrospect, the use of PAVADIFF records has greatly simplified RST |
| 686 | trace processing. Even now (03/2002), two years after the initial |
| 687 | discussion, finding a correct TLB simulator (e.g. one that agrees with |
| 688 | the PAVADIFF records) remains elusive. Special "thanks" to the MM team, |
| 689 | especially Sudi K, for steadfastly being unable to use TLB records, |
| 690 | forcing PAVADIFF records to become the standard. |
| 691 | |
| 692 | \subsection{Underlying philosophy} |
| 693 | |
| 694 | You should be able to glean most of what you want to know from just |
| 695 | \textss{INSTR\_T}, \textss{PAVADIFF\_T} and \textss{TRAP\_T} records. |
| 696 | |
| 697 | ((to be finished)) |
| 698 | |
| 699 | The \textss{TLB\_T} records, which let you do your own VA to PA translation, |
| 700 | were extremely unpopular and have been superseded in practice by |
| 701 | \textss{PAVADIFF\_T} records. |
| 702 | |
| 703 | \textss{CONTEXT\_T} records were originally meant to be much more useful. |
| 704 | |
| 705 | \section{The blaze RST trace} |
| 706 | |
| 707 | While each individual RST record type is fairly unambiguous, how the |
| 708 | records are put together is implementation dependent. |
| 709 | |
| 710 | In a \textss{TRAP\_T} record, the HW state is that \textbf{before} the |
| 711 | trap is taken. Thus, a trap from user code will show \textss{TL=0}. |
| 712 | |
| 713 | If executing an instr causes a trap, say a D-TLB miss, you will see the |
| 714 | instr (with the \textss{tr} bit set) and then the trap. If fetching an |
| 715 | instr causes a trap (e.g. IMMU miss), you will not see the instruction |
| 716 | until after the trap returns. |
| 717 | |
| 718 | If there is a write to the \textss{PSTATE} or the \textss{TL} registers, |
| 719 | the new values are shown in a \textss{CONTEXT\_T} record. |
| 720 | |
| 721 | \subsection{Information in the RST header} |
| 722 | |
| 723 | As of 4/2001 (V1.4), the blaze \textss{rstracer} module spits out |
| 724 | copious information about the configuration. Before that, a more modest |
| 725 | modicum of information was spit out. As of Version 1.4, we get the |
| 726 | following series of records at the start of a trace. |
| 727 | |
| 728 | \begin{verbatim} |
| 729 | $ trv.sh -n 24 /import/arch-trace03/blaze/tpcc-try8/try8-t6.rsz |
| 730 | RST trace format (stdin) |
| 731 | ================ |
| 732 | User/ Branch |
| 733 | Rec # Type Priv PC Disassembly Taken EA |
| 734 | 0 header : majorVer=1 minorVer=8 RST Header v1.8 |
| 735 | 1 strdesc : "Blaze [ rstracer.so ]" |
| 736 | 2 strdesc : "rstracer=V1.4" |
| 737 | 6 strdesc : "rstracer [compiled against Blz3.48 - Excal 5.8 RW MP ||Disk API=[Trace,Timing]]" |
| 738 | 8 strdesc : "date=2001-04-05_01:51:56" |
| 739 | 9 strdesc : "host=bigc" |
| 740 | 10 strdesc : "<blazeinfo>" |
| 741 | 13 strdesc : "blz::version=3.49 - Excal 5.8 RW MP ||Disk API=[Trace,Timing]" |
| 742 | 14 strdesc : "blz::ncpus=1" |
| 743 | 15 strdesc : "blz::ram=1024M " |
| 744 | 16 strdesc : "blz::tlbsize=2048" |
| 745 | 17 strdesc : "blz::mmutype=spitfire" |
| 746 | 18 strdesc : "blz::cpufreq=200000000" |
| 747 | 19 strdesc : "blz::sysfreq=10000000" |
| 748 | 20 strdesc : "blz::diskdelay=800000" |
| 749 | 21 strdesc : "blz::nwins=8" |
| 750 | 22 strdesc : "blz::mpsteps=2" |
| 751 | 23 strdesc : "</blazeinfo>" |
| 752 | \end{verbatim} |
| 753 | |
| 754 | \section{Where is the source for RST?} |
| 755 | |
| 756 | The current source is in \textss{/import/archperf/pkgs/rstf/latest/}. |
| 757 | |
| 758 | \begin{tabular}{|l|l|} \hline |
| 759 | file & Description \\ \hline |
| 760 | \rqhttp{\textss{rstf.h}}{file:/import/archperf/pkgs/rstf/latest/rstf.h} & the RST |
| 761 | format \\ \hline |
| 762 | \rqhttp{\textss{rstf.c}}{file:/import/archperf/pkgs/rstf/latest/rstf.c} & a few utility routines and some test code \\ \hline |
| 763 | \end{tabular} |
| 764 | |
| 765 | \section{I want to process an RST trace, where do I start?} |
| 766 | |
| 767 | A simple, sample C++ skeleton to read an RST trace file is at |
| 768 | \textss{/import/archperf/pkgs/rstf/latest/readRST.C}. |
| 769 | |
| 770 | A simple, sample ANSI C skeleton to read an RST trace file is at |
| 771 | \textss{/import/archperf/pkgs/rstf/latest/readRST-ansiC.c}. I "thank" |
| 772 | Anders who found the C++ skeleton impenetrable, and so spent several |
| 773 | hours doing numerous moronic things getting this code to work. |
| 774 | |
| 775 | Finally, the file \textss{/import/archperf/ws/rstf/rstFilter.h} |
| 776 | contains a more realistic (i.e. complicated) example of RST processing |
| 777 | in which we read an RST trace, adding/modifying/deleting records, and |
| 778 | generate a new RST trace. This code double buffers both input and |
| 779 | output to guarantee that we can always access/modify the previous K |
| 780 | records at both the input and output. (In contrast, if you use a single |
| 781 | buffer and you just happen to fill (flush) the input (output) buffer, |
| 782 | you cannot access or modify the previous record). |
| 783 | |
| 784 | \subsection{The actual RST code} |
| 785 | |
| 786 | Here are the corresponding record definitions directly from |
| 787 | \textss{rstf.h}. The code on this web page may be a bit out of date, so |
| 788 | check the source \rqlink{/import/archperf/pkgs/rstf/latest/rstf.h}. |
| 789 | |
| 790 | \begin{verbatim} |
| 791 | typedef struct { |
| 792 | uint8_t rtype; /* value = INSTR_T */ |
| 793 | unsigned notused : 1; /* not used */ |
| 794 | unsigned ea_valid : 1; /* ea_va field is valid */ |
| 795 | unsigned tr : 1; /* trap occured 1=yes */ |
| 796 | unsigned notused2 : 1; /* not used */ |
| 797 | unsigned pr : 1; /* priviledged or user 1=priv */ |
| 798 | unsigned bt : 1; /* branch/trap taken, cond-move/st done, like Shade6 */ |
| 799 | unsigned an : 1; /* 1=annulled (instr was not executed) */ |
| 800 | unsigned reservedCompress : 1; /* used by rstzip compression */ |
| 801 | uint16_t ihash; /* ihash value (optional) */ |
| 802 | uint32_t instr; /* instruction word (opcode, src, dest) */ |
| 803 | uint64_t pc_va; /* VA */ |
| 804 | uint64_t ea_va; /* Eff addr VA */ |
| 805 | } rstf_instrT; |
| 806 | |
| 807 | typedef struct { |
| 808 | uint8_t rtype; /* value = PAVADIFF_T */ |
| 809 | unsigned ea_valid : 1; /* does ea_pa contain a valid address */ |
| 810 | unsigned cpuid : 7; |
| 811 | uint16_t notused16; /* (deprecated) context used for these diffs */ |
| 812 | uint16_t icontext; /* I-context used for these diffs */ |
| 813 | uint16_t dcontext; /* only valid if ea_valid is true, */ |
| 814 | uint64_t pc_pa_va; /* (PA-VA) of PC */ |
| 815 | uint64_t ea_pa_va; /* (PA-VA) of EA for ld/st (not branches), if ea_valid is true */ |
| 816 | } rstf_pavadiffT; |
| 817 | |
| 818 | typedef struct { |
| 819 | uint8_t rtype; /* value = TRAP_T */ |
| 820 | unsigned is_async : 1 ; /* asynchronous trap ? */ |
| 821 | unsigned unused : 3 ; /* unused */ |
| 822 | unsigned tl : 4 ; /* trap level in the trap handler */ |
| 823 | uint16_t ttype; /* trap type for V9, only 9 bits matter */ |
| 824 | |
| 825 | uint16_t pstate; /* Pstate register in the trap, only 9 bits */ |
| 826 | uint16_t syscall; /* If a system call, the syscall # */ |
| 827 | |
| 828 | uint64_t pc; |
| 829 | uint64_t npc; |
| 830 | } rstf_trapT; |
| 831 | \end{verbatim} |
| 832 | |
| 833 | \section{System calls} |
| 834 | |
| 835 | Depending on the tracing harness, system call information may be present |
| 836 | in the trace. E.g. in \textss{RST/blaze}, system call information is |
| 837 | present. |
| 838 | |
| 839 | A system call consists of a (software) trap instruction to trap TRNUM, |
| 840 | with the \texttt{\%g1} register containing the system call number. |
| 841 | There is one trap number for 32-bit and a second trap for 64-bit system |
| 842 | calls. |
| 843 | |
| 844 | \begin{tabular}{|l|l|} \hline |
| 845 | TRNUM & system call \\ \hline |
| 846 | \texttt{0x108} & 32 bit system call \\ \hline |
| 847 | \texttt{0x140} & 64 bit system call \\ \hline |
| 848 | \end{tabular} |
| 849 | |
| 850 | See the C header file \textff{/usr/include/sys/syscall.h} for the system |
| 851 | call numbers. Thus \textss{2=fork}, \textss{5=open}, and |
| 852 | \textss{173=pread}. The header file \textff{/usr/include/sys/trap.h} |
| 853 | has the 32-bit trap number. (I forgot where I found the 64-bit system |
| 854 | call trap.) |
| 855 | |
| 856 | Thus, a system call will appear as a \textss{TRAP\_T} record with the |
| 857 | \textss{ttype} field set to either \textss{0x108} or \textss{0x140} and |
| 858 | the \textss{syscall} field holding the system call number. For example, in |
| 859 | this blaze TPCC trace snippet (\textss{t5sds}), the trap record at 86034 |
| 860 | indicates a system call (pread) is being made at instruction record |
| 861 | 86035. |
| 862 | |
| 863 | \begin{verbatim} |
| 864 | 86032 instr : cpuid=0 u [0xffffffff7dfa34d8] stx %o0, [%sp + 0x87f] [0xffffffff7fff0910] |
| 865 | 86033 instr : cpuid=0 u [0xffffffff7dfa34dc] or %g0, 0xad, %g1 |
| 866 | 86034 trap : cpuid=0 is_async=0 async==0 tl=0 ttype=0x140 pstate=0x012 syscall=0x00ad |
| 867 | 86035 instr : cpuid=0 p [0xffffffff7dfa34e0] ta %icc, %g0 + 0x40 T [0x0000000001002800] tr |
| 868 | \end{verbatim} |
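
A check for system call traps might look like the following C sketch.
The function and constant names are illustrative and are not part of
\textss{rstf.h}; the two arguments correspond to the \textss{ttype} and
\textss{syscall} fields of a \textss{TRAP\_T} record, and the 9-bit mask
follows the comment on \textss{ttype} in \textss{rstf.h}.

```c
#include <stdint.h>

/* Trap numbers from the table above (names are hypothetical). */
enum {
    TT_SYSCALL32 = 0x108,  /* 32-bit system call trap */
    TT_SYSCALL64 = 0x140   /* 64-bit system call trap */
};

/* Return the system call number carried by a trap record,
 * or -1 if the trap type is not one of the two syscall traps. */
static int trap_syscall_number(uint16_t ttype, uint16_t syscall)
{
    uint16_t tt = ttype & 0x1ff;   /* only 9 bits of ttype matter */

    if (tt == TT_SYSCALL32 || tt == TT_SYSCALL64)
        return (int)syscall;
    return -1;
}
```

Applied to record 86034 above (\textss{ttype=0x140},
\textss{syscall=0x00ad}), this returns 173, i.e. \textss{pread}.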
| 869 | |
| 870 | \section{Trace format design} |
| 871 | |
| 872 | \subsection{How do I encode state (such as warmed cache state) in RST?} |
| 873 | |
| 874 | In short, do not do this. RST is designed for capturing a dynamic |
| 875 | sequence of events (instructions, TLB activity, etc) from a computer |
| 876 | system. |
| 877 | |
| 878 | If you need to store heterogeneous information in a single trace, create a |
| 879 | \rqhttp{unatrace}{http://smeeng.eng/\rqtilde{}quong/unawrap.html}, which |
| 880 | is a general purpose trace \textsl{wrapper} format. Aztecs snaps, which |
| 881 | consist of [cache + TLB + branch predictor warming + RST instruction |
| 882 | traces] use the unawrap format. |
| 883 | |
| 884 | \subsection{Design tradeoffs in RST} |
| 885 | |
| 886 | Any trace format must be a balance of the following design tradeoffs, |
| 887 | because not all properties can be achieved simultaneously. We evaluate |
| 888 | RST against various criteria. |
| 889 | |
| 890 | \begin{tabularx}{\linewidth}{|l|l|l|X|} \hline |
| 891 | Goal & RST grade & Conflicts with & Description \\ \hline |
| 892 | Simple & A & Size & Trace format should be easy to use. RST uses a |
| 893 | fixed size record so it is easy to skip N records. RST has a common |
| 894 | rtype byte so decoding a record is very easy. \\ \hline |
| 895 | Size & D & Simplicity & Information density should be high, as traces |
| 896 | are often very large. RST requires about 30 bytes per instruction |
| 897 | (PA+VA, PC+EA, TLB, trap events). We believe a separate compression |
| 898 | phase can be used to reduce the RST size (use of a beta quality |
| 899 | compressor and gzip reduced the size of RST by approx 5-10X). \\ \hline |
| 900 | Flexible & A & Size & A trace should be able to hold different types |
| 901 | of data. A trace format which uses a fixed-record type severely |
| 902 | restricts flexibility, because every record must have a field for every |
| 903 | type. We avoid this in RST by having different record types in RST. |
| 904 | \\ \hline |
| 905 | \end{tabularx} |
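
To illustrate the "Simple" row, here is a minimal C sketch of skipping
to record N and dispatching on the rtype byte. It assumes the 24-byte
record size noted in this document and that rtype is the first byte of
every record (as in the \textss{rstf.h} typedefs); the names
\textss{RST\_RECORD\_SIZE}, \textss{rst\_record\_at} and
\textss{rst\_count\_rtype} are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define RST_RECORD_SIZE 24  /* fixed record size, per this document */

/* Pointer to record n in a buffer of back-to-back RST records.
 * With fixed-size records, skipping is a single multiplication. */
static const uint8_t *rst_record_at(const uint8_t *buf, size_t n)
{
    return buf + n * RST_RECORD_SIZE;
}

/* Count the records among the first nrecs whose rtype byte
 * (byte 0 of each record) equals the requested rtype. */
static size_t rst_count_rtype(const uint8_t *buf, size_t nrecs,
                              uint8_t rtype)
{
    size_t count = 0;

    for (size_t i = 0; i < nrecs; i++) {
        if (rst_record_at(buf, i)[0] == rtype)
            count++;
    }
    return count;
}
```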
| 906 | |
| 907 | Other RST design notes. (1) The RST trace instruction record was |
| 908 | designed to hold an instruction word (32-bit), instruction address |
| 909 | (64-bit PC) and memory effective address (64-bit EA) and other overhead |
| 910 | such as the \textss{rtype} byte. This led to the 24-byte record size. |
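
The 24-byte arithmetic can be checked with a small C sketch. The struct
below is illustrative only: it collapses the eight one-bit flags of
\textss{rstf\_instrT} into a single \textss{flags} byte, and the name
\textss{instr\_layout} is hypothetical. The compile-time assertions
assume a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only: rstf.h declares the flag bits as
 * individual bitfields rather than one byte. */
typedef struct {
    uint8_t  rtype;   /* record type byte */
    uint8_t  flags;   /* ea_valid, tr, pr, bt, an, ... (8 x 1 bit) */
    uint16_t ihash;   /* optional ihash value */
    uint32_t instr;   /* 32-bit instruction word */
    uint64_t pc_va;   /* 64-bit PC */
    uint64_t ea_va;   /* 64-bit effective address */
} instr_layout;

/* 1 + 1 + 2 + 4 + 8 + 8 = 24, with every field naturally aligned. */
_Static_assert(sizeof(instr_layout) == 24, "instr record is 24 bytes");
_Static_assert(offsetof(instr_layout, pc_va) == 8, "pc_va at byte 8");
_Static_assert(offsetof(instr_layout, ea_va) == 16, "ea_va at byte 16");
```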
| 911 | |
| 912 | \section{Patching for Aztecs} |
| 913 | |
| 914 | \begin{verbatim} |
| 915 | 1418 instr : u [0x000000010048e9b8] add %g4, 1, %g4 tr |
| 916 | 1419 patch : isbegin=1 rewindrecs=0 id=1 length=2 descr=atrPCdAZ |
| 917 | 1420 instr : u [0x000000010048e9bc] jmpl %g2 + 0, %g1 T [0x0000000078404780] |
| 918 | 1421 instr : u [0x000000010048e9c0] nop |
| 919 | 1422 patch : isbegin=0 rewindrecs=0 id=1 length=2 descr=atrPCdAZ |
| 920 | 1423 context : asi=0x0000 last_context=0x0000 trap_lvl=0x00 trap_type=0x00 pstate=0x0000 primA=0x0000 secA=0x0000 primD=0x0000 secD= |
| 921 | 0x0000 |
| 922 | 1424 pavadiff: context=0 pc_pa_va=0x00000003673c0000 ((ea_pa_va=0xffffffffffffffff)) ea_valid=0 |
| 923 | 1425 instr : u [0x0000000078404780] save %sp, -0xb0, %sp |
| 924 | \end{verbatim} |
| 925 | |
| 926 | \section{FAQ} |
| 927 | |
| 928 | \subsection{There is a TRAP record and a tr bit in the instruction record. What is the difference?} |
| 929 | |
| 930 | The trap record contains many values including the trap type, trap |
| 931 | level, PC, NPC, pstate register and the system call number (\%g1 |
| 932 | register) on a syscall trap. |
| 933 | |
| 934 | The \textss{tr} bit in the instruction simply indicates if a trap |
| 935 | occurred during this instruction. The tr bit is necessary to clearly |
| 936 | distinguish when a trap occurs. |
| 937 | |
| 938 | \section{The rstf workspace} |
| 939 | |
| 940 | \subsection{Purpose} |
| 941 | |
| 942 | \begin{rqenumerate}{0em} |
| 943 | \item The main purpose of this WS is to define the RST file format in |
| 944 | \textff{rstf.h}. |
| 945 | Some secondary and/or deprecated definitions are in |
| 946 | \textff{rstf\_*.h}. |
| 947 | |
| 948 | \item A secondary purpose is to define common RST utilities/code, including |
| 949 | starter code and RST-to-RST filters. |
| 950 | \end{rqenumerate} |
| 951 | |
| 952 | \subsection{Guidance on updating this workspace} |
| 953 | |
| 954 | The file \textff{rstf.h} defines the RST file format. The file format |
| 955 | consists of the rtype definitions and the fields within each record. |
| 956 | \textbf{Many} other programs use \textff{rstf.h}. So.... |
| 957 | |
| 958 | \begin{rqitemize}{0em} |
| 959 | \item Try to avoid changing this file if possible. |
| 960 | In the last 12 months (10/2001-10/2002), I have bumped the minor |
| 961 | version once. |
| 962 | \item Avoid breaking backward compatibility \textbf{AT ALL COSTS}. |
| 963 | There is considerable data in \texttt{rstf} 2.04-2.06 format. |
| 964 | \item The safest changes involve adding new rtypes or adding more constants |
| 965 | to existing enumerations. E.g. filling out the register constants |
| 966 | in the \textss{REGVAL\_T:regtype[]}. |
| 967 | \item There is a Java port of \texttt{rstf}, in the (to be released |
| 968 | 12/2002) \textss{jrst} workspace. A Perl script in \textss{jrst} |
| 969 | "parses" \textff{rstf.h} and makes undocumented assumptions about |
| 970 | the way \textff{rstf.h} looks. Please try to conform to the |
| 971 | existing style in the typedefs and enums. |
| 972 | |
| 973 | \item I (RQ) have tried to be stingy in using \textss{rtype} values. |
| 974 | I have unofficially reserved bits 7 and 6 of the rtype as a hedge for |
| 975 | (two rounds of) sweeping changes to RST in the distant future if it |
| 976 | comes to that. Thus, I strongly recommend only using rtypes from |
| 977 | 2-63. |
| 978 | \end{rqitemize} |
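
The rtype recommendation in the last item can be expressed as a one-line
check. This is a sketch under the stated convention only (bits 7 and 6
clear, value at least 2); the macro and function names are hypothetical
and do not exist in \textff{rstf.h}.

```c
#include <stdint.h>

/* Bits 7 and 6 of the rtype, unofficially reserved as noted above. */
#define RTYPE_RESERVED_BITS 0xC0u

/* Nonzero if rtype lies in the recommended range 2-63, i.e. it is
 * at least 2 and uses neither of the two reserved high bits. */
static int rtype_is_recommended(uint8_t rtype)
{
    return rtype >= 2 && (rtype & RTYPE_RESERVED_BITS) == 0;
}
```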
| 979 | |
| 980 | If you must change \textff{rstf.h}, bump the version number in |
| 981 | \textff{rstf.h}. |
| 982 | |
| 983 | \subsection{Version numbers} |
| 984 | |
| 985 | Many programs or code snippets have version numbers. The big rule about |
| 986 | version numbers is that given an RST trace and full knowledge about the |
| 987 | history of the programs involved in producing the trace, you must (or |
| 988 | should) be able to determine what idiosyncrasies exist in that trace. |
| 989 | Note, you do \textem{not} know which versions of the programs were involved |
| 990 | in producing the trace. |
| 991 | |
| 992 | As an example, you are given the trace \textss{try8-t24.rz.gz} from |
| 993 | 6/2001, which was produced by \textss{blaze V3} and \textss{rstracer}. |
| 994 | You are given the phone numbers of all the developers involved in |
| 995 | tracing at Sun, so you can obtain the history of all programs involved. |
| 996 | What are the issues, if any, of this trace from a data format and |
| 997 | correctness standpoint? First you have to determine which components |
| 998 | (or programs) were involved in this trace. Running \textss{trv.sh -n |
| 999 | 40} on this trace we see |
| 1000 | |
| 1001 | \begin{flushleft} |
| 1002 | 0 header : majorVer=1 minorVer=10 RST Header v1.10\\ |
| 1003 | 4 strdesc : " rstracer=V1.8"\\ |
| 1004 | 8 strdesc : " compiled against Blz 3.64 - Excal 5.8 LL RW MP ||Disk API=[Trace,Timing]"\\ |
| 1005 | 24 strdesc : "blz::version=3.65 - Excal 5.8 LL RW MP ||Disk API=[Trace,Timing]"\\ |
| 1006 | \end{flushleft} |
| 1007 | |
| 1008 | Thus this is an RSTF v1.10 trace and \textss{blaze V3.65} and the |
| 1009 | \textss{rstracer V1.8} were involved. You call up their developers and |
| 1010 | get the details of these programs from the dawn of time until now and |
| 1011 | have an understanding of the trace issues. |
| 1012 | |
| 1013 | Thus, here are the strong recommendations regarding version numbers and |
| 1014 | traces. |
| 1015 | |
| 1016 | \begin{rqenumerate}{0em} |
| 1017 | \item An RST trace must contain the version numbers of all programs |
| 1018 | involved in producing the trace. In the case of |
| 1019 | \textss{try8-t24.rz.gz}, this trace has the version numbers of |
| 1020 | \textss{rstf}, \textss{rstracer} and \textss{blaze}. |
| 1021 | |
| 1022 | \item The version number of each component (or program) must indicate |
| 1023 | the state of that component. I.e. if something is changed, the |
| 1024 | version number of that component must be changed. |
| 1025 | |
| 1026 | \item There must be a record of known bugs for each component for each |
| 1027 | version number. |
| 1028 | \end{rqenumerate} |
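
A tool that acts on these version numbers should compare the major and
minor parts as separate integers, not as a decimal fraction. The header
lines shown in this document suggest that v1.10 (the 6/2001 trace) is
later than v1.8 (the 4/2001 trace), which a floating-point comparison
would get backwards. The helper below is a hypothetical sketch, not a
routine from the rstf workspace.

```c
/* Compare version (maj_a, min_a) against (maj_b, min_b).
 * Returns <0, 0 or >0, in the style of strcmp. */
static int rst_version_cmp(int maj_a, int min_a, int maj_b, int min_b)
{
    if (maj_a != maj_b)
        return maj_a < maj_b ? -1 : 1;
    if (min_a != min_b)
        return min_a < min_b ? -1 : 1;
    return 0;
}
```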
| 1029 | |
| 1030 | Here are some examples of version numbers. |
| 1031 | |
| 1032 | \begin{tabularx}{\linewidth}{|l|X|} |
| 1033 | Code & Description/philosophy of version numbers \\ |
| 1034 | rstf & Version of the RST Format records. Should not change often. \\ |
| 1035 | & The first record in an RSTF trace must define the version number. |
| 1036 | If a new version of RSTF breaks backward compatibility (e.g. the |
| 1037 | format for PAVADIFF changes), increment |
| 1038 | the major version. And this should happen once every never. \\ |
| 1039 | rstFilter & updated when a filter is added or updated. Update freely. \\ |
| 1040 | rstracer & (in rstracer WS) |
| 1041 | Reflects which version of the rst tracer. The version indicates |
| 1042 | what bugs/idiosyncrasies exist. Note that the RST trace produced |
| 1043 | by \textss{rstracer} contains both the rstracer version number and |
| 1044 | the RSTF version num. |
| 1045 | \end{tabularx} |
| 1046 | |
| 1047 | \subsection{Basic programs and scripts in the rstf workspace} |
| 1048 | |
| 1049 | The master workspace for RST is \textff{/import/archperf/ws/rstf}. |
| 1050 | It should be open to all to do a bringover, aka world bringover-able. |
| 1051 | If you need to do a putback to this workspace, talk to someone in Arch |
| 1052 | Tools, say \textss{lren@eng}. |
| 1053 | |
| 1054 | \subsubsection{trv.sh} |
| 1055 | |
| 1056 | Look at RST files (compressed or not) in ASCII. (Replaces rstunzip and |
| 1057 | trconv). Runs a PAGER ( \textss{more} or \textss{less} ) if output is a |
| 1058 | terminal. \textem{Use this program}. |
| 1059 | |
| 1060 | \subsubsection{rstFilter.C} |
| 1061 | |
| 1062 | Implements many (30+) RST-to-RST filters (read stdin/file, write |
| 1063 | stdout). Typically you need to use several filters in a row. All error |
| 1064 | messages go to stderr. This code offers generic double-buffering on |
| 1065 | both input and the output, making it "easier" (hah) to do |
| 1066 | transformations that must look at several records. |
| 1067 | |
| 1068 | \subsubsection{runRSTFilt.sh} |
| 1069 | |
| 1070 | Convenient shell script driver for running \textss{rstFilter}. Use |
| 1071 | this. |
| 1072 | |
| 1073 | \begin{rqcode}{ } |
| 1074 | // by hand |
| 1075 | rstFilter -a filter1 input-file | rstFilter -a filter2 | rstFilter -a |
| 1076 | filter3 > output |
| 1077 | |
| 1078 | // using runRSTFilt.sh |
| 1079 | runRSTFilt -a 'filter1 filter2 filter3' > output |
| 1080 | // Same as above but generate ASCII dumps of all intermediate files, too |
| 1081 | runRSTFilt -u -a 'filter1 filter2 filter3' > output |
| 1082 | |
| 1083 | // E.g. to clean up the raw output from atrace2rst [Atrace->RST] files, |
| 1084 | runRSTFilt.sh -a 'ihash addBrTarg' [-u] raw.rst > clean.rst |
| 1085 | \end{rqcode} |
| 1086 | |
| 1087 | \subsubsection{atr2rst.sh, atrace2rst.C and dumpatr} |
| 1088 | |
| 1089 | The script \textss{atr2rst.sh} runs \texttt{atrace2rst} and does some |
| 1090 | post processing to clean up the RST. The 64-bit executable |
| 1091 | \textss{atrace2rst} converts an atrace to raw RST. The post processing |
| 1092 | adds ihash values and branch targets among other things. |
| 1093 | \textss{Dumpatr} is a hard link to atrace2rst; it is the same as running |
| 1094 | \textss{atrace2rst -a}. |
| 1095 | |
| 1096 | \subsubsection{snapForAztecs.sh} |
| 1097 | |
| 1098 | Generate snaps suitable for aztecs. Snaps the RST file and then runs a |
| 1099 | horrific combination of RST filters on the result and then compresses |
| 1100 | the results. Even the author does not want to look at this script. |
| 1101 | |
| 1102 | \section{History} |
| 1103 | |
| 1104 | The RST format and this document were started and then maintained by R |
| 1105 | Quong through 11/2002. |
| 1106 | |
| 1107 | \end{document} |