Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | /* |
2 | * Copyright 2006 Sun Microsystems, Inc. All rights reserved. | |
3 | * Use is subject to license terms. | |
4 | */ | |
5 | #pragma ident "@(#)README.error_trap_generation 1.7 06/11/08 SMI" | |
6 | ||
7 | ||
8 | Error Trap Generation Framework | |
9 | =============================== | |
10 | ||
11 | User Guide | |
12 | ||
13 | ||
14 | What the Error Trap Generation framework supports: | |
15 | -------------------------------------------------- | |
16 | ||
17 | The user will be able to specify a list of errors and/or arbitrary | |
18 | traps to inject into the system along with specific conditions under | |
19 | which the errors/traps are to be injected. | |
20 | ||
21 | Optionally, the user may also specify a list of ASI override values | |
22 | which Legion will use to serve up load requests (or noop store | |
23 | requests). | |
24 | ||
25 | Legion provides simulation of all the Error Status Registers and | |
26 | Error Enabling Registers (error reporting and error recording) | |
27 | associated with supported errors. Legion will also generate the | |
28 | correct trap type depending on the error. | |
29 | ||
30 | This framework does not, however, provide a simulation for actually | |
31 | injecting real errors into TLBs, caches, etc. So you will have to | |
32 | supply ASI override values for any diagnostic ASIs such as reading | |
33 | raw cache line contents or TLB entries which may be needed by the | |
34 | hypervisor error handling code being simulated. | |
35 | ||
36 | Also, by using the arbitrary trap generation functionality combined | |
37 | with ASI overrides, you can inject and simulate almost any | |
38 | error -- even ones not yet supported by the named | |
39 | error method. This allows the Hypervisor developers to make forward | |
40 | progress and not have to wait for various ASI registers to be fully | |
41 | supported in Legion. | |
42 | ||
43 | ||
44 | Enabling the Error Trap Generation framework: | |
45 | --------------------------------------------- | |
46 | ||
47 | Modify the GNUMakefile by setting ERROR_TRAP_GEN to 1 or 2 (a value | |
48 | of 2 generates more output and is mainly intended for use in Legion | |
49 | development) and recompile Legion. | |
50 | ||
51 | Then modify your Legion .conf file by adding error_asi {} and | |
52 | error_event {} definitions (described in detail later) to the | |
53 | "processor" definition like this: | |
54 | ||
55 | ... | |
56 | processor "rock" { | |
57 | ... | |
58 | error_asi { ... } | |
59 | error_asi { ... } | |
60 | ... | |
61 | error_event { ... } | |
62 | error_event { ... } | |
63 | ... | |
64 | } | |
65 | ... | |
66 | ||
67 | Alternatively, you could just add all your error_asi and error_event | |
68 | definitions to an error.conf file and #include that file from within | |
69 | the "processor" definition like this: | |
70 | ||
71 | ... | |
72 | processor "rock" { | |
73 | ... | |
74 | #include "error.conf" | |
75 | ... | |
76 | } | |
77 | ... | |
78 | ||
79 | User input/options: | |
80 | ------------------- | |
81 | ||
82 | In order to inject error traps under this framework, the user | |
83 | provides error_event {} definitions and error_asi {} definitions | |
84 | in the Legion .conf file. | |
85 | ||
86 | 1) error_asi {} | |
87 | ||
88 | These are used to specify ASI overrides. You can specify ASI | |
89 | overrides for ASI/VA pairs which are already supported in Legion | |
90 | or you can specify ASI overrides for accesses that would otherwise | |
91 | result in a DAX because they are not yet supported by Legion. | |
92 | ||
93 | In the case of ASIs which Legion already supports, the masks | |
94 | specified here are applied on top of the value that Legion would | |
95 | normally return. In the case of ASIs which Legion does not yet | |
96 | support, Legion records the value on a store; on a load it retrieves | |
97 | the recorded value and then applies the masks specified here. | |
98 | ||
99 | NOTE: In the case of an ASI which Legion does not yet support, if | |
100 | you intend to write to multiple VAs and would like the values to | |
101 | be retained, then each ASI/VA pair must have its own error_asi | |
102 | entry. | |
103 | ||
104 | The format for an ASI override is as follows: | |
105 | ||
106 | error_asi { | |
107 | ||
108 | // ASI number (required field) | |
109 | ASI 0x32; | |
110 | ||
111 | // VA for this override. If no VA is specified, this ASI | |
112 | // override will match on any VA for the given ASI. | |
113 | VA 0xec000018; | |
114 | ||
115 | // NAND mask value. This value is bitwise inverted and then | |
116 | // logically "and"ed with the value being returned in the ldxa | |
117 | // operation. Defaults to 0 if not specified. | |
118 | NAND_MASK 0xff00; | |
119 | ||
120 | // OR mask value. This value is logically "or"ed with the | |
121 | // value being returned in the ldxa operation after we have | |
122 | // already applied the NAND_MASK. Defaults to 0 if not | |
123 | // specified. | |
124 | OR_MASK 0x4000000000; | |
125 | ||
126 | // Access count. This value defines the number of accesses | |
127 | // for which an ASI override will be valid. Defaults to a permanent | |
128 | // override (for the duration of the current Legion run) if not | |
129 | // specified. | |
130 | ACCESS_CNT 0x5; | |
131 | ||
132 | // CPU mask. This value defines the 64bit mask of the cpus | |
133 | // for which this asi override is valid. If not specified | |
134 | // the ASI override will be valid for all cpus. | |
135 | CPU_MASK 0xf; | |
136 | } | |
137 | ||
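The mask, access count and CPU mask semantics described above can be modeled with a short sketch. This is an illustration only, not Legion's implementation: the AsiOverride class and its method names are hypothetical, values are assumed to be 64 bits wide, and it is assumed that every applied override consumes one access.

```python
from dataclasses import dataclass
from typing import Optional

MASK64 = (1 << 64) - 1  # assumption: override values are 64-bit quantities


@dataclass
class AsiOverride:
    """Hypothetical model of one error_asi {} entry (not a Legion type)."""
    asi: int
    va: Optional[int] = None          # None: match any VA for this ASI
    nand_mask: int = 0                # default 0: clear no bits
    or_mask: int = 0                  # default 0: set no bits
    access_cnt: Optional[int] = None  # None: permanent override
    cpu_mask: Optional[int] = None    # None: valid for all CPUs

    def matches(self, asi: int, va: int, cpuid: int) -> bool:
        """Return True if this override applies to the given access."""
        if asi != self.asi:
            return False
        if self.va is not None and va != self.va:
            return False
        if self.cpu_mask is not None and ((self.cpu_mask >> cpuid) & 1) == 0:
            return False
        # An exhausted access count means the override is no longer valid.
        return self.access_cnt is None or self.access_cnt > 0

    def apply(self, value: int) -> int:
        """Clear the NAND_MASK bits first, then set the OR_MASK bits."""
        if self.access_cnt is not None:
            self.access_cnt -= 1  # assumption: each applied access counts once
        return ((value & ~self.nand_mask) | self.or_mask) & MASK64
```

With the values from the format example (NAND_MASK 0xff00, OR_MASK 0x4000000000), a raw value of 0xffff would be returned as 0x40000000ff: bits 8-15 are cleared first and bit 38 is then set.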
138 | ||
139 | 2) error_event {} | |
140 | ||
141 | These are used to specify the type of error trap you want Legion | |
142 | to generate and under what circumstance it will be injected. The | |
143 | format is: | |
144 | ||
145 | error_event { | |
146 | ||
147 | // ASCII name from PRM. e.g. DCDP or UE | |
148 | error "DCDP"; | |
149 | ||
150 | // Arbitrary TT to inject (in lieu of a specific error). | |
151 | trap 0x63; | |
152 | ||
153 | // Arbitrary service processor (SP) interrupt level | |
154 | // to inject (in lieu of a specific error). | |
155 | // | |
156 | // NOTE: SP interrupts are currently only supported in | |
157 | // Rock (using the SCX library). SCX must be initialized | |
158 | // for both arbitrary SP interrupts and any built-in | |
159 | // errors which generate SP interrupts. See README.scx | |
160 | // for information on how to setup the SCX library. | |
161 | sp_intr 0x1; | |
162 | ||
163 | // | |
164 | // NOTE: Each error_event definition should include one of | |
165 | // ('error' or 'trap' or 'sp_intr') | |
166 | // | |
167 | ||
168 | // Instruction cycle count to wait for before injecting | |
169 | // the error trap. Defaults to 0x1 if unspecified. | |
170 | instn_cnt 0x10000; | |
171 | ||
172 | // %pc value to trigger the error trap. If the user also | |
173 | // specifies an instn_cnt, the %pc trigger will not get | |
174 | // loaded until the instn_cnt has first been reached. | |
175 | pc 0xf023972; | |
176 | ||
177 | // Strand ID of the CPU this error/trap is targeted at. | |
178 | // Defaults to 0x0 if unspecified. | |
179 | target_cpuid 0x4; | |
180 | ||
181 | // Address to which an access will trigger the error. | |
182 | // Address value of 0x0 means "any address". User can | |
183 | // optionally specify that the error will be triggered by | |
184 | // either a "load" or a "store" -- defaults to both if | |
185 | // unspecified. | |
186 | address { 0x1f12080000; "store"; } | |
187 | ||
188 | // | |
189 | // NOTE: Each error_event definition should include at least | |
190 | // one of 'instn_cnt' or 'pc' value or address. A combination | |
191 | // of instn_cnt with pc or instn_cnt with address is also | |
192 | // possible. | |
193 | // | |
194 | ||
195 | // Privilege level the CPU should be in before the error | |
196 | // is injected. Defaults to "any priv level" if unspecified. | |
197 | priv "PRIV"; | |
198 | ||
199 | // Trap level the CPU should be in before the error is | |
200 | // injected. Defaults to "any trap level" if unspecified. | |
201 | tl 0x1; | |
202 | ||
203 | // Error persistence can be set with a trigger_cnt | |
204 | // directive. 1 means single trigger (default, if not | |
205 | // specified); > 1 means trigger N times. | |
206 | trigger_cnt 0x2; | |
207 | ||
208 | // ASI overrides associated with this error event. The | |
209 | // definition for these is the same as the error_asi {}'s | |
210 | // defined outside the error_event except for the fact | |
211 | // that they are enabled only after the error is triggered | |
212 | // AND the default access count is one (if access_cnt is | |
213 | // not specified, the override is valid for only one | |
214 | // matching access). | |
215 | error_asi { } | |
216 | } | |
217 | ||
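The interaction between instn_cnt and pc described in the error_event format can be sketched as a two-stage check. This is a simplified model under stated assumptions, not Legion's code: the function name trigger_ready is hypothetical, "reached" is assumed to mean "executed at least instn_cnt instructions", and the address, priv, tl and trigger_cnt conditions are left out.

```python
from typing import Optional


def trigger_ready(instn_cnt: int, trigger_pc: Optional[int],
                  executed: int, pc: int) -> bool:
    """Return True if an event with these trigger fields would fire now.

    instn_cnt  -- instruction count from the error_event (0x1 if unspecified)
    trigger_pc -- %pc trigger from the error_event, or None if not given
    executed   -- instructions executed so far on the target CPU
    pc         -- current %pc of the target CPU
    """
    # Stage 1: the %pc trigger is not considered at all until the
    # instruction count has been reached.
    if executed < instn_cnt:
        return False
    # Stage 2: with no %pc trigger the count alone is enough; otherwise
    # the current %pc must match the trigger value.
    return trigger_pc is None or pc == trigger_pc
```

For an event with instn_cnt 0x2c38e3b and pc 0xf0239720, an early pass through 0xf0239720 would not fire the event in this model; only a pass after 0x2c38e3b instructions have executed would.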
218 | ||
219 | Example error.conf file: | |
220 | ------------------------ | |
221 | ||
222 | error_asi { ASI 0x58; VA 0xa0000150; } | |
223 | error_asi { ASI 0x37; VA 0x80000250; } | |
224 | error_asi { ASI 0x37; VA 0x800002D0; } | |
225 | error_asi { ASI 0x37; VA 0x80000350; } | |
226 | error_asi { ASI 0x37; VA 0x800003d0; } | |
227 | error_asi { ASI 0x4c; } | |
228 | error_asi { ASI 0x32; VA 0xec000050; } | |
229 | ||
230 | error_event { | |
231 | error "DCTM"; | |
232 | instn_cnt 0x1; | |
233 | address { 0x0; "load"; } | |
234 | priv "PRIV"; | |
235 | } | |
236 | ||
237 | error_event { | |
238 | error "WRFC"; | |
239 | address { 0x1f12080000; "store"; } | |
240 | priv "USER"; | |
241 | } | |
242 | ||
243 | error_event { | |
244 | trap 0x63; | |
245 | instn_cnt 0x2c38e3b; | |
246 | pc 0xf0239720; | |
247 | address { 0x0; "load"; } | |
248 | priv "PRIV"; | |
249 | error_asi { ASI 0x32; VA 0xec000018; OR_MASK 0x4000000000; } | |
250 | } | |
251 | ||
252 | "Dynamic" error.conf file load: | |
253 | ------------------------------- | |
254 | ||
255 | The user has the option of dynamically loading error_asi {} and | |
256 | error_event {} definitions while the simulation is executing. | |
257 | ||
258 | 1) Setting up dynamic error_event {} and error_asi {} definitions | |
259 | ||
260 | Dynamic error_asi and error_event definitions must be included in a | |
261 | separate error.conf file. In addition, the error.conf filename must | |
262 | be specified, using an error_reload_file_name definition, within the | |
263 | "processor definition" section of the standard conf file. If no | |
264 | error_reload_file_name is provided when you start Legion, the default | |
265 | file called "reload.error.conf" will be opened and parsed. | |
266 | ||
267 | example: | |
268 | ||
269 | ... | |
270 | processor "rock" { | |
271 | ... | |
272 | error_reload_file_name "error.conf"; | |
273 | ... | |
274 | } | |
275 | ... | |
276 | ||
277 | 2) Loading dynamic error_event {} and error_asi {} definitions | |
278 | ||
279 | To dynamically load the defined error file, type '~er' in the | |
280 | simulated console window. | |
281 | ||
282 | Note: "Dynamic" error_event {} definitions will be appended to the | |
283 | existing list. Any error_asi {} definitions will be used to update | |
284 | any existing error_asi definitions that match on ASI/VA. This can | |
285 | be used to reset the access_cnt of an error_asi whose access_cnt | |
286 | is now 0. | |
287 | ||
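The reload behavior in the note above (events appended, error_asi entries updated on an ASI/VA match) can be sketched as follows. The function name and the dictionary representation are hypothetical and do not reflect Legion's internal data structures. The text does not say what happens to a reloaded error_asi with no existing match; this sketch assumes it is added as a new entry.

```python
def reload_error_conf(events, asis, new_events, new_asis):
    """Merge freshly parsed definitions into the running lists, in place.

    events, new_events -- lists of error_event dicts
    asis, new_asis     -- lists of error_asi dicts with an "ASI" key and an
                          optional "VA" key
    """
    # error_event definitions are appended to the existing list.
    events.extend(new_events)

    # error_asi definitions replace existing entries with the same ASI/VA.
    index = {(a["ASI"], a.get("VA")): i for i, a in enumerate(asis)}
    for new in new_asis:
        key = (new["ASI"], new.get("VA"))
        if key in index:
            asis[index[key]] = new
        else:
            # Assumption: an unmatched entry becomes a new override.
            index[key] = len(asis)
            asis.append(new)
```

Reloading an entry such as { ASI 0x32; VA 0xec000018; ACCESS_CNT 0x5; } over an existing override with the same ASI/VA whose count has dropped to 0 would, in this model, make the override valid again for five accesses.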
288 | Other options available during the Legion run: | |
289 | ---------------------------------------------- | |
290 | ||
291 | 1) The user can dump the error_event {} and error_asi {} lists by | |
292 | typing '~ed' in the simulated console window. | |
293 | ||
294 | 2) The user can dump the list of all supported (built-in) error names | |
295 | by typing '~es' in the simulated console window. | |
296 | ||
297 | Notes, Limitations, and Known Bugs: | |
298 | ----------------------------------- | |
299 | ||
300 | - Legion processes the error events in the order in which they are | |
301 | listed in the .conf file and each CPU only monitors the trigger | |
302 | conditions for one error event at a time. This means you must list | |
303 | the error events in the order you wish to have them injected. | |
304 |
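The ordering limitation above can be illustrated with a toy model of a single CPU. This is a sketch under simplifying assumptions, not Legion's code: each event is reduced to a hypothetical (name, trigger_pc) pair, and only the first pending event is ever checked against the executing %pc.

```python
from collections import deque


def inject_in_order(events, pc_trace):
    """Return the names of the events that fire, in firing order.

    events   -- (name, trigger_pc) pairs in the order listed in the .conf file
    pc_trace -- sequence of %pc values executed by one CPU
    """
    pending = deque(events)
    fired = []
    for pc in pc_trace:
        # Only the head of the list is monitored; later events are ignored
        # until the head event has been injected.
        if pending and pending[0][1] == pc:
            fired.append(pending.popleft()[0])
    return fired
```

With events listed as [("A", 0x200), ("B", 0x100)] and a trace of 0x100, 0x200, only A fires: B's trigger address is executed while A is still the monitored event, so B is missed unless 0x100 is executed again later.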