The standard purpose of regression testing is to avoid getting the same
bug twice. When a bug is found, the programmer fixes the bug and adds a
test to the test suite. The test should fail before the fix and pass
after the fix. When a new version is about to be released, all the tests
in the regression test suite are run and if an old bug reappears, this
will be seen quickly since the appropriate test will fail.

The regression testing in GNU Go is slightly different. A typical test
case involves specifying a position and asking the engine what move it
would make. This is compared to one or more correct moves to decide
whether the test case passes or fails. Each test case also records
whether it is expected to pass or fail, and deviations from this status
signal that a change has solved some problem and/or broken something
else. Thus the regression tests include both positions highlighting
mistakes made by the engine, which are waiting to be fixed, and
positions where the engine does the right thing, where we want to detect
if a change breaks something.

@menu
* Regression Testing::          Regression Testing in GNU Go
* Test Suites::                 Test Suites
* Running the Regressions::     Running the Regression Tests
* Running regress.pike::        Running regress.pike
* Viewing with Emacs::          Viewing tests with Emacs
* HTML Views::                  HTML Views
@end menu

@node Regression Testing
@section Regression testing in GNU Go

Regression testing is performed by the files in the @file{regression/}
directory. The tests are specified as GTP commands in files with the
suffix @file{.tst}, with corresponding correct results and expected
pass/fail status encoded in GTP comments following the test. To run a
test suite, the shell scripts @file{test.sh}, @file{eval.sh}, and
@file{regress.sh} can be used. There are also Makefile targets to do
this: @command{make all_batches} runs most of the tests. The Pike
script @file{regress.pike} can also be used to run all tests or a
subset of the tests.

Game records used by the regression tests are stored in the
directory @file{regression/games/} and its subdirectories.

@node Test Suites
@section Test suites

The regression tests are grouped into suites and stored in files as GTP
commands. A part of a test suite can look as follows:
@example
@group
# Connecting with ko at B14 looks best. Cutting at D17 might be
# considered. B17 (game move) is inferior.
loadsgf games/strategy25.sgf 61
90 gg_genmove black
#? [B14|D17]

# The game move at P13 is a suicidal blunder.
loadsgf games/strategy25.sgf 249
95 gg_genmove black
#? [!P13]

loadsgf games/strategy26.sgf 257
100 gg_genmove black
#? [M16]*
@end group
@end example

Lines starting with a hash sign, or in general anything following a hash
sign, are interpreted as comments by the GTP mode and thus ignored by
the engine. GTP commands are executed in the order they appear, but only
those on numbered lines are used for testing. The comment lines starting
with @code{#?} are magical to the regression testing scripts and
indicate correct results and expected pass/fail status. The string
within brackets is matched as a regular expression against the response
from the previous numbered GTP command. A particularly useful feature of
regular expressions is that by using @samp{|} it is possible to specify
alternatives. Thus @code{B14|D17} above means that if either @code{B14}
or @code{D17} is the move generated in test case 90, the test passes.
There is one important special case to be aware of. If the correct
result string starts with an exclamation mark, the mark is not part of
the regular expression; instead, the result of the matching is negated.
Thus @code{!P13} in test case 95 means that any move except @code{P13}
is accepted as a correct result.

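The matching rule can be sketched in a few lines of Python. This is an illustration written for this section, not code taken from the regression scripts. The function name @code{result_matches} is hypothetical. The sketch assumes that the expression must match the whole response, so that @code{B1} does not accept @code{B14}. It also assumes that the GTP status prefix, such as @code{=90}, has already been stripped from the response.

```python
import re


def result_matches(correct, got):
    """Return True if the engine response `got` is accepted by the
    bracketed expression `correct` taken from a #? line."""
    # A leading "!" is not part of the regular expression; it only
    # inverts the outcome of the match.
    negate = correct.startswith("!")
    if negate:
        correct = correct[1:]
    # Assumption: the expression has to match the entire response.
    matched = re.fullmatch(correct, got) is not None
    return matched != negate


# Cases taken from the example test suite above.
print(result_matches("B14|D17", "D17"))  # True
print(result_matches("!P13", "P13"))     # False
print(result_matches("!P13", "Q5"))      # True
```

Using a full match rather than a substring search keeps alternatives such as @code{B14|D17} from accidentally accepting longer coordinates.
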
In test case 100, the brackets on the @code{#?} line are followed by an
asterisk. This means that the test is expected to fail. If there is no
asterisk, the test is expected to pass. The brackets may also be
followed by a @samp{&}, meaning that the result is ignored. This is
primarily used to report statistics, e.g.@: how many tactical reading
nodes were spent while running the test suite.

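The following Python sketch shows how the pieces of a test file fit together. It reads the text of a suite and pairs each numbered GTP command with the @code{#?} line that follows it. It also records the expected status: no suffix, @samp{*}, or @samp{&}. This is a hypothetical helper written for this explanation, not part of the GNU Go scripts. It assumes that every @code{#?} line refers to the most recent numbered command.

```python
import re

# A numbered GTP command: an integer id followed by the command itself.
NUMBERED = re.compile(r"^(\d+)\s+(.+)$")
# A #? line: an expression in brackets, optionally followed by * or &.
EXPECTED = re.compile(r"^#\?\s*\[(.*)\]\s*([*&]?)")
STATUS = dict([("", "pass"), ("*", "fail"), ("&", "ignore")])


def parse_suite(text):
    """Return (id, command, expression, status) for every numbered test."""
    tests = []
    pending = None  # last numbered command still waiting for its #? line
    for raw in text.splitlines():
        line = raw.strip()
        match = EXPECTED.match(line)
        if match and pending is not None:
            test_id, command = pending
            tests.append((test_id, command, match.group(1),
                          STATUS[match.group(2)]))
            pending = None
            continue
        # Anything after a hash sign is a GTP comment.
        code = line.split("#", 1)[0].strip()
        match = NUMBERED.match(code)
        if match:
            pending = (int(match.group(1)), match.group(2))
    return tests


SUITE = """\
# Connecting with ko at B14 looks best.
loadsgf games/strategy25.sgf 61
90 gg_genmove black
#? [B14|D17]

loadsgf games/strategy26.sgf 257
100 gg_genmove black
#? [M16]*
"""

for test in parse_suite(SUITE):
    print(test)
```

Unnumbered commands such as @code{loadsgf} are skipped here because they set up the position but are not themselves tested.
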
@node Running the Regressions
@section Running the Regression Tests

@code{./test.sh blunder.tst} runs the tests in @file{blunder.tst} and
prints the results of the commands on numbered lines, which may look
like:

@example
1 E5
2 F9
3 O18
4 B7
5 A4
6 E4
7 E3
8 A3
9 D9
10 J9
11 B3
12 C6
13 C6
@end example

This is usually not very informative, however. More interesting is
@code{./eval.sh blunder.tst}, which also compares the results above
against the correct ones in the test file and prints a report for each
test of the form:

@example
1 failed: Correct '!E5', got 'E5'
2 failed: Correct 'C9|H9', got 'F9'
3 PASSED
4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7'
5 PASSED
6 failed: Correct 'D4', got 'E4'
7 PASSED
8 failed: Correct 'B4', got 'A3'
9 failed: Correct 'G8|G9|H8', got 'D9'
10 failed: Correct 'G9|F9|C7', got 'J9'
11 failed: Correct 'D4|E4|E5|F4|C6', got 'B3'
12 failed: Correct 'D4', got 'C6'
13 failed: Correct 'D4|E4|E5|F4', got 'C6'
@end example

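When a report gets long, it can help to count the outcomes. The Python sketch below is a hypothetical helper and is not part of GNU Go. It assumes only that each report line starts with the test number followed by one of the words @code{passed}, @code{PASSED}, @code{failed}, or @code{FAILED}, as in the output above.

```python
import re
from collections import Counter

# Test number followed by the outcome word, as printed by eval.sh.
OUTCOME = re.compile(r"^(\d+) (passed|PASSED|failed|FAILED)\b")


def tally(report):
    """Count outcomes and list the test numbers that did not pass."""
    counts = Counter()
    not_passing = []
    for line in report.splitlines():
        match = OUTCOME.match(line.strip())
        if not match:
            continue
        outcome = match.group(2)
        counts[outcome] += 1
        if outcome in ("failed", "FAILED"):
            not_passing.append(int(match.group(1)))
    return counts, not_passing


REPORT = """\
1 failed: Correct '!E5', got 'E5'
2 failed: Correct 'C9|H9', got 'F9'
3 PASSED
"""

counts, not_passing = tally(REPORT)
print(counts["failed"], counts["PASSED"], not_passing)  # 2 1 [1, 2]
```
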
The result of a test can be one of four different cases:

@itemize @bullet
@item @code{passed}: An expected pass

This is the ideal result.

@item @code{PASSED}: An unexpected pass

This is the result we hope for when we fix a bug: an old test case
that used to fail is now passing.

@item @code{failed}: An expected failure

The test failed, but this was also what we expected, unless we were
trying to fix the particular mistake highlighted by the test case.
These tests show weaknesses of the GNU Go engine and are good places to
look if you want to find an area that needs improvement.

@item @code{FAILED}: An unexpected failure

This should nominally only happen if something is broken by a
change. However, sometimes GNU Go passes a test, but for the wrong
reason or for a combination of wrong reasons. When one of these reasons
is fixed, the other one may shine through so that the test suddenly
fails. When a test case unexpectedly fails, it is necessary to make a
closer examination in order to determine whether a change has broken
something.

@end itemize

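The four cases follow from two yes/no questions. First, was the test expected to pass, meaning there is no @samp{*} after the brackets? Second, did it actually pass? A minimal Python sketch of that mapping follows; the function name @code{classify} is hypothetical.

```python
def classify(expected_pass, passed):
    """Return the report label for one test case.

    Lower case means the outcome matched the expectation; upper case
    marks a surprise that deserves a closer look.
    """
    if passed:
        return "passed" if expected_pass else "PASSED"
    return "failed" if not expected_pass else "FAILED"


for expected_pass in (True, False):
    for passed in (True, False):
        print(expected_pass, passed, classify(expected_pass, passed))
```
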
If you want a less verbose report, @code{./regress.sh . blunder.tst}
does the same thing as the previous command, but only reports unexpected
results. The example above is compressed to:

@example
3 unexpected PASS!
5 unexpected PASS!
7 unexpected PASS!
@end example

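Filtering a report down to the unexpected results takes only a few lines. The Python sketch below is a hypothetical helper that mimics this compressed report. The @samp{unexpected PASS!} wording is taken from the example above. The wording used here for unexpected failures is an assumption, so check @file{regress.sh} before relying on it.

```python
def unexpected_only(results):
    """Keep only surprising outcomes.

    results holds (test_id, expected_pass, passed) triples.
    """
    lines = []
    for test_id, expected_pass, passed in results:
        if passed and not expected_pass:
            lines.append("%d unexpected PASS!" % test_id)
        elif expected_pass and not passed:
            # Hypothetical wording; the real script may print more detail.
            lines.append("%d unexpected FAIL!" % test_id)
    return lines


# blunder.tst example: every test is expected to fail, but 3, 5 and 7 pass.
blunder = [(n, False, n in (3, 5, 7)) for n in range(1, 14)]
print("\n".join(unexpected_only(blunder)))
```
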
For convenience the tests are also available as Makefile targets. For
example, @code{make blunder} runs the tests in the blunder test suite by
executing @code{eval.sh blunder.tst}. @code{make all_batches} runs all
test suites in sequence using the @code{regress.sh} script.

@node Running regress.pike
@section Running regress.pike

A more powerful way to run regressions is with the script
@file{regress.pike}. This requires that you have Pike
(@url{http://pike.ida.liu.se}) installed.

Executing @code{./regress.pike} without arguments will run all
test suites that @code{make all_batches} would run. The difference is
that unexpected results are reported immediately when they are found
(instead of after the whole file has been run), and that statistics
of time consumption and node usage are presented for each test file and
in total.

To run a single test suite, use e.g.@: @code{./regress.pike nicklas3.tst}
or @code{./regress.pike nicklas3}. The result may look like:
@example
nicklas3   2.96    614772    3322   469
Total nodes: 614772 3322 469
Total time: 2.96 (3.22)
Total uncertainty: 0.00
@end example
The numbers here mean that the test suite took 2.96 seconds of processor
time and 3.22 seconds of real time. The consumption of reading nodes was
614772 for tactical reading, 3322 for owl reading, and 469 for
connection reading. The last line relates to the variability of the
generated moves in the test suite; 0 means that none of the moves was
decided by the randomness contribution to the move valuation. Multiple
test suites can be run with e.g.@: @code{./regress.pike owl ld_owl owl1}.

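If you want to collect these figures across runs, the lines can be picked apart mechanically. The following Python sketch rests on several assumptions. The field layout (suite name, processor seconds, then tactical, owl, and connection nodes) is inferred from the sample output and its explanation above. The names @code{SuiteStats}, @code{parse_suite_line}, and @code{parse_total_time} are hypothetical.

```python
import re
from typing import NamedTuple


class SuiteStats(NamedTuple):
    name: str
    cpu_seconds: float
    tactical_nodes: int
    owl_nodes: int
    connection_nodes: int


def parse_suite_line(line):
    """Parse a per-suite line such as 'nicklas3 2.96 614772 3322 469'."""
    name, cpu, tactical, owl, connection = line.split()
    return SuiteStats(name, float(cpu), int(tactical), int(owl),
                      int(connection))


def parse_total_time(line):
    """Return (processor seconds, real seconds) from the 'Total time' line."""
    match = re.match(r"Total time:\s*([\d.]+)\s*\(([\d.]+)\)", line)
    if match is None:
        raise ValueError("not a 'Total time' line: " + line)
    return float(match.group(1)), float(match.group(2))


stats = parse_suite_line("nicklas3 2.96 614772 3322 469")
print(stats.owl_nodes)                              # 3322
print(parse_total_time("Total time: 2.96 (3.22)"))  # (2.96, 3.22)
```
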
It is also possible to run a single test case, e.g.@:
@code{./regress.pike strategy:6}, a number of test cases, e.g.@:
@code{./regress.pike strategy:6,23,45}, a range of test cases, e.g.@:
@code{./regress.pike strategy:13-15}, or more complex combinations, e.g.@:
@code{./regress.pike strategy:6,13-15,23,45 nicklas3:602,1403}.

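The selection syntax is easy to decode. The Python sketch below turns one argument into a suite name and an explicit list of test numbers. It treats an argument without a colon as the whole suite and drops an optional @file{.tst} suffix. This is an illustration only, not the parser inside @file{regress.pike}, and the function name is hypothetical.

```python
def parse_selection(arg):
    """Split an argument like 'strategy:6,13-15,23' into (suite, tests).

    tests is None when no test numbers are given, i.e. the whole suite.
    """
    suite, _, spec = arg.partition(":")
    if suite.endswith(".tst"):
        suite = suite[:-len(".tst")]
    if not spec:
        return suite, None
    tests = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            tests.extend(range(int(first), int(last) + 1))  # inclusive range
        else:
            tests.append(int(part))
    return suite, tests


for arg in ["strategy:6,13-15,23,45", "nicklas3:602,1403", "nicklas3.tst"]:
    print(parse_selection(arg))
```
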
There are also command line options to choose which engine to run, which
options to send to the engine, to turn on verbose output, and to use a
file to specify which test cases to run. Run @code{./regress.pike --help}
for a complete and up-to-date list of options.

@node Viewing with Emacs
@section Viewing tests with Emacs

To get a quick regression view, you may use the graphical
display mode available with Emacs (@pxref{Emacs}). You will
want the cursor in the regression buffer when you enter
@command{M-x gnugo}, so that GNU Go opens in the correct
directory. A good way to be in the right directory is to
visit the file of the test you want to investigate. Then
you can cut and paste GTP commands directly from the test to
the minibuffer, using the @command{:} command from
Emacs. Although Emacs mode does not have a coordinate grid,
you can get an ASCII board with the coordinate grid using
the @command{: showboard} command.

@node HTML Views
@section HTML Regression Views

Extremely useful HTML views of the regression tests may be
produced using two Perl scripts, @file{regression/regress.pl}
and @file{regression/regress.plx}:

@enumerate
@item The driver program (@file{regress.pl}), which:
@itemize @bullet
@item Runs the regression tests, invoking GNU Go.
@item Captures the trace output, board position, pass/fail status,
SGF output, and dragon status information.
@end itemize
@item The interface to view the captured output (@file{regress.plx}), which:
@itemize @bullet
@item Never invokes GNU Go.
@item Displays the captured output in helpful formats (i.e.@: HTML).
@end itemize
@end enumerate

@subsection Setting up the HTML regression views

There are many ways of configuring Apache to permit CGI scripts; all of
them are described in the Apache documentation, which can be found at
@url{http://httpd.apache.org/docs/2.0/howto/cgi.html}.

Below you will find one example.

This documentation assumes the Apache 2.0 included in the Fedora Core
distribution, but it should be fairly close to the configuration for
other distributions.

First, you will need to configure Apache to run CGI scripts in the directory
you wish to serve the HTML views from. In @file{/etc/httpd/conf/httpd.conf}
there should be a line:

@code{DocumentRoot "/var/www/html"}

Search for a line @code{<Directory "/path/to/directory">}, where
@code{/path/to/directory} is the same as provided in @code{DocumentRoot},
then add @code{ExecCGI} to the list of @code{Options}.
The whole section should look like:

@example
<Directory "/var/www/html">
...
Options ... ExecCGI
...
</Directory>
@end example

This allows CGI scripts to be executed in the directory used by
@file{regress.plx}. Next, you need to tell Apache that @file{.plx} is a
CGI script extension. Your @file{httpd.conf} file should contain a line:

@code{AddHandler cgi-script ...}

If there isn't one already, add it; then add @file{.plx} to the list of
extensions, so that the line looks like:

@code{AddHandler cgi-script ... .plx}

You will also need to make sure you have the necessary modules loaded to run
CGI scripts; @code{mod_cgi} and @code{mod_mime} should be sufficient. Your
@file{httpd.conf} should have the relevant
@code{LoadModule cgi_module modules/mod_cgi.so} and
@code{LoadModule mime_module modules/mod_mime.so} lines; uncomment them if
necessary.

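Before restarting Apache, you may want to check that the edits above are in place. The Python sketch below is a rough, hypothetical helper and is not part of GNU Go. It scans the text of @file{httpd.conf} line by line for the four settings discussed in this section, ignoring commented-out lines. It does not understand @code{<Directory>} scoping or included configuration files, so treat its output only as a hint.

```python
import re

# (description, pattern) pairs for the settings described above.
CHECKS = [
    ("ExecCGI option", r"^Options\b.*\bExecCGI\b"),
    (".plx handler", r"^AddHandler\s+cgi-script\b.*\s\.plx\b"),
    ("mod_cgi", r"^LoadModule\s+cgi_module\s"),
    ("mod_mime", r"^LoadModule\s+mime_module\s"),
]


def missing_cgi_settings(conf_text):
    """Return the descriptions of settings not found in conf_text."""
    active = []
    for line in conf_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip comments
            active.append(line)
    return [name for name, pattern in CHECKS
            if not any(re.search(pattern, line) for line in active)]


CONF = """\
LoadModule cgi_module modules/mod_cgi.so
#LoadModule mime_module modules/mod_mime.so
<Directory "/var/www/html">
    Options Indexes FollowSymLinks ExecCGI
</Directory>
AddHandler cgi-script .cgi .plx
"""

print(missing_cgi_settings(CONF))  # ['mod_mime']
```
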
Next, you need to put a copy of @file{regress.plx} in the directory you
plan to serve the HTML views from: the @code{DocumentRoot} directory
@code{/var/www/html} or one of its subdirectories.

You will also need to install the Perl module GD
(@url{http://search.cpan.org/dist/GD/}), available from CPAN.

Finally, run @file{regression/regress.pl} to create the XML data used to
generate the HTML views (to do all regression tests, run
@file{regression/regress.pl -a 1}); then copy the @file{html/} directory to
the same directory that @file{regress.plx} resides in.

At this point, you should have a working copy of the HTML regression views.

Additional notes for Debian users: The Perl GD module can be installed
with @code{apt-get install libgd-perl}. It may suffice to add this to
the apache2 configuration:

@example
<Directory "/var/www/regression">
  Options +ExecCGI
  AddHandler cgi-script .plx
  RedirectMatch ^/regression$ /regression/regress.plx
</Directory>
@end example

and then make a link from @file{/var/www/regression} to the GNU Go
regression directory. The @code{RedirectMatch} statement is only
needed to set up a shorter entry URL.