| 1 | The standard purpose of regression testing is to avoid getting the same |
| 2 | bug twice. When a bug is found, the programmer fixes the bug and adds a |
| 3 | test to the test suite. The test should fail before the fix and pass |
| 4 | after the fix. When a new version is about to be released, all the tests |
| 5 | in the regression test suite are run and if an old bug reappears, this |
| 6 | will be seen quickly since the appropriate test will fail. |
| 7 | |
| 8 | The regression testing in GNU Go is slightly different. A typical test |
| 9 | case involves specifying a position and asking the engine what move it |
| 10 | would make. This is compared to one or more correct moves to decide |
| 11 | whether the test case passes or fails. Each test case also records |
| 12 | whether it is expected to pass or fail, and deviations from this status |
| 13 | show whether a change has solved some problem and/or broken something |
| 14 | else. Thus the regression tests include both positions highlighting some |
| 15 | mistake made by the engine, which are waiting to be fixed, and |
| 16 | positions where the engine does the right thing, where we want to detect |
| 17 | if a change breaks something. |
| 18 | |
| 19 | @menu |
| 20 | * Regression Testing:: Regression Testing in GNU Go |
| 21 | * Test Suites:: Test Suites |
| 22 | * Running the Regressions:: Running the Regression Tests |
| 23 | * Running regress.pike:: Running regress.pike |
| 24 | * Viewing with Emacs:: Viewing tests with Emacs |
| 25 | * HTML Views:: HTML Views |
| 26 | @end menu |
| 27 | |
| 28 | @node Regression Testing |
| 29 | @section Regression testing in GNU Go |
| 30 | |
| 31 | Regression testing is performed by the files in the @file{regression/} |
| 32 | directory. The tests are specified as GTP commands in files with the |
| 33 | suffix @file{.tst}, with corresponding correct results and expected |
| 34 | pass/fail status encoded in GTP comments following the test. To run a |
| 35 | test suite the shell scripts @file{test.sh}, @file{eval.sh}, and |
| 36 | @file{regress.sh} can be used. There are also Makefile targets to do |
| 37 | this. If you run @command{make all_batches}, most of the tests are run. The |
| 38 | Pike script @file{regress.pike} can also be used to run all tests or a |
| 39 | subset of the tests. |
| 40 | |
| 41 | Game records used by the regression tests are stored in the |
| 42 | directory @file{regression/games/} and its subdirectories. |
| 43 | |
| 44 | @node Test Suites |
| 45 | @section Test suites |
| 46 | |
| 47 | The regression tests are grouped into suites and stored in files as GTP |
| 48 | commands. A part of a test suite can look as follows: |
| 49 | @example |
| 50 | @group |
| 51 | # Connecting with ko at B14 looks best. Cutting at D17 might be |
| 52 | # considered. B17 (game move) is inferior. |
| 53 | loadsgf games/strategy25.sgf 61 |
| 54 | 90 gg_genmove black |
| 55 | #? [B14|D17] |
| 56 | |
| 57 | # The game move at P13 is a suicidal blunder. |
| 58 | loadsgf games/strategy25.sgf 249 |
| 59 | 95 gg_genmove black |
| 60 | #? [!P13] |
| 61 | |
| 62 | loadsgf games/strategy26.sgf 257 |
| 63 | 100 gg_genmove black |
| 64 | #? [M16]* |
| 65 | @end group |
| 66 | @end example |
| 67 | |
| 68 | Lines starting with a hash sign, or in general anything following a hash |
| 69 | sign, are interpreted as comments by the GTP mode and thus ignored by |
| 70 | the engine. GTP commands are executed in the order they appear, but only |
| 71 | those on numbered lines are used for testing. The comment lines starting |
| 72 | with @code{#?} are magical to the regression testing scripts and |
| 73 | indicate correct results and expected pass/fail status. The string |
| 74 | within brackets is matched as a regular expression against the response |
| 75 | from the previous numbered GTP command. A particularly useful feature of |
| 76 | regular expressions is that by using @samp{|} it is possible to specify |
| 77 | alternatives. Thus @code{B14|D17} above means that if either @code{B14} |
| 78 | or @code{D17} is the move generated in test case 90, it passes. There is |
| 79 | one important special case to be aware of. If the correct result string |
| 80 | starts with an exclamation mark, this is excluded from the regular |
| 81 | expression but afterwards the result of the matching is negated. Thus |
| 82 | @code{!P13} in test case 95 means that any move except @code{P13} is |
| 83 | accepted as a correct result. |
| 84 | |
| 85 | In test case 100, the brackets on the @code{#?} line are followed by an |
| 86 | asterisk. This means that the test is expected to fail. If there is no |
| 87 | asterisk, the test is expected to pass. The brackets may also be |
| 88 | followed by a @samp{&}, meaning that the result is ignored. This is |
| 89 | primarily used to report statistics, e.g. how many tactical reading |
| 90 | nodes were spent while running the test suite. |
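The rules above can be illustrated with a minimal Python sketch. This is not the code used by the regression scripts; the function names @code{parse_expectation} and @code{is_correct} are hypothetical, and the sketch assumes that the regular expression must match the whole response rather than just a part of it.

```python
import re

# Matches a "#? [pattern]" line with an optional trailing "*" or "&".
EXPECT_LINE = re.compile(r"^#\?\s*\[(.*)\]\s*([*&]?)\s*$")


def parse_expectation(line):
    """Split a '#?' line into (pattern, negate, flag).

    flag is "*" (expected to fail), "&" (result ignored) or ""
    (expected to pass).
    """
    m = EXPECT_LINE.match(line)
    if m is None:
        raise ValueError("not a #? line: %r" % line)
    pattern, flag = m.groups()
    # A leading "!" is not part of the regular expression; it negates
    # the outcome of the match instead.
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    return pattern, negate, flag


def is_correct(response, pattern, negate):
    """Return True if the engine response counts as a correct result.

    Assumption: the pattern has to match the entire response.
    """
    matched = re.fullmatch(pattern, response.strip()) is not None
    return matched != negate
```

With these definitions, the response @code{D17} is correct for test case 90, and any response other than @code{P13} is correct for test case 95.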
| 91 | |
| 92 | @node Running the Regressions |
| 93 | @section Running the Regression Tests |
| 94 | |
| 95 | @code{./test.sh blunder.tst} runs the tests in @file{blunder.tst} and |
| 96 | prints the results of the commands on numbered lines, which may look |
| 97 | like: |
| 98 | |
| 99 | @example |
| 100 | 1 E5 |
| 101 | 2 F9 |
| 102 | 3 O18 |
| 103 | 4 B7 |
| 104 | 5 A4 |
| 105 | 6 E4 |
| 106 | 7 E3 |
| 107 | 8 A3 |
| 108 | 9 D9 |
| 109 | 10 J9 |
| 110 | 11 B3 |
| 111 | 12 C6 |
| 112 | 13 C6 |
| 113 | @end example |
| 114 | |
| 115 | This is usually not very informative, however. More interesting is |
| 116 | @code{./eval.sh blunder.tst} which also compares the results above |
| 117 | against the correct ones in the test file and prints a report for each |
| 118 | test in the form: |
| 119 | |
| 120 | @example |
| 121 | 1 failed: Correct '!E5', got 'E5' |
| 122 | 2 failed: Correct 'C9|H9', got 'F9' |
| 123 | 3 PASSED |
| 124 | 4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7' |
| 125 | 5 PASSED |
| 126 | 6 failed: Correct 'D4', got 'E4' |
| 127 | 7 PASSED |
| 128 | 8 failed: Correct 'B4', got 'A3' |
| 129 | 9 failed: Correct 'G8|G9|H8', got 'D9' |
| 130 | 10 failed: Correct 'G9|F9|C7', got 'J9' |
| 131 | 11 failed: Correct 'D4|E4|E5|F4|C6', got 'B3' |
| 132 | 12 failed: Correct 'D4', got 'C6' |
| 133 | 13 failed: Correct 'D4|E4|E5|F4', got 'C6' |
| 134 | @end example |
| 135 | |
| 136 | The result of a test can be one of four different cases: |
| 137 | |
| 138 | @itemize @bullet |
| 139 | @item @code{passed}: An expected pass |
| 140 | |
| 141 | This is the ideal result. |
| 142 | |
| 143 | @item @code{PASSED}: An unexpected pass |
| 144 | |
| 145 | This is a result that we are hoping for when we fix a bug. An old test |
| 146 | case that used to fail is now passing. |
| 147 | |
| 148 | @item @code{failed}: An expected failure |
| 149 | |
| 150 | The test failed but this was also what we expected, unless we were |
| 151 | trying to fix the particular mistake highlighted by the test case. |
| 152 | These tests show weaknesses of the GNU Go engine and are good places to |
| 153 | search if you want to detect an area which needs improvement. |
| 154 | |
| 155 | @item @code{FAILED}: An unexpected failure |
| 156 | |
| 157 | This should nominally only happen if something is broken by a |
| 158 | change. However, sometimes GNU Go passes a test, but for the wrong |
| 159 | reason or for a combination of wrong reasons. When one of these reasons |
| 160 | is fixed, the other one may shine through so that the test suddenly |
| 161 | fails. When a test case unexpectedly fails, it is necessary to make a |
| 162 | closer examination in order to determine whether a change has broken |
| 163 | something. |
| 164 | |
| 165 | @end itemize |
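The four cases can be summarized as a small Python sketch. The function name @code{classify} is hypothetical and is not taken from the regression scripts; it only restates the table above, combining whether the result was correct with whether the test was expected to pass.

```python
def classify(correct, expected_to_pass):
    """Map a test outcome to one of the four report labels.

    Lower case means the outcome was expected, upper case means it
    was unexpected.
    """
    if correct:
        return "passed" if expected_to_pass else "PASSED"
    return "FAILED" if expected_to_pass else "failed"
```

For example, a test marked with an asterisk that now produces a correct move is reported as @code{PASSED}.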
| 166 | |
| 167 | If you want a less verbose report, @code{./regress.sh . blunder.tst} |
| 168 | does the same thing as the previous command, but only reports unexpected |
| 169 | results. The example above is compressed to |
| 170 | |
| 171 | @example |
| 172 | 3 unexpected PASS! |
| 173 | 5 unexpected PASS! |
| 174 | 7 unexpected PASS! |
| 175 | @end example |
| 176 | |
| 177 | For convenience the tests are also available as makefile targets. For |
| 178 | example, @code{make blunder} runs the tests in the blunder test suite by |
| 179 | executing @code{eval.sh blunder.tst}. @code{make all_batches} runs all |
| 180 | test suites in a sequence using the @code{regress.sh} script. |
| 181 | |
| 182 | @node Running regress.pike |
| 183 | @section Running regress.pike |
| 184 | |
| 185 | A more powerful way to run regressions is with the script |
| 186 | @file{regress.pike}. This requires that you have Pike |
| 187 | (@url{http://pike.ida.liu.se}) installed. |
| 188 | |
| 189 | Executing @code{./regress.pike} without arguments will run all |
| 190 | test suites that @code{make all_batches} would run. The difference is |
| 191 | that unexpected results are reported as soon as they are |
| 192 | found (instead of after the whole file has been run) and that statistics |
| 193 | on time consumption and node usage are presented for each test file and |
| 194 | in total. |
| 195 | |
| 196 | To run a single test suite do e.g. @code{./regress.pike nicklas3.tst} or |
| 197 | @code{./regress.pike nicklas3}. The result may look like: |
| 198 | @example |
| 199 | nicklas3 2.96 614772 3322 469 |
| 200 | Total nodes: 614772 3322 469 |
| 201 | Total time: 2.96 (3.22) |
| 202 | Total uncertainty: 0.00 |
| 203 | @end example |
| 204 | The numbers here mean that the test suite took 2.96 seconds of processor |
| 205 | time and 3.22 seconds of real time. The consumption of reading nodes was |
| 206 | 614772 for tactical reading, 3322 for owl reading, and 469 for |
| 207 | connection reading. The last line relates to the variability of the |
| 208 | generated moves in the test suite, and 0 means that none was decided by |
| 209 | the randomness contribution to the move valuation. Multiple testsuites |
| 210 | can be run by e.g. @code{./regress.pike owl ld_owl owl1}. |
| 211 | |
| 212 | It is also possible to run a single testcase, e.g. @code{./regress.pike |
| 213 | strategy:6}, a number of testcases, e.g. @code{./regress.pike |
| 214 | strategy:6,23,45}, a range of testcases, e.g. @code{./regress.pike |
| 215 | strategy:13-15} or more complex combinations e.g. @code{./regress.pike |
| 216 | strategy:6,13-15,23,45 nicklas3:602,1403}. |
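To make the argument syntax concrete, here is a minimal Python sketch of how such a specification could be expanded into a suite name and a list of test numbers. It is an illustration only and does not reflect how @file{regress.pike} is actually implemented; the function name @code{parse_spec} is hypothetical.

```python
def parse_spec(spec):
    """Expand e.g. 'strategy:6,13-15' into ('strategy', [6, 13, 14, 15]).

    Returns (suite, None) when no test numbers are given, meaning the
    whole suite.
    """
    suite, sep, cases = spec.partition(":")
    # Both 'nicklas3.tst' and 'nicklas3' name the same suite.
    if suite.endswith(".tst"):
        suite = suite[:-len(".tst")]
    if not sep:
        return suite, None
    numbers = []
    for part in cases.split(","):
        low, dash, high = part.partition("-")
        if dash:
            # An inclusive range such as 13-15.
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(low))
    return suite, numbers
```

Each command line argument would be handled separately, so @code{strategy:6,13-15,23,45 nicklas3:602,1403} yields one list of test numbers per suite.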
| 217 | |
| 218 | There are also command line options to choose what engine to run, what |
| 219 | options to send to the engine, to turn on verbose output, and to use a |
| 220 | file to specify which testcases to run. Run @code{./regress.pike --help} |
| 221 | for a complete and up to date list of options. |
| 222 | |
| 223 | @node Viewing with Emacs |
| 224 | @section Viewing tests with Emacs |
| 225 | |
| 226 | To get a quick regression view, you may use the graphical |
| 227 | display mode available with Emacs (@pxref{Emacs}). You will |
| 228 | want the cursor in the regression buffer when you enter |
| 229 | @command{M-x gnugo}, so that GNU Go opens in the correct |
| 230 | directory. A good way to be in the right directory is to |
| 231 | open the window of the test you want to investigate. Then |
| 232 | you can cut and paste GTP commands directly from the test to |
| 233 | the minibuffer, using the @command{:} command from |
| 234 | Emacs. Although Emacs mode does not have a coordinate grid, |
| 235 | you may get an ASCII board with the coordinate grid using the |
| 236 | @command{: showboard} command. |
| 237 | |
| 238 | @node HTML Views |
| 239 | @section HTML Regression Views |
| 240 | |
| 241 | Extremely useful HTML Views of the regression tests may be |
| 242 | produced using two Perl scripts, @file{regression/regress.pl} |
| 243 | and @file{regression/regress.plx}. |
| 244 | |
| 245 | @enumerate |
| 246 | @item The driver program (regress.pl) which: |
| 247 | @itemize @bullet |
| 248 | @item Runs the regression tests, invoking GNU Go. |
| 249 | @item Captures the trace output, board position, pass/fail status, |
| 250 | SGF output, and dragon status information. |
| 251 | @end itemize |
| 252 | @item The interface to view the captured output (regress.plx) which: |
| 253 | @itemize @bullet |
| 254 | @item Never invokes GNU Go. |
| 255 | @item Displays the captured output in helpful formats (i.e. HTML). |
| 256 | @end itemize |
| 257 | @end enumerate |
| 258 | |
| 259 | @subsection Setting up the HTML regression Views |
| 260 | |
| 261 | There are many ways of configuring Apache to permit CGI scripts; all of them are |
| 262 | covered in the Apache documentation, which can be found at |
| 263 | @url{http://httpd.apache.org/docs/2.0/howto/cgi.html} |
| 264 | |
| 265 | Below you will find one example. |
| 266 | |
| 267 | This documentation assumes the Apache 2.0 included in the Fedora Core distribution, |
| 268 | but it should be fairly close to the configuration for other distributions. |
| 269 | |
| 270 | First, you will need to configure Apache to run CGI scripts in the directory |
| 271 | you wish to serve the html views from. In @file{/etc/httpd/conf/httpd.conf} |
| 272 | there should be a line: |
| 273 | |
| 274 | @code{DocumentRoot "/var/www/html"} |
| 275 | |
| 276 | Search for a line @code{<Directory "/path/to/directory">}, where |
| 277 | @code{/path/to/directory} is the same as provided in @code{DocumentRoot}, |
| 278 | then add @code{ExecCGI} to the list of @code{Options}. |
| 279 | The whole section should look like: |
| 280 | |
| 281 | @example |
| 282 | <Directory "/var/www/html"> |
| 283 | ... |
| 284 | Options ... ExecCGI |
| 285 | ... |
| 286 | </Directory> |
| 287 | @end example |
| 288 | |
| 289 | This allows CGI scripts to be executed in the directory used by regress.plx. |
| 290 | Next, you need to tell Apache that @file{.plx} is a CGI script ending. Your |
| 291 | @file{httpd.conf} file should contain a line: |
| 292 | |
| 293 | @code{AddHandler cgi-script ...} |
| 294 | |
| 295 | If there is no such line, add it; then add @file{.plx} to the list of extensions, |
| 296 | so that the line looks like: |
| 297 | |
| 298 | @code{AddHandler cgi-script ... .plx} |
| 299 | |
| 300 | You will also need to make sure you have the necessary modules loaded to run |
| 301 | CGI scripts; mod_cgi and mod_mime should be sufficient. Your @file{httpd.conf} |
| 302 | should have the relevant @code{LoadModule cgi_module modules/mod_cgi.so} and |
| 303 | @code{LoadModule mime_module modules/mod_mime.so} lines; uncomment them if |
| 304 | necessary. |
| 305 | |
| 306 | Next, you need to put a copy of @file{regress.plx} in the @code{DocumentRoot} |
| 307 | directory @code{/var/www/html} or one of its subdirectories where you plan to serve the |
| 308 | html views from. |
| 309 | |
| 310 | You will also need to install the Perl module GD |
| 311 | (@url{http://search.cpan.org/dist/GD/}), available from CPAN. |
| 312 | |
| 313 | Finally, run @file{regression/regress.pl} to create the xml data used to |
| 314 | generate the html views (to do all regression tests run |
| 315 | @file{regression/regress.pl -a 1}); then, copy the @file{html/} directory to |
| 316 | the same directory as @file{regress.plx} resides in. |
| 317 | |
| 318 | At this point, you should have a working copy of the html regression views. |
| 319 | |
| 320 | Additional notes for Debian users: The Perl GD module can be installed |
| 321 | by @code{apt-get install libgd-perl}. It may suffice to add this to |
| 322 | the apache2 configuration: |
| 323 | |
| 324 | @example |
| 325 | <Directory "/var/www/regression"> |
| 326 | Options +ExecCGI |
| 327 | AddHandler cgi-script .plx |
| 328 | RedirectMatch ^/regression$ /regression/regress.plx |
| 329 | </Directory> |
| 330 | @end example |
| 331 | |
| 332 | and then make a link from @file{/var/www/regression} to the GNU Go |
| 333 | regression directory. The @code{RedirectMatch} statement is only |
| 334 | needed to set up a shorter entry URL. |