The standard purpose of regression testing is to avoid getting the same
bug twice. When a bug is found, the programmer fixes the bug and adds a
test to the test suite. The test should fail before the fix and pass
after the fix. When a new version is about to be released, all the tests
in the regression test suite are run and if an old bug reappears, this
will be seen quickly since the appropriate test will fail.

The regression testing in GNU Go is slightly different. A typical test
case involves specifying a position and asking the engine what move it
would make. This is compared to one or more correct moves to decide
whether the test case passes or fails. Each test case also records
whether it is expected to pass or fail, and deviations from this
expected status show whether a change has solved some problem and/or
broken something else. Thus the regression tests include both positions
highlighting some mistake made by the engine, which are waiting to be
fixed, and positions where the engine does the right thing, where we
want to detect if a change breaks something.

@menu
* Regression Testing:: Regression Testing in GNU Go
* Test Suites:: Test Suites
* Running the Regressions:: Running the Regression Tests
* Running regress.pike:: Running regress.pike
* Viewing with Emacs:: Viewing tests with Emacs
* HTML Views:: HTML Views
@end menu

@node Regression Testing
@section Regression testing in GNU Go

Regression testing is performed by the files in the @file{regression/}
directory. The tests are specified as GTP commands in files with the
suffix @file{.tst}, with corresponding correct results and expected
pass/fail status encoded in GTP comments following the test. To run a
test suite the shell scripts @file{test.sh}, @file{eval.sh}, and
@file{regress.sh} can be used. There are also Makefile targets to do
this. If you run @command{make all_batches}, most of the tests are run.
The Pike script @file{regress.pike} can also be used to run all tests or
a subset of the tests.

Game records used by the regression tests are stored in the
directory @file{regression/games/} and its subdirectories.

@node Test Suites
@section Test suites

The regression tests are grouped into suites and stored in files as GTP
commands. A part of a test suite can look as follows:
@example
@group
# Connecting with ko at B14 looks best. Cutting at D17 might be
# considered. B17 (game move) is inferior.
loadsgf games/strategy25.sgf 61
90 gg_genmove black
#? [B14|D17]

# The game move at P13 is a suicidal blunder.
loadsgf games/strategy25.sgf 249
95 gg_genmove black
#? [!P13]

loadsgf games/strategy26.sgf 257
100 gg_genmove black
#? [M16]*
@end group
@end example

Lines starting with a hash sign, or in general anything following a hash
sign, are interpreted as comments by the GTP mode and thus ignored by
the engine. GTP commands are executed in the order they appear, but only
those on numbered lines are used for testing. The comment lines starting
with @code{#?} are magical to the regression testing scripts and
indicate correct results and expected pass/fail status. The string
within brackets is matched as a regular expression against the response
from the previous numbered GTP command. A particularly useful feature of
regular expressions is that by using @samp{|} it is possible to specify
alternatives. Thus @code{B14|D17} above means that if either @code{B14}
or @code{D17} is the move generated in test case 90, it passes. There is
one important special case to be aware of. If the correct result string
starts with an exclamation mark, the exclamation mark is excluded from
the regular expression and the result of the match is negated. Thus
@code{!P13} in test case 95 means that any move except @code{P13} is
accepted as a correct result.
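
The matching rule can be sketched in a few lines of Python. This is only
an illustration, not the code used by the regression scripts: the
function name @code{result_is_correct} is made up for the example, and
it assumes that the whole response has to match the pattern, which may
differ in detail from what the scripts do.

```python
import re


def result_is_correct(pattern, response):
    """Return True if response satisfies the bracketed pattern of a #? line."""
    # A leading "!" is not part of the regular expression; it negates the match.
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    # Assumption: the whole response must match, so "B1" does not satisfy "B14".
    matched = re.fullmatch("(?:" + pattern + ")", response) is not None
    # Flip the outcome when the pattern was negated.
    return matched != negate


print(result_is_correct("B14|D17", "D17"))  # True
print(result_is_correct("!P13", "P13"))     # False
```

With the patterns from the example above, test case 90 is satisfied by
either @code{B14} or @code{D17}, and test case 95 by any response other
than @code{P13}.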

In test case 100, the brackets on the @code{#?} line are followed by an
asterisk. This means that the test is expected to fail. If there is no
asterisk, the test is expected to pass. The brackets may also be
followed by a @samp{&}, meaning that the result is ignored. This is
primarily used to report statistics, e.g. how many tactical reading
nodes were spent while running the test suite.
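
A @code{#?} line can be taken apart with a single regular expression.
The sketch below is a hypothetical helper written for this manual, not
part of the regression scripts. It assumes that nothing but an optional
@samp{*} or @samp{&} follows the closing bracket.

```python
import re

# Pattern in brackets, then an optional "*" (expected failure) or "&" (ignored).
CHECK_LINE = re.compile(r"^#\?\s*\[(.*)\]\s*([*&]?)\s*$")


def parse_check_line(line):
    """Return (pattern, expectation) for a #? line, or None for other lines."""
    found = CHECK_LINE.match(line)
    if found is None:
        return None
    pattern, marker = found.groups()
    if marker == "*":
        expectation = "fail"
    elif marker == "&":
        expectation = "ignore"
    else:
        expectation = "pass"
    return pattern, expectation


print(parse_check_line("#? [M16]*"))      # ('M16', 'fail')
print(parse_check_line("#? [B14|D17]"))   # ('B14|D17', 'pass')
```

Ordinary comment lines do not start with @code{#?}, so the helper
returns @code{None} for them.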

@node Running the Regressions
@section Running the Regression Tests

@code{./test.sh blunder.tst} runs the tests in @file{blunder.tst} and
prints the results of the commands on numbered lines, which may look
like:

@example
1 E5
2 F9
3 O18
4 B7
5 A4
6 E4
7 E3
8 A3
9 D9
10 J9
11 B3
12 C6
13 C6
@end example

This is usually not very informative, however. More interesting is
@code{./eval.sh blunder.tst}, which also compares the results above
against the correct ones in the test file and prints a report for each
test of the form:

@example
1 failed: Correct '!E5', got 'E5'
2 failed: Correct 'C9|H9', got 'F9'
3 PASSED
4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7'
5 PASSED
6 failed: Correct 'D4', got 'E4'
7 PASSED
8 failed: Correct 'B4', got 'A3'
9 failed: Correct 'G8|G9|H8', got 'D9'
10 failed: Correct 'G9|F9|C7', got 'J9'
11 failed: Correct 'D4|E4|E5|F4|C6', got 'B3'
12 failed: Correct 'D4', got 'C6'
13 failed: Correct 'D4|E4|E5|F4', got 'C6'
@end example

The result of a test can be one of four different cases:

@itemize @bullet
@item @code{passed}: An expected pass

This is the ideal result.

@item @code{PASSED}: An unexpected pass

This is a result that we are hoping for when we fix a bug. An old test
case that used to fail is now passing.

@item @code{failed}: An expected failure

The test failed but this was also what we expected, unless we were
trying to fix the particular mistake highlighted by the test case.
These tests show weaknesses of the GNU Go engine and are good places to
search if you want to detect an area which needs improvement.

@item @code{FAILED}: An unexpected failure

This should nominally only happen if something is broken by a
change. However, sometimes GNU Go passes a test, but for the wrong
reason or for a combination of wrong reasons. When one of these reasons
is fixed, the other one may shine through so that the test suddenly
fails. When a test case unexpectedly fails, it is necessary to make a
closer examination in order to determine whether a change has broken
something.

@end itemize
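
The four cases follow from two yes/no facts: whether the test was
expected to pass and whether it actually passed. A minimal Python
sketch of that mapping, with a hypothetical function name that does not
appear in the regression scripts:

```python
def report_status(expected_to_pass, did_pass):
    """Map (expectation, outcome) to the status word used in the report."""
    if did_pass:
        # Lower case: as expected. Upper case: a failing test now passes.
        return "passed" if expected_to_pass else "PASSED"
    # Upper case: a passing test broke. Lower case: a known weakness.
    return "FAILED" if expected_to_pass else "failed"


for expected in (True, False):
    for outcome in (True, False):
        print(expected, outcome, report_status(expected, outcome))
```

Upper case always marks the surprising outcome, which is why those are
the lines worth reading first.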

If you want a less verbose report, @code{./regress.sh . blunder.tst}
does the same thing as the previous command, but only reports unexpected
results. The example above is compressed to

@example
3 unexpected PASS!
5 unexpected PASS!
7 unexpected PASS!
@end example
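
The same reduction can be applied by hand to a saved @file{eval.sh}
report. The Python sketch below is an illustration only and says nothing
about how @file{regress.sh} is implemented; the function name is
hypothetical, and it assumes report lines of the form shown earlier,
with the test number first and the status word second.

```python
def unexpected_tests(report_lines):
    """Return the numbers of tests whose status word is upper case."""
    unexpected = []
    for line in report_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        # "failed:" carries a trailing colon, "PASSED" does not.
        number, status = fields[0], fields[1].rstrip(":")
        if status in ("PASSED", "FAILED"):
            unexpected.append(int(number))
    return unexpected


report = [
    "1 failed: Correct '!E5', got 'E5'",
    "2 failed: Correct 'C9|H9', got 'F9'",
    "3 PASSED",
    "4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7'",
    "5 PASSED",
]
print(unexpected_tests(report))  # [3, 5]
```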

For convenience the tests are also available as Makefile targets. For
example, @code{make blunder} runs the tests in the blunder test suite by
executing @code{eval.sh blunder.tst}. @code{make all_batches} runs all
test suites in sequence using the @code{regress.sh} script.

@node Running regress.pike
@section Running regress.pike

A more powerful way to run regressions is with the script
@file{regress.pike}. This requires that you have Pike
(@url{http://pike.ida.liu.se}) installed.

Executing @code{./regress.pike} without arguments will run all
test suites that @code{make all_batches} would run. The difference is
that unexpected results are reported immediately when they have been
found (instead of after the whole file has been run) and that statistics
of time consumption and node usage are presented for each test file and
in total.

To run a single test suite do e.g. @code{./regress.pike nicklas3.tst} or
@code{./regress.pike nicklas3}. The result may look like:
@example
nicklas3 2.96 614772 3322 469
Total nodes: 614772 3322 469
Total time: 2.96 (3.22)
Total uncertainty: 0.00
@end example
The numbers here mean that the test suite took 2.96 seconds of processor
time and 3.22 seconds of real time. The consumption of reading nodes was
614772 for tactical reading, 3322 for owl reading, and 469 for
connection reading. The last line relates to the variability of the
generated moves in the test suite, and 0 means that none of the moves
was decided by the random contribution to the move valuation. Multiple
test suites can be run by e.g. @code{./regress.pike owl ld_owl owl1}.
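
If the per-suite lines are to be collected for later comparison, they
can be split into named fields. The Python sketch below assumes the
whitespace-separated column order described above (suite, processor
time, tactical, owl, and connection nodes). The function and field names
are invented for the example.

```python
def parse_summary_line(line):
    """Split one per-suite line into named fields (whitespace separated)."""
    name, cpu, tactical, owl, connection = line.split()
    return dict(
        suite=name,
        cpu_seconds=float(cpu),
        tactical_nodes=int(tactical),
        owl_nodes=int(owl),
        connection_nodes=int(connection),
    )


stats = parse_summary_line("nicklas3 2.96 614772 3322 469")
print(stats["suite"], stats["cpu_seconds"], stats["tactical_nodes"])
```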

It is also possible to run a single test case, e.g. @code{./regress.pike
strategy:6}, a number of test cases, e.g. @code{./regress.pike
strategy:6,23,45}, a range of test cases, e.g. @code{./regress.pike
strategy:13-15}, or more complex combinations, e.g. @code{./regress.pike
strategy:6,13-15,23,45 nicklas3:602,1403}.
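
The selection syntax is easy to read once it is expanded. The Python
sketch below shows one way to interpret a single argument. It is not
taken from @file{regress.pike}, the function name is hypothetical, and
it assumes that ranges are inclusive and that a missing colon selects
the whole suite.

```python
def parse_selection(spec):
    """Expand 'suite:6,13-15' into ('suite', [6, 13, 14, 15])."""
    suite, colon, cases = spec.partition(":")
    # The suite may be given with or without the .tst suffix.
    if suite.endswith(".tst"):
        suite = suite[:-4]
    if not colon:
        # No test case list: the whole suite is selected.
        return suite, None
    numbers = []
    for part in cases.split(","):
        first, dash, last = part.partition("-")
        if dash:
            # Assumption: both ends of a range are included.
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(first))
    return suite, numbers


print(parse_selection("strategy:6,13-15,23,45"))
# ('strategy', [6, 13, 14, 15, 23, 45])
```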

There are also command line options to choose what engine to run, what
options to send to the engine, to turn on verbose output, and to use a
file to specify which test cases to run. Run @code{./regress.pike --help}
for a complete and up-to-date list of options.

@node Viewing with Emacs
@section Viewing tests with Emacs

To get a quick regression view, you may use the graphical
display mode available with Emacs (@pxref{Emacs}). You will
want the cursor in the regression buffer when you enter
@command{M-x gnugo}, so that GNU Go opens in the correct
directory. A good way to be in the right directory is to
open the window of the test you want to investigate. Then
you can cut and paste GTP commands directly from the test to
the minibuffer, using the @command{:} command from
Emacs. Although Emacs mode does not have a coordinate grid,
you may get an ASCII board with the coordinate grid using
the @command{: showboard} command.

@node HTML Views
@section HTML Regression Views

Extremely useful HTML views of the regression tests may be
produced using two Perl scripts, @file{regression/regress.pl}
and @file{regression/regress.plx}.

@enumerate
@item The driver program (@file{regress.pl}), which:
@itemize @bullet
@item Runs the regression tests, invoking GNU Go.
@item Captures the trace output, board position, pass/fail status,
SGF output, and dragon status information.
@end itemize
@item The interface to view the captured output (@file{regress.plx}), which:
@itemize @bullet
@item Never invokes GNU Go.
@item Displays the captured output in helpful formats (i.e. HTML).
@end itemize
@end enumerate

@subsection Setting up the HTML regression Views

There are many ways of configuring Apache to permit CGI scripts. All of
them are described in the Apache documentation, which can be found at
@url{http://httpd.apache.org/docs/2.0/howto/cgi.html}

Below you will find one example.

This documentation assumes the Apache 2.0 included in the Fedora Core
distribution, but it should be fairly close to the configuration for
other distributions.

First, you will need to configure Apache to run CGI scripts in the directory
you wish to serve the HTML views from. In @file{/etc/httpd/conf/httpd.conf}
there should be a line:

@code{DocumentRoot "/var/www/html"}

Search for a line @code{<Directory "/path/to/directory">}, where
@code{/path/to/directory} is the same as provided in @code{DocumentRoot},
then add @code{ExecCGI} to the list of @code{Options}.
The whole section should look like:

@example
<Directory "/var/www/html">
...
 Options ... ExecCGI
...
</Directory>
@end example

This allows CGI scripts to be executed in the directory used by
@file{regress.plx}. Next, you need to tell Apache that @file{.plx} is a
CGI script ending. Your @file{httpd.conf} file should contain a line:

@code{AddHandler cgi-script ...}

If there is no such line, add one; then add @file{.plx} to the list of
extensions, so that the line looks like:

@code{AddHandler cgi-script ... .plx}

You will also need to make sure you have the necessary modules loaded to run
CGI scripts; @code{mod_cgi} and @code{mod_mime} should be sufficient. Your
@file{httpd.conf} should have the relevant
@code{LoadModule cgi_module modules/mod_cgi.so} and
@code{LoadModule mime_module modules/mod_mime.so} lines; uncomment them if
necessary.

Next, you need to put a copy of @file{regress.plx} in the @code{DocumentRoot}
directory @code{/var/www/html}, or in one of its subdirectories, depending on
where you plan to serve the HTML views from.

You will also need to install the Perl module GD
(@url{http://search.cpan.org/dist/GD/}), available from CPAN.

Finally, run @file{regression/regress.pl} to create the XML data used to
generate the HTML views (to do all regression tests run
@file{regression/regress.pl -a 1}); then, copy the @file{html/} directory to
the same directory as @file{regress.plx} resides in.

At this point, you should have a working copy of the HTML regression views.

Additional notes for Debian users: The Perl GD module can be installed
by @code{apt-get install libgd-perl}. It may suffice to add this to
the apache2 configuration:

@example
<Directory "/var/www/regression">
 Options +ExecCGI
 AddHandler cgi-script .plx
 RedirectMatch ^/regression$ /regression/regress.plx
</Directory>
@end example

and then make a link from @file{/var/www/regression} to the GNU Go
regression directory. The @code{RedirectMatch} statement is only
needed to set up a shorter entry URL.