The standard purpose of regression testing is to avoid getting the same
bug twice. When a bug is found, the programmer fixes the bug and adds a
test to the test suite. The test should fail before the fix and pass
after the fix. When a new version is about to be released, all the tests
in the regression test suite are run and if an old bug reappears, this
will be seen quickly since the appropriate test will fail.

Regression testing in GNU Go is slightly different. A typical test
case specifies a position and asks the engine what move it
would make. The answer is compared to one or more correct moves to decide
whether the test case passes or fails. Each test case also records whether
it is expected to pass or fail, and deviations from this status indicate
whether a change has solved some problem and/or broken something
else. Thus the regression tests include both positions highlighting some
mistake made by the engine, which are waiting to be fixed, and
positions where the engine does the right thing, where we want to detect
if a change breaks something.

@menu
* Regression Testing::          Regression Testing in GNU Go
* Test Suites::                 Test Suites
* Running the Regressions::     Running the Regression Tests
* Running regress.pike::        Running regress.pike
* Viewing with Emacs::          Viewing tests with Emacs
* HTML Views::                  HTML Views
@end menu

@node Regression Testing
@section Regression testing in GNU Go

Regression testing is performed by the files in the @file{regression/}
directory. The tests are specified as GTP commands in files with the
suffix @file{.tst}, with corresponding correct results and expected
pass/fail status encoded in GTP comments following the test. To run a
test suite, use the shell scripts @file{test.sh}, @file{eval.sh}, or
@file{regress.sh}. There are also Makefile targets to do
this; @command{make all_batches} runs most of the tests. The
Pike script @file{regress.pike} can also be used to run all tests or a
subset of the tests.

Game records used by the regression tests are stored in the
directory @file{regression/games/} and its subdirectories.

@node Test Suites
@section Test suites

The regression tests are grouped into suites and stored in files as GTP
commands. A part of a test suite can look as follows:
@example
@group
# Connecting with ko at B14 looks best. Cutting at D17 might be
# considered. B17 (game move) is inferior.
loadsgf games/strategy25.sgf 61
90 gg_genmove black
#? [B14|D17]

# The game move at P13 is a suicidal blunder.
loadsgf games/strategy25.sgf 249
95 gg_genmove black
#? [!P13]

loadsgf games/strategy26.sgf 257
100 gg_genmove black
#? [M16]*
@end group
@end example

Lines starting with a hash sign, or in general anything following a hash
sign, are interpreted as comments by the GTP mode and thus ignored by
the engine. GTP commands are executed in the order they appear, but only
those on numbered lines are used for testing. The comment lines starting
with @code{#?} are magical to the regression testing scripts and
indicate correct results and expected pass/fail status. The string
within brackets is matched as a regular expression against the response
from the previous numbered GTP command. A particularly useful feature of
regular expressions is that @samp{|} makes it possible to specify
alternatives. Thus @code{B14|D17} above means that test case 90 passes
if either @code{B14} or @code{D17} is the move generated. There is
one important special case to be aware of. If the correct result string
starts with an exclamation mark, the exclamation mark is excluded from
the regular expression and the result of the matching is negated. Thus
@code{!P13} in test case 95 means that any move except @code{P13} is
accepted as a correct result.

In test case 100, the brackets on the @code{#?} line are followed by an
asterisk. This means that the test is expected to fail. If there is no
asterisk, the test is expected to pass. The brackets may also be
followed by a @samp{&}, meaning that the result is ignored. This is
primarily used to report statistics, e.g. how many tactical reading
nodes were spent while running the test suite.
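
The matching rules above can be sketched in a few lines of Python. This
is only an illustration, not the code used by the regression scripts:
the function names are hypothetical, and the sketch assumes that the
whole response must match the regular expression, which the actual
scripts may handle differently.

@example
import re

def parse_expectation(line):
    """Split a '#? [regex]flag' line into (pattern, negate, flag)."""
    m = re.match(r"#\?\s*\[(.*)\]([*&]?)\s*$", line)
    if m is None:
        raise ValueError("not a #? line: " + line)
    pattern, flag = m.group(1), m.group(2)
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]  # the '!' is not part of the regex
    return pattern, negate, flag

def is_correct(line, response):
    """True if the engine response counts as a correct result."""
    pattern, negate, _flag = parse_expectation(line)
    # Assumption: the whole response must match the pattern.
    matched = re.fullmatch(pattern, response) is not None
    return matched != negate

def expected_to_pass(line):
    """A trailing '*' marks a test that is expected to fail."""
    return parse_expectation(line)[2] != "*"
@end example

With these definitions, @code{is_correct("#? [B14|D17]", "D17")} is true
and @code{is_correct("#? [!P13]", "P13")} is false, while
@code{expected_to_pass("#? [M16]*")} is false because of the asterisk.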

@node Running the Regressions
@section Running the Regression Tests

@code{./test.sh blunder.tst} runs the tests in @file{blunder.tst} and
prints the results of the commands on numbered lines, which may look
like:

@example
1 E5
2 F9
3 O18
4 B7
5 A4
6 E4
7 E3
8 A3
9 D9
10 J9
11 B3
12 C6
13 C6
@end example

This is usually not very informative, however. More interesting is
@code{./eval.sh blunder.tst}, which also compares the results above
against the correct ones in the test file and prints a report for each
test of the form:

@example
1 failed: Correct '!E5', got 'E5'
2 failed: Correct 'C9|H9', got 'F9'
3 PASSED
4 failed: Correct 'B5|C5|C4|D4|E4|E3|F3', got 'B7'
5 PASSED
6 failed: Correct 'D4', got 'E4'
7 PASSED
8 failed: Correct 'B4', got 'A3'
9 failed: Correct 'G8|G9|H8', got 'D9'
10 failed: Correct 'G9|F9|C7', got 'J9'
11 failed: Correct 'D4|E4|E5|F4|C6', got 'B3'
12 failed: Correct 'D4', got 'C6'
13 failed: Correct 'D4|E4|E5|F4', got 'C6'
@end example

The result of a test can be one of four different cases:

@itemize @bullet
@item @code{passed}: An expected pass

This is the ideal result.

@item @code{PASSED}: An unexpected pass

This is a result that we are hoping for when we fix a bug. An old test
case that used to fail is now passing. 

@item @code{failed}: An expected failure

The test failed but this was also what we expected, unless we were
trying to fix the particular mistake highlighted by the test case.
These tests show weaknesses of the GNU Go engine and are good places to
search if you want to detect an area which needs improvement.

@item @code{FAILED}: An unexpected failure

This should nominally only happen if something is broken by a
change. However, sometimes GNU Go passes a test, but for the wrong
reason or for a combination of wrong reasons.  When one of these reasons
is fixed, the other one may shine through so that the test suddenly
fails. When a test case unexpectedly fails, it is necessary to make a
closer examination in order to determine whether a change has broken
something.

@end itemize
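
The four cases are simply the combinations of the actual outcome with
the expected outcome. A minimal Python sketch of that mapping follows;
the function names are hypothetical and the code is not taken from the
regression scripts.

@example
def classify(correct, expect_pass):
    """Map (actual outcome, expected outcome) to a report label."""
    if correct:
        return "passed" if expect_pass else "PASSED"
    return "FAILED" if expect_pass else "failed"

def is_unexpected(label):
    """Upper-case labels mark results that differ from the expectation."""
    return label.isupper()
@end example

For instance, a test whose move matches the correct result although the
test file marks it as expected to fail gives
@code{classify(True, False)}, which is @code{PASSED}.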

If you want a less verbose report, @code{./regress.sh . blunder.tst}
does the same thing as the previous command, but only reports unexpected
results. The example above is reduced to:

@example
3 unexpected PASS!
5 unexpected PASS!
7 unexpected PASS!
@end example

For convenience the tests are also available as Makefile targets. For
example, @code{make blunder} runs the tests in the blunder test suite by
executing @code{eval.sh blunder.tst}. @code{make all_batches} runs all
test suites in a sequence using the @code{regress.sh} script.

@node Running regress.pike
@section Running regress.pike

A more powerful way to run regressions is with the script
@file{regress.pike}. This requires that you have Pike
(@url{http://pike.ida.liu.se}) installed.

Executing @code{./regress.pike} without arguments will run all
test suites that @code{make all_batches} would run. The difference is
that unexpected results are reported as soon as they are
found (instead of after the whole file has been run) and that statistics
on time consumption and node usage are presented for each test file and
in total.

To run a single test suite do e.g. @code{./regress.pike nicklas3.tst} or
@code{./regress.pike nicklas3}. The result may look like:
@example
nicklas3                                 2.96    614772    3322      469
Total nodes: 614772 3322 469
Total time: 2.96 (3.22)
Total uncertainty: 0.00
@end example
The numbers here mean that the test suite took 2.96 seconds of processor
time and 3.22 seconds of real time. The consumption of reading nodes was
614772 for tactical reading, 3322 for owl reading, and 469 for
connection reading. The last line relates to the variability of the
generated moves in the test suite; 0 means that no move was decided by
the randomness contribution to the move valuation. Multiple test suites
can be run with e.g. @code{./regress.pike owl ld_owl owl1}.
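
As an illustration of how the per-suite line in the output above breaks
down, the following Python sketch splits it into named fields. The
column order is taken from the sample output only, and the field names
are hypothetical; the output of @file{regress.pike} may differ between
versions.

@example
def parse_suite_line(line):
    """Split one per-suite line of the summary into named fields."""
    name, cpu, tactical, owl, connection = line.split()
    return dict(
        suite=name,
        cpu_seconds=float(cpu),
        tactical_nodes=int(tactical),
        owl_nodes=int(owl),
        connection_nodes=int(connection),
    )

stats = parse_suite_line("nicklas3   2.96   614772   3322   469")
print(stats["suite"], stats["tactical_nodes"])  # prints: nicklas3 614772
@end example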

It is also possible to run a single test case, e.g. @code{./regress.pike
strategy:6}, several test cases, e.g. @code{./regress.pike
strategy:6,23,45}, a range of test cases, e.g. @code{./regress.pike
strategy:13-15}, or more complex combinations, e.g. @code{./regress.pike
strategy:6,13-15,23,45 nicklas3:602,1403}.
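
The test case selectors described above could be interpreted along the
following lines. This Python sketch is not taken from
@file{regress.pike}; the function name and the use of @code{None} to
mean ``all tests in the suite'' are assumptions made for the example.

@example
def parse_selector(arg):
    """Turn 'suite:6,13-15' into ('suite', [6, 13, 14, 15]).

    A bare suite name (with or without '.tst') selects every test,
    which is represented here by None.
    """
    suite, sep, spec = arg.partition(":")
    if suite.endswith(".tst"):
        suite = suite[:-4]
    if not sep:
        return suite, None
    numbers = []
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        if dash:
            # An inclusive range such as 13-15.
            numbers.extend(range(int(first), int(last) + 1))
        else:
            numbers.append(int(first))
    return suite, numbers
@end example

For example, @code{parse_selector("strategy:6,13-15,23,45")} gives the
suite name @code{strategy} and the test numbers 6, 13, 14, 15, 23,
and 45.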

There are also command line options to choose which engine to run, which
options to send to the engine, to turn on verbose output, and to use a
file to specify which test cases to run. Run @code{./regress.pike --help}
for a complete and up-to-date list of options.

@node Viewing with Emacs
@section Viewing tests with Emacs

To get a quick regression view, you may use the graphical
display mode available with Emacs (@pxref{Emacs}). You will
want the cursor in the regression buffer when you enter
@command{M-x gnugo}, so that GNU Go opens in the correct
directory. A good way to be in the right directory is to
open the window of the test you want to investigate. Then
you can cut and paste GTP commands directly from the test to
the minibuffer, using the @command{:} command from
Emacs. Although Emacs mode does not have a coordinate grid,
you may get an ASCII board with the coordinate grid using the
@command{: showboard} command.

@node HTML Views
@section HTML Regression Views

Extremely useful HTML views of the regression tests may be
produced using two Perl scripts, @file{regression/regress.pl}
and @file{regression/regress.plx}.

@enumerate
@item The driver program (regress.pl) which:
@itemize @bullet
@item Runs the regression tests, invoking GNU Go.
@item Captures the trace output, board position, pass/fail status,
SGF output, and dragon status information.
@end itemize
@item The interface to view the captured output (regress.plx) which:
@itemize @bullet
@item Never invokes GNU Go.
@item Displays the captured output in helpful formats (i.e. HTML).
@end itemize
@end enumerate

@subsection Setting up the HTML regression Views

There are many ways to configure Apache to permit CGI scripts; all of them
are described in the Apache documentation, which can be found at
@url{http://httpd.apache.org/docs/2.0/howto/cgi.html}.

Below you will find one example. 

This documentation assumes Apache 2.0 as included in the Fedora Core
distribution, but it should be fairly close to the configuration for other
distributions.

First, you will need to configure Apache to run CGI scripts in the directory
you wish to serve the HTML views from. In @file{/etc/httpd/conf/httpd.conf}
there should be a line:

@code{DocumentRoot "/var/www/html"}

Search for a line @code{<Directory "/path/to/directory">}, where
@code{/path/to/directory} is the same as provided in @code{DocumentRoot},
then add @code{ExecCGI} to the list of @code{Options}.
The whole section should look like:

@example
<Directory "/var/www/html">
...
    Options ... ExecCGI
...
</Directory>
@end example

This allows CGI scripts to be executed in the directory used by
@file{regress.plx}. Next, you need to tell Apache that @file{.plx} is a
CGI script extension. Your @file{httpd.conf} file should contain a line:

@code{AddHandler cgi-script ...}

If there is no such line, add one; then add @file{.plx} to the list of
extensions, so that the line looks like:

@code{AddHandler cgi-script ... .plx}

You will also need to make sure you have the necessary modules loaded to run
CGI scripts; mod_cgi and mod_mime should be sufficient. Your @file{httpd.conf}
should have the relevant @code{LoadModule cgi_module modules/mod_cgi.so} and
@code{LoadModule mime_module modules/mod_mime.so} lines; uncomment them if
necessary.

Next, you need to put a copy of @file{regress.plx} in the @code{DocumentRoot}
directory @code{/var/www/html} or in one of its subdirectories, wherever you
plan to serve the HTML views from.

You will also need to install the Perl module GD
(@url{http://search.cpan.org/dist/GD/}), available from CPAN.

Finally, run @file{regression/regress.pl} to create the XML data used to
generate the HTML views (to do all regression tests run
@code{regression/regress.pl -a 1}); then copy the @file{html/} directory to
the directory in which @file{regress.plx} resides.

At this point, you should have a working copy of the HTML regression views.

Additional notes for Debian users: The Perl GD module can be installed
by @code{apt-get install libgd-perl}. It may suffice to add this to
the apache2 configuration:

@example
<Directory "/var/www/regression">
	Options +ExecCGI
	AddHandler cgi-script .plx
	RedirectMatch ^/regression$ /regression/regress.plx
</Directory>
@end example

and then make a link from @file{/var/www/regression} to the GNU Go
regression directory. The @code{RedirectMatch} statement is only
needed to set up a shorter entry URL.