[sgk-go] / doc / montecarlo.texi

@cindex Monte Carlo Go
@cindex UCT algorithm
In Monte Carlo Go the engine plays random games to the
end, generating moves from a pattern database within
the context of the algorithm UCT (upper confidence
bounds applied to trees).  This algorithm allowed the
program MoGo (@uref{http://www.lri.fr/~gelly/MoGo.htm}, 
to become the first computer program to defeat a
professional while taking a 9 stone handicap
(@uref{http://senseis.xmp.net/?MoGo}).

GNU Go 3.8 can play 9x9 Go with the option
@option{--monte-carlo} using the UCT algorithm.
For command line options, see @xref{Invoking GNU Go}.

During reading, the engine makes incremental updates
of local 3x3 neighborhood, suicide status, self-atari
status, and number of stones captured, for each move.

GNU Go's simulations (Monte Carlo games) are pattern generated.
The random playout move generation is distributed
strictly proportional to move values computed by table
lookup from a local context consisting of 3x3
neighborhood, opponent suicide status, own and
opponent self-atari status, number of stones captured
by own and opponent move, and closeness to the
previous move. Let's call this local context simply "a
pattern" and the table "pattern values" or simply
"patterns".

There are three built-in databases that you can select
using the option @option{--mc-patterns <name>}, where
@option{<name>} is one of

@itemize
@item @command{mc_montegnu_classic}
@item @command{mc_mogo_classic}
@item @command{mc_uniform}
@end itemize

The first of these is an approximation of the previous random move
generation algorithm. The @command{mogo_classic} pattern values is an
approximation of the simulation policy used by early versions of MoGo,
as published in the report @uref{http://hal.inria.fr/inria-00117266,
odification of UCT with Patterns in Monte-Carlo Go}
RR-6062, by Sylvain Gelly, Yizao Wang, Rémi Munos, and
Olivier Teytaud. The uniform pattern values is the so
called "light" playout which chooses uniformly between
all legal moves except single point proper eyes.

If you're not satisfied with these you can also tune your own
pattern values with a pattern database file and load it at runtime
with @option{--mc-load-patterns <name>} adding your own
pattern database.

Let's start with the uniform pattern values. Those are defined by the
file @file{patterns/mc_uniform.db}, which looks like this:

@example

oOo
O*O
oO?

:0

oOo
O*O
---

:0

|Oo
|*O
+--

:0
@end example

Patterns are always exactly 3x3 in size with the move at the center
point. The symbols are the usual for GNU Go pattern databases:

@example
* move
O own stone (i.e. the same color as the color to move)
o own stone or empty
X opponent stone
x opponent stone or empty
? own stone, opponent stone, or empty
| vertical edge
- horizontal edge
+ corner
@end example

There's also a new symbol:

@example
% own stone, opponent stone, empty, or edge
@end example

After the pattern comes a line starting with a colon. In all these
patterns it says that the pattern has a move value of 0, i.e. must not
be played. Unmatched patterns have a default value of 1. When all move
values are zero for both players, the playout will stop. Including the
three patterns above is important because otherwise the playouts would
be likely to go on indefinitely, or as it actually happens be
terminated at a hard-coded limit of 600 moves. Also place these
patterns at the top of the database because when multiple patterns
match, the first one is used, regardless of the values.

When using only these patterns you will probably notice that it plays
rather heavy, trying hard to be solidly connected. This is because
uniform playouts are badly biased with a high probability of non-solid
connections being cut apart. To counter this you could try a pattern
like

@example
?X?
O*O
x.?

:20,near
@end example

to increase the probability that the one-point jump is reinforced when
threatened. Here we added the property "near", which means that the
pattern only applies if the previous move was played "near" this move.
Primarily "near" means within the surrounding 3x3 neighborhood but it
also includes certain cases of liberties of low-liberty strings
adjacent to the previous move, e.g. the move to extend out of an atari
created by the previous move. You have to read the source to find out
the exact rules for nearness.

We could also be even more specific and say

@example
?X?
O*O
x.?

:20,near,osafe,xsafe
@end example

to exclude the cases where this move is a self atari (osafe) or would
be a self-atari for the opponent (xsafe).

It may also be interesting to see the effect of capturing stones. A
catch-all pattern for captures would be

@example
?X%
?*%
%%%

:10,ocap1,osafe
:20,ocap2
:30,ocap3
@end example

where we have used multiple colon lines to specify different move
values depending on the number of captured stones; value 10 for a
single captured stone, value 20 for two captured stones, and value 30
for three or more captured stones. Here we also excluded self-atari
moves in the case of 1 captured stone in order to avoid getting stuck
in triple-ko in the playouts (there's no superko detection in the
playouts).

The full set of pattern properties is as follows:

@ftable @code
@item near
The move is "near" the previous move.

@item far
The move is not "near" the previous move.

@item osafe
The move is not a self-atari.

@item ounsafe
The move is a self-atari.

@item xsafe
The move would not be a self-atari for the opponent.

@item xunsafe
The move would be a self-atari for the opponent.

@item xsuicide
The move would be suicide for the opponent

@item xnosuicide
The move would not be suicide for the opponent.

@item ocap0
The move captures zero stones.

@item ocap1
The move captures one stone.

@item ocap2
The move captures two stones.

@item ocap3
The move captures three or more stones.

@item ocap1+
The move captures one or more stones.

@item ocap1-
The move captures at most one stone.

@item ocap2+
The move captures two or more stones.

@item ocap2-
The move captures at most two stones.

@item xcap0
An opponent move would capture zero stones.

@item xcap1
An opponent move would capture one stone.

@item xcap2
An opponent move would capture two stones.

@item xcap3
An opponent move would capture three or more stones.

@item xcap1+
An opponent move would capture one or more stones.

@item xcap1-
An opponent move would capture at most one stone.

@item xcap2+
An opponent move would capture two or more stones.

@item xcap2-
An opponent move would capture at most two stones.
@end ftable

These can be combined arbitrarily but all must be satisfied for the
pattern to take effect. If contradictory properties are combined, the
pattern will never match.

@section Final Remarks

@itemize
@item  Move values are unsigned 32-bit integers. To avoid overflow in
computations it is highly recommended to keep the values below
10000000 or so.
@item There is no speed penalty for having lots of patterns in the
database. The average time per move is approximately constant
(slightly dependent on how often stones are captured or become low
on liberties) and the time per game mostly depends on the average
game length.
@item For more complex pattern databases, see
@file{patterns/mc_montegnu_classic.db} and @file{patterns/mc_mogo_classic.db}.
@end itemize

Nobody really knows how to tune the random playouts to get as strong
engine as possible. Please play with this and report any interesting
findings, especially if you're able to make it substantially stronger
than the @file{montegnu_classic} patterns.
Commit	Line	Data
7eeb782e AT	1	@cindex Monte Carlo Go
	2	@cindex UCT algorithm
	3	In Monte Carlo Go the engine plays random games to the
	4	end, generating moves from a pattern database within
	5	the context of the algorithm UCT (upper confidence
	6	bounds applied to trees). This algorithm allowed the
	7	program MoGo (@uref{http://www.lri.fr/~gelly/MoGo.htm},
	8	to become the first computer program to defeat a
	9	professional while taking a 9 stone handicap
	10	(@uref{http://senseis.xmp.net/?MoGo}).
	11
	12	GNU Go 3.8 can play 9x9 Go with the option
	13	@option{--monte-carlo} using the UCT algorithm.
	14	For command line options, see @xref{Invoking GNU Go}.
	15
	16	During reading, the engine makes incremental updates
	17	of local 3x3 neighborhood, suicide status, self-atari
	18	status, and number of stones captured, for each move.
	19
	20	GNU Go's simulations (Monte Carlo games) are pattern generated.
	21	The random playout move generation is distributed
	22	strictly proportional to move values computed by table
	23	lookup from a local context consisting of 3x3
	24	neighborhood, opponent suicide status, own and
	25	opponent self-atari status, number of stones captured
	26	by own and opponent move, and closeness to the
	27	previous move. Let's call this local context simply "a
	28	pattern" and the table "pattern values" or simply
	29	"patterns".
	30
	31	There are three built-in databases that you can select
	32	using the option @option{--mc-patterns <name>}, where
	33	@option{<name>} is one of
	34
	35	@itemize
	36	@item @command{mc_montegnu_classic}
	37	@item @command{mc_mogo_classic}
	38	@item @command{mc_uniform}
	39	@end itemize
	40
	41	The first of these is an approximation of the previous random move
	42	generation algorithm. The @command{mogo_classic} pattern values is an
	43	approximation of the simulation policy used by early versions of MoGo,
	44	as published in the report @uref{http://hal.inria.fr/inria-00117266,
	45	odification of UCT with Patterns in Monte-Carlo Go}
	46	RR-6062, by Sylvain Gelly, Yizao Wang, Rémi Munos, and
	47	Olivier Teytaud. The uniform pattern values is the so
	48	called "light" playout which chooses uniformly between
	49	all legal moves except single point proper eyes.
	50
	51	If you're not satisfied with these you can also tune your own
	52	pattern values with a pattern database file and load it at runtime
	53	with @option{--mc-load-patterns <name>} adding your own
	54	pattern database.
	55
	56	Let's start with the uniform pattern values. Those are defined by the
	57	file @file{patterns/mc_uniform.db}, which looks like this:
	58
	59	@example
	60
	61	oOo
	62	O*O
	63	oO?
	64
65	:0
66
67	oOo
68	O*O
69	---
70
71	:0
72
73	\|Oo
74	\|*O
75	+--
76
77	:0
78	@end example
79
80	Patterns are always exactly 3x3 in size with the move at the center
81	point. The symbols are the usual for GNU Go pattern databases:
82
83	@example
84	* move
85	O own stone (i.e. the same color as the color to move)
86	o own stone or empty
87	X opponent stone
88	x opponent stone or empty
89	? own stone, opponent stone, or empty
90	\| vertical edge
91	- horizontal edge
92	+ corner
93	@end example
94
95	There's also a new symbol:
96
97	@example
98	% own stone, opponent stone, empty, or edge
99	@end example
100
101	After the pattern comes a line starting with a colon. In all these
102	patterns it says that the pattern has a move value of 0, i.e. must not
103	be played. Unmatched patterns have a default value of 1. When all move
104	values are zero for both players, the playout will stop. Including the
105	three patterns above is important because otherwise the playouts would
106	be likely to go on indefinitely, or as it actually happens be
107	terminated at a hard-coded limit of 600 moves. Also place these
108	patterns at the top of the database because when multiple patterns
109	match, the first one is used, regardless of the values.
110
111	When using only these patterns you will probably notice that it plays
112	rather heavy, trying hard to be solidly connected. This is because
113	uniform playouts are badly biased with a high probability of non-solid
114	connections being cut apart. To counter this you could try a pattern
115	like
116
117	@example
118	?X?
119	O*O
120	x.?
121
122	:20,near
123	@end example
124
125	to increase the probability that the one-point jump is reinforced when
126	threatened. Here we added the property "near", which means that the
127	pattern only applies if the previous move was played "near" this move.
128	Primarily "near" means within the surrounding 3x3 neighborhood but it
129	also includes certain cases of liberties of low-liberty strings
130	adjacent to the previous move, e.g. the move to extend out of an atari
131	created by the previous move. You have to read the source to find out
132	the exact rules for nearness.
133
134	We could also be even more specific and say
135
136	@example
137	?X?
138	O*O
139	x.?
140
141	:20,near,osafe,xsafe
142	@end example
143
144	to exclude the cases where this move is a self atari (osafe) or would
145	be a self-atari for the opponent (xsafe).
146
147	It may also be interesting to see the effect of capturing stones. A
148	catch-all pattern for captures would be
149
150	@example
151	?X%
152	?*%
153	%%%
154
155	:10,ocap1,osafe
156	:20,ocap2
157	:30,ocap3
158	@end example
159
160	where we have used multiple colon lines to specify different move
161	values depending on the number of captured stones; value 10 for a
162	single captured stone, value 20 for two captured stones, and value 30
163	for three or more captured stones. Here we also excluded self-atari
164	moves in the case of 1 captured stone in order to avoid getting stuck
165	in triple-ko in the playouts (there's no superko detection in the
166	playouts).
167
168	The full set of pattern properties is as follows:
169
170	@ftable @code
171	@item near
172	The move is "near" the previous move.
173
174	@item far
175	The move is not "near" the previous move.
176
177	@item osafe
178	The move is not a self-atari.
179
180	@item ounsafe
181	The move is a self-atari.
182
183	@item xsafe
184	The move would not be a self-atari for the opponent.
185
186	@item xunsafe
187	The move would be a self-atari for the opponent.
188
189	@item xsuicide
190	The move would be suicide for the opponent
191
192	@item xnosuicide
193	The move would not be suicide for the opponent.
194
195	@item ocap0
196	The move captures zero stones.
197
198	@item ocap1
199	The move captures one stone.
200
201	@item ocap2
202	The move captures two stones.
203
204	@item ocap3
205	The move captures three or more stones.
206
207	@item ocap1+
208	The move captures one or more stones.
209
210	@item ocap1-
211	The move captures at most one stone.
212
213	@item ocap2+
214	The move captures two or more stones.
215
216	@item ocap2-
217	The move captures at most two stones.
218
219	@item xcap0
220	An opponent move would capture zero stones.
221
222	@item xcap1
223	An opponent move would capture one stone.
224
225	@item xcap2
226	An opponent move would capture two stones.
227
228	@item xcap3
229	An opponent move would capture three or more stones.
230
231	@item xcap1+
232	An opponent move would capture one or more stones.
233
234	@item xcap1-
235	An opponent move would capture at most one stone.
236
237	@item xcap2+
238	An opponent move would capture two or more stones.
239
240	@item xcap2-
241	An opponent move would capture at most two stones.
242	@end ftable
243
244	These can be combined arbitrarily but all must be satisfied for the
245	pattern to take effect. If contradictory properties are combined, the
246	pattern will never match.
247
2b8d5b93	248	@section Final Remarks
7eeb782e AT	249
	250	@itemize
	251	@item Move values are unsigned 32-bit integers. To avoid overflow in
	252	computations it is highly recommended to keep the values below
	253	10000000 or so.
	254	@item There is no speed penalty for having lots of patterns in the
	255	database. The average time per move is approximately constant
	256	(slightly dependent on how often stones are captured or become low
	257	on liberties) and the time per game mostly depends on the average
	258	game length.
	259	@item For more complex pattern databases, see
	260	@file{patterns/mc_montegnu_classic.db} and @file{patterns/mc_mogo_classic.db}.
	261	@end itemize
	262
	263	Nobody really knows how to tune the random playouts to get as strong
	264	engine as possible. Please play with this and report any interesting
	265	findings, especially if you're able to make it substantially stronger
	266	than the @file{montegnu_classic} patterns.