Minor changes to documentation to enable building with newer texinfo.
[sgk-go] / doc / montecarlo.texi
CommitLineData
7eeb782e
AT
1@cindex Monte Carlo Go
2@cindex UCT algorithm
3In Monte Carlo Go the engine plays random games to the
4end, generating moves from a pattern database within
5the context of the algorithm UCT (upper confidence
6bounds applied to trees). This algorithm allowed the
7program MoGo (@uref{http://www.lri.fr/~gelly/MoGo.htm},
8to become the first computer program to defeat a
9professional while taking a 9 stone handicap
10(@uref{http://senseis.xmp.net/?MoGo}).
11
12GNU Go 3.8 can play 9x9 Go with the option
13@option{--monte-carlo} using the UCT algorithm.
14For command line options, see @xref{Invoking GNU Go}.
15
16During reading, the engine makes incremental updates
17of local 3x3 neighborhood, suicide status, self-atari
18status, and number of stones captured, for each move.
19
20GNU Go's simulations (Monte Carlo games) are pattern generated.
21The random playout move generation is distributed
22strictly proportional to move values computed by table
23lookup from a local context consisting of 3x3
24neighborhood, opponent suicide status, own and
25opponent self-atari status, number of stones captured
26by own and opponent move, and closeness to the
27previous move. Let's call this local context simply "a
28pattern" and the table "pattern values" or simply
29"patterns".
30
31There are three built-in databases that you can select
32using the option @option{--mc-patterns <name>}, where
33@option{<name>} is one of
34
35@itemize
36@item @command{mc_montegnu_classic}
37@item @command{mc_mogo_classic}
38@item @command{mc_uniform}
39@end itemize
40
41The first of these is an approximation of the previous random move
42generation algorithm. The @command{mogo_classic} pattern values is an
43approximation of the simulation policy used by early versions of MoGo,
44as published in the report @uref{http://hal.inria.fr/inria-00117266,
45odification of UCT with Patterns in Monte-Carlo Go}
46RR-6062, by Sylvain Gelly, Yizao Wang, RĂ©mi Munos, and
47Olivier Teytaud. The uniform pattern values is the so
48called "light" playout which chooses uniformly between
49all legal moves except single point proper eyes.
50
51If you're not satisfied with these you can also tune your own
52pattern values with a pattern database file and load it at runtime
53with @option{--mc-load-patterns <name>} adding your own
54pattern database.
55
56Let's start with the uniform pattern values. Those are defined by the
57file @file{patterns/mc_uniform.db}, which looks like this:
58
59@example
60
61oOo
62O*O
63oO?
64
65:0
66
67oOo
68O*O
69---
70
71:0
72
73|Oo
74|*O
75+--
76
77:0
78@end example
79
80Patterns are always exactly 3x3 in size with the move at the center
81point. The symbols are the usual for GNU Go pattern databases:
82
83@example
84* move
85O own stone (i.e. the same color as the color to move)
86o own stone or empty
87X opponent stone
88x opponent stone or empty
89? own stone, opponent stone, or empty
90| vertical edge
91- horizontal edge
92+ corner
93@end example
94
95There's also a new symbol:
96
97@example
98% own stone, opponent stone, empty, or edge
99@end example
100
101After the pattern comes a line starting with a colon. In all these
102patterns it says that the pattern has a move value of 0, i.e. must not
103be played. Unmatched patterns have a default value of 1. When all move
104values are zero for both players, the playout will stop. Including the
105three patterns above is important because otherwise the playouts would
106be likely to go on indefinitely, or as it actually happens be
107terminated at a hard-coded limit of 600 moves. Also place these
108patterns at the top of the database because when multiple patterns
109match, the first one is used, regardless of the values.
110
111When using only these patterns you will probably notice that it plays
112rather heavy, trying hard to be solidly connected. This is because
113uniform playouts are badly biased with a high probability of non-solid
114connections being cut apart. To counter this you could try a pattern
115like
116
117@example
118?X?
119O*O
120x.?
121
122:20,near
123@end example
124
125to increase the probability that the one-point jump is reinforced when
126threatened. Here we added the property "near", which means that the
127pattern only applies if the previous move was played "near" this move.
128Primarily "near" means within the surrounding 3x3 neighborhood but it
129also includes certain cases of liberties of low-liberty strings
130adjacent to the previous move, e.g. the move to extend out of an atari
131created by the previous move. You have to read the source to find out
132the exact rules for nearness.
133
134We could also be even more specific and say
135
136@example
137?X?
138O*O
139x.?
140
141:20,near,osafe,xsafe
142@end example
143
144to exclude the cases where this move is a self atari (osafe) or would
145be a self-atari for the opponent (xsafe).
146
147It may also be interesting to see the effect of capturing stones. A
148catch-all pattern for captures would be
149
150@example
151?X%
152?*%
153%%%
154
155:10,ocap1,osafe
156:20,ocap2
157:30,ocap3
158@end example
159
160where we have used multiple colon lines to specify different move
161values depending on the number of captured stones; value 10 for a
162single captured stone, value 20 for two captured stones, and value 30
163for three or more captured stones. Here we also excluded self-atari
164moves in the case of 1 captured stone in order to avoid getting stuck
165in triple-ko in the playouts (there's no superko detection in the
166playouts).
167
168The full set of pattern properties is as follows:
169
170@ftable @code
171@item near
172The move is "near" the previous move.
173
174@item far
175The move is not "near" the previous move.
176
177@item osafe
178The move is not a self-atari.
179
180@item ounsafe
181The move is a self-atari.
182
183@item xsafe
184The move would not be a self-atari for the opponent.
185
186@item xunsafe
187The move would be a self-atari for the opponent.
188
189@item xsuicide
190The move would be suicide for the opponent
191
192@item xnosuicide
193The move would not be suicide for the opponent.
194
195@item ocap0
196The move captures zero stones.
197
198@item ocap1
199The move captures one stone.
200
201@item ocap2
202The move captures two stones.
203
204@item ocap3
205The move captures three or more stones.
206
207@item ocap1+
208The move captures one or more stones.
209
210@item ocap1-
211The move captures at most one stone.
212
213@item ocap2+
214The move captures two or more stones.
215
216@item ocap2-
217The move captures at most two stones.
218
219@item xcap0
220An opponent move would capture zero stones.
221
222@item xcap1
223An opponent move would capture one stone.
224
225@item xcap2
226An opponent move would capture two stones.
227
228@item xcap3
229An opponent move would capture three or more stones.
230
231@item xcap1+
232An opponent move would capture one or more stones.
233
234@item xcap1-
235An opponent move would capture at most one stone.
236
237@item xcap2+
238An opponent move would capture two or more stones.
239
240@item xcap2-
241An opponent move would capture at most two stones.
242@end ftable
243
244These can be combined arbitrarily but all must be satisfied for the
245pattern to take effect. If contradictory properties are combined, the
246pattern will never match.
247
2b8d5b93 248@section Final Remarks
7eeb782e
AT
249
250@itemize
251@item Move values are unsigned 32-bit integers. To avoid overflow in
252computations it is highly recommended to keep the values below
25310000000 or so.
254@item There is no speed penalty for having lots of patterns in the
255database. The average time per move is approximately constant
256(slightly dependent on how often stones are captured or become low
257on liberties) and the time per game mostly depends on the average
258game length.
259@item For more complex pattern databases, see
260@file{patterns/mc_montegnu_classic.db} and @file{patterns/mc_mogo_classic.db}.
261@end itemize
262
263Nobody really knows how to tune the random playouts to get as strong
264engine as possible. Please play with this and report any interesting
265findings, especially if you're able to make it substantially stronger
266than the @file{montegnu_classic} patterns.