Commit | Line | Data |
---|---|---|
7eeb782e AT |
1 | @cindex Monte Carlo Go |
2 | @cindex UCT algorithm | |
3 | In Monte Carlo Go the engine plays random games to the | |
4 | end, generating moves from a pattern database within | |
5 | the context of the algorithm UCT (upper confidence | |
6 | bounds applied to trees). This algorithm allowed the | |
7 | program MoGo (@uref{http://www.lri.fr/~gelly/MoGo.htm}, | |
8 | to become the first computer program to defeat a | |
9 | professional while taking a 9 stone handicap | |
10 | (@uref{http://senseis.xmp.net/?MoGo}). | |
11 | ||
12 | GNU Go 3.8 can play 9x9 Go with the option | |
13 | @option{--monte-carlo} using the UCT algorithm. | |
14 | For command line options, see @xref{Invoking GNU Go}. | |
15 | ||
16 | During reading, the engine makes incremental updates | |
17 | of local 3x3 neighborhood, suicide status, self-atari | |
18 | status, and number of stones captured, for each move. | |
19 | ||
20 | GNU Go's simulations (Monte Carlo games) are pattern generated. | |
21 | The random playout move generation is distributed | |
22 | strictly proportional to move values computed by table | |
23 | lookup from a local context consisting of 3x3 | |
24 | neighborhood, opponent suicide status, own and | |
25 | opponent self-atari status, number of stones captured | |
26 | by own and opponent move, and closeness to the | |
27 | previous move. Let's call this local context simply "a | |
28 | pattern" and the table "pattern values" or simply | |
29 | "patterns". | |
30 | ||
31 | There are three built-in databases that you can select | |
32 | using the option @option{--mc-patterns <name>}, where | |
33 | @option{<name>} is one of | |
34 | ||
35 | @itemize | |
36 | @item @command{mc_montegnu_classic} | |
37 | @item @command{mc_mogo_classic} | |
38 | @item @command{mc_uniform} | |
39 | @end itemize | |
40 | ||
41 | The first of these is an approximation of the previous random move | |
42 | generation algorithm. The @command{mogo_classic} pattern values is an | |
43 | approximation of the simulation policy used by early versions of MoGo, | |
44 | as published in the report @uref{http://hal.inria.fr/inria-00117266, | |
45 | odification of UCT with Patterns in Monte-Carlo Go} | |
46 | RR-6062, by Sylvain Gelly, Yizao Wang, RĂ©mi Munos, and | |
47 | Olivier Teytaud. The uniform pattern values is the so | |
48 | called "light" playout which chooses uniformly between | |
49 | all legal moves except single point proper eyes. | |
50 | ||
51 | If you're not satisfied with these you can also tune your own | |
52 | pattern values with a pattern database file and load it at runtime | |
53 | with @option{--mc-load-patterns <name>} adding your own | |
54 | pattern database. | |
55 | ||
56 | Let's start with the uniform pattern values. Those are defined by the | |
57 | file @file{patterns/mc_uniform.db}, which looks like this: | |
58 | ||
59 | @example | |
60 | ||
61 | oOo | |
62 | O*O | |
63 | oO? | |
64 | ||
65 | :0 | |
66 | ||
67 | oOo | |
68 | O*O | |
69 | --- | |
70 | ||
71 | :0 | |
72 | ||
73 | |Oo | |
74 | |*O | |
75 | +-- | |
76 | ||
77 | :0 | |
78 | @end example | |
79 | ||
80 | Patterns are always exactly 3x3 in size with the move at the center | |
81 | point. The symbols are the usual for GNU Go pattern databases: | |
82 | ||
83 | @example | |
84 | * move | |
85 | O own stone (i.e. the same color as the color to move) | |
86 | o own stone or empty | |
87 | X opponent stone | |
88 | x opponent stone or empty | |
89 | ? own stone, opponent stone, or empty | |
90 | | vertical edge | |
91 | - horizontal edge | |
92 | + corner | |
93 | @end example | |
94 | ||
95 | There's also a new symbol: | |
96 | ||
97 | @example | |
98 | % own stone, opponent stone, empty, or edge | |
99 | @end example | |
100 | ||
101 | After the pattern comes a line starting with a colon. In all these | |
102 | patterns it says that the pattern has a move value of 0, i.e. must not | |
103 | be played. Unmatched patterns have a default value of 1. When all move | |
104 | values are zero for both players, the playout will stop. Including the | |
105 | three patterns above is important because otherwise the playouts would | |
106 | be likely to go on indefinitely, or as it actually happens be | |
107 | terminated at a hard-coded limit of 600 moves. Also place these | |
108 | patterns at the top of the database because when multiple patterns | |
109 | match, the first one is used, regardless of the values. | |
110 | ||
111 | When using only these patterns you will probably notice that it plays | |
112 | rather heavy, trying hard to be solidly connected. This is because | |
113 | uniform playouts are badly biased with a high probability of non-solid | |
114 | connections being cut apart. To counter this you could try a pattern | |
115 | like | |
116 | ||
117 | @example | |
118 | ?X? | |
119 | O*O | |
120 | x.? | |
121 | ||
122 | :20,near | |
123 | @end example | |
124 | ||
125 | to increase the probability that the one-point jump is reinforced when | |
126 | threatened. Here we added the property "near", which means that the | |
127 | pattern only applies if the previous move was played "near" this move. | |
128 | Primarily "near" means within the surrounding 3x3 neighborhood but it | |
129 | also includes certain cases of liberties of low-liberty strings | |
130 | adjacent to the previous move, e.g. the move to extend out of an atari | |
131 | created by the previous move. You have to read the source to find out | |
132 | the exact rules for nearness. | |
133 | ||
134 | We could also be even more specific and say | |
135 | ||
136 | @example | |
137 | ?X? | |
138 | O*O | |
139 | x.? | |
140 | ||
141 | :20,near,osafe,xsafe | |
142 | @end example | |
143 | ||
144 | to exclude the cases where this move is a self atari (osafe) or would | |
145 | be a self-atari for the opponent (xsafe). | |
146 | ||
147 | It may also be interesting to see the effect of capturing stones. A | |
148 | catch-all pattern for captures would be | |
149 | ||
150 | @example | |
151 | ?X% | |
152 | ?*% | |
153 | %%% | |
154 | ||
155 | :10,ocap1,osafe | |
156 | :20,ocap2 | |
157 | :30,ocap3 | |
158 | @end example | |
159 | ||
160 | where we have used multiple colon lines to specify different move | |
161 | values depending on the number of captured stones; value 10 for a | |
162 | single captured stone, value 20 for two captured stones, and value 30 | |
163 | for three or more captured stones. Here we also excluded self-atari | |
164 | moves in the case of 1 captured stone in order to avoid getting stuck | |
165 | in triple-ko in the playouts (there's no superko detection in the | |
166 | playouts). | |
167 | ||
168 | The full set of pattern properties is as follows: | |
169 | ||
170 | @ftable @code | |
171 | @item near | |
172 | The move is "near" the previous move. | |
173 | ||
174 | @item far | |
175 | The move is not "near" the previous move. | |
176 | ||
177 | @item osafe | |
178 | The move is not a self-atari. | |
179 | ||
180 | @item ounsafe | |
181 | The move is a self-atari. | |
182 | ||
183 | @item xsafe | |
184 | The move would not be a self-atari for the opponent. | |
185 | ||
186 | @item xunsafe | |
187 | The move would be a self-atari for the opponent. | |
188 | ||
189 | @item xsuicide | |
190 | The move would be suicide for the opponent | |
191 | ||
192 | @item xnosuicide | |
193 | The move would not be suicide for the opponent. | |
194 | ||
195 | @item ocap0 | |
196 | The move captures zero stones. | |
197 | ||
198 | @item ocap1 | |
199 | The move captures one stone. | |
200 | ||
201 | @item ocap2 | |
202 | The move captures two stones. | |
203 | ||
204 | @item ocap3 | |
205 | The move captures three or more stones. | |
206 | ||
207 | @item ocap1+ | |
208 | The move captures one or more stones. | |
209 | ||
210 | @item ocap1- | |
211 | The move captures at most one stone. | |
212 | ||
213 | @item ocap2+ | |
214 | The move captures two or more stones. | |
215 | ||
216 | @item ocap2- | |
217 | The move captures at most two stones. | |
218 | ||
219 | @item xcap0 | |
220 | An opponent move would capture zero stones. | |
221 | ||
222 | @item xcap1 | |
223 | An opponent move would capture one stone. | |
224 | ||
225 | @item xcap2 | |
226 | An opponent move would capture two stones. | |
227 | ||
228 | @item xcap3 | |
229 | An opponent move would capture three or more stones. | |
230 | ||
231 | @item xcap1+ | |
232 | An opponent move would capture one or more stones. | |
233 | ||
234 | @item xcap1- | |
235 | An opponent move would capture at most one stone. | |
236 | ||
237 | @item xcap2+ | |
238 | An opponent move would capture two or more stones. | |
239 | ||
240 | @item xcap2- | |
241 | An opponent move would capture at most two stones. | |
242 | @end ftable | |
243 | ||
244 | These can be combined arbitrarily but all must be satisfied for the | |
245 | pattern to take effect. If contradictory properties are combined, the | |
246 | pattern will never match. | |
247 | ||
2b8d5b93 | 248 | @section Final Remarks |
7eeb782e AT |
249 | |
250 | @itemize | |
251 | @item Move values are unsigned 32-bit integers. To avoid overflow in | |
252 | computations it is highly recommended to keep the values below | |
253 | 10000000 or so. | |
254 | @item There is no speed penalty for having lots of patterns in the | |
255 | database. The average time per move is approximately constant | |
256 | (slightly dependent on how often stones are captured or become low | |
257 | on liberties) and the time per game mostly depends on the average | |
258 | game length. | |
259 | @item For more complex pattern databases, see | |
260 | @file{patterns/mc_montegnu_classic.db} and @file{patterns/mc_mogo_classic.db}. | |
261 | @end itemize | |
262 | ||
263 | Nobody really knows how to tune the random playouts to get as strong | |
264 | engine as possible. Please play with this and report any interesting | |
265 | findings, especially if you're able to make it substantially stronger | |
266 | than the @file{montegnu_classic} patterns. |