| 1 | @cindex Monte Carlo Go |
| 2 | @cindex UCT algorithm |
| 3 | In Monte Carlo Go the engine plays random games to the |
| 4 | end, generating moves from a pattern database within |
| 5 | the context of the algorithm UCT (upper confidence |
| 6 | bounds applied to trees). This algorithm allowed the |
| 7 | program MoGo (@uref{http://www.lri.fr/~gelly/MoGo.htm}, |
| 8 | to become the first computer program to defeat a |
| 9 | professional while taking a 9 stone handicap |
| 10 | (@uref{http://senseis.xmp.net/?MoGo}). |
| 11 | |
| 12 | GNU Go 3.8 can play 9x9 Go with the option |
| 13 | @option{--monte-carlo} using the UCT algorithm. |
| 14 | For command line options, see @xref{Invoking GNU Go}. |
| 15 | |
| 16 | During reading, the engine makes incremental updates |
| 17 | of local 3x3 neighborhood, suicide status, self-atari |
| 18 | status, and number of stones captured, for each move. |
| 19 | |
| 20 | GNU Go's simulations (Monte Carlo games) are pattern generated. |
| 21 | The random playout move generation is distributed |
| 22 | strictly proportional to move values computed by table |
| 23 | lookup from a local context consisting of 3x3 |
| 24 | neighborhood, opponent suicide status, own and |
| 25 | opponent self-atari status, number of stones captured |
| 26 | by own and opponent move, and closeness to the |
| 27 | previous move. Let's call this local context simply "a |
| 28 | pattern" and the table "pattern values" or simply |
| 29 | "patterns". |
| 30 | |
| 31 | There are three built-in databases that you can select |
| 32 | using the option @option{--mc-patterns <name>}, where |
| 33 | @option{<name>} is one of |
| 34 | |
| 35 | @itemize |
| 36 | @item @command{mc_montegnu_classic} |
| 37 | @item @command{mc_mogo_classic} |
| 38 | @item @command{mc_uniform} |
| 39 | @end itemize |
| 40 | |
| 41 | The first of these is an approximation of the previous random move |
| 42 | generation algorithm. The @command{mogo_classic} pattern values is an |
| 43 | approximation of the simulation policy used by early versions of MoGo, |
| 44 | as published in the report @uref{http://hal.inria.fr/inria-00117266, |
| 45 | odification of UCT with Patterns in Monte-Carlo Go} |
| 46 | RR-6062, by Sylvain Gelly, Yizao Wang, RĂ©mi Munos, and |
| 47 | Olivier Teytaud. The uniform pattern values is the so |
| 48 | called "light" playout which chooses uniformly between |
| 49 | all legal moves except single point proper eyes. |
| 50 | |
| 51 | If you're not satisfied with these you can also tune your own |
| 52 | pattern values with a pattern database file and load it at runtime |
| 53 | with @option{--mc-load-patterns <name>} adding your own |
| 54 | pattern database. |
| 55 | |
| 56 | Let's start with the uniform pattern values. Those are defined by the |
| 57 | file @file{patterns/mc_uniform.db}, which looks like this: |
| 58 | |
| 59 | @example |
| 60 | |
| 61 | oOo |
| 62 | O*O |
| 63 | oO? |
| 64 | |
| 65 | :0 |
| 66 | |
| 67 | oOo |
| 68 | O*O |
| 69 | --- |
| 70 | |
| 71 | :0 |
| 72 | |
| 73 | |Oo |
| 74 | |*O |
| 75 | +-- |
| 76 | |
| 77 | :0 |
| 78 | @end example |
| 79 | |
| 80 | Patterns are always exactly 3x3 in size with the move at the center |
| 81 | point. The symbols are the usual for GNU Go pattern databases: |
| 82 | |
| 83 | @example |
| 84 | * move |
| 85 | O own stone (i.e. the same color as the color to move) |
| 86 | o own stone or empty |
| 87 | X opponent stone |
| 88 | x opponent stone or empty |
| 89 | ? own stone, opponent stone, or empty |
| 90 | | vertical edge |
| 91 | - horizontal edge |
| 92 | + corner |
| 93 | @end example |
| 94 | |
| 95 | There's also a new symbol: |
| 96 | |
| 97 | @example |
| 98 | % own stone, opponent stone, empty, or edge |
| 99 | @end example |
| 100 | |
| 101 | After the pattern comes a line starting with a colon. In all these |
| 102 | patterns it says that the pattern has a move value of 0, i.e. must not |
| 103 | be played. Unmatched patterns have a default value of 1. When all move |
| 104 | values are zero for both players, the playout will stop. Including the |
| 105 | three patterns above is important because otherwise the playouts would |
| 106 | be likely to go on indefinitely, or as it actually happens be |
| 107 | terminated at a hard-coded limit of 600 moves. Also place these |
| 108 | patterns at the top of the database because when multiple patterns |
| 109 | match, the first one is used, regardless of the values. |
| 110 | |
| 111 | When using only these patterns you will probably notice that it plays |
| 112 | rather heavy, trying hard to be solidly connected. This is because |
| 113 | uniform playouts are badly biased with a high probability of non-solid |
| 114 | connections being cut apart. To counter this you could try a pattern |
| 115 | like |
| 116 | |
| 117 | @example |
| 118 | ?X? |
| 119 | O*O |
| 120 | x.? |
| 121 | |
| 122 | :20,near |
| 123 | @end example |
| 124 | |
| 125 | to increase the probability that the one-point jump is reinforced when |
| 126 | threatened. Here we added the property "near", which means that the |
| 127 | pattern only applies if the previous move was played "near" this move. |
| 128 | Primarily "near" means within the surrounding 3x3 neighborhood but it |
| 129 | also includes certain cases of liberties of low-liberty strings |
| 130 | adjacent to the previous move, e.g. the move to extend out of an atari |
| 131 | created by the previous move. You have to read the source to find out |
| 132 | the exact rules for nearness. |
| 133 | |
| 134 | We could also be even more specific and say |
| 135 | |
| 136 | @example |
| 137 | ?X? |
| 138 | O*O |
| 139 | x.? |
| 140 | |
| 141 | :20,near,osafe,xsafe |
| 142 | @end example |
| 143 | |
| 144 | to exclude the cases where this move is a self atari (osafe) or would |
| 145 | be a self-atari for the opponent (xsafe). |
| 146 | |
| 147 | It may also be interesting to see the effect of capturing stones. A |
| 148 | catch-all pattern for captures would be |
| 149 | |
| 150 | @example |
| 151 | ?X% |
| 152 | ?*% |
| 153 | %%% |
| 154 | |
| 155 | :10,ocap1,osafe |
| 156 | :20,ocap2 |
| 157 | :30,ocap3 |
| 158 | @end example |
| 159 | |
| 160 | where we have used multiple colon lines to specify different move |
| 161 | values depending on the number of captured stones; value 10 for a |
| 162 | single captured stone, value 20 for two captured stones, and value 30 |
| 163 | for three or more captured stones. Here we also excluded self-atari |
| 164 | moves in the case of 1 captured stone in order to avoid getting stuck |
| 165 | in triple-ko in the playouts (there's no superko detection in the |
| 166 | playouts). |
| 167 | |
| 168 | The full set of pattern properties is as follows: |
| 169 | |
| 170 | @ftable @code |
| 171 | @item near |
| 172 | The move is "near" the previous move. |
| 173 | |
| 174 | @item far |
| 175 | The move is not "near" the previous move. |
| 176 | |
| 177 | @item osafe |
| 178 | The move is not a self-atari. |
| 179 | |
| 180 | @item ounsafe |
| 181 | The move is a self-atari. |
| 182 | |
| 183 | @item xsafe |
| 184 | The move would not be a self-atari for the opponent. |
| 185 | |
| 186 | @item xunsafe |
| 187 | The move would be a self-atari for the opponent. |
| 188 | |
| 189 | @item xsuicide |
| 190 | The move would be suicide for the opponent |
| 191 | |
| 192 | @item xnosuicide |
| 193 | The move would not be suicide for the opponent. |
| 194 | |
| 195 | @item ocap0 |
| 196 | The move captures zero stones. |
| 197 | |
| 198 | @item ocap1 |
| 199 | The move captures one stone. |
| 200 | |
| 201 | @item ocap2 |
| 202 | The move captures two stones. |
| 203 | |
| 204 | @item ocap3 |
| 205 | The move captures three or more stones. |
| 206 | |
| 207 | @item ocap1+ |
| 208 | The move captures one or more stones. |
| 209 | |
| 210 | @item ocap1- |
| 211 | The move captures at most one stone. |
| 212 | |
| 213 | @item ocap2+ |
| 214 | The move captures two or more stones. |
| 215 | |
| 216 | @item ocap2- |
| 217 | The move captures at most two stones. |
| 218 | |
| 219 | @item xcap0 |
| 220 | An opponent move would capture zero stones. |
| 221 | |
| 222 | @item xcap1 |
| 223 | An opponent move would capture one stone. |
| 224 | |
| 225 | @item xcap2 |
| 226 | An opponent move would capture two stones. |
| 227 | |
| 228 | @item xcap3 |
| 229 | An opponent move would capture three or more stones. |
| 230 | |
| 231 | @item xcap1+ |
| 232 | An opponent move would capture one or more stones. |
| 233 | |
| 234 | @item xcap1- |
| 235 | An opponent move would capture at most one stone. |
| 236 | |
| 237 | @item xcap2+ |
| 238 | An opponent move would capture two or more stones. |
| 239 | |
| 240 | @item xcap2- |
| 241 | An opponent move would capture at most two stones. |
| 242 | @end ftable |
| 243 | |
| 244 | These can be combined arbitrarily but all must be satisfied for the |
| 245 | pattern to take effect. If contradictory properties are combined, the |
| 246 | pattern will never match. |
| 247 | |
| 248 | @section Final Remarks |
| 249 | |
| 250 | @itemize |
| 251 | @item Move values are unsigned 32-bit integers. To avoid overflow in |
| 252 | computations it is highly recommended to keep the values below |
| 253 | 10000000 or so. |
| 254 | @item There is no speed penalty for having lots of patterns in the |
| 255 | database. The average time per move is approximately constant |
| 256 | (slightly dependent on how often stones are captured or become low |
| 257 | on liberties) and the time per game mostly depends on the average |
| 258 | game length. |
| 259 | @item For more complex pattern databases, see |
| 260 | @file{patterns/mc_montegnu_classic.db} and @file{patterns/mc_mogo_classic.db}. |
| 261 | @end itemize |
| 262 | |
| 263 | Nobody really knows how to tune the random playouts to get as strong |
| 264 | engine as possible. Please play with this and report any interesting |
| 265 | findings, especially if you're able to make it substantially stronger |
| 266 | than the @file{montegnu_classic} patterns. |