git.subgeniuskitty.com - sgk-go/.git/blame_incremental

... / ...

Commit	Line	Data
	1	@cindex Monte Carlo Go
	2	@cindex UCT algorithm
	3	In Monte Carlo Go the engine plays random games to the
	4	end, generating moves from a pattern database within
	5	the context of the algorithm UCT (upper confidence
	6	bounds applied to trees). This algorithm allowed the
	7	program MoGo (@uref{http://www.lri.fr/~gelly/MoGo.htm},
	8	to become the first computer program to defeat a
	9	professional while taking a 9 stone handicap
	10	(@uref{http://senseis.xmp.net/?MoGo}).
	11
	12	GNU Go 3.8 can play 9x9 Go with the option
	13	@option{--monte-carlo} using the UCT algorithm.
	14	For command line options, see @xref{Invoking GNU Go}.
	15
	16	During reading, the engine makes incremental updates
	17	of local 3x3 neighborhood, suicide status, self-atari
	18	status, and number of stones captured, for each move.
	19
	20	GNU Go's simulations (Monte Carlo games) are pattern generated.
	21	The random playout move generation is distributed
	22	strictly proportional to move values computed by table
	23	lookup from a local context consisting of 3x3
	24	neighborhood, opponent suicide status, own and
	25	opponent self-atari status, number of stones captured
	26	by own and opponent move, and closeness to the
	27	previous move. Let's call this local context simply "a
	28	pattern" and the table "pattern values" or simply
	29	"patterns".
	30
	31	There are three built-in databases that you can select
	32	using the option @option{--mc-patterns <name>}, where
	33	@option{<name>} is one of
	34
	35	@itemize
	36	@item @command{mc_montegnu_classic}
	37	@item @command{mc_mogo_classic}
	38	@item @command{mc_uniform}
	39	@end itemize
	40
	41	The first of these is an approximation of the previous random move
	42	generation algorithm. The @command{mogo_classic} pattern values is an
	43	approximation of the simulation policy used by early versions of MoGo,
	44	as published in the report @uref{http://hal.inria.fr/inria-00117266,
	45	odification of UCT with Patterns in Monte-Carlo Go}
	46	RR-6062, by Sylvain Gelly, Yizao Wang, Rémi Munos, and
	47	Olivier Teytaud. The uniform pattern values is the so
	48	called "light" playout which chooses uniformly between
	49	all legal moves except single point proper eyes.
	50
	51	If you're not satisfied with these you can also tune your own
	52	pattern values with a pattern database file and load it at runtime
	53	with @option{--mc-load-patterns <name>} adding your own
	54	pattern database.
	55
	56	Let's start with the uniform pattern values. Those are defined by the
	57	file @file{patterns/mc_uniform.db}, which looks like this:
	58
	59	@example
	60
	61	oOo
	62	O*O
	63	oO?
	64
	65	:0
	66
	67	oOo
	68	O*O
	69	---
	70
	71	:0
	72
	73	\|Oo
	74	\|*O
	75	+--
	76
	77	:0
	78	@end example
	79
	80	Patterns are always exactly 3x3 in size with the move at the center
	81	point. The symbols are the usual for GNU Go pattern databases:
	82
	83	@example
	84	* move
	85	O own stone (i.e. the same color as the color to move)
	86	o own stone or empty
	87	X opponent stone
	88	x opponent stone or empty
	89	? own stone, opponent stone, or empty
	90	\| vertical edge
	91	- horizontal edge
	92	+ corner
	93	@end example
	94
	95	There's also a new symbol:
	96
	97	@example
	98	% own stone, opponent stone, empty, or edge
	99	@end example
	100
	101	After the pattern comes a line starting with a colon. In all these
	102	patterns it says that the pattern has a move value of 0, i.e. must not
	103	be played. Unmatched patterns have a default value of 1. When all move
	104	values are zero for both players, the playout will stop. Including the
	105	three patterns above is important because otherwise the playouts would
	106	be likely to go on indefinitely, or as it actually happens be
	107	terminated at a hard-coded limit of 600 moves. Also place these
	108	patterns at the top of the database because when multiple patterns
	109	match, the first one is used, regardless of the values.
	110
	111	When using only these patterns you will probably notice that it plays
	112	rather heavy, trying hard to be solidly connected. This is because
	113	uniform playouts are badly biased with a high probability of non-solid
	114	connections being cut apart. To counter this you could try a pattern
	115	like
	116
	117	@example
	118	?X?
	119	O*O
	120	x.?
	121
	122	:20,near
	123	@end example
	124
	125	to increase the probability that the one-point jump is reinforced when
	126	threatened. Here we added the property "near", which means that the
	127	pattern only applies if the previous move was played "near" this move.
	128	Primarily "near" means within the surrounding 3x3 neighborhood but it
	129	also includes certain cases of liberties of low-liberty strings
	130	adjacent to the previous move, e.g. the move to extend out of an atari
	131	created by the previous move. You have to read the source to find out
	132	the exact rules for nearness.
	133
	134	We could also be even more specific and say
	135
	136	@example
	137	?X?
	138	O*O
	139	x.?
	140
	141	:20,near,osafe,xsafe
	142	@end example
	143
	144	to exclude the cases where this move is a self atari (osafe) or would
	145	be a self-atari for the opponent (xsafe).
	146
	147	It may also be interesting to see the effect of capturing stones. A
	148	catch-all pattern for captures would be
	149
	150	@example
	151	?X%
	152	?*%
	153	%%%
	154
	155	:10,ocap1,osafe
	156	:20,ocap2
	157	:30,ocap3
	158	@end example
	159
	160	where we have used multiple colon lines to specify different move
	161	values depending on the number of captured stones; value 10 for a
	162	single captured stone, value 20 for two captured stones, and value 30
	163	for three or more captured stones. Here we also excluded self-atari
	164	moves in the case of 1 captured stone in order to avoid getting stuck
	165	in triple-ko in the playouts (there's no superko detection in the
	166	playouts).
	167
	168	The full set of pattern properties is as follows:
	169
	170	@ftable @code
	171	@item near
	172	The move is "near" the previous move.
	173
	174	@item far
	175	The move is not "near" the previous move.
	176
	177	@item osafe
	178	The move is not a self-atari.
	179
	180	@item ounsafe
	181	The move is a self-atari.
	182
	183	@item xsafe
	184	The move would not be a self-atari for the opponent.
	185
	186	@item xunsafe
	187	The move would be a self-atari for the opponent.
	188
	189	@item xsuicide
	190	The move would be suicide for the opponent
	191
	192	@item xnosuicide
	193	The move would not be suicide for the opponent.
	194
	195	@item ocap0
	196	The move captures zero stones.
	197
	198	@item ocap1
	199	The move captures one stone.
	200
	201	@item ocap2
	202	The move captures two stones.
	203
	204	@item ocap3
	205	The move captures three or more stones.
	206
	207	@item ocap1+
	208	The move captures one or more stones.
	209
	210	@item ocap1-
	211	The move captures at most one stone.
	212
	213	@item ocap2+
	214	The move captures two or more stones.
	215
	216	@item ocap2-
	217	The move captures at most two stones.
	218
	219	@item xcap0
	220	An opponent move would capture zero stones.
	221
	222	@item xcap1
	223	An opponent move would capture one stone.
	224
	225	@item xcap2
	226	An opponent move would capture two stones.
	227
	228	@item xcap3
	229	An opponent move would capture three or more stones.
	230
	231	@item xcap1+
	232	An opponent move would capture one or more stones.
	233
	234	@item xcap1-
	235	An opponent move would capture at most one stone.
	236
	237	@item xcap2+
	238	An opponent move would capture two or more stones.
	239
	240	@item xcap2-
	241	An opponent move would capture at most two stones.
	242	@end ftable
	243
	244	These can be combined arbitrarily but all must be satisfied for the
	245	pattern to take effect. If contradictory properties are combined, the
	246	pattern will never match.
	247
	248	@section Final Remarks
	249
	250	@itemize
	251	@item Move values are unsigned 32-bit integers. To avoid overflow in
	252	computations it is highly recommended to keep the values below
	253	10000000 or so.
	254	@item There is no speed penalty for having lots of patterns in the
	255	database. The average time per move is approximately constant
	256	(slightly dependent on how often stones are captured or become low
	257	on liberties) and the time per game mostly depends on the average
	258	game length.
	259	@item For more complex pattern databases, see
	260	@file{patterns/mc_montegnu_classic.db} and @file{patterns/mc_mogo_classic.db}.
	261	@end itemize
	262
	263	Nobody really knows how to tune the random playouts to get as strong
	264	engine as possible. Please play with this and report any interesting
	265	findings, especially if you're able to make it substantially stronger
	266	than the @file{montegnu_classic} patterns.