
In this chapter, we describe the principles of the GNU Go DFA
pattern matcher.  The aim of this system is to permit fast
pattern matching where it is time critical, as in the owl
module (@ref{The Owl Code}). Since GNU Go 3.2, this is enabled
by default. You can still get back the traditional pattern matcher
by running @command{configure --disable-dfa} and then recompiling
GNU Go.

Otherwise, a finite state machine called a Deterministic Finite
State Automaton (@ref{What is a DFA}) will be built offline from the
pattern database. This is used at runtime to speed up pattern matching
(@ref{Pattern matching with DFA} and @ref{Incremental Algorithm}). The
runtime speedup comes at the cost of an increase in memory use and
compile time.


@menu
* Introduction to the DFA::  Scanning the board along a path
* What is a DFA:: A reminder of language theory.
* Pattern matching with DFA:: How to retrieve go patterns with a DFA?
* Building the DFA:: Playing with explosives.
* Incremental Algorithm:: The joy of determinism.
* DFA Optimizations:: Some possible optimizations.
@end menu


@cindex pattern matching
@cindex fast pattern matching
@cindex pattern database
@cindex finite state automaton
@cindex automaton
@cindex dfa
@cindex dfa.h
@cindex dfa.c
@cindex matchpat.c
@cindex product


@node Introduction to the DFA
@section Introduction to the DFA

The general idea is as follows:

For each intersection of the board, its neighbourhood is scanned following
a predefined path.  The actual path used does not matter very much; GNU Go
uses a spiral as shown below.

@ifinfo
@example
  +---B--------------+
  | C 4 A . . . . . .|
  D 5 1 3 9 . . . . .|
  E 6 2 8 . . X . . .|
  | F 7 . . . . . . .|
  | . +-> . . . . . .|
  | . . . . . . . . .|
  | . O . . . X . . .|
  | . . . . . . . . .|
  | . . . . . . . . .|
  +------------------+
@end example
@end ifinfo
@ifnotinfo
@image{path}
@end ifnotinfo

At each step of the path, the pattern matcher jumps into a state
determined by what it has found on the board so far. If one or
several patterns have been matched at this step, the state
immediately tells us so (in its @dfn{attribute}).
The state also implicitly encodes which further patterns can still
be matched: it stores which state to jump to next, depending on
whether we find a black, white or empty intersection (or an
intersection off the board) at the next step of the
path. The state also tells us immediately when no further pattern
can be found (by telling us to jump into the @dfn{error} state).

This informal explanation should become clearer with the definitions in
the next section (@ref{What is a DFA}).

Reading the board along a predefined path reduces
two-dimensional pattern matching to a linear text searching problem.
For example, this pattern

@example
?X?
.O?
?OO
@end example

@noindent
scanned following the path

@example
 B
C4A
5139
628
 7
@end example

@noindent
gives the string @b{"OO?X.?*O*?*?"}
where @b{"?"} means @b{'don't care'} and @b{"*"} means @b{'don't care, can
even be out of board'}.

So we can forget that we are dealing with two-dimensional patterns and
consider only linear patterns.
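
The linearization above can be reproduced with a few lines of C. This is
only a sketch, not GNU Go code: the names @code{spiral} and
@code{linearize} are hypothetical, the offsets are read off the twelve-step
path diagram above, and we assume the pattern is anchored with its centre
on step 1 of the path.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define SPIRAL_LEN 12

/* (row, column) offsets of the first 12 steps of the spiral, read off
 * the path diagram: step 1 is the anchor, step 2 lies just below it,
 * step 3 to its right, step 4 above it, and so on. */
static const int spiral[SPIRAL_LEN][2] = {
  { 0,  0}, { 1,  0}, { 0,  1}, {-1,  0},
  { 0, -1}, { 1, -1}, { 2,  0}, { 1,  1},
  { 0,  2}, {-1,  1}, {-2,  0}, {-1, -1}
};

/* Linearize a pattern given as `rows` NUL-terminated strings, anchored at
 * (arow, acol).  Steps that fall outside the pattern become '*'.
 * `out` must have room for SPIRAL_LEN + 1 bytes. */
void linearize(const char *const *pat, int rows,
               int arow, int acol, char *out)
{
  int k;
  assert(pat != NULL && out != NULL);
  for (k = 0; k < SPIRAL_LEN; k++) {
    int r = arow + spiral[k][0];
    int c = acol + spiral[k][1];
    int inside = r >= 0 && r < rows
                 && c >= 0 && c < (int) strlen(pat[r]);
    out[k] = inside ? pat[r][c] : '*';
  }
  out[SPIRAL_LEN] = '\0';
}
```
@end verbatim

Called with the three rows of the example pattern and the anchor
@code{(1,1)}, this fills @code{out} with the string given above.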


@node What is a DFA
@section What is a DFA

The acronym DFA stands for Deterministic Finite state Automaton
(see @uref{http://www.eti.pg.gda.pl/~jandac/thesis/node12.html}
or @cite{Hopcroft & Ullman "Introduction to Language Theory"}
for more details).
DFAs are common tools in compiler design
(read @cite{Aho, Ravi Sethi, Ullman "COMPILERS: Principles, Techniques and Tools"}
for a complete introduction), and many
powerful text searching algorithms, such as @cite{Knuth-Morris-Pratt}
or @cite{Boyer-Moore}, are based on DFAs
(see @uref{http://www-igm.univ-mlv.fr/~lecroq/string/} for a bibliography
of pattern matching algorithms).

Basically,
a DFA is a set of @dfn{states} connected by labeled @dfn{transitions}.
The labels are the values read on the board;
in GNU Go these values are EMPTY, WHITE, BLACK or OUT_BOARD, denoted
respectively by '.', 'O', 'X' and '#'.


The best way to represent a DFA is to draw its transition graph.
For example, the pattern @b{"????..X"} is recognized by the following DFA:

@ifinfo
@example
@group
   .,X,O     .,X,O    .,X,O    .,X,O     .      .      X
[1]------>[2]----->[3]----->[4]----->[5]--->[6]--->[7]--->[8 OK!]
Start
@end group
@end example
@end ifinfo

@ifnotinfo
@image{dfa}
@end ifnotinfo


This means that,
starting from state [1], if you read '.', 'X' or 'O' on the board,
you go to state [2], and so on until you reach state [5].
From state [5], if you read '.', you go to state [6];
otherwise you go to the error state [0].
This continues until you reach state [8].
As soon as you reach state [8],
you have recognized the pattern @b{"????..X"}.
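
As an illustration, the chain automaton for a single linear pattern can be
built mechanically. The sketch below is not taken from GNU Go; the names
@code{accepts}, @code{build_chain} and @code{run} are hypothetical. It
assumes that '?' accepts any on-board value, that 'o' accepts anything but
'X' on the board, and that '*' accepts everything including out of board.
State 0 plays the role of the error state.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 64

/* Column order used for transitions: '.', 'O', 'X', '#'. */
static const char symbols[] = ".OX#";

/* Does pattern character p accept board value v? */
int accepts(char p, char v)
{
  switch (p) {
  case '?': return v != '#';
  case 'o': return v == '.' || v == 'O';
  case '*': return 1;
  default:  return p == v;
  }
}

/* Fill next[][] with the chain automaton for pat.  State 0 is the error
 * state, state 1 the start state.  Returns the accepting state, which is
 * strlen(pat) + 1. */
int build_chain(const char *pat, int next[MAX_STATES][4])
{
  int len = (int) strlen(pat);
  int s, c;
  assert(len + 2 <= MAX_STATES);
  for (s = 0; s <= len + 1; s++)
    for (c = 0; c < 4; c++)
      next[s][c] = (s >= 1 && s <= len
                    && accepts(pat[s - 1], symbols[c])) ? s + 1 : 0;
  return len + 1;
}

/* Walk the automaton over text and return the state reached after the
 * last symbol, or 0 as soon as the error state is entered. */
int run(int next[MAX_STATES][4], const char *text)
{
  int state = 1;
  for (; *text != '\0' && state != 0; text++) {
    const char *p = strchr(symbols, *text);
    assert(p != NULL);          /* only '.', 'O', 'X', '#' are valid */
    state = next[state][p - symbols];
  }
  return state;
}
```
@end verbatim

For @b{"????..X"} the accepting state returned by @code{build_chain} is 8,
as in the graph above, and wildcards simply produce several equal entries
in one row of the table.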

Adding a pattern like @b{"XXo"} ('o' is a wildcard for not 'X')
transforms the automaton directly
by synchronization product (@ref{Building the DFA}).
The result is the following DFA:

@ifinfo
@example
@group
Start .,O   .,X,O    .,O,X   .,X,O      .      .       X
[1]---->[2]----->[3]----->[4]------>[5]--->[6]---->[7]--->[8 OK!]
 |                ^        ^         ^
 |            .,O |        |         |
 |            ----         |         |
 |           |          X  |         |
 |           |          ---    .,X,O |
 |           |         |             |
 |     X     |   X     | O,.         |
  --------->[9]------>[A]--->[B OK!]-
@end group
@end example
@end ifinfo

@ifnotinfo
@image{dfa2}
@end ifnotinfo

By adding a special @dfn{error} state and completing each state
with a transition to the error state wherever one is missing, we can
easily turn a DFA into a @dfn{Complete Deterministic Finite state
Automaton} (CDFA).  The synchronization product
(@ref{Building the DFA}) is only possible on CDFAs.


@ifinfo
@example
@group
Start .,O   .,X,O    .,O,X   .,X,O      .      .       X
[1]---->[2]----->[3]----->[4]------>[5]--->[6]---->[7]--->[8 OK!]
 |                ^        ^         ^      |       |      |
 |            .,O |        |         |      |       |      |
 |            ----         |         |      |       |      |
 |           |          X  |         |      |X,O    | .,O  |X,.,O
 |           |          ---    .,X,O |      |       |      |
 |           |         |             |      |       |      |
 |     X     |   X     | O,.         |     \ /     \ /    \ /
  --------->[9]------>[A]--->[B OK!]-      [0  Error state !]
@end group
@end example
@end ifinfo
@ifnotinfo
@image{cdfa}
@end ifnotinfo

The graph of a CDFA is encoded as an array of states:
state 0 is the "error" state and
state 1 is the start state.

@example
@group
----------------------------------------------------
 state  |   .    |   O    |   X    |   #    |  att
----------------------------------------------------
      1 |      2 |      2 |      9 |      0 |
      2 |      3 |      3 |      3 |      0 |
      3 |      4 |      4 |      4 |      0 |
      4 |      5 |      5 |      5 |      0 |
      5 |      6 |      0 |      0 |      0 |
      6 |      7 |      0 |      0 |      0 |
      7 |      0 |      0 |      8 |      0 |
      8 |      0 |      0 |      0 |      0 | Found pattern "????..X"
      9 |      3 |      3 |      A |      0 |
      A |      B |      B |      4 |      0 |
      B |      5 |      5 |      5 |      0 | Found pattern "XXo"
----------------------------------------------------
@end group
@end example

To each state we associate a list of attributes, often empty: the
list of pattern indexes recognized when this state is reached.
In '@file{dfa.h}' this is basically represented by two structures:

@example
@group
@code{
/* dfa state */
typedef struct state
@{
  int next[4]; /* transitions for EMPTY, BLACK, WHITE and OUT_BOARD */
  attrib_t *att;
@}
state_t;

/* dfa */
typedef struct dfa
@{
  attrib_t *indexes; /* Array of pattern indexes */
  int maxIndexes;

  state_t *states; /* Array of states */
  int maxStates;
@}
dfa_t;}
@end group
@end example

@node Pattern matching with DFA
@section Pattern matching with DFA

Recognizing patterns with a DFA is very simple
and thus very fast
(see '@code{scan_for_pattern()}' in the '@file{engine/matchpat.c}' file).


Starting from the start state, we only need to read the board
along the spiral path, jump from state to state following
the transitions labelled by the values read on the board, and
collect the pattern indexes on the way.  If we reach the error
state (zero), it means that no more patterns will be matched.
The worst case complexity of this algorithm is O(m) where m is
the size of the biggest pattern.

Here is an example of a scan:

First we build a minimal DFA recognizing these patterns:
@b{"X..X"}, @b{"X???"}, @b{"X.OX"} and @b{"X?oX"}.
Note that wildcards like '?','o', or 'x' give multiple out-transitions.

@example
@group
----------------------------------------------------
 state  |   .    |   O    |   X    |   #    |  att
----------------------------------------------------
      1 |      0 |      0 |      2 |      0 |
      2 |      3 |     10 |     10 |      0 |
      3 |      4 |      7 |      9 |      0 |
      4 |      5 |      5 |      6 |      0 |
      5 |      0 |      0 |      0 |      0 |    2
      6 |      0 |      0 |      0 |      0 |    4    2    1
      7 |      5 |      5 |      8 |      0 |
      8 |      0 |      0 |      0 |      0 |    4    2    3
      9 |      5 |      5 |      5 |      0 |
     10 |     11 |     11 |      9 |      0 |
     11 |      5 |      5 |     12 |      0 |
     12 |      0 |      0 |      0 |      0 |    4    2
----------------------------------------------------
@end group
@end example

We scan the string
@b{"X..XXO...."} starting from state 1:

Current state:     1, substring to scan : @b{X..XXO....}

We read an 'X' value, so from state 1 we must go to
state 2.

Current state:     2, substring to scan : @b{..XXO....}

We read a '.' value, so from state 2 we must go to
state 3, and so on:

@example
Current state:     3, substring to scan : .XXO....
Current state:     4, substring to scan : XXO....
Current state:     6, substring to scan : XO....
Found pattern 4
Found pattern 2
Found pattern 1                 
@end example

After reaching state 6, where we match patterns
1, 2 and 4, there are no out-transitions left, so we stop matching.
To keep the same match order as in the standard algorithm,
the pattern indexes are collected in an array and sorted by index.
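
The scan just described can be sketched in C as follows. This is an
illustration built on the example table of this section, not the code of
@code{scan_for_pattern()}; the names @code{next}, @code{att} and
@code{scan} are hypothetical, and a zero-terminated list per state stands
in for the attribute lists.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N_STATES 13
#define MAX_ATT 3

/* Transition table of the example; columns are '.', 'O', 'X', '#'.
 * Row 0 is the error state, row 1 the start state. */
static const int next[N_STATES][4] = {
  { 0,  0,  0, 0}, { 0,  0,  2, 0}, { 3, 10, 10, 0}, { 4,  7,  9, 0},
  { 5,  5,  6, 0}, { 0,  0,  0, 0}, { 0,  0,  0, 0}, { 5,  5,  8, 0},
  { 0,  0,  0, 0}, { 5,  5,  5, 0}, {11, 11,  9, 0}, { 5,  5, 12, 0},
  { 0,  0,  0, 0}
};

/* Pattern indexes attached to each state; 0 terminates a list. */
static const int att[N_STATES][MAX_ATT + 1] = {
  {0}, {0}, {0}, {0}, {0},
  {2, 0}, {4, 2, 1, 0}, {0}, {4, 2, 3, 0},
  {0}, {0}, {0}, {4, 2, 0}
};

static int cmp_int(const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Scan text from the start state, store the matched pattern indexes in
 * found[] in ascending order and return how many were stored. */
int scan(const char *text, int *found, int max_found)
{
  static const char symbols[] = ".OX#";
  int state = 1, n = 0, k;
  for (; *text != '\0' && state != 0; text++) {
    const char *p = strchr(symbols, *text);
    assert(p != NULL);
    state = next[state][p - symbols];
    for (k = 0; att[state][k] != 0; k++) {
      assert(n < max_found);
      found[n++] = att[state][k];
    }
  }
  qsort(found, (size_t) n, sizeof found[0], cmp_int);
  return n;
}
```
@end verbatim

Scanning @b{"X..XXO...."} with this sketch stops after the fourth symbol
and reports patterns 1, 2 and 4, matching the trace above.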

@node Building the DFA
@section Building the DFA

The most interesting point is the building of the minimal DFA
recognizing a given set of patterns.
To insert a new
pattern into an already existing DFA one must completely rebuild
the DFA: the principle is to build the minimal CDFA recognizing
the new pattern, and then to replace the original CDFA with its
@dfn{synchronised product} by the new one.

We first give a formal definition:
Let @b{L} be the left CDFA and @b{R} be the right one.
Let @b{B} be the @dfn{synchronised product} of @b{L} by @b{R}.
Its states are the couples @b{(l,r)} where @b{l} is a state of @b{L} and 
@b{r} is a state of @b{R}.
The state @b{(0,0)} is the error state of @b{B} and the state
@b{(1,1)} is its initial state.
To each couple @b{(l,r)} we associate the union of the patterns
recognized in @b{l} and those recognized in @b{r}.
The transition set of @b{B} is the set of transitions
@b{(l1,r1)---a--->(l2,r2)} for
each symbol @b{'a'} such that both @b{l1---a--->l2} in @b{L} 
and @b{r1---a--->r2} in @b{R}.

The maximal number of states of @b{B} is the product of the
numbers of states of @b{L} and @b{R}, but almost all of these states
are unreachable from the initial state @b{(1,1)}.

The algorithm used in function '@code{sync_product()}' builds
the minimal product DFA by keeping only the reachable states.
It recursively scans the product CDFA by following simultaneously
the transitions of @b{L} and @b{R}. A hash table
(@code{gtest}) is used to check whether a state @b{(l,r)} has
already been reached; the reachable states are remapped onto
a new DFA. The CDFA thus obtained is minimal and recognizes the
union of the two pattern sets.

@ifnotinfo
For example, these two CDFAs


@image{sync-prod1}

give the following one by synchronization product:


@image{sync-prod2}
@end ifnotinfo
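
The reachable-state construction described in this section can be sketched
as below. This is an illustration under simplifying assumptions, not the
actual @code{sync_product()}: the type @code{cdfa_t} and the functions
@code{visit}, @code{sync_product_sketch} and @code{chain} are hypothetical,
a plain lookup table replaces the hash table, and attributes are kept as a
bit mask with one bit per pattern.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define MAX_STATES 64

typedef struct {
  int n;                       /* number of states, error state included */
  int next[MAX_STATES][4];     /* columns: '.', 'O', 'X', '#' */
  unsigned att[MAX_STATES];    /* bit k set: pattern k recognized here */
} cdfa_t;

/* seen[l][r] is the index of product state (l,r), or -1 if not reached. */
static int seen[MAX_STATES][MAX_STATES];

/* Number the couple (l,r) if it is new, then follow L and R together. */
static int visit(const cdfa_t *L, const cdfa_t *R, cdfa_t *out,
                 int l, int r)
{
  int c, id;
  if (seen[l][r] >= 0)
    return seen[l][r];
  assert(out->n < MAX_STATES);
  id = out->n++;
  seen[l][r] = id;
  out->att[id] = L->att[l] | R->att[r];   /* union of both attributes */
  for (c = 0; c < 4; c++)
    out->next[id][c] = visit(L, R, out, L->next[l][c], R->next[r][c]);
  return id;
}

void sync_product_sketch(const cdfa_t *L, const cdfa_t *R, cdfa_t *out)
{
  memset(seen, -1, sizeof seen);
  memset(out, 0, sizeof *out);
  visit(L, R, out, 0, 0);      /* error state (0,0) gets index 0 */
  visit(L, R, out, 1, 1);      /* initial state (1,1) gets index 1 */
}

/* Helper for the example: chain CDFA for one linear pattern.  '?' is any
 * on-board value, 'o' is "not X"; other characters are literal. */
void chain(cdfa_t *d, const char *pat, unsigned bit)
{
  static const char symbols[] = ".OX#";
  int len = (int) strlen(pat), s, c;
  assert(len + 2 <= MAX_STATES);
  memset(d, 0, sizeof *d);
  d->n = len + 2;
  for (s = 1; s <= len; s++)
    for (c = 0; c < 4; c++) {
      char p = pat[s - 1], v = symbols[c];
      int ok = (p == '?') ? (v != '#')
             : (p == 'o') ? (v == '.' || v == 'O')
             : (p == v);
      d->next[s][c] = ok ? s + 1 : 0;
    }
  d->att[len + 1] = bit;
}
```
@end verbatim

Applied to the chain automata of @b{"????..X"} and @b{"XXo"}, only eleven
couples besides the error state are ever reached, far fewer than the
product of the two state counts.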

It is possible to construct a special pattern database that
generates an "explosive" automaton: the size of the DFA is in
the worst case exponential in the number of patterns it
recognizes.  But this does not occur in practical situations:
the DFA size tends to be @dfn{stable}.  By @dfn{stable} we mean that if we
add a pattern which greatly increases the size of the DFA, it
also increases the chance that the next added pattern does not
increase its size at all.  Nevertheless there are many ways to
reduce the size of the DFA. Good compression methods are
explained in @cite{Aho, Ravi Sethi, Ullman "COMPILERS:
Principles, Techniques and Tools" chapter Optimization of
DFA-based pattern matchers}.

@node Incremental Algorithm
@section Incremental Algorithm

The incremental version of the DFA pattern matcher is not yet implemented
in GNU Go but we explain here how it will work.
By definition of a deterministic automaton, scanning the same
string will reach the same states every time.

Each state reached during pattern matching is stored in a stack,
@code{top_stack[i][j]} and @code{state_stack[i][j][stack_idx]}.
We use one stack per intersection @code{(i,j)}.  A precomputed reverse
path list gives, for each offset @code{(x,y)} between two board
intersections, its position @code{reverse(x,y)} in the spiral scan path
starting from @code{(0,0)}.

When a new stone is put on the board at @code{(lx,ly)}, the only work
of the pattern matcher is:

@example
@group
@code{
 for(each stone on the board at (i,j))
    if(reverse(lx-i,ly-j) < top_stack[i][j])
      @{
         begin the dfa scan from the state
         state_stack[i][j][reverse(lx-i,ly-j)];
      @}
}
@end group
@end example

In most situations @code{reverse(lx-i,ly-j)} will be less than
@code{top_stack[i][j]}. This should speed up pattern matching considerably.
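
Since the incremental matcher is not implemented, the following is only a
sketch of the idea for a single scan path, reusing the example automaton of
@ref{Pattern matching with DFA}. All names (@code{scan_stack},
@code{rescan}, @code{full_scan}, @code{update}) are hypothetical, and the
changed position is passed directly as a path index instead of being looked
up with @code{reverse()}.

@verbatim
```c
#include <assert.h>
#include <string.h>

#define PATH_LEN 4

/* Example automaton; columns are '.', 'O', 'X', '#'; row 0 is the error. */
static const int next[13][4] = {
  { 0,  0,  0, 0}, { 0,  0,  2, 0}, { 3, 10, 10, 0}, { 4,  7,  9, 0},
  { 5,  5,  6, 0}, { 0,  0,  0, 0}, { 0,  0,  0, 0}, { 5,  5,  8, 0},
  { 0,  0,  0, 0}, { 5,  5,  5, 0}, {11, 11,  9, 0}, { 5,  5, 12, 0},
  { 0,  0,  0, 0}
};

typedef struct {
  int state[PATH_LEN + 1];   /* state[k]: state reached after k symbols */
  int top;                   /* number of symbols actually consumed */
} scan_stack;

/* Scan text from path position `from`, starting in the saved state. */
void rescan(scan_stack *st, const char *text, int from)
{
  static const char symbols[] = ".OX#";
  int k;
  assert(from >= 0 && from <= st->top);
  assert((int) strlen(text) >= PATH_LEN);
  for (k = from; k < PATH_LEN && st->state[k] != 0; k++) {
    const char *p = strchr(symbols, text[k]);
    assert(p != NULL);
    st->state[k + 1] = next[st->state[k]][p - symbols];
  }
  st->top = k;
}

/* Scan the whole path from the start state. */
void full_scan(scan_stack *st, const char *text)
{
  st->state[0] = 1;
  st->top = 0;
  rescan(st, text, 0);
}

/* The value at path position pos changed.  Determinism guarantees that
 * the states before pos are still valid, so only the suffix is redone;
 * if the earlier scan never got as far as pos, nothing changes. */
void update(scan_stack *st, const char *text, int pos)
{
  if (pos < st->top)
    rescan(st, text, pos);
}
```
@end verbatim

After a change, @code{update} leaves the stack in the same condition as a
fresh @code{full_scan} of the new contents, while reading only the part of
the path at or after the changed position.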

@node DFA Optimizations
@section Some DFA Optimizations

The DFA is constructed to minimize jumps in memory, making some
assumptions about the frequencies of the values: the EMPTY
value is supposed to appear often on the board, so the '.'
transitions are almost always successors in memory.  The
OUT_BOARD values are supposed to be rare, so '#' transitions will
almost always imply a big jump.