AI techniques for the game of Go
Erik van der Werf
Universiteit Maastricht / ReSound Algorithm R&D
[Board diagrams: example positions with numerically evaluated moves, values such as +9, +3, -1, -3, -9]
Contents

- Introduction
- Searching techniques
  - The Capture Game
  - Solving Go on Small Boards
- Learning techniques
  - Move Prediction
  - Learning to Score
  - Predicting Life & Death
  - Estimating Potential Territory
- Summary of results
- Conclusions
The game of Go

Deceivingly simple rules:
- Black and White move in turns
- A move places a stone on the board
- Surrounded stones are captured
- Direct repetition is forbidden (Ko rule)
- The game is over when both players pass
- The player controlling most intersections wins
Some basic terminology

- Block: connected stones of one colour (no diagonal connections)
- Liberty: adjacent empty intersection
- Eye: surrounded region providing a safe liberty
- Group: stones of one colour controlling a local region
- Alive: group that cannot be captured
- Dead: group that can eventually be captured
Computer Go

- Even the best Go programs have no chance against strong amateurs
- Human players are superior in areas such as:
  - pattern recognition
  - spatial reasoning
  - learning
Playing strength

[Chart: the human rank scale runs from 20 kyu (student, weak) through 1 kyu, then 1 to 9 dan (master) and professional level (strong); adjacent ranks differ by roughly one handicap stone. Computer programs sit at the weak kyu end, about 29 handicap stones below the top.]
Problem statement

How can Artificial Intelligence techniques be used to improve the strength of Go programs?

We focused on searching techniques & learning techniques.
Searching techniques

- Very successful for other board games
- Evaluate positions by 'thinking ahead'
- Research topics:
  - Recognizing positions 'that are irrelevant'
  - Fast heuristic evaluations
  - Provably correct knowledge
  - Move ordering (the best moves first)
  - Re-use of partial results from the search process
The Capture Game

- Simplified version of Go: the first to capture a stone wins the game; passing is not allowed
- Detecting final positions is trivial (unlike normal Go)
- Search method:
  - Iterative deepening
  - Principal variation search
  - Enhanced transposition table
  - Move ordering using shared tables for both colours for the killer and history heuristics
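The combination of iterative deepening and principal variation search can be sketched in a few lines. This is a generic negamax illustration on a hypothetical toy game tree, not the thesis program; the transposition table and the killer/history move ordering are omitted for brevity.

```python
def pvs(node, depth, alpha, beta, children, value):
    """Negamax principal variation search: the first child is searched with
    a full window, later children with a null window, re-searching only on
    a fail-high."""
    kids = children.get(node, [])
    if depth == 0 or not kids:
        return value[node]
    first = True
    for child in kids:
        if first:
            score = -pvs(child, depth - 1, -beta, -alpha, children, value)
            first = False
        else:
            # Null-window probe just above alpha.
            score = -pvs(child, depth - 1, -alpha - 1, -alpha, children, value)
            if alpha < score < beta:
                # Fail-high: re-search with the full window.
                score = -pvs(child, depth - 1, -beta, -score, children, value)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return alpha

def iterative_deepening(root, max_depth, children, value):
    """Search to increasing depths; in a real program the shallow results
    would seed the move ordering and the transposition table."""
    best = None
    for d in range(1, max_depth + 1):
        best = pvs(root, d, float('-inf'), float('inf'), children, value)
    return best

# Toy tree: values are from the perspective of the player to move.
children = {'r': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
value = {'r': 0, 'a': 4, 'b': 0, 'a1': 3, 'a2': 5, 'b1': -1, 'b2': 2}
print(iterative_deepening('r', 2, children, value))  # -> 3
```

The null-window probes pay off only with good move ordering, which is why the killer and history tables mentioned above matter so much in practice.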
Heuristic evaluation for the capture game

Based on four principles:
1. Maximize liberties
2. Maximize territory
3. Connect stones
4. Make eyes

- Low-order liberties (max. distance 3)
- Euler number (objects - holes)
- Fast computation using a bit-board representation
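The Euler number above can be computed locally by counting 2x2 "bit quads" (Gray's method), which is why it combines well with a bit-board representation. A plain-Python sketch for 4-connected blocks; the list-of-strings board encoding is illustrative, not the thesis representation.

```python
def euler_number(grid):
    """Euler number (objects minus holes) for 4-connected objects via
    Gray's bit-quad counts: E = (Q1 - Q3 + 2*QD) / 4, taken over all
    2x2 windows of the zero-padded grid."""
    h, w = len(grid), len(grid[0])

    def at(r, c):
        return 1 if 0 <= r < h and 0 <= c < w and grid[r][c] == '#' else 0

    q1 = q3 = qd = 0
    for r in range(-1, h):
        for c in range(-1, w):
            quad = (at(r, c), at(r, c + 1), at(r + 1, c), at(r + 1, c + 1))
            s = sum(quad)
            if s == 1:
                q1 += 1
            elif s == 3:
                q3 += 1
            elif s == 2 and quad in ((1, 0, 0, 1), (0, 1, 1, 0)):
                qd += 1  # diagonally opposed pair
    return (q1 - q3 + 2 * qd) // 4

# One stone: one object, no holes.
print(euler_number(['#']))                    # -> 1
# A ring: one object enclosing one hole.
print(euler_number(['###', '#.#', '###']))    # -> 0
```

On a bitboard the quad counts reduce to shifts, ANDs, and population counts, so the whole evaluation term costs only a handful of word operations.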
Solutions for the Capture Game

- All boards up to 5x5 were solved
- Winner decided by board-size parity
- Will initiative take over at 6x6?

Board | Winner | Depth | Time (s) | Nodes (log10)
2x2   | W      | 4     | 0        | 1.8
3x3   | B      | 7     | 0        | 3.2
4x4   | W      | 14    | 1        | 5.7
5x5   | B      | 19    | 395      | 8.4
6x6   | ?      | >23   | >10^6    | >12

[Diagrams: solution for 5x5 (Black wins); solution for 4x4 (White wins)]
Solutions for the Capture Game on 6x6

Starting position | Stable             | Crosscut
Winner            | Black              | Black
Depth             | 26 (+5)            | 15 (+4)
Nodes (log10)     | 11                 | 8.0
Time (s)          | 8.3x10^5 (10 days) | 185

Initiative takes over at 6x6.
Solving Go on Small Boards

- Iterative deepening principal variation search
- Enhanced transposition table
- Exploit board symmetry
- Internal unconditional bounds
- Effective move ordering

Evaluation function:
- Heuristic component: similar to the capture game
- Provably correct component: Benson's algorithm for recognizing unconditional life, extended with detection of unconditional territory
Recognizing Unconditional Territory

1. Find regions surrounded by unconditionally alive stones of one colour
2. Find the interior of the regions (eyespace)
3. Remove false eyes
4. Contract the eyespace around defender stones
5. Count maximum sure liberties (MSL)

MSL < 2: unconditional territory. Otherwise: play it out.
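Step 1 above can be sketched as a flood fill: a defender-enclosed region is a maximal connected set of non-defender points, so its border consists only of defender stones and board edges. This sketch skips the requirement that the surrounding stones themselves be unconditionally alive (Benson's condition); the 'B'/'W'/'.' board encoding is an assumption for illustration.

```python
def enclosed_regions(board, defender):
    """Maximal 4-connected regions of non-defender points. Each region is
    bordered only by defender stones and/or the board edge."""
    h, w = len(board), len(board[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or board[r][c] == defender:
                continue
            # Flood-fill one region of empty and opponent points.
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and board[ny][nx] != defender
                            and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(sorted(region))
    return regions

print(enclosed_regions(["BBB",
                        "B.B",
                        "BBB"], 'B'))  # -> [[(1, 1)]]
```

Steps 2 to 5 then operate on each region separately, shrinking it to its eyespace before counting sure liberties.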
Solutions for Small Boards

Board | Result | Depth | Time    | Nodes (log10)
2x2   | draw   | 5     | n.a.    | 2.1
3x3   | B+9    | 11    | n.a.    | 3.5
4x4   | B+2    | 21    | 3.3 (s) | 5.8
5x5   | B+25   | 23    | 2.7 (h) | 9.2
Value of opening moves on 5x5

[Board diagram: opening moves (3,3), (3,2), and (2,2) compared]
Learning techniques

- Successful in several related domains
- Heuristic knowledge can be 'learned' from analysis of human games
- Research topics:
  - Representation & generalization
  - Learn maximally from a limited number of examples
  - Pros and cons of different architectures
  - Clever use of available domain knowledge
Move prediction

- Many moves in Go conform to local patterns which can be played almost reflexively
- Train an MLP network to rank moves
- Use move pairs {expert, random} extracted from human game records
- Training attempts to rank expert moves first
The training error is defined on each move pair:

    Error = (v_r - v_e)^2   if v_e < v_r
            0               otherwise

where v_e is the network's score for the expert move and v_r its score for the random move.
[Plot: prediction error (%) as a function of ROI surface (points)]
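The pairwise error above penalizes the network only when a random move is scored at least as high as the expert move, so a correct ranking costs nothing. A minimal sketch of that training signal, with illustrative score values:

```python
def pair_error(v_e, v_r):
    """Squared penalty when the expert move (score v_e) is not ranked
    above the random move (score v_r); zero when the ranking is correct."""
    return (v_r - v_e) ** 2 if v_e < v_r else 0.0

print(pair_error(0.9, 0.2))  # expert ranked higher: 0.0
print(pair_error(0.2, 0.9))  # random ranked higher: positive penalty
```

Because only the relative order of the two scores matters, the network learns a ranking rather than an absolute evaluation, which is exactly what move ordering needs.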
Move Prediction - Representation

Selection of raw features:
- Edge
- Liberties
- Captures
- Last move
- Stones
- Ko
- Liberties after
- Nearest stones

- Remove symmetry by canonical ordering & colour reversal
- High-dimensional representation suffers from the curse of dimensionality
  => Apply linear feature extraction to reduce dimensionality
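Canonical ordering can be sketched as taking the lexicographically smallest of a pattern's eight rotations and reflections (optionally after colour reversal), so that symmetric patterns share one representation. The tuple-of-strings encoding with 'B'/'W'/'.' is illustrative, not the thesis format.

```python
def transforms(p):
    """Yield the 8 dihedral transforms of a square pattern."""
    rows = [list(r) for r in p]
    for _ in range(4):
        rows = [list(r) for r in zip(*rows[::-1])]   # rotate 90 degrees
        yield tuple(''.join(r) for r in rows)
        yield tuple(''.join(r) for r in rows[::-1])  # plus a vertical flip

def canonical(p, reverse_colours=False):
    """Lexicographically smallest transform; colour reversal lets Black
    and White patterns share one canonical form."""
    if reverse_colours:
        swap = {'B': 'W', 'W': 'B', '.': '.'}
        p = tuple(''.join(swap[c] for c in row) for row in p)
    return min(transforms(p))

# All four corner placements of a stone map to the same canonical form.
print(canonical(('B.', '..')) == canonical(('..', '.B')))  # -> True
```

Folding out the 8 symmetries and 2 colourings shrinks the pattern space by up to a factor of 16, which directly eases the curse of dimensionality mentioned above.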
Move Prediction - Feature Extraction

- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
  - Standard techniques, sub-optimal for ranking
- Move-Pair Analysis (MPA)
  - Linear projection maximizing the expected quadratic distance between pairs
  - Weakness: ignores global features
- Modified Eigenspace Separation Transform (MEST)
  - Linear projection on the eigenvectors with the largest absolute eigenvalues of the correlation difference matrix

Good results using a combination of MEST & MPA.
Human & Computer Performance Compared

       | Game 1 | Game 2 | Game 3 | Average
3 dan  | 96.7   | 91.5   | 89.5   | 92.4
2 dan  | 95.8   | 95.0   | 97.0   | 95.9
2 kyu  | 95.0   | 91.5   | 92.5   | 92.9
MP*    | 90.0   | 89.4   | 89.5   | 89.6
2 kyu  | 87.5   | 90.8   | n.a.   | 89.3
5 kyu  | 87.5   | 84.4   | 85.0   | 85.5
8 kyu  | 87.5   | 85.1   | 86.5   | 86.3
13 kyu | 83.3   | 75.2   | 82.7   | 80.2
14 kyu | 76.7   | 83.0   | 80.5   | 80.2
15 kyu | 80.0   | 73.8   | 82.0   | 78.4

(MP* = the move-prediction system)
Black must choose between two red intersections
Performance on professional 19x19 games

Ranking | Perf.
First   | 25 %
Top-3   | 45 %
Top-20  | 80 %

[Plot: cumulative performance (%) against the number of ranked moves]
Learning to Score

Using archives of (online) Go servers, such as NNGS, for ML is non-trivial because of:
1. Missing information: only a single numeric result is given; the status of individual board points is not available
2. Unfinished games: humans resign early or do not even finish the game at all
3. Bad moves

To overcome 1 & 2, we need reliable final scores:
- Large dataset created: 18k labelled final 9x9 positions
- Several tricks were used to identify dubious scores
- A few thousand positions were scored/verified manually
The scoring method

1. Classify life & death for all blocks
2. Remove dead blocks
3. Mark empty intersections using flood fills or distance to the nearest remaining colour
4. (Optional) Recursively update the representation to take adjacent block status into account; return to 1
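Steps 2 and 3 can be sketched with a multi-source breadth-first search: after the dead stones have been removed, each empty point is awarded to the colour of its nearest remaining stone, and points equidistant from both colours stay neutral. The board encoding and the neutral marker 'n' are illustrative assumptions.

```python
def area_score(board):
    """Area score of a final position (dead stones already removed):
    positive means Black is ahead; no komi is applied."""
    h, w = len(board), len(board[0])
    owner = [list(row) for row in board]
    # All stones seed the BFS simultaneously.
    frontier = [(r, c) for r in range(h) for c in range(w)
                if board[r][c] in 'BW']
    while frontier:
        claims = {}
        for r, c in frontier:
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and owner[nr][nc] == '.':
                    claims.setdefault((nr, nc), set()).add(owner[r][c])
        frontier = []
        for (nr, nc), colours in claims.items():
            # A point reached by one colour belongs to it; ties are neutral.
            owner[nr][nc] = colours.pop() if len(colours) == 1 else 'n'
            frontier.append((nr, nc))
    black = sum(row.count('B') for row in owner)
    white = sum(row.count('W') for row in owner)
    return black - white

print(area_score(["B.",
                  ".."]))   # only Black remains: -> 4
print(area_score(["B..",
                  "...",
                  "..W"]))  # symmetric position: -> 0
```

Step 1, the life-and-death classification that decides which stones to remove, is exactly what the learned block classifier of the next slides provides.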
Blocks to Classify

For final positions there are 3 types of blocks:
1. Alive (O): at the border of own territory
2. Dead (X): inside the opponent's territory
3. Irrelevant (?): removal does not change the area score

We only train on blocks of type 1 and 2!

[Board diagram: blocks labelled O, X, and ?]
Representation of the blocks

Direct features of the block:
- Size
- Perimeter
- Adjacent opponent stones
- 1st, 2nd, 3rd-order liberties
- Protected liberties
- Auto-atari liberties
- Adjacent opponent blocks
- Local majority (MD < 3)
- Centre of mass
- Bounding box size

Adjacent fully accessible CERs:
- Number of regions
- Size
- Perimeter
- Split points

Adjacent partially accessible CERs:
- Number of partially accessible regions
- Accessible size
- Accessible perimeter
- Inaccessible size
- Inaccessible perimeter
- Inaccessible split points

Disputed territory:
- Direct liberties of the block in disputed territory
- Liberties of all friendly blocks in disputed territory
- Liberties of all enemy blocks in disputed territory

Directly adjacent eyespace:
- Size
- Perimeter

Optimistic chain:
- Number of blocks
- Size
- Perimeter
- Split points
- Adjacent CERs
- Adjacent CERs with eyespace
- Adjacent CERs, fully accessible from at least 1 block
- Size of adjacent eyespace
- Perimeter of adjacent eyespace
- External opponent liberties

Opponent blocks (3x):
(1) Weakest directly adjacent opponent block (weakest = block with the fewest liberties)
(2) 2nd weakest directly adjacent opponent block
(3) Weakest opponent block adjacent to or sharing liberties with the block's optimistic chain
For each:
- Perimeter
- Liberties
- Shared liberties
- Split points
- Perimeter of adjacent eyespace

Recursive features:
- Predicted value of the strongest adjacent friendly block
- Predicted value of the weakest adjacent opponent block
- Predicted value of the second weakest adjacent opponent block
- Average predicted value of the weakest opponent block's optimistic chain
- Adjacent eyespace size of the weakest opponent block's optimistic chain
- Adjacent eyespace perimeter of the weakest opponent block's optimistic chain
Scoring Performance

Blocks (direct/recursive classification):

Training size (blocks) | Direct error (%) | 2-step error (%) | 3-step error (%) | 4-step error (%)
1,000                  | 1.93             | 1.60             | 1.52             | 1.48
10,000                 | 1.09             | 0.76             | 0.74             | 0.72
100,000                | 0.68             | 0.43             | 0.38             | 0.37

Full board (4-step recursive classification):
- Incorrect score: 1.1%, better than the average rated NNGS player (~7 kyu)
- Incorrect winner: 0.5%, comparable to the average NNGS player
- Average absolute score difference: 0.15 points
Life & Death during the game

- Predict whether blocks of stones can be captured
- Perfect predictions are not possible in non-final positions!
- Approximate the a posteriori probability that a block will be alive at the end of the game
- 4 block types:
  - First 3 types identified from the final position (as before)
  - 4th type: blocks captured during the game -> dead
  - Irrelevant blocks not used during training!
- Representation extended with 5 additional features: player to move, ko, distance to ko, number of black/white stones on the board

[Board diagram: black blocks at 50% predicted aliveness]
Performance over the game

MLP, 25 hidden units, 175,000 training examples.
Average prediction error: 11.7%
Estimating Potential Territory

Why estimate territory?
1. For predicting the score (potential territory)
   - Main purpose: to build an evaluation function
   - May also be used to adjust strategy (e.g., play safe when ahead)
2. To detect safe regions (secure territory)
   - Main purpose: forward pruning (risky unless provably correct)

Our main focus is on (1) potential territory.

We investigate:
- Direct methods, known or derived from the literature
- ML methods, trained on game records
- Enhancements with (heuristic) knowledge of L&D
Direct methods

1. Explicit control
2. Direct control
3. Distance-based control
4. Influence-based control (~ numerical dilations)
5. Bouzy's method (numerical dilations + erosions)
6. Combinations 5+3 or 5+4

Enhancements use knowledge of life & death to remove dead stones (or reverse their colour).
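Bouzy's method (item 5) can be sketched with his numerical dilation and erosion operators: stones start at +128 (Black) or -128 (White), the values are dilated 5 times and eroded 21 times, and the sign of the result marks estimated territory. The operator details and the 5/21 schedule follow Bouzy's published recipe as best understood here; the board encoding is an illustrative assumption.

```python
def neighbours(r, c, h, w):
    return [(nr, nc) for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= nr < h and 0 <= nc < w]

def dilate(v):
    """Grow influence: a point gains one per like-signed neighbour, but
    only if no neighbour carries the opposite sign."""
    h, w = len(v), len(v[0])
    out = [row[:] for row in v]
    for r in range(h):
        for c in range(w):
            ns = [v[nr][nc] for nr, nc in neighbours(r, c, h, w)]
            if v[r][c] >= 0 and not any(n < 0 for n in ns):
                out[r][c] += sum(1 for n in ns if n > 0)
            if v[r][c] <= 0 and not any(n > 0 for n in ns):
                out[r][c] -= sum(1 for n in ns if n < 0)
    return out

def erode(v):
    """Shrink influence: a nonzero point loses one per neighbour that does
    not share its sign, clamped at zero."""
    h, w = len(v), len(v[0])
    out = [row[:] for row in v]
    for r in range(h):
        for c in range(w):
            ns = [v[nr][nc] for nr, nc in neighbours(r, c, h, w)]
            if v[r][c] > 0:
                out[r][c] = max(0, v[r][c] - sum(1 for n in ns if n <= 0))
            elif v[r][c] < 0:
                out[r][c] = min(0, v[r][c] + sum(1 for n in ns if n >= 0))
    return out

def bouzy(board, dilations=5, erosions=21):
    h, w = len(board), len(board[0])
    v = [[128 if board[r][c] == 'B' else -128 if board[r][c] == 'W' else 0
          for c in range(w)] for r in range(h)]
    for _ in range(dilations):
        v = dilate(v)
    for _ in range(erosions):
        v = erode(v)
    return v
```

On a toy 3x3 board with a Black and a White stone in opposite corners, the result is positive near Black, negative near White, and zero on the contested centre line.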
ML methods

Simple representation, per intersection in the ROI:
- Colour {+1 black, -1 white, 0 empty}

Enhanced representation, per intersection in the ROI:
- Colour x Prob(Alive)
- Edge
- Colour of nearest stone
- Colour of nearest living stone
- Prob(Alive) obtained from a pre-trained MLP

[Diagram: features mapped to a predicted colour between +1 (sure black), 0 (neutral), and -1 (sure white)]
Summary: Searching Techniques

- The capture game
  - Simplified Go rules (who captures the first stone wins)
  - Boards up to 6x6 solved
- Go on small boards
  - Normal Go rules
  - First program in the world to have solved 5x5 Go
- Perfect solutions up to ~30 intersections
- Heuristic knowledge required for larger boards
Summary: Learning Techniques 1

- Move prediction
  - Very good results (strong kyu level)
  - Strong play is possible with a limited selection of moves
- Scoring final positions
  - Excellent classification
  - Reliable training data
Summary: Learning Techniques 2

- Predicting life and death
  - Good results
  - Most important ingredient for accurate evaluation of positions during the game
- Estimating potential territory
  - Comparison of non-learning and learning methods
  - Best results with learning methods
Conclusions

- Knowledge is the most important ingredient for improving Go programs
- Searching techniques
  - Provably correct knowledge sufficient for solving small problems up to ~30 intersections
  - Heuristic knowledge essential for larger problems
- Learning techniques
  - Heuristic knowledge learned quite well from games
  - Learned heuristic knowledge at least at the level of reasonably strong kyu players