
Mastering the game of go with deep neural networks and tree search


Page 1: Mastering the game of go with deep neural networks and tree search


Mastering the game of Go with deep neural networks and tree search

Speaker: San-Feng Chang

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. Nature, 529(7587):484–489, 2016.

Page 2: Mastering the game of go with deep neural networks and tree search


Outline

• AI in Game Playing
• Previous Work of Go Research
• Architecture of AlphaGo
• AlphaGo’s methods
• The playing strength of AlphaGo
• Conclusion

Page 3: Mastering the game of go with deep neural networks and tree search


AI in Game Playing (1/3)

• Game playing is a long-standing benchmark problem for measuring the performance of an AI.

• One classification of outcomes for an AI test is:
  – Optimal: it is not possible to perform better
  – Strong super-human: performs better than all humans
  – Super-human: performs better than most humans
  – Sub-human: performs worse than most humans

Page 4: Mastering the game of go with deep neural networks and tree search


AI in Game Playing (2/3)

Game    Players                           Branching factor (b)   Depth (d)   Complexity (b^d)
Chess   Deep Blue vs. Kasparov (1997)     35                     80          35^80 ≈ 10^123
Go      AlphaGo vs. Lee Sedol (2016)      250                    150         250^150 ≈ 10^360

Evolution of game-tree search:
Brute force → Minimax & alpha-beta pruning → MCTS → AlphaGo’s method

Page 5: Mastering the game of go with deep neural networks and tree search


AI in Game Playing (3/3)

• Minimax & alpha-beta pruning: even with pruning, the search complexity of Go is still far too high.

https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/AB_pruning.svg/1280px-AB_pruning.svg.png?1458451165542
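For reference, here is a minimal sketch of minimax with alpha-beta pruning, assuming a hypothetical game-state API with `children()`, `is_terminal()`, and `evaluate()`; it illustrates the classic algorithm, not AlphaGo’s search:

```python
def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Minimax with alpha-beta pruning; skips branches that cannot change the result."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()                  # heuristic value of the position
    if maximizing:
        value = float("-inf")
        for child in state.children():
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                    # beta cutoff: opponent avoids this line
                break
        return value
    else:
        value = float("inf")
        for child in state.children():
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:                    # alpha cutoff
                break
        return value
```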

Page 6: Mastering the game of go with deep neural networks and tree search


Previous Work of Go Research (1/4)

• Monte Carlo rollouts search to maximum depth without branching at all, by sampling long sequences of actions for both players from a policy p.

• Monte Carlo tree search (MCTS) uses Monte Carlo rollouts to estimate the value of each state in a search tree.
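To make the rollout idea concrete, here is a minimal sketch of estimating a state’s value from Monte Carlo rollouts; the `state` API (`is_terminal()`, `play()`, `winner()`, `to_move`) is a hypothetical stand-in:

```python
import random

def rollout_value(state, policy, n_rollouts=100):
    """Estimate a state's value by sampling complete games from a fast policy p."""
    wins = 0
    for _ in range(n_rollouts):
        s = state
        while not s.is_terminal():        # search to maximum depth, no branching
            s = s.play(policy(s))         # both players sample actions from p
        wins += (s.winner() == state.to_move)
    return wins / n_rollouts              # empirical winning rate
```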

Page 7: Mastering the game of go with deep neural networks and tree search


Previous Work of Go Research (2/4)

• Monte Carlo Tree Search:

[Figure: a small two-player game tree labelled with wins/visits counts (root 2/3; children 1/1 and 1/2; leaves 1/1 and 0/1), with levels alternating between Player 1 and Player 2. The left tree shows the Selection phase (descending from the root, breaking ties randomly); the right tree shows the Expansion phase, which adds a fresh 0/0 node.]

Page 8: Mastering the game of go with deep neural networks and tree search


Previous Work of Go Research (3/4)

• Monte Carlo Tree Search:

[Figure: the same tree during the Simulation phase (a rollout played from the newly expanded 0/0 node) and the Back-Propagation phase, which updates the wins/visits counts along the visited path (root 2/3 → 3/4; the new node 0/0 → 1/1), again alternating Player 1 and Player 2 levels.]
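Putting the four phases together, a compact MCTS sketch is given below; it assumes a hypothetical game-state API (`moves()`, `play()`, `is_terminal()`, `winner()`, `to_move`, `last_move`) and uses the standard UCT rule for selection, which is one concrete way to implement the selection step pictured above:

```python
import math, random

class Node:
    """One tree node carrying the wins/visits statistics shown in the diagram."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.wins, self.visits = [], 0, 0

def mcts(root_state, n_simulations=1000, c=1.4):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend while the node is fully expanded, using UCT
        while node.children and len(node.children) == len(node.state.moves()):
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one untried move as a fresh 0/0 child
        tried = [ch.state.last_move for ch in node.children]
        untried = [m for m in node.state.moves() if m not in tried]
        if untried:
            node.children.append(Node(node.state.play(random.choice(untried)), node))
            node = node.children[-1]
        # 3. Simulation: play a random rollout to the end of the game
        s = node.state
        while not s.is_terminal():
            s = s.play(random.choice(s.moves()))
        winner = s.winner()
        # 4. Back-propagation: update wins/visits along the visited path;
        #    each node's stats are from the perspective of the player who moved into it
        while node is not None:
            node.visits += 1
            node.wins += (winner != node.state.to_move)
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).state.last_move
```

Selection by UCT is one standard choice; AlphaGo replaces both the selection priors and the rollout policy with learned networks, as described next.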

Page 9: Mastering the game of go with deep neural networks and tree search


Previous Work of Go Research (4/4)

• The strongest current Go programs are based on MCTS, enhanced by policies that are trained to predict human expert moves.

• However, prior work has been limited to shallow policies or value functions based on a linear combination of input features.

Page 10: Mastering the game of go with deep neural networks and tree search


Architecture of AlphaGo

Neural Network Training Pipeline

• s: board position; a: a legal move
• p(a|s): probability distribution over legal moves; v(s): scalar value of a position
• Two “brains”: a policy network p(a|s) and a value network v(s)
• Human expert dataset: ~160,000 games from the KGS server (29.4 million positions)

Page 11: Mastering the game of go with deep neural networks and tree search


Convolutional Neural Network (1/2)

[Figure: a regular 3-layer neural network vs. a convolutional neural network]

• Input volume of size W1 × H1 × D1
• Requires four hyperparameters:
  1. Number of filters K (output depth)
  2. Spatial extent F (kernel size)
  3. Stride S
  4. Amount of zero padding P
• Output volume of size W2 × H2 × D2, where:
  W2 = (W1 − F + 2P)/S + 1
  H2 = (H1 − F + 2P)/S + 1
  D2 = K
• Parameter sharing: total weights = (F × F × D1) × K

http://cs231n.github.io/convolutional-networks/

Page 12: Mastering the game of go with deep neural networks and tree search


Convolutional Neural Network (2/2)

• Number of filters K: 2
• Spatial extent F: 3 × 3
• Stride S: 2
• Zero padding P: 1

http://cs231n.github.io/convolutional-networks/
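Plugging these hyperparameters into the output-size formula from the previous slide (the cs231n animation these numbers come from uses a 5 × 5 × 3 input; that input size is an assumption here):

```python
def conv_output_size(w1, h1, d1, k, f, s, p):
    """Output volume W2 x H2 x D2 and weight count for one conv layer."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    weights = (f * f * d1) * k   # parameter sharing: one F x F x D1 filter per output channel
    return (w2, h2, k), weights

# K=2, F=3, S=2, P=1 on an assumed 5 x 5 x 3 input -> 3 x 3 x 2 output, 54 weights
print(conv_output_size(5, 5, 3, k=2, f=3, s=2, p=1))   # ((3, 3, 2), 54)
```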

Page 13: Mastering the game of go with deep neural networks and tree search


AlphaGo’s methods – Trained by Human Expert (1/6)

• Rollout policy pπ:
  – Takes about 2 μs to select an action, but reaches only 24.2% accuracy in predicting expert moves
  – A linear softmax of small pattern features, with weights π

[Figure: a three-input softmax unit computing $n_{1,\text{out}} = \frac{e^{n_{1,\text{in}}}}{e^{n_{1,\text{in}}} + e^{n_{2,\text{in}}} + e^{n_{3,\text{in}}}}$]

https://qph.fs.quoracdn.net/main-qimg-9e2d012ef7cb8b29d2bed14d2975c986
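A minimal sketch of such a linear softmax policy; the feature matrix and weights below are made-up placeholders, not AlphaGo’s actual pattern features:

```python
import numpy as np

def rollout_policy(features, weights):
    """Linear softmax over pattern features: features is an
    (n_actions x n_features) 0/1 matrix, weights is pi."""
    logits = features @ weights              # one linear score per legal action
    exp = np.exp(logits - logits.max())      # numerically stabilized softmax
    return exp / exp.sum()                   # probability of each action

# e.g. 3 candidate moves, 4 binary pattern features each (made-up numbers)
p = rollout_policy(np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]]),
                   np.array([0.5, -0.2, 0.3, 0.1]))
print(p, p.sum())   # probabilities summing to 1
```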

Page 14: Mastering the game of go with deep neural networks and tree search


AlphaGo’s methods – Trained by Human Expert (2/6)

• SL policy pσ:
  – Takes about 3 ms to select an action, with 57.0% accuracy in predicting expert moves
  – A 13-layer convolutional neural network, with weights σ

Input: 19 × 19 board, 48 feature planes
Layer 1: Conv + ReLU, kernel size 5 × 5
Layers 2–12: Conv + ReLU, kernel size 3 × 3
Layer 13: kernel size 1 × 1, 1 filter, softmax over board positions
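A hedged sketch of this 13-layer network in PyTorch; the filter count of 192 and the padding choices that preserve the 19 × 19 board are assumptions consistent with the paper, not the authors’ code:

```python
import torch
import torch.nn as nn

class SLPolicyNet(nn.Module):
    """Sketch of the 13-layer SL policy network (192 filters assumed)."""
    def __init__(self, planes=48, filters=192):
        super().__init__()
        layers = [nn.Conv2d(planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(11):  # layers 2-12: 3x3 conv + ReLU
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(filters, 1, kernel_size=1)]  # layer 13: 1x1, 1 filter
        self.body = nn.Sequential(*layers)

    def forward(self, x):                      # x: (batch, 48, 19, 19)
        logits = self.body(x).flatten(1)       # (batch, 361)
        return torch.softmax(logits, dim=1)    # distribution over board points

p_sigma = SLPolicyNet()
probs = p_sigma(torch.zeros(1, 48, 19, 19))
print(probs.shape, probs.sum().item())         # torch.Size([1, 361]) ~1.0
```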

Page 15: Mastering the game of go with deep neural networks and tree search


AlphaGo’s methods – Reinforcement Learning pρ (3/6)

• Initialize the RL policy network weights from the SL policy: ρ = ρ- = σ
• Play games between the current policy pρ and an opponent pρ- sampled from a pool of previous iterations
• Play each game to the end and observe the reward r
• Update ρ with the policy gradient method
• Add the current pρ to the opponent pool
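A minimal sketch of one policy-gradient (REINFORCE) iteration of this loop; `play_game` and the opponent-pool handling are hypothetical helpers, not AlphaGo’s training code:

```python
import copy, random
import torch

def rl_iteration(policy, opponent_pool, play_game, optimizer):
    """One self-play policy-gradient step. `play_game` is assumed to return
    the current policy's (move-probabilities, action) pairs and outcome z."""
    opponent = random.choice(opponent_pool)         # sample p_rho^- from the pool
    moves, z = play_game(policy, opponent)          # z = +1 for a win, -1 for a loss
    loss = torch.zeros(())
    for probs, action in moves:                     # the current policy's moves only
        loss = loss - torch.log(probs[0, action]) * z   # REINFORCE: grad log p * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Periodically snapshot the current policy into the opponent pool
# (the paper does this every 500 iterations):
# opponent_pool.append(copy.deepcopy(policy))
```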

Page 16: Mastering the game of go with deep neural networks and tree search


AlphaGo’s methods – Value Network vθ (4/6)

• Supervised learning (regression):
  – Used to estimate the winning rate of the current position
  – A 15-layer CNN:
    Input: 19 × 19 board, 48 planes + 1 plane for the current colour
    Layers 1–13: the same as the RL policy network
    Layer 14: fully connected, 256 ReLU units
    Layer 15: fully connected, 1 tanh unit
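A sketch of this 15-layer value network under the same assumptions as the policy-network sketch above (192 filters, board-preserving padding), following the slide’s simplification that layers 1–13 mirror the policy network:

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Sketch of the 15-layer value network v_theta."""
    def __init__(self, planes=49, filters=192):        # 48 planes + 1 current-colour plane
        super().__init__()
        layers = [nn.Conv2d(planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(11):                            # 3x3 conv layers, as in the policy net
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(filters, 1, kernel_size=1), nn.ReLU()]   # layer 13: 1x1 conv
        self.body = nn.Sequential(*layers)
        self.fc1 = nn.Linear(19 * 19, 256)             # layer 14: fully connected, 256 ReLU
        self.fc2 = nn.Linear(256, 1)                   # layer 15: fully connected, 1 tanh

    def forward(self, x):                              # x: (batch, 49, 19, 19)
        h = torch.relu(self.fc1(self.body(x).flatten(1)))
        return torch.tanh(self.fc2(h))                 # v(s) in [-1, 1]
```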

Page 17: Mastering the game of go with deep neural networks and tree search


AlphaGo’s methods – Value Network vθ (5/6)

• Randomly sample an integer U in 1 ~ 450:
  – t = 1 ~ U−1: moves played by the SL policy network pσ
  – t = U: a uniformly random action
  – t = U+1 ~ end of game: moves played by the RL policy network pρ
• Reward: z_t = ± r(s_T), the final game outcome from the current player’s perspective
• Only a single training example (s_{U+1}, z_{U+1}) is added to the data set from each game, since successive positions within a game are strongly correlated.
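The recipe above, as a sketch; `new_game`, `p_sigma`, `p_rho`, and `random_move` are hypothetical helpers, not AlphaGo’s interfaces:

```python
import random

def value_training_example(new_game, p_sigma, p_rho, random_move):
    """Generate one (s_{U+1}, z_{U+1}) training pair from a single game."""
    game, U = new_game(), random.randint(1, 450)
    for t in range(1, U):                     # t = 1 .. U-1: SL policy moves
        game.play(p_sigma(game.state()))
    game.play(random_move(game.state()))      # t = U: uniformly random action
    s_train = game.state()                    # s_{U+1}, the position to train on
    while not game.over():                    # t = U+1 .. T: RL policy self-play
        game.play(p_rho(game.state()))
    z = game.outcome_for(s_train)             # z_{U+1} = +/- r(s_T)
    return s_train, z
```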

Page 18: Mastering the game of go with deep neural networks and tree search


AlphaGo’s methods – Searching (6/6)

• Q: action value, the accumulated winning scores (exploitation)
• u(P): an upper-confidence-bound term that trades off exploration vs. exploitation
• P: prior probability, computed with pσ (the SL policy performed better here than the RL policy)

More: see the searching formulas in Formula (2/2).
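A sketch of the selection rule, a_t = argmax_a (Q(s,a) + u(s,a)); the `node.edges` structure is assumed, and the exploration constant follows the paper’s reported c_puct = 5:

```python
import math

def select_action(node, c_puct=5.0):
    """One PUCT selection step over a node whose `edges` dict maps each
    action to an edge with fields Q, N, P (assumed structure)."""
    sqrt_total = math.sqrt(sum(e.N for e in node.edges.values()))
    def score(e):
        u = c_puct * e.P * sqrt_total / (1 + e.N)   # prior-weighted exploration bonus
        return e.Q + u
    return max(node.edges, key=lambda a: score(node.edges[a]))
```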

Page 19: Mastering the game of go with deep neural networks and tree search


The playing strength of AlphaGo

Page 20: Mastering the game of go with deep neural networks and tree search


Conclusion

• Reaching a milestone is the beginning of the next milestone.

• Stay hungry, stay foolish!

Page 21: Mastering the game of go with deep neural networks and tree search


References (1/2)

• Nature:
  – Mastering the game of Go with deep neural networks and tree search
• Mark Chang:
  – http://www.slideshare.net/ckmarkohchang/alphago-in-depth
• CNN:
  – http://cs231n.github.io/convolutional-networks/

Page 23: Mastering the game of go with deep neural networks and tree search


End

Thank You!

Page 24: Mastering the game of go with deep neural networks and tree search


Formula (1/2)

• Policy Network: classification (supervised learning)

$$\Delta\sigma \propto \frac{1}{m}\sum_{k=1}^{m}\frac{\partial \log p_\sigma(a^k \mid s^k)}{\partial \sigma}$$

• Policy Network: reinforcement learning

$$\Delta\rho \propto \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T^i}\frac{\partial \log p_\rho(a_t^i \mid s_t^i)}{\partial \rho}\,\bigl(z_t^i - v(s_t^i)\bigr)$$

• Value Network: regression

$$\Delta\theta \propto \frac{1}{m}\sum_{k=1}^{m}\bigl(z^k - v_\theta(s^k)\bigr)\,\frac{\partial v_\theta(s^k)}{\partial \theta}$$

Page 25: Mastering the game of go with deep neural networks and tree search


Formula (2/2)

• Searching:

$$a_t = \operatorname*{argmax}_{a}\bigl(Q(s_t, a) + u(s_t, a)\bigr)$$

$$u(s,a) = c_{\text{puct}}\,P(s,a)\,\frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \;\propto\; \frac{P(s,a)}{1 + N(s,a)}$$

$$N(s,a) = \sum_{i=1}^{n}\mathbf{1}(s,a,i)$$

$$Q(s,a) = \frac{1}{N(s,a)}\sum_{i=1}^{n}\mathbf{1}(s,a,i)\,V\bigl(s_L^i\bigr)$$

$$V(s_L) = (1-\lambda)\,v_\theta(s_L) + \lambda\,z_L$$

where $\mathbf{1}(s,a,i)$ indicates whether edge $(s,a)$ was traversed during the $i$th simulation, and $s_L^i$ is the leaf node reached in the $i$th simulation.

Back
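For completeness, the mixed leaf evaluation as code (the paper reports λ = 0.5 performing best):

```python
def leaf_value(v_theta_sL, z_L, lam=0.5):
    """Mixed leaf evaluation V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1 - lam) * v_theta_sL + lam * z_L
```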

Page 26: Mastering the game of go with deep neural networks and tree search


How AlphaGo selected its move

Page 27: Mastering the game of go with deep neural networks and tree search


The playing strength of AlphaGo (Bonus 1)

Page 28: Mastering the game of go with deep neural networks and tree search


The playing strength of AlphaGo (Bonus 2)