38
Mastering the Game of Go Without Human Knowledge Lead: Liam Hinzman Facilitators: Tahseen Shabab and Susan Cheng

Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Mastering the Game of Go Without Human Knowledge

Lead: Liam HinzmanFacilitators: Tahseen Shabab and Susan Cheng

Page 2: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

AlphaGo Zero

Page 3: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Overview● Brief History of AI in Games● What is Go and Why Should You Care?● How AlphaGo Zero Works● Results● Discussion

Page 4: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Minimax

Page 5: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

HeuristicsReduces search depth

Page 6: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Deep Blue

● 126 million positions per second● Hand-designed Heuristics

Page 7: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

The Game of Go

Page 8: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent
Page 9: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent
Page 10: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Go is Incredibly Complex

Go is Hard for Computers

Page 11: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

How AlphaGo Zero Works

Policy IterationMonte-Carlo Tree Search Residual Network

Page 12: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Monte-Carlo Tree Search (MCTS)

Page 13: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Monte-Carlo Tree Search (MCTS)

Page 14: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

MCTS: Advantages● Aheuristic● Online-search● Works well on large trees

Page 15: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

MCTS: Disadvantages● Many simulation are required● No generalization between similar states● Performance is dependent on “rollout” policy

Page 16: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

MCTS in AlphaGo Zero

Page 17: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Upper Confidence Bound for Trees (UCT)

Page 18: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Upper Confidence Bound for Trees (UCT)

Page 19: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Upper Confidence Bound for Trees (UCT)

s State

a Action

Q(s, a) Expected Reward

P(s, a) Policy

N(s, a) # of state visits

cpuct Hyperparameter

Page 20: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

AlphaGo Zero’s Network Architecture

Page 21: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Residual Layer

Page 22: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Dual Heads

Page 23: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Training

Evaluator

Self-play Worker

Training Worker

𝜋’ > 𝜋

Page 24: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

How AlphaGo Zero Chooses a Move

𝜋 ~ N1/𝜏1600 Simulations

Page 25: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Self-Play Workers

Page 26: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Training Worker

Page 27: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Evaluator

400 Games 55% Win Rate

Page 28: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

MCTS in AlphaGo Zero

Page 29: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

5 Minute Break

Page 30: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

AlphaGo Zero vs AlphaGoEntirely self-play

Input is game board

Single network

No rollouts

Supervised learning + self-play

Input is hand-crafted features

Two networks

Rollouts were used

Page 31: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Results

Page 32: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Learning Stages

Page 33: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Ladders

Page 34: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

AlphaZero

Page 35: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

AlphaGo Zero’s Gift

Page 36: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

Discussion

Page 37: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

DiscussionHow can the AlphaGo Zero algorithm be extended to different games?

How can the sample efficiency of AlphaGo Zero be improved?

A very stable training environment is need for the algorithm. Can this be alleviated to let AlphaZero applied to real-world problems?

Page 38: Mastering the Game of Go Without Human ... - aisc.ai.science · MCTS: Disadvantages Many simulation are required No generalization between similar states Performance is dependent

ResourcesMastering the Game of Go without Human Knowledge

David Silver 2017 NIPS Talk

ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

David Silver’s PhD Thesis: Reinforcement Learning and Simulation-Based Search in Computer Go

A Brief History of Game AI Up To AlphaGo - Andrey Kurenkov

AlphaGo Zero Demystified - Dylan Djian