Upload
others
View
16
Download
0
Embed Size (px)
Citation preview
DeepStackExpert-Level Artificial Intelligence in Heads-Up No-Limit Poker
Lasse Becker-Czarnetzki
University of HeidelbergArtificial Intelligence for Games
SS 2019
July 11. 2019
1 Perfect vs Imperfect information GamesIntroductionNo-Limit Heads-up Texax HoldemPerfect Information strategies
2 DeepStackRe-solving (CFR)Depth limited searchCounterfactual Value NetworksSparse lookahead trees
3 EvaluationPerformanve against humansExploitability (LBR)Nice features
4 Conclusion
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Perfect information games
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 1 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Von Neuman on games
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 2 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
No-Limit Heads-up Texax Holdem
2 player zero-sum game4 Betting rounds on ”who has the better cards”2 Hold cards (private) (3, 4, 5) public cards.
–> 10160decisionpoints
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 3 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Poker Terms
BigblindFoldCheckCallBet (raise)Flop (Pre-Flop)TurnRiverrange
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 4 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Poker Game Tree
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 5 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Perfect information game
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 6 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Perfect information game
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 7 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Perfect information game
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 8 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Perfect information game
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 9 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Problems for imperfect information games
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 10 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionIntroduction No-Limit Heads-up Texax Holdem Perfect Information strategies
Questions
How can we forget supergames without using necessaryinformation?How do we solve a subgame when their are no definite statesto start from?How do we evaluate a state, when we can’t use a single valueto summarize a position?
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 11 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Re-solving
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 12 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Re-solving
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 13 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Re-solving
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 14 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Counterfactual Regret Minimization
Counterfactual: ”If i had known”...Regret: ”how much better would i have done if i didsomething else instead?Minimization: ”what strategy minimizes my overall regret?Average strategy over i iterations = approximation to NashEquilibrium
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 15 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Counterfactual Regret Minimization
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 16 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Counterfactual Regret Minimization
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 17 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Counterfactual Regret Minimization
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 18 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Continual Re-solving
At every action we re-solve the subgameWe need our range and opponents counterfactual value”What-if” (expected value) opponent reaches public statewith hand x.3 scenarios for updating range and CFVs.
own action: CFVs = CFVs(action) – Update range via BayesruleChance action: CFVs = CFVs(chance action) – Eliminateimpossible card combos.Opponents action: Do Nothing
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 19 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Depth limited search
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 20 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Depth limited search
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 21 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Solutions
Search from a set of possible states, re-solving multiple times.Remember players range and opponents counterfactual valuesGet evaluation through Deep Counterfactual value networks
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 22 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
DeepStack elements summary
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 23 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Deep Counterfactual Value Networks
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 24 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Deep Counterfactual Value Networks
2 Networks: Flop Network, Turn NetworkAuxiliary network (Pre-Flop)Simple FFNN (7 layers, 500 Nodes, ReLU)outer network to fit values for zero-sum gameinput: Pot sizes, public cards, players rangesoutput: Counterfactual Values (Players, Hands)
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 25 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Training
Randomly generated Poker situations.Turn network: 10M, Flop network:1MTurn network used for depth-limited lookahead in FlopNetwork training.
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 26 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Sparse lookahead trees
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 27 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionRe-solving (CFR) Depth limited search Counterfactual Value Networks Sparse lookahead trees
Abstraction?
Traditionally abstraction was used to simplify the gameAction abstraction – Card abstraction–> Translation ErrorsDeepstack only uses action abstraction in lookaheadCard clustering is used for NN input.
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 28 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionPerformanve against humans Exploitability (LBR) Nice features
Evaluation
Exploitability – Play against humansProblems with Variance(Luck) –> 100.000 Hands forstatistical significance–> AIVAT 3k Hands = 90k normal hands
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 29 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionPerformanve against humans Exploitability (LBR) Nice features
Pro players experimental results
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 30 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionPerformanve against humans Exploitability (LBR) Nice features
Pro players experimental results
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 31 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionPerformanve against humans Exploitability (LBR) Nice features
Exploitability
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 32 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionPerformanve against humans Exploitability (LBR) Nice features
Nice to know
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 33 / 38
Perfect vs Imperfect information Games DeepStack Evaluation ConclusionPerformanve against humans Exploitability (LBR) Nice features
Nice to know
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 34 / 38
Perfect vs Imperfect information Games DeepStack Evaluation Conclusion
Conclusion
DeepStack beats Pro Poker player in No-Limit Heads-UpHoldem for the first timeConnects Perfect information AI heuristical searrch strategywith imperfect information AIPlays with Nash Equilibrium approximated strategy–> Doesn’t exploit weaker players.No MultiplayerCan’t explain moves but strategy tips can be taken away fromDeepStacks play.
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 35 / 38
Perfect vs Imperfect information Games DeepStack Evaluation Conclusion
References
[1] Matej Moravcík and Martin Schmid and Neil Burch andViliam Lisý and Dustin Morrill and Nolan Bard and TrevorDavis and Kevin Waugh and Michael Johanson and MichaelH. BowlingDeepStack: Expert-Level Artificial Intelligence in No-LimitPokerScience, 2017
[2] N. Burch, M. Johanson, M. BowlingProceedings of the Twenty-Eighth Conference onArtificialIntelligence(2014)pp. 602–608
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 36 / 38
Perfect vs Imperfect information Games DeepStack Evaluation Conclusion
References
[3] M. Bowling, N. Burch, M. Johanson, O. TammelinScience 347145 (2015)
[4] Todd W. Neller and Marc LanctoAn introduction to counterfactual regret minimization.InProceedings of Model AI AssignmentsThe Fourth Symposium on Educational Advances inArtificialIntelligence (EAAI-2013), 2013
[5] www.deepstack.ai
[6] www.depthfirstlearning.com/2018/DeepStack
[7] https://www.youtube.com/watch?v=qndXrHcV1sM
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 37 / 38
Perfect vs Imperfect information Games DeepStack Evaluation Conclusion
Thank You for ListeningAny Questions?
July 11. 2019 DeepStack Lasse Becker-Czarnetzki 38 / 38