SARTRE: System Overview
A Case-Based Agent for Two-Player Texas Hold'em
Jonathan Rubin & Ian Watson
University of Auckland Game AI Group
http://www.cs.auckland.ac.nz/research/gameai/
Overview
• Introduction
• Texas Hold'em
• Approaches to Computer Poker
• Sartre: System Overview
• Results
• Future Work
Texas Hold'em
• Two-player Limit Hold'em
– Very different from the full-table game
• Chance events
• Hidden Information
Near-Equilibrium Strategy
• Nash Equilibrium
– Assumes the opponent makes no mistakes
– Attempts to minimise its losses against this perfect opponent
• Near-Equilibrium
– Used because the game tree is too large to solve exactly
– Plays not to lose
Exploitative Strategy
• Exploitative Strategy
– Opponent Modelling
– Attempts to punish weaknesses in the opponent's strategy
– Plays off the equilibrium
– Plays to win
Sartre: System Overview
• Similarity Assessment Reasoning for Texas hold'em via Recall of Experience
• Our entry for the 2009 Computer Poker Competition
• Case-base was constructed from past CPC games
Sartre: System Overview
• Hand picked by authors
• Case Features
– Previous betting for the hand
– Hand Category
– Board Category
1. Previous betting for the hand
• Currently represented as a string
– f = fold
– c = check/call
– r = bet/raise
• Examples
– r
– rrc-r
– rc-crrc-rc-cr
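A minimal Python sketch of how such a betting string could be decoded. It assumes that "-" separates betting rounds, which the examples suggest but the slides do not state; `ACTIONS` and `parse_betting` are hypothetical names, not part of Sartre.

```python
# Action letters as defined on the slide.
ACTIONS = {"f": "fold", "c": "check/call", "r": "bet/raise"}


def parse_betting(betting: str) -> list[list[str]]:
    """Split a betting string into rounds and expand each action letter.

    Assumption: '-' marks the boundary between betting rounds.
    Raises KeyError on any letter other than f, c or r.
    """
    return [[ACTIONS[letter] for letter in rnd] for rnd in betting.split("-")]


rounds = parse_betting("rrc-r")
# rounds[0] is the first round: raise, re-raise, call
# rounds[1] is the next round: a single bet so far
```

Keeping the raw string as the case feature, as the slide describes, makes exact comparison trivial; the decoded form is only useful for inspection.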
2. Hand Category
• Two components
– Hand Category
– Hand Potential
• Examples
– Missed
– One-Pair, Two-Pair, Three-of-a-kind
– Flush-draw, Straight-draw
3. Board Category
• Captures information about potential
– Flush draws, or
– Straight draws
• Information that is likely to be noticed by a good player
Similarity
• Currently either all or nothing
– If a collection of cards maps to the same category they are assigned a similarity of 1.0, otherwise 0.
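The all-or-nothing rule can be sketched in a few lines of Python. The function name and the category labels in the usage lines are illustrative assumptions; the slides do not give the exact label set.

```python
def category_similarity(query_category: str, case_category: str) -> float:
    """All-or-nothing similarity: 1.0 for identical categories, else 0.0."""
    return 1.0 if query_category == case_category else 0.0


same = category_similarity("one-pair", "one-pair")       # 1.0
different = category_similarity("one-pair", "two-pair")  # 0.0
```

Under this rule two hands that are "almost" alike score the same as two unrelated hands, which is the limitation the future-work slide proposes to loosen.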
Case Overview
• Case Features
– 1. Previous betting for the hand
– 2. Hand Category
– 3. Board Category
• Solution
– f, c, r
• Outcome
– +/- value
– + Profit
– - Loss
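One way to picture a case and its retrieval is the Python sketch below. It is an assumption-laden illustration, not Sartre's implementation: the class names, field names and category labels are hypothetical, and it relies on the observation that exact-match similarity makes retrieval equivalent to a dictionary lookup on the three features.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    betting: str          # feature 1, e.g. "rrc-r"
    hand_category: str    # feature 2 (hypothetical label, e.g. "one-pair")
    board_category: str   # feature 3 (hypothetical label, e.g. "flush-draw")
    solution: str         # "f", "c" or "r"
    outcome: float        # positive = profit, negative = loss

    @property
    def key(self) -> tuple[str, str, str]:
        return (self.betting, self.hand_category, self.board_category)


class CaseBase:
    """Index cases by their feature triple so exact matches are one lookup."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str, str], list[Case]] = defaultdict(list)

    def add(self, case: Case) -> None:
        self._index[case.key].append(case)

    def retrieve(self, betting: str, hand: str, board: str) -> list[Case]:
        # .get avoids creating empty entries for unseen situations.
        return list(self._index.get((betting, hand, board), []))


flop_cases = CaseBase()
flop_cases.add(Case("rc-r", "one-pair", "flush-draw", "c", 2.0))
flop_cases.add(Case("rc-r", "one-pair", "flush-draw", "r", -4.0))
matches = flop_cases.retrieve("rc-r", "one-pair", "flush-draw")
```

A separate `CaseBase` instance per betting round would mirror the separate preflop, flop, turn and river case-bases described on the next slide.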
Case Overview
• Solution + Outcome
– Recorded from equilibrium-approaching bots from the previous AAAI Computer Poker Competition
• Separate case-bases for preflop, flop, turn & river
• Approx. 250,000 cases in each case-base.
Decision Making
• Retrieved cases can have different decisions
• Three different versions
– 1. Probability Triple
– 2. Majority rules
– 3. Outcome-based
Decision Making
• Probability Triple
– Proportion of times that the solution indicated to fold, call or raise
– (f, c, r)
• Majority Rules
– The decision made most often is reused
• Outcome-Based
– Dependent on adjusted average outcome values for each decision
– If a call or raise decision was never made, its outcome is unknown and is given a value of +infinity
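The three policies can be sketched in Python as follows, with retrieved cases reduced to `(solution, outcome)` pairs. Several details are assumptions: the slides do not say how the average is "adjusted" (a plain mean is used here), how ties are broken (the first of f, c, r wins here), or what value an unseen fold receives (negative infinity here). All function names are hypothetical.

```python
import math
import random
from collections import Counter

ACTIONS = ("f", "c", "r")


def probability_triple(cases):
    """Return (f, c, r): the share of retrieved cases choosing each action."""
    counts = Counter(action for action, _ in cases)
    return tuple(counts[a] / len(cases) for a in ACTIONS)


def choose_by_probability(cases, rng=random):
    """Sample an action according to the probability triple."""
    return rng.choices(ACTIONS, weights=probability_triple(cases), k=1)[0]


def majority_rules(cases):
    """Reuse the most frequent decision (ties: first of f, c, r; an assumption)."""
    counts = Counter(action for action, _ in cases)
    return max(ACTIONS, key=lambda a: counts[a])


def outcome_based(cases):
    """Pick the action with the best mean outcome (plain mean; an assumption)."""
    outcomes = {a: [] for a in ACTIONS}
    for action, outcome in cases:
        outcomes[action].append(outcome)

    def value(action):
        if outcomes[action]:
            return sum(outcomes[action]) / len(outcomes[action])
        # Unseen call/raise gets +infinity, as on the slide.
        # Unseen fold gets -infinity: an assumption, not from the slides.
        return math.inf if action in ("c", "r") else -math.inf

    return max(ACTIONS, key=value)


retrieved = [("c", 2.0), ("c", -1.0), ("r", 4.0), ("f", 0.0)]
triple = probability_triple(retrieved)   # (0.25, 0.5, 0.25)
majority = majority_rules(retrieved)     # "c"
best = outcome_based(retrieved)          # "r"
```

Note that the +infinity rule means outcome-based selection always tries a call or raise that has never been observed for the current situation.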
Duplicate Matches
• Experimental results derived using duplicate matches
– Play N poker hands
– Reset each player's memory
– Reverse the position of each player and deal the same N hands
• Forward + Reverse Directions
• Reduces variance
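Why duplicate matches reduce variance can be illustrated with a toy simulation. This is not a poker simulator: it assumes each hand's result is a fixed skill edge plus a "card luck" term that flips sign when the seats are swapped, plus a small independent noise term. All numbers and names are made up for illustration.

```python
import random
import statistics


def simulate(n_hands: int, skill_edge: float = 0.05, seed: int = 0):
    """Return (variance of single-direction results, variance of duplicate results)."""
    rng = random.Random(seed)
    single, duplicate = [], []
    for _ in range(n_hands):
        luck = rng.gauss(0.0, 2.0)  # value of the cards dealt to seat A
        forward = skill_edge + luck + rng.gauss(0.0, 0.5)  # player holds seat A
        reverse = skill_edge - luck + rng.gauss(0.0, 0.5)  # same cards, seats swapped
        single.append(forward)
        duplicate.append((forward + reverse) / 2)  # card luck cancels out
    return statistics.pvariance(single), statistics.pvariance(duplicate)


single_var, duplicate_var = simulate(2000)
```

In this model the card-luck term cancels exactly when the forward and reverse results are averaged, so only the smaller independent noise remains; real matches cancel luck less perfectly because the players may act differently on the same cards.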
Self-Play Experiments
• Small bets per hand (sb/h)
– Assuming a $10/$20 game
• Sartre-Probability vs. Sartre-Outcome
– Sartre-Probability wins 0.168 sb/h
– On average $1.68 profit per hand
• Sartre-Probability vs. Sartre-Majority
– Sartre-Majority wins 0.039 sb/h
– On average $0.39 per hand
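The dollar figures follow directly from the unit: in a $10/$20 limit game the small bet is $10, so a win rate in sb/h is multiplied by 10. A short Python check (the function name is hypothetical):

```python
def sb_per_hand_to_dollars(sb_per_hand: float, small_bet: float = 10.0) -> float:
    """Convert a win rate in small bets per hand into dollars per hand."""
    return sb_per_hand * small_bet


vs_outcome = round(sb_per_hand_to_dollars(0.168), 2)   # 1.68
vs_majority = round(sb_per_hand_to_dollars(0.039), 2)  # 0.39
```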
Self-Play Experiments
• Chose Sartre – Majority Rules.
• Results not transitive
• Makes Sartre more predictable and hence more exploitable by strong opposition
2009 Computer Poker Competition Results
• Duplicate match structure
– 3000 hands in forward & reverse direction
• Multiple matches against each opponent until statistical significance obtained
• Sartre placed 7th out of 13 entrants in limit competition
2009 Computer Poker Competition Results
Place  Entrant                Sartre's result vs. entrant (sb/h)
1      MANZANA                -0.038
2      GGValuta               -0.043
3      HyperboreanLimit-Eqm   -0.051
4      HyperboreanLimit-BR    -0.023
5      Rockhopper             -0.033
6      Slumbot                -0.012
7      Sartre
8      GS5                    -0.007
9      AoBot                   0.131
10     LIDIA                   0.145
11     dcurbhu                 0.217
12     GS5Dynamic              0.119
13     tommybot                0.765
       Total                   0.097
2009 Computer Poker Competition Results
• Overall profit of +0.097 sb/h
• Assuming a $10/$20 game
– $0.97 per hand profit
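The overall figure appears to be the mean of the twelve per-opponent results in the table, which the following Python check reproduces. Treating the total as an unweighted mean is an inference from the numbers, not something the slides state.

```python
# Sartre's result against each of the 12 opponents, in sb/h (from the table).
results = {
    "MANZANA": -0.038, "GGValuta": -0.043, "HyperboreanLimit-Eqm": -0.051,
    "HyperboreanLimit-BR": -0.023, "Rockhopper": -0.033, "Slumbot": -0.012,
    "GS5": -0.007, "AoBot": 0.131, "LIDIA": 0.145, "dcurbhu": 0.217,
    "GS5Dynamic": 0.119, "tommybot": 0.765,
}

mean_sb_per_hand = sum(results.values()) / len(results)  # about 0.0975
dollars_per_hand = mean_sb_per_hand * 10                 # $10 small bet
```

The mean comes to roughly 0.0975 sb/h, consistent with the reported +0.097 sb/h and about $0.97 per hand. It also shows that most of the overall profit comes from the weakest entrants, while Sartre loses a small amount to every entrant that placed above it.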
Future Work
• Investigate loosening of all-or-nothing similarity
• CBR and adaptive poker agents
– Opponent modelling
– Learning
• Better solution adaptation
– Combination of decision + outcome