
Finding equilibria in large sequential games of imperfect information Andrew Gilpin and Tuomas Sandholm Carnegie Mellon University Computer Science Department



Motivation: Poker

• Poker is a wildly popular card game
  – This year’s World Series of Poker prize pool surpassed $103 million, including $56 million for the World Championship event
  – ESPN is broadcasting parts of the tournament

• Poker presents several challenges for AI
  – Imperfect information
  – Risk assessment and management
  – Deception (bluffing, slow-playing)
  – Counter-deception (calling a bluff)

Rhode Island Hold’em poker: The Deal

Rhode Island Hold’em poker: Round 1

Rhode Island Hold’em poker: Round 2

Rhode Island Hold’em poker: Round 3

Rhode Island Hold’em poker: Showdown


Sneak preview of results: Solving Rhode Island Hold’em poker

• Rhode Island Hold’em poker was invented as a testbed for AI research [Shi & Littman 2001]

• Game tree has more than 3.1 billion nodes

• Previously, the best techniques did not scale to games this large

• Using our algorithm we have computed optimal strategies for this game

• This is the largest poker game solved to date by over four orders of magnitude


Outline of this talk

• Game-theoretic foundations: Equilibrium

• Model: Ordered games

• Abstraction mechanism: Information filters

• Strategic equivalence: Game isomorphisms

• Algorithm: GameShrink

• Solving Rhode Island Hold’em


Game Theory

• In multi-agent systems, an agent’s outcome depends on the actions of the other agents

• Consequently, an agent’s optimal action depends on the actions of the other agents

• Game theory provides guidance as to how an agent should act

• A game-theoretic equilibrium specifies a strategy for each agent such that no agent wishes to deviate
  – Such an equilibrium always exists [Nash 1950]


A simple example

Payoffs (row player, column player), with equilibrium probabilities in parentheses:

                  Rock (1/3)   Paper (1/3)   Scissors (1/3)
Rock (1/3)        0, 0         -1, 1         1, -1
Paper (1/3)       1, -1        0, 0          -1, 1
Scissors (1/3)    -1, 1        1, -1         0, 0
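The equilibrium claim for this example can be checked directly. The sketch below is only an illustration (the function names are made up for this example): it tests whether either player could gain by switching to some pure action, which is sufficient because a mixed deviation never beats the best pure deviation.

```python
from fractions import Fraction

# Row player's payoffs from the Rock-Paper-Scissors matrix above.
# The game is zero-sum, so the column player receives the negation.
ROW_PAYOFF = [
    [0, -1, 1],   # Rock     vs Rock, Paper, Scissors
    [1, 0, -1],   # Paper    vs Rock, Paper, Scissors
    [-1, 1, 0],   # Scissors vs Rock, Paper, Scissors
]


def row_values(col_mix):
    """Expected payoff of each pure row action against a column mix."""
    return [sum(p * a for p, a in zip(col_mix, row)) for row in ROW_PAYOFF]


def col_values(row_mix):
    """Expected payoff of each pure column action against a row mix."""
    return [-sum(row_mix[i] * ROW_PAYOFF[i][j] for i in range(3))
            for j in range(3)]


def is_equilibrium(row_mix, col_mix):
    """True if neither player can gain by deviating to a pure action."""
    row_now = sum(p * v for p, v in zip(row_mix, row_values(col_mix)))
    col_now = sum(p * v for p, v in zip(col_mix, col_values(row_mix)))
    return (max(row_values(col_mix)) <= row_now
            and max(col_values(row_mix)) <= col_now)


uniform = [Fraction(1, 3)] * 3
always_rock = [Fraction(1), Fraction(0), Fraction(0)]
```

Calling `is_equilibrium(uniform, uniform)` confirms the uniform profile, while `is_equilibrium(always_rock, uniform)` fails because the column player would switch to Paper.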


Complexity of computing equilibria

• Finding a Nash equilibrium is “A most fundamental computational problem whose complexity is wide open [and] together with factoring … the most important concrete open question on the boundary of P today” [Papadimitriou 2001]
  – Even for games with only two players

• There are algorithms (requiring exponential time in the worst case) for computing Nash equilibria

• Good news: Two-person zero-sum matrix games can be solved in poly-time using linear programming
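For reference, the standard linear program behind this "good news" can be written as follows. The notation is an assumption for illustration and does not come from the slides: $a_{ij}$ is the row player's payoff when row $i$ meets column $j$, $x$ is the row player's mixed strategy, and $v$ is the payoff the row player can guarantee.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{i=1}^{m} x_i\, a_{ij} \ge v \quad \text{for } j = 1,\dots,n, \\
& \sum_{i=1}^{m} x_i = 1, \\
& x_i \ge 0 \quad \text{for } i = 1,\dots,m .
\end{aligned}
```

For Rock-Paper-Scissors the optimum is $v = 0$ with $x = (1/3, 1/3, 1/3)$. The LP has $m + 1$ variables and $n + 1$ constraints besides nonnegativity, so its size is polynomial in the size of the payoff matrix.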


What about sequential games?

• Sequential games involve turn-taking, moves of chance, and imperfect information

• Every sequential game can be converted into a simultaneous-move game
  – Basic idea: Make one strategy in the simultaneous-move game for every possible action in every possible situation in the sequential game
  – This approach leads to an exponential blowup in the number of strategies
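The blowup is easy to see by counting. A pure strategy in the converted game picks one action at every situation (information set) the player might face, so the number of pure strategies is the product of the action counts. The sketch below uses hypothetical sizes, not figures from any real poker game.

```python
from math import prod


def count_pure_strategies(actions_per_info_set):
    """Number of pure strategies: one action choice per information set."""
    return prod(actions_per_info_set)


# Hypothetical player with 3 actions (e.g. fold, call, raise) at each of
# k information sets; the strategy count grows as 3**k.
small = count_pure_strategies([3] * 10)
larger = count_pure_strategies([3] * 20)
```

Doubling the number of information sets from 10 to 20 squares the strategy count (59,049 versus 3,486,784,401), which is why the converted game quickly becomes too large to write down.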


Sequence form representation

• The sequence form is an alternative representation that is more compact [Koller, Megiddo, von Stengel, Romanovskii]

• Using the sequence form, two-player zero-sum games with perfect recall can be solved in time polynomial in the size of the game tree
  – But, Texas Hold’em has 10^18 nodes
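For readers who want the shape of the resulting optimization problem, here is the sequence-form LP for player 1 in the notation commonly used in von Stengel's presentation. The symbols are assumptions for illustration and do not appear on the slides: $x$ is player 1's realization plan over her sequences, $Ex = e$ are the linear constraints that make $x$ a valid plan, $Fy = f$ are the analogous constraints for player 2, $A$ holds player 1's payoffs indexed by pairs of sequences, and $q$ is the vector of dual variables for player 2's constraints.

```latex
\begin{aligned}
\max_{x,\,q}\quad & f^{\top} q \\
\text{s.t.}\quad & F^{\top} q \le A^{\top} x, \\
& E x = e, \\
& x \ge 0 .
\end{aligned}
```

Each player has at most one sequence per action at each of her information sets (plus the empty sequence), so the number of variables and constraints is linear in the size of the game tree rather than exponential.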


Our approach

• Instead of developing an equilibrium-finding algorithm per se, we introduce an automated abstraction technique that results in a smaller, equivalent game

• We prove that a Nash equilibrium in the smaller game corresponds to a Nash equilibrium in the original game

• Our technique applies to n-player sequential games with observed actions and ordered signals


Illustration of our approach

[Diagram: Original game → (Abstraction) → Abstracted game → (Compute Nash) → Nash equilibrium of the abstracted game → Nash equilibrium of the original game]


Game with ordered signals (a.k.a. ordered game)

1. Players I = {1,…,n} (Rhode Island Hold’em: I = {1,2})

2. Stage games G = G1,…,Gr

3. Player label L

4. Game-ending nodes ω

5. Signal alphabet Θ (Rhode Island Hold’em: Θ = {2♠,…,A♦})

6. Signal quantities κ = κ1,…,κr and γ = γ1,…,γr (Rhode Island Hold’em: κ = (0,1,1), γ = (1,0,0))

7. Signal probability distribution p (Rhode Island Hold’em: uniform)

8. Partial ordering ≥ of subsets of Θ (Rhode Island Hold’em: hand rank)

9. Utility function u (increasing in private signals)


Information filters

• Observation: We can make games smaller by filtering the information a player receives

• Instead of observing a specific signal exactly, a player observes a filtered set of signals
  – E.g. receiving the signal {A♠,A♣,A♥,A♦} instead of A♠

• Combining an ordered game and a valid information filter yields a filtered ordered game

• Prop. A filtered ordered game is a finite sequential game with perfect recall
  – Corollary: If the filtered ordered game is two-person zero-sum, we can solve it in poly-time using linear programming
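The slide's example of a filtered signal can be written as a small function. This is only a sketch of the idea: `rank_filter` is a hypothetical filter that reveals a card's rank but not its suit, and `is_partition` is a plausible sanity check (the filtered sets should split the deck into disjoint blocks), not the paper's formal definition of a valid filter.

```python
RANKS = "23456789TJQKA"
SUITS = "♠♣♥♦"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def rank_filter(card):
    """Hypothetical filter: the player learns only the rank of `card`."""
    return frozenset(card[0] + suit for suit in SUITS)


def is_partition(filter_fn, signals):
    """Check that the filtered sets cover every signal exactly once."""
    blocks = {filter_fn(s) for s in signals}
    covered = [s for block in blocks for s in block]
    return (sorted(covered) == sorted(signals)
            and all(s in filter_fn(s) for s in signals))


filtered_ace = rank_filter("A♠")
num_filtered_signals = len({rank_filter(card) for card in DECK})
```

Under this filter a player dealt A♠ observes {A♠,A♣,A♥,A♦}, and the 52 possible first-round signals shrink to 13 filtered signals. Whether such a merge preserves equilibria is a separate question and depends on the game.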


Filtered signal trees

• Every filtered ordered game has a corresponding filtered signal tree
  – Each edge corresponds to the revelation of some signal
  – Each path corresponds to the revelation of a set of signals

• Our algorithms operate directly on the filtered signal tree
  – We never load the full game representation into memory


Ordered game isomorphic relation

• The ordered game isomorphic relation captures the notion of strategic symmetry between nodes

• We define the relationship recursively:
  – Two leaves are ordered game isomorphic if the payoffs to all players are the same at each leaf, for all action histories
  – Two internal nodes are ordered game isomorphic if they are siblings and there is a bijection between their children such that only ordered game isomorphic nodes are matched

• We can compute this relationship efficiently using dynamic programming and perfect matching computations in a bipartite graph
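The recursive definition can be sketched on a toy tree. This is a simplified illustration, not the authors' implementation: a leaf is a single number standing in for the payoff information, an internal node is a tuple of children, the caller is assumed to pass sibling nodes, memoization via `lru_cache` plays the role of the dynamic program, and the bijection test uses a basic augmenting-path bipartite matching.

```python
from functools import lru_cache


def is_leaf(node):
    """Leaves are plain numbers; internal nodes are tuples of children."""
    return not isinstance(node, tuple)


def has_perfect_matching(compatible):
    """Augmenting-path matching on a square 0/1 compatibility matrix."""
    n = len(compatible)
    match_of_right = [None] * n

    def try_assign(left, seen):
        for right in range(n):
            if compatible[left][right] and right not in seen:
                seen.add(right)
                owner = match_of_right[right]
                if owner is None or try_assign(owner, seen):
                    match_of_right[right] = left
                    return True
        return False

    return all(try_assign(left, set()) for left in range(n))


@lru_cache(maxsize=None)
def isomorphic(a, b):
    """Recursive check following the two cases of the definition above."""
    if is_leaf(a) or is_leaf(b):
        return is_leaf(a) and is_leaf(b) and a == b
    if len(a) != len(b):
        return False
    # Children may only be matched if they are themselves isomorphic.
    compatible = [[isomorphic(x, y) for y in b] for x in a]
    return has_perfect_matching(compatible)


node_a = ((1, -1), (0, 0))
node_b = ((0, 0), (-1, 1))   # same subtrees as node_a, in a different order
node_c = ((1, -1), (1, 1))
```

Here `isomorphic(node_a, node_b)` holds because the children can be paired up after reordering, while `isomorphic(node_a, node_c)` does not.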


Ordered game isomorphic abstraction transformation

• This operation transforms an existing information filter into a new filter that merges two ordered game isomorphic nodes

• The new filter yields a smaller, abstracted game

• Thm. If a strategy profile is a Nash equilibrium in the smaller, abstracted game, then it is a Nash equilibrium in the original game


GameShrink: Efficiently computing ordered game isomorphic abstraction transformations

• Recall: we have a dynamic program for determining if two nodes of the filtered signal tree are ordered game isomorphic

• Algorithm: Starting from the top of the filtered signal tree, perform the transformation where applicable

• Approximation algorithm: instead of requiring a perfect matching, require a matching with a penalty below some threshold


GameShrink: Efficiently computing ordered game isomorphic abstraction transformations

• The Union-Find data structure provides an efficient representation of the information filter
  – Linear memory and almost linear time

• Can eliminate certain perfect matching computations by using easy-to-check necessary conditions
  – Compact histogram databases for storing win/loss frequencies to speed up the checks
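A textbook Union-Find shows why it suits this job: each signal starts in its own set, and merging two signals that the filter should treat as identical is a single `union` call. The class below is a generic sketch with union by size and path compression, not the authors' code, and the card labels are only illustrative.

```python
class UnionFind:
    """Disjoint sets with union by size and path compression."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        """Return the representative of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the sets containing a and b (smaller into larger)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]


signal_sets = UnionFind(["A♠", "A♣", "K♠"])
signal_sets.union("A♠", "A♣")
```

After the merge, `find("A♠")` and `find("A♣")` return the same representative while `K♠` stays in its own set. Storage is one parent entry per signal, and a sequence of operations runs in time that is linear up to an inverse-Ackermann factor, which matches the "almost linear" remark on the slide.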


Solving Rhode Island Hold’em poker

• GameShrink computes all ordered game isomorphic abstraction transformations in under one second

• Without abstraction, the linear program has 91,224,226 rows and columns

• After applying GameShrink, the linear program has only 1,237,238 rows and columns

• By solving the resulting linear program, we are able to compute optimal min-max strategies for this game
  – CPLEX Barrier method takes 7 days, 17 hours and 25 GB RAM to solve

• This is the largest poker game solved to date by over four orders of magnitude


Comparison to previous research

• Rule-based
  – Limited success in even small poker games

• Simulation/Learning
  – Do not take multi-agent aspect into account

• Game-theoretic
  – Manual abstraction
    • “Approximating Game-Theoretic Optimal Strategies for Full-scale Poker”, Billings, Burch, Davidson, Holte, Schaeffer, Schauenberg, Szafron, IJCAI-03. Distinguished Paper Award.
  – Automated abstraction


Directions for future work

• Computing strategies for larger games
  – Requires approximation of solutions

• Tournament poker

• More than two players

• Other types of abstraction


Summary

• Introduced an automatic method for performing abstractions in a broad class of games

• Introduced information filters as a technique for working with games with imperfect information

• Developed an equilibrium-preserving abstraction transformation, along with an efficient algorithm

• Described a simple extension that yields an approximation algorithm for tackling even larger games

• Solved the largest poker game to date
  – Playable on-line at http://www.cs.cmu.edu/~gilpin/gsi.html

Thank you very much for your interest