2

Click here to load reader

Optimal Rhode Island Hold’em Poker - cs.cmu.edusandholm/RIHoldEm.ISD.aaai05proceedings.pdf · Optimal Rhode Island Hold’em Poker∗ Andrew Gilpin and Tuomas Sandholm Computer

Embed Size (px)

Citation preview

Page 1: Optimal Rhode Island Hold’em Poker - cs.cmu.edusandholm/RIHoldEm.ISD.aaai05proceedings.pdf · Optimal Rhode Island Hold’em Poker∗ Andrew Gilpin and Tuomas Sandholm Computer

Optimal Rhode Island Hold’em Poker∗

Andrew Gilpin and Tuomas SandholmComputer Science Department

Carnegie Mellon UniversityPittsburgh, PA 15213

{gilpin,sandholm }@cs.cmu.edu

AbstractRhode Island Hold’em is a poker card game that has beenproposed as a testbed for AI research. This game, with atree size larger than 3.1 billion nodes, features many char-acteristics present in full-scale poker (e.g., Texas Hold’em).Our research advances in equilibrium computation have en-abled us to solve for the optimal (equilibrium) strategies forthis game. Some features of the equilibrium include pokertechniques such as bluffing, slow-playing, check-raising, andsemi-bluffing. In this demonstration, participants will com-pete with our optimal opponent and will experience thesestrategies firsthand.

IntroductionIn environments with multiple self-interested agents, anagent’s outcome is affected by actions of the other agents.Consequently, the optimal action of one agent generally de-pends on the actions of others. Game theory provides a nor-mative framework for analyzing such strategic situations. Inparticular, game theory formally defines games and strate-gies, and provides the notion of anequilibrium, which isa collection of strategies (one for each player) in which noplayer has incentive to deviate to a different strategy (i.e.,all players are playing best responses to each others’ strate-gies). Thus, one component desired for a strong game-playing agent is an effective method for computing (or ap-proximating) equilibria of games. The most commonly usedsolution concept for two-player games isNash equilibrium,which is guaranteed to exist in any finite game where playersare able to play randomized strategies (Nash 1950).

Games can be classified as either games ofcomplete in-formationor incomplete information. Chess and Go are ex-amples of the former, and, until recently, most game playingwork in AI has been on games of this type. To compute anoptimal strategy in a complete information game, an agenttraverses the game tree and evaluates individual nodes. Ifthe agent is able to traverse the entire game tree, she simplycomputes an optimal strategy from the bottom-up, using theprinciple ofbackward induction. This is the main approachbehind minimax and alpha-beta search. These algorithms

∗This material is based upon work supported by the Na-tional Science Foundation under ITR grants IIS-0121678 and IIS-0427858, and a Sloan Fellowship.Copyright c© 2005, American Association for Artificial Intelli-gence (www.aaai.org). All rights reserved.

have limits, of course, particularly when the game tree ishuge, but extremely effective game-playing agents can bedeveloped, even when the size of the game tree prohibitscomplete search.

Current algorithms for solving complete informationgames do not apply to games of incomplete information.The distinguishing difference is that the latter are not fullyobservable: when it is an agent’s turn to move, she does nothave access to all of the information about the world. In suchgames, the decision of what to do at a node cannot generallybe optimally made without considering decisions at all othernodes (including ones on other paths of play).

The sequence formis a compact representation (Ro-manovskii 1962; Koller, Megiddo, & von Stengel 1994;von Stengel 1996) of a sequential game. For two-personzero-sum games, there is a natural linear programming for-mulation based on the sequence form that is polynomial inthe size of the game tree. Thus, reasonable-sized two-persongames can be solved using this method (von Stengel 1996;Koller, Megiddo, & von Stengel 1996; Koller & Pfeffer1997). However this approach still yields enormous (un-solvable) optimization problems for many real-world games,most notably poker. In this research we introduce auto-mated abstraction techniques for finding smaller, strategi-cally equivalent games for which the equilibrium computa-tion is faster. We have chosen poker as the first applicationof our equilibrium finding techniques.

PokerPoker is an enormously popular card game played aroundthe world. The 2005 World Series of Poker is expected tohave nearly $50 million dollars in prize money in severaltournaments. Increasingly, poker players compete in on-line poker rooms, and television stations regularly broadcastpoker tournaments.

Due to the uncertainty stemming from opponents’ cards,opponents’ future actions, and chance moves, poker hasbeen identified as an important research area in AI (Billingset al. 2002). Poker has been a popular subject in the gametheory literature since the field’s founding, but manual equi-librium analysis has been limited to extremely small games.Even with the use of computers, the largest poker games thathave been solved have only about 140,000 nodes (Koller &Pfeffer 1997). Large-scale approximations have been devel-

Page 2: Optimal Rhode Island Hold’em Poker - cs.cmu.edusandholm/RIHoldEm.ISD.aaai05proceedings.pdf · Optimal Rhode Island Hold’em Poker∗ Andrew Gilpin and Tuomas Sandholm Computer

oped (Billingset al. 2003), but those methods do not pro-vide any guarantees about the performance of the computedstrategies. Furthermore, the approximations were designedmanually by a human expert. Our approach does not requireany domain knowledge.

Rhode Island Hold’emRhode Island Hold’em is a specific instance of poker thatwas invented as a testbed for AI research (Shi & Littman2001). It was designed so that it was similar in style to TexasHold’em, yet not so large that devising reasonably intelli-gent strategies would be impossible. Rhode Island Hold’emhas a game tree exceeding 3.1 billion nodes, and until nowfinding an exact solution seemed unlikely.

Rhode Island Hold’em (in this version) is played by 2players. Each player pays ananteof 5 chips which is addedto thepot. Both players initially receive a single card, facedown; these are known as thehole cards. After receiving thehole cards, the players take part in one betting round. Eachplayer may check or bet if no bets have been placed. If abet has been placed, then the player mayfold (thus forfeit-ing the game),call (adding chips to the pot equal to the lastplayer’s bet), orraise(calling the current bet and making anadditional bet). In Rhode Island Hold’em, the players arelimited to 3 raises each per betting round. In this bettinground, the bets are 10 chips. After the betting round, a com-munity card is dealt face up. This is called theflop. Anotherbetting round take places at this point, with bets equal to 20chips. Another community card is dealt face up. This iscalled theturn card. A final betting round takes place at thispoint, with bets equal to 20 chips. If neither player folds,then theshowdowntakes place. Both players turn over theircards. The player who has the best 3-card poker hand takesthe pot. (Hands in 3-card poker games are ranked slightlydifferently than 5-card poker hands. The main differencesare that the order of flushes and straights are reversed, anda three of a kind is better than straights or flushes.) In theevent of a draw, the pot is split evenly.

Technical contributionThis demonstration embodies the first use of our algorithm,GameShrink, for the automatic detection ofextensive gameisomorphismsand the application ofrestricted game iso-morphic abstraction transformations(Gilpin & Sandholm2005). Essentially, our algorithm takes as input an imperfectinformation game tree and outputs a strategically equivalentgame that is much smaller. We can prove that a Nash equi-librium in the smaller, abstracted game is strategically equiv-alent to a Nash equilibrium in the original game in the sensethat, given a Nash equilibrium in the abstracted game, it issimple to compute a Nash equilibrium in the original game.Thus, by shrinking the game tree, we can carry out the equi-librium computations on a smaller input, and then constructan optimal equilibria in the original game in a straightfor-ward manner.

For even larger games, we have developed approxima-tion technique for automatically computing abstractions thatmay not necessarily result in a Nash equilibrium in thesmaller game. This approach does allow us to find approx-

imate equilibria for larger games. For solving Rhode IslandHold’em, we did not need to resort to such approximations.

Applying the sequence form representation to Rhode Is-land Hold’em yields an LP with 91,224,226 rows, andthe same number of columns. This is much too largefor current linear programming algorithms to handle. Weused GameShrinkto reduce this, and it yielded an LPwith 1,237,238 rows and columns—with 50,428,638 non-zero coefficients in the LP. We then applied iterated elim-ination of dominated strategies, which further reducedthis to 1,190,443 rows and 1,181,084 columns. (Ap-plying iterated elimination of dominated strategies with-out GameShrinkyielded 89,471,986 rows and 89,121,538columns, which still would have been prohibitively large tosolve.) GameShrinkrequired less than one second to per-form the shrinking (i.e., to compute all of the restricted gameisomorphic abstraction transformations). Using a 1.65GHzIBM eServer p5 570 with 64 gigabytes of RAM (we onlyneeded 25 gigabytes), we solved the LP (and thus solvedfor a Nash equilibrium of the game) in 7 days and 13 hoursusing the barrier method of ILOG CPLEX.

While others have worked on computer programs forplaying Rhode Island Hold’em (Shi & Littman 2001), nooptimal strategy has been found. This is the largest pokergame solved to date by over four orders of magnitude.

ReferencesBillings, D.; Davidson, A.; Schaeffer, J.; and Szafron, D.2002. The challenge of poker.Artificial Intelligence134(1-2):201–240.Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Schaeffer,J.; Schauenberg, T.; and Szafron, D. 2003. Approximatinggame-theoretic optimal strategies for full-scale poker. InProceedings of the Eighteenth International Joint Confer-ence on Artificial Intelligence (IJCAI).Gilpin, A., and Sandholm, T. 2005. Finding equilibriain large extensive form games of imperfect information.Mimeo.Koller, D., and Pfeffer, A. 1997. Representations and so-lutions for game-theoretic problems.Artificial Intelligence94(1):167–215.Koller, D.; Megiddo, N.; and von Stengel, B. 1994. Fastalgorithms for finding randomized strategies in game trees.In Proceedings of the 26th ACM Symposium on Theory ofComputing (STOC), 750–759.Koller, D.; Megiddo, N.; and von Stengel, B. 1996. Ef-ficient computation of equilibria for extensive two-persongames.Games and Economic Behavior14(2):247–259.Nash, J. 1950. Equilibrium points in n-person games.Proc.of the National Academy of Sciences36:48–49.Romanovskii, I. 1962. Reduction of a game with completememory to a matrix game.Soviet Mathematics3:678–681.Shi, J., and Littman, M. 2001. Abstraction methods forgame theoretic poker. InComputers and Games, 333–345.Springer-Verlag.von Stengel, B. 1996. Efficient computation of behaviorstrategies.Games and Economic Behavior14(2):220–246.