Games of Chance
Introduction to Artificial Intelligence
COS302
Michael L. Littman
Fall 2001
Administration
Rush hour (10/22).
Today not part of midterm (10/24), just final.
Uncertainty in Search
We’ve assumed everything is known: starting state, neighbors, goals, etc.
Often need to make decisions even though some things are uncertain.
Complicates things…
Types of Uncertainty
Opponent: What will the other player do?
• Minimax
Outcome: Which neighbor do we get?
• Model via probability distribution
State: Where are we now?
• Hidden information
Transition: What are the rules?
• Need to use learning to find out
Nim-Rand
Pile of sticks.
• Lose if you take the last stick.
• On your turn, take 1 or 2.
• Flip a coin. If H, take 1 more.
Which type of uncertainty?
Value of a Game
Without randomness: maximize your winnings in the worst case.
With randomness: maximize your expected winnings in the worst case.
Want to do well on average.
What games are like this?
Nim-Rand Tree
[Figure: Nim-Rand game tree rooted at (|||)-X. Moves are labeled 1 or 2, coin flips pass through chance nodes (c), and the terminal states ()-X and ()-Y are worth +1 and -1.]
Nim-Rand Values
[Figure: the same Nim-Rand tree with backed-up values at every node: +1 and -1 at the terminal states, +0 and +0.5 at interior nodes, and +0.5 at the root (|||)-X.]
Search Model
States, terminal states (G), values for terminal states (V).
X states (maximizer), Y states (minimizer), Z states (chance).
For all s in Z, for all s’ in N(s): P(s’|s) is the probability of reaching s’ from s.
Game Value (no loops)
Gameval(s) = {
  If (G(s)) return V(s)
  Else if s in X
    return max_{s’ in N(s)} Gameval(s’)
  Else if s in Y
    return min_{s’ in N(s)} Gameval(s’)
  Else
    return sum_{s’ in N(s)} P(s’|s) Gameval(s’)
}
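The Gameval recursion can be sketched on Nim-Rand itself. This is a minimal sketch, not the lecture's code: the function name `gameval`, the encoding of players as +1 (X) and -1 (Y), and the assumption that the coin is flipped only when sticks remain after the player's own take are all choices made here for illustration.

```python
from functools import lru_cache

# Players: +1 is X (maximizer), -1 is Y (minimizer).
# Assumption: the coin is flipped only if sticks remain after the
# player's own take; heads (prob. 0.5) forces one more stick.

@lru_cache(maxsize=None)
def gameval(sticks, player):
    """Expected value to X of a pile of `sticks` with `player` to move."""
    if sticks == 0:
        # The previous mover took the last stick and lost,
        # so the player now to move is the winner.
        return float(player)
    options = []
    for take in (1, 2):
        if take > sticks:
            continue
        left = sticks - take
        if left == 0:
            # Took the last stick: no coin flip, the turn passes.
            options.append(gameval(0, -player))
        else:
            # Chance node: average over the coin flip.
            heads = gameval(left - 1, -player)
            tails = gameval(left, -player)
            options.append(0.5 * heads + 0.5 * tails)
    # X states take the max, Y states take the min.
    return max(options) if player == 1 else min(options)

print(gameval(3, 1))  # value of (|||)-X
```

Memoizing with `lru_cache` is a convenience; the game has no loops, so plain depth-first recursion would give the same values.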
Games with Loops
No known poly time algorithm.
Approximated by value iteration:
  For all s, if G(s), L(s) = V(s), else 0
  Repeat until changes are small:
    for all non-terminal s, L(s) = max, min, or P-weighted avg of L(s’), s’ in N(s),
    depending on s in X, Y, or Z.
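A minimal sketch of value iteration on a tiny game with a loop. The function name, the dictionary encoding of the game, the four-state example, and the choice of synchronous sweeps (all states updated from the previous sweep's values) are assumptions made for illustration.

```python
def value_iteration(kind, succ, prob, terminal, tol=1e-9, max_sweeps=100_000):
    """kind[s] is 'X', 'Y', 'Z' (chance), or 'G' (terminal)."""
    # Initialise: terminal states get V(s), everything else 0.
    L = {s: terminal.get(s, 0.0) for s in kind}
    for _ in range(max_sweeps):
        new = {}
        for s in kind:
            if s in terminal:
                new[s] = terminal[s]
            elif kind[s] == "X":
                new[s] = max(L[t] for t in succ[s])
            elif kind[s] == "Y":
                new[s] = min(L[t] for t in succ[s])
            else:  # chance state: probability-weighted average
                new[s] = sum(prob[s, t] * L[t] for t in succ[s])
        change = max(abs(new[s] - L[s]) for s in kind)
        L = new
        if change < tol:
            break
    return L

# Hypothetical example: chance state 'a' wins with prob. 0.5, otherwise
# moves to X's state 'b', where X may loop back to 'a' or stop for 0.
kind = {"a": "Z", "b": "X", "win": "G", "stop": "G"}
succ = {"a": ["win", "b"], "b": ["a", "stop"]}
prob = {("a", "win"): 0.5, ("a", "b"): 0.5}
terminal = {"win": 1.0, "stop": 0.0}

values = value_iteration(kind, succ, prob, terminal)
print(round(values["a"], 6), round(values["b"], 6))
```

In this example the exact values satisfy L(a) = 0.5 + 0.5 L(b) and L(b) = max(L(a), 0), so both approach 1 as the sweeps continue.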
Hidden Information
Games like Poker, 2-player bridge, Scrabble™, Diplomacy, Stratego
Don’t fit game tree model, even when chance nodes included.
Pure Strategies
X:  I: 1=L, 4=L
    II: 1=L, 4=R
    III: 1=R, 4=L
    IV: 1=R, 4=R
Y:  I: 2=L, 3=R
    II: 2=M, 3=R
    III: 2=R, 3=R
[Figure: game tree with nodes X-1, Y-2, Y-3, and X-4; branches are labeled L, M, and R; leaf payoffs shown are +7, +3, -1, +5, +4.]
Matrix Form
Summarizes each player’s decisions as a single choice of strategy, with both choices made simultaneously.

        X-I   X-II   X-III   X-IV
Y-I      7     7      2       2
Y-II     3     3      2       2
Y-III   -1     4      2       2
Value of Matrix Game
X picks column with largest min
Y picks row with smallest max

        X-I   X-II   X-III   X-IV
Y-I      7     7      2       2
Y-II     3     3      2       2
Y-III   -1     4      2       2
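The two rules above can be checked directly on this matrix. A minimal sketch; the function names are made up for illustration, and rows are taken to be Y's strategies and columns X's, with entries as payoffs to X.

```python
# Rows are Y's strategies, columns are X's; entries are payoffs to X.
M = [
    [7, 7, 2, 2],   # Y-I
    [3, 3, 2, 2],   # Y-II
    [-1, 4, 2, 2],  # Y-III
]

def maximin_column(M):
    """X's choice: the column whose smallest entry is largest."""
    col_mins = [min(row[j] for row in M) for j in range(len(M[0]))]
    best = max(range(len(col_mins)), key=col_mins.__getitem__)
    return best, col_mins[best]

def minimax_row(M):
    """Y's choice: the row whose largest entry is smallest."""
    row_maxes = [max(row) for row in M]
    best = min(range(len(row_maxes)), key=row_maxes.__getitem__)
    return best, row_maxes[best]

print(maximin_column(M))  # (column index, guaranteed payoff for X)
print(minimax_row(M))     # (row index, payoff cap enforced by Y)
```

Indices are zero-based, so index 1 corresponds to X-II and Y-II. Both calls return the value 3 here, so the two rules agree on this matrix.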
Minimax
Von Neumann proved that in a zero-sum matrix game, minimax = maximin.
Given perfect information (no state uncertainty), there exists an optimal pure strategy for each player.
Game w/ Chance Nodes
[Figure: game tree. X-1 chooses L or R. L leads to a chance node (0.5/0.5) over +4 and -20. R leads to Y-3, where L leads to a chance node (0.8/0.2) over -5 and +10, and R leads to +3.]
Use expected values

             X-I (L)   X-II (R)
Y-I (L)        -8        -2
Y-II (R)       -8        +3
More General Matrices
What game tree leads to this matrix?
Does von Neumann’s theorem still hold?

             X-I (L)   X-II (R)
Y-I (L)         1         0
Y-II (R)        0         1
Hidden Info. Matrices
X picks L or R, keeping the choice hidden from Y.
Y makes a choice.
X’s choice is revealed and game ends.

             X-I (L)   X-II (R)
Y-I (L)         1         0
Y-II (R)        0         1
Micro Poker
X is dealt high or low card, holds/folds.
Y folds/sees.
High card wins
Y can’t see X’s card.
[Figure: game tree. A chance node deals X a low or high card (0.5 each). X-L: fold gives -20; hold passes to Y. X-H: hold passes to Y. Y cannot tell the two hold cases apart: after a low-card hold, fold gives +10 and see gives -40; after a high-card hold, fold gives +10 and see gives +30.]
Matrix Form
Player X can guarantee itself +1 on average. How?
It can even announce its strategy.

               X-I (fold)   X-II (hold)
Y-I (fold)        -5           +10
Y-II (see)        +5            -5
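The matrix entries are expected values over the deal. The sketch below recomputes them, assuming the leaf payoffs are assigned as in the comments (a reading of the Micro Poker tree that reproduces the matrix above) and that X always holds a high card, so X's only decision is what to do with a low card. All names are made up for illustration.

```python
# Leaf payoffs to X (assumed assignment of the tree's leaves).
LOW_FOLD = -20                         # X folds a low card
LOW_HOLD = {"fold": 10, "see": -40}    # X holds a low card, then Y acts
HIGH_HOLD = {"fold": 10, "see": 30}    # X holds a high card, then Y acts
P_LOW = 0.5                            # probability of a low card

def expected_payoff(x_low_action, y_action):
    """Average over the deal; Y's action cannot depend on X's card."""
    low = LOW_FOLD if x_low_action == "fold" else LOW_HOLD[y_action]
    high = HIGH_HOLD[y_action]
    return P_LOW * low + (1 - P_LOW) * high

# matrix[y][x]: rows are Y's strategies, columns are X's low-card action.
matrix = {
    y: {x: expected_payoff(x, y) for x in ("fold", "hold")}
    for y in ("fold", "see")
}
for y, row in matrix.items():
    print(y, row)
```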
Mixed Strategies
Pick a number p.
X: With prob. p, fold; else hold.
Since Y doesn’t know what’s coming, the response will sometimes work, sometimes not.
Guess a Probability
X announces p=1/3.
Y’s pick?

               X-I (fold)   X-II (hold)
Y-I (fold)        -5           +10
Y-II (see)        +5            -5

Fold: +5
See: -1 2/3
Y picks see.
Guess a Probability
X announces p=2/3.
Y’s pick?

               X-I (fold)   X-II (hold)
Y-I (fold)        -5           +10
Y-II (see)        +5            -5

Fold: +0
See: +1 2/3
Y picks fold.
All Strategies
What should X pick for p to maximize its worst case?
p=0.6
Payoff +1
[Plot: X’s expected payoff (vertical axis, -5 to 10) against p (horizontal axis, 0 to 1), with one line for Y choosing see and one for Y choosing fold; the lines cross at p=0.6.]
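As a worked check, the crossing point follows from equating X's expected payoff against each of Y's pure responses, using the Micro Poker matrix entries (-5 and +10 against fold, +5 and -5 against see):

```latex
\begin{aligned}
\text{vs. fold:}\quad & -5p + 10(1-p) = 10 - 15p \\
\text{vs. see:}\quad  & 5p - 5(1-p) = 10p - 5 \\
10 - 15p = 10p - 5 \;&\Rightarrow\; p = 0.6, \qquad \text{payoff} = 10 - 15(0.6) = +1
\end{aligned}
```

For p below 0.6 the see line is lower, and for p above 0.6 the fold line is lower, so the worst case peaks at the crossing.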
Randomizing Y
If Y randomizes, the answer is the same.
No matter what, X can guarantee itself +1.
[Plot: the same payoff-versus-p plot (vertical axis -5 to 10, horizontal axis 0 to 1), with lines for see and fold.]
Bluffing
[Figure: the Micro Poker game tree, repeated.]
X: On a low card, bluff with prob. 0.4.
Y: On hold, fold with prob. 0.4.
Solving 2x2 Game
X plays X-I with prob. p.
X’s expected gain
  vs. Y-I:  $m_{11} p + m_{12} (1-p)$
  vs. Y-II: $m_{21} p + m_{22} (1-p)$

          X-I        X-II
Y-I     $m_{11}$   $m_{12}$
Y-II    $m_{21}$   $m_{22}$

Maximize the minimum.
Try p=0, p=1, and the point where the lines meet.
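The three-candidate recipe can be sketched as follows. This is an illustrative sketch, not the lecture's code: the function name `solve_2x2` is made up, payoffs are assumed to be integers so that `fractions.Fraction` keeps the arithmetic exact, and the crossing point comes from setting the two expected-gain lines equal.

```python
from fractions import Fraction

def solve_2x2(m11, m12, m21, m22):
    """Return (p, value), where X plays X-I with probability p."""
    def worst_case(p):
        vs_y1 = m11 * p + m12 * (1 - p)
        vs_y2 = m21 * p + m22 * (1 - p)
        return min(vs_y1, vs_y2)

    candidates = [Fraction(0), Fraction(1)]
    # The lines meet where (m11 - m12 - m21 + m22) * p = m22 - m12.
    denom = m11 - m12 - m21 + m22
    if denom != 0:
        p_cross = Fraction(m22 - m12, denom)
        if 0 <= p_cross <= 1:
            candidates.append(p_cross)
    # Maximize the minimum over the candidate points.
    best = max(candidates, key=worst_case)
    return best, worst_case(best)

# Micro Poker matrix: rows Y-I (fold), Y-II (see); columns X-I, X-II.
p, value = solve_2x2(-5, 10, 5, -5)
print(p, value)
```

Checking only the endpoints and the crossing is enough because the minimum of two straight lines is piecewise linear in p, so its maximum on [0, 1] lies at one of those points.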
Solving General mxn
Linear program: $p_1, \ldots, p_n$.
$p_1 + \cdots + p_n = 1$, $p_i \ge 0$
Maximize X’s gain, $g$
  vs. Y-I:  $m_{11} p_1 + \cdots + m_{n1} p_n \ge g$
  vs. Y-II: $m_{12} p_1 + \cdots + m_{n2} p_n \ge g$
  …
Against all Y strategies.
Issues
Can we solve poker?
• More than 2 players
• Not zero sum (collude)
• Huge state space
Poker: Opponent modeling
Bridge: Use simulation to approximate
What to Learn
Minimax value in games of chance and the DFS algorithm for computing it.
Converting games to matrix form.
Solve 2x2 game.
Homework 5 (due 11/7)
1. The value iteration algorithm from the Games of Chance lecture can be applied to deterministic games with loops. Argue that it produces the same answer as the “Loopy” algorithm from the Game Tree lecture.
2. Write the matrix form of the game tree below.
Game Tree
[Figure: game tree with nodes X-1, Y-2, Y-3, and X-4, each with branches L and R; leaf payoffs shown are +2, -1, +4, +5, +2.]