Games
Henry Kautz
ExpectiMiniMax: Alpha-Beta Pruning
• Cutoffs at Max and Min nodes work just as before
• If the range of values is bounded, cutoffs can be added to Chance nodes
• Assume that all branches not yet searched have the worst-case result
• L = lowest value achievable (-10)
• U = highest value achievable (10)
ExpectiMiniMax: Cutoffs
• Beta cutoff:
• Alpha cutoff:
[Figure, one per cutoff: number line split into values seen, current value, and values to come]
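The cutoff formulas themselves did not survive on this slide, so the following is only a minimal sketch of the idea stated in the bullets: while a chance node is being evaluated, every unsearched branch is assumed to take the extreme value L or U, which bounds the node's final value. The function name `chance_value_with_cutoffs`, the lazy `value_fn` callables, and the return convention are assumptions made for illustration, not part of the slides.

```python
def chance_value_with_cutoffs(outcomes, alpha, beta, lo=-10.0, hi=10.0):
    """Evaluate a chance node, stopping once its value is provably outside (alpha, beta).

    outcomes: list of (probability, value_fn) pairs; value_fn() evaluates a child lazily.
    lo, hi:   the bounds L and U on any achievable value.
    Returns (value_or_bound, number_of_children_evaluated).
    """
    seen = 0.0       # sum of p * value over the children evaluated so far
    remaining = 1.0  # probability mass of the children not yet evaluated
    for count, (p, value_fn) in enumerate(outcomes, start=1):
        seen += p * value_fn()
        remaining -= p
        lower = seen + remaining * lo  # unsearched branches all return L
        upper = seen + remaining * hi  # unsearched branches all return U
        if lower >= beta:              # beta cutoff: node is at least beta
            return lower, count
        if upper <= alpha:             # alpha cutoff: node is at most alpha
            return upper, count
    return seen, len(outcomes)


# Hypothetical chance node with three outcomes.
node = [(0.5, lambda: 10.0), (0.3, lambda: 8.0), (0.2, lambda: -4.0)]
bound, evaluated = chance_value_with_cutoffs(node, alpha=-10.0, beta=2.0)
print(bound, evaluated)
```

With beta = 2, the third child is never evaluated: after two children the lower bound already exceeds beta, so the Min node above would not choose this branch.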
Probabilistic STRIPS Planning
domain: Hungry Monkey
shake: if (ontable)
           Prob(2/3) -> +1 banana
           Prob(1/3) -> no change
       else
           Prob(1/6) -> +1 banana
           Prob(5/6) -> no change
jump:  if (~ontable)
           Prob(2/3) -> ontable
           Prob(1/3) -> ~ontable
       else
           ontable
What is the expected reward?
[1] shake
[2] jump; shake
[3] jump; shake; shake;
[4] jump; if (~ontable){ jump; shake}
else { shake; shake }
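The expected reward of each plan can be checked mechanically. The sketch below encodes the Hungry Monkey domain and evaluates the four plans with exact fractions. It assumes the monkey starts off the table (~ontable) with no bananas, which the slides do not state explicitly here; the plan encoding (`"if_on_table"` tuples) and the function names are illustrative choices.

```python
from fractions import Fraction as F


def outcomes(on_table, action):
    """Return [(probability, next_on_table, bananas_gained)] for one action."""
    if action == "shake":
        p = F(2, 3) if on_table else F(1, 6)
        return [(p, on_table, 1), (1 - p, on_table, 0)]
    if action == "jump":
        if on_table:
            return [(F(1), True, 0)]  # already on the table: stays there
        return [(F(2, 3), True, 0), (F(1, 3), False, 0)]
    raise ValueError(action)


def expected_reward(plan, on_table=False):
    """Expected bananas from a plan, starting off the table by assumption.

    A plan is a list of steps. A step is an action name or a tuple
    ("if_on_table", then_plan, else_plan).
    """
    if not plan:
        return F(0)
    step, rest = plan[0], plan[1:]
    if isinstance(step, tuple):
        _, then_plan, else_plan = step
        branch = then_plan if on_table else else_plan
        return expected_reward(list(branch) + rest, on_table)
    return sum(p * (gain + expected_reward(rest, nxt))
               for p, nxt, gain in outcomes(on_table, step))


plans = {
    "[1]": ["shake"],
    "[2]": ["jump", "shake"],
    "[3]": ["jump", "shake", "shake"],
    "[4]": ["jump", ("if_on_table", ["shake", "shake"], ["jump", "shake"])],
}
for name, plan in plans.items():
    print(name, expected_reward(plan))
```

Under that starting-state assumption the four plans come out at 1/6, 1/2, 1, and 19/18 bananas respectively, so the conditional plan [4] beats the unconditional three-step plan [3].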
ExpectiMax
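\[
\mathrm{ExpectiMax}(n) =
\begin{cases}
U(n) & \text{if } n \text{ is a terminal node} \\
\max\{\mathrm{ExpectiMax}(s) \mid s \in \mathrm{children}(n)\} & \text{if } n \text{ is a max node} \\
\sum_{s \in \mathrm{children}(n)} P(s)\,\mathrm{ExpectiMax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]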
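The recursive definition translates almost line for line into code. This is a minimal sketch over an explicit tree; the tuple-based node representation (`"terminal"`, `"max"`, `"chance"` tags) and the `leaf` helper are assumptions for illustration.

```python
from fractions import Fraction as F


def expectimax(node):
    """Value of a node given as ("terminal", u), ("max", [children]),
    or ("chance", [(probability, child), ...])."""
    kind = node[0]
    if kind == "terminal":
        return node[1]                                # U(n)
    if kind == "max":
        return max(expectimax(child) for child in node[1])
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in node[1])
    raise ValueError(kind)


def leaf(utility):
    return ("terminal", F(utility))


# One-step example: the monkey is on the table with no bananas.
jump = ("chance", [(F(1), leaf(0))])                     # stays on the table
shake = ("chance", [(F(2, 3), leaf(1)), (F(1, 3), leaf(0))])
root = ("max", [jump, shake])
print(expectimax(root))  # 2/3
```

Exact fractions are used so the results can be compared directly with the values written on the tree slides.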
Hungry Monkey: 2-Ply Game Tree
[Figure: 2-ply game tree. Each max node chooses between jump and shake; each action leads to a chance node whose branches carry probabilities 2/3 and 1/3, or 1/6 and 5/6. Leaf values, left to right: 0 0 1 0 | 0 0 1 0 | 1 1 2 1 | 0 0 1 0]
ExpectiMax 1 – Chance Nodes
[Figure: the 2-ply game tree with the bottom-level chance nodes evaluated. Values (jump, shake) under each bottom-level max node, left to right: (0, 2/3), (0, 1/6), (1, 7/6), (0, 1/6)]
ExpectiMax 2 – Max Nodes
[Figure: the 2-ply game tree with the bottom-level max nodes evaluated. Max-node values, left to right: 2/3, 1/6, 7/6, 1/6]
ExpectiMax 3 – Chance Nodes
[Figure: the 2-ply game tree with the top-level chance nodes evaluated. Values under the root: jump = 1/2, shake = 1/3]
ExpectiMax 4 – Max Node
[Figure: the 2-ply game tree with the root max node evaluated. Root value: 1/2, achieved by jump]
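The values written on the four ExpectiMax slides can be reproduced by hand. The worked arithmetic below follows the same bottom-up order; the state labels in parentheses are inferred from the branch probabilities of the Hungry Monkey domain rather than read off the slides.

```latex
\begin{aligned}
\text{1. Bottom chance nodes (shake):}\quad
  & \tfrac{2}{3}\cdot 1 + \tfrac{1}{3}\cdot 0 = \tfrac{2}{3}
    && (\text{ontable, 0 bananas}) \\
  & \tfrac{1}{6}\cdot 1 + \tfrac{5}{6}\cdot 0 = \tfrac{1}{6}
    && (\lnot\text{ontable, 0 bananas}) \\
  & \tfrac{1}{6}\cdot 2 + \tfrac{5}{6}\cdot 1 = \tfrac{7}{6}
    && (\lnot\text{ontable, 1 banana}) \\
\text{2. Bottom max nodes:}\quad
  & \max(0, \tfrac{2}{3}) = \tfrac{2}{3},\quad
    \max(0, \tfrac{1}{6}) = \tfrac{1}{6},\quad
    \max(1, \tfrac{7}{6}) = \tfrac{7}{6} \\
\text{3. Top chance nodes:}\quad
  & \text{jump: } \tfrac{2}{3}\cdot\tfrac{2}{3} + \tfrac{1}{3}\cdot\tfrac{1}{6} = \tfrac{1}{2} \\
  & \text{shake: } \tfrac{1}{6}\cdot\tfrac{7}{6} + \tfrac{5}{6}\cdot\tfrac{1}{6} = \tfrac{1}{3} \\
\text{4. Root max node:}\quad
  & \max(\tfrac{1}{2}, \tfrac{1}{3}) = \tfrac{1}{2}
\end{aligned}
```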
Policies
The result of the ExpectiMax analysis is a conditional plan (also called a policy):
Optimal plan for 2 steps: jump; shake
Optimal plan for 3 steps: jump; if (ontable) {shake; shake} else {jump; shake}
Probabilistic planning can be generalized in many ways, including action costs and hidden state
The general problem is that of solving a Markov Decision Process (MDP)
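For a fixed number of steps, the optimal plans above can be recovered by backward induction over (state, steps left), which is the finite-horizon case of solving an MDP. The sketch below is one way to do that for the Hungry Monkey domain; it assumes the monkey starts off the table, and the names `best` and `outcomes` are illustrative.

```python
from fractions import Fraction as F
from functools import lru_cache

ACTIONS = ("jump", "shake")


def outcomes(on_table, action):
    """Return [(probability, next_on_table, bananas_gained)] for one action."""
    if action == "shake":
        p = F(2, 3) if on_table else F(1, 6)
        return [(p, on_table, 1), (1 - p, on_table, 0)]
    if on_table:                       # jump while already on the table
        return [(F(1), True, 0)]
    return [(F(2, 3), True, 0), (F(1, 3), False, 0)]


@lru_cache(maxsize=None)
def best(on_table, steps_left):
    """Return (expected bananas, best first action) with steps_left actions to go."""
    if steps_left == 0:
        return F(0), None
    scored = []
    for action in ACTIONS:
        value = sum(p * (gain + best(nxt, steps_left - 1)[0])
                    for p, nxt, gain in outcomes(on_table, action))
        scored.append((value, action))
    return max(scored)                 # highest expected value wins


for steps in (1, 2, 3):
    print(steps, best(False, steps), best(True, steps))
```

Reading the policy off the table: with 3 steps left and off the table the best action is jump; with 2 steps left it is shake if on the table and jump otherwise, which matches the 3-step conditional plan.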
Gambler’s Paradox
• How much would you pay to play the following game?
• Flip a coin. If heads, you win $2.
• Otherwise: flip again. If heads, you win $4.
• Otherwise: flip again. If heads, you win $8.
• Otherwise: flip again. If heads, you win $16.
• And so on, doubling the prize after every tails.
Expected Value
• Expected value is INFINITE! (1/2)*2 + (1/4)*4 + (1/8)*8 + …
• “Rationally” you should be willing to pay ANY fixed amount.
• In real life, people will pay about $20.
– This is consistent with logarithmic utility of money
– (1/2)*log(2) + (1/4)*log(4) + (1/8)*log(8) + …
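The contrast between the two series is easy to see numerically. The sketch below computes partial sums of both: each term of the monetary series equals 1, so it grows without bound, while the logarithmic series converges. The slide does not say which base the logarithm uses; base 2 is assumed here, and the function name `partial_sum` is illustrative. The sketch does not attempt to reproduce the $20 figure.

```python
import math


def partial_sum(n_terms, utility=lambda x: x):
    """Sum over k = 1..n_terms of (1/2**k) * utility(2**k)."""
    return sum(utility(2.0 ** k) / 2.0 ** k for k in range(1, n_terms + 1))


for n in (10, 20, 40):
    # Middle column: expected money so far; right column: expected log2 utility.
    print(n, partial_sum(n), partial_sum(n, math.log2))
```

The monetary partial sum after n terms is exactly n, whereas the log-utility partial sum is 2 - (n + 2)/2^n and approaches 2.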