Games

Henry Kautz

ExpectiMiniMax: Alpha-Beta Pruning

• Cutoffs at Max and Min nodes work just as before

• If the range of values is bounded, cutoffs can also be added to Chance nodes

• Assume that all branches not yet searched have the worst-case result

• L = lowest value achievable (-10)

• U = highest value achievable (10)

ExpectiMiniMax: Cutoffs

• Beta cutoff:

[Figure labels: Values seen, Current value, Values to come]

• Alpha cutoff:

[Figure labels: Values seen, Current value, Values to come]
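The cutoff formulas themselves did not survive on this slide. The sketch below uses the usual formulation and is an assumption about what the slide showed, not a reconstruction of it: the expected value of a Chance node is bounded by adding the values already seen to the remaining probability mass times U (upper bound) or L (lower bound). The names `chance_value`, `evaluate`, `seen`, and `remaining` are hypothetical.

```python
from fractions import Fraction as F

L, U = -10, 10  # lowest and highest achievable values, as on the slide


def chance_value(children, alpha, beta, evaluate):
    """Evaluate a chance node, stopping early when a cutoff applies.

    children: list of (probability, child) pairs; probabilities sum to 1.
    evaluate: hypothetical callback returning the exact value of a child.
    Returns (value_or_bound, cut) where cut is None, "alpha", or "beta".
    """
    seen = F(0)       # sum of P(s) * value(s) over children searched so far
    remaining = F(1)  # probability mass of children not yet searched
    for p, child in children:
        seen += p * evaluate(child)
        remaining -= p
        if remaining > 0:
            upper = seen + remaining * U  # unsearched branches all worth U
            lower = seen + remaining * L  # unsearched branches all worth L
            if upper <= alpha:
                return upper, "alpha"     # cannot beat what Max already has
            if lower >= beta:
                return lower, "beta"      # cannot beat what Min already has
    return seen, None


half = F(1, 2)
identity = lambda v: v  # children are plain numbers in this toy example

# After seeing -10 with probability 1/2, the best possible total is 0 <= alpha.
alpha_case = chance_value([(half, -10), (half, 4)], 2, 10, identity)
# After seeing 10 with probability 1/2, the worst possible total is 0 >= beta.
beta_case = chance_value([(half, 10), (half, 10)], -10, 0, identity)
# No bound crosses the window, so the exact expectation 3 is returned.
exact_case = chance_value([(half, 2), (half, 4)], -10, 10, identity)
```

In both cutoff cases the second child is never evaluated, which is where the saving comes from.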

Probabilistic STRIPS Planning

domain: Hungry Monkey

shake: if (ontable)
           Prob(2/3) -> +1 banana
           Prob(1/3) -> no change
       else
           Prob(1/6) -> +1 banana
           Prob(5/6) -> no change

jump:  if (~ontable)
           Prob(2/3) -> ontable
           Prob(1/3) -> ~ontable
       else
           ontable

What is the expected reward?

[1] shake

[2] jump; shake

[3] jump; shake; shake;

[4] jump; if (~ontable) { jump; shake } else { shake; shake }
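The four plans can be checked with a short calculation. This is a minimal sketch under stated assumptions that the slide does not spell out: the monkey starts off the table with no bananas, the reward is the number of bananas at the end, and shaking never changes whether the monkey is on the table. The plan encoding (strings plus an `("if_off_table", then, else)` tuple) is hypothetical.

```python
from fractions import Fraction as F


def outcomes(state, action):
    """Return [(probability, next_state)]; state is (ontable, bananas)."""
    ontable, bananas = state
    if action == "shake":
        p = F(2, 3) if ontable else F(1, 6)
        return [(p, (ontable, bananas + 1)), (1 - p, (ontable, bananas))]
    if action == "jump":
        if ontable:
            return [(F(1), (True, bananas))]
        return [(F(2, 3), (True, bananas)), (F(1, 3), (False, bananas))]
    raise ValueError(action)


def expected_reward(plan, state):
    """Expected final banana count after running plan from state."""
    if not plan:
        return F(state[1])
    step, rest = plan[0], plan[1:]
    if isinstance(step, tuple):  # ("if_off_table", then_plan, else_plan)
        _, then_plan, else_plan = step
        branch = else_plan if state[0] else then_plan
        return expected_reward(list(branch) + list(rest), state)
    return sum(p * expected_reward(rest, s) for p, s in outcomes(state, step))


START = (False, 0)  # assumption: off the table, no bananas yet

plans = {
    1: ["shake"],
    2: ["jump", "shake"],
    3: ["jump", "shake", "shake"],
    4: ["jump", ("if_off_table", ["jump", "shake"], ["shake", "shake"])],
}
rewards = {k: expected_reward(p, START) for k, p in plans.items()}
# rewards: plan 1 -> 1/6, plan 2 -> 1/2, plan 3 -> 1, plan 4 -> 19/18
```

Under these assumptions the conditional plan [4] does slightly better than the unconditional plan [3], because it spends its second step on another jump only when the first jump failed.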

ExpectiMax

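\[
\mathrm{ExpectiMax}(n) =
\begin{cases}
U(n) & \text{if } n \text{ is a terminal node} \\
\max\{\mathrm{ExpectiMax}(s) \mid s \in \mathrm{children}(n)\} & \text{if } n \text{ is a max node} \\
\sum_{s \in \mathrm{children}(n)} P(s)\,\mathrm{ExpectiMax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]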
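The recursive definition translates almost line for line into code. The sketch below assumes a hypothetical tree encoding in which each node is a `(kind, payload)` pair: a terminal node carries its utility, a max node carries a list of children, and a chance node carries a list of `(probability, child)` pairs.

```python
from fractions import Fraction as F


def expectimax(node):
    """Value of a node in a tree of terminal, max, and chance nodes."""
    kind, payload = node
    if kind == "terminal":
        return payload                                       # U(n)
    if kind == "max":
        return max(expectimax(child) for child in payload)   # best child
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in payload)
    raise ValueError(kind)


# Toy example: a sure payoff of 1 versus a fair gamble between 0 and 4.
gamble = ("chance", [(F(1, 2), ("terminal", F(0))),
                     (F(1, 2), ("terminal", F(4)))])
tree = ("max", [("terminal", F(1)), gamble])
value = expectimax(tree)  # the gamble is worth 2, so the max node is worth 2
```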

Hungry Monkey: 2-Ply Game Tree

[Figure: 2-ply game tree. Each max node chooses between jump and shake; each action leads to a chance node whose branches carry probabilities 2/3 and 1/3, or 1/6 and 5/6.]

Leaf values, left to right: 0 0 1 0 | 0 0 1 0 | 1 1 2 1 | 0 0 1 0

ExpectiMax 1 – Chance Nodes

[Figure: the 2-ply game tree with the bottom-level chance nodes evaluated.]

Chance-node values (jump, shake), left to right: 0, 2/3 | 0, 1/6 | 1, 7/6 | 0, 1/6

Leaf values, left to right: 0 0 1 0 | 0 0 1 0 | 1 1 2 1 | 0 0 1 0

ExpectiMax 2 – Max Nodes

[Figure: the 2-ply game tree with the second-level max nodes evaluated.]

Max-node values, left to right: 2/3, 1/6, 7/6, 1/6

Chance-node values (jump, shake) below them: 0, 2/3 | 0, 1/6 | 1, 7/6 | 0, 1/6

ExpectiMax 3 – Chance Nodes

[Figure: the 2-ply game tree with the top-level chance nodes evaluated.]

Top-level chance-node values: 1/2 (jump), 1/3 (shake)

Max-node values below them, left to right: 2/3, 1/6, 7/6, 1/6

ExpectiMax 4 – Max Node

[Figure: the fully evaluated 2-ply game tree.]

Root max-node value: 1/2

Top-level chance-node values: 1/2 (jump), 1/3 (shake)
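The four ExpectiMax passes can be replayed bottom-up on the leaf values of the 2-ply tree. This is a sketch, not the slides' own procedure: the pairing of branch probabilities with leaves is inferred from the Hungry Monkey domain definition, and the helper names `pairs`, `chance_pass`, and `max_pass` are hypothetical.

```python
from fractions import Fraction as F

JUMP = (F(2, 3), F(1, 3))       # branch probabilities for a jump
SHAKE_ON = (F(2, 3), F(1, 3))   # shake while on the table
SHAKE_OFF = (F(1, 6), F(5, 6))  # shake while off the table

leaves = [0, 0, 1, 0,  0, 0, 1, 0,  1, 1, 2, 1,  0, 0, 1, 0]


def pairs(values):
    """Group a flat list into consecutive pairs."""
    return list(zip(values[0::2], values[1::2]))


def chance_pass(values, probs):
    """Replace each pair of children by its probability-weighted average."""
    return [p * a + q * b for (p, q), (a, b) in zip(probs, pairs(values))]


def max_pass(values):
    """Replace each (jump, shake) pair by the better of the two."""
    return [max(a, b) for a, b in pairs(values)]


# Bottom level: the first max node is on the table, the other three are off it.
bottom_probs = [JUMP, SHAKE_ON] + [JUMP, SHAKE_OFF] * 3

step1 = chance_pass(leaves, bottom_probs)      # 0, 2/3, 0, 1/6, 1, 7/6, 0, 1/6
step2 = max_pass(step1)                        # 2/3, 1/6, 7/6, 1/6
step3 = chance_pass(step2, [JUMP, SHAKE_OFF])  # 1/2 (jump), 1/3 (shake)
step4 = max_pass(step3)                        # 1/2
```

Each pass shrinks the list by half, mirroring the way the slides fill in one level of the tree at a time.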

Policies

The result of the ExpectiMax analysis is a conditional plan (also called a policy):

Optimal plan for 2 steps: jump; shake

Optimal plan for 3 steps: jump; if (ontable) { shake; shake } else { jump; shake }

Probabilistic planning can be generalized in many ways, including action costs and hidden state

The general problem is that of solving a Markov Decision Process (MDP)
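For a small finite-horizon problem like Hungry Monkey, the policy can be computed by backward induction over (state, steps remaining). The following is a minimal sketch, assuming the reward is the total number of bananas gained and that the only state that matters is whether the monkey is on the table; the function names are hypothetical.

```python
from fractions import Fraction as F
from functools import lru_cache

ACTIONS = ("jump", "shake")


def outcomes(ontable, action):
    """Return [(probability, next_ontable, bananas_gained)] for one action."""
    if action == "shake":
        p = F(2, 3) if ontable else F(1, 6)
        return [(p, ontable, 1), (1 - p, ontable, 0)]
    if ontable:
        return [(F(1), True, 0)]  # jumping while on the table changes nothing
    return [(F(2, 3), True, 0), (F(1, 3), False, 0)]


@lru_cache(maxsize=None)
def best(ontable, steps):
    """Return (expected bananas, best first action) with `steps` actions left."""
    if steps == 0:
        return F(0), None
    options = []
    for action in ACTIONS:
        value = sum(p * (gain + best(nxt, steps - 1)[0])
                    for p, nxt, gain in outcomes(ontable, action))
        options.append((value, action))
    return max(options)  # highest expected value wins


two_step = best(False, 2)     # (1/2, "jump")
three_step = best(False, 3)   # (19/18, "jump")
after_success = best(True, 2)  # (4/3, "shake")
```

Starting off the table, the first action is jump for both horizons. With two steps left the best action is shake if the jump succeeded and jump again if it failed, which agrees with the 3-step conditional plan stated on the slide.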

Gambler’s Paradox

• How much would you pay to play the following game?

• Flip a coin. If heads, you win $2.

• Otherwise: flip again. If heads, you win $4.

• Otherwise: flip again for $8, and so on, with the prize doubling on each flip.



Expected Value

• Expected value is INFINITE!

(1/2)*2 + (1/4)*4 + (1/8)*8 + …

• “Rationally” you should be willing to pay ANY fixed amount.

• In real life, people will pay about $20.

– This is consistent with logarithmic utility of money

– (1/2)*log(2) + (1/4)*log(4) + (1/8)*log(8) + …
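The contrast between the two series is easy to see numerically. The sketch below truncates the game after a fixed number of flips; the base-2 logarithm is an assumption, since the slide does not say which base it uses, and `partial_sums` is a hypothetical helper name.

```python
import math


def partial_sums(n_flips):
    """Expected payoff and expected log2 utility, truncated after n_flips."""
    ks = range(1, n_flips + 1)
    money = sum((0.5 ** k) * (2 ** k) for k in ks)               # each term is 1
    utility = sum((0.5 ** k) * math.log2(2 ** k) for k in ks)    # terms k / 2^k
    return money, utility


money_10, utility_10 = partial_sums(10)
money_50, utility_50 = partial_sums(50)
```

The money sum grows by exactly 1 with every extra flip allowed, so it has no finite limit, while the log-utility sum levels off (at 2 in base 2).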
