Von Neumann (Min-Max theorem) Claude Shannon (finite look-ahead) Chaturanga, India (~550AD) (Proto-Chess) John McCarthy (α-β pruning) Donald Knuth (α-β analysis)


Von Neumann (Min-Max theorem)

Claude Shannon (finite look-ahead)

Chaturanga, India (~550AD) (Proto-Chess)

John McCarthy (α-β pruning)

Donald Knuth (α-β analysis)


Wilmer McLean

The war began in my front yard and ended in my front parlor


Deep Thought: Chess is easy but for the pesky opponent

Search: If I do A, then I will be in S, then if I do B, then I will get to S’

Game Search: If I do A, then I will be in S, then my opponent gets to do B, and then I will be forced to S’. Then I get to do C, ...


Snakes-and-ladders is a perfect-information game with chance. (Think of the utter boringness of deterministic snakes-and-ladders.) Not that normal snakes-and-ladders gives you any real scope for showing your thinking power: your only action is dictated by the dice, so the dice can play it as solitaire; at most they need your hand.

Kriegspiel (blindfold chess)

Snakes & Ladders?


Searching Tic Tac Toe using Minmax

A game is considered solved if it can be shown that the MAX player has a winning (or at least non-losing) strategy.

This means that the backed-up value in the full min-max tree is positive (or at least non-negative, for a non-losing strategy).
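As a rough illustration of backing up values through the full min-max tree, here is a minimal sketch that solves Tic Tac Toe exhaustively. The board encoding (a 9-character string), the names `winner` and `minimax`, and the choice of +1/0/-1 as terminal values are all assumptions made for this example, not something taken from the slides.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Backed-up value of `board` with `player` to move (X is MAX, O is MIN)."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0  # full board, nobody won: draw
    other = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)


root_value = minimax(" " * 9, "X")
```

Under these assumptions the backed-up value at the empty board comes out as 0, so MAX has a non-losing strategy but not a winning one.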


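[Figure: alpha-beta pruning example tree. Annotations visible in the figure: node values 2, 14, 5, 2; node bounds <= 2, <= 14, <= 5, <= 2; one branch marked “Cut”.]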

• Whenever a node gets its “true” value, its parent’s bound gets updated

• When all children of a node have been evaluated (or a cutoff occurs below that node), the current bound of that node is its true value

• Two types of cutoffs:

  – If a min node n has bound <=k, and a max ancestor of n, say m, has a bound >=j, then cutoff occurs as long as j >=k

  – If a max node n has bound >=k, and a min ancestor of n, say m, has a bound <=j, then cutoff occurs as long as j<=k


Another alpha-beta example


Evaluation Functions: TicTacToe

If win for Max: +infty
If lose for Max: -infty
If draw for Max: 0
Else: (# rows/cols/diags open for Max) - (# rows/cols/diags open for Min)
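The evaluation rule above might be coded roughly as follows. This sketch assumes a 9-character string board where "X" is Max and "O" is Min, and it reads "open for a player" as "a line containing none of the opponent's marks"; both are assumptions of the example.

```python
import math

# The eight rows, columns and diagonals of a 3x3 board, cells indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def evaluate(board):
    """Static evaluation of a Tic Tac Toe position from Max's ("X") point of view."""
    # Terminal checks come first, before the non-terminal heuristic.
    for a, b, c in LINES:
        trio = board[a] + board[b] + board[c]
        if trio == "XXX":
            return math.inf
        if trio == "OOO":
            return -math.inf
    if " " not in board:
        return 0
    # A line is still open for a player if the opponent has no mark on it.
    open_for_max = sum(1 for line in LINES if all(board[i] != "O" for i in line))
    open_for_min = sum(1 for line in LINES if all(board[i] != "X" for i in line))
    return open_for_max - open_for_min
```

For instance, with X alone in the center all 8 lines are open for Max and only 4 for Min, giving a score of 4.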


What depth should we go to? --Deeper the better (but why?)

Should we go to uniform depth? --Go deeper in branches where the game is in flux (backed-up values are changing fast) [called “Quiescence”]

Can we avoid the horizon effect?


Depth Cutoff and Online Search

• Until now we considered mostly “all or nothing” computations
  – The computation takes the time it takes, and only at the end will give any answer

• When the agent has to make decisions online, it needs flexibility in the time it can devote to “thinking” (“deliberation scheduling”)
  – Can’t do it if we have all-or-nothing computations. We need flexible or anytime computations

• The depth-limited min-max is an example of an anytime computation.
  – Pick a small depth limit. Do the analysis w.r.t. that tree. Decide the best move. Keep it as a backup. If you have more time, go deeper and get a better move.
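The anytime behaviour of depth-limited min-max can be sketched as a generator that yields a usable move after every depth, so the caller can stop whenever its thinking time runs out. The game used here (a single-heap Nim where players take 1 to 3 stones and whoever takes the last stone wins), the function names, and the evaluation of 0 at the depth cutoff are all assumptions chosen to keep the example small.

```python
MOVES = (1, 2, 3)  # hypothetical game: remove 1-3 stones, last stone wins


def depth_limited_value(heap, max_to_move, depth):
    """Min-max value of a position, cut off after `depth` further plies."""
    if heap == 0:
        # The player who just moved took the last stone and won.
        return -1 if max_to_move else 1
    if depth == 0:
        return 0  # uninformed evaluation at the cutoff (an assumption)
    values = [depth_limited_value(heap - m, not max_to_move, depth - 1)
              for m in MOVES if m <= heap]
    return max(values) if max_to_move else min(values)


def deepening_moves(heap, max_depth):
    """Yield (depth, best_move, value) for depth limits 1, 2, ..., max_depth."""
    for depth in range(1, max_depth + 1):
        scored = [(depth_limited_value(heap - m, False, depth - 1), m)
                  for m in MOVES if m <= heap]
        best_value = max(v for v, _ in scored)
        best_move = next(m for v, m in scored if v == best_value)
        yield depth, best_move, best_value


# The caller keeps the latest move as a backup and may break out at any time,
# for example when a clock check says the time budget is used up.
history = list(deepening_moves(6, max_depth=3))
```

With a heap of 6, depths 1 and 2 fall back on the first move (take 1), while depth 3 sees far enough to find that taking 2 forces a win. Note that this sketch only lets the caller stop between depths, not in the middle of one.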

Online Search is not guaranteed to be optimal --The agent may not even survive unless the world is ergodic (non-zero probability of reaching any state from any other state)


Why is “deeper” better?

• Possible reasons
  – Taking mins/maxes of the evaluation values of the leaf nodes improves their collective accuracy
  – Going deeper makes the agent notice “traps”, thus significantly improving the evaluation accuracy
    • All evaluation functions first check for termination states before computing the non-terminal evaluation

If this is indeed the case, then we should remember the backed-up values for game positions—since they are better than straight evaluations


(just as human weight lifters refuse to compete against cranes)


Uncertain Actions & Games Against Nature


[can generalize to have action costs C(a,s)]

If the Mij matrix is not known a priori, then we have a reinforcement learning scenario.

Repeat


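[Figure: one-step lookahead tree rooted at state (3,2). Successor states shown: (4,2), (3,3), (3,1), (3,3), (3,2), (4,2). Leaf rewards shown: -1, -0.04, -0.04. Outcome probabilities for each action: 0.8, 0.1, 0.1.]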

This is a game against nature, and nature decides which outcome of each action will occur. How do you think it will decide?
• I am the chosen one: nature will decide the course that is most beneficial to me [Max-Max]
• I am the loser: nature will decide the course that is least beneficial to me [Min-Max]
• I am a rationalist: nature is oblivious of me and does what it does, so I do “expectation analysis”

Leaf node values have been set to their immediate rewards. We can do better if we set them to an estimate of their expected value.
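The three attitudes toward nature differ only in how the outcomes of one action are backed up into a single value. The sketch below makes that concrete; the action names and the way the rewards -1 and -0.04 are paired with the probabilities 0.8/0.1/0.1 are assumptions made for illustration, not a transcription of the figure.

```python
def back_up(outcomes, attitude):
    """Combine one action's (probability, value) outcomes into a single value."""
    values = [v for _, v in outcomes]
    if attitude == "max-max":      # "I am the chosen one"
        return max(values)
    if attitude == "min-max":      # "I am the loser"
        return min(values)
    # "I am a rationalist": probability-weighted expectation
    return sum(p * v for p, v in outcomes)


# Hypothetical actions available in one state, each with three possible outcomes.
actions = {
    "a1": [(0.8, -1.0), (0.1, -0.04), (0.1, -0.04)],
    "a2": [(0.8, -0.04), (0.1, -0.04), (0.1, -1.0)],
}


def best_action(attitude):
    """Pick the action whose backed-up value is largest under `attitude`."""
    return max(actions, key=lambda a: back_up(actions[a], attitude))


expected = {a: back_up(outs, "expectation") for a, outs in actions.items()}
```

Under these assumed numbers the expectation analysis separates the two actions (about -0.808 versus -0.136), whereas both the optimist and the pessimist see them as equally good, since each action has the same best and worst outcome.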


Real Time Dynamic Programming

• Interleave “search” and “execution” (Real Time Dynamic Programming)

• Do limited-depth analysis based on reachability to find the value of a state (and thereby the best action you should be doing, which is the action that leads to the best value)

• The values of the leaf nodes are set to be their immediate rewards

  – Alternatively, some admissible estimate of the value function (h*)

• If all the leaf nodes are terminal nodes, then the backed-up value will be the true optimal value. Otherwise, it is an approximation…

RTDP

For leaf nodes, can use R(s) or some heuristic value h(s)
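One way to picture the limited-depth analysis in RTDP is the following sketch. It assumes a small, made-up chain MDP (four states in a row, with state 3 terminal), undiscounted values of the form R(s) plus the best expected successor value, and leaves evaluated by their immediate reward R(s); every name and number in it is hypothetical.

```python
# Hypothetical 4-state chain MDP; not taken from the slides.
TERMINAL = {3}
REWARD = {0: -0.04, 1: -0.04, 2: -0.04, 3: 1.0}
ACTIONS = ("left", "right")


def transitions(state, action):
    """Return [(probability, next_state)]: the move works w.p. 0.8, else stay put."""
    step = -1 if action == "left" else 1
    target = min(max(state + step, 0), 3)
    return [(0.8, target), (0.2, state)]


def lookahead_value(state, depth):
    """Value of `state` from a lookahead tree of the given depth."""
    if state in TERMINAL or depth == 0:
        return REWARD[state]          # leaf: immediate reward (could be h(s) instead)
    return REWARD[state] + max(q_value(state, a, depth) for a in ACTIONS)


def q_value(state, action, depth):
    """Expected value of the successors reached by `action` (expectation analysis)."""
    return sum(p * lookahead_value(s2, depth - 1)
               for p, s2 in transitions(state, action))


def best_action(state, depth):
    """The action to execute now; the agent re-plans from wherever it lands."""
    return max(ACTIONS, key=lambda a: q_value(state, a, depth))
```

An agent would call `best_action` in its current state, execute the result, observe the actual outcome, and repeat, which is the interleaving of search and execution described above. With depth 1 from state 0 the lookahead cannot yet see the goal; with depth 3 it can.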


The expected value computation is fine if you are maximizing “expected” return. What if you are risk-averse (and think “nature” is out to get you)? Then V2 = min(V3, V4)

If you are a perpetual optimist, then V2 = max(V3, V4)

If you have deterministic actions, then RTDP becomes RTA* (if you use h(.) to evaluate leaves)


RTA* (RTDP with deterministic actions and leaves evaluated by f(.))

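[Figure: RTA* example. A graph over nodes S, n, m, k, G, and the lookahead tree grown from S. Annotations shown on tree nodes: G=1, H=2, F=3 (on two nodes, n and m); G=2, H=3, F=5 (at k); and one value of infty.]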

--Grow the tree to depth d
--Apply f-evaluation for the leaf nodes
--Propagate f-values up to the parent nodes: f(parent) = min(f(children))

RTA* is a special case of RTDP
--It is useful for acting in deterministic, dynamic worlds
--While RTDP is useful for acting in stochastic, dynamic worlds

LRTA*: Can store backed up values for states (and they will be better heuristics)
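The RTA* lookahead step (grow to depth d, evaluate leaves with f = g + h, back up the minimum) can be sketched as below. The graph edges and their costs are assumptions invented for the example; the h-values for n, m and k are chosen to agree with the G/H/F annotations that survive from the figure.

```python
# Hypothetical deterministic graph: node -> {successor: step cost}.
GRAPH = {
    "S": {"n": 1, "m": 1},
    "n": {"k": 1},
    "m": {"G": 2},
    "k": {"G": 4},
    "G": {},          # goal: no successors, h = 0
}
H = {"S": 2, "n": 2, "m": 2, "k": 3, "G": 0}   # heuristic estimates h(.)


def backed_up_f(node, g, depth):
    """f-value of `node` reached with path cost g, looking `depth` steps further."""
    successors = GRAPH[node]
    if depth == 0 or not successors:
        return g + H[node]                      # leaf: f = g + h
    # f(parent) = min(f(children))
    return min(backed_up_f(child, g + cost, depth - 1)
               for child, cost in successors.items())


def choose_move(state, depth):
    """Return (next_node, backed_up_f) for the most promising successor of `state`."""
    scored = [(backed_up_f(child, cost, depth - 1), child)
              for child, cost in GRAPH[state].items()]
    best_f, best_child = min(scored)
    return best_child, best_f
```

With depth 1 the two successors of S tie at f = 3; with depth 2 the branch through n backs up f = 5 while the branch through m reaches G with f = 3, so the agent moves to m. An LRTA*-style variant would additionally store the backed-up value as the new h for the state it is leaving.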


End of Gametrees


Game Playing (Adversarial Search)

• Perfect play
  – Do minmax on the complete game tree

• Alpha-Beta pruning (a neat idea that is the bane of many a CSE471 student)

• Resource limits
  – Do limited depth lookahead
  – Apply evaluation functions at the leaf nodes
  – Do minmax

• Miscellaneous
  – Games of Chance
  – Status of computer games..


Multi-player Games

Everyone maximizes their utility --How does this compare to 2-player games? (Max’s utility is negative of Min’s)
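A minimal sketch of "everyone maximizes their utility": each leaf carries one utility per player, and the player to move picks the child whose backed-up vector is best in that player's own component. The tree encoding (lists for internal nodes, tuples for leaf utility vectors), the strict turn order, and the example numbers are assumptions of this sketch.

```python
def max_n(node, player, num_players):
    """Back up utility vectors; `player` (0-based) moves at `node`."""
    if isinstance(node, tuple):
        return node                      # leaf: one utility per player
    next_player = (player + 1) % num_players
    results = [max_n(child, next_player, num_players) for child in node]
    # The mover only cares about its own component of each vector.
    return max(results, key=lambda utilities: utilities[player])


# Hypothetical 3-player tree: player 0 at the root, then player 1, then player 2.
tree = [
    [[(1, 2, 6), (4, 3, 3)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (2, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]
root_utilities = max_n(tree, 0, 3)

# Two-player zero-sum special case: leaves are (v, -v), so maximizing the
# second component is the same as minimizing the first, i.e. ordinary min-max.
zero_sum = [[(3, -3), (12, -12)], [(2, -2), (4, -4)]]
zero_sum_value = max_n(zero_sum, 0, 2)
```

In the two-player case Max's utility is the negative of Min's, so the same procedure reproduces the min-max value (3 in the small zero-sum tree above).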


Expecti-Max