
Adversarial Search

Keith L. Downing

October 7, 2015

Keith L. Downing Adversarial Search

The Ever-Changing Scope of AI

"AI is what humans are currently better at..." (Jim Hendler)

[Figure: AI sits at the intersection of Human Cognitive Challenges and AI/Computer Science.]


AI and Chess

1959-1962 (MIT) - "Most of the machine's moves are neither brilliant nor stupid. It must be admitted that it occasionally blunders." - John McCarthy

1967 (Stanford + Moscow) - Russia -vs- USA; McCarthy + Kotok's group -vs- Adelson-Velskiy's group. USSR won.

1967 (MIT) - Greenblatt's MAC HACK VI was the first computer to play in a human tournament. Won 2; Tied 2 to achieve an amateur rating.

1967 (MIT) - MAC HACK VI beat Hubert Dreyfus, a renowned critic of AI.

1970 (Northwestern University) - CHESS 3.0 won the first ACM computer chess tournament. Evaluated 100 states per second.

1974 (Stockholm) - Kaissa (Russian) became the first world computer chess champion.

Chess has been called the Drosophila of AI.

The Quest for Artificial Intelligence, Nilsson (2010).


Chess: Humans -vs- Computers

"A human uses prodigious amounts of knowledge in the pattern-recognition process and a small amount of calculation to verify the fact that the proposed solution is good in the present instance.... However, the computer would make the same maneuver because it found at the end of a very large search that it was the most advantageous way to proceed out of the hundreds of thousands of possibilities it looked at. CHESS 4.6 has to date made several well known maneuvers without having the slightest knowledge of the maneuver, the conditions for its applications, and so on; but only knowing that the end result of the maneuver was good." - Hans Berliner (Nature, 1978)

The Quest for Artificial Intelligence, Nilsson (2010).

Computers don't really understand chess the way humans do, but they search thoroughly and quickly. Does this matter?

McCarthy suggests tournaments where computers have limited computation time, like a millisecond.


Deep Blue

Lost to Kasparov (reigning world champion) in 1996. IBM then improved it to handle some of its weaknesses. Beat Kasparov in 1997.

"I am a human being. When I see something that is well beyond my understanding, I'm afraid..." - Kasparov

State evaluations per second: Deep Blue - 200 million; Kasparov - 3.

Deep chess knowledge: Deep Blue - very little; Kasparov - volumes.

But Deep Blue had memorized thousands of book moves and grandmaster games.

8000 features are analyzed by Deep Blue. Fast evaluation only checks for the simplest of them (like piece count), while slow evaluation checks for complex patterns (like king safety, center control, etc.).

Deep Blue uses no machine learning.

Deep Blue was dismantled after the 1997 match.

Claimed to use no AI, just brute-force search (with heuristics): minimax with alpha-beta pruning, which McCarthy deems essential.

Deep Fritz (2006) - German chess program, supposedly better than Deep Blue.

The Quest for Artificial Intelligence, Nilsson (2010).

AI and Checkers

1967 (IBM) - Samuel's checkers player learned its own evaluation function by playing against itself.

1990 (University of Alberta) - Chinook developed by Jonathan Schaeffer's group.

Blondie 24 (Fogel, 2001) - Evolved a checkers player with a very high ranking on zone.com. Beat Chinook once.

2004 - Chinook beats the best human and becomes world champion.

2007 (University of Alberta) - Schaeffer's group proves that Checkers is a tie if played perfectly by each side.

18 years of computer runtime. 5 x 10^20 positions were evaluated. Chess has ≈ 10^40.

"It's been 18 years! ...obsessive-compulsive behavior... not normal... Get a life, Jonathan." - his wife

The Quest for Artificial Intelligence, Nilsson (2010).


Early AI Poker (D. Waterman, 1968)

5-card draw using an expert system. Uses deep knowledge of the game and human strategy.

[Figure: The game state feeds interpretation rules, which update a Dynamic State Vector (DSV) of features: (VDHAND, POT, LASTBET, BLUFFO, POTBET, ORP, OSTYLE). Action rules then map the DSV to an action: Fold, Call, or Bet.]


More Recent AI Poker (Billings et al., 2002)

Simulates millions of possible outcomes offline to build strategy databases.

Simulates thousands of outcomes online to guide play.

No deep knowledge of the game; just brute-force calculation.

[Figure: Roll-out simulation from Player 2's perspective in Texas Hold'em: the other players' hidden cards and the Flop, Turn, and River are repeatedly sampled.]


Actor-Critic AI Systems

Critic = a system that evaluates search states. Very similar to both heuristics and objective functions.

Many AI applications involve standard AI search algorithms (A*, Minimax, etc.) plus problem-specific critics. Building the critic is the hard part.

Actor = a system for mapping search states to actions.

Actor-Critic systems use AI to both compute actions and evaluate states. In some versions, the best action is simply the one that leads to the state with the highest evaluation.

Reinforcement Learning (RL) formalizes all of this - chapter 21 of our AI textbook.


AI Backgammon: G. Tesauro (1995)

ANN-based critic that improved via self-play. Discovered new strategies that humans now use.

[Figure: From board state St with value V(St), each action ai leads to a successor state S^i_t+1 valued at r_i + V(S^i_t+1). Taking the MAX over these gives r* + V(S*_t+1); the TD error δt = r* + V(S*_t+1) - V(St) is used to learn V, and the best move a* is played.]
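The TD error in the figure can be written out directly. A hypothetical sketch: the dictionary V stands in for TD-Gammon's neural-network critic, and the states, reward, and learning rate are made up.

```python
# delta_t = r* + V(S*_t+1) - V(S_t): the critic's one-step prediction error.
V = {"St": 0.5, "St+1": 0.7}    # hypothetical state evaluations
alpha = 0.1                     # assumed learning rate

def td_update(V, s, s_next, r, alpha):
    delta = r + V[s_next] - V[s]    # TD error for the chosen (greedy) move
    V[s] += alpha * delta           # nudge V(S_t) toward the target
    return delta

delta = td_update(V, "St", "St+1", 0.0, alpha)
print(round(delta, 3), round(V["St"], 3))   # 0.2 0.52
```

Repeating this update over many self-play games is what gradually improved the critic.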


Classes of Games

Perfect Info - state of the playing arena and other players' holdings known at all times.
Imperfect Info - some info about the arena or players not available.
Deterministic - outcome of any action is certain.
Stochastic - some actions have probabilistic outcomes, e.g. dealt cards, rolled dice, etc.

Deterministic, Perfect Information: Chess, Othello, Checkers, Go
Stochastic, Perfect Information: Backgammon, Monopoly, Ludo, 2048
Deterministic, Imperfect Information: Battleship, Scotland Yard
Stochastic, Imperfect Information: Bridge, Hearts, Poker, Risk, Scrabble


Adversarial Search Trees

[Figure: From the root s0, my actions a1, a2, a3 lead to states s1 ... s4; below them, the opponent's actions branch further.]

My action nodes are similar to OR nodes in AND/OR search: I can make ANY of these choices.

Opponent action nodes are similar to AND nodes in AND/OR search: I must account for ALL of the opponent's possible moves.

Evaluate the bottom states and propagate values upward.


Best-First Search - Review

Generate one large search tree.
Use heuristics to evaluate the promise of all states in the tree.
When a goal state is found, return the entire path from the start state as the solution.


Basic Process of Adversarial Search

1 From the start state, generate a search tree in a depth-limited, depth-first manner, alternating between own and opponent's possible moves.

2 Only use heuristics to evaluate the promise of the bottom-level nodes; then propagate values upwards, combining them using MAX and MIN operators.

3 Return to the root and choose the action, Abest, leading to the highest-rated child state, Sbest.

4 Apply Abest to the current game state, producing Sbest.

5 Wait for the opponent to choose an action, which then produces the new game state, Snew.

6 Generate a whole new search tree, with root state = Snew.

7 GO TO step 2.
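The backup in step 2 can be sketched in a few lines of Python. This is an illustrative toy, not a full game engine: the depth-limited tree is pre-generated as nested lists, with leaf integers standing in for the heuristic evaluations of the bottom-level nodes (names and values are made up).

```python
def minimax(node, maximizing=True):
    """Propagate leaf evaluations upward through alternating MAX/MIN levels.
    A node is either a leaf score (int) or a list of child nodes."""
    if isinstance(node, int):      # bottom-level state: heuristic evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tiny 2-ply tree: MAX chooses between two MIN subtrees.
tree = [[3, 12, 8], [2, 4, 6]]
print(minimax(tree))   # MIN backs up 3 and 2; MAX picks 3
```

In a real player, the root would also record which child produced the returned value, since that child identifies Abest.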


Pure Minimax: Lookahead

[Figure: A four-level lookahead tree alternating My Moves and Opponent Moves. The heuristic function is applied to the bottom-level states (values 3, 2, 5, 6, 8, 5, 7, 8, 6, 9, 8, 4, 9, 7, 8).]

All scores are from the perspective of the top (MAX) player.


Pure Minimax: Backup

[Figure: The leaf values back up the tree through alternating Minimize and Maximize levels, giving the root a value of 5 and identifying the Best Move. Some subtrees, marked in the figure, did not need to be generated.]


Saving Evaluations in Min-Max Trees

[Figure: A MAX root over two MIN nodes. One MIN node has already backed up the value 7 (from children 10 and 7). In the other, a child evaluates to 5, so that node's value is at most 5 and can never win at the root: its remaining children ("Who cares??") need not be evaluated.]


Alpha-Beta Pruning

[Figure: An alpha-beta trace through alternating MAX and MIN levels. After the first subtree is evaluated, the root MAX sets a = 7, and the a and b bounds propagate down into the rest of the tree. At a deeper MIN node, a child returns 5 < a = 7, so we stop generating children: this node's value cannot win at the root. Prune it! There could be many large subtrees down here, but there is no need to generate them, since they cannot affect the choice made at the root.]


Alpha and Beta

Alpha

A modifiable property of a MAX node.

Indicates the lower bound on the evaluation of that MAX node.

Updated whenever a LARGER evaluation is returned from a child MIN node.

Used for pruning at MIN nodes.

Beta

A modifiable property of a MIN node.

Indicates the upper bound on the evaluation of that MIN node.

Updated whenever a SMALLER evaluation is returned from a child MAX node.

Used for pruning at MAX nodes.

* Both alpha and beta are passed between MIN and MAX nodes.


A Nanosecond in the Life of a MAX node

[Figure: A MAX node receives a and b from its parent MIN node, passes them to each child MIN node, and receives a value V back from each child.]

Properties: alpha (a), beta (b), Evaluation (E).

After each V is received, do:

If V > E: E ← V
If V > a: a ← V
If E ≥ b: Prune this node!
Otherwise, generate the next child.


A Nanosecond in the Life of a MIN node

[Figure: A MIN node receives a and b from its parent MAX node, passes them to each child MAX node, and receives a value V back from each child.]

Properties: alpha (a), beta (b), Evaluation (E).

After each V is received, do:

If V < E: E ← V
If V < b: b ← V
If E ≤ a: Prune this node!
Otherwise, generate the next child.
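The MAX-node and MIN-node rules above combine into the standard alpha-beta procedure. A minimal sketch on the same kind of nested-list toy tree used for plain minimax (leaf integers are heuristic evaluations; names are illustrative):

```python
def alphabeta(node, maximizing=True, a=float("-inf"), b=float("inf")):
    """Minimax with alpha-beta pruning.
    a = value already guaranteed to a MAX ancestor (lower bound);
    b = value already guaranteed to a MIN ancestor (upper bound)."""
    if isinstance(node, int):
        return node
    if maximizing:                          # a nanosecond in the life of MAX
        E = float("-inf")
        for child in node:
            V = alphabeta(child, False, a, b)
            if V > E: E = V                 # If V > E: E <- V
            if V > a: a = V                 # If V > a: a <- V
            if E >= b: break                # If E >= b: prune this node!
        return E
    E = float("inf")                        # a nanosecond in the life of MIN
    for child in node:
        V = alphabeta(child, True, a, b)
        if V < E: E = V                     # If V < E: E <- V
        if V < b: b = V                     # If V < b: b <- V
        if E <= a: break                    # If E <= a: prune this node!
    return E

tree = [[3, 12, 8], [2, 4, 6]]
print(alphabeta(tree))   # 3, as with plain minimax; leaves 4 and 6 are never examined
```

The pruned result is always identical to plain minimax; only the amount of tree generated changes.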


The Game of Nim

[Figure: A Nim game tree alternating MAX and MIN levels, with leaf scores -1, -2, +2, +1, +1.]

Actions: Choose 1 or 2 stones.
Goal: Choose the LAST stone(s).
Points = number of stones chosen on the very last move.


Minimax Applied to Nim

[Figure: Minimax applied to the Nim tree: the leaves back up as max(-1, 2) = 2, min(2, 1) = 1, min(1, -2) = -2, and max(1, -2) = 1 at the root, identifying the best action choice.]

The starting player can always win by ensuring that the opponent always has an odd number of stones to choose from.
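The Nim tree above can also be generated and backed up programmatically. A sketch of minimax for this Nim variant (take 1 or 2 stones; whoever takes the last stone(s) scores that many points, counted positive for the first player; the 4-stone starting pile is an assumption):

```python
def nim_value(stones, maximizing=True):
    """Backed-up minimax value, from the first player's perspective, of a
    Nim pile where each move takes 1 or 2 stones and the player who takes
    the last stone(s) scores that many points."""
    sign = 1 if maximizing else -1
    best = None
    for take in (1, 2):
        if take > stones:
            continue
        if take == stones:      # final move: the taker scores the stones taken
            value = sign * take
        else:                   # otherwise, recurse for the other player
            value = nim_value(stones - take, not maximizing)
        if best is None or (value > best if maximizing else value < best):
            best = value
    return best

# With a 4-stone pile, the root backs up max(1, -2) = 1, matching the
# slide: take 1 stone, leaving the opponent an odd number of stones.
print(nim_value(4))   # 1
```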

Minimax with Alpha-Beta Pruning

[Figure: The Nim tree explored depth-first, with a = ? and b = ? at each internal node. After the leftmost leaf returns -1, the MAX node above it sets a = -1, and its parent MIN node sets b = -1.]

Minimax with alpha-beta pruning is depth-first.

Nodes containing final states have concrete scores.

Nodes at max depth that are not final states are scored with an evaluation function.


Minimax with Alpha-Beta Pruning

[Figure: One step later: after its 2nd child returns +2, the MAX node computes max(-1, 2) = 2 and updates a from -1 to 2; the backed-up value 2 becomes b = 2 at its parent MIN node.]


Minimax with Alpha-Beta Pruning

[Figure: After its 2nd child returns +1, the MIN node updates b from 2 to 1; the value +1 backs up, and the root MAX node sets a = 1.]


Minimax with Alpha-Beta Pruning

[Figure: The bound a = 1 propagates down the other side of the tree to node X, a MIN node.]

At node X, we know that its final evaluation E ≤ 1 = a. So there is no point generating more children for X. It will not be chosen anyway.


Alpha-Beta Pruning CAN Save Time

[Figure: A MAX root over two MIN nodes, each over two MAX nodes, with 8 empty leaf nodes (marked ?).]

Exercise: Arrange the integer evaluations 1, 2, ..., 8 in the leaf nodes so that:

(a) The maximum number of nodes can be pruned.
(b) ALL nodes need to be generated; no pruning is possible.


Minimax with Stochasticity

Making intelligent decisions in UNCERTAIN situations.

Theoretically, standard minimax has no uncertainty, since we assume that we will ALWAYS choose our BEST option, while the opponent will ALWAYS choose our WORST option.

[Figure: A MAX root whose moves lead to chance nodes; each chance node spreads probabilities (0.2, 0.3, 0.5, 0.3, 0.6, 0.1) over MIN nodes, whose children carry heuristic evaluations. The chance nodes back up expected values 5.6 and 3.5, so MAX's best move has value 5.6.]


Expectimax

Minimax with stochastic nodes.
MIN and MAX nodes operate in the same way.
CHANCE nodes compute the expected value of the consequences of their stochastic options.

[Figure: A chance node for the stochastic action of rolling 2 dice. Each outcome leads to a MAX node: Roll = 2 with p = 1/36, Roll = 7 with p = 1/6, Roll = 12 with p = 1/36, and so on. The chance node's value is the expected value E = 8(1/36) + ... + 6(1/6) + ... + 7(1/36).]
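The chance node's expected-value rule drops straight into the minimax skeleton. A sketch on a toy tree: the probabilities and leaf values are made up, and a real two-dice node would have one branch per roll, weighted 1/36 ... 1/6 ... 1/36.

```python
def expectimax(node):
    """node: a leaf score (int/float), ('max', [children]),
    or ('chance', [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, kids = node
    if kind == "max":
        return max(expectimax(c) for c in kids)
    # chance node: probability-weighted average of its children
    return sum(p * expectimax(c) for p, c in kids)

# MAX chooses between two chance nodes (made-up probabilities):
tree = ("max", [("chance", [(0.5, 4), (0.5, 10)]),
                ("chance", [(0.9, 6), (0.1, 12)])])
print(expectimax(tree))   # max(7.0, 6.6) = 7.0
```

A MIN case could be added the same way for games that mix adversarial and stochastic nodes.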


The Game of 2048

[Figure: Two 2048 board positions, before and after a move.]

When 2 equal cells collide, their summed value goes in the resulting cell.

After each move, the "opponent" adds a 2 or a 4 to a random open cell: prob(2) = 0.9; prob(4) = 0.1.

The game ends when all cells are filled. Goal: produce 2048 in a cell before the game ends.


The Game of 2048

[Figure: Before and after boards for a move in which two pairs of equal tiles collide.]

When multiple pairs of tiles of equal value collide (yellow arrows in the original figure), the two product tiles become neighbors. A newly-added 2-tile then appears in a random open cell.


Playing 2048 with Minimax Search

[Figure: The search alternates a MAX level with the 4 possible slide moves (L, R, U, D), a MIN level with all possible spawns (2s and 4s in every open cell), and another MAX level of slide moves.]


Playing 2048 with Minimax Search

[Figure: A concrete 2048 position expanded one ply: at the MAX level, the four slide moves (Left, Right, Up, Down) each produce a board; at the MIN level below each board, all possible spawns are considered; another MAX level of slide moves follows.]


Expectimax for 2048

[Figure: Expectimax on the same position. The MAX level again branches on the slide moves; below each, a chance node enumerates all possible spawns. With 8 open cells, p(choosing a cell) = 1/8 = 0.125; with p(2 spawn) = 0.9 and p(4 spawn) = 0.1, each 2-spawn outcome has probability 0.125 x 0.9 = 0.1125 and each 4-spawn outcome 0.125 x 0.1 = 0.0125. With heuristic child values 7, 3, 5, 6, ..., the chance node's value is E = 7 x 0.1125 + 3 x 0.1125 + 5 x 0.1125 + 6 x 0.0125 + ...]
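The spawn distribution at such a chance node can be enumerated directly. A sketch (the 8 open cells match the example position; cell indices are illustrative):

```python
open_cells = 8                      # open cells in the example position
spawns = [(cell, tile, (1 / open_cells) * (0.9 if tile == 2 else 0.1))
          for cell in range(open_cells)
          for tile in (2, 4)]       # p(2 spawn) = 0.9, p(4 spawn) = 0.1

# Each 2-spawn outcome has probability 0.125 * 0.9 = 0.1125 and each
# 4-spawn outcome 0.125 * 0.1 = 0.0125; the 16 outcomes sum to 1.
assert abs(sum(p for _, _, p in spawns) - 1.0) < 1e-9
print(len(spawns), spawns[0][2], spawns[1][2])   # 16 0.1125 0.0125
```

The chance node's value is then the sum of each outcome's probability times the backed-up value of the resulting board.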


Minimax -vs- Expectimax for 2048

The opponent in 2048 really is the stochastic spawning process. It is not minimizing at all.

However, by assuming a perfectly evil opponent, Minimax can play intelligently, albeit very conservatively (i.e., maximally safe).

Still, the most appropriate algorithm is Expectimax with alternating MAX and CHANCE nodes.

Alpha-beta pruning works poorly for Expectimax (even when it has MAX, MIN and CHANCE nodes). A node's current value cannot be directly compared to an alpha or beta (for pruning), since that value will be scaled by probabilities (and combined with other scaled values) at ancestral chance nodes.

Considering all spawns creates a large branching factor (with Minimax and Expectimax), and the cost is higher with Expectimax, since it can't exploit alpha-beta pruning (very well).

Approximate Minimax or Expectimax: consider filling every open cell, but only with a 2 or a 4 (not both), and use 2's and 4's according to their spawning probabilities. This gives a maximum branching factor of C (number of open cells) instead of 2C. Then minimize (or compute the expected value) over all child nodes.
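The approximation in the last point can be sketched as follows. This is a hypothetical helper, not an exact rendering of the slide's method: each open cell gets exactly one candidate tile, sampled with the spawn probabilities, so the spawn level has C children instead of 2C (cell coordinates are made up).

```python
import random

def approx_spawn_children(open_cells, rng):
    """One child per open cell: a single sampled tile (2 with prob 0.9,
    4 with prob 0.1) instead of branching on both tiles.
    Branching factor: C, not 2C."""
    return [(cell, 2 if rng.random() < 0.9 else 4) for cell in open_cells]

cells = [(0, 1), (2, 3), (3, 0)]            # hypothetical open cells
children = approx_spawn_children(cells, random.Random(0))
print(len(children))                        # C = 3 children, not 2C = 6
```

The search then minimizes (or averages) over these C children exactly as before.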
