
Adversarial Search

Keith L. Downing

October 7, 2015

Keith L. Downing Adversarial Search

The Ever-Changing Scope of AI

"AI is what humans are currently better at..." (Jim Hendler)

[Figure: AI sits at the intersection of Human Cognitive Challenges and AI/Computer Science.]


AI and Chess

1959-1962 (MIT) - "Most of the machine's moves are neither brilliant nor stupid. It must be admitted that it occasionally blunders." - John McCarthy

1967 (Stanford + Moscow) - Russia -vs- USA; McCarthy + Kotok's group -vs- Adelson-Velskiy's group. USSR won.

1967 (MIT) - Greenblatt's MAC HACK VI was the first computer to play in a human tournament. Won 2; Tied 2 to achieve an amateur rating.

1967 (MIT) - MAC HACK VI beat Hubert Dreyfus, a renowned critic of AI.

1970 (Northwestern University) - CHESS 3.0 won the first ACM computer chess tournament. Evaluated 100 states per second.

1974 (Stockholm) - Kaissa (Russian) became the first world computer chess champion.

Chess has been called the Drosophila of AI.

The Quest for Artificial Intelligence, Nilsson (2010).


Chess: Humans -vs- Computers

"A human uses prodigious amounts of knowledge in the pattern-recognition process and a small amount of calculation to verify the fact that the proposed solution is good in the present instance.... However, the computer would make the same maneuver because it found at the end of a very large search that it was the most advantageous way to proceed out of the hundreds of thousands of possibilities it looked at. CHESS 4.6 has to date made several well known maneuvers without having the slightest knowledge of the maneuver, the conditions for its applications, and so on; but only knowing that the end result of the maneuver was good." - Hans Berliner (Nature, 1978)

The Quest for Artificial Intelligence, Nilsson (2010).

Computers don't really understand chess the way humans do, but they search thoroughly and quickly. Does this matter?

McCarthy suggests tournaments where computers have limited computation time, like a millisecond.


Deep Blue

Lost to Kasparov (reigning world champion) in 1996. IBM then improved it to handle some of its weaknesses. Beat Kasparov in 1997.

"I am a human being. When I see something that is well beyond my understanding, I'm afraid..." - Kasparov

State evaluations per second: Deep Blue - 200 million; Kasparov - 3.

Deep chess knowledge: Deep Blue - very little; Kasparov - volumes.

But Deep Blue had memorized thousands of book moves and grandmaster games.

8000 features are analyzed by Deep Blue. Fast evaluation only checks for the simplest of them (like piece count), while slow evaluation checks for complex patterns (like king safety, center control, etc.).

Deep Blue uses no machine learning.

Deep Blue was dismantled after the 1997 match.

Claimed to use no AI, just brute-force search (with heuristics): minimax with alpha-beta pruning, which McCarthy deems essential.

Deep Fritz (2006) - German chess program, supposedly better than Deep Blue.

The Quest for Artificial Intelligence, Nilsson (2010).

AI and Checkers

1967 (IBM) - Samuel's checkers player learned its own evaluation function by playing against itself.

1990 (University of Alberta) - Chinook developed by Jonathan Schaeffer's group.

Blondie 24 (Fogel, 2001) - Evolved a checkers player with a very high ranking on zone.com. Beat Chinook once.

2004 - Chinook beats the best human and becomes world champion.

2007 (University of Alberta) - Schaeffer's group proves that Checkers is a tie if played perfectly by each side.

18 years of computer runtime. 5 x 10^20 positions were evaluated. Chess has ≈ 10^40.

"It's been 18 years! ...obsessive-compulsive behavior... not normal... Get a life, Jonathan." - his wife

The Quest for Artificial Intelligence, Nilsson (2010).


Early AI Poker (D. Waterman, 1968)

5-card draw using an expert system. Uses deep knowledge of the game and human strategy.

[Figure: The game state feeds interpretation rules, which update a Dynamic State Vector (DSV) of features: (VDHAND, POT, LASTBET, BLUFFO, POTBET, ORP, OSTYLE). Action rules then map the DSV to an action: Fold, Call, or Bet.]


More Recent AI Poker (Billings et al., 2002)

Simulates millions of possible outcomes offline to build strategy databases.

Simulates thousands of outcomes online to guide play.

No deep knowledge of the game; just brute-force calculation.

[Figure: Roll-out simulation from Player 2's perspective in Texas Hold'em: the other players' hidden cards and the Flop, Turn, and River are repeatedly sampled.]


Actor-Critic AI Systems

Critic = a system that evaluates search states. Very similar to both heuristics and objective functions.

Many AI applications involve standard AI search algorithms (A*, Minimax, etc.) plus problem-specific critics. Building the critic is the hard part.

Actor = a system for mapping search states to actions.

Actor-Critic systems use AI to both compute actions and evaluate states. In some versions, the best action is simply the one that leads to the state with the highest evaluation.

Reinforcement Learning (RL) formalizes all of this - chapter 21 of our AI textbook.


AI Backgammon: G. Tesauro (1995)

ANN-based critic that improved via self-play. Discovered new strategies that humans now use.

[Figure: From board state St with value V(St), each action ai leads to a successor state S^i_t+1 valued at r_i + V(S^i_t+1). Taking the MAX over these gives r* + V(S*_t+1); the TD error δt = r* + V(S*_t+1) - V(St) is used to learn V, and the best move a* is played.]
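The TD error in the figure can be written out directly. A hypothetical sketch: the dictionary V stands in for TD-Gammon's neural-network critic, and the states, reward, and learning rate are made up.

```python
# delta_t = r* + V(S*_t+1) - V(S_t): the critic's one-step prediction error.
V = {"St": 0.5, "St+1": 0.7}    # hypothetical state evaluations
alpha = 0.1                     # assumed learning rate

def td_update(V, s, s_next, r, alpha):
    delta = r + V[s_next] - V[s]    # TD error for the chosen (greedy) move
    V[s] += alpha * delta           # nudge V(S_t) toward the target
    return delta

delta = td_update(V, "St", "St+1", 0.0, alpha)
print(round(delta, 3), round(V["St"], 3))   # 0.2 0.52
```

Repeating this update over many self-play games is what gradually improved the critic.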


Classes of Games

Perfect Info - state of the playing arena and other players' holdings known at all times.
Imperfect Info - some info about the arena or players not available.
Deterministic - outcome of any action is certain.
Stochastic - some actions have probabilistic outcomes, e.g. dealt cards, rolled dice, etc.

Deterministic, Perfect Information: Chess, Othello, Checkers, Go
Stochastic, Perfect Information: Backgammon, Monopoly, Ludo, 2048
Deterministic, Imperfect Information: Battleship, Scotland Yard
Stochastic, Imperfect Information: Bridge, Hearts, Poker, Risk, Scrabble


Adversarial Search Trees

[Figure: From the root s0, my actions a1, a2, a3 lead to states s1 ... s4; below them, the opponent's actions branch further.]

My action nodes are similar to OR nodes in AND/OR search: I can make ANY of these choices.

Opponent action nodes are similar to AND nodes in AND/OR search: I must account for ALL of the opponent's possible moves.

Evaluate the bottom states and propagate values upward.


Best-First Search - Review

Generate one large search tree.
Use heuristics to evaluate the promise of all states in the tree.
When a goal state is found, return the entire path from the start state as the solution.


Basic Process of Adversarial Search

1 From the start state, generate a search tree in a depth-limited, depth-first manner, alternating between own and opponent's possible moves.

2 Only use heuristics to evaluate the promise of the bottom-level nodes; then propagate values upwards, combining them using MAX and MIN operators.

3 Return to the root and choose the action, Abest, leading to the highest-rated child state, Sbest.

4 Apply Abest to the current game state, producing Sbest.

5 Wait for the opponent to choose an action, which then produces the new game state, Snew.

6 Generate a whole new search tree, with root state = Snew.

7 GO TO step 2.
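The backup in step 2 can be sketched in a few lines of Python. This is an illustrative toy, not a full game engine: the depth-limited tree is pre-generated as nested lists, with leaf integers standing in for the heuristic evaluations of the bottom-level nodes (names and values are made up).

```python
def minimax(node, maximizing=True):
    """Propagate leaf evaluations upward through alternating MAX/MIN levels.
    A node is either a leaf score (int) or a list of child nodes."""
    if isinstance(node, int):      # bottom-level state: heuristic evaluation
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tiny 2-ply tree: MAX chooses between two MIN subtrees.
tree = [[3, 12, 8], [2, 4, 6]]
print(minimax(tree))   # MIN backs up 3 and 2; MAX picks 3
```

In a real player, the root would also record which child produced the returned value, since that child identifies Abest.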


Pure Minimax: Lookahead

[Figure: A four-level lookahead tree alternating My Moves and Opponent Moves. The heuristic function is applied to the bottom-level states (values 3, 2, 5, 6, 8, 5, 7, 8, 6, 9, 8, 4, 9, 7, 8).]

All scores are from the perspective of the top (MAX) player.


Pure Minimax: Backup

[Figure: The leaf values back up the tree through alternating Minimize and Maximize levels, giving the root a value of 5 and identifying the Best Move. Some subtrees, marked in the figure, did not need to be generated.]


Saving Evaluations in Min-Max Trees

[Figure: A MAX root over two MIN nodes. One MIN node has already backed up the value 7 (from children 10 and 7). In the other, a child evaluates to 5, so that node's value is at most 5 and can never win at the root: its remaining children ("Who cares??") need not be evaluated.]


Alpha-Beta Pruning

[Figure: An alpha-beta trace through alternating MAX and MIN levels. After the first subtree is evaluated, the root MAX sets a = 7, and the a and b bounds propagate down into the rest of the tree. At a deeper MIN node, a child returns 5 < a = 7, so we stop generating children: this node's value cannot win at the root. Prune it! There could be many large subtrees down here, but there is no need to generate them, since they cannot affect the choice made at the root.]


Alpha and Beta

Alpha

A modifiable property of a MAX node.

Indicates the lower bound on the evaluation of that MAX node.

Updated whenever a LARGER evaluation is returned from a child MIN node.

Used for pruning at MIN nodes.

Beta

A modifiable property of a MIN node.

Indicates the upper bound on the evaluation of that MIN node.

Updated whenever a SMALLER evaluation is returned from a child MAX node.

Used for pruning at MAX nodes.

* Both alpha and beta are passed between MIN and MAX nodes.


A Nanosecond in the Life of a MAX node

[Figure: A MAX node receives a and b from its parent MIN node, passes them to each child MIN node, and receives a value V back from each child.]

Properties: alpha (a), beta (b), Evaluation (E).

After each V is received, do:

If V > E: E ← V
If V > a: a ← V
If E ≥ b: Prune this node!
Otherwise, generate the next child.


A Nanosecond in the Life of a MIN node

[Figure: A MIN node receives a and b from its parent MAX node, passes them to each child MAX node, and receives a value V back from each child.]

Properties: alpha (a), beta (b), Evaluation (E).

After each V is received, do:

If V < E: E ← V
If V < b: b ← V
If E ≤ a: Prune this node!
Otherwise, generate the next child.
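The MAX-node and MIN-node rules above combine into the standard alpha-beta procedure. A minimal sketch on the same kind of nested-list toy tree used for plain minimax (leaf integers are heuristic evaluations; names are illustrative):

```python
def alphabeta(node, maximizing=True, a=float("-inf"), b=float("inf")):
    """Minimax with alpha-beta pruning.
    a = value already guaranteed to a MAX ancestor (lower bound);
    b = value already guaranteed to a MIN ancestor (upper bound)."""
    if isinstance(node, int):
        return node
    if maximizing:                          # a nanosecond in the life of MAX
        E = float("-inf")
        for child in node:
            V = alphabeta(child, False, a, b)
            if V > E: E = V                 # If V > E: E <- V
            if V > a: a = V                 # If V > a: a <- V
            if E >= b: break                # If E >= b: prune this node!
        return E
    E = float("inf")                        # a nanosecond in the life of MIN
    for child in node:
        V = alphabeta(child, True, a, b)
        if V < E: E = V                     # If V < E: E <- V
        if V < b: b = V                     # If V < b: b <- V
        if E <= a: break                    # If E <= a: prune this node!
    return E

tree = [[3, 12, 8], [2, 4, 6]]
print(alphabeta(tree))   # 3, as with plain minimax; leaves 4 and 6 are never examined
```

The pruned result is always identical to plain minimax; only the amount of tree generated changes.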


The Game of Nim

[Figure: A Nim game tree alternating MAX and MIN levels, with leaf scores -1, -2, +2, +1, +1.]

Actions: Choose 1 or 2 stones.
Goal: Choose the LAST stone(s).
Points = number of stones chosen on the very last move.


Minimax Applied to Nim

[Figure: Minimax applied to the Nim tree: the leaves back up as max(-1, 2) = 2, min(2, 1) = 1, min(1, -2) = -2, and max(1, -2) = 1 at the root, identifying the best action choice.]

The starting player can always win by ensuring that the opponent always has an odd number of stones to choose from.
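The Nim tree above can also be generated and backed up programmatically. A sketch of minimax for this Nim variant (take 1 or 2 stones; whoever takes the last stone(s) scores that many points, counted positive for the first player; the 4-stone starting pile is an assumption):

```python
def nim_value(stones, maximizing=True):
    """Backed-up minimax value, from the first player's perspective, of a
    Nim pile where each move takes 1 or 2 stones and the player who takes
    the last stone(s) scores that many points."""
    sign = 1 if maximizing else -1
    best = None
    for take in (1, 2):
        if take > stones:
            continue
        if take == stones:      # final move: the taker scores the stones taken
            value = sign * take
        else:                   # otherwise, recurse for the other player
            value = nim_value(stones - take, not maximizing)
        if best is None or (value > best if maximizing else value < best):
            best = value
    return best

# With a 4-stone pile, the root backs up max(1, -2) = 1, matching the
# slide: take 1 stone, leaving the opponent an odd number of stones.
print(nim_value(4))   # 1
```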

Minimax with Alpha-Beta Pruning

[Figure: The Nim tree explored depth-first, with a = ? and b = ? at each internal node. After the leftmost leaf returns -1, the MAX node above it sets a = -1, and its parent MIN node sets b = -1.]

Minimax with alpha-beta pruning is depth-first.

Nodes containing final states have concrete scores.

Nodes at max depth that are not final states are scored with an evaluation function.


Minimax with Alpha-Beta Pruning

[Figure: One step later: after its 2nd child returns +2, the MAX node computes max(-1, 2) = 2 and updates a from -1 to 2; the backed-up value 2 becomes b = 2 at its parent MIN node.]


Minimax with Alpha-Beta Pruning

[Figure: After its 2nd child returns +1, the MIN node updates b from 2 to 1; the value +1 backs up, and the root MAX node sets a = 1.]


Minimax with Alpha-Beta Pruning

[Figure: The bound a = 1 propagates down the other side of the tree to node X, a MIN node.]

At node X, we know that its final evaluation E ≤ 1 = a. So there is no point generating more children for X. It will not be chosen anyway.


Alpha-Beta Pruning CAN Save Time

[Figure: A MAX root over two MIN nodes, each over two MAX nodes, with 8 empty leaf nodes (marked ?).]

Exercise: Arrange the integer evaluations 1, 2, ..., 8 in the leaf nodes so that:

(a) The maximum number of nodes can be pruned.
(b) ALL nodes need to be generated; no pruning is possible.


Minimax with Stochasticity

Making intelligent decisions in UNCERTAIN situations.

Theoretically, standard minimax has no uncertainty, since we assume that we will ALWAYS choose our BEST option, while the opponent will ALWAYS choose our WORST option.

[Figure: A MAX root whose moves lead to chance nodes; each chance node spreads probabilities (0.2, 0.3, 0.5, 0.3, 0.6, 0.1) over MIN nodes, whose children carry heuristic evaluations. The chance nodes back up expected values 5.6 and 3.5, so MAX's best move has value 5.6.]


Expectimax

Minimax with stochastic nodes.
MIN and MAX nodes operate in the same way.
CHANCE nodes compute the expected value of the consequences of their stochastic options.

[Figure: A chance node for the stochastic action of rolling 2 dice. Each outcome leads to a MAX node: Roll = 2 with p = 1/36, Roll = 7 with p = 1/6, Roll = 12 with p = 1/36, and so on. The chance node's value is the expected value E = 8(1/36) + ... + 6(1/6) + ... + 7(1/36).]
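The chance node's expected-value rule drops straight into the minimax skeleton. A sketch on a toy tree: the probabilities and leaf values are made up, and a real two-dice node would have one branch per roll, weighted 1/36 ... 1/6 ... 1/36.

```python
def expectimax(node):
    """node: a leaf score (int/float), ('max', [children]),
    or ('chance', [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, kids = node
    if kind == "max":
        return max(expectimax(c) for c in kids)
    # chance node: probability-weighted average of its children
    return sum(p * expectimax(c) for p, c in kids)

# MAX chooses between two chance nodes (made-up probabilities):
tree = ("max", [("chance", [(0.5, 4), (0.5, 10)]),
                ("chance", [(0.9, 6), (0.1, 12)])])
print(expectimax(tree))   # max(7.0, 6.6) = 7.0
```

A MIN case could be added the same way for games that mix adversarial and stochastic nodes.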


The Game of 2048

[Figure: Two 2048 board positions, before and after a move.]

When 2 equal cells collide, their summed value goes in the resulting cell.

After each move, the "opponent" adds a 2 or a 4 to a random open cell: prob(2) = 0.9; prob(4) = 0.1.

The game ends when all cells are filled. Goal: produce 2048 in a cell before the game ends.


The Game of 2048

[Figure: Before and after boards for a move in which two pairs of equal tiles collide.]

When multiple pairs of tiles of equal value collide (yellow arrows in the original figure), the two product tiles become neighbors. A newly-added 2-tile then appears in a random open cell.


Playing 2048 with Minimax Search

[Figure: The search alternates a MAX level with the 4 possible slide moves (L, R, U, D), a MIN level with all possible spawns (2s and 4s in every open cell), and another MAX level of slide moves.]


Playing 2048 with Minimax Search

[Figure: A concrete 2048 position expanded one ply: at the MAX level, the four slide moves (Left, Right, Up, Down) each produce a board; at the MIN level below each board, all possible spawns are considered; another MAX level of slide moves follows.]


Expectimax for 2048

[Figure: Expectimax on the same position. The MAX level again branches on the slide moves; below each, a chance node enumerates all possible spawns. With 8 open cells, p(choosing a cell) = 1/8 = 0.125; with p(2 spawn) = 0.9 and p(4 spawn) = 0.1, each 2-spawn outcome has probability 0.125 x 0.9 = 0.1125 and each 4-spawn outcome 0.125 x 0.1 = 0.0125. With heuristic child values 7, 3, 5, 6, ..., the chance node's value is E = 7 x 0.1125 + 3 x 0.1125 + 5 x 0.1125 + 6 x 0.0125 + ...]
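The spawn distribution at such a chance node can be enumerated directly. A sketch (the 8 open cells match the example position; cell indices are illustrative):

```python
open_cells = 8                      # open cells in the example position
spawns = [(cell, tile, (1 / open_cells) * (0.9 if tile == 2 else 0.1))
          for cell in range(open_cells)
          for tile in (2, 4)]       # p(2 spawn) = 0.9, p(4 spawn) = 0.1

# Each 2-spawn outcome has probability 0.125 * 0.9 = 0.1125 and each
# 4-spawn outcome 0.125 * 0.1 = 0.0125; the 16 outcomes sum to 1.
assert abs(sum(p for _, _, p in spawns) - 1.0) < 1e-9
print(len(spawns), spawns[0][2], spawns[1][2])   # 16 0.1125 0.0125
```

The chance node's value is then the sum of each outcome's probability times the backed-up value of the resulting board.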


Minimax -vs- Expectimax for 2048

The opponent in 2048 really is the stochastic spawning process. It is not minimizing at all.

However, by assuming a perfectly evil opponent, Minimax can play intelligently, albeit very conservatively (i.e., maximally safe).

Still, the most appropriate algorithm is Expectimax with alternating MAX and CHANCE nodes.

Alpha-beta pruning works poorly for Expectimax (even when it has MAX, MIN and CHANCE nodes). A node's current value cannot be directly compared to an alpha or beta (for pruning), since that value will be scaled by probabilities (and combined with other scaled values) at ancestral chance nodes.

Considering all spawns creates a large branching factor (with Minimax and Expectimax), and the cost is higher with Expectimax, since it can't exploit alpha-beta pruning (very well).

Approximate Minimax or Expectimax: consider filling every open cell, but only with a 2 or a 4 (not both), and use 2's and 4's according to their spawning probabilities. This gives a maximum branching factor of C (number of open cells) instead of 2C. Then minimize (or compute the expected value) over all child nodes.
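The approximation in the last point can be sketched as follows. This is a hypothetical helper, not an exact rendering of the slide's method: each open cell gets exactly one candidate tile, sampled with the spawn probabilities, so the spawn level has C children instead of 2C (cell coordinates are made up).

```python
import random

def approx_spawn_children(open_cells, rng):
    """One child per open cell: a single sampled tile (2 with prob 0.9,
    4 with prob 0.1) instead of branching on both tiles.
    Branching factor: C, not 2C."""
    return [(cell, 2 if rng.random() < 0.9 else 4) for cell in open_cells]

cells = [(0, 1), (2, 3), (3, 0)]            # hypothetical open cells
children = approx_spawn_children(cells, random.Random(0))
print(len(children))                        # C = 3 children, not 2C = 6
```

The search then minimizes (or averages) over these C children exactly as before.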
