G51IAI Introduction to AI Minmax and Alpha Beta Pruning Garry Kasparov and Deep Blue. © 1997, GM Gabriel Schwartzman's Chess Camera, courtesy IBM


Game Playing - Minimax

• Game Playing

• An opponent tries to thwart your every move

• 1944 - John von Neumann outlined a search method (Minimax) that maximised your position whilst minimising your opponent's

Example Game Tic Tac Toe

Game Playing – Example

• Nim (a simple game)

• Start with a single pile of tokens

• At each move the player must select a pile and divide the tokens into two non-empty, non-equal piles



• Starting with 7 tokens, the game is small enough that we can draw the entire game tree

• The “game tree” to describe all possible games follows:

7

6-1 5-2 4-3

5-1-1 4-2-1 3-2-2 3-3-1

4-1-1-1 3-2-1-1 2-2-2-1

3-1-1-1-1 2-2-1-1-1

2-1-1-1-1-1
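The tree above can be generated mechanically from the splitting rule. The following is a minimal sketch, assuming a state is stored as a tuple of pile sizes sorted largest first; the names `successors` and `reachable` are illustrative and do not come from the slides.

```python
def successors(state):
    """Return all states reachable in one move.

    A state is a tuple of pile sizes sorted largest first, e.g. (4, 2, 1).
    A move splits one pile into two non-empty, non-equal piles.
    """
    result = set()
    for i, pile in enumerate(state):
        rest = state[:i] + state[i + 1:]
        # a is the smaller part, so a < pile - a rules out equal splits
        for a in range(1, (pile + 1) // 2):
            b = pile - a
            result.add(tuple(sorted(rest + (a, b), reverse=True)))
    return sorted(result, reverse=True)


def reachable(start):
    """Collect every distinct state reachable from start (merged states count once)."""
    seen = {start}
    frontier = [start]
    while frontier:
        state = frontier.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


print(successors((7,)))       # [(6, 1), (5, 2), (4, 3)]
print(len(reachable((7,))))   # 14 distinct states, as in the tree above
```

Sorting the piles gives each game state one canonical form, which is what allows identical states reached by different move orders to be merged.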

Game Playing – Nim Game Tree

• NOTE: We converted the tree of possible games to a graph by merging nodes that have the same “game state” – this just saves repetition of work

• But what do we do with the “game tree”?

• How can we use it to help decide how to play?

• Use “Minimax Method”


• In order to implement minimax we need a method of measuring how good a position is.

• Often called a utility function – a.k.a. score, evaluation function, utility value, …

• Initially this will be a value that describes our position exactly


• Conventionally, in discussion of minimax, have two players “MAX” and “MIN”

• The utility function is taken to be the utility for MAX

• Larger values are better for MAX

Game Playing – Nim

• Remember that larger values are taken to be better for MAX

• Assume that we use a utility function of

– 1 = a win for MAX

– 0 = a win for MIN

• We only compare values, “larger or smaller”, so the actual sizes do not matter

– in other games we might use {+1, 0, -1} for {win, draw, lose}.

Game Playing – Minimax

• Basic idea of minimax:

• Player MAX is going to take the best move available

• Will select the next state to be the one with the highest utility

• Hence, value of a MAX node is the MAXIMUM of the values of the next possible states

– i.e. the maximum of its children in the search tree

Game Playing – Minimax

• Player MIN is going to take the best move available for MIN

– i.e. the worst available for MAX

• Will select the next state to be the one with the lowest utility

– recall, higher utility values are better for MAX and so worse for MIN

• Hence, value of a MIN node is the MINIMUM of the values of the next possible states

– i.e. the minimum of its children in the search tree

Game Playing – Minimax Summary

• A “MAX” move takes the best move for MAX – so takes the MAX utility of the children

• A “MIN” move takes the best move for MIN – hence the worst for MAX – so takes the MIN utility of the children

• Games alternate in play between MIN and MAX
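The backing-up rule in this summary can be sketched in a few lines. This is a minimal illustration, assuming the tree is given explicitly as nested lists with numeric leaves; the function name `minimax` and the example numbers are invented for the illustration.

```python
def minimax(node, maximizing):
    """Back up utility values: a leaf is a number, an internal node is a list of children."""
    if not isinstance(node, list):
        return node  # leaf: its utility (always measured from MAX's point of view)
    # Play alternates, so the children belong to the other player
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


tree = [[3, 5], [2, 9]]            # invented utilities, two moves deep
print(minimax(tree, True))         # MAX to move at the root: max(min(3, 5), min(2, 9)) = 3
print(minimax(tree, False))        # MIN to move at the root: min(max(3, 5), max(2, 9)) = 5
```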

Game Playing – Minimax for NIM

• Assuming MIN plays first, complete the MIN/MAX tree

• Assume that we use a utility function of

– 1 = a win for MAX

– 0 = a win for MIN

MIN   7 = 1

MAX   6-1 = 1   5-2 = 1   4-3 = 1

MIN   5-1-1 = 0   4-2-1 = 1   3-2-2 = 0   3-3-1 = 1

MAX   4-1-1-1 = 0   3-2-1-1 = 1   2-2-2-1 = 0

MIN   3-1-1-1-1 = 0   2-2-1-1-1 = 1

MAX   2-1-1-1-1-1 = 0 (loss for MAX)
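The completed tree can be checked by running minimax over the Nim states directly. The following is a sketch under the slide's assumptions (utility 1 = a win for MAX, 0 = a win for MIN, and a player who cannot split any pile loses); the names `successors` and `value` are illustrative only.

```python
from functools import lru_cache


def successors(state):
    """All states reachable by splitting one pile into two non-empty, non-equal piles."""
    result = set()
    for i, pile in enumerate(state):
        rest = state[:i] + state[i + 1:]
        for a in range(1, (pile + 1) // 2):  # a < pile - a, so the parts differ
            result.add(tuple(sorted(rest + (a, pile - a), reverse=True)))
    return sorted(result, reverse=True)


@lru_cache(maxsize=None)  # merged states are evaluated only once
def value(state, max_to_move):
    """Minimax value of a Nim state: 1 = win for MAX, 0 = win for MIN."""
    moves = successors(state)
    if not moves:
        # The player to move cannot split any pile and so loses
        return 0 if max_to_move else 1
    child_values = [value(s, not max_to_move) for s in moves]
    return max(child_values) if max_to_move else min(child_values)


print(value((7,), False))  # MIN plays first: 1, so MAX wins with perfect play
```

The cache plays the same role as merging identical states in the game graph: each (state, player) pair is solved once.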

Game Playing – Use of Minimax

• The root MIN node has value +1

• All moves by MIN lead to a state of value +1 for MAX

• MIN cannot avoid losing

• From the values on the tree one can read off the best moves for each player

– make sure you know how to extract these best moves (“perfect lines of play”)

Game Playing – Bounded Minimax

• For real games, search trees are much bigger and deeper than Nim

• Cannot possibly evaluate the entire tree

• Have to put a bound on the depth of the search


• The terminal states are no longer a definite win/loss

– actually they are really a definite win/draw/loss but with reasonable computer resources we cannot determine which

• Have to heuristically/approximately evaluate the quality of the positions of the states

• Evaluation of the utility function is expensive if it is not a clear win or loss


Next Slide:

• Artificial example of bounded minimax

• Evaluate “terminal position” after all possible moves by MAX

• (The numbers are invented, and just to illustrate the working of minimax)

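Figure: MAX (the agent) moves from A to B or C, where MIN (the opponent) is next to move. B has utility 1 and C has utility -3, so the backed-up value of A is 1.

Utility values of “terminal” positions are obtained by an evaluation function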

Game Playing – Bounded Minimax

• Example of minimax with bounded depth

• Evaluate “terminal position” after all possible moves in the order:

1. MAX (aka “agent”)

2. MIN (aka “opponent”)

3. MAX

• (The numbers are invented, and just to illustrate the working of minimax)

• Assuming MAX plays first, complete the MIN/MAX tree

MAX   A = 1

MIN   B = 1   C = -3

MAX   D = 4   E = 1   F = 2   G = -3

Terminal values   D: 4, -5   E: -5, 1   F: -7, 2   G: -3, -8


• If both players play their best moves, then which “line” does the play follow?

Figure: the same tree, with the line of perfect play marked: A → B → E → terminal position with value 1

Game Playing – Perfect Play

• Note that the line of perfect play leads to a terminal node with the same value as the root node

• All intermediate nodes also have that same value

• Essentially, this is the meaning of the value at the root node

• Caveat: This only applies if the tree is not expanded further after a move, because then the terminals will change and so the values can change
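Reading off the line of perfect play can be sketched as follows, using the invented values from the bounded example (MAX at A, MIN at B and C, MAX at D to G). The dictionary layout and the names `TREE`, `value` and `perfect_line` are assumptions made for this illustration.

```python
# Internal nodes are named; leaves are the invented terminal utilities
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [4, -5],
    "E": [-5, 1],
    "F": [-7, 2],
    "G": [-3, -8],
}


def value(node, maximizing):
    """Backed-up minimax value of a node (a leaf is its own value)."""
    if not isinstance(node, str):
        return node
    vals = [value(child, not maximizing) for child in TREE[node]]
    return max(vals) if maximizing else min(vals)


def perfect_line(node, maximizing):
    """Follow the best child for whoever is to move until a terminal value is reached."""
    line = [node]
    while isinstance(node, str):
        pick = max if maximizing else min
        node = pick(TREE[node], key=lambda c: value(c, not maximizing))
        line.append(node)
        maximizing = not maximizing
    return line


print(perfect_line("A", True))  # ['A', 'B', 'E', 1]
```

Every node on the returned line has backed-up value 1, the same as the root.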

Game Playing – Summary So Far

• Game tree

– describes the possible sequences of play

– might be drawn as a graph if we merge together identical states

• Minimax

– Utility values assigned to the leaves

• Values “backed up” the tree by

– MAX node takes max value of children

– MIN node takes min value of children

– Can read off best lines of play and results

• Depth Bound – utility of terminal states estimated using an “evaluation function”

Minimax algorithm

Minimax

max   10

min   10   2

max   10   14   2   24

min   10 9   14 13   2 1   3 24

A MINMAX GAME

Properties of minimax

• Complete? Yes (if tree is finite)

• Optimal? Yes (against an optimal opponent)

• Time complexity? O(b^m), where b is the branching factor and m is the maximum depth

• Space complexity? O(bm) (depth-first exploration)

• For chess, b ≈ 35, m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible
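To get a feel for these figures, the arithmetic can be done directly. This is a rough illustration using the slide's approximate values b ≈ 35 and m ≈ 100; the variable names are arbitrary.

```python
b, m = 35, 100                 # approximate branching factor and game length for chess

leaf_count = b ** m            # time grows like b^m
digits = len(str(leaf_count))  # number of decimal digits in b^m
stack_size = b * m             # depth-first space grows like b*m

print(digits)      # 155, i.e. b^m is roughly 10^154
print(stack_size)  # 3500
```

The contrast is the point: the memory needed is tiny, but the number of positions to visit is far beyond any feasible amount of computation.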


α-β pruning example





Alpha and beta

• The ALPHA value of a MAX node is set equal to the current LARGEST final backed-up value of its successors.

• The BETA value of a MIN node is set equal to the current SMALLEST final backed-up value of its successors.

ALPHA-BETA PRUNING

Properties of α-β

• Pruning does not affect final result

• Good move ordering improves effectiveness of pruning

• With "perfect ordering," time complexity = O(b^(m/2)), which doubles the depth of search that is feasible in the same time

• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)


Why is it called α-β?

• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for max

• If v is worse than α, max will avoid it, so prune that branch

• Define β similarly for min

The α-β algorithm
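The pseudocode from the original slide did not survive extraction, so what follows is a generic sketch of α-β pruning and not necessarily the exact formulation shown in the lecture. It assumes an explicit tree given as nested lists with numeric leaves; the `visited` list is an extra added here only to show which leaves get evaluated.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of node, skipping branches that cannot affect the result."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each leaf that is actually evaluated
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, visited))
            if v >= beta:         # MIN above already has a choice at least this good for MIN
                break             # prune the remaining children
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta, visited))
        if v <= alpha:            # MAX above already has a choice at least this good for MAX
            break                 # prune the remaining children
        beta = min(beta, v)
    return v


# Leaves 10 9 14 13 2 1 3 24 under max / min / max levels, as in the minimax example
tree = [[[10, 9], [14, 13]], [[2, 1], [3, 24]]]
seen = []
print(alphabeta(tree, True, visited=seen))  # 10, the same value as plain minimax
print(seen)                                 # [10, 9, 14, 2, 1]: only 5 of the 8 leaves
```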


Resource limits

Suppose we have 100 secs and can explore 10^4 nodes/sec, giving 10^6 nodes per move

Standard approach:

• cutoff test: e.g., depth limit (perhaps add quiescence search)

• evaluation function = estimated desirability of position

Evaluation functions

• For chess, typically linear weighted sum of features

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

• e.g., w1 = 9 with

f1(s) = (number of white queens) – (number of black queens), etc.
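A linear weighted sum of this kind might look like the sketch below. Only the queen weight of 9 comes from the slide; the other weights are the conventional textbook piece values and are an assumption here, as are the names `WEIGHTS` and `evaluate` and the dictionary representation of piece counts.

```python
# Feature weights: queen = 9 is from the slide; the rest are conventional values (assumed)
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}


def evaluate(white, black):
    """Eval(s) = sum of w_i * f_i(s), where f_i = (white count) - (black count) of piece i."""
    return sum(w * (white.get(piece, 0) - black.get(piece, 0))
               for piece, w in WEIGHTS.items())


# Hypothetical position: white has an extra rook, black an extra knight and pawn
white = {"queen": 1, "rook": 2, "pawn": 6}
black = {"queen": 1, "rook": 1, "knight": 1, "pawn": 7}
print(evaluate(white, black))  # 5 - 3 - 1 = 1, slightly better for white
```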


Cutting off search

MinimaxCutoff is identical to MinimaxValue except

1. Terminal? is replaced by Cutoff?

2. Utility is replaced by Eval
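The two substitutions can be sketched as a depth-limited minimax. This is an illustration only: the function name `minimax_cutoff`, its parameters, and the toy game used to exercise it (integer states with invented successor and evaluation rules) are assumptions, not part of the lecture.

```python
def minimax_cutoff(state, depth, maximizing, successors, evaluate):
    """Minimax with a depth bound: a cutoff test replaces Terminal?, Eval replaces Utility."""
    children = successors(state)
    # Cutoff test: depth limit reached, or no moves are available
    if depth == 0 or not children:
        return evaluate(state)
    values = [minimax_cutoff(c, depth - 1, not maximizing, successors, evaluate)
              for c in children]
    return max(values) if maximizing else min(values)


# Toy game (invented): state n has moves to 2n and 2n+1 while n < 8
def toy_successors(n):
    return [2 * n, 2 * n + 1] if n < 8 else []


def toy_eval(n):
    return n % 5  # arbitrary heuristic score


print(minimax_cutoff(1, 2, True, toy_successors, toy_eval))  # 1
print(minimax_cutoff(1, 3, True, toy_successors, toy_eval))  # 3
```

The two calls give different root values because a deeper bound changes which positions are treated as terminal, which is why backed-up values can change when the tree is expanded further.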

Does it work in practice?

b^m = 10^6, b = 35 → m = 4

4-ply lookahead is a hopeless chess player!

– 4-ply ≈ human novice

– 8-ply ≈ typical PC, human master

– 12-ply ≈ Deep Blue, Kasparov
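The depth figures follow from solving b^m = budget for m. A small sketch of that arithmetic, using the slide's budget of 10^6 nodes and b = 35, and assuming the O(b^(m/2)) cost of α-β with perfect ordering for the second figure:

```python
import math

budget = 10 ** 6   # nodes that can be explored per move
b = 35             # approximate branching factor for chess

# Plain minimax: b^m = budget, so m = log(budget) / log(b)
plain_depth = math.log(budget) / math.log(b)

# α-β with perfect ordering: b^(m/2) = budget, so the reachable depth doubles
pruned_depth = 2 * plain_depth

print(round(plain_depth, 2))   # about 3.89, i.e. roughly 4-ply
print(round(pruned_depth, 2))  # about 7.77, i.e. roughly 8-ply
```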


Deterministic games in practice

• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.


• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply.

• Othello: human champions refuse to compete against computers, who are too good.

• Go: human champions refuse to compete against computers, who are too bad. In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.


Summary

• Games are fun to work on!

• They illustrate several important points about AI

• perfection is unattainable, so we must approximate

• good idea to think about what to think about
