
Page 1: Lecture 6: Adversarial Search & Games

Reading: Ch. 6, AIMA

Page 2: Adversarial search

• So far: single-agent search – no opponents or collaborators
• Multi-agent search:

– Playing a game with an opponent: adversarial search

– Economies: even more complex, societies of cooperative and non-cooperative agents

• Game playing and AI:
– Games can be complex, require (?) human intelligence

– Have to evolve in “real-time”

– Well-defined problems

– Limited scope

Page 3: Games and AI

                 Deterministic                    Chance
perfect info     Checkers, Chess, Go, Othello     Backgammon, Monopoly
imperfect info                                    Bridge, Poker, Scrabble

Page 4: Games and search

• Traditional search: single agent, searches for its well-being, unobstructed

• Games: search against an opponent

• Consider a two-player board game:
– e.g., chess, checkers, tic-tac-toe

– board configuration: unique arrangement of "pieces"

• Representing board games as a search problem:
– states: board configurations

– operators: legal moves

– initial state: current board configuration

– goal state: winning/terminal board configuration

Page 5: Wrong representation

• We want to optimize our (agent’s) goal, hence build a search tree based on possible moves/actions

• Problem: this representation ignores the opponent's moves

[Figure: a tic-tac-toe search tree that expands only X's (the agent's) moves, ignoring O's replies.]

Page 6: Better representation: game search tree

• Include opponent’s actions as well

[Figure: a tic-tac-toe game search tree whose levels alternate: agent move, opponent move, agent move, ... A "full move" is one agent move plus one opponent reply. Utilities (e.g., 10, 15) are assigned to the goal nodes.]

Page 7: Game search trees

• What is the size of the game search trees?
– O(b^d)
– Tic-tac-toe: 9! leaves (max depth = 9)
– Chess: ~35 legal moves per position, average game "depth" ~100
• b^d ~ 35^100 ~ 10^154 states, "only" ~10^40 legal states

• Too deep for exhaustive search!
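For concreteness, the 10^154 figure is just the branching-factor arithmetic, as a back-of-the-envelope check:

$$b^d \approx 35^{100} = 10^{100\,\log_{10} 35} \approx 10^{100 \times 1.544} \approx 10^{154}$$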

Page 8: Utilities in search trees

• Assign a utility to (terminal) states, describing how much they are valued for the agent:
– High utility – good for the agent

– Low utility – good for the opponent

[Figure: example tree rooted at A. The computer's possible moves lead to nodes B, C, D, E; the opponent's possible moves lead from those to terminal states F through O, whose board evaluations from the agent's perspective are F = -7, G = -5, H = 3, I = 9, J = -6, K = 0, L = 2, M = 1, N = 3, O = 2.]

Page 9: Search strategy

• Worst-case scenario: assume the opponent will always make the best move (i.e., the worst move for us)

• Minimax search: maximize the utility for our agent while expecting that the opponent plays his best moves:
1. High utility favors the agent => choose the move with maximal utility
2. Low utility favors the opponent => assume the opponent makes the move with the lowest utility

[Figure: the example tree with minimax values filled in: B = -7, C = -6, D = 0, E = 1, and the root A = 1. The computer's possible moves are at the max level, the opponent's at the min level, with terminal states below.]

Page 10: Minimax algorithm

1. Start with utilities of terminal nodes

2. Propagate them back to the root node by choosing the minimax strategy (a code sketch follows the figure below)

[Figure: three snapshots of the example tree. First, the terminal utilities F through O are given. min step: each of B, C, D, E receives the lowest utility among its children. max step: the root A receives the highest of those values, A = 1.]
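The two steps map directly onto a recursive depth-first procedure. Below is a minimal Python sketch (not from the slides); the game-interface functions is_terminal, utility, and successors are assumed placeholders for whatever game is being played:

```python
# Minimal minimax sketch. The three game-interface functions passed in
# (is_terminal, utility, successors) are assumed placeholders.

def minimax(state, is_max_level, is_terminal, utility, successors):
    """Return the minimax value of `state`."""
    if is_terminal(state):
        return utility(state)  # step 1: utilities at the terminal nodes
    values = [minimax(child, not is_max_level, is_terminal, utility, successors)
              for child in successors(state)]
    # step 2: propagate back up -- the agent maximizes, the opponent minimizes
    return max(values) if is_max_level else min(values)
```

At the root, the agent plays the move leading to the child whose backed-up value equals the returned maximum.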

Page 11: Complexity of minimax algorithm

• Utilities propagate up in a recursive fashion: DFS

• Space complexity: O(bd)

• Time complexity: O(b^d)

• Problem: time complexity – it's a game, and there is only a finite time to make a move

Page 12: Reducing complexity of minimax (1)

• Don't search to the full depth d; terminate early
• Prune bad paths

• Problem:
– we don't have utilities of non-terminal nodes

• Estimate the utility of non-terminal nodes:
– a static board evaluation function (SBE) is a heuristic that assigns a utility to non-terminal nodes

– it reflects the computer’s chances of winning from that node

– it must be easy to calculate from board configuration

• For example, chess:
SBE = α * materialBalance + β * centerControl + γ * …
material balance = value of white pieces - value of black pieces
(pawn = 1, rook = 5, queen = 9, etc.)
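As a concrete, purely illustrative rendering of the material-balance term, here is a hedged Python sketch; the board encoding and the weights α and β are assumptions for illustration, not the course's definitions:

```python
# Hypothetical static board evaluation (SBE) sketch; the encoding and
# weights below are illustrative assumptions.

PIECE_VALUE = {'p': 1, 'n': 3, 'b': 3, 'r': 5, 'q': 9}  # pawn=1, rook=5, queen=9, ...

def material_balance(pieces):
    """Value of white pieces minus value of black pieces.
    `pieces` is a list like ['P', 'q', 'R', ...]; uppercase = white."""
    total = 0
    for p in pieces:
        value = PIECE_VALUE.get(p.lower(), 0)
        total += value if p.isupper() else -value
    return total

def sbe(pieces, center_control, alpha=1.0, beta=0.5):
    # SBE = alpha * materialBalance + beta * centerControl + ...
    return alpha * material_balance(pieces) + beta * center_control
```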

Page 13: Minimax with Evaluation Functions

• Same as general minimax, except:
– it only goes to depth m
– it estimates utilities using the SBE function

• How would this algorithm perform at chess?
– if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
– if it could look ahead ~8 pairs, as done on a typical PC, it is as good as a human master
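A sketch of the depth-m variant, reusing the interface assumed in the earlier minimax sketch; the only changes are the depth counter and the SBE fallback:

```python
def minimax_depth_limited(state, depth, is_max_level,
                          is_terminal, sbe, successors):
    """Minimax cut off after `depth` plies; non-terminal utilities
    are estimated with the (assumed) SBE heuristic."""
    if is_terminal(state) or depth == 0:
        return sbe(state)  # SBE doubles as the exact utility at true terminals
    values = [minimax_depth_limited(child, depth - 1, not is_max_level,
                                    is_terminal, sbe, successors)
              for child in successors(state)]
    return max(values) if is_max_level else min(values)
```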

Page 14: Reducing complexity of minimax (2)

• Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?

• Prune off paths that do not need to be explored
• Alpha-beta pruning

• Keep track of, while doing DFS of the game tree:
– maximizing level: alpha
• highest value seen so far
• lower bound on the node's evaluation/score
– minimizing level: beta
• lowest value seen so far
• upper bound on the node's evaluation/score
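Combining the two bounds gives the standard alpha-beta procedure. The sketch below (same assumed game interface as before) uses the cut-off test that drives the walkthrough on the following pages:

```python
# Alpha-beta sketch: alpha = highest value seen so far at a maximizing
# level (lower bound); beta = lowest value seen so far at a minimizing
# level (upper bound).

def alphabeta(state, depth, alpha, beta, is_max_level,
              is_terminal, sbe, successors):
    if is_terminal(state) or depth == 0:
        return sbe(state)
    if is_max_level:
        value = float('-inf')
        for child in successors(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, is_terminal, sbe, successors))
            alpha = max(alpha, value)
            if alpha >= beta:   # beta cut-off: MIN already has a better option
                break
        return value
    else:
        value = float('inf')
        for child in successors(state):
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, is_terminal, sbe, successors))
            beta = min(beta, value)
            if beta <= alpha:   # alpha cut-off, as on pages 23-24 below
                break
        return value

# Initial call: alphabeta(root, d, float('-inf'), float('inf'), True, ...)
```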

Page 15: Alpha-Beta Example

minimax(A,0,4)

[Figure: the example game tree used on pages 15-26. Root A is a max node with min children B, C, D, E; below them are further max nodes (F, K, M, ...) and terminal leaves with fixed values (e.g., N = 4, W = -3, X = -5). Levels alternate max/min. Call stack: A. A's α is initialized (no value yet).]

Page 16: Alpha-Beta Example

minimax(B,1,4)

[Figure: same tree. B (min level) is expanded; B's β is initialized (no value yet). Call stack: B, A.]

Page 17: Alpha-Beta Example

minimax(F,2,4)

[Figure: same tree. F (max level) is expanded; F's α is initialized (no value yet). Call stack: F, B, A.]

Page 18: Alpha-Beta Example

minimax(N,3,4)

[Figure: same tree; blue marks terminal states. N is a terminal state with value 4. Call stack: N, F, B, A.]

Page 19: Alpha-Beta Example

minimax(F,2,4) is returned to

[Figure: same tree. F's α = 4, the maximum seen so far. Call stack: F, B, A.]

Page 20: Alpha-Beta Example

minimax(O,3,4)

[Figure: same tree. O (min level) is expanded; O's β is initialized (no value yet). Call stack: O, F, B, A.]

Page 21: Alpha-Beta Example

minimax(W,4,4)

[Figure: same tree. W is a terminal state (depth limit) with value -3. Call stack: W, O, F, B, A.]

Page 22: Alpha-Beta Example

minimax(O,3,4) is returned to

[Figure: same tree. O's β = -3, the minimum seen so far. Call stack: O, F, B, A.]

Page 23: Alpha-Beta Example

minimax(O,3,4) is returned to

O's β ≤ F's α: stop expanding O (alpha cut-off)

[Figure: same tree. O's remaining child X (value -5) is never examined. Call stack: O, F, B, A.]

Page 24: Alpha-Beta Example

Why? A smart opponent will choose W or worse, so O's upper bound is -3.

So the computer shouldn't choose O (-3), since N (4) is better.

[Figure: same tree. Call stack: O, F, B, A.]

Page 25: Alpha-Beta Example

minimax(F,2,4) is returned to

[Figure: same tree. F's α is not changed (maximizing level), so F = 4. Call stack: F, B, A.]

Page 26: Alpha-Beta Example

minimax(B,1,4) is returned to

[Figure: same tree. B's β = 4, the minimum seen so far. Call stack: B, A.]

Page 27: Effectiveness of Alpha-Beta Search

Effectiveness depends on the order in which successors are examined. It is more effective if the best moves are examined first.

• Worst case:
– successors ordered so that no pruning takes place
– no improvement over exhaustive search

• Best case:
– each player's best move is evaluated first (left-most)

• In practice, performance is closer to the best case than to the worst case

Page 28: Effectiveness of Alpha-Beta Search

• In practice we often get O(b^(d/2)) rather than O(b^d)
– same as having a branching factor of √b, since (√b)^d = b^(d/2)

• For example, chess:
– goes from b ~ 35 to √b ~ 6
– permits much deeper search in the same amount of time
– makes computer chess competitive with humans

Page 29: Dealing with Limited Time

• In real games, there is usually a time limit T on making a move

• How do we take this into account?
– we cannot stop alpha-beta midway and expect to use its results with any confidence
– so, we could set a conservative depth limit that guarantees we will find a move in time < T
– but then, the search may finish early and the opportunity to do more search is wasted

Page 30: Dealing with Limited Time

• In practice, iterative deepening search (IDS) is used
– run alpha-beta search with an increasing depth limit
– when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
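A minimal anytime loop built on the alpha-beta sketch above. The helpers children(state) -> [(move, successor)] and alphabeta_value(state, depth) are assumed, and a real engine would abort the in-progress search at the deadline rather than let it finish:

```python
import time

def ids_move(root, time_limit, children, alphabeta_value):
    """Iterative deepening under a time limit: always keep the move
    chosen by the deepest *completed* alpha-beta search."""
    deadline = time.monotonic() + time_limit
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        # Search this depth to completion, then record its move choice.
        scored = [(alphabeta_value(s, depth - 1), move)
                  for move, s in children(root)]
        best_move = max(scored, key=lambda vm: vm[0])[1]
        depth += 1
    return best_move  # result of the last completed search
```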

Page 31: The Horizon Effect

• Sometimes disaster lurks just beyond the search depth
– e.g., the computer captures a queen, but a few moves later the opponent checkmates (i.e., wins)

• The computer has a limited horizon; it cannot see that this significant event could happen

• How do you avoid catastrophic losses due to "short-sightedness"?
– quiescence search
– secondary search

Page 32: The Horizon Effect

• Quiescence search (sketched in code below):
– when the evaluation is changing frequently, look deeper than the depth limit
– look for a point when the game "quiets down"

• Secondary search:
1. find the best move looking to depth d
2. look k steps beyond to verify that it still looks good
3. if it doesn't, repeat step 2 for the next best move
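One common way to code the quiescence idea is sketched below in negamax form (values are always from the side to move's perspective, so scores are negated across levels). The helpers is_quiet and capture_successors, and the "stand pat" convention, are assumptions for illustration:

```python
def quiescence(state, alpha, beta, sbe, is_quiet, capture_successors):
    """Search past the depth limit until the game 'quiets down'.
    `sbe` is assumed to score from the side to move's perspective."""
    stand_pat = sbe(state)          # assume we can do at least this well
    if is_quiet(state) or stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for child in capture_successors(state):   # only "noisy" moves: captures etc.
        score = -quiescence(child, -beta, -alpha, sbe,
                            is_quiet, capture_successors)
        if score >= beta:
            return score            # cut-off
        alpha = max(alpha, score)
    return alpha
```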

Page 33: Book Moves

• Build a database of opening moves, end games, and studied configurations

• If the current state is in the database, use the database:
– to determine the next move

– to evaluate the board

• Otherwise, do alpha-beta search
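The policy is a straight lookup-then-search, as in this tiny sketch; `book` is a hypothetical dictionary from hashable board positions to stored moves, and alphabeta_move is an assumed search helper:

```python
def choose_move(state, book, alphabeta_move):
    """Play from the database when the position is known; otherwise search."""
    move = book.get(state)        # opening, end game, or studied configuration
    if move is not None:
        return move               # the database determines the next move
    return alphabeta_move(state)  # fall back to alpha-beta search
```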

Page 34: Examples of Algorithms which Learn to Play Well

Checkers:

A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, 3(3):210-229, 1959

• Learned by playing a copy of itself thousands of times
• Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
• Successful enough to compete well at human tournaments

Page 35: Examples of Algorithms which Learn to Play Well

Backgammon:

G. Tesauro and T. J. Sejnowski, “A Parallel Network that Learns to Play Backgammon,” Artificial Intelligence 39(3), 357-390, 1989

• Also learns by playing copies of itself
• Uses a non-linear evaluation function – a neural network
• Rated one of the top three players in the world

Page 36: Non-deterministic Games

• Some games involve chance, for example:
– roll of dice
– spin of a game wheel
– deal of cards from a shuffled deck

• How can we handle games with random elements?

• The game tree representation is extended to include chance nodes:
1. agent moves
2. chance nodes
3. opponent moves

Page 37: Non-deterministic Games

The game tree representation is extended:

[Figure: three-level tree. max level: root A. chance level: two 50/50 chance nodes under A, each branch taken with probability .5. min level: B and C under the left chance node, D and E under the right. Leaf values give B = min(7, 2) = 2, C = min(9, 6) = 6, D = min(5, 0) = 0, E = min(8, -4) = -4.]

Page 38: Non-deterministic Games

• Weight each score by the probability that the move occurs
• Use the expected value for a move: the probability-weighted sum of the possible random outcomes

[Figure: the same tree with the chance nodes evaluated: left 50/50 node = .5 * 2 + .5 * 6 = 4; right 50/50 node = .5 * 0 + .5 * (-4) = -2.]
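The chance-node rule above is the expectiminimax recursion. A sketch follows, assuming every node exposes its kind and its children as (probability, child) pairs (probability 1.0 on ordinary max/min branches); on the tree above it reproduces the values 4 and -2:

```python
def expectiminimax(node, kind, children, utility):
    """`kind(node)` is one of 'max', 'min', 'chance', 'terminal';
    `children(node)` -> [(probability, child)]. Both are assumed helpers."""
    k = kind(node)
    if k == 'terminal':
        return utility(node)
    values = [(p, expectiminimax(child, kind, children, utility))
              for p, child in children(node)]
    if k == 'max':
        return max(v for _, v in values)
    if k == 'min':
        return min(v for _, v in values)
    return sum(p * v for p, v in values)   # chance: expected value
```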

Page 39: Non-deterministic Games

• Choose move with highest expected value

[Figure: the same tree. The root picks the branch with the higher expected value: A = max(4, -2) = 4.]

Page 40: Non-deterministic Games

• Non-determinism increases the branching factor
– 21 possible rolls with 2 dice

• Value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases

• Alpha-beta pruning is less effective

• TD-Gammon:
– depth-2 search
– very good heuristic
– plays at world-champion level

Page 41: Computers can play Grandmaster Chess

"Deep Blue" (IBM)
• Parallel processor, 32 nodes
• Each node has 8 dedicated VLSI "chess chips"
• Can search 200 million configurations/second
• Uses minimax, alpha-beta, sophisticated heuristics
• It currently can search to 14 ply (i.e., 7 pairs of moves)
• Can avoid the horizon effect by searching as deep as 40 ply
• Uses book moves

Page 42: Computers can play Grandmaster Chess

Kasparov vs. Deep Blue, May 1997
• 6-game full-regulation chess match sponsored by ACM
• Kasparov lost the match: 1 win and 3 draws against Deep Blue's 2 wins and 3 draws
• This was a historic achievement for computer chess, being the first time a computer became the best chess player on the planet
• Note that Deep Blue plays by "brute force" (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness

Page 43: Status of Computers in Other Deterministic Games

• Checkers/Draughts
– current world champion is Chinook
– can beat any human (beat Tinsley in 1994)
– uses alpha-beta search and book moves (> 443 billion stored positions)

• Othello
– computers can easily beat the world experts

• Go
– branching factor b ~ 360 (very large!)
– $2 million prize for any system that can beat a world expert