Rutgers CS440, Fall 2003
Lecture 6: Adversarial Search & Games
Reading: Ch. 6, AIMA
Adversarial search
• So far: single-agent search – no opponents or collaborators
• Multi-agent search:
– Playing a game with an opponent: adversarial search
– Economies: even more complex societies of cooperative and non-cooperative agents
• Game playing and AI:
– Games can be complex and may require human intelligence
– Have to evolve in "real time"
– Well-defined problems
– Limited scope
Games and AI
                   Deterministic                    Chance
perfect info       Checkers, Chess, Go, Othello     Backgammon, Monopoly
imperfect info                                      Bridge, Poker, Scrabble
Games and search
• Traditional search: a single agent searches for its own well-being, unobstructed
• Games: search against an opponent
• Consider a two-player board game:
– e.g., chess, checkers, tic-tac-toe
– board configuration: a unique arrangement of "pieces"
• Representing a board game as a search problem:
– states: board configurations
– operators: legal moves
– initial state: current board configuration
– goal state: winning/terminal board configuration
Wrong representation
• We want to optimize our (agent's) goal, hence build a search tree based on possible moves/actions
• Problem: this ignores the opponent's moves
[Figure: tic-tac-toe search tree expanding only X's moves, ignoring O's responses]
Better representation: game search tree
• Include opponent’s actions as well
[Figure: tic-tac-toe game search tree with alternating levels – agent (X) move, opponent (O) move, agent move; one agent move plus one opponent reply is a full move; utilities (e.g., 10, 15) are assigned to the terminal (goal) nodes]
Game search trees
• What is the size of a game search tree?
– O(b^d), for branching factor b and depth d
– Tic-tac-toe: 9! leaves (max depth = 9)
– Chess: ~35 legal moves per position, average game "depth" of 100 plies
• b^d ~ 35^100 ~ 10^154 states, "only" ~10^40 legal states
• Too deep for exhaustive search!
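The slide's estimates can be checked with a few lines of Python (note that 9! counts move sequences, not distinct boards, and is an upper bound since many games end before ply 9):

```python
import math

# Tic-tac-toe: number of leaves if every game ran the full 9 plies.
tictactoe_leaves = math.factorial(9)
print(tictactoe_leaves)        # 362880

# Chess: b^d with branching factor ~35 and depth ~100 plies.
chess_states = 35 ** 100
print(len(str(chess_states)))  # 155 digits, i.e., ~10^154
```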
Utilities in search trees
• Assign a utility to (terminal) states, describing how much they are valued for the agent:
– High utility – good for the agent
– Low utility – good for the opponent
[Figure: game tree with the computer's possible moves at the top level, the opponent's possible moves below, and terminal states at the bottom; each terminal state carries a board evaluation from the agent's perspective (values between -7 and 9)]
Search strategy
• Worst-case scenario: assume the opponent will always make the best move (i.e., the worst move for us)
• Minimax search: maximize the utility for our agent while expecting the opponent to play his best moves:
1. High utility favors the agent => choose the move with maximal utility
2. Low utility favors the opponent => assume the opponent makes the move with the lowest utility
[Figure: minimax values propagated up the example tree – each node at the opponent's level takes the minimum of its children's utilities, and the root A takes the maximum of those minima, giving A = 1]
Minimax algorithm
1. Start with utilities of terminal nodes
2. Propagate them back to the root node by choosing the minimax strategy
[Figure: three snapshots of minimax propagation on the example tree – first the terminal utilities, then the min step assigning values at the opponent's level (B=-5, C=-6, D=0, E=1), then the max step assigning A=1 at the root]
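The two steps above can be sketched in Python. This is a generic sketch, not code from the lecture: a tree is represented as nested lists, where a leaf is simply its utility value:

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree.

    A node is either a number (terminal utility) or a list of child nodes.
    `maximizing` is True at the agent's levels, False at the opponent's.
    """
    if not isinstance(node, list):      # terminal state: utility is known
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Tiny example: a max root over two opponent (min) nodes.
tree = [[3, 12, 8], [2, 4, 6]]
print(minimax(tree, True))   # 3: the min levels give 3 and 2; the root takes the max
```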
Complexity of minimax algorithm
• Utilities propagate up in a recursive fashion:
– DFS
• Space complexity:
– O(bd) (DFS stores a single path plus the siblings along it)
• Time complexity:
– O(b^d)
• Problem: time complexity – it's a game, and there is only finite time to make a move
Reducing complexity of minimax (1)
• Don't search to full depth d; terminate early
• Prune bad paths
• Problem:
– we don't have utilities for non-terminal nodes
• Estimate the utility of non-terminal nodes:
– a static board evaluation function (SBE) is a heuristic that assigns a utility to a non-terminal node
– it reflects the computer's chances of winning from that node
– it must be easy to calculate from the board configuration
• For example, in chess:
SBE = α · materialBalance + β · centerControl + γ · …
material balance = value of white pieces - value of black pieces
(pawn = 1, rook = 5, queen = 9, etc.)
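A minimal sketch of such an evaluation function, assuming a board given as a list of (piece, color) pairs; the weights and the knight/bishop values below are illustrative assumptions, not tuned constants from the lecture:

```python
# Piece values from the slide; knight/bishop = 3 is a common convention (assumed).
PIECE_VALUE = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_balance(board):
    """board: list of (piece_name, color) pairs, color in {"white", "black"}."""
    score = 0
    for piece, color in board:
        value = PIECE_VALUE.get(piece, 0)   # king excluded from material count
        score += value if color == "white" else -value
    return score

def sbe(board, center_control=0, alpha=1.0, beta=0.5):
    """Static board evaluation: a weighted sum of heuristic terms."""
    return alpha * material_balance(board) + beta * center_control

board = [("queen", "white"), ("rook", "black"), ("pawn", "black")]
print(sbe(board))   # 1.0 * (9 - 5 - 1) = 3.0
```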
Minimax with Evaluation Functions
• Same as general minimax, except it:
– only goes to depth m
– estimates utilities using the SBE function
• How would this algorithm perform at chess?
– if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
– if it could look ahead ~8 pairs, as done on a typical PC, it is as good as a human master
Reducing complexity of minimax (2)
• Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?
• Prune off paths that do not need to be explored
• Alpha-beta pruning
• Keep track of two values while doing DFS of the game tree:
– maximizing level: alpha
• highest value seen so far
• lower bound on the node's evaluation/score
– minimizing level: beta
• lowest value seen so far
• upper bound on the node's evaluation/score
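The bookkeeping above can be sketched by extending the minimax recursion with alpha and beta parameters. This is a generic sketch (not the lecture's code), using a nested-list tree representation where a leaf is its utility:

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning.

    A node is either a number (terminal utility) or a list of child nodes.
    alpha: best value the maximizer can guarantee so far (lower bound).
    beta:  best value the minimizer can guarantee so far (upper bound).
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:       # a min ancestor would never allow this branch
                break               # beta cut-off
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:       # a max ancestor already has a better option
                break               # alpha cut-off
        return value

tree = [[3, 12, 8], [2, 4, 6]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3, same as plain minimax
```

Note that while exploring the second min node, the first child (2) already drives beta below the root's alpha of 3, so the remaining children (4 and 6) are pruned.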
Alpha-Beta Example

[Figure: example game tree searched to depth 4 – root A (max) with min children B, C, D, E; max grandchildren including F, G, H, I, J, K, L, M; and terminal leaves with utilities such as N=4, W=-3, X=-5, P=9, Q=-6, R=0, S=3, T=5, U=-7, V=-9. Terminal states are drawn in blue; the call stack is shown at each step.]

Trace of the first steps:

1. minimax(A,0,4): A is a max node; initialize A's alpha. Call stack: A
2. minimax(B,1,4): B is a min node; initialize B's beta. Call stack: A B
3. minimax(F,2,4): F is a max node; initialize F's alpha. Call stack: A B F
4. minimax(N,3,4): N is a terminal state with utility 4. Call stack: A B F N
5. minimax(F,2,4) is returned to: alpha = 4, the maximum seen so far
6. minimax(O,3,4): O is a min node; initialize O's beta. Call stack: A B F O
7. minimax(W,4,4): W is a terminal state (at the depth limit) with utility -3. Call stack: A B F O W
8. minimax(O,3,4) is returned to: beta = -3, the minimum seen so far
9. O's beta (-3) <= F's alpha (4): stop expanding O (alpha cut-off), so X (-5) is never evaluated
Why? A smart opponent will choose W or worse, so O's upper bound is -3; the computer shouldn't choose O (-3) since N (4) is better
10. minimax(F,2,4) is returned to: alpha is not changed (maximizing, and -3 < 4)
11. minimax(B,1,4) is returned to: beta = 4, the minimum seen so far
Effectiveness of Alpha-Beta Search
• Effectiveness depends on the order in which successors are examined; it is more effective if the best moves are examined first
• Worst case:
– successors ordered so that no pruning takes place
– no improvement over exhaustive search
• Best case:
– each player's best move is evaluated first (left-most)
• In practice, performance is closer to the best case than to the worst case
Effectiveness of Alpha-Beta Search
• In practice we often get O(b^(d/2)) rather than O(b^d)
– same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2)
• For example, in chess:
– b goes from ~35 to ~6
– permits a much deeper search in the same time
– makes computer chess competitive with humans
Dealing with Limited Time
• In real games, there is usually a time limit T on making a move
• How do we take this into account?
– we cannot stop alpha-beta midway and expect to use its results with any confidence
– so we could set a conservative depth limit that guarantees we will find a move in time < T
– but then the search may finish early, and the opportunity to do more search is wasted
Dealing with Limited Time
• In practice, iterative deepening search (IDS) is used
– run alpha-beta search with an increasing depth limit
– when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
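A sketch of the loop, assuming a depth-limited minimax over the same nested-list trees and a caller-supplied `estimate` (the SBE) for cut-off nodes. In a real engine the deadline would also be checked inside the search rather than only between iterations; this simplified version just keeps the result of the last iteration that finished in time:

```python
import time

def depth_limited_value(node, depth, maximizing, estimate):
    """Depth-limited minimax; `estimate` supplies a heuristic value at the cut-off."""
    if not isinstance(node, list):          # true terminal state
        return node
    if depth == 0:                          # depth limit reached: use the SBE estimate
        return estimate(node)
    values = [depth_limited_value(c, depth - 1, not maximizing, estimate)
              for c in node]
    return max(values) if maximizing else min(values)

def iterative_deepening(tree, time_limit, estimate, max_depth=10):
    """Search with an increasing depth limit; keep the last completed result."""
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        value = depth_limited_value(tree, depth, True, estimate)
        if time.monotonic() > deadline:     # this iteration finished too late
            break
        best = value                        # this depth completed in time
    return best

tree = [[3, 12, 8], [2, 4, 6]]
# Crude stand-in estimate for cut-off nodes: average of the immediate leaf values.
estimate = lambda node: sum(c for c in node if not isinstance(c, list)) / len(node)
print(iterative_deepening(tree, time_limit=1.0, estimate=estimate))   # 3
```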
The Horizon Effect
• Sometimes disaster lurks just beyond the search depth
– the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)
• The computer has a limited horizon; it cannot see that this significant event could happen
• How do you avoid catastrophic losses due to "short-sightedness"?
– quiescence search
– secondary search
The Horizon Effect
• Quiescence search
– when the evaluation is changing frequently, look deeper than the limit
– look for a point when the game "quiets down"
• Secondary search
1. find the best move looking to depth d
2. look k steps beyond to verify that it still looks good
3. if it doesn't, repeat step 2 for the next best move
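A common shape for the quiescence idea, sketched abstractly in negamax style (a convention not used elsewhere in these slides). Here `evaluate`, `is_quiet`, `noisy_moves`, and `apply_move` are hypothetical hooks that a real engine would supply; `noisy_moves` would return only captures and checks:

```python
def quiescence(position, alpha, beta, evaluate, is_quiet, noisy_moves, apply_move):
    """At the depth limit, keep searching 'noisy' moves (e.g., captures)
    until the position quiets down, then trust the static evaluation."""
    stand_pat = evaluate(position)          # static score if we stop here
    if is_quiet(position) or stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for move in noisy_moves(position):      # only captures/checks, not all moves
        score = -quiescence(apply_move(position, move), -beta, -alpha,
                            evaluate, is_quiet, noisy_moves, apply_move)
        if score >= beta:
            return score                    # fail-high cut-off
        alpha = max(alpha, score)
    return alpha
```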
Book Moves
• Build a database of opening moves, end games, and studied configurations
• If the current state is in the database, use the database:
– to determine the next move
– to evaluate the board
• Otherwise, do alpha-beta search
Examples of Algorithms which Learn to Play Well
Checkers:
A. L. Samuel, "Some Studies in Machine Learning using the Game of Checkers," IBM Journal of Research and Development, 11(6):601-617, 1959
• Learned by playing a copy of itself thousands of times
• Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
• Successful enough to compete well at human tournaments
Examples of Algorithms which Learn to Play Well
Backgammon:
G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3):357-390, 1989
• Also learns by playing copies of itself
• Uses a non-linear evaluation function – a neural network
• Rated one of the top three players in the world
Non-deterministic Games
• Some games involve chance, for example:
– roll of dice
– spin of a game wheel
– deal of cards from a shuffled deck
• How can we handle games with random elements?
• The game tree representation is extended to include chance nodes:
1. agent moves
2. chance nodes
3. opponent moves
Non-deterministic Games
The game tree representation is extended:
[Figure: max node A over two chance nodes (each labeled 50/50, with outcome probabilities .5 and .5); below them are min nodes B (beta=2, from leaves 7 and 2), C (beta=6, from 9 and 6), D (beta=0, from 5 and 0), and E (beta=-4, from 8 and -4)]
Non-deterministic Games
• Weight each score by the probability that the move occurs
• Use the expected value for the move: the probability-weighted sum of the possible random outcomes

[Figure: same tree with expected values at the chance nodes – the left 50/50 node gets .5*2 + .5*6 = 4, the right gets .5*0 + .5*(-4) = -2]
Non-deterministic Games
• Choose the move with the highest expected value

[Figure: root A takes the maximum of the chance-node values, A = max(4, -2) = 4]
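The expected-value step can be folded into the minimax recursion (often called expectiminimax). A generic sketch, with chance nodes represented as ("chance", [(probability, child), ...]) tuples; that representation is an assumption for illustration, not the lecture's:

```python
def expectiminimax(node, maximizing):
    """Minimax over trees that may contain chance nodes.

    A node is a number (terminal utility), a list of children (decision node),
    or the tuple ("chance", [(prob, child), ...]) for a random event.
    """
    if isinstance(node, tuple) and node[0] == "chance":
        # Expected value: probability-weighted sum over the possible outcomes.
        return sum(p * expectiminimax(child, maximizing) for p, child in node[1])
    if not isinstance(node, list):
        return node
    values = [expectiminimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# The slide's tree: A (max) over two 50/50 chance nodes whose outcomes
# are the min nodes B=[7,2], C=[9,6] and D=[5,0], E=[8,-4].
tree = [
    ("chance", [(0.5, [7, 2]), (0.5, [9, 6])]),
    ("chance", [(0.5, [5, 0]), (0.5, [8, -4])]),
]
print(expectiminimax(tree, True))   # 4.0: .5*2 + .5*6 = 4 beats .5*0 + .5*(-4) = -2
```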
Non-deterministic Games
• Non-determinism increases the branching factor
– 21 possible rolls with 2 dice
• The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
• alpha-beta pruning is less effective
• TD-Gammon:
– depth-2 search
– very good heuristic
– plays at world-champion level
Computers can play Grandmaster Chess
"Deep Blue" (IBM)
• Parallel processor, 32 nodes
• Each node has 8 dedicated VLSI "chess chips"
• Can search 200 million configurations/second
• Uses minimax, alpha-beta, and sophisticated heuristics
• It can currently search to 14 ply (i.e., 7 pairs of moves)
• Can avoid the horizon by searching as deep as 40 ply
• Uses book moves
Computers can play Grandmaster Chess
Kasparov vs. Deep Blue, May 1997
• 6-game full-regulation chess match sponsored by ACM
• Kasparov lost the match 2.5 to 3.5 (1 win and 3 draws against Deep Blue's 2 wins and 3 draws)
• This was a historic achievement for computer chess, being the first time a computer became the best chess player on the planet
• Note that Deep Blue plays by "brute force" (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness
Status of Computers in Other Deterministic Games
• Checkers/Draughts
– the current world champion is Chinook
– can beat any human (beat Tinsley in 1994)
– uses alpha-beta search and book moves (a database of > 443 billion positions)
• Othello
– computers can easily beat the world experts
• Go
– branching factor b ~ 360 (very large!)
– $2 million prize for any system that can beat a world expert