COMP 102: Excursions in Computer Science
Lecture 21: Game Playing
Instructor: Joelle Pineau ([email protected])
Class web page: www.cs.mcgill.ca/~jpineau/comp102
Joelle Pineau, COMP-102
Overview of AI
• Typically three components: perception, reasoning, and action.
Example AI system: Chess playing
IBM Deep Blue defeated Garry Kasparov (1997)
• Perception: advanced features of the board.
• Actions: choose a move.
• Reasoning: search and evaluation of possible board positions.
http://www-03.ibm.com
www.bobby-fischer.net
Game playing
• One of the oldest, most well-studied domains in AI! Why?
– People like them! People are good at playing them!
– Often viewed as an indicator of intelligence.
  • State spaces are very large and complicated.
  • Sometimes there is stochasticity and imperfect information.
– Clear, clean description of the environment.
– Easy performance indicator.
“Games are to AI as Grand Prix racing is to automobile design”.
Start with an easy game: Tic-Tac-Toe
Defining a search problem for games
• State space S: all possible configurations of the domain.
– Initial state s0 ∈ S: the start state. E.g. in Tic-Tac-Toe, the empty board.
– Goal states G ⊂ S: the set of end states. E.g. in Tic-Tac-Toe, a board with three in a row, a full board, etc.
• Actions A: the set of moves available
• Path: a sequence of states and operators.
• Solution of search problem: a path from s0 to some goal state sg ∈ G.
• Utility: a numerical value associated with a state (higher is
better, lower is worse).
E.g. +1 if it’s a win,
-1 if it’s a loss,
0 if it’s a draw or game not terminated.
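The components above can be written down concretely for Tic-Tac-Toe. The following is a minimal sketch, not the course's own code: the board encoding (a tuple of 9 cells) and all function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed encoding: a state is a tuple of 9 cells read row by row,
# each cell being "X", "O" or " " (empty).
State = Tuple[str, ...]

INITIAL_STATE: State = (" ",) * 9  # s0: the empty board

# The 8 winning lines: 3 rows, 3 columns, 2 diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def player_to_move(state: State) -> str:
    """X moves first, so X is to move whenever the piece counts are equal."""
    return "X" if state.count("X") == state.count("O") else "O"


def winner(state: State) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if state[a] != " " and state[a] == state[b] == state[c]:
            return state[a]
    return None


def is_goal(state: State) -> bool:
    """Goal (end) states: someone has won, or the board is full."""
    return winner(state) is not None or " " not in state


def actions(state: State) -> List[int]:
    """Actions A: indices of the empty cells (none once the game is over)."""
    if is_goal(state):
        return []
    return [i for i, cell in enumerate(state) if cell == " "]


def result(state: State, action: int) -> State:
    """The state reached when the player to move marks cell `action`."""
    board = list(state)
    board[action] = player_to_move(state)
    return tuple(board)


def utility(state: State, player: str = "X") -> int:
    """+1 for a win, -1 for a loss, 0 for a draw or an unfinished game."""
    w = winner(state)
    if w is None:
        return 0
    return 1 if w == player else -1
```

A path is then simply the list of states produced by repeatedly calling `result` from `INITIAL_STATE` until `is_goal` returns true.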
Representing search: Graphs and Trees
• Visualize the state space search in terms of a graph.
• Graph defined by a set of nodes and a set of edges connecting
the nodes.
– Nodes correspond to states.
– Edges correspond to actions.
• We search for a solution by building a search tree and
traversing it to find a goal state.
Search tree for Tic-Tac-Toe
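[Figure: the first levels of the Tic-Tac-Toe search tree: the possible first moves for x, then the possible replies for o, then the next moves for x, etc.]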
We want to find a strategy (i.e. way of picking moves) that wins the game.
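To get a feel for the size of this tree, the sketch below enumerates it completely and counts the leaves, i.e. the distinct complete games. It is an illustrative sketch only; the board encoding (a list of 9 cells) and the function names are assumptions, not part of the lecture.

```python
from typing import List, Optional

# The 8 winning lines: 3 rows, 3 columns, 2 diagonals.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: List[str]) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board: List[str], player: str) -> int:
    """Count the leaves of the search tree rooted at `board`."""
    if winner(board) is not None or " " not in board:
        return 1  # terminal state: one finished game
    other = "O" if player == "X" else "X"
    total = 0
    for i in range(9):
        if board[i] == " ":
            board[i] = player            # try the move
            total += count_games(board, other)
            board[i] = " "               # undo it
    return total


leaves = count_games([" "] * 9, "X")
print(leaves)  # 255168 complete games
```

A tree of this size is small enough to search exhaustively, which is why Tic-Tac-Toe is a convenient first example.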
Game search challenge
• Not quite the same as simple graph searching.
• There is an opponent! The opponent is malicious!
– Opponent is trying to make things good for itself, and bad for us.
– We have to simulate the opponent’s decisions.
• Key idea:
– Define a max player (who wants to maximize the utility).
– And a min player (who wants to minimize the utility).
Example: Tic-Tac-Toe
Minimax search
• Expand complete search tree, until terminal states have been
reached and their utilities computed.
• Go back up from leaves towards the current state of the game.
– At each min node: back up the worst value among the children.
– At each max node: back up the best value among the children.
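The backup rule can be sketched in a few lines of Python. This is a minimal illustration under an assumed representation: a tree is either an integer (the utility of a terminal state) or a list of subtrees, and the name `minimax` is chosen here for convenience.

```python
from typing import List, Union

# Assumed representation: a leaf is an int utility, an inner node is a list.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> int:
    """Back up utilities from the leaves to `node`."""
    if isinstance(node, int):
        return node  # terminal state: its utility is known
    # The players alternate, so the children belong to the other player.
    values = [minimax(child, not maximizing) for child in node]
    # Max node: best value for us. Min node: worst value for us.
    return max(values) if maximizing else min(values)


# Max chooses between two min nodes with utilities in {+1, 0, -1}.
print(minimax([[1, -1], [0, 1]], True))  # 0
```

In the usage line, the first min node backs up -1 and the second backs up 0, so the max player picks the second move and can guarantee a draw.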
A simple Minimax example
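[Figure: a three-level game tree, with values backed up from the leaves.]
MAX (root): value 3, with actions a1, a2, a3 leading to three MIN nodes.
MIN: backed-up values 3, 2, 2.
MAX (leaves), reached by actions a1, a2, a3 from each MIN node:
– first MIN node: 3, 12, 8
– second MIN node: 2, 14, 5
– third MIN node: 2, 6, 11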
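The backed-up values in this example can be checked by hand, using the leaf values shown (3, 12, 8; 2, 14, 5; 2, 6, 11). The node labels below are informal names introduced only for this calculation.

```latex
\begin{align*}
v(\text{MIN}_1) &= \min(3, 12, 8) = 3 \\
v(\text{MIN}_2) &= \min(2, 14, 5) = 2 \\
v(\text{MIN}_3) &= \min(2, 6, 11) = 2 \\
v(\text{root})  &= \max(3, 2, 2) = 3
\end{align*}
```

So the max player should choose action a1 at the root.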
Properties of Minimax search
• Can we use minimax to solve any game?
– Solve Tic-Tac-Toe? Yes!
– Solve chess? No.
• Why not?
– Large number of actions possible (i.e. large branching factor): b≈35.
– Path to goal may be very long (i.e. deep tree): m≈100.
– Large number of states!
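A rough estimate shows how large. Assuming a uniform tree with branching factor b and depth m (a simplification), the number of leaves is about b^m:

```latex
\[
b^m \approx 35^{100} = 10^{100 \log_{10} 35} \approx 10^{100 \times 1.544} \approx 10^{154}
\]
```

No computer can expand a tree of that size, which is why exhaustive minimax cannot solve chess.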
Coping with resource limitations
• Suppose we have 100 seconds to make a move, and we can
search 10^4 nodes per second.
– Can only search 10^6 nodes!
(Or even fewer, if we spend time deciding which nodes to search.)
• Possible approach:
– Only search to a pre-determined depth.
– Use an evaluation function for the nodes where we cut off the search.
Cutting the search effort
• Use an evaluation function to evaluate non-terminal nodes.
– Helps us make a decision without searching until the end of the game.
• Minimax cutoff algorithm:
Same as standard Minimax, except stop at some maximum
depth m and use the evaluation function on those nodes.
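A minimal sketch of the cutoff idea follows. The game used to exercise it is a made-up toy (players alternately add 1 or 2 to a running total), and `successors` and `evaluate` are hypothetical helpers chosen for illustration; only the depth test is the point.

```python
from typing import Callable, List


def minimax_cutoff(state: int, depth: int, maximizing: bool,
                   successors: Callable[[int], List[int]],
                   evaluate: Callable[[int], float]) -> float:
    """Minimax that stops `depth` moves ahead and evaluates there."""
    children = successors(state)
    if depth == 0 or not children:
        # Cutoff (or terminal state): use the evaluation function
        # instead of searching to the end of the game.
        return evaluate(state)
    values = [minimax_cutoff(c, depth - 1, not maximizing,
                             successors, evaluate) for c in children]
    return max(values) if maximizing else min(values)


# Toy game (an assumption for this sketch): each move adds 1 or 2 to the
# total, and the game stops once the total reaches 10.
def successors(total: int) -> List[int]:
    return [] if total >= 10 else [total + 1, total + 2]


# Toy evaluation: the max player prefers large totals.
def evaluate(total: int) -> float:
    return total


print(minimax_cutoff(0, 2, True, successors, evaluate))  # 3
print(minimax_cutoff(0, 3, True, successors, evaluate))  # 5
```

With depth 2, max adds 2 and min then adds 1, giving 3. One more level lets max add 2 again, giving 5.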
Evaluation functions
• An evaluation function v(s) represents the “goodness” of a board
state (e.g. chance of winning from that position).
– Similar to a utility function, but approximate.
• If the features of the board can be evaluated independently, use
a linear combination:
v(s) = f1(s) + f2(s) + … + fn(s) (where s is board state)
• This function can be given by the designer or learned from
experience.
Example: Chess
• Evaluation function: v(s) = f1(s) + f2(s)
f1(s) = w1 * [(# white queens) - (# black queens)]
f2(s) = w2 * [(# white pawns) - (# black pawns)]
[Board diagrams: "Black to move, White slightly better" and "White to move, Black winning"]
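The two-feature evaluation function above can be sketched as follows. The dictionary of piece counts is an assumed input format, and the weights w1 = 9 and w2 = 1 are conventional material values used here as an assumption; the slide does not give numeric weights.

```python
from typing import Dict


def evaluate(counts: Dict[str, int], w1: float = 9.0, w2: float = 1.0) -> float:
    """v(s) = f1(s) + f2(s), from White's point of view.

    `counts` is a hypothetical summary of the board with the keys
    "white_queens", "black_queens", "white_pawns" and "black_pawns".
    """
    f1 = w1 * (counts["white_queens"] - counts["black_queens"])
    f2 = w2 * (counts["white_pawns"] - counts["black_pawns"])
    return f1 + f2


# Equal queens, White one pawn up: slightly better for White.
print(evaluate({"white_queens": 1, "black_queens": 1,
                "white_pawns": 6, "black_pawns": 5}))   # 1.0

# White has lost the queen but is three pawns up: clearly better for Black.
print(evaluate({"white_queens": 0, "black_queens": 1,
                "white_pawns": 8, "black_pawns": 5}))   # -6.0
```

A function like this ignores everything except material, so it is only a rough approximation of the true utility of a position.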
How precise should the evaluation fn be?
• Evaluation function is only approximate, and is usually better if we are
close to the end of the game.
• Only the order of the numbers matters: payoffs in deterministic games act
as an ordinal utility function.
Minimax cutoff in Chess
• How many moves ahead can we search in Chess?
– 10^6 nodes with b=35 allows us to search about 4 moves ahead!
• Is this useful?
– 4 moves ahead ≈ novice player
– 8 moves ahead ≈ human master, typical PC
– 12 moves ahead ≈ Deep Blue, Kasparov
• Key idea:
Search few lines of play, but search them deeply. Need pruning!
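The figure of 4 moves ahead follows from a short calculation, assuming a uniform branching factor b = 35 and a budget of 10^6 nodes, so that a search to depth d costs roughly b^d nodes:

```latex
\[
35^{d} \approx 10^{6}
\quad\Longrightarrow\quad
d \approx \frac{\log_{10} 10^{6}}{\log_{10} 35} = \frac{6}{1.544} \approx 3.9,
\qquad
35^{3} = 42\,875, \quad 35^{4} = 1\,500\,625 .
\]
```

The budget covers depth 3 completely and most of depth 4, hence "about 4 moves ahead".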
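α-β Pruning example
[Figure sequence: the tree is explored left to right. MAX is at the root, with three MIN nodes below it, each with three leaves.]
• Step 1: the first MIN node sees leaves 3, 12, 8 and takes value 3. The root value is therefore ≥3.
• Step 2: the second MIN node sees its first leaf, 2, so its value is ≤2.
• Step 3: a value ≤2 can never beat the 3 already available to MAX, so the remaining two leaves of that node are pruned (X X).
• Step 4: the third MIN node sees leaf 14, so its value is ≤14.
• Step 5: it then sees leaf 5, so its value is ≤5.
• Step 6: it finally sees leaf 2, so its value is 2. The root value is 3.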
α-β Pruning
• Basic idea: if a path looks worse than what we already have,
ignore it.
– If the best move at a node cannot change (regardless of what we would find by searching) then no need to search further!
• Algorithm is like Minimax, but keeps track of best leaf value for
our player (α) and best one for the opponent (β)
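The bookkeeping for α and β can be sketched as below. This is a minimal illustration under an assumed representation (a tree is an int leaf or a list of subtrees). The tree uses the leaves from the pruning example; the two pruned leaves are not given in the slides, so the values 4 and 6 are placeholders and do not affect the result.

```python
import math
from typing import List, Optional, Union

# Assumed representation: a leaf is an int value, an inner node is a list.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool,
              alpha: float = -math.inf, beta: float = math.inf,
              visited: Optional[List[int]] = None) -> float:
    """Minimax value of `node`, skipping branches that cannot matter.

    alpha: best value found so far for the max player on this path.
    beta:  best value found so far for the min player on this path.
    """
    if isinstance(node, int):
        if visited is not None:
            visited.append(node)  # record which leaves were examined
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # min would never let play reach this node
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:
            break  # max already has a better option elsewhere
    return value


# Leaves 4 and 6 are placeholders for the pruned leaves (X X).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited: List[int] = []
print(alphabeta(tree, True, visited=visited))  # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

The second list shows that the two placeholder leaves were never examined, yet the root value is the same as plain minimax would give.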
Properties of α-β pruning
• Pruning does not affect the final result! You will not play worse than without it.
• Good move ordering is key to the effectiveness of pruning.
– With perfect ordering, explore approximately b^(m/2) nodes.
  • Means double the search depth, for same resources.
  • In chess: this is difference between novice and expert player.
– With bad move ordering, explore approximately b^m nodes.
  • Means nothing was pruned.
– Evaluation function can be used to order the nodes.
The α-β pruning demonstrates the value of reasoning about which computations
are important!
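The "double the search depth" claim can be checked with the same assumed numbers as before (b = 35 and a budget of 10^6 nodes):

```latex
\begin{align*}
\text{no pruning:} \quad & b^{m} = 10^{6}
  \;\Longrightarrow\; m = \frac{6}{\log_{10} 35} \approx 3.9 \\
\text{perfect ordering:} \quad & b^{m/2} = 10^{6}
  \;\Longrightarrow\; m = \frac{2 \times 6}{\log_{10} 35} \approx 7.8
\end{align*}
```

That is roughly 4 moves ahead versus 8 moves ahead for the same amount of computation.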
Human or computer - who is better?
Checkers:
– 1994: Chinook (U. of A.) beat world champion Marion Tinsley, ending 40-yr reign.
Othello:
– 1997: Logistello (NEC research) beat the human world champion.
– Today: world champions refuse to play AI computer program (because it’s too good).
Chess:
– 1997: Deep Blue (IBM) beat world champion Garry Kasparov.
Backgammon:
– TD-Gammon (IBM) is world champion amongst humans and computers.
Go:
– Human champions refuse to play top AI player (because it’s too weak).
Bridge:
– Still out of reach for AI players because of coordination issue.
Jeopardy!
• In Winter 2011, Watson, a computer program created by IBM,
made history by winning at Jeopardy!
– Main innovation of Watson: ability to answer questions posed in
natural language.
Jeopardy!
• How it works:
– Watson isn’t connected to the internet, but had access to 4TB of stored information (incl. all of Wikipedia).
– When given a question, it extracts keywords, looks in database for related facts, compiles list of answers, and ranks them by confidence.
– Watson is much better at buzzing in than its human opponents. So as long as it knows the answer, it has an edge.
Take-home message
• Understand the basic components (state space, start state, end
state, utility function, etc.) required to represent the types of
games discussed today.
• Know how to build the search tree.
• Understand the how and why of Minimax, Alpha-beta pruning,
and evaluation functions.
• Have some intuition for what makes certain games harder than
others.