Artificial Intelligence and its applications
Lecture 4
Game Playing
Dr. Patrick ([email protected])
South China University of Technology, China
Summary
Course topics, ordered by difficulty:
• Search: from start state to goal state
• Constraint Satisfaction Problems: consider constraints
• Game Playing: consider an adversary
• Markov Decision Processes: consider uncertainty
• Reinforcement Learning: no information is given
Agenda
Expected Value
Expected Max Algorithm
Minimax Algorithm
Alpha-beta Pruning
Simultaneous Game
Games
Intelligent opponents in games
Benchmark of intelligence
Model for many applications: Military confrontations, negotiation, auctions, …
Search Problem VS Games
Search Problem
The environment does not react to your decisions
Usually has particular goal states
Can be offline (most of our previous discussions focused on offline search)
Games
Competition with an adversary (the opponent acts according to your decisions)
May not have a goal state
Decisions must be made in limited time (approximation is needed)
Game Type
Turn-based Strategy Games: players take turns while playing
Examples: chess, board games, card games
This is the focus of this chapter
Real-Time Strategy Games: players act simultaneously in real time
Examples: TV and PC games
Covered later in the course
Simultaneous Games: players make decisions at the same time
Briefly discussed in this chapter
(Diagram: actions alternate between the player and the opponent)
Game Type
Games classified by determinism and information:
Deterministic + perfect information: Chess, Go, Checkers
Non-deterministic + perfect information: Backgammon
Deterministic + imperfect information: Battleships
Non-deterministic + imperfect information: Bridge, Poker
Recall, Search Task
Deterministic + Full Knowledge + Static
Given (s: state, s': new state, a: action):
1. S: all states
   1a. Start (initial) state
   1b. End (goal) state: the goal is well defined; Goal(s) is a goal-test function returning T/F or a score
2. Actions(s): possible actions in state s
3. Cost(s, a): cost of taking action a in state s
4. Succ(s, a) = s': transition of states (s to s') by action a
Output: Path (sequence of actions)
Which information is needed in a game?
Some games consider action costs
Some games do not have an end state
All games have an objective, but not all have a goal state
e.g. Flappy Bird, Tetris
Formulation of the Game Objective
Utility(s) / Reward(s): the preference for reaching s
Example: Utility(s_goal) > 0, Utility(s_die) < 0
Turn-based Game: Task
Given (s: state, s': new state, a: action):
1. S: all states
   1a. s_start: start (initial) state
2. IsEnd(s): whether s is an end state (game over)
3. Actions(s): possible actions from s
4. Utility(s): reward for end state s
5. Succ(s, a) = s': next state after taking action a in s
6. Players: who plays the game, e.g. Players = {agent, opp}
7. Player(s) ∈ Players: which player moves in s
Output: Path (sequence of actions)
Additional information compared with the search task: IsEnd, Utility, Players, Player(s)
Action cost is not considered explicitly
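As a minimal sketch, the ingredients above can be written out for the tiny bin-selection game that appears later in this lecture; all names here (START, is_end, actions, ...) are illustrative, not from any library:

```python
# Sketch of the turn-based formulation for the bin-selection game:
# the agent picks a bin, the opponent picks a number in it.

START = "start"                        # s_start
BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def is_end(s):                         # IsEnd(s): a chosen number ends the game
    return isinstance(s, int)

def actions(s):                        # Actions(s)
    return list(BINS) if s == START else [0, 1]

def utility(s):                        # Utility(s), defined on end states
    return s

def succ(s, a):                        # Succ(s, a)
    return a if s == START else BINS[s][a]

def player(s):                         # Player(s) in Players = {agent, opp}
    return "agent" if s == START else "opp"
```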
Turn-based Game
Example: Chess (assume you are white)
State s: the position of all pieces
IsEnd(s): checkmate or draw
Actions(s): legal chess moves
Utility(s): +1 if white wins, 0 if draw, −1 if black wins
Succ(s, a): the position after the move
Players: {white, black}
Turn-based Game
Example: Save the Penguin
State s: the positions and colors of the remaining blocks
IsEnd(s): the penguin falls out
Actions(s): select which blocks of your color (blue or white) will be taken down
Utility(s): +1 if the penguin falls out in your turn, otherwise −1
Succ(s, a): the positions and colors of the remaining blocks after the player's action
Turn-based Game
Example: Bin Selection
Goal: maximize the chosen number
You choose one of the bins
Your opponent chooses a number from your chosen bin
Which bin should you choose?
Bin A: −50, 50
Bin B: 1, 3
Bin C: −5, 15
Turn-based Game
Example: Bin Selection
The best choice depends on the goal of your opponent:
Adversarial opponent: minimizes your reward (works against you)
Helpful opponent: maximizes your reward (works together with you)
Stochastic opponent: yields an expected value (chooses a value at random; humans may behave like this?)
Bins: A = {−50, 50}, B = {1, 3}, C = {−5, 15}
You (Max) vs. Opponent (Min): the bin values are −50, 1, −5
You (Max) vs. Opponent (Max): the bin values are 50, 3, 15
You (Max) vs. Opponent (Avg, assumed uniformly random): the bin values are 0, 2, 5
Turn-based Game
Node Type
SearchNode: a state (all nodes are the same)
GameNode: a state and the policy π of a player
Min Node (downward-pointing triangle): deterministic policy π(s, a);
π(s, a) = 1 if a yields the minimum reward, 0 for any other a
Max Node (upward-pointing triangle): deterministic policy π(s, a);
π(s, a) = 1 if a yields the maximum reward, 0 for any other a
Chance Node (circle): stochastic policy π(s, a) ∈ [0, 1],
the probability of taking action a in state s
Turn-based Game: Node Type
Example: Bin Selection
Goal: maximize the chosen number
You choose one of the bins; your opponent chooses a number from your chosen bin
Two players, two policies (π_agent and π_opp); the agent means you!
The graph represents the decision flow, not the policy:
the root (π_agent) branches to bins A, B, C; each bin node (π_opp) branches
to its two utilities: A → {−50, 50}, B → {1, 3}, C → {−5, 15}
Turn-based Game: Node Type
Example: Bin Selection
Bins: A = {−50, 50}, B = {1, 3}, C = {−5, 15}; you are the Max node in each case.
Opponent (Min): the bin values are −50, 1, −5
Opponent (Max): the bin values are 50, 3, 15
Opponent (Avg), assuming a random policy with π(s, a) = 0.5 for all s and a:
the bin values are 0, 2, 5
Turn-based Game: Node Type
Example
Which action will an agent choose in minimax?
(Tree diagram: the three min-node values work out to 2, 2, and 6, so the
agent chooses the third action and the root value is 6.)
Turn-based Game
Value Function
V(s) denotes the value (utility) of state s
Last (end) state: the value is given
Other states: the value is calculated according to the policy
Example (bin selection): the leaf utilities −50, 50, 1, 3, −5, 15 under
bins A, B, C are given; the values of the other states must be calculated.
Turn-based Game
Value Function
Value calculation, illustrated on a single bin (e.g. bin A with successors s_(A,L) and s_(A,R)):

Max node:    V(s) = max_{a ∈ Actions(s)} V(Succ(s, a))
Min node:    V(s) = min_{a ∈ Actions(s)} V(Succ(s, a))
Chance node: V(s) = Σ_{a ∈ Actions(s)} π(s, a) · V(Succ(s, a))
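The three recurrences can be sketched in a few lines for the bin-selection example: each bin behaves as a min, max, or chance node depending on the opponent model, and the root is always a max node for the agent (names are illustrative):

```python
# Value function over one-level bins under three opponent models.

bins = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def node_value(values, opponent):
    if opponent == "min":                 # adversarial: min over actions
        return min(values)
    if opponent == "max":                 # helpful: max over actions
        return max(values)
    # chance node with uniform policy pi(s, a) = 1 / |Actions(s)|
    return sum(values) / len(values)

def root_value(opponent):
    # Max node: the agent picks the best bin under the opponent model.
    return max(node_value(v, opponent) for v in bins.values())

print(root_value("min"), root_value("max"), root_value("avg"))
# 1 (pick B), 50 (pick A), 5.0 (pick C)
```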
Turn-based Game
Value Function: Example
The agent (max node M) chooses bin A, B, or C; the opponent (min node O)
then chooses left (L) or right (R) within that bin.
Leaves: A → {−50, 50}, B → {1, 3}, C → {−5, 15}

V(M) = max_{a ∈ Actions(M)} V(Succ(M, a))
     = max( V(Succ(M, A)), V(Succ(M, B)), V(Succ(M, C)) )
     = max( V(A), V(B), V(C) )

For each bin, e.g. A:
V(A) = min_{a ∈ Actions(A)} V(Succ(A, a))
     = min( V(Succ(A, L)), V(Succ(A, R)) )
     = min( V(s_(A,L)), V(s_(A,R)) )
     = min(−50, 50) = −50

Similarly V(B) = min(1, 3) = 1 and V(C) = min(−5, 15) = −5,
so V(M) = max(−50, 1, −5) = 1.
Turn-based Game: Value Function
Exercise
Calculate the values of the following games:

Game 1: the root S1 is a chance node with probabilities 1/3, 1/3, 1/3 over
chance nodes S2, S3, S4; each of those picks one of its two leaves with
probability 0.5, where S2 = {−50, 50}, S3 = {1, 3}, S4 = {−5, 15}.

Game 2: the root S1 branches over nodes S2, S3, S4 with leaves {−10, 20},
{10, 30}, {5, 15} and the mixed node types and probabilities (0.5, 0.3,
0.6, 0.2, 0.3) shown on the slide.
Turn-based Game: Value Function
Exercise
Game 1 answer: the three chance nodes S2, S3, S4 have the values 0, 2, 5
(Opponent Avg); the root (Avg, probability 1/3 each) has value V(S1) = 7/3.
Game 2 answer: the inner nodes have the values 20, 10, 10.5 shown on the
slide (mixed node types), and the root value given there is 43.5.
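Game 1 can be checked in a few lines, assuming every node is a chance node with the stated uniform probabilities:

```python
def avg(values):
    # Chance node with uniform probabilities over its children.
    return sum(values) / len(values)

s2 = avg([-50, 50])            # 0.0
s3 = avg([1, 3])               # 2.0
s4 = avg([-5, 15])             # 5.0
v_s1 = avg([s2, s3, s4])       # (0 + 2 + 5) / 3 = 7/3
print(s2, s3, s4, v_s1)
```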
Turn-based Game
Example: Bin Selection 2
A New Game: Rules
Goal: maximize the chosen number
1. You choose one of the three bins
2. Then flip a coin: if heads, move one bin to the left (with wrap-around);
   if tails, stick with your choice
3. Your opponent chooses a number from that bin
Bin A: −50, 50
Bin B: 1, 3
Bin C: −5, 15
Turn-based Game
Example: Bin Selection 2
Three parties: Players = {agent, coin, opp}

V(s) = Utility(s)                                     if IsEnd(s)
     = max_{a ∈ Actions(s)} V(Succ(s, a))             if Player(s) = agent
     = min_{a ∈ Actions(s)} V(Succ(s, a))             if Player(s) = opp
     = Σ_{a ∈ Actions(s)} π_coin(s, a) V(Succ(s, a))  if Player(s) = coin
Turn-based Game
Example: Bin Selection 2
Game tree: you (a max node) choose bin A, B, or C; each choice leads to a
chance node with probability 1/2 heads (move one bin to the left, with
wrap-around) and 1/2 tails (stay); the opponent then takes the minimum of
the resulting bin. Your goal is to maximize the chosen number.
Bins: A = {−50, 50}, B = {1, 3}, C = {−5, 15}
Turn-based Game
Example: Bin Selection 2
V(s) = max( E(min(−50, 50), min(−5, 15)),    (choose A: stay at A, or move to C)
            E(min(1, 3), min(−50, 50)),      (choose B: stay at B, or move to A)
            E(min(−5, 15), min(1, 3)) )      (choose C: stay at C, or move to B)
     = max( E(−50, −5), E(1, −50), E(−5, 1) )
     = max( −27.5, −24.5, −2 )
     = −2
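The same computation can be sketched in code: you pick a bin, a fair coin keeps it (tails) or moves one bin left with wrap-around (heads), and an adversarial opponent then takes the minimum of the final bin (names are illustrative):

```python
# Expectiminimax sketch for Bin Selection 2.

bins = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
left = {"A": "C", "B": "A", "C": "B"}     # one bin to the left, wrapping

def value_of_choice(b):
    # Chance node: 1/2 stay on b, 1/2 move left; each outcome is a min node.
    return 0.5 * min(bins[b]) + 0.5 * min(bins[left[b]])

v = max(value_of_choice(b) for b in bins)
print([value_of_choice(b) for b in bins], v)   # [-27.5, -24.5, -2.0] -2.0
```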
Turn-based Game
Minimax
Zero-sum game:
One player's gain is equivalent to another's loss
Players do not collaborate; they compete with each other
All opponents aim to minimize your utility
This problem is called Minimax, MinMax, MM, or the saddle point
The term "minimax" is used in this course
Turn-based Game
Minimax
Minimax assumes the opponent selects the worst action for the agent
Example for two players (agent and opp alternate: agent, opp, agent, ...):

V_max,min(s) = Utility(s)                                   if IsEnd(s)
             = max_{a ∈ Actions(s)} V_max,min(Succ(s, a))   if Player(s) = agent
             = min_{a ∈ Actions(s)} V_max,min(Succ(s, a))   if Player(s) = opp
Turn-based Game: Minimax
Example
Which action will an agent choose in minimax? What is the value at s_start?
s_start (max) branches by actions L, M, R to three min nodes with leaves
{−50, 50}, {1, 3}, {−5, 15}; the min values are −50, 1, −5.
Taking the max over the min values: Action = M, V(s_start) = 1
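The example can be worked with a generic minimax sketch over nested lists (numbers are leaf utilities; inner lists alternate between max and min levels; the tree shape is the slide's example):

```python
# Plain minimax over nested-list trees.

def minimax(node, maximizing):
    if not isinstance(node, list):         # leaf: utility is given
        return node
    children = [minimax(c, not maximizing) for c in node]
    return max(children) if maximizing else min(children)

tree = [[-50, 50], [1, 3], [-5, 15]]       # s_start -> actions L, M, R
print([minimax(c, False) for c in tree])   # min values: [-50, 1, -5]
print(minimax(tree, True))                 # root value: 1
```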
Turn-based Game: Minimax
Exercise
Which action will an agent choose in minimax? What is the value at s_start?
(Tree diagram with the leaf utilities shown on the slide; working the min
and max levels upward gives V(s_start) = 15.)
Turn-based Game
Time Complexity
Once a game is modeled as a tree, search techniques can be used
Even for a simple game like Tic-Tac-Toe, the tree is very complicated; how about Go?
The path to a utility is long, so the time and space complexity is large in practice
(Full Tic-Tac-Toe game tree: https://commons.wikimedia.org/wiki/File:Tic-tac-toe-full-game-tree-x-rational.jpg)
Turn-based Game
Time Complexity
Complexity, where b is the branching factor (width) and d is the depth
(here, moves per player, so the tree has 2d plies):
Space: O(d)
Time: O(b^(2d))
Example: Chess, b ≈ 35, d ≈ 50
Time complexity ≈ 35^100 ≈ 2.55 × 10^154
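The chess estimate can be reproduced directly, assuming the slide's b = 35, d = 50 and counting b^(2d) nodes:

```python
b, d = 35, 50
nodes = b ** (2 * d)               # O(b^(2d)): ~35^100 nodes for chess
print(len(str(nodes)), "digits")   # a 155-digit number, about 2.55e154
```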
Turn-based Game
Advanced Methods
How to speed up the process?
Evaluation Functions
Does not access the TRUE utility, but approximates it
Stops earlier
Requires domain-specific knowledge
Alpha-beta Pruning
Computes the TRUE utility
Ignores unnecessary paths
General-purpose
Turn-based Game: Advanced Method
Evaluation Function
Original approach: the tree from s_start down to s_end is very tall, and the
utility (e.g. Utility = 1 for a win) is only known at the end state.
With evaluation functions: stop at a maximum depth d_max and evaluate the
state s reached there, even though its true utility is unknown.
Turn-based Game: Advanced Method
Evaluation Function
Limited-depth tree search (stop at maximum depth d_max)
Eval(s) estimates the value V(s) at depth d_max (it may be very inaccurate)

V(s, d) = Utility(s)                                 if IsEnd(s)
        = Eval(s)                                    if d = 0
        = max_{a ∈ Actions(s)} V(Succ(s, a), d − 1)  if Player(s) = agent
        = min_{a ∈ Actions(s)} V(Succ(s, a), d − 1)  if Player(s) = opp
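A depth-limited minimax can be sketched as follows: recurse down to depth 0, then fall back on Eval(s) instead of the true utility. The tiny tree and the stored "est" heuristic values are illustrative assumptions:

```python
# Depth-limited minimax with an evaluation-function fallback.

def limited_minimax(node, depth, maximizing):
    if "utility" in node:                  # IsEnd(s): true utility known
        return node["utility"]
    if depth == 0:                         # depth limit reached: approximate
        return node["est"]                 # Eval(s)
    children = [limited_minimax(c, depth - 1, not maximizing)
                for c in node["children"]]
    return max(children) if maximizing else min(children)

leaf = lambda u: {"utility": u}
inner = lambda est, *cs: {"est": est, "children": list(cs)}

tree = inner(0,
             inner(4, leaf(3), leaf(7)),   # Eval says 4; true min is 3
             inner(6, leaf(5), leaf(9)))   # Eval says 6; true min is 5

print(limited_minimax(tree, 1, True))      # depth 1 uses Eval: max(4, 6) = 6
print(limited_minimax(tree, 2, True))      # depth 2 reaches leaves: max(3, 5) = 5
```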
Turn-based Game: Advanced Method: Evaluation Function
Example
Example: Chess
Eval(s) = material + mobility + king-safety + center-control
Material: 10100(K − K') + 9(Q − Q') + 5(R − R') + 3(B − B') + 3(N − N') + 1(P − P')
where K: king, Q: queen, R: rook, B: bishop, N: knight, P: pawn,
and A − A' is the difference in A due to the move
Mobility: 0.1 × (number of legal moves − opponent's number of legal moves)
King-safety: keeping the king safe is good
Center-control: controlling the center of the board is good
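As a hedged sketch of the material term only, the weighted piece-count differences with the slide's weights look like this (the board counts below are made up):

```python
# Material component of a chess evaluation function.

WEIGHTS = {"K": 10100, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material(mine, theirs):
    # mine / theirs map piece letters to counts for each side.
    return sum(w * (mine.get(p, 0) - theirs.get(p, 0))
               for p, w in WEIGHTS.items())

# Example: up a rook, down a pawn -> 5*1 + 1*(-1) = 4
mine   = {"K": 1, "Q": 1, "R": 2, "B": 2, "N": 2, "P": 7}
theirs = {"K": 1, "Q": 1, "R": 1, "B": 2, "N": 2, "P": 8}
print(material(mine, theirs))   # 4
```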
Turn-based Game: Advanced Method
Alpha-beta Pruning
In some cases, visiting some branches is unnecessary in the minimax algorithm.
Example: after evaluating V(Succ(S3, L)), should V(Succ(S3, R)) be evaluated?
Consider S1 (a max node with children S2 = min(5, 3) and S3):
V(S1) = max(V(Succ(S1, L)), V(Succ(S1, R))), and V(S2) = 3, therefore V(S1) ≥ 3
Consider S3 (a min node):
V(S3) = min(V(Succ(S3, L)), V(Succ(S3, R))), and V(Succ(S3, L)) = 2, therefore V(S3) ≤ 2
Since V(S3) ≤ 2 < 3, there is no need to further investigate V(Succ(S3, R))
Turn-based Game: Advanced Method
Alpha-beta Pruning
Prune a node if its value v is not in the interval bounded by α and β
(i.e. prune if v ∉ (α, β)):
α = max over the max-node ancestors s of a_s (the greatest lower bound),
where a_s is a lower bound on the value of max node s
β = min over the min-node ancestors s of b_s (the least upper bound),
where b_s is an upper bound on the value of min node s
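The standard algorithm can be sketched over nested-list trees (numbers are leaves, lists alternate max/min); it returns the same value as plain minimax while skipping branches whose value cannot fall inside (α, β):

```python
# Alpha-beta pruning over nested-list trees.

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):        # leaf
        return node
    if maximizing:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)         # tighten the lower bound
            if alpha >= beta:             # cut-off: interval is empty
                break
        return v
    v = float("inf")
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        beta = min(beta, v)               # tighten the upper bound
        if alpha >= beta:
            break
    return v

# The motivating example: S1 = max, S2 = min(5, 3), S3 = min(2, ...).
tree = [[5, 3], [2, 99]]                  # the 99 leaf is never visited
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3
```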
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 1
Example: the last node can be pruned.
On the path to that node (with the slide's values 2, 4, 6, 8), the min
ancestors give the upper bounds ≤ 8 and ≤ 4, and the max ancestors give the
lower bounds ≥ 5 and ≥ 6.
So α = max(5, 6) = 6 (the greatest lower bound) and β = min(8, 4) = 4 (the
least upper bound). The node's value is not inside (α, β), so the last node
can be pruned.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 2
(Tree with the leaf values 3, 4, 7, 9, 6, 8, 5, 10 shown on the slide; the
root minimax value is 7.)
Walkthrough, following panels (a) to (e) on the slide:
(a) Still need to check the remaining branches, as there is no bound yet.
(b) All branches of the first subtree are checked; return 7 to its parent.
(c) No need to check this branch, as its value cannot be bigger than 6.
(d) Still need to check the rest, as the value may be equal to 8.
(e) No need to check the remaining branches, as the value cannot be bigger than 5.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 2 (continued)
The slide repeats the tree, annotating the bounds step by step: the first
subtree evaluates to 7, so the root is ≥ 7; the second subtree is bounded
≤ 6 and is pruned; in the third subtree, a child bounded ≤ 8 must still be
checked, but once a later child is bounded ≤ 5 the rest is pruned.
The root minimax value is 7.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 3
(Tree containing chance nodes with probability 1/2 on each branch; the leaf
values and the step-by-step bounds are shown on the slide, and the root
value is 9.)
Key point: no pruning can be done on a chance node.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Exercise
Use the alpha-beta pruning method to find V(s_start).
(Same tree as the earlier minimax exercise; propagating the bounds shown on
the slide gives V(s_start) = 15.)
Simultaneous Game
In a simultaneous game, players take actions at the same time
For example: Two-finger Morra
Rules
Players A and B each show 1 or 2 fingers.
If both show 1, B gives A 2 dollars.
If both show 2, B gives A 4 dollars.
Otherwise, A gives B 3 dollars
Simultaneous Game
Two-finger Morra
Goal (for A): maximize the dollars
If both show 1, B gives A 2 dollars.
If both show 2, B gives A 4 dollars.
Otherwise, A gives B 3 dollars.
Value V(a, b), where a, b ∈ Actions = {1 finger, 2 fingers}
Payoff matrix for A (rows: A's action, columns: B's action):
          B: 1    B: 2
A: 1        2      −3
A: 2       −3       4
Simultaneous Game
Expected Value
Value of the game if A and B use the policies π_A and π_B:
V(π_A, π_B) = Σ_{a,b ∈ Actions} π_A(a) π_B(b) V(a, b)
where π_A(a) is the probability that A takes action a
Example: π_A = [1, 0], π_B = [1/2, 1/2]
V(π_A, π_B) = (1 × 1/2 × 2) + (1 × 1/2 × −3) + (0 × 1/2 × −3) + (0 × 1/2 × 4)
            = −1/2
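The expected-value formula can be evaluated directly for Two-finger Morra with the example policies π_A = [1, 0] and π_B = [1/2, 1/2]:

```python
# V(pi_A, pi_B) = sum over a, b of pi_A(a) * pi_B(b) * V(a, b).

V = [[2, -3],     # A shows 1 finger: payoff vs B showing 1 or 2
     [-3, 4]]     # A shows 2 fingers

def game_value(pi_a, pi_b):
    return sum(pi_a[a] * pi_b[b] * V[a][b]
               for a in range(2) for b in range(2))

print(game_value([1, 0], [0.5, 0.5]))   # -0.5
```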
Simultaneous Game: Exercise
Prisoner’s Dilemma
Two members of a criminal gang are arrested
Each prisoner is in solitary confinement
no communication
Prosecutors lack sufficient evidence
Offer each prisoner a bargain
Betray: betray the other by testifying that the other committed the crime
Remain Silent: say nothing
Simultaneous Game: Exercise
Prisoner’s Dilemma
Four outcomes
If A and B each betray the other, each of them serves two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years in prison (and vice versa)
If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge).
What should you do if you are one of the prisoners?
Simultaneous Game: Exercise
Prisoner’s Dilemma
If A and B each betray the other, each of them serves two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years in prison (and vice versa)
If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge)
Payoff matrix (V(A), V(B)); rows are A's action, columns are B's action:

                       B: Stay Silent    B: Betray
A: Stay Silent (S)       (−1, −1)         (−3, 0)
A: Betray (B)            (0, −3)          (−2, −2)
Simultaneous Game: Exercise
Prisoner’s Dilemma
Assume you are A, and the policy of B is π_B = [0.3, 0.7]
(probability 0.3 of staying silent, 0.7 of betraying).

V(A) = V(π_A, π_B) = Σ_{a,b ∈ Actions} π_A(a) π_B(b) V(a, b)
     = (π_A(S) × (0.3 × −1 + 0.7 × −3)) + (π_A(B) × (0.3 × 0 + 0.7 × −2))
     = (−2.4) × π_A(S) + (−1.4) × π_A(B)
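The two coefficients can be checked in code: with π_B = [0.3, 0.7], A's expected value is linear in π_A, with one coefficient per pure action:

```python
# Coefficients of A's expected value, one per pure action S / B.

V_A = {("S", "S"): -1, ("S", "B"): -3,   # A stays silent vs B's action
       ("B", "S"):  0, ("B", "B"): -2}   # A betrays vs B's action
pi_B = {"S": 0.3, "B": 0.7}

coeff = {a: sum(pi_B[b] * V_A[(a, b)] for b in "SB") for a in "SB"}
print(coeff)   # about -2.4 for "stay silent" and -1.4 for "betray"
```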
Simultaneous Game: Exercise
Prisoner’s Dilemma
The coefficient of "betray" (−1.4) is larger than that of "stay silent" (−2.4)
Is "betray" better than "stay silent"?
Yes, if the game is played only once (a one-shot game)
How about playing several times?
V(A) = (-2.4) x πA(S) + (-1.4) x πA(B)
Simultaneous Game: Exercise
Prisoner’s Dilemma
When the game is played many times, the smartest strategy is to make a
decision that is good for all players.
Considering A and B as a group:
Both "betray" is the worst outcome (−2 + −2 = −4)
Both "stay silent" is the best outcome (−1 + −1 = −2)
Summary
Course topics, ordered by difficulty:
• Search: from start state to goal state
• Constraint Satisfaction Problems: consider constraints
• Game Playing: consider an adversary
• Markov Decision Processes: consider uncertainty
• Reinforcement Learning: no information is given