Artificial Intelligence and its applications
Lecture 4
Game Playing
Dr. Patrick ([email protected])
South China University of Technology, China
Summary
Course topics, ordered by difficulty:
• Search: from start state to goal state
• Constraint Satisfaction Problems: consider constraints
• Game Playing: consider an adversary
• Markov Decision Processes: consider uncertainty
• Reinforcement Learning: no information is given
Agenda
Expected Value
Expected Max Algorithm
Minimax Algorithm
Alpha-beta Pruning
Simultaneous Game
Games
Intelligent opponents in games
Benchmark of intelligence
Model for many applications: Military confrontations, negotiation, auctions, …
Search Problem VS Games
Search Problem
The environment does not react to your decisions
Usually has particular goal states
Can be offline (most of our previous discussions focused on offline search)
Games
Competition with an adversary (the opponent acts according to your decisions)
May not have a goal state
Decisions must be made in limited time (approximation is needed)
Game Type
Turn-based Strategy Games: players take turns while playing
Examples: chess, board games, card games
This is the focus of this chapter
Real-Time Strategy Games: players act simultaneously in real time
Examples: TV and PC games
Covered later in the course
Simultaneous Games: players make decisions at the same time
Briefly discussed in this chapter
(Diagram: actions alternate between the player and the opponent)
Game Type
Games classified by determinism and information:
Deterministic + perfect information: Chess, Go, Checkers
Non-deterministic + perfect information: Backgammon
Deterministic + imperfect information: Battleships
Non-deterministic + imperfect information: Bridge, Poker
Recall, Search Task
Deterministic + Full Knowledge + Static
Given (s: state, s': new state, a: action):
1. S: all states
   1a. Start (initial) state
   1b. End (goal) state: the goal is well defined; Goal(s) is a goal-test function returning T/F or a score
2. Actions(s): possible actions in state s
3. Cost(s, a): cost of taking action a in state s
4. Succ(s, a) = s': transition of states (s to s') by action a
Output: Path (sequence of actions)
Which information is needed in a game?
Some games consider action costs
Some games do not have an end state
All games have an objective, but not all have a goal state
e.g. Flappy Bird, Tetris
Formulation of the Game Objective
Utility(s) / Reward(s): the preference for reaching s
Example: Utility(s_goal) > 0, Utility(s_die) < 0
Turn-based Game: Task
Given (s: state, s': new state, a: action):
1. S: all states
   1a. s_start: start (initial) state
2. IsEnd(s): whether s is an end state (game over)
3. Actions(s): possible actions from s
4. Utility(s): reward for end state s
5. Succ(s, a) = s': next state after taking action a in s
6. Players: who plays the game, e.g. Players = {agent, opp}
7. Player(s) ∈ Players: which player moves in s
Output: Path (sequence of actions)
Additional information compared with the search task: IsEnd, Utility, Players, Player(s)
Action cost is not considered explicitly
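As a minimal sketch, the ingredients above can be written out for the tiny bin-selection game that appears later in this lecture; all names here (START, is_end, actions, ...) are illustrative, not from any library:

```python
# Sketch of the turn-based formulation for the bin-selection game:
# the agent picks a bin, the opponent picks a number in it.

START = "start"                        # s_start
BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def is_end(s):                         # IsEnd(s): a chosen number ends the game
    return isinstance(s, int)

def actions(s):                        # Actions(s)
    return list(BINS) if s == START else [0, 1]

def utility(s):                        # Utility(s), defined on end states
    return s

def succ(s, a):                        # Succ(s, a)
    return a if s == START else BINS[s][a]

def player(s):                         # Player(s) in Players = {agent, opp}
    return "agent" if s == START else "opp"
```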
Turn-based Game
Example: Chess (assume you are white)
State s: the position of all pieces
IsEnd(s): checkmate or draw
Actions(s): legal chess moves
Utility(s): +1 if white wins, 0 if draw, −1 if black wins
Succ(s, a): the position after the move
Players: {white, black}
Turn-based Game
Example: Save the Penguin
State s: the positions and colors of the remaining blocks
IsEnd(s): the penguin falls out
Actions(s): select which blocks of your color (blue or white) will be taken down
Utility(s): +1 if the penguin falls out in your turn, otherwise −1
Succ(s, a): the positions and colors of the remaining blocks after the player's action
Turn-based Game
Example: Bin Selection
Goal: maximize the chosen number
You choose one of the bins
Your opponent chooses a number from your chosen bin
Which bin should you choose?
Bin A: −50, 50
Bin B: 1, 3
Bin C: −5, 15
Turn-based Game
Example: Bin Selection
The best choice depends on the goal of your opponent:
Adversarial opponent: minimizes your reward (works against you)
Helpful opponent: maximizes your reward (works together with you)
Stochastic opponent: yields an expected value (chooses a value at random; humans may behave like this?)
Bins: A = {−50, 50}, B = {1, 3}, C = {−5, 15}
You (Max) vs. Opponent (Min): the bin values are −50, 1, −5
You (Max) vs. Opponent (Max): the bin values are 50, 3, 15
You (Max) vs. Opponent (Avg, assumed uniformly random): the bin values are 0, 2, 5
Turn-based Game
Node Type
SearchNode: a state (all nodes are the same)
GameNode: a state and the policy π of a player
Min Node (downward-pointing triangle): deterministic policy π(s, a);
π(s, a) = 1 if a yields the minimum reward, 0 for any other a
Max Node (upward-pointing triangle): deterministic policy π(s, a);
π(s, a) = 1 if a yields the maximum reward, 0 for any other a
Chance Node (circle): stochastic policy π(s, a) ∈ [0, 1],
the probability of taking action a in state s
Turn-based Game: Node Type
Example: Bin Selection
Goal: maximize the chosen number
You choose one of the bins; your opponent chooses a number from your chosen bin
Two players, two policies (π_agent and π_opp); the agent means you!
The graph represents the decision flow, not the policy:
the root (π_agent) branches to bins A, B, C; each bin node (π_opp) branches
to its two utilities: A → {−50, 50}, B → {1, 3}, C → {−5, 15}
Turn-based Game: Node Type
Example: Bin Selection
Bins: A = {−50, 50}, B = {1, 3}, C = {−5, 15}; you are the Max node in each case.
Opponent (Min): the bin values are −50, 1, −5
Opponent (Max): the bin values are 50, 3, 15
Opponent (Avg), assuming a random policy with π(s, a) = 0.5 for all s and a:
the bin values are 0, 2, 5
Turn-based Game: Node Type
Example
Which action will an agent choose in minimax?
(Tree diagram: the three min-node values work out to 2, 2, and 6, so the
agent chooses the third action and the root value is 6.)
Turn-based Game
Value Function
V(s) denotes the value (utility) of state s
Last (end) state: the value is given
Other states: the value is calculated according to the policy
Example (bin selection): the leaf utilities −50, 50, 1, 3, −5, 15 under
bins A, B, C are given; the values of the other states must be calculated.
Turn-based Game
Value Function
Value calculation, illustrated on a single bin (e.g. bin A with successors s_(A,L) and s_(A,R)):

Max node:    V(s) = max_{a ∈ Actions(s)} V(Succ(s, a))
Min node:    V(s) = min_{a ∈ Actions(s)} V(Succ(s, a))
Chance node: V(s) = Σ_{a ∈ Actions(s)} π(s, a) · V(Succ(s, a))
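The three recurrences can be sketched in a few lines for the bin-selection example: each bin behaves as a min, max, or chance node depending on the opponent model, and the root is always a max node for the agent (names are illustrative):

```python
# Value function over one-level bins under three opponent models.

bins = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

def node_value(values, opponent):
    if opponent == "min":                 # adversarial: min over actions
        return min(values)
    if opponent == "max":                 # helpful: max over actions
        return max(values)
    # chance node with uniform policy pi(s, a) = 1 / |Actions(s)|
    return sum(values) / len(values)

def root_value(opponent):
    # Max node: the agent picks the best bin under the opponent model.
    return max(node_value(v, opponent) for v in bins.values())

print(root_value("min"), root_value("max"), root_value("avg"))
# 1 (pick B), 50 (pick A), 5.0 (pick C)
```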
Turn-based Game
Value Function: Example
The agent (max node M) chooses bin A, B, or C; the opponent (min node O)
then chooses left (L) or right (R) within that bin.
Leaves: A → {−50, 50}, B → {1, 3}, C → {−5, 15}

V(M) = max_{a ∈ Actions(M)} V(Succ(M, a))
     = max( V(Succ(M, A)), V(Succ(M, B)), V(Succ(M, C)) )
     = max( V(A), V(B), V(C) )

For each bin, e.g. A:
V(A) = min_{a ∈ Actions(A)} V(Succ(A, a))
     = min( V(Succ(A, L)), V(Succ(A, R)) )
     = min( V(s_(A,L)), V(s_(A,R)) )
     = min(−50, 50) = −50

Similarly V(B) = min(1, 3) = 1 and V(C) = min(−5, 15) = −5,
so V(M) = max(−50, 1, −5) = 1.
Turn-based Game: Value Function
Exercise
Calculate the values of the following games:

Game 1: the root S1 is a chance node with probabilities 1/3, 1/3, 1/3 over
chance nodes S2, S3, S4; each of those picks one of its two leaves with
probability 0.5, where S2 = {−50, 50}, S3 = {1, 3}, S4 = {−5, 15}.

Game 2: the root S1 branches over nodes S2, S3, S4 with leaves {−10, 20},
{10, 30}, {5, 15} and the mixed node types and probabilities (0.5, 0.3,
0.6, 0.2, 0.3) shown on the slide.
Turn-based Game: Value Function
Exercise
Game 1 answer: the three chance nodes S2, S3, S4 have the values 0, 2, 5
(Opponent Avg); the root (Avg, probability 1/3 each) has value V(S1) = 7/3.
Game 2 answer: the inner nodes have the values 20, 10, 10.5 shown on the
slide (mixed node types), and the root value given there is 43.5.
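Game 1 can be checked in a few lines, assuming every node is a chance node with the stated uniform probabilities:

```python
def avg(values):
    # Chance node with uniform probabilities over its children.
    return sum(values) / len(values)

s2 = avg([-50, 50])            # 0.0
s3 = avg([1, 3])               # 2.0
s4 = avg([-5, 15])             # 5.0
v_s1 = avg([s2, s3, s4])       # (0 + 2 + 5) / 3 = 7/3
print(s2, s3, s4, v_s1)
```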
Turn-based Game
Example: Bin Selection 2
A New Game: Rules
Goal: maximize the chosen number
1. You choose one of the three bins
2. Then flip a coin: if heads, move one bin to the left (with wrap-around);
   if tails, stick with your choice
3. Your opponent chooses a number from that bin
Bin A: −50, 50
Bin B: 1, 3
Bin C: −5, 15
Turn-based Game
Example: Bin Selection 2
Three parties: Players = {agent, coin, opp}

V(s) = Utility(s)                                     if IsEnd(s)
     = max_{a ∈ Actions(s)} V(Succ(s, a))             if Player(s) = agent
     = min_{a ∈ Actions(s)} V(Succ(s, a))             if Player(s) = opp
     = Σ_{a ∈ Actions(s)} π_coin(s, a) V(Succ(s, a))  if Player(s) = coin
Turn-based Game
Example: Bin Selection 2
Game tree: you (a max node) choose bin A, B, or C; each choice leads to a
chance node with probability 1/2 heads (move one bin to the left, with
wrap-around) and 1/2 tails (stay); the opponent then takes the minimum of
the resulting bin. Your goal is to maximize the chosen number.
Bins: A = {−50, 50}, B = {1, 3}, C = {−5, 15}
Turn-based Game
Example: Bin Selection 2
V(s) = max( E(min(−50, 50), min(−5, 15)),    (choose A: stay at A, or move to C)
            E(min(1, 3), min(−50, 50)),      (choose B: stay at B, or move to A)
            E(min(−5, 15), min(1, 3)) )      (choose C: stay at C, or move to B)
     = max( E(−50, −5), E(1, −50), E(−5, 1) )
     = max( −27.5, −24.5, −2 )
     = −2
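The same computation can be sketched in code: you pick a bin, a fair coin keeps it (tails) or moves one bin left with wrap-around (heads), and an adversarial opponent then takes the minimum of the final bin (names are illustrative):

```python
# Expectiminimax sketch for Bin Selection 2.

bins = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
left = {"A": "C", "B": "A", "C": "B"}     # one bin to the left, wrapping

def value_of_choice(b):
    # Chance node: 1/2 stay on b, 1/2 move left; each outcome is a min node.
    return 0.5 * min(bins[b]) + 0.5 * min(bins[left[b]])

v = max(value_of_choice(b) for b in bins)
print([value_of_choice(b) for b in bins], v)   # [-27.5, -24.5, -2.0] -2.0
```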
Turn-based Game
Minimax
Zero-sum game:
One player's gain is equivalent to another's loss
Players do not collaborate; they compete with each other
All opponents aim to minimize your utility
This problem is called Minimax, MinMax, MM, or the saddle point
The term "minimax" is used in this course
Turn-based Game
Minimax
Minimax assumes the opponent selects the worst action for the agent
Example for two players (agent and opp alternate: agent, opp, agent, ...):

V_max,min(s) = Utility(s)                                   if IsEnd(s)
             = max_{a ∈ Actions(s)} V_max,min(Succ(s, a))   if Player(s) = agent
             = min_{a ∈ Actions(s)} V_max,min(Succ(s, a))   if Player(s) = opp
Turn-based Game: Minimax
Example
Which action will an agent choose in minimax? What is the value at s_start?
s_start (max) branches by actions L, M, R to three min nodes with leaves
{−50, 50}, {1, 3}, {−5, 15}; the min values are −50, 1, −5.
Taking the max over the min values: Action = M, V(s_start) = 1
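The example can be worked with a generic minimax sketch over nested lists (numbers are leaf utilities; inner lists alternate between max and min levels; the tree shape is the slide's example):

```python
# Plain minimax over nested-list trees.

def minimax(node, maximizing):
    if not isinstance(node, list):         # leaf: utility is given
        return node
    children = [minimax(c, not maximizing) for c in node]
    return max(children) if maximizing else min(children)

tree = [[-50, 50], [1, 3], [-5, 15]]       # s_start -> actions L, M, R
print([minimax(c, False) for c in tree])   # min values: [-50, 1, -5]
print(minimax(tree, True))                 # root value: 1
```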
Turn-based Game: Minimax
Exercise
Which action will an agent choose in minimax? What is the value at s_start?
(Tree diagram with the leaf utilities shown on the slide; working the min
and max levels upward gives V(s_start) = 15.)
Turn-based Game
Time Complexity
Once a game is modeled as a tree, search techniques can be used
Even for a simple game like Tic-Tac-Toe, the tree is very complicated; how about Go?
The path to a utility is long, so the time and space complexity is large in practice
(Full Tic-Tac-Toe game tree: https://commons.wikimedia.org/wiki/File:Tic-tac-toe-full-game-tree-x-rational.jpg)
Turn-based Game
Time Complexity
Complexity, where b is the branching factor (width) and d is the depth
(here, moves per player, so the tree has 2d plies):
Space: O(d)
Time: O(b^(2d))
Example: Chess, b ≈ 35, d ≈ 50
Time complexity ≈ 35^100 ≈ 2.55 × 10^154
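The chess estimate can be reproduced directly, assuming the slide's b = 35, d = 50 and counting b^(2d) nodes:

```python
b, d = 35, 50
nodes = b ** (2 * d)               # O(b^(2d)): ~35^100 nodes for chess
print(len(str(nodes)), "digits")   # a 155-digit number, about 2.55e154
```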
Turn-based Game
Advanced Methods
How to speed up the process?
Evaluation Functions
Does not access the TRUE utility, but approximates it
Stops earlier
Requires domain-specific knowledge
Alpha-beta Pruning
Computes the TRUE utility
Ignores unnecessary paths
General-purpose
Turn-based Game: Advanced Method
Evaluation Function
Original approach: the tree from s_start down to s_end is very tall, and the
utility (e.g. Utility = 1 for a win) is only known at the end state.
With evaluation functions: stop at a maximum depth d_max and evaluate the
state s reached there, even though its true utility is unknown.
Turn-based Game: Advanced Method
Evaluation Function
Limited-depth tree search (stop at maximum depth d_max)
Eval(s) estimates the value V(s) at depth d_max (it may be very inaccurate)

V(s, d) = Utility(s)                                 if IsEnd(s)
        = Eval(s)                                    if d = 0
        = max_{a ∈ Actions(s)} V(Succ(s, a), d − 1)  if Player(s) = agent
        = min_{a ∈ Actions(s)} V(Succ(s, a), d − 1)  if Player(s) = opp
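A depth-limited minimax can be sketched as follows: recurse down to depth 0, then fall back on Eval(s) instead of the true utility. The tiny tree and the stored "est" heuristic values are illustrative assumptions:

```python
# Depth-limited minimax with an evaluation-function fallback.

def limited_minimax(node, depth, maximizing):
    if "utility" in node:                  # IsEnd(s): true utility known
        return node["utility"]
    if depth == 0:                         # depth limit reached: approximate
        return node["est"]                 # Eval(s)
    children = [limited_minimax(c, depth - 1, not maximizing)
                for c in node["children"]]
    return max(children) if maximizing else min(children)

leaf = lambda u: {"utility": u}
inner = lambda est, *cs: {"est": est, "children": list(cs)}

tree = inner(0,
             inner(4, leaf(3), leaf(7)),   # Eval says 4; true min is 3
             inner(6, leaf(5), leaf(9)))   # Eval says 6; true min is 5

print(limited_minimax(tree, 1, True))      # depth 1 uses Eval: max(4, 6) = 6
print(limited_minimax(tree, 2, True))      # depth 2 reaches leaves: max(3, 5) = 5
```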
Turn-based Game: Advanced Method: Evaluation Function
Example
Example: Chess
Eval(s) = material + mobility + king-safety + center-control
Material: 10100(K − K') + 9(Q − Q') + 5(R − R') + 3(B − B') + 3(N − N') + 1(P − P')
where K: king, Q: queen, R: rook, B: bishop, N: knight, P: pawn,
and A − A' is the difference in A due to the move
Mobility: 0.1 × (number of legal moves − opponent's number of legal moves)
King-safety: keeping the king safe is good
Center-control: controlling the center of the board is good
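As a hedged sketch of the material term only, the weighted piece-count differences with the slide's weights look like this (the board counts below are made up):

```python
# Material component of a chess evaluation function.

WEIGHTS = {"K": 10100, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material(mine, theirs):
    # mine / theirs map piece letters to counts for each side.
    return sum(w * (mine.get(p, 0) - theirs.get(p, 0))
               for p, w in WEIGHTS.items())

# Example: up a rook, down a pawn -> 5*1 + 1*(-1) = 4
mine   = {"K": 1, "Q": 1, "R": 2, "B": 2, "N": 2, "P": 7}
theirs = {"K": 1, "Q": 1, "R": 1, "B": 2, "N": 2, "P": 8}
print(material(mine, theirs))   # 4
```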
Turn-based Game: Advanced Method
Alpha-beta Pruning
In some cases, visiting some branches is unnecessary in the minimax algorithm.
Example: after evaluating V(Succ(S3, L)), should V(Succ(S3, R)) be evaluated?
Consider S1 (a max node with children S2 = min(5, 3) and S3):
V(S1) = max(V(Succ(S1, L)), V(Succ(S1, R))), and V(S2) = 3, therefore V(S1) ≥ 3
Consider S3 (a min node):
V(S3) = min(V(Succ(S3, L)), V(Succ(S3, R))), and V(Succ(S3, L)) = 2, therefore V(S3) ≤ 2
Since V(S3) ≤ 2 < 3, there is no need to further investigate V(Succ(S3, R))
Turn-based Game: Advanced Method
Alpha-beta Pruning
Prune a node if its value v is not in the interval bounded by α and β
(i.e. prune if v ∉ (α, β)):
α = max over the max-node ancestors s of a_s (the greatest lower bound),
where a_s is a lower bound on the value of max node s
β = min over the min-node ancestors s of b_s (the least upper bound),
where b_s is an upper bound on the value of min node s
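The standard algorithm can be sketched over nested-list trees (numbers are leaves, lists alternate max/min); it returns the same value as plain minimax while skipping branches whose value cannot fall inside (α, β):

```python
# Alpha-beta pruning over nested-list trees.

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):        # leaf
        return node
    if maximizing:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)         # tighten the lower bound
            if alpha >= beta:             # cut-off: interval is empty
                break
        return v
    v = float("inf")
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        beta = min(beta, v)               # tighten the upper bound
        if alpha >= beta:
            break
    return v

# The motivating example: S1 = max, S2 = min(5, 3), S3 = min(2, ...).
tree = [[5, 3], [2, 99]]                  # the 99 leaf is never visited
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3
```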
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 1
Example: the last node can be pruned.
On the path to that node (with the slide's values 2, 4, 6, 8), the min
ancestors give the upper bounds ≤ 8 and ≤ 4, and the max ancestors give the
lower bounds ≥ 5 and ≥ 6.
So α = max(5, 6) = 6 (the greatest lower bound) and β = min(8, 4) = 4 (the
least upper bound). The node's value is not inside (α, β), so the last node
can be pruned.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 2
(Tree with the leaf values 3, 4, 7, 9, 6, 8, 5, 10 shown on the slide; the
root minimax value is 7.)
Walkthrough, following panels (a) to (e) on the slide:
(a) Still need to check the remaining branches, as there is no bound yet.
(b) All branches of the first subtree are checked; return 7 to its parent.
(c) No need to check this branch, as its value cannot be bigger than 6.
(d) Still need to check the rest, as the value may be equal to 8.
(e) No need to check the remaining branches, as the value cannot be bigger than 5.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 2 (continued)
The slide repeats the tree, annotating the bounds step by step: the first
subtree evaluates to 7, so the root is ≥ 7; the second subtree is bounded
≤ 6 and is pruned; in the third subtree, a child bounded ≤ 8 must still be
checked, but once a later child is bounded ≤ 5 the rest is pruned.
The root minimax value is 7.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Example 3
(Tree containing chance nodes with probability 1/2 on each branch; the leaf
values and the step-by-step bounds are shown on the slide, and the root
value is 9.)
Key point: no pruning can be done on a chance node.
Turn-based Game: Advanced Method: Alpha-beta Pruning
Exercise
Use the alpha-beta pruning method to find V(s_start).
(Same tree as the earlier minimax exercise; propagating the bounds shown on
the slide gives V(s_start) = 15.)
Simultaneous Game
In a simultaneous game, players take actions at the same time
For example: Two-finger Morra
Rules
Players A and B each show 1 or 2 fingers.
If both show 1, B gives A 2 dollars.
If both show 2, B gives A 4 dollars.
Otherwise, A gives B 3 dollars
Simultaneous Game
Two-finger Morra
Goal (for A): maximize the dollars
If both show 1, B gives A 2 dollars.
If both show 2, B gives A 4 dollars.
Otherwise, A gives B 3 dollars.
Value V(a, b), where a, b ∈ Actions = {1 finger, 2 fingers}
Payoff matrix for A (rows: A's action, columns: B's action):
          B: 1    B: 2
A: 1        2      −3
A: 2       −3       4
Simultaneous Game
Expected Value
Value of the game if A and B use the policies π_A and π_B:
V(π_A, π_B) = Σ_{a,b ∈ Actions} π_A(a) π_B(b) V(a, b)
where π_A(a) is the probability that A takes action a
Example: π_A = [1, 0], π_B = [1/2, 1/2]
V(π_A, π_B) = (1 × 1/2 × 2) + (1 × 1/2 × −3) + (0 × 1/2 × −3) + (0 × 1/2 × 4)
            = −1/2
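The expected-value formula can be evaluated directly for Two-finger Morra with the example policies π_A = [1, 0] and π_B = [1/2, 1/2]:

```python
# V(pi_A, pi_B) = sum over a, b of pi_A(a) * pi_B(b) * V(a, b).

V = [[2, -3],     # A shows 1 finger: payoff vs B showing 1 or 2
     [-3, 4]]     # A shows 2 fingers

def game_value(pi_a, pi_b):
    return sum(pi_a[a] * pi_b[b] * V[a][b]
               for a in range(2) for b in range(2))

print(game_value([1, 0], [0.5, 0.5]))   # -0.5
```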
Simultaneous Game: Exercise
Prisoner’s Dilemma
Two members of a criminal gang are arrested
Each prisoner is in solitary confinement
no communication
Prosecutors lack sufficient evidence
Offer each prisoner a bargain
Betray: betray the other by testifying that the other committed the crime
Remain Silent: say nothing
Simultaneous Game: Exercise
Prisoner’s Dilemma
Four outcomes
If A and B each betray the other, each of them serves two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years in prison (and vice versa)
If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge).
What should you do if you are one of the prisoners?
Simultaneous Game: Exercise
Prisoner’s Dilemma
If A and B each betray the other, each of them serves two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years in prison (and vice versa)
If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge)
Payoff matrix (V(A), V(B)); rows are A's action, columns are B's action:

                       B: Stay Silent    B: Betray
A: Stay Silent (S)       (−1, −1)         (−3, 0)
A: Betray (B)            (0, −3)          (−2, −2)
Simultaneous Game: Exercise
Prisoner’s Dilemma
Assume you are A, and the policy of B is π_B = [0.3, 0.7]
(probability 0.3 of staying silent, 0.7 of betraying).

V(A) = V(π_A, π_B) = Σ_{a,b ∈ Actions} π_A(a) π_B(b) V(a, b)
     = (π_A(S) × (0.3 × −1 + 0.7 × −3)) + (π_A(B) × (0.3 × 0 + 0.7 × −2))
     = (−2.4) × π_A(S) + (−1.4) × π_A(B)
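The two coefficients can be checked in code: with π_B = [0.3, 0.7], A's expected value is linear in π_A, with one coefficient per pure action:

```python
# Coefficients of A's expected value, one per pure action S / B.

V_A = {("S", "S"): -1, ("S", "B"): -3,   # A stays silent vs B's action
       ("B", "S"):  0, ("B", "B"): -2}   # A betrays vs B's action
pi_B = {"S": 0.3, "B": 0.7}

coeff = {a: sum(pi_B[b] * V_A[(a, b)] for b in "SB") for a in "SB"}
print(coeff)   # about -2.4 for "stay silent" and -1.4 for "betray"
```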
Simultaneous Game: Exercise
Prisoner’s Dilemma
The coefficient of "betray" (−1.4) is larger than that of "stay silent" (−2.4)
Is "betray" better than "stay silent"?
Yes, if the game is played only once (a one-shot game)
How about playing several times?
V(A) = (-2.4) x πA(S) + (-1.4) x πA(B)
Simultaneous Game: Exercise
Prisoner’s Dilemma
When the game is played many times, the smartest strategy is to make a
decision that is good for all players.
Considering A and B as a group:
Both "betray" is the worst outcome (−2 + −2 = −4)
Both "stay silent" is the best outcome (−1 + −1 = −2)
Summary
Course topics, ordered by difficulty:
• Search: from start state to goal state
• Constraint Satisfaction Problems: consider constraints
• Game Playing: consider an adversary
• Markov Decision Processes: consider uncertainty
• Reinforcement Learning: no information is given