
Page 1

Artificial Intelligence and its applications

Lecture 4

Game Playing

Dr. Patrick Chan (patrickchan@ieee.org)

South China University of Technology, China

Summary

Artificial Intelligence and its applications - Lecture 4: Game Playing

The course topics in increasing difficulty:

• Search: from start state to goal state

• Constraint Satisfaction Problems: consider constraints

• Game Playing: consider an adversary

• Markov Decision Processes: consider an uncertainty

• Reinforcement Learning: no information is given

Page 2

Agenda

Expected Value

Expected Max Algorithm

Minimax Algorithm

Alpha-beta Pruning

Simultaneous Game


Games

Intelligent opponents in games

Benchmark of intelligence

Model for many applications: Military confrontations, negotiation, auctions, …


Page 3

Search Problem vs. Games

Search Problem

• The environment is independent of your decisions

• Usually has particular goal states

• Can be solved offline (most of our previous discussion focused on offline search)

Games

• Competition with an adversary (the opponent acts according to your decisions)

• May not have a goal state

• Decisions must be made in limited time (approximation is needed)


Game Type

• Turn-based Strategy Games: players take turns (chess, board games, card games). Focus of this chapter.

• Real-Time Strategy Games: players act simultaneously (TV and PC games). Learned later.

• Simultaneous Games: players make decisions at the same time. Briefly discussed in this chapter.

(A turn-based game alternates actions by the player and the opponent.)

Page 4

Game Type

• Deterministic + Perfect Information: Chess, Go, Checkers

• Non-Deterministic + Perfect Information: Backgammon

• Deterministic + Imperfect Information: Battleships

• Non-Deterministic + Imperfect Information: Bridge, Poker

Recall: Search Task

Deterministic + Full Knowledge + Static

Given (s: state, s': new state, a: action):

1. S: all states, with the start (initial) state

2. End (goal) state: the goal is well defined, or Goal(s): a goal-test function returning T/F or a score

3. Actions(s): possible actions in state s

4. Cost(s, a): cost of taking action a in state s

5. Succ(s, a) = s': transition of states (s to s') by a

Output: path (sequence of actions)

Which information is needed in a game?

• Some games do not have an end state

• Some games consider the action cost

Page 5

All games have an objective, but not all have a goal state

e.g. Flappy Bird, Tetris

Formulation of the Game Objective

Utility(s) / Reward(s): preference for reaching s

Example: Utility(SGoal) > 0, Utility(SDie) < 0

Turn-based Game: Task

Given (s: state, s': new state, a: action):

1. S: states, with sstart: the start (initial) state

2. IsEnd(s): whether s is an end state (game over)

3. Actions(s): possible actions from s

4. Utility(s): reward for end state s

5. Succ(s, a) = s': next state after taking action a in s

6. Players: who plays the game, e.g. Players = {agent, opp} (additional information)

7. Player(s) ∈ Players: which player moves in s (additional information)

Output: path (sequence of actions)

Note: the action cost is not considered explicitly.
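The task formulation above can be sketched as code. A minimal Python sketch, using the upcoming bin-selection example as the concrete game (class and method names are illustrative, not part of the lecture):

```python
AGENT, OPP = "agent", "opp"

class BinGame:
    """Turn-based game task: the agent picks a bin, then the opponent
    picks a number inside it; the picked number is the utility."""
    BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

    def start_state(self):
        return ()                        # sequence of actions taken so far

    def is_end(self, s):                 # IsEnd(s): game over?
        return len(s) == 2

    def player(self, s):                 # Player(s) in Players
        return AGENT if len(s) == 0 else OPP

    def actions(self, s):                # Actions(s)
        return list(self.BINS) if len(s) == 0 else self.BINS[s[0]]

    def succ(self, s, a):                # Succ(s, a) = s'
        return s + (a,)

    def utility(self, s):                # Utility(s), defined on end states
        return s[1]
```

Here a state is simply the tuple of actions taken so far; a real game would store a board position instead.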

Page 6

Turn-based Game

Example: Chess (assume you are white)

State s: position of all pieces

IsEnd(s): checkmate or draw

Actions(s): legal chess moves

Utility(s): +1 if white wins, 0 if draw, −1 if black wins

Succ(s, a): position after the move

Players: {white, black}

Turn-based Game

Example: Save the Penguin

State s: positions and colors of the remaining blocks

IsEnd(s): the penguin falls out

Actions(s): select which blocks in your color (blue or white) to take down

Utility(s): +1 if the penguin falls out in your turn, otherwise −1

Succ(s, a): positions and colors of the remaining blocks after the player's action

Page 7

Turn-based Game

Example: Bin Selection

Goal: maximize the chosen number

• You choose one of the bins

• The opponent chooses a number in your chosen bin

Which one should you choose?

Bin A: −50, 50

Bin B: 1, 3

Bin C: −5, 15

Turn-based Game

Example: Bin Selection

The answer depends on the goal of your opponent:

• Adversarial opponent: minimizes your reward (plays against you)

• Helpful opponent: maximizes your reward (works with you)

• Stochastic opponent: chooses a value randomly, so you consider the expected value (perhaps closest to how a human plays?)

With bins A = {−50, 50}, B = {1, 3}, C = {−5, 15}:

• Opponent (Min): bin values −50, 1, −5; you (Max) choose B

• Opponent (Max): bin values 50, 3, 15; you (Max) choose A

• Opponent (Avg, assumed uniformly random): bin values 0, 2, 5; you (Max) choose C
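The three opponent models can be checked with a few lines of Python (a sketch; the bin contents are from the slide):

```python
bins = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}

# Value of each bin under the three opponent models.
minimizing = {b: min(v) for b, v in bins.items()}            # adversarial
maximizing = {b: max(v) for b, v in bins.items()}            # helpful
average    = {b: sum(v) / len(v) for b, v in bins.items()}   # uniform random

best_vs_min = max(minimizing, key=minimizing.get)   # "B" (value 1)
best_vs_max = max(maximizing, key=maximizing.get)   # "A" (value 50)
best_vs_avg = max(average, key=average.get)         # "C" (value 5)
```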

Page 8

Turn-based Game

Node Type

• Search node: a state (all nodes are the same)

• Game node: a state plus the policy π of a player

  • Max node (upward-pointing triangle): deterministic policy; π(s, a) = 1 for the a that yields the max reward, 0 for any other a

  • Min node (downward-pointing triangle): deterministic policy; π(s, a) = 1 for the a that yields the min reward, 0 for any other a

  • Chance node (circle): stochastic policy; π(s, a) ∈ [0, 1] is the probability of taking a in s

Turn-based Game: Node Type

Example: Bin Selection

Goal: maximize the chosen number

• You choose one of the bins (policy πagent; "agent" means you!)

• The opponent chooses a number in your chosen bin (policy πopp)

Two players, two policies (πagent and πopp). The graph represents the decision flow, not the policy: the root (πagent) branches to bins A, B, C, and each bin (πopp) branches to its utilities (−50/50, 1/3, −5/15).

Page 9

Turn-based Game: Node Type

Example: Bin Selection

• Opponent as a Min node: bin values −50, 1, −5; you (Max) choose B

• Opponent as a Max node: bin values 50, 3, 15; you (Max) choose A

• Opponent as a chance node with π(s, a) = 0.5 for all s and a: bin values 0, 2, 5; you (Max) choose C

Turn-based Game: Node Type

Example

Which action will an agent choose in minimax?

(Figure: a minimax tree with leaf utilities −50, 7, 5, 6, 8, 2, 7, 2, 5, 10, 22, 6; the min-level values are 5, 2, 7, 10, 6, the max-level values 2, 2, 6, and the agent chooses the action with value 6.)

Page 10

Turn-based Game

Value Function

V(s) denotes the value (utility) of state s:

• Last state: the value is given

• Other states: the value is calculated according to the policy

(Example tree: under bins A, B, C, the leaf values −50/50, 1/3, −5/15 are given; every other state's value must be calculated.)

Turn-based Game

Value Function

The value calculation at a state s depends on its node type:

Max node:    V(s) = maxa∈Actions(s) V(Succ(s, a))

Min node:    V(s) = mina∈Actions(s) V(Succ(s, a))

Chance node: V(s) = Σa∈Actions(s) π(s, a) × V(Succ(s, a))

For example, a node A with successors s(A,L) and s(A,R) takes the max, the min, or the probability-weighted sum of V(s(A,L)) and V(s(A,R)).
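The three node-type rules can be sketched as one recursive function. Here a tree is a nested tuple (an assumed toy representation, not the lecture's notation): a leaf is a number, a max/min node lists its children, and a chance node lists (probability, child) pairs:

```python
def value(node):
    """Value of a game-tree node under the max/min/chance rules."""
    if isinstance(node, (int, float)):          # last state: value is given
        return node
    kind, children = node
    if kind == "max":
        return max(value(c) for c in children)
    if kind == "min":
        return min(value(c) for c in children)
    if kind == "chance":
        return sum(p * value(c) for p, c in children)
    raise ValueError(kind)

# Bin selection against a minimizing opponent:
tree = ("max", [("min", [-50, 50]), ("min", [1, 3]), ("min", [-5, 15])])
# value(tree) == 1
```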

Page 11

Turn-based Game

Value Function: Example

You are the Max node M; the opponent runs Min nodes A, B, C over the leaves s(A,L) = −50, s(A,R) = 50, s(B,L) = 1, s(B,R) = 3, s(C,L) = −5, s(C,R) = 15.

V(M) = maxa∈Actions(M) V(Succ(M, a)) = max(V(A), V(B), V(C))

V(A) = mina∈Actions(A) V(Succ(A, a)) = min(V(s(A,L)), V(s(A,R))) = min(−50, 50) = −50

Similarly V(B) = min(1, 3) = 1 and V(C) = min(−5, 15) = −5, so V(M) = max(−50, 1, −5) = 1.

Turn-based Game: Value Function

Exercise

Calculate the value for the following games:

1. A root S1 averaging over three chance nodes S2, S3, S4 with probabilities 1/3, 1/3, 1/3; each of S2, S3, S4 picks uniformly (0.5, 0.5) between the leaf pairs (−50, 50), (1, 3), (−5, 15).

2. A root S1 over chance nodes S2, S3, S4 with the probabilities shown in the figure (0.5; 0.3, 0.6; 0.2, 0.3) and leaf pairs (−10, 20), (10, 30), (5, 15).

Page 12

Turn-based Game: Value Function

Exercise

Game 1: the opponent (Avg) nodes evaluate to 0, 2, 5, and your (Avg) root value is 7/3.

Game 2: the opponent (mixed) nodes evaluate to 20, 10, 10.5, and your (Avg) root value is 43.5.

Turn-based Game

Example: Bin Selection 2

A new game. Rules:

• Goal: maximize the chosen number

• You choose one of the three bins

• Then flip a coin: if heads, move one bin to the left (with wrap-around); if tails, stick with your choice

• Your opponent chooses a number from that bin

Bin A: −50, 50

Bin B: 1, 3

Bin C: −5, 15

Page 13

Turn-based Game

Example: Bin Selection 2

Three parties: Players = {agent, coin, opp}

V(s) =

  Utility(s)                                   if IsEnd(s)

  maxa∈Actions(s) V(Succ(s, a))                if Player(s) = agent

  mina∈Actions(s) V(Succ(s, a))                if Player(s) = opp

  Σa∈Actions(s) πcoin(s, a) V(Succ(s, a))      if Player(s) = coin
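A sketch of this three-party recurrence, specialized to Bin Selection 2 (bin names and helper names are illustrative):

```python
# Agent chooses a bin, a fair coin may shift it one bin left (wrap-around),
# then an adversarial opponent picks the minimum in the resulting bin.
BINS = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
LEFT = {"A": "C", "B": "A", "C": "B"}    # one bin to the left, with wrap

def bin_value(b):
    return min(BINS[b])                  # opp node: minimize

def choice_value(b):
    # coin node: heads (p = 1/2) shifts left, tails (p = 1/2) stays
    return 0.5 * bin_value(LEFT[b]) + 0.5 * bin_value(b)

values = {b: choice_value(b) for b in BINS}
# values == {"A": -27.5, "B": -24.5, "C": -2.0}
best = max(values, key=values.get)       # agent node: maximize -> "C"
```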

Turn-based Game

Example: Bin Selection 2

(Figure: each of the agent's three choices leads to a 1/2-1/2 chance node over two bins, and the opponent then picks the minimum inside the resulting bin. Recap of the rules: you choose one of the three bins; a coin is flipped; if heads, the choice moves one bin to the left with wrap-around, otherwise it stays; your opponent chooses a number from that bin; your goal is to maximize the chosen number.)

Page 14

Turn-based Game

Example: Bin Selection 2

V(s) = max( E(min(−50, 50), min(−5, 15)),
            E(min(1, 3), min(−50, 50)),
            E(min(−5, 15), min(1, 3)) )

     = max( E(−50, −5), E(1, −50), E(−5, 1) )

     = max( −27.5, −24.5, −2 )

     = −2

where E(x, y) = (x + y) / 2 is the expectation over the fair coin.

Turn-based Game

Minimax

Zero-sum game:

• One player's gain is equivalent to another's loss

• Players do not collaborate; they compete with each other

• All opponents aim to minimize your utility

This problem is called Minimax, MinMax, MM, or saddle point; "Minimax" is used in our course.

Page 15

Turn-based Game

Minimax

Minimax assumes the opponent selects the worst action for the agent.

Example for two players (turns alternate: … agent, opp, agent, …):

V(s) =

  Utility(s)                            if IsEnd(s)

  maxa∈Actions(s) V(Succ(s, a))         if Player(s) = agent

  mina∈Actions(s) V(Succ(s, a))         if Player(s) = opp
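A sketch of the minimax recurrence as code, returning both the value and the best first action; `Bins` is a toy game following the earlier task interface (IsEnd, Utility, Player, Actions, Succ), with illustrative snake_case names:

```python
def minimax(game, s):
    """Return (value, best action) of state s under minimax."""
    if game.is_end(s):
        return game.utility(s), None
    best = None
    for a in game.actions(s):
        v, _ = minimax(game, game.succ(s, a))
        if best is None:
            best = (v, a)
        elif game.player(s) == "agent" and v > best[0]:   # max node
            best = (v, a)
        elif game.player(s) == "opp" and v < best[0]:     # min node
            best = (v, a)
    return best

# Tiny concrete game: the bin-selection example.
class Bins:
    B = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
    def is_end(self, s):  return len(s) == 2
    def utility(self, s): return s[1]
    def player(self, s):  return "agent" if len(s) == 0 else "opp"
    def actions(self, s): return list(self.B) if not s else self.B[s[0]]
    def succ(self, s, a): return s + (a,)

# minimax(Bins(), ()) == (1, "B"): the agent picks bin B, value 1.
```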


Turn-based Game: Minimax

Example

Which action will an agent choose in minimax? What is the value at Sstart?

Actions L, M, R lead to min nodes over the leaf pairs (−50, 50), (1, 3), (−5, 15):

• min-node values: −50, 1, −5

• V(Sstart) = max(−50, 1, −5) = 1, so the chosen action is M

Page 16

Turn-based Game: Minimax

Exercise

Which action will an agent choose in minimax? What is the value at Sstart?

(Figure: a minimax tree with leaf utilities 5, 15, 3, 8, 2, 10, 21, 22, 10, 11, 16, 12, 15, 5, 19, 18; the answer is V(Sstart) = 15.)

Turn-based Game

Time Complexity

After a game is modeled as a tree, search techniques can be used. But even for a simple game like Tic-Tac-Toe the tree is very complicated; how about Go?

A long path must be followed to reach a utility, so the time and space complexity is large in practice.

https://commons.wikimedia.org/wiki/File:Tic-tac-toe-full-game-tree-x-rational.jpg

Page 17

Turn-based Game

Time Complexity

where b: branching factor (width), d: depth (height):

• Space: O(d)

• Time: O(b^(2d))

Example: chess has b ≈ 35 and d ≈ 50 moves per player (about 2d = 100 plies), so the time complexity is roughly 35^100 ≈ 2.55 × 10^154.
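The chess estimate is easy to reproduce (a rough arithmetic check, not an exact complexity claim):

```python
import math

b, d = 35, 50        # branching factor, moves per player
plies = 2 * d        # both players move each turn

# Number of decimal digits of b**plies.
digits = int(plies * math.log10(b)) + 1
# 35**100 has 155 digits, i.e. roughly 2.55 * 10**154 leaf nodes
```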

Turn-based Game

Advanced Methods

How to speed up the process?

• Evaluation functions: do not access the TRUE utility but approximate it; stop earlier; require domain-specific knowledge

• Alpha-beta pruning: computes the TRUE utility; ignores unnecessary paths; general-purpose

Page 18

Turn-based Game: Advanced Method

Evaluation Function

(Figure: the original search runs from Sstart down a very tall tree to Send, where Utility = 1 (win) is given. With an evaluation function, the search stops at depth dmax and evaluates the intermediate state s there, instead of asking "Utility = ???".)

Turn-based Game: Advanced Method

Evaluation Function

Limited-depth tree search (stop at maximum depth dmax). Eval(s) estimates the value V(s) at depth dmax (it may be very inaccurate):

V(s, d) =

  Utility(s)                                  if IsEnd(s)

  Eval(s)                                     if d = 0

  maxa∈Actions(s) V(Succ(s, a), d − 1)        if Player(s) = agent

  mina∈Actions(s) V(Succ(s, a), d − 1)        if Player(s) = opp
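A sketch of depth-limited minimax: identical to plain minimax except that at depth 0 it falls back on the evaluation function instead of the true utility. The toy `Bins` game and the names are illustrative:

```python
def limited_minimax(game, s, d, eval_fn):
    if game.is_end(s):
        return game.utility(s)           # true utility at an end state
    if d == 0:
        return eval_fn(s)                # approximation, may be inaccurate
    values = [limited_minimax(game, game.succ(s, a), d - 1, eval_fn)
              for a in game.actions(s)]
    return max(values) if game.player(s) == "agent" else min(values)

class Bins:                              # the bin-selection toy game again
    B = {"A": [-50, 50], "B": [1, 3], "C": [-5, 15]}
    def is_end(self, s):  return len(s) == 2
    def utility(self, s): return s[1]
    def player(self, s):  return "agent" if len(s) == 0 else "opp"
    def actions(self, s): return list(self.B) if not s else self.B[s[0]]
    def succ(self, s, a): return s + (a,)

# Full depth (2) recovers the true value 1; stopping at depth 1 uses Eval.
```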

Page 19

Turn-based Game: Advanced Method: Evaluation Function

Example: Chess

Eval(s) = material + mobility + king-safety + center-control

• Material: 10100(K − K') + 9(Q − Q') + 5(R − R') + 3(B − B') + 3(N − N') + 1(P − P')

  where K: king, Q: queen, R: rook, B: bishop, N: knight, P: pawn, and A − A' is the difference in A due to the move

• Mobility: 0.1 × (legal_move# − legal_move#')

• King-safety: keeping the king safe is good

• Center-control: controlling the center of the board is good
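The material term can be sketched directly. A hypothetical `counts` mapping of piece letter to (yours, opponent's) counts is assumed; it is not part of the slide:

```python
# Piece weights from the material formula above.
WEIGHTS = {"K": 10100, "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material(counts):
    """counts: e.g. {"K": (1, 1), "Q": (1, 0)} -> weighted piece difference."""
    return sum(w * (counts[p][0] - counts[p][1])
               for p, w in WEIGHTS.items() if p in counts)

# One queen up, otherwise equal material:
# material({"K": (1, 1), "Q": (1, 0), "P": (8, 8)}) == 9
```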

Turn-based Game: Advanced Method

Alpha-beta Pruning

In some cases, visiting some branches is unnecessary in the minimax algorithm.

For example, consider a max node S1 with min-node children S2 and S3, where S2's leaves are 5 and 3 and S3's first leaf is 2. After evaluating V(Succ(S3, L)), should V(Succ(S3, R)) be evaluated?

• Consider S1: V(S1) = max(V(Succ(S1, L)), V(Succ(S1, R))). Since V(S2) = min(5, 3) = 3, V(S1) ≥ 3.

• Consider S3: V(S3) = min(V(Succ(S3, L)), V(Succ(S3, R))) ≤ 2.

So V(S3) ≤ 2 < 3 ≤ V(S1): there is no need to further investigate V(Succ(S3, R)).

Page 20

Turn-based Game: Advanced Method

Alpha-beta Pruning

Prune a node if its value v cannot lie in the interval bounded by α and β (i.e. unless α ≤ v ≤ β), where:

• α: the greatest lower bound over the values as of the max-node ancestors s (as: lower bound on the value of max node s)

• β: the least upper bound over the values bs of the min-node ancestors s (bs: upper bound on the value of min node s)
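A sketch of alpha-beta pruning on the same toy ("max"/"min", children) trees used earlier; the `visited` list only records evaluated leaves so that the pruning is observable:

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), visited=None):
    if isinstance(node, (int, float)):       # leaf: value is given
        if visited is not None:
            visited.append(node)
        return node
    kind, children = node
    if kind == "max":
        v = float("-inf")
        for c in children:
            v = max(v, alphabeta(c, alpha, beta, visited))
            alpha = max(alpha, v)            # raise the lower bound
            if alpha >= beta:                # remaining siblings cannot matter
                break
        return v
    v = float("inf")                         # min node
    for c in children:
        v = min(v, alphabeta(c, alpha, beta, visited))
        beta = min(beta, v)                  # lower the upper bound
        if alpha >= beta:
            break
    return v

# The motivating example: once min(3, 5) = 3 is known, the second min node
# is abandoned after its first leaf 2, so the leaf 100 is never visited.
seen = []
v = alphabeta(("max", [("min", [3, 5]), ("min", [2, 100])]), visited=seen)
# v == 3
```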

Turn-based Game: Advanced Method: Alpha-beta Pruning

Example 1

Example: the last node can be pruned.

(Figure: a tree with leaves 2, 4, 6, 8; the search derives the bounds ≤ 8 and ≥ 5, then ≤ 4 and ≥ 6, i.e. α (greatest lower bound) and β (least upper bound); the last node's value cannot lie inside the interval between α and β, so it is pruned.)

Page 21

Turn-based Game: Advanced Method: Alpha-beta Pruning

Example 2

(Figure: a max root over min nodes with leaves 9, 7, 6, 3, 4, 7, 9, 8, searched left to right; the bounds develop as follows.)

(a) At the first min node there is no bound yet, so all of its branches must be checked.

(b) All branches are checked; the min node returns 7 to its parent, so the max root now has the lower bound V ≥ 7.

(c) The next min node is bounded by ≤ 6 after its first leaf; since its value cannot be bigger than 6 < 7, there is no need to check its remaining branches.

(d) A min node bounded by ≤ 8 still needs the rest of its branches checked, as its value may still be as large as 8 > 7.

(e) A min node bounded by ≤ 5 needs no further checking, as its value cannot be bigger than 5.

The root value is 7.

Turn-based Game: Advanced Method: Alpha-beta Pruning

Example 2

(Figure sequence: the same tree with leaves 9, 7, 6, 3, 4, 7, 9, 8, 5, 10. The bounds ≥ 7, ≤ 9, ≤ 6, ≤ 8, ≤ 5 develop step by step; pruning happens at the branch bounded by ≤ 6 and at the branch bounded by ≤ 5, and the final value is 7.)

Page 22

Turn-based Game: Advanced Method: Alpha-beta Pruning

Example 3

(Figure: a tree with 1/2-1/2 chance nodes over min nodes with leaves such as 50, 20, −2, 15, 3, 1, −50, 5; bounds like ≥ 9, ≤ 1, ≥ 5 appear, and the values 9 and 5 propagate up.)

Key point: no pruning can be done on a chance node.

Turn-based Game: Advanced Method: Alpha-beta Pruning

Exercise

Use the alpha-beta pruning method to find V(Sstart).

(Figure: the same four-branch minimax tree as the earlier exercise; tracking bounds such as ≥ 5, ≤ 10, ≥ 11 (11 > 5, a larger lower bound), ≤ 15 and ≥ 15 gives V(Sstart) = 15.)

Page 23

Simultaneous Game

In a simultaneous game, players take actions at the same time.

For example: Two-finger Morra

Rules:

• Players A and B each show 1 or 2 fingers

• If both show 1, B gives A 2 dollars

• If both show 2, B gives A 4 dollars

• Otherwise, A gives B 3 dollars

Simultaneous Game

Two-finger Morra

Goal: maximize the dollars

• If both show 1, B gives A 2 dollars

• If both show 2, B gives A 4 dollars

• Otherwise, A gives B 3 dollars

Value V(a, b), where a, b ∈ Actions = {1 finger, 2 fingers}. Payoff to A (rows: A's action, columns: B's action):

            B: 1     B: 2
  A: 1        2       −3
  A: 2       −3        4

Page 24

Simultaneous Game

Expected Value

Value of the game if A and B use the policies πA and πB:

V(πA, πB) = Σa,b∈Actions πA(a) πB(b) V(a, b)

where πA(a) is the probability that A takes action a.

Example: πA = [1, 0], πB = [1/2, 1/2]:

V(πA, πB) = (1 × 1/2 × 2) + (1 × 1/2 × −3) + (0 × 1/2 × −3) + (0 × 1/2 × 4) = −1/2
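The expected-value formula, written out for Two-finger Morra (the payoff table is from the slide):

```python
# Payoff to A for each pair of shown fingers.
V = {(1, 1): 2, (1, 2): -3, (2, 1): -3, (2, 2): 4}

def game_value(pi_a, pi_b):
    """pi_a, pi_b: probabilities of showing 1 or 2 fingers, e.g. [1, 0]."""
    return sum(pi_a[i] * pi_b[j] * V[(i + 1, j + 1)]
               for i in range(2) for j in range(2))

# The worked example above: game_value([1, 0], [0.5, 0.5]) == -0.5
```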

Simultaneous Game: Exercise

Prisoner's Dilemma

• Two members of a criminal gang are arrested

• Each prisoner is held in solitary confinement, with no communication

• Prosecutors lack sufficient evidence, so they offer each prisoner a bargain:

  • Betray: testify that the other committed the crime

  • Remain silent: say nothing

Page 25

Simultaneous Game: Exercise

Prisoner's Dilemma

Four outcomes:

• If A and B each betray the other, each serves two years in prison

• If A betrays B but B remains silent, A is set free and B serves three years in prison (and vice versa)

• If A and B both remain silent, both serve only one year in prison (on the lesser charge)

What should you do if you are one of the prisoners?

Simultaneous Game: Exercise

Prisoner's Dilemma

Payoffs as (V(A), V(B)), rows: A's action, columns: B's action:

                        B: Stay Silent     B: Betray
  A: Stay Silent (S)      (−1, −1)          (−3, 0)
  A: Betray (B)           (0, −3)           (−2, −2)

Simultaneous Game: Exercise

Prisoner’s Dilemma

Assume you are A, the policy of B is πB= [0.3, 0.7]

V(A)

Artificial Intelligence and its applications - Lecture 4: Game Playing51

V(A) V(B) Stay Silent Betray

Stay Silent (S)

Betray (B)

-1 -1

0 -3

-3 0

-2 -2

A

B

V(πA, πB) = Σa,baction πA(a) πB(b) V(a,b)

= (πA(S) x (0.3 x -1 + 0.7 x -3)) + (πA(B) x (0.3 x 0 + 0.7 x -2))

= (πA(S) x 0.3 x -1) + (πA(S) x 0.7 x -3) +(πA(B) x 0.3 x 0) + (πA(B) x 0.7 x -2)

= (-2.4) x πA(S) + (-1.4) x πA(B)
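The same computation in code (a sketch; S = stay silent, B = betray):

```python
# Payoffs to A, indexed by (A's action, B's action).
V_A = {("S", "S"): -1, ("S", "B"): -3, ("B", "S"): 0, ("B", "B"): -2}
pi_b = {"S": 0.3, "B": 0.7}             # B's fixed policy

# Coefficient of each of A's pure actions in V(A).
coef = {a: sum(pi_b[b] * V_A[(a, b)] for b in "SB") for a in "SB"}
# coef == {"S": -2.4, "B": -1.4}
```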

Simultaneous Game: Exercise

Prisoner's Dilemma

V(A) = (−2.4) × πA(S) + (−1.4) × πA(B)

The coefficient of "betray" (−1.4) is larger than that of "stay silent" (−2.4). Is "betray" better than "stay silent"?

Yes, if the game is played only once (a one-shot game). How about playing several times?

Page 27

Simultaneous Game: Exercise

Prisoner's Dilemma

The smartest way to play this game many times is to make decisions that are good for all players. Considering A and B as a group:

• Both "betray" is the worst outcome (−2 + −2 = −4)

• Both "stay silent" is the best outcome (−1 + −1 = −2)

Summary

The course topics in increasing difficulty:

• Search: from start state to goal state

• Constraint Satisfaction Problems: consider constraints

• Game Playing: consider an adversary

• Markov Decision Processes: consider an uncertainty

• Reinforcement Learning: no information is given