Copyright (c) 2003 Brooks/Cole, a division of Thomson Learning, Inc.

Page 1: Markov Chains and the Theory of Games (Chapter 9)

• Markov Chains

• Regular Markov Chains

• Absorbing Markov Chains

• Game Theory and Strictly Determined Games

• Games with Mixed Strategies

Page 2

Any square matrix T with entries a_{ij} that satisfies the two properties:

1. a_{ij} \ge 0 for all i and j.
2. The sum of the entries in each column of T is 1.

is called a stochastic matrix.

Ex. T = \begin{bmatrix} 0.3 & 0.9 \\ 0.7 & 0.1 \end{bmatrix}

Stochastic: all entries are nonnegative and each column adds to 1.

T = \begin{bmatrix} 0.6 & 0.9 \\ 0.7 & 0.1 \end{bmatrix}

Not stochastic: column 1 sums to 0.6 + 0.7 = 1.3.
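These two checks are easy to automate. Below is a minimal sketch in Python with NumPy (the function name is_stochastic is ours, not from the text) that tests both properties for the two matrices above.

```python
import numpy as np

def is_stochastic(T, tol=1e-9):
    """Check the two properties: nonnegative entries, columns summing to 1."""
    T = np.asarray(T, dtype=float)
    return bool(np.all(T >= 0) and np.allclose(T.sum(axis=0), 1.0, atol=tol))

print(is_stochastic([[0.3, 0.9], [0.7, 0.1]]))  # True: both columns sum to 1
print(is_stochastic([[0.6, 0.9], [0.7, 0.1]]))  # False: column 1 sums to 1.3
```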

Page 3: Markov Process (Chain)

A stochastic process in which the outcome at any stage of the experiment depends only on the outcome of the preceding stage.

The outcome at any stage is called the state. The outcome at the current stage is called the current state.

Page 4

Ex. We are given a process with two choices: A and B. If a person chooses A, that person has a 30% probability of choosing A the next time. If a person chooses B, that person has a 60% probability of choosing B the next time.

As a state diagram: A stays at A with probability 0.3 and moves to B with probability 0.7; B stays at B with probability 0.6 and moves to A with probability 0.4.

This can be represented by a transition matrix (columns: current state A, B; rows: next state A, B):

T = \begin{bmatrix} 0.3 & 0.4 \\ 0.7 & 0.6 \end{bmatrix}

Page 5

T = \begin{bmatrix} 0.3 & 0.4 \\ 0.7 & 0.6 \end{bmatrix}   (columns: state 1, state 2)

The probability that something in state 1 will be in state 1 in the next step is a_{11} = 0.3.

The probability that something in state 2 will be in state 1 in the next step is a_{12} = 0.4.

In general, a_{ij} = P(\text{state } i \mid \text{state } j).

Page 6: Transition Matrix

A transition matrix associated with a Markov chain with n states is an n \times n matrix T with entries a_{ij}:

T = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn}
\end{bmatrix}

The columns are indexed by the current state and the rows by the next state, so

a_{ij} = P(\text{next state } i \mid \text{current state } j)

1. a_{ij} \ge 0 for all i and j.
2. The sum of the entries in each column of T is 1.

Page 7

Ex. It has been found that of the people who eat brand X cereal, 85% will eat brand X again the next time and the rest will switch to brand Y. Also, 90% of the people who eat brand Y will eat brand Y the next time, with the rest switching to brand X. At present, 70% of the people eat brand X and the rest eat brand Y. What percent will eat brand X after 1 cycle?

Transition matrix:   T = \begin{bmatrix} 0.85 & 0.1 \\ 0.15 & 0.9 \end{bmatrix}

Initial state:   X_0 = \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix}   (rows: X, Y)

One cycle:   X_1 = T X_0 = \begin{bmatrix} 0.85 & 0.1 \\ 0.15 & 0.9 \end{bmatrix} \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix} = \begin{bmatrix} 0.625 \\ 0.375 \end{bmatrix}

So 62.5% will eat brand X.
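The one-cycle computation X_1 = T X_0 is a single matrix–vector product; a quick NumPy check of the arithmetic above:

```python
import numpy as np

T = np.array([[0.85, 0.10],   # column j holds P(next state | current state j)
              [0.15, 0.90]])
X0 = np.array([0.70, 0.30])   # initial shares: brand X, brand Y

X1 = T @ X0                   # one cycle
print(X1)                     # [0.625 0.375] -> 62.5% eat brand X
```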

Page 8

Notice from the previous example:

X_1 = \begin{bmatrix} 0.85 & 0.1 \\ 0.15 & 0.9 \end{bmatrix} \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix} = \begin{bmatrix} 0.625 \\ 0.375 \end{bmatrix}

X_1 = \begin{bmatrix} 0.625 \\ 0.375 \end{bmatrix} is called a distribution vector.

In general, the probability distribution of the system after n observations is given by

X_n = T^n X_0
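Since X_n = T^n X_0, the distribution after n observations is just a matrix power applied to the initial vector. A short sketch continuing the cereal example (the choice n = 3 is ours):

```python
import numpy as np

T = np.array([[0.85, 0.10],
              [0.15, 0.90]])
X0 = np.array([0.70, 0.30])

n = 3
Xn = np.linalg.matrix_power(T, n) @ X0  # distribution after n observations
print(Xn)
```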

Page 9: Regular Markov Chain

A stochastic matrix T is regular if the sequence T, T^2, T^3, … approaches a steady-state matrix in which the columns of the limiting matrix are all equal and all the entries are positive.

Ex. T = \begin{bmatrix} 0.3 & 0.9 & 0.6 \\ 0.4 & 0.05 & 0.1 \\ 0.3 & 0.05 & 0.3 \end{bmatrix}

Regular: all entries are positive.

Page 10

Ex. T = \begin{bmatrix} 0 & 0.3 \\ 1 & 0.7 \end{bmatrix}

Notice

T^2 = \begin{bmatrix} 0 & 0.3 \\ 1 & 0.7 \end{bmatrix} \begin{bmatrix} 0 & 0.3 \\ 1 & 0.7 \end{bmatrix} = \begin{bmatrix} 0.3 & 0.21 \\ 0.7 & 0.79 \end{bmatrix}

T is regular: all entries of T^2 are positive.

Ex. T = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}

Notice

T^2 = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} = T

Not regular: the entries of T raised to any power will never all be positive.
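A practical (if inexact) regularity test is to raise T to successive powers and stop as soon as all entries are positive; if no power up to some cutoff works, the matrix is likely not regular. A minimal sketch (the function name and the cutoff of 20 are our own assumptions):

```python
import numpy as np

def is_regular(T, max_power=20):
    """Return True if some power of T up to max_power has all positive entries."""
    T = np.asarray(T, dtype=float)
    P = T.copy()
    for _ in range(max_power):
        if np.all(P > 0):
            return True
        P = P @ T
    return False

print(is_regular([[0, 0.3], [1, 0.7]]))  # True: T^2 is all positive
print(is_regular([[0, 0],   [1, 1]]))    # False: T^n = T has zeros for every n
```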

Page 11: Steady-State Distribution Vector

The steady-state distribution vector is the limiting vector obtained by repeatedly applying the transition matrix to an initial distribution vector.

Ex. Given X_0 = \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix} and T = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}

Notice (after some work):

X_{12} = T^{12} X_0 = \begin{bmatrix} 0.3384 \\ 0.6616 \end{bmatrix}

X_{25} = T^{25} X_0 = \begin{bmatrix} 0.3334 \\ 0.6666 \end{bmatrix}

Tends toward \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}
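The "after some work" step is just repeated multiplication by T; a short loop reproduces the iterates quoted above:

```python
import numpy as np

T = np.array([[0.8, 0.1],
              [0.2, 0.9]])
X = np.array([0.7, 0.3])

for n in range(1, 26):
    X = T @ X                  # X_n = T X_{n-1}
    if n in (12, 25):
        print(n, X)            # n=12: ~[0.3384 0.6616], n=25: ~[0.3334 0.6666]
# the iterates tend toward the steady-state vector [1/3, 2/3]
```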

Page 12: Finding the Steady-State Distribution Vector

Let T be a regular stochastic matrix. Then the steady-state distribution vector X may be found by solving the vector equation

TX = X

together with the condition that the sum of the elements of the vector X be equal to 1.

Page 13

Ex. Find the steady-state vector for the transition matrix T = \begin{bmatrix} 0.4 & 0.7 \\ 0.6 & 0.3 \end{bmatrix}

TX = X:

\begin{bmatrix} 0.4 & 0.7 \\ 0.6 & 0.3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}

0.4x + 0.7y = x
0.6x + 0.3y = y

Either equation reduces to 0.6x - 0.7y = 0. Also need: x + y = 1.

Which gives x = 7/13 and y = 6/13, so X = \begin{bmatrix} 7/13 \\ 6/13 \end{bmatrix}
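Numerically, TX = X together with x + y = 1 is a small linear system. One hedged way to solve it is to stack the sum condition under (T - I)X = 0 and use least squares:

```python
import numpy as np

T = np.array([[0.4, 0.7],
              [0.6, 0.3]])
n = T.shape[0]

A = np.vstack([T - np.eye(n), np.ones(n)])  # (T - I)X = 0 plus sum(X) = 1
b = np.append(np.zeros(n), 1.0)
X, *_ = np.linalg.lstsq(A, b, rcond=None)
print(X)  # ~[0.5385 0.4615] = [7/13, 6/13]
```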

Page 14: Absorbing Stochastic Matrix

An absorbing stochastic matrix has the properties:

1. There is at least one absorbing state (a state with a_{ii} = 1, which is impossible to leave once entered).

2. It is possible to go from any non-absorbing state to an absorbing state in one or more stages.

Ex. Absorbing matrix:

T = \begin{bmatrix} 0 & 0.5 & 0 \\ 0.4 & 0.2 & 0 \\ 0.6 & 0.3 & 1 \end{bmatrix}

State 3 is an absorbing state (column 3 is all zeros except a_{33} = 1), and an object may go from state 1 or state 2 (non-absorbing states) to state 3.

Page 15

Given an absorbing stochastic matrix, it is possible to rewrite it in the form:

\begin{bmatrix} I & S \\ O & R \end{bmatrix}

where the first block of columns corresponds to the absorbing states and the second block to the nonabsorbing states.

I: identity matrix
O: zero matrix

Ex. Consider a chain with states 1, 2, 3, 4, where states 1 and 4 are absorbing:

T = \begin{bmatrix} 1 & 0.4 & 0.2 & 0 \\ 0 & 0.1 & 0.2 & 0 \\ 0 & 0.2 & 0.5 & 0 \\ 0 & 0.3 & 0.1 & 1 \end{bmatrix}   (rows and columns in the order 1, 2, 3, 4)

Reordering the states so that the absorbing states come first (order 1, 4, 2, 3):

\begin{bmatrix} 1 & 0 & 0.4 & 0.2 \\ 0 & 1 & 0.3 & 0.1 \\ 0 & 0 & 0.1 & 0.2 \\ 0 & 0 & 0.2 & 0.5 \end{bmatrix}

Page 16: Finding the Steady-State Matrix for an Absorbing Stochastic Matrix

Suppose an absorbing stochastic matrix A has been partitioned into submatrices

A = \begin{bmatrix} I & S \\ O & R \end{bmatrix}

Then the steady-state matrix of A is given by

\begin{bmatrix} I & S(I - R)^{-1} \\ O & O \end{bmatrix}

where the identity matrix in (I - R)^{-1} is chosen to have the same order as R.

Page 17

Ex. Compute the steady-state matrix for the matrix from the previous example:

\begin{bmatrix} 1 & 0 & 0.4 & 0.2 \\ 0 & 1 & 0.3 & 0.1 \\ 0 & 0 & 0.1 & 0.2 \\ 0 & 0 & 0.2 & 0.5 \end{bmatrix}

S = \begin{bmatrix} 0.4 & 0.2 \\ 0.3 & 0.1 \end{bmatrix}, \quad R = \begin{bmatrix} 0.1 & 0.2 \\ 0.2 & 0.5 \end{bmatrix}

I - R = \begin{bmatrix} 0.9 & -0.2 \\ -0.2 & 0.5 \end{bmatrix}

(I - R)^{-1} = \frac{1}{0.41} \begin{bmatrix} 0.5 & 0.2 \\ 0.2 & 0.9 \end{bmatrix} = \begin{bmatrix} 50/41 & 20/41 \\ 20/41 & 90/41 \end{bmatrix}

S(I - R)^{-1} = \begin{bmatrix} 0.4 & 0.2 \\ 0.3 & 0.1 \end{bmatrix} \begin{bmatrix} 50/41 & 20/41 \\ 20/41 & 90/41 \end{bmatrix} = \begin{bmatrix} 0.585 & 0.634 \\ 0.415 & 0.366 \end{bmatrix}
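The block computation S(I - R)^{-1} can be checked directly with NumPy:

```python
import numpy as np

S = np.array([[0.4, 0.2],
              [0.3, 0.1]])
R = np.array([[0.1, 0.2],
              [0.2, 0.5]])

# long-run absorption probabilities for the nonabsorbing states
absorb = S @ np.linalg.inv(np.eye(2) - R)
print(absorb)  # [[0.585 0.634]
               #  [0.415 0.366]]  (approximately)
```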

Page 18

The steady-state matrix is:

\begin{bmatrix} 1 & 0 & 0.585 & 0.634 \\ 0 & 1 & 0.415 & 0.366 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

with the original rows and columns in the order 1, 4, 2, 3.

This means that an object starting in state 3 will have a probability of 0.366 of being absorbed into state 4 in the long term.

Page 19: Game Theory

Game theory combines matrix methods with the theory of probability to determine the optimal strategies when opponents compete to maximize their gains (or minimize their losses).

Page 20

Ex. Rafe (row player) and Carley (column player) are playing a game where each holds out a red or black chip simultaneously (neither knows the other’s choice). The betting is summarized below:

                    Carley holds black     Carley holds red
Rafe holds black    Carley pays Rafe $5    Rafe pays Carley $10
Rafe holds red      Carley pays Rafe $2    Carley pays Rafe $3

Page 21

The game is a zero-sum game since one person’s payoff is the same as the other person’s loss.

We can summarize the game as a payoff matrix for Rafe (rows R1, R2 are Rafe’s moves; columns C1, C2 are Carley’s):

\begin{bmatrix} 5 & -10 \\ 2 & 3 \end{bmatrix}

Rafe basically picks a row and Carley picks a column. Since the matrix is a payoff for Rafe, he wants to maximize the entry while Carley wants to minimize the entry.

Page 22

\begin{bmatrix} 5 & -10 \\ 2 & 3 \end{bmatrix}   (rows R1, R2; columns C1, C2)

Rafe should look at the minimum of each row, then pick the larger of the minima: the row minima are -10 and 2. This is called the maximin strategy.

Carley should look at the maximum of each column, then pick the smaller of the maxima: the column maxima are 5 and 3. This is called the minimax strategy.

From this we see that Rafe should pick row 2 while Carley should pick column 2.

Page 23

Maximin Strategy (R’s move)

1. For each row of the payoff matrix, find the smallest entry in that row.

2. Choose the row for which the entry found in step 1 is as large as possible.

Minimax Strategy (C’s move)

1. For each column of the payoff matrix, find the largest entry in that column.

2. Choose the column for which the entry found in step 1 is as small as possible.

Page 24

Ex. Determine the maximin and minimax strategies for each player in a game that has the payoff matrix:

\begin{bmatrix} 4 & 1 & -2 \\ 1 & 3 & -4 \end{bmatrix}

Row minima: -2 and -4. Column maxima: 4, 3, and -2.

The row player should pick row 1.

The column player should pick column 3.

*Note: the column player is favored under these strategies (wins 2).
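The maximin and minimax choices follow mechanically from the row minima and column maxima; a sketch for the matrix above (indices are 0-based in code, 1-based in the text):

```python
import numpy as np

A = np.array([[4, 1, -2],
              [1, 3, -4]])

row_min = A.min(axis=1)   # [-2 -4]
col_max = A.max(axis=0)   # [ 4  3 -2]

r = int(np.argmax(row_min))   # maximin: row 1 (index 0)
c = int(np.argmin(col_max))   # minimax: column 3 (index 2)
print(r + 1, c + 1, A[r, c])  # 1 3 -2: the column player wins 2
```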

Page 25: Optimal Strategy

The optimal strategy in a game is the strategy that is most profitable to a particular player.

Page 26: Strictly Determined Game

A strictly determined game is characterized by the following properties:

1. There is an entry in the payoff matrix that is simultaneously the smallest entry in its row and the largest entry in its column. This entry is called the saddle point for the game.

2. The optimal strategy for the row (column) player is precisely the maximin (minimax) strategy, and it is given by the row (column) containing the saddle point.

Page 27

From a previous example:

\begin{bmatrix} 4 & 1 & -2 \\ 1 & 3 & -4 \end{bmatrix}

This is a strictly determined game with saddle point -2 in row 1, column 3.

The optimal strategies are for the row player to pick row 1 and the column player to pick column 3.

The value of the game is –2.

Page 28: Mixed Strategies

Playing a mixed strategy means making different moves during the course of the game: a row (column) player may choose different rows (columns) on different plays.

Ex. The game below has no saddle point.

\begin{bmatrix} 3 & -2 & -4 \\ 1 & 3 & 2 \end{bmatrix}

Under the maximin/minimax strategies, the row player should pick row 2 and the column player should pick column 3.

Page 29

For a mixed strategy, let the row player pick row 2 80% of the time and row 1 20% of the time. Let the column player pick columns 1, 2, and 3 10%, 20%, and 70% of the time, respectively.

Row player:   P = \begin{bmatrix} 0.2 & 0.8 \end{bmatrix}

Column player:   Q = \begin{bmatrix} 0.1 \\ 0.2 \\ 0.7 \end{bmatrix}

To find the expected value of the game we compute:

E = PAQ = \begin{bmatrix} 0.2 & 0.8 \end{bmatrix} \begin{bmatrix} 3 & -2 & -4 \\ 1 & 3 & 2 \end{bmatrix} \begin{bmatrix} 0.1 \\ 0.2 \\ 0.7 \end{bmatrix} = 1.1
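The expected value is the triple product PAQ; checking the 1.1 above with NumPy:

```python
import numpy as np

A = np.array([[3, -2, -4],
              [1,  3,  2]])          # payoff matrix
P = np.array([[0.2, 0.8]])           # row player's mixed strategy (row vector)
Q = np.array([[0.1], [0.2], [0.7]])  # column player's mixed strategy (column vector)

E = P @ A @ Q                        # expected value of the game
print(E.item())                      # 1.1 (up to floating point)
```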

Page 30: Expected Value of a Game

Let P and Q be the mixed strategies for the row player R and the column player C respectively. The expected value, E, of the game is given by:

E = PAQ = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}

where A is the m \times n payoff matrix.

Page 31: Optimal Strategies for Nonstrictly Determined Games

Payoff matrix:   A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

Row player:   P = \begin{bmatrix} p_1 & p_2 \end{bmatrix}

Column player:   Q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}

where

p_1 = \frac{d - c}{a + d - b - c} and p_2 = 1 - p_1

q_1 = \frac{d - b}{a + d - b - c} and q_2 = 1 - q_1

Value of the game:

E = PAQ = \frac{ad - bc}{a + d - b - c}

Page 32

Ex. Given the payoff matrix, find the optimal strategies and then the value of the game.

A = \begin{bmatrix} 1 & 6 \\ 2 & -3 \end{bmatrix}

p_1 = \frac{d - c}{a + d - b - c} = \frac{-3 - 2}{1 + (-3) - 6 - 2} = \frac{-5}{-10} = \frac{1}{2}

q_1 = \frac{d - b}{a + d - b - c} = \frac{-3 - 6}{-10} = \frac{9}{10}

P = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix}, \quad Q = \begin{bmatrix} 0.9 \\ 0.1 \end{bmatrix}

The row player should pick each row 50% of the time. The column player should pick column 1 90% of the time.

Value:

E = \frac{ad - bc}{a + d - b - c} = \frac{1(-3) - 6(2)}{-10} = \frac{-15}{-10} = 1.5
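The 2×2 formulas translate directly to code. A small helper (the function name is ours; it assumes the game has no saddle point, so the denominator is nonzero) reproduces the example:

```python
def optimal_2x2(A):
    """Optimal mixed strategies and value for a 2x2 game with no saddle point."""
    (a, b), (c, d) = A
    denom = a + d - b - c
    p1 = (d - c) / denom             # row player's probability for row 1
    q1 = (d - b) / denom             # column player's probability for column 1
    value = (a * d - b * c) / denom  # expected value of the game
    return (p1, 1 - p1), (q1, 1 - q1), value

P, Q, E = optimal_2x2([[1, 6], [2, -3]])
print(P, Q, E)  # (0.5, 0.5) (0.9, 0.1) 1.5
```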