28
Repeated Games

Repeated Games

  • Upload
    gili

  • View
    83

  • Download
    1

Embed Size (px)

DESCRIPTION

Repeated Games. Repeated Prisoners Dilemma. Thus far we have studied games, both static and dynamic, which are only played once. We now move to an environment where players interact repeatedly. Each player can now condition his action on previous actions by the other players. - PowerPoint PPT Presentation

Citation preview

Page 1: Repeated Games

Repeated Games

Page 2: Repeated Games

Repeated Prisoners Dilemma

• Thus far we have studied games, both static and dynamic, which are only played once. We now move to an environment where players interact repeatedly.

• Each player can now condition his action on previous actions by the other players.

• The first application will be to the prisoner’s dilemma, studied in chapter 2.

Page 3: Repeated Games

Repeated Prisoners Dilemma

• Recall the Prisoner’s Dilemma in strategic form (positive payoffs):

Cooperate

Cooperate

Defect

Defect

2,2

3,0 1,1

0,3

• Note in the PD game, Cooperate = Remain Silent and Defect = Confess.

• Clearly one pure strategy NE at (D,D).

Page 4: Repeated Games

Repeated Prisoners Dilemma

• Suppose now that the game is repeated T times. Are there any strategies we can write down that would sustain (C,C) as the equilibrium strategy in every repetition of the game?

Page 5: Repeated Games

Repeated Prisoners Dilemma

• Consider the following “Grim Trigger Strategy” for each player:– Play C in the first period.– Play C in every period as long as all players in all previous

periods have played C.– If any player has deviated (to D) in any previous period, play D

in all periods forward.

• If the strategy sustains cooperation, players obtain a payoff: 2 + 2 + 2 + …

• Starting in any period that a player deviates, the deviating player gets 3 + 1 + 1 + …

• So cooperating in all periods is optimal. The short term gains are outweighed by the long term losses.

Page 6: Repeated Games

Repeated Prisoners Dilemma

• Issues:– Patience: is $1000 today worth the same as $1000 a year from

now? We assumed above that players are infinitely patient. What if players weren’t so patient?

• Option A / Option B– Other NE? Another NE would be both players playing D in all

periods.– Grim Trigger seems like a strategy with a very severe

punishment? Could we sustain cooperation with something less severe?

– What if there is a final period to the game?

Page 7: Repeated Games

Primer on Arithmetic Series• Gauss Sum

)1(2

1

1

nnin

i a

aa

nn

i

i

1

1 1

0

• Limit of a geometric sum. If |a| < 1, then as n

aaaa

i

i

1

1...1

0

21

• What if the summation index starts at 1 instead?

a

a

a

a

aaaaa

i

i

i

i

11

111

1

11...

01

21

• Geometric Sum

Page 8: Repeated Games

Primer on Arithmetic Series• Two other useful sums (Still assuming |a|<1). • Even powers:

20

2

0

2642

1

1)(...1

aaaaaa

i

i

i

i

• Odd powers:

20

2

0

2

0

12531

1*...

a

aaaaaaaaa

i

i

i

i

i

i

Page 9: Repeated Games

Repeated Prisoners Dilemma

• Discounting. Let i (0,1) be the discount factor of player i with utility function u(.)

• If player i takes action at in period t, his discounted aggregate utility over T periods is:

T

t

ti

ti

Ti

Tiiiiii auauauauau

1

113221 )()(...)()()(

• So if delta is close to 1, the player is very patient. If delta is close to 0, the player discounts the future a lot.

• We will usually assume i = for all players.

• What if T = and at = a for all t ? Then:

1

1)()()()(

001

1 auauauau it

tit i

t

t it

Page 10: Repeated Games

Repeated Prisoners Dilemma

• Definition: Repeated Game. Let G be a strategic game. Suppose there are N players and denote the set of actions and payoff function of each player i by Ai and ui respectively. The T-period repeated game of G for the discount factor is the extensive game with perfect information and simultaneous moves in which:– Each player chooses a sequence of actions (a1,…,aT).

– The players always choose from the same set of actions, Ai .

– Each player evaluates each action profile according to his discounted aggregate utility (or discounted average utility).

• Denote the game G(T,) or in the case of an infinitely repeated game, G( ,)

Page 11: Repeated Games

Repeated Prisoners Dilemma

• Nash equilibrium in the finitely repeated Prisoner’s Dilemma ? Solve period T and work backwards as usual.

• Unique NE is (D,D) in all periods. Cooperation is impossible to sustain.

• Nash equilibrium in the infinitely repeated Prisoner’s Dilemma ? Most real life situations are not finitely repeated games. Even if they may have a certain ending date, the number of interactions may be uncertain and thus modeling the PD game as an infinitely repeated game seems intuitively pleasing.

• As we stated, there may be a strategy in the infinitely repeated PD game that generates cooperation in all periods, for a given level of patience of the players.

Page 12: Repeated Games

Repeated Prisoners Dilemma

• Consider the following strategy for player i where j is the other player:

otherwise , ),...,( and

),...,( if ,

),...,( ti

1i

tj

1j

t1

D (C,...,C)aa

(C,...,C)aaC

aasi

• Payoff along the equilibrium path:

1

22...)1(2

0

2

t

tei

• Payoff following a deviation in the first period:

1

1213...)(13

0

2

t

tdi

Page 13: Repeated Games

Repeated Prisoners Dilemma

• So cooperate is optimal if

2 1 12 2 2 2 1 2 1

1 1 2

• So for discount factors greater than or equal to 1/2, cooperation in all periods can be sustained as a NE. Players must meet this minimum level of patience.

Page 14: Repeated Games

Repeated Prisoners Dilemma

• We now consider less draconian strategies than the Grim Trigger.• Tit for Tat Strategy. Play C in the first period and then do whatever the other player did in the previous period in all subsequent periods.• Limited Punishment Strategy. This strategy entails punishing a deviation for a certain number of periods and then reverting to the collusive outcome after the punishment no matter

how players have acted during the punishment. For example, Play D in periods t=1, t=2, and t=3 if a deviation has occurred in t=0 and then play C in t=4.

Page 15: Repeated Games

Repeated Prisoners Dilemma

• Limited Punishment in the Prisoners Dilemma. Suppose the punishment phase is k=3 periods and both players are playing the same strategy.

• Consider the game starting in any period t. If no deviation occurs in periods t,t+1,t+2, and t+3 then, the payoffs to each player over these periods are:

1

)1(22)1(2

43

0

32

t

tei

• If player i deviates in period t, he knows that his opponent will play D for the next 3 periods so he should also play D in those periods. Thus his payoff is:

1

)1(213)(13

43

0

32

t

tdi

Page 16: Repeated Games

Repeated Prisoners Dilemma

• So cooperation is optimal if:

4 44 4 42(1 ) (1 )

2 2(1 ) 2 2 (1 ) 1 2 21 1

0.55 (Approx)

• What happens as k, the punishment period, gets larger? Delta approaches 1/2, the grim trigger cutoff.

Page 17: Repeated Games

Repeated Prisoners Dilemma

• Why do we only need to check for one episode of deviation and punishment? Because the critical discount factor will be lower (less binding) for multiple deviations.

• For example, suppose player I is considering two rounds of deviation and punishment.

877

0

2(1 )2(1 ... ) 2

1e ti t

2 3 4 5 6 7

2 24 5

0 0

34 5

3 1( ) 3 1( )

3(1 )

(1 )3(1 ) ( )

1

di

t t

t t

Page 18: Repeated Games

Repeated Prisoners Dilemma

• Cooperation is optimal if:

8 34 52(1 ) (1 )

3(1 ) ( )1 1

e di i

Delta Equilibrium Path Deviation Path0.40 3.33 3.560.41 3.39 3.580.42 3.44 3.600.43 3.50 3.620.44 3.57 3.630.45 3.63 3.650.46 3.70 3.670.47 3.76 3.690.48 3.84 3.710.49 3.91 3.720.50 3.98 3.740.51 4.06 3.760.52 4.14 3.780.53 4.23 3.800.54 4.32 3.820.55 4.41 3.840.56 4.50 3.86

Payoff along

• Critical discount factor is about 0.46. Therefore, if we find the critical discount factor that prevents one deviation (0.55), we also prevent any other sequence of deviations.

Page 19: Repeated Games

Repeated Prisoners Dilemma

• Now consider the tit for tat strategy in the Prisoners Dilemma. • Suppose Bob (Axelrod) and Larry and playing the repeated PD

game.• Suppose Bob is playing tit for tat and Larry considers

deviating to D in period t. Bob will respond with D in all periods until Larry again chooses C. If Larry chooses C, we revert to the same situation we started in (and again Larry could (possibly should) deviate to D).

• So Larry will either deviate and then play D forever or will alternate between D and C.

• Along the equilibrium path of the game, Bob and Larry each earn 2/(1-).

Page 20: Repeated Games

Repeated Prisoners Dilemma

• If player 2 deviates and then always plays D, his payoff is:

1

1213...)(13

0

3212 t

td

• If player 2 deviates and alternates C and D, his payoff is:

0

2543222 *3...)030303

t

td

20

222 1

3)(*3

ttd

• So we need the equilibrium path payoffs to be larger than both of these:

2

2 1 2 32 and

1 1 1 1

≥ 1/2

Page 21: Repeated Games

Repeated Prisoners Dilemma

• So far we have been using trigger type mechanisms to sustain a collusive outcome (ie, Pareto optimal outcome). Can we attain any other payoffs as an equilibrium outcome of the game? Yes!

• Definition. The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game. – Eg, the prisoner’s dilemma:

•(1,1)

•(2,2)

•(0,3)

u1

u2

•(3,0)

Page 22: Repeated Games

Repeated Prisoners Dilemma

• Folk Theorem for the Prisoners Dilemma: – For any discount factor, 0<<1, the discounted average payoff of each player i in any NE of G( ,) is at least ui(D,D). I.e., players must at least get the NE payoffs of the static one-shot game.

– Let (x1,x2) be a feasible pair of payoffs in G for which xi > ui(D,D) for each player i. Then there is some < 1, such that there is a NE of G( ,) in which the discounted average payoff of each player i is xi.

– Note for any discount factor, we can always attain at least ui(D,D) as a NE of the infinitely repeated game.

Page 23: Repeated Games

Repeated Prisoners Dilemma

• Folk Theorem “Region” (or just the Folk Region) for the PD game:

•(1,1)

•(2,2)

•(0,3)

u1

u2

•(3,0)

For every point in the shaded region, as long as is high enough, we can generate those payoffs as the average discounted payoffs in a NE of the infinitely repeated game.

Page 24: Repeated Games

Repeated Prisoners Dilemma

• Generate (1.5,1.5) as the average per period payoff of G( ,).

Cooperate

Cooperate

Defect

Defect

2,2

3,0 1,1

0,3

• One way to achieve this is to alternate between (C,C), and (D,D) in each period, cooperating in odd periods and defecting in even periods.

• Grim trigger strategies (for each player): play C in odd periods, D in even periods and play D in all periods if any player has deviated from “C in odd periods and D in even periods.”

Page 25: Repeated Games

Repeated Prisoners Dilemma

• Generate (1.5,1.5) as the average per period payoff of G( ,).

Equilibrium path payoffs:

2 3 4 5

2 2

0 0

2 2

2

2 1 2 1 2 1 ...

2* 1*

2 / (1 ) / (1 )

(2 ) / (1 )

ei

t t

t t

Deviation path payoffs:2 3

0

3 ...

3

3 / (1 )

di

t

t

Page 26: Repeated Games

Repeated Prisoners Dilemma

• Generate (1.5,1.5) as the average per period payoff of G( ,). So we require:

2

2 2

2

2

2

2

(2 ) / (1 ) 3 / (1 )

2 3(1 ) (1 ) / (1 )

2 3 3 (1 )

1 2

1 2

1

2

1/ 2 0.707

e di i

Caution: dividing by a negative!

Page 27: Repeated Games

0

5

10

15

20

25

30

350.

00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

0.50

0.55

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

Agg

rega

te P

ayoff

Delta

Payoffs as a Function of Delta(Generating an average payoff of (1.5,1.5) in the Repeated PD Game)

Equilibrium Path Payoffs

Deviation Payoffs

~ 0.707

Page 28: Repeated Games

Repeated Prisoners Dilemma

• See chapter 15 of Osborne for a discussion of general repeated games. The notation is just more general but the ideas, definitions, and theorems are the same.• Two remaining topics.

– Alternating Offers Bargaining– Cournot game / Monopoly outcome using grim trigger