
1

Chapter 2

Decisions and Games

2

“Доверяй, Но Проверяй” (“Trust, but Verify”)

- Russian Proverb (Ronald Reagan)

3

Criteria for evaluating systems

• Computational efficiency
• Distribution of computation
• Communication efficiency
• Social welfare: max_outcome Σ_i u_i(outcome), where u_i is the utility for player i
• Surplus: social welfare of outcome – social welfare of status quo
  – Constant-sum games have 0 surplus. Markets are not constant sum.
• Pareto efficiency: An outcome o is Pareto efficient if there exists no other outcome o′ s.t. some agent has higher utility in o′ than in o and no agent has lower utility
  – Implied by social welfare maximization
• Individual rationality: Participating in the negotiation (or individual deal) is no worse than not participating
• Stability: No agent can increase its utility by changing its strategy (given everyone else keeps the same strategy)
• Symmetry: No agent should be inherently preferred, e.g. dictator
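These criteria can be checked mechanically for small games. Below is a minimal Python sketch (the outcome names and utility numbers are invented for illustration) that computes social welfare and tests Pareto efficiency as defined above.

```python
# Social welfare and Pareto efficiency over a finite set of outcomes.
# utilities: outcome -> tuple of utilities, one entry per player (illustrative).
utilities = {
    "A": (3, 3),
    "B": (0, 4),
    "C": (1, 1),
}

def social_welfare(outcome):
    """Sum of all players' utilities for the outcome."""
    return sum(utilities[outcome])

def pareto_dominates(o_prime, o):
    """o' Pareto-dominates o: someone strictly better off, no one worse off."""
    u, v = utilities[o], utilities[o_prime]
    return all(b >= a for a, b in zip(u, v)) and any(b > a for a, b in zip(u, v))

def is_pareto_efficient(o):
    """No other outcome Pareto-dominates o."""
    return not any(pareto_dominates(o2, o) for o2 in utilities if o2 != o)

print(max(utilities, key=social_welfare))            # welfare-maximizing outcome
print([o for o in utilities if is_pareto_efficient(o)])
```

Note that the welfare-maximizing outcome is always among the Pareto-efficient ones, matching the "implied by social welfare maximization" bullet.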

4

The term Pareto efficient…

• The term Pareto efficient is named after Vilfredo Pareto, an Italian economist who used the concept in his studies of economic efficiency and income distribution.

• If an economic system is not Pareto efficient, then some individual can be made better off without anyone being made worse off. It is commonly accepted that such inefficient outcomes are to be avoided, and therefore Pareto efficiency is an important criterion for evaluating economic systems and political policies.

• He is also the one credited with the 80/20 rule describing the unequal distribution of wealth in his country: he observed that twenty percent of the people owned eighty percent of the wealth.

5

Strategic Form Game

• A game: formal representation of a situation of strategic interdependence
  – Set of players, I, with |I| = n
  – Each agent, j, has a set of actions, A_j
    • AKA strategy set
  – Actions define outcomes
    • AKA strategic combination
    • For each possible set of actions, there is an outcome.
  – Outcomes define payoffs
    • Agents derive utility from different outcomes

6

Normal form game* (matching pennies)

                    Agent 2
                  H          T
Agent 1   H     -1, 1       1, -1
          T      1, -1     -1, 1

*aka strategic form, matrix form

Each pair of actions determines an outcome; each cell lists the payoffs (agent 1, agent 2) for that outcome.

7

Extensive form game (matching pennies)

[Game tree: Player 1 moves H or T; Player 2 then moves H or T, with both of Player 2's nodes in one information set. Terminal nodes (outcomes) carry payoffs (player 1, player 2): (H,H)→(-1,1), (H,T)→(1,-1), (T,H)→(1,-1), (T,T)→(-1,1).]

Player 2 doesn't know what has been played, so he doesn't know which node he is at. How fair would it be to say, “Let's play matching pennies. You go first.”?

8

Strategies

• Strategy:
  – A strategy, s_j, is a complete contingency plan; it defines the actions agent j should take for all possible states of the world
• Strategy profile: s = (s_1, …, s_n)
  – s_-i = (s_1, …, s_{i-1}, s_{i+1}, …, s_n)
• Utility function: u_i(s)
  – Note that the utility of an agent depends on the strategy profile, not just its own strategy
  – We assume agents are expected utility maximizers

9

Normal form game* (matching pennies)

                    Agent 2
                  H          T
Agent 1   H     -1, 1       1, -1
          T      1, -1     -1, 1

*aka strategic form, matrix form

Strategy for agent 1: H
Strategy for agent 2: T
Strategy profile: (H,T)
U1((H,T)) = 1, U2((H,T)) = -1

10

Extensive form game (matching pennies, sequential moves)

[Same game tree as before, but Player 2 now observes Player 1's move, so each of Player 2's nodes is its own information set. Payoffs: (H,H)→(-1,1), (H,T)→(1,-1), (T,H)→(1,-1), (T,T)→(-1,1).]

Recall: A strategy is a contingency plan for all states of the game. Now we have different states to worry about.

Strategy for agent 1: T
Strategy for agent 2: (H,T), meaning H if 1 plays H, T if 1 plays T. (The first value is associated with a specific move of the other player.)
Strategy profile: (T,(H,T))
U1((T,(H,T))) = -1
U2((T,(H,T))) = 1

11

Dominant Strategies

• Recall that
  – Agents' utilities depend on what strategies other agents are playing
  – Agents are expected utility maximizers
• Agents will play best-response strategies (if they exist)
• s_i* is a best response to s_-i if u_i(s_i*, s_-i) ≥ u_i(s_i', s_-i) for all s_i'
• A dominant strategy for player i is a strategy that is a best response to all s_-i
  – They do not always exist
  – Inferior strategies are called dominated

12

Dominant Strategy Equilibrium

• A dominant strategy equilibrium is a strategy profile in which each player's strategy is dominant (so no one wants to change):

  s* = (s*_1, …, s*_n)
  u_i(s*_i, s_-i) ≥ u_i(s'_i, s_-i) for all i, for all s'_i, for all s_-i

• Known as the “DUH” strategy.
• Nice: Agents do not need to counterspeculate (reciprocally reason about what others will do)!

13

Prisoners’ dilemma

                              Kelly
                    Confess        Don’t Confess
Ned   Confess      -10, -10         0, -30
      Don’t        -30, 0          -1, -1

Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.

14

Prisoners’ dilemma

                              Kelly
                    Confess        Don’t Confess
Ned   Confess      -10, -10         0, -30
      Don’t        -30, 0          -1, -1

Note that no matter what Ned does, Kelly is better off if she confesses than if she does not confess. So ‘confess’ is a dominant strategy from Kelly’s perspective. We can predict that she will always confess.
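The dominance argument can be verified mechanically. A minimal sketch over the payoff matrix above (the labels C/D are shorthand for Confess/Don't Confess):

```python
# Verify that Confess strictly dominates Don't Confess for both players
# in the prisoners' dilemma, whatever the other player does.
# payoffs[(ned_action, kelly_action)] = (ned_payoff, kelly_payoff)
payoffs = {
    ("C", "C"): (-10, -10), ("C", "D"): (0, -30),
    ("D", "C"): (-30, 0),   ("D", "D"): (-1, -1),
}

def strictly_dominates(a, b, player_index):
    """a strictly dominates b for that player against every opponent action."""
    for opp in ("C", "D"):
        if player_index == 1:  # Kelly is the column player
            if payoffs[(opp, a)][1] <= payoffs[(opp, b)][1]:
                return False
        else:                  # Ned is the row player
            if payoffs[(a, opp)][0] <= payoffs[(b, opp)][0]:
                return False
    return True

print(strictly_dominates("C", "D", 1))  # Kelly: Confess dominates Don't -> True
print(strictly_dominates("C", "D", 0))  # the same holds for Ned -> True
```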

15

Prisoners’ dilemma

                              Kelly
                    Confess        Don’t Confess
Ned   Confess      -10, -10         0, -30
      Don’t        -30, 0          -1, -1

The same holds for Ned.

16

Prisoners’ dilemma

                              Kelly
                    Confess        Don’t Confess
Ned   Confess      -10, -10         0, -30
      Don’t        -30, 0          -1, -1

So the only outcome that involves each player choosing their dominant strategy is where they both confess.

Solve by iterated elimination of dominated strategies.

17

Example: Prisoner’s Dilemma

• Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.

• (Actual numbers vary in different versions of the problem, but the relative values are the same.)

                    Confess        Don’t Confess
Confess            -10, -10         0, -30
Don’t Confess      -30, 0          -1, -1

The dominant strategy equilibrium (Confess, Confess) is not Pareto optimal; the optimal outcome (Don’t Confess, Don’t Confess) is Pareto optimal.

18

Iterated Elimination of Dominated Strategies

• Let R_i ⊆ S_i be the set of removed strategies for agent i
• Initially R_i = Ø
• Choose agent i and strategy s_i ∈ S_i \ R_i such that there exists s_i' ∈ S_i \ R_i with

  u_i(s_i', s_-i) > u_i(s_i, s_-i) for all s_-i ∈ S_-i \ R_-i

• Add s_i to R_i, continue
• Thm: If a unique strategy profile, s*, survives iterated elimination, then it is a Nash Eq.
• Thm: If a profile, s*, is a Nash Eq, then it must survive iterated elimination.
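The procedure above is easy to implement for two-player games in bimatrix form. A minimal sketch (strict dominance only), applied to the “simple competition game” on the next slide, where only (Medium, Medium) survives:

```python
# Iterated elimination of strictly dominated strategies for a bimatrix game.
# A[i][j], B[i][j]: payoffs to players 1 and 2 when 1 plays row i, 2 plays col j.
def iterated_elimination(A, B):
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # Remove any row strictly dominated by another surviving row.
        for r in rows[:]:
            if any(all(A[r2][c] > A[r][c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # Remove any column strictly dominated by another surviving column.
        for c in cols[:]:
            if any(all(B[r][c2] > B[r][c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# The simple competition game; index 0, 1, 2 = High, Medium, Low.
A = [[60, 36, 36], [70, 50, 30], [35, 35, 25]]  # Pierce's payoffs
B = [[60, 70, 35], [36, 50, 35], [36, 30, 25]]  # Donna's payoffs
print(iterated_elimination(A, B))  # ([1], [1]): only (Medium, Medium) survives
```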

19

A simple competition game

                              Donna
                    High        Medium      Low
Pierce   High      60, 60      36, 70     36, 35
         Medium    70, 36      50, 50     30, 35
         Low       35, 36      35, 30     25, 25

Note – no player has a dominant strategy. But Low is dominated for both players. So we can predict that neither will play Low.

20

A simple competition game

                              Donna
                    High        Medium
Pierce   High      60, 60      36, 70
         Medium    70, 36      50, 50

Once we have removed Low, Medium is now a dominant strategy. So we predict that both Pierce and Donna will play Medium.

21

Example – Zero Sum (We divide the same cake. If I lose, you win.) Bimatrix form

• Cake slicing
• Two players
  – cutter
  – chooser

Cutter's utility:
                        Choose bigger piece    Choose smaller piece
Cut cake evenly         ½ - a bit              ½ + a bit
Cut unevenly            Small piece            Big piece

Chooser's utility:
                        Choose bigger piece    Choose smaller piece
Cut cake evenly         ½ + a bit              ½ - a bit
Cut unevenly            Big piece              Small piece

22

Rationality

• Rationality
  – each player will take the highest-utility option
  – taking into account the other player's likely behavior
• In the example
  – if the cutter cuts unevenly
    • he might like to end up in the lower right (+10)
    • but the other player would never do that, so he would end up at -10
  – if the cutter cuts evenly
    • he will end up in the upper left, at -1
• This is a stable outcome
  – neither player has an incentive to deviate

Payoffs (cutter, chooser):
                        Choose bigger piece    Choose smaller piece
Cut cake evenly         (-1, +1)               (+1, -1)
Cut unevenly            (-10, +10)             (+10, -10)

23

Classic Examples

• Car Dealers
  – Why are they always next to each other?
  – Why aren't they spaced equally around town?
    • Optimal in the sense of not drawing customers to the competition
• Equilibrium
  – because to move away from the competitor is to cede some customers to it

Car Dealer:
            close      far
close       4,4        6,3
far         3,6        5,5

24

Decision Tree

• Examines game interactions over time
• Each node
  – is a unique game state
• Player choices
  – create branches
• Leaves
  – end of game (win/lose)
• Important concept for design
  – usually at abstract level
• Example
  – tic-tac-toe

25

Example: Bach or Stravinsky

• A couple likes going to concerts together. One loves Bach but not Stravinsky. The other loves Stravinsky but not Bach. However, they prefer being together to being apart.

            B         S
B          2,1       0,0
S          0,0       1,2

No dominant strategy equilibrium

26

Nash Equilibrium

• Sometimes an agent's best response depends on the strategies other agents are playing
  – No dominant strategy equilibria
• A strategy profile is a Nash equilibrium if no player has an incentive to deviate from his strategy, given that the others do not deviate:
  – for every agent i, u_i(s_i*, s_-i) ≥ u_i(s_i', s_-i) for all s_i'
• Need to know that others are playing a fixed choice

            B         S
B          2,1       0,0
S          0,0       1,2

27

Example: Mozart Mahler

• A couple likes going to concerts together. Both prefer Mozart. There are two Nash equilibria. (Mozart, Mozart) is better, but a Nash equilibrium also exists at (Mahler, Mahler).

            Mozart    Mahler
Mozart     2,2       0,0
Mahler     0,0       1,1

28

Example – Rock, scissors, paper

• Players – Ernie and Bert
• Strategies – Rock, Scissors, Paper
• Payoffs
  – If both choose the same strategy, neither wins.
  – If one chooses rock and the other chooses scissors, then rock wins $1 from the other.
  – If one chooses rock and the other chooses paper, then paper wins $1 from the other.
  – If one chooses paper and the other chooses scissors, then scissors wins $1 from the other.

29

Example – Rock, scissors, paper

                              Bert
                  Rock       Scissors    Paper
Ernie  Rock       0,0         1,-1       -1,1
       Scissors  -1,1         0,0         1,-1
       Paper      1,-1       -1,1         0,0

No pure-strategy Nash equilibrium

30

Example: Hawk Dove

• Two animals fight over prey. The best outcome is for one to act like a Hawk and the other to act like a Dove. There are two Nash equilibria.

            Dove      Hawk
Dove       3,3       1,4
Hawk       4,1       0,0

31

Solutions to simultaneous games

If there is no unique solution in dominant/dominated strategies then we use ‘mutual best response analysis’ to find a Nash equilibrium.

An outcome is a Nash equilibrium, if each player -- holding the choices of all other players as constant -- cannot do better by changing their own choice.

So where all players are playing their ‘best response’, this is a Nash equilibrium.

32

How much will we clean?

                                  Roommate 2
                      9 hours     6 hours     3 hours
Roommate   9 hours     3, 3       -2, 6       -8, 3
1          6 hours     6, -2       4, 4       -4, 2
           3 hours     3, -8       2, -4       1, 1

33

How much will we clean?

                                  Roommate 2
                      9 hours     6 hours     3 hours
Roommate   9 hours     3, 3       -2, 6       -8, 3
1          6 hours     6, -2       4, 4       -4, 2
           3 hours     3, -8       2, -4       1, 1

Best responses for Roommate 1: (best first value in each column)

34

How much will we clean?

                                  Roommate 2
                      9 hours     6 hours     3 hours
Roommate   9 hours     3, 3       -2, 6       -8, 3
1          6 hours     6, -2       4, 4       -4, 2
           3 hours     3, -8       2, -4       1, 1

Best responses for Roommate 2: (best second value in each row)

35

Best response for both: (Mutual best response)

                                  Roommate 2
                      9 hours     6 hours     3 hours
Roommate   9 hours     3, 3       -2, 6       -8, 3
1          6 hours     6, -2       4, 4       -4, 2
           3 hours     3, -8       2, -4       1, 1

Two Nash equilibria: (6 hours, 6 hours) and (3 hours, 3 hours).
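Mutual best response analysis can be automated. A minimal sketch that finds all pure-strategy Nash equilibria of the roommate game above (indices 0, 1, 2 correspond to 9, 6, 3 hours):

```python
# Find pure-strategy Nash equilibria by mutual best response.
# A[i][j], B[i][j]: payoffs to roommates 1 and 2; index 0, 1, 2 = 9, 6, 3 hours.
A = [[3, -2, -8], [6, 4, -4], [3, 2, 1]]   # roommate 1's payoffs
B = [[3, 6, 3], [-2, 4, 2], [-8, -4, 1]]   # roommate 2's payoffs

def pure_nash(A, B):
    eqs = []
    for i in range(len(A)):
        for j in range(len(A[0])):
            # Cell (i, j) is an equilibrium iff i is a best response to j
            # (best first value in column j) and j is a best response to i
            # (best second value in row i).
            best_row = A[i][j] >= max(A[r][j] for r in range(len(A)))
            best_col = B[i][j] >= max(B[i][c] for c in range(len(A[0])))
            if best_row and best_col:
                eqs.append((i, j))
    return eqs

labels = ["9 hours", "6 hours", "3 hours"]
print([(labels[i], labels[j]) for i, j in pure_nash(A, B)])
```

This recovers exactly the two equilibria circled on the slide: (6 hours, 6 hours) and (3 hours, 3 hours).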

36

Concepts of rationality [doing the rational thing]

• Undominated strategy (problem: too weak; can't always find a single one)
• (Weakly) dominating strategy (alias “duh?”) (problem: too strong, rarely exists)
• Nash equilibrium (or double best response) (problem: may not exist)
• Randomized (mixed) Nash equilibrium – players choose various options based on some random number (assigned via a probability)

Theorem [Nash 1950]: a randomized Nash equilibrium always exists.

37

Why is a Nash equilibrium a sensible solution?

A Nash equilibrium can be viewed as a self-reinforcing agreement (e.g. what is reasonable if players can talk before the game but cannot sign binding contracts).

A Nash equilibrium can be viewed as a consistent set of conjectures by all players recognising their strategic interdependence.

A Nash equilibrium can be viewed as the result of ‘learning’ over time.

38

Nash Equilibrium

• Interpretations:
  – Focal points, self-enforcing agreements, stable social convention, consequence of rational inference…
• Criticisms
  – They may not be unique (Bach or Stravinsky)
    • Ways of overcoming this: refinements of the equilibrium concept, mediation, learning
  – They do not exist in all games
  – They may be hard to find (if there are lots of choices)
  – People don't always behave based on what equilibria would predict (ultimatum games and notions of fairness, …)

39

Nash Equilibrium Test (for continuous choices)

• If utilities can be represented as functions u_i: S_1 × S_2 × … × S_n → R
• We can find a Nash equilibrium if each s_i* is selected to make the partial derivative with respect to s_i equal to zero. In other words:

  ∂u_i(s_1*, …, s_n*) / ∂s_i = 0

• If each s_i* is the only solution
• And

  ∂²u_i(s_1*, …, s_n*) / ∂s_i² < 0

40

Example

• u1(x,y,z) = 2xz – x²y
• u2(x,y,z) = …
• u3(x,y,z) = 2z – xyz²

• ∂u1/∂x = 2z – 2xy = 0
• ∂u2/∂y = … = 0
• ∂u3/∂z = 2 – 2xyz = 0

• Solution: (1,1,1)

41

How do we tell if a Nash Equilibrium exists?

• In a zero-sum game, we say player i maximinimizes if he chooses an action that is best for him on the assumption that player j will choose her action to hurt him as much as possible.
• A Nash equilibrium exists iff the action of each player is a maximinimizer.

42

Fixed Points

• Let a* be a profile of actions such that a*_i ∈ B_i(a*_-i), where B is the “best response” function. In other words, B_i says that if the other players' actions are known, a*_i is the best for player i.
• Fixed point theorems give conditions on B under which there exists a value a* such that a* ∈ B(a*). In other words, given what other people will do, no one will change.

43

Intuition behind Brouwer’s fixed point theorem

• Take two sheets of paper, one lying directly above the other. Draw a grid on the paper, number the gridboxes, then xerox that sheet of paper. Crumple the top sheet, and place it on top of the other sheet. You will see that at least one number is on top of the corresponding number on the lower sheet of paper. Brouwer's theorem says that there must be at least one point on the top sheet that is directly above the corresponding point on the bottom sheet.

• In dimension three, Brouwer's theorem says that if you take a cup of coffee, and slosh it around, then after the sloshing there must be some point in the coffee which is in the exact spot that it was before you did the sloshing (though it might have moved around in between). Moreover, if you tried to slosh that point out of its original position, you can't help but slosh another point back into its original position.

44

Brouwer’s fixed point theorem in dimension one

• Theorem: Let f : [0, 1] → [0, 1] be a continuous function. Then there exists a fixed point, i.e. there is an x* in [0, 1] such that f(x*) = x*.
• Proof: There are two essential possibilities:
  (i) If f(0) = 0 or f(1) = 1, then we are done.
  (ii) If f(0) ≠ 0 and f(1) ≠ 1, then define F(x) = f(x) – x. In this case:
      F(0) = f(0) – 0 = f(0) > 0
      F(1) = f(1) – 1 < 0
  So F: [0, 1] → R with F(0)·F(1) < 0. As f is continuous, F is also continuous. By the Intermediate Value Theorem, there is an x* in [0, 1] such that F(x*) = 0. By the definition of F, F(x*) = f(x*) – x* = 0, thus f(x*) = x*.
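The intermediate-value argument is effectively constructive. A small sketch that locates a fixed point of a continuous f: [0, 1] → [0, 1] by bisecting F(x) = f(x) − x; the example function cos is arbitrary (it maps [0, 1] into itself):

```python
import math

# Locate a fixed point of a continuous f: [0,1] -> [0,1] by bisecting
# F(x) = f(x) - x, exactly as in the intermediate value theorem proof.
def fixed_point(f, tol=1e-12):
    if f(0) == 0:
        return 0.0
    if f(1) == 1:
        return 1.0
    lo, hi = 0.0, 1.0          # F(lo) > 0, F(hi) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid > 0:
            lo = mid           # sign change is in the upper half
        else:
            hi = mid           # sign change is in the lower half
    return (lo + hi) / 2

x_star = fixed_point(math.cos)
print(round(x_star, 6))        # about 0.739085, where cos(x) = x
```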

45

General statement of Brouwer’s fixed point theorem

• Theorem: Any continuous function from a closed n-dimensional ball into itself must have at least one fixed point.
• Continuity of the function is essential (if you rip the paper or if you slosh discontinuously, then there may not be a fixed point).
• The closure of the ball is also essential; there exists a continuous mapping f: (0,1) → (0,1) with no fixed points.
• The round shape of the ball is not essential; one can replace it by any shape obtained by a continuous deformation of the ball. However, one cannot replace it by something with ‘holes’, like a donut shape.

46

Applications of Brouwer’s fixed point theorem

• Topology is a branch of pure mathematics devoted to the shape of objects. It ignores issues like size and angle, which are important in geometry.

• For this reason, it is sometimes called rubber-sheet geometry.

• One important problem in topology is the study of the conditions under which any transformation of a certain domain has a point that remains fixed.

• Fixed point theorems are some of the most important theorems in all of mathematics. Among other applications, they are used to show the existence of solutions to differential equations, as well as the existence of equilibria in game theory.

• The Brouwer fixed point theorem was a main mathematical tool in John Nash’s papers, for which he won a Nobel prize in economics.

47

History• Brouwer was a major contributor to the

theory of topology. He did almost all his work in topology between 1909 and 1913. He discovered characterizations of topological mappings of the Cartesian plane and a number of fixed point theorems.

• He later rejected many of his results, as being “non-constructive”.

• Brouwer founded the doctrine of mathematical intuitionism, in which a nonconstructive argument cannot be accepted as proof of existence.

• He gave grounds to reject the law of excluded middle (proof by contradiction), which many logicians had taken to be true for all statements, going back a millennium or two.

• Intuitionistic logic does not permit the inference:

• not(not(p)) => (p)

Luitzen Egbertus Jan Brouwer

Born: Feb 27, 1881 in Netherlands

Died: Dec 2, 1966 in Netherlands

48

Mixed strategy equilibria

• σ_i defines a probability distribution over S_i; σ_i(s_j) is the probability that player i selects strategy s_j
• (0, 0, …, 1, 0, …, 0) is a pure strategy
• Strategy profile: σ = (σ_1, …, σ_n)
• Expected utility: u_i(σ) = Σ_{s∈S} (Π_j σ_j(s_j)) u_i(s)
  (chance the combination occurs times utility)
• Nash Equilibrium: σ* is a (mixed) Nash equilibrium if

  u_i(σ*_i, σ*_-i) ≥ u_i(σ_i, σ*_-i) for all σ_i ∈ Σ_i, for all i

49

Example: Matching Pennies — no pure strategy Nash Equilibrium

            H         T
H         -1, 1      1,-1
T          1,-1     -1, 1

So far we have talked only about pure strategy equilibria [I make one choice.].

Not all games have pure strategy equilibria. Some equilibria are mixed strategy equilibria.

50

Example: Matching Pennies

            q: H      1-q: T
p: H      -1, 1       1,-1
1-p: T     1,-1      -1, 1

Want to play each strategy with a certain probability. If player 1 is optimally mixing strategies, player 1 is indifferent to what player 2 does. Compute the expected utility given each pure possibility of the other player.

51

• If player 1 picks heads: -q + (1-q)
• If player 1 picks tails: q + -(1-q)
• We want player 1 not to care about his own choices: -q + (1-q) = q + -1 + q
• 1 - 2q = 2q - 1, so q = 1/2
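The calculation above can be verified numerically. A small sketch of player 1's expected payoff against player 2's mix q; only q = 1/2 equalizes the two pure strategies:

```python
# Player 1's expected payoff in matching pennies when player 2 plays H
# with probability q (payoffs from the matrix: (H,H) = -1, (H,T) = 1, etc.).
def u1(own, q):
    if own == "H":
        return -1 * q + 1 * (1 - q)
    return 1 * q + -1 * (1 - q)

for q in (0.25, 0.5, 0.75):
    print(q, u1("H", q), u1("T", q))
# Only at q = 1/2 are the two pure strategies equally good for player 1.
```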

52

Example: Bach/Stravinsky

            q: B      1-q: S
p: B       2, 1      0,0
1-p: S     0,0       1, 2

Want to play each strategy with a certain probability. If player 1 is optimally mixing strategies, player 2 is indifferent to what player 2 does. Compute the expected utility given each pure possibility of your own.

p = 2(1-p), so p = 2/3   (player 1 is optimally mixing)
2q = (1-q), so q = 1/3   (player 2 is optimally mixing)

53

“I Used to Think I Was Indecisive – But Now I’m Not So Sure”
– Anonymous

54

Mixed Strategies

• Unreasonable predictors of one-time human interaction
• Reasonable predictors of long-term proportions

55

Employee Monitoring

• Employees can work hard or shirk
  – Salary: $100K unless caught shirking
  – Cost of effort: $50K
• Managers can monitor or not
  – Value of employee output: $200K
  – Profit if employee doesn’t work: $0
  – Cost of monitoring: $10K

56

Employee Monitoring

                              Manager
                    Monitor         No Monitor
Employee   Work     50, 90          50, 100
           Shirk     0, -10        100, -100

• Best replies do not correspond
• No equilibrium in pure strategies
• What do the players do?

57

Mixed Strategies

• Randomize – surprise the rival
• Mixed Strategy:
  – Specifies that an actual move be chosen randomly from the set of pure strategies with some specific probabilities.
• Nash Equilibrium in Mixed Strategies:
  – A probability distribution for each player
  – The distributions are mutual best responses to one another in the sense of expectations

58

Finding Mixed Strategies

• Suppose:
  – The employee chooses (shirk, work) with probabilities (p, 1-p)
  – The manager chooses (monitor, no monitor) with probabilities (q, 1-q)
• Find expected payoffs for each player
• Use these to calculate best responses

59

Employee’s Payoff

• First, find the employee’s expected payoff from each pure strategy
• If the employee works: receives 50
  Profit(work) = 50q + 50(1-q) = 50
• If the employee shirks: receives 0 or 100
  Profit(shirk) = 0q + 100(1-q) = 100 – 100q

60

Employee’s Best Response

• Next, calculate the best strategy for each possible strategy of the opponent
• For q < 1/2: SHIRK
  Profit(shirk) = 100 – 100q > 50 = Profit(work)
• For q > 1/2: WORK
  Profit(shirk) = 100 – 100q < 50 = Profit(work)
• For q = 1/2: INDIFFERENT
  Profit(shirk) = 100 – 100q = 50 = Profit(work)

61

Manager’s Best Response

• u2(mntr) = 90(1-p) – 10p
• u2(no m) = 100(1-p) – 100p
• For p < 1/10: NO MONITOR
  u2(mntr) = 90 – 100p < 100 – 200p = u2(no m)
• For p > 1/10: MONITOR
  u2(mntr) = 90 – 100p > 100 – 200p = u2(no m)
• For p = 1/10: INDIFFERENT
  u2(mntr) = 90 – 100p = 100 – 200p = u2(no m)

62

Cycles

[Plot: best responses in (q, p) space, with q = probability of monitoring on [0, 1] (threshold 1/2 marked) and p = probability of shirking on [0, 1] (threshold 1/10 marked). For q < 1/2 the employee shirks and for q > 1/2 he works; for p < 1/10 the manager doesn't monitor and for p > 1/10 she monitors — so pure best replies cycle.]

63

Mutual Best Replies

[Same plot: the two best-response correspondences intersect only at q = 1/2, p = 1/10 — the mutual best reply.]

64

Mixed Strategy Equilibrium

• Employees shirk with probability 1/10
• Managers monitor with probability 1/2
• Expected payoff to employee (chance of each of four outcomes × payoff from each):

  (1/10)[(1/2)·0 + (1/2)·100] + (9/10)[(1/2)·50 + (1/2)·50] = 50

• Expected payoff to manager:

  (1/2)[(9/10)·90 + (1/10)·(-10)] + (1/2)[(9/10)·100 + (1/10)·(-100)] = 80

65

Properties of Equilibrium

• Both players are indifferent between any mixture over their strategies
• E.g. the employee:
  – If shirk: (1/2)·0 + (1/2)·100 = 50
  – If work: (1/2)·50 + (1/2)·50 = 50
• Regardless of what the employee does, the expected payoff is the same

66

Use Indifference to Solve I

            q: Monitor      1-q: No Monitor
Work        50, 90          50, 100          → 50q + 50(1-q)
Shirk        0, -10        100, -100         → 0q + 100(1-q)

50q + 50(1-q) = 0q + 100(1-q)
50 = 100 – 100q
100q = 50, so q = 1/2

67

Use Indifference to Solve II

              Monitor          No Monitor
1-p: Work     50, 90           50, 100
p: Shirk       0, -10         100, -100
              → 90(1-p) – 10p  → 100(1-p) – 100p

90(1-p) – 10p = 100(1-p) – 100p
90 – 100p = 100 – 200p
100p = 10, so p = 1/10
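Both indifference equations have the same closed form, so they can be solved in one helper. A sketch for a generic 2×2 game, applied to the monitoring game (exact arithmetic via fractions; p is the shirk probability, q the monitor probability):

```python
from fractions import Fraction as F

# A, B: payoffs to the row (employee) and column (manager) player.
# Row 0 = Work, row 1 = Shirk; column 0 = Monitor, column 1 = No Monitor.
A = [[F(50), F(50)], [F(0), F(100)]]      # employee
B = [[F(90), F(100)], [F(-10), F(-100)]]  # manager

def mix_2x2(A, B):
    # q: probability of column 0, chosen to make the row player indifferent:
    #   A00*q + A01*(1-q) = A10*q + A11*(1-q)
    q = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # p: probability of row 1, chosen to make the column player indifferent:
    #   B00*(1-p) + B10*p = B01*(1-p) + B11*p
    p = (B[0][0] - B[0][1]) / (B[0][0] - B[0][1] + B[1][1] - B[1][0])
    return p, q

p, q = mix_2x2(A, B)
print(p, q)  # shirk with probability 1/10, monitor with probability 1/2
```

The closed forms are just the two indifference equations above solved for p and q; the helper assumes an interior (fully mixed) equilibrium exists, as it does here.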

68

Indifference

              1/2: Monitor    1/2: No Monitor
9/10: Work    50, 90          50, 100           → 50
1/10: Shirk    0, -10        100, -100          → 50
              → 80            → 80

The employee’s expected payoff from either row is 50; the manager’s expected payoff from either column is 80.

69

Upsetting?

• This example is upsetting, as it appears to tell you, as workers, to shirk.
• Think of it from the manager’s point of view, assuming you have unmotivated (or unhappy) workers.
• A better option would be to hire dedicated workers, but if you have people who are trying to cheat you, this gives a reasonable response.
• Sometimes you are dealing with individuals who just want to beat the system. In that case, you need to play their game. For example, people who try to beat the IRS.
• On the positive side, even if you have dishonest workers, if you get too paranoid about monitoring their work, you lose! This theory tells you to lighten up!
• This theory might be applied to criticising your friend or setting up rules/punishment for your (future?) children.

70

Why Do We Mix?

• Since a player does not care what mixture she uses, she picks the mixture that will make her opponent indifferent!

COMMANDMENT: Use the mixed strategy that keeps your opponent guessing.

71

Mixed Strategy Equilibria

• Anyone for tennis?
  – Should you serve to the forehand or the backhand?

72

Tennis Payoffs

                            Server's Aim
                        Forehand    Backhand
Receiver's   Forehand   90, 10      20, 80
Move         Backhand   30, 70      60, 40

73

Zero Sum Game (or fixed sum)

If you win (the points), I lose (the points). AKA: strictly competitive.

                                Server's Aim
                            Forehand (X)   Backhand (1-X)
Receiver's   Forehand (Y)       90             20
Move         Backhand (1-Y)     30             60

74

Solving for Server’s Optimal Mix

• What would happen if the server always served to the forehand?
  – A rational receiver would always anticipate forehand, and 90% of the serves would be successfully returned.

75

Solving for Server’s Optimal Mix

• What would happen if the server aimed at the forehand 50% of the time and the backhand 50% of the time, and the receiver always guessed forehand?
  – (0.5 × 0.9) + (0.5 × 0.2) = 0.55 successful returns

76

Solving for Server’s Optimal Mix

• What is the best mix for each player?

77

% of Successful Returns Given Server and Receiver Actions

% of serves aimed    Receiver anticipates    Receiver anticipates
at forehand          forehand                backhand
0                    20                      60
20                   34                      54
50                   55                      45
70                   69                      39
100                  90                      30

Where would you shoot, knowing the other player will respond to your choices? In other words, you pick the row but will likely get the smaller value in the row.

78

% of Successful Returns Given Server and Receiver Actions

• If 20% of the serves are aimed at the forehand and the receiver is anticipating forehand then the % of successful returns is:

– (0.2 * 0.9) + (0.8 * 0.2) = 0.34

– Therefore, 34% of the serves are returned successfully.

79

% of Successful Returns Given Server and Receiver Actions

• More generally, when the receiver anticipates forehand the % of successful returns is defined by:

– X = % of serves aimed at forehand– 1-X = % of serves aimed at backhand

– % of Successful Returns = 0.90X + 0.20(1-X)

80

Server’s Point of View
(Note: a high number of returns is bad for the server!)

[Plot: Y = % of successful returns against X = % of serves aimed at the forehand. When the receiver anticipates forehand: Y = 0.9X + 0.2(1-X), rising from 20 at X = 0 to 90 at X = 1.]

81

Server’s Point of View

[Plot: when the receiver anticipates backhand: Y = 0.3X + 0.6(1-X), falling from 60 at X = 0 to 30 at X = 1.]

82

Server’s Point of View

[Plot: both lines together — “receiver anticipates forehand” rising from 20 to 90, “receiver anticipates backhand” falling from 60 to 30; they cross at X = 0.4.]

83

Envision This

• Envision this in 3-space: this is the payoff function for the server.
• These two lines are cross sections with the planes q = 0 and q = 1 (where q is the probability of the receiver planning on forehand).
• You are taking the derivative with respect to q and looking for a partial derivative of zero.
• The point where these lines cross in 2-space is a stationary line in 3-space.

84

Best Response

• Where can the server minimize the receiver’s maximum payoff?

85

Solving for Mixed Strategy Equilibrium

• Set the linear equations equal to each other and solve:
  0.9X + 0.2(1-X) = 0.3X + 0.6(1-X)
  X = 0.40

86

Solving for Mixed Strategy Equilibrium

• If the server mixes his serves 40% forehand / 60% backhand, the receiver is indifferent between anticipating forehand and anticipating backhand because her payoff (% of successful returns) is the same.

87

Solving for the Optimal Mix

• Now we have to do the same thing from the receiver’s point of view to determine how often the receiver should anticipate forehand/backhand.

• In equilibrium, if player A is optimally mixing then player B is indifferent to the action player B selects. If a player is not optimally mixing then he can be taken advantage of by his opponent. This fact allows us to easily solve for the optimal mix in zero sum, 2x2 games.

88

Zero Sum Game: assume both optimally mix

                                Server's Aim
                            Forehand (X)   Backhand (1-X)
Receiver's   Forehand (Y)       90             20
Move         Backhand (1-Y)     30             60

89

Receiver’s Optimal Mix

• If the receiver is optimally mixing her anticipation of forehand (Y) and backhand (1-Y), then the server is indifferent between aiming forehand/backhand because his payoff is the same.
• y(10) + (1-y)70 = y(80) + (1-y)40
• 10y + 70 – 70y = 80y + 40 – 40y
• y = 30/100

90

Receiver’s Optimal Mix

• This means that if the receiver is optimally mixing then the server’s payoff for aiming forehand is equal to his payoff for aiming backhand.

91

Similarly, Server’s Optimal Mix

• If the server is optimally mixing her forehand (X) and backhand (1-X), then the receiver is indifferent between anticipating forehand/backhand because her payoff is the same.
• Solving for X:
  90X + 20(1-X) = 30X + 60(1-X)
  90X + 20 – 20X = 30X + 60 – 60X
  70X + 20 = -30X + 60
  X = 0.40
• Thus the server should serve forehand 40% of the time and backhand 60%.
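Both tennis calculations follow the same indifference pattern and can be checked in a few lines (a sketch; M holds the return percentages from the table above):

```python
# Optimal mixes in the zero-sum tennis game.
# M[i][j]: % of successful returns when the receiver anticipates i and the
# server aims j (index 0 = forehand, 1 = backhand).
M = [[90, 20], [30, 60]]

# Server picks X (probability of aiming forehand) so the receiver is
# indifferent between anticipating forehand and backhand:
#   M[0][0]X + M[0][1](1-X) = M[1][0]X + M[1][1](1-X)
X = (M[1][1] - M[0][1]) / (M[0][0] - M[0][1] - M[1][0] + M[1][1])

# Receiver picks Y (probability of anticipating forehand) so the server is
# indifferent between aiming forehand and backhand:
#   M[0][0]Y + M[1][0](1-Y) = M[0][1]Y + M[1][1](1-Y)
Y = (M[1][1] - M[1][0]) / (M[0][0] - M[1][0] - M[0][1] + M[1][1])

print(X, Y)  # 0.4 and 0.3, matching the slides
```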

92

Computing mixed strategies for two players (the book’s way)

• Write the matrix game in bimatrix form: A = [a_ij], B = [b_ij]
• Compute payoffs:

  π_1(p, q) = Σ_{i=1..m} Σ_{j=1..n} p_i q_j a_ij
  π_2(p, q) = Σ_{i=1..m} Σ_{j=1..n} p_i q_j b_ij

• Replace p_m = 1 – Σ_{i=1..m-1} p_i and q_n = 1 – Σ_{j=1..n-1} q_j
• Consider the partial derivatives of π_1 and π_2 with respect to all p_i and all q_j respectively
• Solve the system of equations with all partials set to zero

93

Example

A = | 3  0 |    B = | 1  0 |
    | 0  1 |        | 0  4 |

π_1 = 3p1q1 + p2q2 = 3p1q1 + (1-p1)(1-q1) = 1 – p1 – q1 + 4p1q1
π_2 = p1q1 + 4p2q2 = p1q1 + 4(1-p1)(1-q1) = 4 – 4p1 – 4q1 + 5p1q1
dπ_1/dp1 = -1 + 4q1, so q1 = 1/4
dπ_2/dq1 = -4 + 5p1, so p1 = 4/5
So the strategies are ((4/5, 1/5), (1/4, 3/4)).

94

Example 2

A = |  3  -1 |    B = | 1  0 |
    | -2   1 |        | 0  4 |

π_1 = 3p1q1 – p1q2 – 2p2q1 + p2q2
    = 3p1q1 – p1(1-q1) – 2(1-p1)q1 + (1-p1)(1-q1)
    = 1 + 7p1q1 – 2p1 – 3q1
π_2 = p1q1 + 4p2q2 = p1q1 + 4(1-p1)(1-q1) = 4 – 4p1 – 4q1 + 5p1q1
dπ_1/dp1 = -2 + 7q1, so q1 = 2/7
dπ_2/dq1 = -4 + 5p1, so p1 = 4/5
So the strategies are ((4/5, 1/5), (2/7, 5/7)).

95

Tennis Example

A = | 90  20 |    B = | 10  80 |
    | 30  60 |        | 70  40 |

π_1 = 90pq + 20p(1-q) + 30(1-p)q + 60(1-p)(1-q) = 60 + 100pq – 40p – 30q
π_2 = 10pq + 80p(1-q) + 70(1-p)q + 40(1-p)(1-q) = 40 + 40p + 30q – 100pq
dπ_1/dp = 100q – 40, so q = .4
dπ_2/dq = -100p + 30, so p = .3
So the strategies are ((.3, .7), (.4, .6)).

96

Mixed Nash Equilibrium

• Thm (Nash 50):
  – Every game in which the strategy sets S_1, …, S_n have a finite number of elements has a mixed strategy equilibrium.
• Finding a Nash equilibrium is another problem
  – “Together with factoring, the complexity of finding a Nash Eq is, in my opinion, the most important concrete open question on the boundary of P today” (Papadimitriou)

97

The critique of mixed Nash

• Is it really rational to randomize? (cf: bluffing in poker, IRS audits)
• If (x, y) is a Nash equilibrium, then any y' with the same support (set of choices by the other player) is as good as y.
• Convergence/learning results are mixed
• There may be too many Nash equilibria

98

Consider: Bach or Stravinsky

            B (Y)     S (1-Y)
B (X)      2,1       0,0
S (1-X)    0,0       1,2

No dominant strategy equilibrium.

• If the other player is optimally mixing, my payoffs are the same, so 2Y = 1(1-Y); Y = 1/3
• 1X = 2(1-X); X = 2/3

99

Best Response Function

• If 0 < Y < 1/3, then player 1’s best response is X = 0.
• If Y = 1/3, then ALL of player 1’s responses are best responses.
• If Y > 1/3, then player 1’s best response is X = 1.
• Using excel, prove this to yourself!

100

p      q      player 1   player 2
0.1    0.1    0.83       1.63
0.1    0.2    0.76       1.46
0.1    0.3    0.69       1.29
0.1    0.4    0.62       1.12
0.1    0.5    0.55       0.95
0.1    0.6    0.48       0.78
0.1    0.7    0.41       0.61
0.1    0.8    0.34       0.44
0.1    0.9    0.27       0.27
0.1    1      0.2        0.1

p      q      player 1   player 2
0.67   0.1    0.4338     0.67
0.67   0.2    0.5336     0.67
0.67   0.3    0.6334     0.67
0.67   0.4    0.7332     0.67
0.67   0.5    0.833      0.67
0.67   0.6    0.9328     0.67
0.67   0.7    1.0326     0.67
0.67   0.8    1.1324     0.67
0.67   0.9    1.2322     0.67
0.67   1      1.332      0.67

p      q      player 1   player 2
0.1    0.33   0.669      1.24
0.2    0.33   0.668      1.14
0.3    0.33   0.667      1.04
0.4    0.33   0.666      0.94
0.5    0.33   0.665      0.84
0.6    0.33   0.664      0.73
0.7    0.33   0.663      0.63
0.8    0.33   0.662      0.53
0.9    0.33   0.661      0.43
1      0.33   0.66       0.33
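The spreadsheet tables above are just the expected-payoff formulas evaluated on a grid; a short Python sketch reproduces them (p is the probability player 1 plays B, q the probability player 2 plays B):

```python
# Expected payoffs in Bach or Stravinsky when player 1 plays B with
# probability p and player 2 plays B with probability q.
def u1(p, q):
    return 2 * p * q + 1 * (1 - p) * (1 - q)

def u2(p, q):
    return 1 * p * q + 2 * (1 - p) * (1 - q)

# A few rows of the first table (p fixed at 0.1):
for q in (0.1, 0.5, 0.9):
    print(0.1, q, round(u1(0.1, q), 2), round(u2(0.1, q), 2))

# At p = 2/3, player 2's payoff is the same for every q (second table):
print(round(u2(2 / 3, 0.2), 2), round(u2(2 / 3, 0.8), 2))
```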

101

Best Response Function
(The dotted line is a function only if you mentally switch the axes.)

[Plot: X (player 1’s mix) against Y (player 2’s mix). Player 1’s best response, shown as a dotted line: X = 0 for Y < 1/3, every X at Y = 1/3, X = 1 for Y > 1/3. The fixed point — where the best response functions intersect, at (X, Y) = (2/3, 1/3) — is the Nash equilibrium.]

102

Repeated games

• A repeated game involves the same players playing the same simultaneous move game over and over again.
• For example, Ned and Kelly play the prisoners’ dilemma 10 times.
• The simultaneous move game that is repeated is called the ‘stage game’ of the repeated game.
• Repeated games may be finite (definite ‘last round’) or infinite (in theory may go on forever).
• Competition between firms is often like an infinitely repeated prisoners’ dilemma game.

103

Repeated Interaction

• Review
– Simultaneous games
• Put yourself in your opponent’s shoes
• Iterative reasoning

• Outline:
– What if interaction is repeated?
– What strategies can lead players to cooperate?

104

The Prisoner’s Dilemma (Different numbers, same relationship)

Consider profits (in $K) based on the price of toothpaste.

                      Firm 2
                 Low        High
Firm 1   Low     54 , 54    72 , 47
         High    47 , 72    60 , 60

Equilibrium: $54 K
Cooperation: $60 K

105

Prisoner’s Dilemma

• Private rationality → collective irrationality

• The equilibrium that arises from using dominant strategies is worse for every player than the outcome that would arise if every player used her dominated strategy instead

• Goal: to sustain the mutually beneficial cooperative outcome, overcoming incentives to cheat (if you have agreed beforehand what you will do)

106

Moving Beyond the Prisoner’s Dilemma
• Why does the dilemma occur?
– Interaction
• No fear of punishment
• Short term or myopic play
– Firms:
• Lack of monopoly power – can’t force others to pick the cooperative choice.
• Homogeneity in products and costs – if all the same, customers can easily buy from a different firm.
• Overcapacity – if firms can produce more without increased cost, incentives change.
• Incentives for profit or market share – a firm desperate for market share may accept a lower payoff initially. The WalMart strategy.

107

Moving Beyond the Prisoner’s Dilemma
• Why does the dilemma occur?
– Consumers
• Price sensitive – want cheaper regardless of quality.
• Price aware – know the real value and are unwilling to pay more.
• Low switching costs – can switch between brands easily as prices fluctuate.

108

Solution - Altering Interaction
• Interaction
– No fear of punishment
• Exploit repeated play
– Short term or myopic play
• Introduce repeated encounters
• Introduce uncertainty – not sure when interaction will end

109

Finite Interaction (Silly Theoretical Trickery)
• Suppose the market relationship lasts for only T periods
• Use backward induction (rollback)
• Tth period: no incentive to cooperate
– No future loss to worry about in the last period
• (T-1)th period: no incentive to cooperate
– No cooperation in the Tth period in any case
– No opportunity cost to cheating in period T-1
• Unraveling: the logic goes back to period 1

110

Finite Interaction

• Cooperation is impossible if the relationship between players is for a fixed and known length of time.

• But, people think forward (what will my opponent do?) if…
– Game length uncertain
– Game length unknown
– Game length too long to think to the end

111

Finite Interaction (Theoretical Aside)

• Unraveling prevents cooperation if the number of periods is fixed and known

• Probabilistic termination
– The “game” continues to the next period with some probability p:
• Equivalent to an infinite game
– $1 next year is worth 1/(1+r) now
– Value of future = { value if there is a future } × { probability of a future }
– Effective interest rate: r’ = (1+r)/p - 1
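Assuming the standard reconstruction of the effective-rate formula, r’ = (1+r)/p − 1, a small numeric sketch (the function name is mine):

```python
def effective_rate(r, p):
    """Effective interest rate when the game continues with probability p.
    A dollar next period is worth p/(1+r) today, so 1/(1+r') = p/(1+r)."""
    return (1 + r) / p - 1

# With a 10% interest rate and a 90% continuation probability:
r_prime = effective_rate(0.10, 0.90)
assert abs(r_prime - (1.10 / 0.90 - 1)) < 1e-12

# Uncertainty about continuation acts like a higher interest rate,
# and certain continuation (p = 1) recovers the plain rate r.
assert r_prime > 0.10
assert abs(effective_rate(0.10, 1.0) - 0.10) < 1e-12
```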

112

In the first period

                              Kelly
                    Confess        Don’t Confess
Ned  Confess        -10, -10       0, -30
     Don’t Confess  -30, 0         -1, -1

Both Ned and Kelly can predict that regardless of what happens in the first period they will both confess in the second period. Knowing this, the best thing that they can individually do in the first period is to confess.

So finite repetition doesn’t help at all!
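The argument rests on Confess being a dominant strategy in the stage game; a minimal check (the dictionary and helper names are mine, with the Ned/Kelly payoffs from the slide, prison years written as negative utility):

```python
# 'C' = Confess, 'N' = Don't Confess; values are (Ned's payoff, Kelly's payoff).
payoffs = {('C', 'C'): (-10, -10), ('C', 'N'): (0, -30),
           ('N', 'C'): (-30, 0),   ('N', 'N'): (-1, -1)}

def best_reply_ned(kelly_action):
    """Ned's payoff-maximizing action given Kelly's action."""
    return max(['C', 'N'], key=lambda a: payoffs[(a, kelly_action)][0])

# Confess is a best reply to anything Kelly does: a dominant strategy.
# That is why a known final round adds nothing: the last round is just
# the one-shot game, and the logic unravels backward through every round.
assert best_reply_ned('C') == 'C'
assert best_reply_ned('N') == 'C'
```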

113

Objections!

• What if we repeat more than 2 times – say 50 times?
• Using rollback, this does not help!
– In the last round, both Ned and Kelly will confess – so we can throw that round out.
– So the second-last round can have no effect on the last round, and both will confess in the second-last round.
– And so on as we ‘roll back’ the game tree.
• But does this suggest a problem with using rollback to solve all sequential games?

114

The centipede game

Jack --go on--> Jill --go on--> Jack --go on--> Jill --go on--> ... --> Jill --go on--> Jack --go on--> Jill --go on--> (99, 99)
 |               |               |               |                       |               |               |
stop            stop            stop            stop                    stop            stop            stop
 |               |               |               |                       |               |               |
(2, 0)          (1, 4)          (5, 3)          (4, 7)                (94, 97)        (98, 96)        (97, 100)

115

The centipede game (same game tree as above)

The solution to this game through roll back is for Jack to stop in the first round!
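Rollback on this tree can be mechanized. The sketch below assumes the payoff pattern suggested by the nodes shown on the slide: Jack's k-th decision node pays (2+3k, 3k) if he stops, Jill's pays (1+3k, 4+3k), over 33 rounds, ending at (99, 99) if nobody ever stops. The payoffs between the displayed nodes are an inference, not stated on the slide.

```python
def centipede_rollback(rounds=33, final=(99, 99)):
    """Backward induction on a centipede game: players alternate, Jack moves first."""
    stops = []
    for k in range(rounds):
        stops.append((2 + 3 * k, 3 * k))       # Jack's node: payoffs if he stops
        stops.append((1 + 3 * k, 4 + 3 * k))   # Jill's node: payoffs if she stops
    value = final  # payoff pair if everyone always goes on
    for i in reversed(range(len(stops))):
        mover = i % 2  # 0 = Jack, 1 = Jill
        if stops[i][mover] > value[mover]:
            value = stops[i]  # the mover prefers stopping here to continuing
    return value

# Rollback predicts that Jack stops at the very first node.
assert centipede_rollback() == (2, 0)
```

Jill would stop at the last node (100 beats 99), so Jack stops one node earlier, and the unraveling runs all the way back to the first move.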

116

The centipede game

• What actually happens?
• In experiments the game usually continues for at least a few rounds and occasionally goes all the way to the end.
• But going all the way to the (99, 99) payoff almost never happens – at some stage of the game ‘cooperation’ breaks down.
• So we still do not get sustained cooperation even if we move away from ‘rollback’ as a solution.

117

Lessons from finite repeated games

– Finite repetition often does not help players to reach better solutions

– Often the outcome of the finitely repeated game is simply the one-shot Nash equilibrium repeated again and again.

– There are SOME repeated games where finite repetition can create new equilibrium outcomes. But these games tend to have special properties

– For a large number of repetitions, there are some games where the Nash equilibrium logic breaks down in practice.

118

Infinitely repeated games

• In ‘real life’, there are many times when you do not know for sure that this is the ‘last round’ of the game.
• When firms interact there is always a chance that they will interact again in the future. So repeated competition is more like an infinitely repeated game.

119

Long-Term Interaction

• No last period, so no rollback
• Use history-dependent strategies
• Trigger strategies:
– Begin by cooperating
– Cooperate as long as the rivals do
– Upon observing a defection: immediately revert to a period of punishment of specified length in which everyone plays non-cooperatively

120

Two Trigger Strategies

• Grim trigger strategy
– Cooperate until a rival deviates
– Once a deviation occurs, play non-cooperatively for the rest of the game

• Tit-for-tat
– Cooperate if your rival cooperated in the most recent period
– Cheat if your rival cheated in the most recent period

121

What is Credibility?

“ The difference between genius and stupidity is that genius has its limits.”

– Albert Einstein

• You are not credible if you propose to take suboptimal actions: a rational actor will not play a strategy that earns suboptimal profit.

• How can one be credible?

122

Trigger Strategy Extremes

• Tit-for-Tat is
– most forgiving
– shortest memory
– proportional
– credible, but lacks deterrence

Tit-for-tat answers: “Is cooperation easy?”

• Grim trigger is
– least forgiving
– longest memory
– MAD (mutual assured destruction)
– adequate deterrence, but lacks credibility

Grim trigger answers: “Is cooperation possible?”

123

Why Cooperate (Against the Grim Trigger Strategy)?

• Cooperate if the present value of cooperation is greater than the present value of defection

• Cooperate: 60 today, 60 next year, 60 … 60
• Defect: 72 today, 54 next year, 54 … 54

                      Firm 2
                 Low        High
Firm 1   Low     54 , 54    72 , 47
         High    47 , 72    60 , 60

124

Payoff Stream (GTS)

[Figure: profit plotted against time t, t+1, t+2, t+3. Cooperation pays 60 in every period; defection pays 72 at t, then 54 in every period thereafter.]

125

Calculus of GTS

• Cooperate if (r is the number of times the game is repeated):

  PV(cooperation)   >   PV(defection)
  60…60…60…60…      >   72…54…54…54…
  60 + 60*r         >   72 + 54*r
  6*r               >   12
  r                 >   2

• Cooperation is sustainable using grim trigger strategies as long as r > 2
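A quick numeric restatement of the comparison (function names are mine; payoffs from the matrix above, with r further periods after today):

```python
def pv_cooperate(r):
    return 60 + 60 * r      # 60 today and 60 in each of r future periods

def pv_defect(r):
    return 72 + 54 * r      # 72 today, then the 54 punishment payoff thereafter

# Cooperation pays exactly when 6r > 12, i.e. r > 2.
assert pv_defect(2) == pv_cooperate(2)      # indifferent at r = 2
assert pv_cooperate(3) > pv_defect(3)       # cooperation wins for r > 2
assert pv_defect(1) > pv_cooperate(1)       # defection wins for r < 2
```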

126

Payoff Stream (Tit-for-Tat)

[Figure: profit plotted against time t, t+1, t+2, t+3. Cooperation pays 60 in every period; defecting once pays 72 at t, then 47 at t+1 when the rival retaliates, then 60 again; defecting forever pays 72 at t, then 54 in every period thereafter.]

127

Trigger Strategies
• Grim Trigger and Tit-for-Tat are extremes
• Balance two goals:
– Deterrence
• GTS is adequate punishment
• Tit-for-tat might be too little, especially if I can invest the money I make now (so all money is not the same).
– Credibility
• GTS hurts the punisher too much
• Tit-for-tat is credible

128

Optimal Punishment

COMMANDMENT

In announcing a punishment strategy:

Punish enough to deter your opponent. Temper punishment to remain credible.

129

Axelrod’s Simulation

• R. Axelrod, The Evolution of Cooperation
• Prisoner’s Dilemma repeated 200 times
• Economists submitted strategies
• Pairs of strategies competed
• Winner: Tit-for-Tat
• Reasons: forgiving, nice, provocable, clear

130

Main Ideas from Axelrod

• Not necessarily tit-for-tat
– Doesn’t always work
– Works because you knew the mix of opponents. Would lose against all-defect.

• Don’t be envious
• Don’t be the first to cheat
• Reciprocate your opponent’s behavior
– Both cooperation and defection

• Don’t be too clever
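The dependence on the opponent mix is easy to demonstrate with a toy round-robin. This is not Axelrod's actual tournament (his included self-play and dozens of entries); the three strategies, the 3/1/5/0 payoff values (a conventional choice for the one-shot PD), and all names below are illustrative.

```python
import itertools

# One-shot PD payoffs: (row player's, column player's) for each action pair.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opp_history):
    return opp_history[-1] if opp_history else 'C'   # start nice, then mirror

def all_defect(opp_history):
    return 'D'

def all_cooperate(opp_history):
    return 'C'

def play(s1, s2, rounds=200):
    """Play the repeated PD; each strategy sees only the opponent's history."""
    h1, h2, p1, p2 = [], [], 0, 0
    for _ in range(rounds):
        a, b = s1(h2), s2(h1)
        u, v = PAYOFF[(a, b)]
        h1.append(a); h2.append(b)
        p1 += u; p2 += v
    return p1, p2

strategies = [tit_for_tat, all_defect, all_cooperate]
totals = {s.__name__: 0 for s in strategies}
for s1, s2 in itertools.combinations(strategies, 2):
    p1, p2 = play(s1, s2)
    totals[s1.__name__] += p1
    totals[s2.__name__] += p2

# In THIS mix all-defect comes out ahead: tit-for-tat's tournament success
# depended on a population with enough conditionally nice strategies.
assert totals['all_defect'] > totals['tit_for_tat'] > totals['all_cooperate']
```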