
Concepts of Game Theory II

The prisoner’s reasoning…
• Put yourself in the place of prisoner i (or j)…
• Reason as follows:
  – Suppose I cooperate…
    • If j cooperates, we both get a payoff of 3.
    • If j defects, then I will get a payoff of 0.
    Best payoff I can be guaranteed to get if I cooperate is 0.
  – Suppose I defect…
    • If j cooperates, I get a payoff of 5.
    • If j defects, then I will get a payoff of 2.
    Best payoff I can be guaranteed to get if I defect is 2.
• In summary:
  – If I cooperate, the worst case is that I will get a payoff of 0.
  – If I defect, the worst case is that I will get a payoff of 2.
  – I’d prefer a guaranteed payoff of 2 to a payoff of 0!
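The worst-case reasoning above can be sketched in a few lines of Python. The payoff values are the ones quoted in the reasoning; the names `PAYOFF` and `worst_case` are illustrative assumptions, not part of the original slides.

```python
# Payoff to "me" for each (my_action, other_action) pair,
# using the values quoted in the prisoner's reasoning.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 2,
}

def worst_case(my_action):
    """Smallest payoff I can end up with if I play my_action."""
    return min(PAYOFF[(my_action, other)] for other in ("C", "D"))

# Pick the action whose worst case is best (maximin reasoning).
guarantees = {action: worst_case(action) for action in ("C", "D")}
safest = max(guarantees, key=guarantees.get)

print(guarantees)  # {'C': 0, 'D': 2}
print(safest)      # D
```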

[Figure: payoff matrix for players i and j]

Features of Prisoner’s Dilemma (1)
• The individually rational action is to defect
  – This guarantees a pay-off of no worse than 2
  – Whereas cooperating only guarantees a pay-off of no worse than 0.
• So, defection is the best response to all possible strategies:
  – Both agents defect and get a pay-off of 2
• But naïve intuition says this is not the best outcome:
  – They could both cooperate and each get a pay-off of 3!
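As a minimal sketch of these two features, the check below confirms that defecting is i's best response to either action of j, and that mutual cooperation still pays both players more than mutual defection. The payoffs are the ones from the prisoner's reasoning earlier in the slides; `GAME` and `best_response_of_i` are hypothetical names chosen for this example.

```python
# Payoffs as (payoff to i, payoff to j), indexed by (i's action, j's action).
# Values follow the prisoner's reasoning given earlier in these slides.
GAME = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}
ACTIONS = ("C", "D")

def best_response_of_i(j_action):
    """Action of i that maximises i's payoff against a fixed action of j."""
    return max(ACTIONS, key=lambda a: GAME[(a, j_action)][0])

# Defect is i's best response whatever j does, so it is a dominant action.
responses = {j_action: best_response_of_i(j_action) for j_action in ACTIONS}
defect_dominates = all(r == "D" for r in responses.values())

# Yet mutual cooperation gives both players more than mutual defection.
both_better_off = all(
    c > d for c, d in zip(GAME[("C", "C")], GAME[("D", "D")])
)

print(responses)         # {'C': 'D', 'D': 'D'}
print(defect_dominates)  # True
print(both_better_off)   # True
```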

Features of Prisoner’s Dilemma (2)
• This apparent paradox is the fundamental problem of multi-agent interactions.
  – It seems to imply that cooperation will not occur in societies of self-interested agents.
• A real world example: nuclear arms reduction
• The prisoner’s dilemma is ubiquitous (very common!)
• Can we recover cooperation?

Arguments for Recovering Cooperation
• Some conclusions that have been drawn from this analysis:
  – The game theory notion of rational action is wrong!
  – Somehow the dilemma is being formulated incorrectly.
• Arguments to recover cooperation:
  – We are not all Machiavellian!
  – The other prisoner is my twin!
  – People are not (always) rational!
  – The shadow of the future…

The Iterated Prisoner’s Dilemma
• One answer: play the game more than once
  – Let’s use an applet:
• If you know you will be meeting your opponent again
  – Then the incentive to defect appears to evaporate.
• Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma

Backwards Induction
• Suppose you both know that you will play the game exactly n times
• On round n, you have an incentive to defect to gain that extra bit of pay-off.
• This makes round n-1 the last “real” game, and so you have an incentive to defect there too
  – And so on…
• When playing the prisoner’s dilemma with a
  – fixed
  – finite
  – pre-determined and
  – commonly known
  number of rounds, defection is the best strategy.
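The backwards-induction argument can be illustrated with a small sketch that solves the last round first and works back to round 1. It assumes the stage-game payoffs from the prisoner's reasoning earlier in the slides, and it simplifies by treating what happens in later rounds as already fixed (which is exactly what the induction step claims). The function names are hypothetical.

```python
# Stage-game payoffs (payoff to i, payoff to j), values taken from the
# prisoner's reasoning earlier in these slides.
STAGE = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}
ACTIONS = ("C", "D")

def stage_equilibria(future):
    """Action pairs where neither player gains by deviating in this round.

    `future` is what each player expects from the remaining rounds; it is
    added to every cell, so it cannot change the comparison.
    """
    def total(a, b):
        pi, pj = STAGE[(a, b)]
        return pi + future[0], pj + future[1]
    return [
        (a, b)
        for a in ACTIONS for b in ACTIONS
        if all(total(a, b)[0] >= total(x, b)[0] for x in ACTIONS)
        and all(total(a, b)[1] >= total(a, y)[1] for y in ACTIONS)
    ]

def backward_induction(n_rounds):
    """Solve the last round first, then work back to round 1."""
    plan = {}
    future = (0, 0)
    for round_no in range(n_rounds, 0, -1):
        a, b = stage_equilibria(future)[0]  # unique in this game
        plan[round_no] = (a, b)
        pi, pj = STAGE[(a, b)]
        future = (future[0] + pi, future[1] + pj)
    return plan, future

plan, totals = backward_induction(3)
print(plan)    # {3: ('D', 'D'), 2: ('D', 'D'), 1: ('D', 'D')}
print(totals)  # (6, 6)
```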

Axelrod’s Tournament
• Suppose you play the prisoner’s dilemma game against a range of opponents.
• What single strategy should you use to play against all these opponents so that you maximise your overall pay-off?
• Axelrod (1984) investigated this problem with a tournament for computer programs playing the prisoner’s dilemma.

http://www-personal.umich.edu/~axe/

Robert Axelrod

Strategies
• ALL-D
  – Always defect (the hawk strategy).
• TIT-FOR-TAT
  – On round u=0, cooperate
  – On round u>0, copy the opponent’s round u-1 move
• TESTER
  – On round u=0, defect.
  – If the opponent retaliated, then play TIT-FOR-TAT
  – Otherwise intersperse cooperation and defection
• JOSS
  – As for TIT-FOR-TAT, except periodically defect

How to succeed in Axelrod’s Tournament
Axelrod suggests the following:
• Don’t be envious
  – Don’t play as if it were a zero sum game
  – You don’t have to beat your opponent for you to do well
• Be nice (don’t be the first to defect)
  – Start by cooperating, and reciprocate cooperation
• Retaliate appropriately
  – Always punish defection immediately,
  – But use “measured” force: don’t overdo it
• Don’t hold grudges
  – Always reciprocate cooperation immediately

Who wins?
• In the 1980s tournament, TIT-FOR-TAT won.
• But, when paired with a mindless strategy like RANDOM, TIT-FOR-TAT sinks to its opponent's level.
• So, it can’t be seen as a “best” strategy.
• The tournament was run again in 2004, and TIT-FOR-TAT did not win.
• What strategy won, and why?

Game of Chicken
• Difference to prisoner’s dilemma:
  – Mutual defection is the most feared outcome.
• The strategy pairs (C,D) and (D,C) are in Nash equilibrium.

Payoff matrix (each cell: payoff to i, payoff to j)

              j defects   j cooperates
i defects       1, 1         4, 2
i cooperates    2, 4         3, 3
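A short sketch can confirm the equilibrium claim for the Game of Chicken by checking every action pair for profitable deviations. The payoffs are read from the slide's matrix (1 each for mutual defection, 4 for a lone defector against 2 for the cooperator, 3 each for mutual cooperation); `pure_nash` is a hypothetical helper written for this example.

```python
from itertools import product

# (payoff to i, payoff to j), indexed by (i's action, j's action).
CHICKEN = {
    ("D", "D"): (1, 1),
    ("D", "C"): (4, 2),
    ("C", "D"): (2, 4),
    ("C", "C"): (3, 3),
}
ACTIONS = ("C", "D")

def pure_nash(game):
    """Return all action pairs from which neither player wants to deviate."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        i_content = all(game[(a, b)][0] >= game[(x, b)][0] for x in ACTIONS)
        j_content = all(game[(a, b)][1] >= game[(a, y)][1] for y in ACTIONS)
        if i_content and j_content:
            equilibria.append((a, b))
    return equilibria

print(pure_nash(CHICKEN))  # [('C', 'D'), ('D', 'C')]
```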

The Stag Hunt (1)
• You can hunt deer (cooperate) or hare (defect)
• Only if both cooperate will they succeed in catching the deer and receive the maximum pay-off.

Payoff matrix (each cell: payoff to i, payoff to j)

              j defects   j cooperates
i defects       3, 3         3, 0
i cooperates    0, 3         4, 4

The Stag Hunt (2)
• A pessimist would always hunt hare.
• A cautious player who is uncertain about what the other player will choose to do would also hunt hare.
• For agents to cooperate in the Stag Hunt, there must be a measure of trust between them.
• This measure of trust is a kind of social contract between the players; a contract that requires prior agreement.
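The tension between caution and trust can be made concrete with a minimal sketch. It uses the Stag Hunt payoffs from the slide's matrix (hare always pays 3; deer pays 4 if both cooperate and 0 otherwise) and shows that the worst-case (cautious) choice is hare, even though both hunting deer is also a stable outcome with a higher payoff. All names are illustrative assumptions.

```python
# (payoff to i, payoff to j), indexed by (i's action, j's action).
# "C" = hunt deer (cooperate), "D" = hunt hare (defect).
STAG_HUNT = {
    ("D", "D"): (3, 3),
    ("D", "C"): (3, 0),
    ("C", "D"): (0, 3),
    ("C", "C"): (4, 4),
}
ACTIONS = ("C", "D")

def worst_case_for_i(action):
    """Lowest payoff i can receive when playing `action`."""
    return min(STAG_HUNT[(action, other)][0] for other in ACTIONS)

# A cautious player maximises the worst case: hare guarantees 3, deer only 0.
cautious_choice = max(ACTIONS, key=worst_case_for_i)

# Stable outcomes: neither player gains by changing action unilaterally.
stable = [
    (a, b)
    for a in ACTIONS for b in ACTIONS
    if all(STAG_HUNT[(a, b)][0] >= STAG_HUNT[(x, b)][0] for x in ACTIONS)
    and all(STAG_HUNT[(a, b)][1] >= STAG_HUNT[(a, y)][1] for y in ACTIONS)
]

print(cautious_choice)  # D
print(stable)           # [('C', 'C'), ('D', 'D')]
```

Both (C,C) and (D,D) are stable, so which one the players reach depends on whether each trusts the other to hunt deer.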

A Variation of the Prisoner’s Dilemma
• A spatial variant of the iterated prisoner's dilemma
• A model for cooperation vs. conflict in groups
• It shows the spread of
  – altruism
  – exploitation for personal gain
  in an interacting population of agents learning from each other
  – Initially the population consists of cooperators and a certain number of defectors
  – The advantage of defection is determined by the value of b in the 'payoff matrix'
  – A player determines its new strategy by selecting the most favourable strategy from itself and its direct neighbours
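One update step of such a spatial model can be sketched as follows. The slide does not give the payoff matrix or the grid layout, so everything concrete here is an assumption made for illustration: a square grid that wraps around at the edges, eight direct neighbours per cell, and a payoff of 1 for mutual cooperation, b for defecting against a cooperator, and 0 otherwise.

```python
def neighbours(r, c, size):
    """Eight surrounding cells on a grid that wraps around at the edges."""
    return [
        ((r + dr) % size, (c + dc) % size)
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]

def payoff(me, other, b):
    """Assumed payoffs: C vs C earns 1, D vs C earns b, everything else 0."""
    if me == "C" and other == "C":
        return 1.0
    if me == "D" and other == "C":
        return b
    return 0.0

def step(grid, b):
    """Score every cell against its neighbours, then copy the best strategy."""
    size = len(grid)
    score = [
        [
            sum(payoff(grid[r][c], grid[nr][nc], b)
                for nr, nc in neighbours(r, c, size))
            for c in range(size)
        ]
        for r in range(size)
    ]
    new_grid = []
    for r in range(size):
        row = []
        for c in range(size):
            # The cell itself is listed first, so it keeps its strategy on ties.
            candidates = [(r, c)] + neighbours(r, c, size)
            best_r, best_c = max(candidates, key=lambda p: score[p[0]][p[1]])
            row.append(grid[best_r][best_c])
        new_grid.append(row)
    return new_grid

# A single defector in the middle of a 5x5 population of cooperators.
grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"
after = step(grid, b=1.9)
defectors = sum(row.count("D") for row in after)
print(defectors)  # 9
```

With b = 1.9 the lone defector outscores all of its neighbours, so after one step they copy it and the defecting cluster grows from 1 cell to 9.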


Recommended Reading
• An Introduction to Multi-Agent Systems, M. Wooldridge, John Wiley & Sons, 2002. Chapter 6.

Also check:
• Various applets for the prisoner’s dilemma: http://www.gametheory.net/applets/prisoners.html
• Spatial variant of the iterated prisoner’s dilemma: http://prisonersdilemma.groenefee.nl/
• Software for Axelrod’s Tournament: http://www.econ.iastate.edu/tesfatsi/demos/axelrod/axelrodt.htm