Concepts of Game Theory II
The prisoner’s reasoning…
• Put yourself in the place of prisoner i (or j)…
• Reason as follows:
  – Suppose I cooperate…
    • If j cooperates, we both get a payoff of 3.
    • If j defects, then I will get a payoff of 0.
    The best payoff I can be guaranteed to get if I cooperate is 0.
  – Suppose I defect…
    • If j cooperates, I get a payoff of 5.
    • If j defects, then I will get a payoff of 2.
    The best payoff I can be guaranteed to get if I defect is 2.
• In summary:
  – If I cooperate, the worst case is that I will get a payoff of 0.
  – If I defect, the worst case is that I will get a payoff of 2.
  – I’d prefer a guaranteed payoff of 2 to a payoff of 0!
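The worst-case reasoning above can be sketched in a few lines of Python. The payoff values are the ones quoted in the bullets; the names `PAYOFF`, `security_level` and `maximin_action` are illustrative choices, not part of the lecture.

```python
# Payoff to "me" for (my_action, other_action), as quoted in the reasoning above.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 2,
}
ACTIONS = ("C", "D")


def security_level(my_action):
    """Worst payoff I can end up with if I commit to my_action."""
    return min(PAYOFF[(my_action, other)] for other in ACTIONS)


def maximin_action():
    """Action whose worst case is the least bad (hypothetical helper name)."""
    return max(ACTIONS, key=security_level)


for action in ACTIONS:
    print(action, "guarantees at least", security_level(action))
print("choice under worst-case reasoning:", maximin_action())
```

Cooperating guarantees 0 and defecting guarantees 2, so a player who compares guarantees picks defect.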
Features of Prisoner’s Dilemma (1)
• The individually rational action is to defect.
  – This guarantees a pay-off of no worse than 2.
  – Whereas cooperating only guarantees a pay-off of no worse than 0.
• So, defection is the best response to all possible strategies:
  – Both agents defect and get a pay-off of 2.
• But naïve intuition says this is not the best outcome:
  – They could both cooperate and each get a pay-off of 3!
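A minimal sketch of the two claims on this slide: defect beats cooperate against every opposing move, yet both players would be better off under mutual cooperation. The pay-offs (3, 0, 5, 2) are the ones used in these slides; the dictionary layout and function names are assumptions made for illustration.

```python
# (pay-off to i, pay-off to j) for (i's action, j's action).
GAME = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}
ACTIONS = ("C", "D")


def strictly_dominates(a, b):
    """True if action a pays player i more than action b whatever j does."""
    return all(GAME[(a, other)][0] > GAME[(b, other)][0] for other in ACTIONS)


def both_better_off(x, y):
    """True if both players get strictly more in outcome x than in outcome y."""
    return all(u > v for u, v in zip(GAME[x], GAME[y]))


print(strictly_dominates("D", "C"))             # defect is the best response to everything
print(both_better_off(("C", "C"), ("D", "D")))  # yet (C, C) beats (D, D) for both players
```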
Features of Prisoner’s Dilemma (2)
• This apparent paradox is the fundamental problem of multi-agent interactions.
  – It seems to imply that cooperation will not occur in societies of self-interested agents.
• A real world example: nuclear arms reduction
• The prisoner’s dilemma is ubiquitous (very common!)
• Can we recover cooperation?
Arguments for Recovering Cooperation
• Some conclusions that have been drawn from this analysis:
  – The game theory notion of rational action is wrong!
  – Somehow the dilemma is being formulated incorrectly.
• Arguments to recover cooperation:
  – We are not all Machiavellian!
  – The other prisoner is my twin!
  – People are not (always) rational!
  – The shadow of the future…
The Iterated Prisoner’s Dilemma
• One answer: play the game more than once
  – Let’s use an applet.
• If you know you will be meeting your opponent again
  – Then the incentive to defect appears to evaporate.
• Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma
Backwards Induction
• Suppose you both know that you will play the game exactly n times
• On round n, you have an incentive to defect to gain that extra bit of pay-off.
• This makes round n-1 the last “real” game, and so you have an incentive to defect there too
  – And so on…
• When playing the prisoner’s dilemma with a
  – fixed
  – finite
  – pre-determined and
  – commonly known
  number of rounds, defection is the best strategy.
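The unravelling argument can be sketched as a loop that solves the rounds from last to first. This is a simplified illustration, assuming both players reason identically and that play in later rounds is already fixed; the function names are hypothetical.

```python
# Payoff to "me" for (my_action, other_action); values from the slides.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}
ACTIONS = ("C", "D")


def dominant_action():
    """Action that is strictly better than every alternative in the one-shot game."""
    for a in ACTIONS:
        if all(PAYOFF[(a, o)] > PAYOFF[(b, o)]
               for b in ACTIONS if b != a
               for o in ACTIONS):
            return a
    return None


def backward_induction(n):
    """Solve rounds n, n-1, ..., 1 and return (plan, total payoff per player)."""
    plan = {}
    total = 0
    for rnd in range(n, 0, -1):
        # Rounds after `rnd` are already decided and do not depend on what
        # happens now, so round `rnd` is effectively a one-shot game.
        action = dominant_action()
        plan[rnd] = action
        total += PAYOFF[(action, action)]  # both players reason the same way
    return [plan[r] for r in range(1, n + 1)], total


print(backward_induction(5))
```

Every round collapses to the one-shot game, so the plan is defection throughout and each player collects 2 per round.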
Axelrod’s Tournament
• Suppose you play the prisoner’s dilemma game against a range of opponents.
• What single strategy should you use to play against all these opponents so that you maximise your overall pay-off?
• Axelrod (1984) investigated this problem with a tournament for computer programs playing the prisoner’s dilemma.
  http://www-personal.umich.edu/~axe/
[Image: Robert Axelrod]
Strategies
• ALL-D
  – Always defect: the hawk strategy.
• TIT-FOR-TAT
  – On round u=0, cooperate
  – On round u>0, copy the opponent’s round u-1 move
• TESTER
  – On round u=0, defect.
  – If the opponent retaliated, then play TIT-FOR-TAT
  – Otherwise intersperse cooperation and defection
• JOSS
  – As for TIT-FOR-TAT, except periodically defect
How to succeed in Axelrod’s Tournament
Axelrod suggests the following:
• Don’t be envious
  – Don’t play as if it were a zero sum game
  – You don’t have to beat your opponent for you to do well
• Be nice (don’t be the first to defect)
  – Start by cooperating, and reciprocate cooperation
• Retaliate appropriately
  – Always punish defection immediately,
  – But use “measured” force: don’t overdo it
• Don’t hold grudges
  – Always reciprocate cooperation immediately
Who wins?
• In the 1980s tournament, TIT-FOR-TAT won.
• But, when paired with a mindless strategy like RANDOM, TIT-FOR-TAT sinks to its opponent's level.
• So, it can’t be seen as a “best” strategy.
• The tournament was run again in 2004, and TIT-FOR-TAT did not win.
• What strategy won, and why?
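A small sketch of why TIT-FOR-TAT "sinks to its opponent's level": because it only ever mirrors the opponent's previous move, its total can never exceed the opponent's and trails it by at most one temptation pay-off (5). RANDOM is modelled here as a seeded coin flip per round, which is an assumption; the pay-offs are those used in these slides.

```python
import random

# Pay-offs (to TIT-FOR-TAT, to opponent) for (TFT move, opponent move).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (2, 2),
}


def tft_versus(opponent_moves):
    """Score TIT-FOR-TAT against a fixed sequence of opponent moves."""
    tft_score = opp_score = 0
    tft_move = "C"  # round 0: cooperate
    for opp_move in opponent_moves:
        pay_tft, pay_opp = PAYOFF[(tft_move, opp_move)]
        tft_score += pay_tft
        opp_score += pay_opp
        tft_move = opp_move  # next round: copy what the opponent just did
    return tft_score, opp_score


rng = random.Random(0)  # seeded so the run is repeatable
random_moves = [rng.choice("CD") for _ in range(200)]

print("vs RANDOM:       ", tft_versus(random_moves))
print("vs a cooperator: ", tft_versus(["C"] * 200))
```

Against a steady cooperator both sides earn 3 per round, while against RANDOM the two totals stay within 5 points of each other at whatever level the coin flips produce.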
Game of Chicken
• Difference to prisoner’s dilemma:
  – Mutual defection is the most feared outcome.
• Strategies (C,D) and (D,C) are in Nash equilibrium.

Pay-off matrix (each cell: pay-off to i, pay-off to j):

                i defects    i cooperates
j defects         1, 1          2, 4
j cooperates      4, 2          3, 3
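The equilibrium claim can be checked by brute force: an outcome is a pure Nash equilibrium if neither player gains by changing only their own action. The pay-offs below are read from the Game of Chicken matrix (mutual defection 1 each, mutual cooperation 3 each, a lone defector 4 against the cooperator's 2); the function name `pure_nash` is illustrative.

```python
# (pay-off to i, pay-off to j) for (i's action, j's action).
CHICKEN = {
    ("D", "D"): (1, 1),
    ("D", "C"): (4, 2),
    ("C", "D"): (2, 4),
    ("C", "C"): (3, 3),
}
ACTIONS = ("C", "D")


def pure_nash(game):
    """Return every outcome from which no single player wants to deviate."""
    equilibria = []
    for a_i in ACTIONS:
        for a_j in ACTIONS:
            u_i, u_j = game[(a_i, a_j)]
            i_content = all(game[(alt, a_j)][0] <= u_i for alt in ACTIONS)
            j_content = all(game[(a_i, alt)][1] <= u_j for alt in ACTIONS)
            if i_content and j_content:
                equilibria.append((a_i, a_j))
    return equilibria


print(pure_nash(CHICKEN))  # [('C', 'D'), ('D', 'C')]
```

Mutual defection is not an equilibrium here: either player would rather back down and take 2 than stay at 1.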
The Stag Hunt (1)
• You can hunt deer (cooperate) or hare (defect)
• Only if both cooperate will they succeed in catching the deer and receive the maximum pay-off.

Pay-off matrix (each cell: pay-off to i, pay-off to j):

                i defects    i cooperates
j defects         3, 3          0, 3
j cooperates      3, 0          4, 4
The Stag Hunt (2)
• A pessimist would always hunt hare.
• A cautious player who is uncertain about what the other player will choose to do would also hunt hare.
• For agents to cooperate in the Stag Hunt, there must be a measure of trust between them.
• This measure of trust is a kind of social contract between the players; a contract that requires prior agreement.
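One way to make "a measure of trust" concrete is to ask how likely the other player must be to cooperate before hunting deer pays off in expectation. The sketch below uses the Stag Hunt pay-offs (4 for a joint deer hunt, 3 for hare regardless, 0 for hunting deer alone); modelling trust as a single probability `p` is an assumption made for illustration.

```python
# Pay-off to "me" for (my_action, other_action): C = hunt deer, D = hunt hare.
STAG_HUNT = {("C", "C"): 4, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 3}


def expected_payoff(my_action, p):
    """Expected pay-off if the other player cooperates with probability p."""
    return p * STAG_HUNT[(my_action, "C")] + (1 - p) * STAG_HUNT[(my_action, "D")]


def best_action(p):
    """Action with the higher expected pay-off; ties go to hare (listed first)."""
    return max(("D", "C"), key=lambda a: expected_payoff(a, p))


for p in (0.0, 0.5, 0.9):
    print(p, best_action(p))
```

With these numbers deer is worth hunting only when 4p > 3, that is, when the other player is trusted to cooperate more than three times out of four. The pessimist (p = 0) and the uncertain player (p = 0.5) both hunt hare.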
A Variation of the Prisoner’s Dilemma
• A spatial variant of the iterated prisoner's dilemma
• A model for cooperation vs. conflict in groups
• It shows the spread of
  – altruism
  – exploitation for personal gain
  in an interacting population of agents learning from each other
  – Initially the population consists of cooperators and a certain number of defectors
  – The advantage of defection is determined by the value of b in the 'payoff matrix'
  – A player determines its new strategy by selecting the most favourable strategy from itself and its direct neighbours
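One update step of such a spatial model can be sketched as follows. The slide does not give the exact pay-off matrix or neighbourhood, so several things here are assumptions: a square grid with wrap-around edges, eight direct neighbours, a cooperator earning 1 per cooperating neighbour, a defector earning b per cooperating neighbour, nothing earned against defectors, and ties resolved by keeping the current strategy.

```python
def neighbours(r, c, n):
    """Eight surrounding cells on an n x n wrap-around grid (use n >= 3)."""
    return [((r + dr) % n, (c + dc) % n)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def payoff(me, other, b):
    """Assumed pay-offs: only meeting a cooperator pays (1 for C, b for D)."""
    if other == "D":
        return 0.0
    return 1.0 if me == "C" else b


def step(grid, b):
    """Play every neighbour once, then copy the best scorer nearby."""
    n = len(grid)
    score = [[sum(payoff(grid[r][c], grid[nr][nc], b)
                  for nr, nc in neighbours(r, c, n))
              for c in range(n)]
             for r in range(n)]
    new_grid = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            best_score, best_strategy = score[r][c], grid[r][c]
            for nr, nc in neighbours(r, c, n):
                if score[nr][nc] > best_score:  # strict: ties keep own strategy
                    best_score, best_strategy = score[nr][nc], grid[nr][nc]
            new_grid[r][c] = best_strategy
    return new_grid


# A single defector in a population of cooperators, with b = 1.9.
grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"
for row in step(grid, 1.9):
    print("".join(row))
```

In this run the lone defector scores 8 × 1.9 = 15.2, more than any cooperator, so its eight neighbours copy it and the defector patch grows to a 3 × 3 block after one step.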
Recommended Reading
• An Introduction to Multi-Agent Systems, M. Wooldridge, John Wiley & Sons, 2002. Chapter 6.
Also check:
• Various applets for the prisoner’s dilemma: http://www.gametheory.net/applets/prisoners.html
• Spatial variant of the iterated prisoner’s dilemma: http://prisonersdilemma.groenefee.nl/
• Software for Axelrod’s Tournament: http://www.econ.iastate.edu/tesfatsi/demos/axelrod/axelrodt.htm