Nau: Game Theory 1
Introduction to Game Theory
2. Normal-Form Games Dana Nau
University of Maryland
Nau: Game Theory 2
How to reason about games? In single-agent decision theory, look at an optimal strategy
Maximize the agent’s expected payoff in its environment
With multiple agents, the best strategy depends on others’ choices Deal with this by identifying certain subsets of outcomes called solution
concepts
Some solution concepts: Pareto optimality Nash equilibrium Maximin and Minimax Dominant strategies Correlated equilibrium
Nau: Game Theory 3
Pareto Optimality
Strategy profile S Pareto dominates a strategy profile S′ if
  no agent gets a worse payoff with S than with S′, i.e., ui(S) ≥ ui(S′) for all i, and
  at least one agent gets a better payoff with S than with S′, i.e., ui(S) > ui(S′) for at least one i
Strategy profile S is Pareto optimal, or strictly Pareto efficient, if there's no strategy profile S′ that Pareto dominates S
  Every game has at least one Pareto optimal profile
  Always at least one Pareto optimal profile in which the strategies are pure
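The definitions above are easy to check mechanically over the pure-strategy profiles of a small game. A minimal Python sketch (the dictionary-of-profiles representation is just one convenient choice, not something from the slides):

```python
def pareto_dominates(u_s, u_t):
    """True if payoff vector u_s Pareto dominates u_t: no agent is worse
    off, and at least one agent is strictly better off."""
    return (all(a >= b for a, b in zip(u_s, u_t))
            and any(a > b for a, b in zip(u_s, u_t)))

def pareto_optimal_profiles(payoffs):
    """Pure profiles not Pareto dominated by any other profile.
    `payoffs` maps each profile (a tuple of actions) to a payoff vector."""
    return [s for s, u_s in payoffs.items()
            if not any(pareto_dominates(u_t, u_s)
                       for t, u_t in payoffs.items() if t != s)]

# Prisoner's Dilemma: every pure profile except (D,D) is Pareto optimal
pd = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
print(pareto_optimal_profiles(pd))   # [('C', 'C'), ('C', 'D'), ('D', 'C')]
```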
Nau: Game Theory 4
         C      D
    C   3, 3   0, 5
    D   5, 0   1, 1
(agent 1 chooses the row, agent 2 the column)
Examples The Prisoner’s Dilemma
(C,C) is Pareto optimal: no profile gives both players a higher payoff
(D,C) is Pareto optimal: no profile gives player 1 a higher payoff
(C,D) is Pareto optimal: same argument, for player 2
(D,D) is Pareto dominated by (C,C)
Which Side of the Road (Left,Left) and (Right,Right) are Pareto optimal
In common-payoff games, all Pareto optimal strategy profiles have the same payoffs
If (Left,Left) had payoffs (2,2), then (Right,Right) wouldn’t be Pareto optimal
Nau: Game Theory 5
Best Response Suppose agent i knows how the others are going to play
Then agent i has the single-agent problem of choosing a utility-maximizing action
Some notation:
  S−i = (s1, …, si−1, si+1, …, sn), i.e., S−i is strategy profile S without agent i's strategy
  If si′ is any strategy for agent i, then (si′, S−i) = (s1, …, si−1, si′, si+1, …, sn)
  Hence (si, S−i) = S
For agent i, a best response to S−i is a mixed strategy si* such that ui(si*, S−i) ≥ ui(si, S−i) for every strategy si available to agent i
A best response si* is unique if ui(si*, S−i) > ui(si, S−i) for every si ≠ si*
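For a two-player game where the opponent's (possibly mixed) strategy is known, the pure best responses can be found by comparing expected utilities. A small Python sketch (the dictionary payoff tables are an assumed representation, not from the slides):

```python
def pure_best_responses(u_i, opponent_mix):
    """Pure-action best responses for agent i against the opponent's
    mixed strategy.  u_i[a][b] = agent i's payoff when i plays a and the
    opponent plays b; opponent_mix maps opponent actions to probabilities."""
    eu = {a: sum(p * u_i[a][b] for b, p in opponent_mix.items())
          for a in u_i}
    best = max(eu.values())
    return [a for a, v in eu.items() if abs(v - best) < 1e-9]

# Battle of the Sexes, wife's payoffs, vs. a husband playing Opera w.p. 1/3:
# both pure actions give expected utility 2/3, so both are best responses
# (and hence so is any mixture of them)
u_wife = {'Opera': {'Opera': 2, 'Football': 0},
          'Football': {'Opera': 0, 'Football': 1}}
print(pure_best_responses(u_wife, {'Opera': 1/3, 'Football': 2/3}))
```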
Nau: Game Theory 6
Best Response Unless there’s a unique best response to S–i that’s a pure strategy, the
number of best responses is infinite Suppose a best response to S–i is a mixed strategy s* whose support
includes ≥ 2 actions • Then every action a in support(s*) must have the same expected
utility ui(a,S–i) › If one of them had a higher expected utility than the others,
then it would be a better response than s* • Thus any mixture of the actions in support(s*) is a best response
Similarly, if there are ≥ 2 pure strategies that are best responses to S–i then any mixture of them is also a best response
Nau: Game Theory 7
A strategy profile S = (s1, …, sn) is a Nash equilibrium if for every i,
  si is a best response to S−i, i.e., no agent can do better by unilaterally changing his/her strategy
A Nash equilibrium S = (s1, …, sn) is strict if for every i,
  si is the only best response to S−i, i.e., any agent who unilaterally changes strategy will do worse
Otherwise the Nash equilibrium is weak
In the Prisoner’s Dilemma, (D,D) is a strict Nash equilibrium If either agent unilaterally switches to a different
strategy, his/her expected utility goes below 1
In Which Side of the Road, (Left,Left) and (Right,Right) both are strict Nash equilibria
If either agent unilaterally switches to a different strategy, his/her expected utility goes below 1
Nash Equilibrium
The Prisoner's Dilemma:
         C      D
    C   3, 3   0, 5
    D   5, 0   1, 1
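The "no profitable unilateral deviation" condition can be checked directly for pure-strategy profiles. A hedged Python sketch (the payoff-dictionary representation is an assumption):

```python
from itertools import product

def is_pure_nash(payoffs, actions, profile):
    """Check whether a pure profile is a Nash equilibrium: no agent can
    gain by unilaterally deviating.  payoffs[s][i] is agent i's payoff
    under profile s; actions[i] lists agent i's available actions."""
    for i in range(len(profile)):
        for alt in actions[i]:
            deviated = profile[:i] + (alt,) + profile[i+1:]
            if payoffs[deviated][i] > payoffs[profile][i]:
                return False
    return True

pd = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
acts = [('C', 'D'), ('C', 'D')]
nash = [s for s in product(*acts) if is_pure_nash(pd, acts, s)]
print(nash)   # [('D', 'D')]
```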
Nau: Game Theory 8
Properties of Nash Equilibria Theorem (Nash, 1951): Every game with a finite number of agents and
action profiles has at least one Nash equilibrium
Weak Nash equilibria are less stable than strict Nash equilibria
  In the weak case, at least one agent has more than one best response,
  and only one of them is in the Nash equilibrium
Pure-strategy Nash equilibria can be either strict or weak
Mixed-strategy Nash equilibria are always weak Reason: if there are ≥ 2 pure strategies that are best responses to S–i
then any mixture of them is also a best response Thus in a strict Nash equilibrium, all of the strategies are pure
e.g., Prisoner’s Dilemma, Which Side of the Road
Nau: Game Theory 9
Weak and Strong Nash Equilibria
Weak Nash equilibria are less stable than strict Nash equilibria
  In the weak case, at least one agent has more than one best response,
  and only one of them is in the Nash equilibrium
Pure-strategy Nash equilibria can be either strict (e.g., the Prisoner’s Dilemma) or weak (e.g., which side of the road)
From the corollary on the previous slide, we know that mixed-strategy Nash equilibria are always weak Thus in a strict Nash equilibrium, all of the strategies are pure
Nau: Game Theory 10
The Battle of the Sexes has two pure-strategy Nash equilibria
Generally it’s tricky to compute mixed-strategy equilibria But easy if we identify the support of the equilibrium strategies
Assume both agents randomize, and that the husband’s mixed strategy sh is
sh(Opera) = p; sh(Football) = 1 – p
Expected utilities of the wife’s actions: uw(Football, sh) = 0p + 1(1 − p)
uw(Opera, sh) = 2p If the wife mixes between her two actions, they must have the same expected utility
If one of the actions had a better expected utility, she’d do better with a pure strategy that always used that action
Thus 0p + 1(1 – p) = 2p, so p = 1/3
So the husband’s mixed strategy is sh(Opera) = 1/3; sh(Football) = 2/3
              Husband
  Wife        Opera    Football
  Opera        2, 1      0, 0
  Football     0, 0      1, 2
Finding Nash Equilibria
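The indifference argument above amounts to solving one linear equation. A small sketch using exact rational arithmetic (the helper function and its parameter names are illustrative):

```python
from fractions import Fraction

def indifference_prob(a1, b1, a2, b2):
    """Probability p such that a1 + b1*p == a2 + b2*p (assumes b1 != b2).
    Used to make one agent indifferent between two actions."""
    return Fraction(a1 - a2, b2 - b1)

# Wife's expected utilities, with p = P(husband plays Opera):
#   uw(Opera)    = 0 + 2p
#   uw(Football) = 1 - p
p = indifference_prob(0, 2, 1, -1)
print(p)        # 1/3
print(2 * p)    # 2/3 -- the wife's expected utility from either action
```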
Nau: Game Theory 11
Finding Nash Equilibria
A similar calculation shows that the wife’s mixed strategy sw is sw(Opera) = 2/3, sw(Football) = 1/3
Like all mixed-strategy equilibria, this is a weak Nash equilibrium Each agent has infinitely many
other best-response strategies In this equilibrium,
Probability 5/9 that payoff is 0 for both agents Probability 2/9 that wife gets 2 and husband gets 1 Probability 2/9 that husband gets 2 and wife gets 1 Expected utility for each agent is 2/3
Pareto-dominated by both of the pure-strategy equilibria In them, one agent gets 1 and the other gets 2
              Husband
  Wife        Opera    Football
  Opera        2, 1      0, 0
  Football     0, 0      1, 2
Nau: Game Theory 12
Finding Nash Equilibria
Matching Pennies
Easy to see that no pure strategy could be part of a Nash equilibrium in this game
For each combination of pure strategies,
one of the agents can do better by changing his/her strategy • e.g., for (Heads,Heads),
agent 2 can do better by switching to Tails Thus there isn’t a strict Nash equilibrium But again there’s a mixed-strategy equilibrium:
(s,s), where s(Heads) = s(Tails) = ½
              Heads    Tails
    Heads     1, –1   –1, 1
    Tails    –1, 1     1, –1
Nau: Game Theory 13
A Real-World Example Penalty kicks in soccer
A kicker and a goalie in a penalty kick Kicker can kick left or right Goalie can jump to left or right Kicker scores iff he/she kicks to one
side and goalie jumps to the other Analogy to Matching Pennies
• If you use a pure strategy and the other agent uses his/her best response, the other agent will win
• If you kick or jump in either direction with equal probability, the opponent can’t exploit your strategy
Nau: Game Theory 14
Another Interpretation of Mixed Strategies Another interpretation of mixed strategies is that
Each agent’s strategy is deterministic But each agent has uncertainty regarding the other’s strategy
Agent i’s mixed strategy is everyone else’s assessment of how likely i is to play each pure strategy
Example: In a series of soccer penalty kicks, the kicker could kick left or right in
a deterministic pattern that the goalie thinks is random
Nau: Game Theory 15
Another Real-World Example Road Networks Suppose that:
1,000 drivers wish to travel from S (start) to D (destination)
Two possible paths: S→A→D and S→B→D
The road from S to A is long but very wide: t = 50 minutes, no matter how many drivers use it
  Same for the road from B to D
The roads from A to D and from S to B are short but narrow
  Time = (number of cars on the road)/25
Nash equilibrium:
  500 cars go through A, 500 cars through B
  Everyone's time is 50 + 500/25 = 70 minutes
  If a single driver changes to the other route, there now are 501 cars on that route, so his/her time goes up
[Network diagram: S→A (t = 50), A→D (t = cars/25), S→B (t = cars/25), B→D (t = 50)]
Nau: Game Theory 16
Braess’s Paradox Suppose we add a new road from B to A
The road is so wide and so short that it takes 0 minutes to traverse it Nash equilibrium:
• All 1000 cars go S→B→A→D • Time for S→B is 1000/25 = 40 minutes • Total time is 80 minutes
To see that this is an equilibrium: • If driver goes S→A→D, his/her cost is 50 + 40 = 90 minutes • If driver goes S→B→D, his/her cost is 40 + 50 = 90 minutes • Both are dominated by S→B→A→D
To see that it’s the only Nash equilibrium: • For every traffic pattern, S→B→A→D dominates S→A→D and S→B→D
• Choose any traffic pattern, and compute the times a driver would get on all three routes
Carelessly adding capacity can actually be hurtful!
[Network diagram: S→A (t = 50), A→D (t = cars/25), S→B (t = cars/25), B→D (t = 50), plus the new road B→A (t = 0)]
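The equilibrium times on both versions of the network can be verified with a few lines of arithmetic. A Python sketch of the travel-time computations (the function and its argument names are illustrative):

```python
def route_times(n_via_A, n_via_B, n_via_BA=0):
    """Travel times on the example network.  Wide roads (S->A, B->D) take
    a fixed 50 minutes; narrow roads (S->B, A->D) take (cars on them)/25.
    n_via_BA drivers use the added zero-cost shortcut, S->B->A->D."""
    on_SB = n_via_B + n_via_BA      # cars on the narrow S->B road
    on_AD = n_via_A + n_via_BA      # cars on the narrow A->D road
    t_SAD = 50 + on_AD / 25
    t_SBD = on_SB / 25 + 50
    t_SBAD = on_SB / 25 + 0 + on_AD / 25
    return t_SAD, t_SBD, t_SBAD

# Without the shortcut: 500/500 split, both routes take 70 minutes
print(route_times(500, 500)[:2])        # (70.0, 70.0)

# With the shortcut: all 1000 drivers take S->B->A->D (80 minutes);
# the two original routes would now take 90 minutes each
print(route_times(0, 0, 1000))          # (90.0, 90.0, 80.0)
```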
Nau: Game Theory 17
Braess’s Paradox in practice From an article about Seoul, South Korea:
“The idea was sown in 1999,” Hwang says. “We had experienced a strange thing. We had three tunnels in the city and one needed to be shut down. Bizarrely, we found that that car volumes dropped. I thought this was odd. We discovered it was a case of ‘Braess paradox’, which says that by taking away space in an urban area you can actually increase the flow of traffic, and, by implication, by adding extra capacity to a road network you can reduce overall performance.”
John Vidal, “Heart and soul of the city”, The Guardian, Nov. 1, 2006. http://www.guardian.co.uk/environment/2006/nov/01/society.travelsenvironmentalimpact
Nau: Game Theory 18
Maximin and Minimax Strategies Let s be a strategy for agent i
Then s’s worst-case payoff is the payoff if the other agents play a combination of strategies that is worst for agent i
Agent i's maximin value (or security level) is the best worst-case payoff agent i can guarantee, i.e.,
    max_{si} min_{S−i} ui(si, S−i)
A maximin strategy for agent i is any strategy whose worst-case payoff is i's maximin value, i.e., any strategy in
    arg max_{si} min_{S−i} ui(si, S−i)
The maximin strategy is not necessarily unique, and it often is mixed
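Restricted to pure strategies, the maximin computation is a direct max-over-min. A minimal sketch (note that the true maximin value may require mixing, which this does not attempt):

```python
def pure_maximin(u_i):
    """Agent i's maximin value and maximin strategies, restricted to pure
    strategies.  u_i[s][t] = i's payoff when i plays s and the others
    jointly play t.  The unrestricted maximin may need a mixed strategy."""
    worst = {s: min(row.values()) for s, row in u_i.items()}
    value = max(worst.values())
    return value, [s for s, w in worst.items() if w == value]

# Battle of the Sexes, wife's payoffs: each pure strategy guarantees
# only 0, while the mixed strategy (2/3, 1/3) guarantees 2/3
u_wife = {'Opera': {'Opera': 2, 'Football': 0},
          'Football': {'Opera': 0, 'Football': 1}}
print(pure_maximin(u_wife))   # (0, ['Opera', 'Football'])
```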
Nau: Game Theory 19
Example Battle of the Sexes
Opera’s worst-case payoff is 0 Football’s worst-case payoff is 0
Consider the Nash equilibrium strategies we discussed earlier: The wife’s strategy: sw(Opera) = 2/3, sw(Football) = 1/3
• Recall that this strategy’s expected payoff is 2/3, regardless of what the husband does
The husband’s strategy: sh(Opera) = 1/3, sh(Football) = 2/3 • Recall that this strategy’s expected payoff is 2/3, regardless of what
the wife does These are also the wife’s and husband’s maximin strategies
              Husband
  Wife        Opera    Football
  Opera        2, 1      0, 0
  Football     0, 0      1, 2
Nau: Game Theory 20
Comments If agent i plays a maximin strategy and the other agents play arbitrarily,
i still receives an expected payoff of at least his/her maximin value So the maximin strategy is sensible for a conservative agent
Maximize his/her expected utility without any assumptions about the others
E.g., don’t assume— • that they’re rational • that they draw their action choices from known distributions
Nau: Game Theory 21
Minimax Strategies The minimax strategy and minimax value are duals of their maximin
counterparts
In a 2-player game, a minimax strategy for agent 1 against agent 2 is a strategy s1* that minimizes agent 2’s maximum payoff
i.e., suppose player 1 first chooses a strategy s1 and then player 2 can choose a best response. Then s1* is a strategy that minimizes the payoff of 2’s best response
Useful if you want to punish an opponent without regard for your own payoff (e.g., see repeated games)
Agent 2’s minimax value is agent 2’s maximum payoff against s1*
    s1* = arg min_{s1} max_{s2} u2(s1, s2)

    max_{s2} u2(s1*, s2) = min_{s1} max_{s2} u2(s1, s2)
Nau: Game Theory 22
Minimax Strategies in n-Agent Games In n-agent games (n > 2), agent i usually can’t minimize agent j’s payoff by
acting unilaterally But suppose all the agents “gang up” on agent j
Let S−j* be a mixed-strategy profile that minimizes j’s maximum payoff, i.e.,
Then a minimax strategy for agent i ≠ j is i’s component of S-j* Agent j’s minimax value is j’s maximum payoff against S–j*
In two-player games, an agent’s minimax value = his/her maximin value For n-player games, an agent’s maximin value ≤ his/her minimax value
    S−j* = arg min_{S−j} max_{sj} uj(sj, S−j)

    max_{sj} uj(sj, S−j*) = min_{S−j} max_{sj} uj(sj, S−j)
Nau: Game Theory 23
Minimax Theorem (von Neumann, 1928)
Theorem. Let G be any two-person finite zero-sum game. Then there are a number VG, called the value of G, and strategies s1* and s2* for agents 1 and 2, such that
  If agent 2 uses s2*, agent 1's expected utility is ≤ VG, i.e., max_{s1} u1(s1, s2*) = VG
  If agent 1 uses s1*, agent 1's expected utility is ≥ VG, i.e., min_{s2} u1(s1*, s2) = VG
How this relates to the last few slides:
agent 1’s maximin value = agent 1’s minimax value = VG agent 2’s maximin value = agent 2’s minimax value = –VG si* = any minimax strategy for agent i
= any maximin strategy for agent i (s1*, s2*) = a Nash equilibrium
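The value VG and a maximin strategy can be computed by linear programming: maximize v subject to the constraint that the row player's mixed strategy guarantees at least v against every column. A sketch assuming numpy and scipy are available (this is a standard LP formulation, not something specific to these slides):

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(A):
    """Value and a maximin strategy for the row player of a zero-sum game
    with row-player payoff matrix A.  LP: maximize v subject to
    sum_i x_i * A[i, j] >= v for every column j, x a distribution."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - x^T A[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

# Matching Pennies: value 0, maximin strategy (1/2, 1/2)
v, x = zero_sum_value([[1, -1], [-1, 1]])
print(v, x)
```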
Nau: Game Theory 24
Agent i’s mixed strategy is to display heads with some probability pi
The graph shows u1(p1, p2) for every possible p1, p2
u2(p1, p2) = –u1(p1, p2)
The Nash equilibrium is the saddle point, p1 = p2 = 0.5
The value of the game is 0
If agent i uses a strategy with pi ≠ 0.5, The other agent has a
“best response” strategy that will lower i’s expected utility below 0
Example: Matching Pennies
              Heads    Tails
    Heads     1, –1   –1, 1
    Tails    –1, 1     1, –1

[Surface plot of u1(p1, p2) omitted; the saddle point is at p1 = p2 = 0.5]
Nau: Game Theory 25
Two-Finger Morra There are several versions of this game
Here’s the one I’ll use:
Each agent holds up 1 or 2 fingers If the total number of fingers is odd
Agent 1 gets that many points If the total number of fingers is even
Agent 2 gets that many points
Agent 1 has no dominant strategy
Agent 2 plays 1 => agent 1’s best response is 2 Agent 2 plays 2 => agent 1’s best response is 1
Similarly, agent 2 has no dominant strategy Thus there’s no pure-strategy Nash equilibrium
Look for a mixed-strategy equilibrium
             Agent 2
  Agent 1      1        2
     1       –2, 2     3, –3
     2        3, –3   –4, 4
Nau: Game Theory 26
             Agent 2
  Agent 1      1        2
     1       –2, 2     3, –3
     2        3, –3   –4, 4
Suppose agent 2 plays 1 with probability p2
If agent 1 mixes between 1 and 2, they must have the same expected utility Agent 1 plays 1 => expected utility is –2p2 + 3(1−p2) = 3 – 5p2
Agent 1 plays 2 => expected utility is 3p2 – 4(1−p2) = 7p2 – 4 Thus 3 – 5p2 = 7p2 – 4, so p2 = 7/12
Agent 1’s expected utility is 3–5(7/12) = 1/12
Suppose agent 1 plays 1 with probability p1
If agent 2 mixes between 1 and 2, they must have the same expected utility Agent 2 plays 1 => agent 2’s expected utility is 2p1 – 3(1−p1) = 5p1 – 3
Agent 2 plays 2 => agent 2’s expected utility is –3p1 + 4(1−p1) = 4 – 7p1
Thus 5p1 – 3 = 4 – 7p1 , so p1 = 7/12 Agent 2’s expected utility is 5(7/12) – 3 = –1/12
Two-Finger Morra
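The indifference computations above can be verified with exact rational arithmetic:

```python
from fractions import Fraction

# Agent 1's payoffs in Two-Finger Morra: u1[s1][s2],
# with s1, s2 the number of fingers each agent shows
u1 = {1: {1: Fraction(-2), 2: Fraction(3)},
      2: {1: Fraction(3), 2: Fraction(-4)}}

# Agent 2 mixes so that agent 1 is indifferent between playing 1 and 2:
#   -2*p2 + 3*(1 - p2) == 3*p2 - 4*(1 - p2)  gives  p2 = 7/12
p2 = Fraction(7, 12)
eu_play1 = u1[1][1] * p2 + u1[1][2] * (1 - p2)
eu_play2 = u1[2][1] * p2 + u1[2][2] * (1 - p2)
print(eu_play1, eu_play2)   # both 1/12: the value of the game for agent 1
```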
Nau: Game Theory 27
Dominant Strategies
Let si and si′ be two strategies for agent i
Intuitively, si dominates si′ if si gives agent i a better payoff than si′ for every strategy profile s−i of the remaining agents
Mathematically, there are three gradations of dominance:
  1. si strictly dominates si′ if ui(si, s−i) > ui(si′, s−i) for all s−i ∈ S−i
  2. si weakly dominates si′ if ui(si, s−i) ≥ ui(si′, s−i) for all s−i ∈ S−i,
     and ui(si, s−i) > ui(si′, s−i) for at least one s−i ∈ S−i
  3. si very weakly dominates si′ if ui(si, s−i) ≥ ui(si′, s−i) for all s−i ∈ S−i
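The three gradations can be checked mechanically for pure strategies. A minimal sketch (the dictionary payoff tables are an assumed representation):

```python
def dominance(u_i, s, t):
    """Classify how agent i's pure strategy s relates to t: 'strict',
    'weak', 'very weak', or None.  u_i[a][b] = i's payoff when i plays a
    and the others jointly play b."""
    diffs = [u_i[s][b] - u_i[t][b] for b in u_i[s]]
    if all(d > 0 for d in diffs):
        return 'strict'
    if all(d >= 0 for d in diffs):
        return 'weak' if any(d > 0 for d in diffs) else 'very weak'
    return None

# Prisoner's Dilemma, agent 1: D strictly dominates C
u1 = {'C': {'C': 3, 'D': 0}, 'D': {'C': 5, 'D': 1}}
print(dominance(u1, 'D', 'C'))   # strict
```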
Nau: Game Theory 28
Dominant Strategy Equilibria
A strategy is strictly (resp., weakly, very weakly) dominant for an agent if it strictly (weakly, very weakly) dominates every other strategy for that agent
  It's not hard to show that a strictly dominant strategy must be pure
Obviously, a strategy profile (s1, …, sn) in which every si is dominant for agent i (strictly, weakly, or very weakly) is a Nash equilibrium
  Such a strategy profile forms an equilibrium in strictly (weakly, very weakly) dominant strategies
  An equilibrium in strictly dominant strategies is necessarily the unique Nash equilibrium
Nau: Game Theory 29
         C      D
    C   3, 3   0, 5
    D   5, 0   1, 1

Example: the Prisoner's Dilemma
D is strictly dominant for agent 1
  it strictly dominates C if agent 2 uses C
  it strictly dominates C if agent 2 uses D
Similarly, D is strictly dominant for agent 2
So (D,D) is a Nash equilibrium in strictly dominant strategies
  Ironically, of the pure strategy profiles, (D,D) is the only one that's not Pareto optimal
Nau: Game Theory 30
Example: Matching Pennies

              Heads    Tails
    Heads     1, –1   –1, 1
    Tails    –1, 1     1, –1

Heads isn't strictly dominant for agent 1, because u1(Heads, Tails) < u1(Tails, Tails)
Tails isn't strictly dominant for agent 1, because u1(Heads, Heads) > u1(Tails, Heads)
Agent 1 doesn't have a dominant strategy
  No Nash equilibrium in dominant strategies
Which Side of the Road
  No Nash equilibrium in dominant strategies
  Same kind of argument as above
Nau: Game Theory 31
Elimination of Dominated Strategies
A strategy si is strictly (weakly, very weakly) dominated for agent i if some other strategy si′ strictly (weakly, very weakly) dominates si
A strictly dominated strategy can't be a best response to any strategies of the other agents
  So we can eliminate it (remove it from the payoff matrix)
Once a pure strategy is eliminated, another strategy may become dominated
  This elimination can be repeated
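For two-player games, the repeated-elimination procedure can be sketched as follows (this version only checks domination by pure strategies; domination by a mixed strategy, discussed on the next slide, is not checked):

```python
def iterated_elimination(actions1, actions2, u1, u2):
    """Iterated removal of strictly dominated pure strategies in a
    2-player game.  u1[a][b], u2[a][b] = the players' payoffs when
    player 1 plays a and player 2 plays b."""
    A, B = list(actions1), list(actions2)
    changed = True
    while changed:
        changed = False
        for s in A[:]:   # remove player 1's strictly dominated strategies
            if any(all(u1[t][b] > u1[s][b] for b in B) for t in A if t != s):
                A.remove(s); changed = True
        for s in B[:]:   # then player 2's
            if any(all(u2[a][t] > u2[a][s] for a in A) for t in B if t != s):
                B.remove(s); changed = True
    return A, B

# Prisoner's Dilemma reduces to the single cell (D, D)
u1 = {'C': {'C': 3, 'D': 0}, 'D': {'C': 5, 'D': 1}}
u2 = {'C': {'C': 3, 'D': 5}, 'D': {'C': 0, 'D': 1}}
print(iterated_elimination(['C', 'D'], ['C', 'D'], u1, u2))
```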
Nau: Game Theory 32
Elimination of Dominated Strategies A pure strategy s may be dominated by a mixed strategy t
without being dominated by any of the strategies in t’s support
Example: the three games shown below
In the first game, agent 2’s strategy R is dominated by L (and also by C) • Eliminate it, giving the second game
In the second game, M is dominated by neither U nor D • But it’s dominated by the mixed strategy P(U) = 0.5, P(D) = 0.5
› (It wasn’t dominated before we removed R) • Removing it gives the third game (maximally reduced)
Nau: Game Theory 33
Elimination of Dominated Strategies This gives a solution concept:
The set of all strategy profiles that assign 0 probability to playing any action that would be removed through iterated removal of strictly dominated strategies
A much weaker solution concept than Nash equilibrium The set of strategy profiles includes all the Nash equilibria But it includes many other mixed strategies
• In some games, it's equal to S (the set of all mixed-strategy profiles) We can use this technique to simplify finding Nash equilibria
In the current example, go from a 3 × 3 to a 2 × 2 game Sometimes (e.g., Prisoner’s dilemma), end with a single cell
• In this case, the single cell must be a Nash equilibrium
Nau: Game Theory 34
Elimination of Dominated Strategies Each flavor of domination has advantages and disadvantages
Strict domination yields a reduced game Get the same game, independent of the elimination order
Weak domination can yield a smaller reduced game But which reduced game may depend on the elimination order
Very weak domination can yield even smaller reduced games But again, which one might depend on elimination order And it doesn’t impose a strict order on strategies:
• Can have two strategies that each very weakly dominates the other • For this reason, very weak domination usually is considered the
least important
Nau: Game Theory 35
Correlated Equilibrium Motivating example: the mixed-strategy equilibrium
for the Battle of the Sexes P(both agents choose Opera) = (2/3)(1/3) = 2/9
Wife’s payoff is 2, husband’s payoff is 1 P(both agents choose Football) = (1/3)(2/3) = 2/9
Wife’s payoff is 1, husband’s payoff is 2
P(the agents choose different activities) = (2/3)(2/3) + (1/3)(1/3) = 5/9 In this case, payoff is 0 for both agents
Each agent’s expected utility is 2(2/9) + 1(2/9) + 0(5/9) = 2/3 If the agents could coordinate their choices, they could
eliminate the case where they choose different activities
Each agent’s payoff would always be 1 or 2 Higher expected utility for both agents
This leads to the notion of a correlated equilibrium Generalization of a Nash equilibrium
              Husband
  Wife        Opera    Football
  Opera        2, 1      0, 0
  Football     0, 0      1, 2
Nau: Game Theory 36
General Setting Start out with an n-agent game G
Let v1, . . . , vn be random variables called signals, one for each agent For each i, let Di be the domain (the set of possible values) of vi
Let π be a joint distribution over v1, . . . , vn π(d1, …, dn) = P(v1=d1, …, vn=dn)
Nature uses π to choose values d1, …, dn for v1, …, vn
Nature tells each agent i the value of vi An agent can condition his/her action on the value of vi
An agent’s strategy is a mapping σi : Di → Ai
• σi is deterministic (i.e., a pure strategy)
• Mixed strategies wouldn’t give any greater generality
A strategy profile is (σ1, …, σn) The games we’ve been considering before now are a degenerate case in which the
random variables v1, . . . , vn are independent
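The expected utility of a strategy profile under a signal distribution π is a weighted sum over signal values. A sketch using the Battle of the Sexes with a shared fair-coin signal (this particular signal scheme is an illustrative choice; it anticipates the correlated equilibrium on the next slides):

```python
from fractions import Fraction

# Battle of the Sexes payoffs: u[(wife_action, husband_action)]
u = {('Opera', 'Opera'): (2, 1), ('Opera', 'Football'): (0, 0),
     ('Football', 'Opera'): (0, 0), ('Football', 'Football'): (1, 2)}

# Joint signal distribution pi: a fair coin whose value both agents see
pi = {('O', 'O'): Fraction(1, 2), ('F', 'F'): Fraction(1, 2)}

def sigma(d):
    """Each agent's strategy: a deterministic map from signal to action."""
    return 'Opera' if d == 'O' else 'Football'

def expected_utilities(pi, u, strat1, strat2):
    """E[u_i] = sum over signal values (d1, d2) of
    pi(d1, d2) * u_i(strat1(d1), strat2(d2))."""
    eu = [Fraction(0), Fraction(0)]
    for (d1, d2), p in pi.items():
        payoff = u[(strat1(d1), strat2(d2))]
        eu[0] += p * payoff[0]
        eu[1] += p * payoff[1]
    return eu

# Each agent gets 3/2 -- better than the 2/3 of the mixed Nash equilibrium
print(expected_utilities(pi, u, sigma, sigma))   # [3/2, 3/2]
```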
Nau: Game Theory 37
Correlated Equilibrium
Given an n-player game G
Let the signals v1, …, vn, their joint distribution π, and the strategies σ1, …, σn be as defined on the previous slide
Given a strategy profile (σ1, …, σn), the expected utility for agent i is
    ui(σ1, …, σn) = Σ_{d1, …, dn} π(d1, …, dn) ui(σ1(d1), …, σn(dn))
(v, π, (σ1, …, σn)) is a correlated equilibrium if for every agent i and every mapping σi′ from Di to Ai,
    ui(σ1, …, σi−1, σi, σi+1, …, σn) ≥ ui(σ1, …, σi−1, σi′, σi+1, …, σn)
Nau: Game Theory 38
Correlated Equilibrium Theorem. For every Nash equilibrium (s1, …, sn), there’s a corresponding correlated
equilibrium (σ1, . . . , σn) “Corresponding” means they produce the same distribution on outcomes
Sketch of Proof
Let v1, …, vn be independent random variables, where each vi has domain Ai and probability distribution P(vi = ai) = si(ai) for each ai in Ai
Hence the joint distribution is π(a1, …, an) = Πi si(ai)
Let each σi be the identity function
When the agents play the strategy profile (σ1, …, σn), the distribution over outcomes is identical to that under (s1, …, sn)
The vi’s are uncorrelated and no agent i can benefit by deviating from σi, so (σ1, . . . , σn) is a correlated equilibrium
But not every correlated equilibrium is equivalent to a Nash equilibrium
e.g., the correlated equilibrium for the Battle of the Sexes
Nau: Game Theory 39
Convex Combinations Correlated equilibria can be combined to form new correlated equilibria
First we need a definition
Let x1, x2, …, xn be a finite number of points (vectors, scalars, …) A convex combination of x1, x2, …, xn is a linear combination of them
in which all coefficients are non-negative and sum up to 1, i.e., a1x1 + a2x2 + … + anxn
where every ai ≥ 0 and a1 + a2 + … + an = 1 Any convex combination of two points x1 and x2 lies on the straight line
segment between x1 and x2 Any convex combination of x1, x2, …, xn is within the convex hull of
x1, x2, …, xn
Nau: Game Theory 40
Convex Combinations Theorem. Any convex combination of correlated equilibrium payoffs can
be realized as the payoff profile of some correlated equilibrium
To understand this, imagine a public random device that randomly selects which correlated equilibrium to play Each agent’s overall expected payoff is the weighted sum of the
payoffs from the correlated equilibria that were combined
No agent has an incentive to deviate, regardless of the probabilities governing the random device
Nau: Game Theory 41
Summary I’ve discussed several solution concepts, and ways of finding them:
Pareto optimality • Prisoner’s Dilemma, Which Side of the Road
best responses and Nash equilibria • Battle of the Sexes, Matching Pennies
real-world examples • soccer penalty kicks, road networks (Braess’s Paradox)
maximin and minimax strategies, and the Minimax Theorem • Matching Pennies, Two-Finger Morra
dominant strategies • Prisoner’s Dilemma, Which Side of the Road, Matching Pennies
correlated equilibrium • Battle of the Sexes