Best Reply Mechanisms
Justin Thaler and Victor Shnayder
What are best-reply dynamics?
•Start with an arbitrary strategy profile
•In each step let some player switch his strategy to be a best reply to the current strategies of the others.
What are best-reply dynamics?
Definition: A repeated-reply mechanism for a private info game G:
• Extensive form game with perfect recall (same players)
• At most M steps. In each step:
• A single player announces an element of A_i
• Players play in round-robin order
• Stop when all players “pass” in n consecutive steps.
• Enforce action profile of the most recently announced actions
• If M steps go by without stopping, penalize the players.
What are best-reply dynamics?
•Need a penalty to ensure non-convergence is not in best interest of any player.
•Realistic modeling assumption for BGP, TCP, etc.
•Best-reply dynamics is the strategy profile of a repeated-reply mechanism in which each player i updates to i’s best-reply to the other players’ strategies each time it is i’s turn.
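The round-robin rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' mechanism: the function name `best_reply_dynamics`, the `utility(i, profile)` callback and the tie-breaking rule (a player keeps his current strategy unless another is strictly better) are assumptions made for the sketch, and the penalty for non-convergence is only signalled by the returned flag. The payoffs are those of Example 2 later in this deck.

```python
def best_reply_dynamics(num_strategies, utility, start, max_steps):
    """Round-robin best-reply dynamics on a finite game.

    num_strategies[i] is the number of strategies of player i,
    utility(i, profile) is player i's payoff at a pure profile (a tuple),
    start is the arbitrary initial profile, max_steps plays the role of M.
    Returns (profile, converged).
    """
    n = len(num_strategies)
    profile = list(start)
    passes = 0  # consecutive turns in which the active player "passed"
    for step in range(max_steps):
        i = step % n  # round-robin order

        def payoff(action):
            trial = list(profile)
            trial[i] = action
            return utility(i, tuple(trial))

        best = max(range(num_strategies[i]), key=payoff)
        if payoff(best) > payoff(profile[i]):
            profile[i] = best  # switch to a strictly better reply
            passes = 0
        else:
            passes += 1  # already best-replying: pass
        if passes == n:  # everyone passed in n consecutive steps
            return tuple(profile), True
    return tuple(profile), False  # the mechanism would penalize the players here


# Example 2 from these slides: EXAMPLE_2[row][col] = (row payoff, column payoff).
EXAMPLE_2 = [[(0, 1), (0, 1)],
             [(1, 1), (1, 0)]]


def example_2_utility(i, profile):
    row, col = profile
    return EXAMPLE_2[row][col][i]


result = best_reply_dynamics([2, 2], example_2_utility, start=(0, 1), max_steps=20)
print(result)  # ((1, 0), True): bottom row, left column
```

Starting from (top, right), the row player moves to the bottom row, the column player moves to the left column, and then both pass.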
Why best reply dynamics?
•If convergence occurs, we have a highly justifiable Nash Equilibrium
•Computationally simple
•Players only need private information
•Feasible in distributed, asynchronous settings
•Prescribed by existing protocols (Ex: BGP)
Why best reply dynamics?
•In light of Theorems 1 and 2 (which we’ll see soon):
•Often gives a non-VCG way of creating incentive compatible mechanisms (?). And sometimes without $$$.
•Often get collusion-proofness, Pareto-efficiency
Outline
•When do best reply dynamics work?
•Universal max-solvability (UMS)
•Thm: UMS implies convergence to unique NE, collusion-proofness
•Example applications (correlated markets, BGP, etc)
•Connections to strategy-proofness
•Discussion
Universal max-dominance
•A subset T of S is universally max-dominated if:
•Very strong condition!
•Existence of a max-dominated set is strictly stronger than existence of a dominated strategy, i.e. of s_i, s_i’ s.t. u_i(s_i, s_-i) < u_i(s_i’, s_-i) for all s_-i.
Universal max-solvability (UMS)
•A game G is universally max-solvable if we can iteratively remove universally max-dominated strategy sets and get to a single strategy for each player.
•Stronger condition than solvable by iterated removal of strictly dominated strategies (IRSDS)
Example 1
5, 5 0, 0
10, 0 4, 4
Solvable by IRSDS, but not UMS. Neither player has a universally max-dominated set. Note unique NE is not PE, and best-reply dynamics are not incentive compatible for the row player.
Example 2
0, 1 0, 1
1, 1 1, 0
UMS
Example 3 (UMS)
     L      M      R
A   1, 9   2, 9   2, 9
B   3, 1   3, 2   3, 2
C   3, 1   4, 3   5, 4
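The elimination process can be tried out on Examples 1 to 3 with a short Python sketch. The formula for universal max-dominance did not survive on the definition slide, so the test used here is an assumption: a strategy is deleted when its best possible payoff is strictly below the worst possible payoff of some other strategy of the same player ("max < min"). This reading is not taken from the deck; it is chosen only because it reproduces the deck's classification of the three examples. The function name `max_solve` is likewise hypothetical.

```python
def max_solve(payoffs):
    """Iteratively delete strategies of a two-player game under the assumed test.

    payoffs[r][c] = (row payoff, column payoff).
    Returns the surviving (row indices, column indices).
    """
    rows = list(range(len(payoffs)))
    cols = list(range(len(payoffs[0])))

    def u(player, own, other):
        # Payoff of `player` when he plays `own` and the opponent plays `other`.
        return payoffs[own][other][0] if player == 0 else payoffs[other][own][1]

    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            own, other = (rows, cols) if player == 0 else (cols, rows)
            worst = {s: min(u(player, s, o) for o in other) for s in own}
            best = {s: max(u(player, s, o) for o in other) for s in own}
            guaranteed = max(worst.values())
            # ASSUMPTION: s is max-dominated when best[s] < worst[t] for some t.
            keep = [s for s in own if best[s] >= guaranteed]
            if len(keep) < len(own):
                own[:] = keep  # remove the whole dominated set at once
                changed = True
    return rows, cols


EXAMPLE_1 = [[(5, 5), (0, 0)],
             [(10, 0), (4, 4)]]
EXAMPLE_2 = [[(0, 1), (0, 1)],
             [(1, 1), (1, 0)]]
EXAMPLE_3 = [[(1, 9), (2, 9), (2, 9)],
             [(3, 1), (3, 2), (3, 2)],
             [(3, 1), (4, 3), (5, 4)]]

print(max_solve(EXAMPLE_1))  # ([0, 1], [0, 1]): nothing can be removed
print(max_solve(EXAMPLE_2))  # ([1], [0]): one profile left, payoffs (1, 1)
print(max_solve(EXAMPLE_3))  # ([2], [2]): one profile left, payoffs (5, 4)
```

Under this assumed test, Example 1 is stuck with both strategies of both players, while Examples 2 and 3 shrink to a single profile, matching the labels on the slides.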
Theorems
Theorem 1: G is UMS ⇒ G has a unique, pure NE, and it is collusion-proof.
Corollary: Collusion-proof NE ⇒ NE is Pareto optimal
Theorems
Note that solvable by IRSDS suffices for unique, pure NE. UMS is needed for collusion-proofness and PE.
Proof of Theorem 1:
•By contradiction: G is UMS, so fix an elimination sequence of max-dominated strategy sets.
•Let s* be the final strategy profile.
•If s* is not a collusion-proof NE, some set of players T can deviate and be better off.
•Let s be the new profile in which the players in T change strategy from s*.
•Let s_i be the first of the deviating strategies to be eliminated. Then it was max-dominated, so s_i* is strictly better for player i, so i can’t be better off.
Theorems
Theorem 2: If G is UMS with private information, then best reply dynamics are incentive-compatible in ex-post NE, and converge to the unique NE of the induced full-information game.
Theorems
Proof: Similar to Theorem 1. The main idea is that a strategy eliminated in the t-th step of the UMS elimination process can never be used after the (n·t)-th step of the best-reply mechanism, i.e. once every player has had t turns.
Correlated two-sided markets
•Agents: buyers and sellers
•Game: weighted bipartite graph -- buyers on one side, sellers on the other
•Buyers have preference order over sellers (higher edge weight = higher preference)
•Sellers prefer buyers connected by heavier edges
Correlated two-sided markets are UMS
•Let e be maximum weight edge. Choosing it universally max-dominates all other strategies of both endpoints.
•Remove the two endpoints of e and all incident edges, repeat.
•Therefore, best reply dynamics converge to ex-post NE.
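The elimination order described above amounts to a greedy pass over the edges by decreasing weight. A minimal Python sketch, assuming all edge weights are distinct and each agent is matched to at most one partner; the function name and the sample market are hypothetical.

```python
def greedy_market_matching(weights):
    """weights maps (buyer, seller) -> edge weight (assumed distinct).

    Repeatedly takes the heaviest remaining edge and removes its endpoints,
    which is the same as scanning edges by decreasing weight and skipping
    any edge that touches an already matched agent.
    """
    matching = {}
    matched_sellers = set()
    for (buyer, seller), _ in sorted(weights.items(),
                                     key=lambda item: item[1], reverse=True):
        if buyer not in matching and seller not in matched_sellers:
            matching[buyer] = seller
            matched_sellers.add(seller)
    return matching


# Hypothetical market with two buyers and two sellers.
market = {("b1", "s1"): 9, ("b1", "s2"): 4,
          ("b2", "s1"): 7, ("b2", "s2"): 3}
matching = greedy_market_matching(market)
print(matching)  # {'b1': 's1', 'b2': 's2'}
```

Here b1 and s1 are paired first because their edge is the heaviest; b2 would prefer s1 but is left with s2.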
Extended Example: BGP
Internet routing: BGP
• Receive update messages from neighbors announcing routes to d.
• Choose a single neighbor, whose route you prefer most, to send traffic through.
• Announce your new route to all your neighbors
[Figure: nodes 1 and 2 and destination d. Route lists: node 1: 12d, 1d; node 2: 21d, 2d.]
Internet routing: BGP
•BGP is asynchronous, distributed
•Prescribes best-reply dynamics
•But does BGP converge?
•And is BGP “incentive compatible”? Do ASes have an incentive to deviate from the protocol?
Does BGP Converge?
•We can break this into two questions:
•Does a stable solution even exist in the static game?
•If so, will BGP find such a solution?
•But we only need one answer.
Does a Stable Solution Exist?
[Figure: nodes 1, 2, 3 and destination d. Route lists: node 1: 13d, 1d; node 2: 21d, 2d; node 3: 32d, 3d.]
No stable solution exists!
It is actually NP-complete to determine existence in general networks
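For a network this small, the claim can be checked by brute force. The sketch below assumes that each node's routes on the slide are listed most preferred first (13d over 1d, and so on), and that a route through a neighbor is available only when that neighbor currently uses the matching suffix. The names `best_available` and `stable_states` are hypothetical.

```python
from itertools import product

# Ranked routes per node, most preferred first (ASSUMED reading of the slide).
THREE_NODES = {
    1: [(1, 3, "d"), (1, "d")],
    2: [(2, 1, "d"), (2, "d")],
    3: [(3, 2, "d"), (3, "d")],
}


def best_available(node, prefs, state):
    """Most preferred route of `node` that is consistent with the others' routes."""
    for route in prefs[node]:
        suffix = route[1:]
        # The direct route is always available; an indirect one requires the
        # next hop to be using exactly the remaining suffix.
        if suffix == ("d",) or state.get(suffix[0]) == suffix:
            return route
    return None


def stable_states(prefs):
    """All route assignments in which every node is already best-replying."""
    nodes = sorted(prefs)
    stable = []
    for choice in product(*(prefs[v] for v in nodes)):
        state = dict(zip(nodes, choice))
        if all(best_available(v, prefs, state) == state[v] for v in nodes):
            stable.append(state)
    return stable


print(stable_states(THREE_NODES))  # []
```

Each node routes indirectly exactly when its preferred neighbor routes directly, and around a cycle of three nodes these conditions cannot all hold at once, so the list of stable states is empty.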
Does BGP Converge When A Stable Solution Exists?
[Figure: nodes 1 and 2 and destination d. Route lists: node 1: 12d, 1d; node 2: 21d, 2d.]
•Notice that multiple NE exist.
•And asynchronous best-reply dynamics do not necessarily converge.
•So must not be UMS.
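The two-node network can be simulated to see both effects. As a sketch under stated assumptions: route lists are read as most preferred first, an activated node best-replies to the routes it currently sees, and activating both nodes in the same step stands in for one possible asynchronous timing. The helper names are hypothetical.

```python
# Ranked routes per node, most preferred first (ASSUMED reading of the slide).
TWO_NODES = {1: [(1, 2, "d"), (1, "d")],
             2: [(2, 1, "d"), (2, "d")]}


def best_available(node, prefs, state):
    """Most preferred route of `node` that is consistent with the others' routes."""
    for route in prefs[node]:
        suffix = route[1:]
        if suffix == ("d",) or state.get(suffix[0]) == suffix:
            return route
    return None


def run(prefs, state, schedule):
    """Let the nodes in each step of `schedule` best-reply; return all states."""
    trace = [dict(state)]
    for active in schedule:
        updates = {v: best_available(v, prefs, state) for v in active}
        state = {**state, **updates}  # all active nodes saw the old state
        trace.append(state)
    return trace


start = {1: (1, "d"), 2: (2, "d")}

# One node at a time: settles on a stable state (the mirror image is stable too).
sequential = run(TWO_NODES, start, [{1}, {2}, {1}, {2}])
print(sequential[-1])  # {1: (1, 2, 'd'), 2: (2, 'd')}

# Both nodes at once: they flip between "both direct" and "both indirect".
simultaneous = run(TWO_NODES, start, [{1, 2}] * 4)
print(simultaneous[0] == simultaneous[2] == simultaneous[4])  # True
```

Which outcome occurs depends only on the timing of updates, which is why existence of a stable solution does not by itself settle convergence.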
So What Do We Do?
• Approach #1: Use mechanism design to achieve IC convergence, but solution must be distributed.
• Approach #2: Identify conditions (on network topology and/or AS preferences) under which BGP converges and is IC.
• Both approaches are canonical problems in Distributed Algorithmic Mechanism Design.
Approach #2 for Convergence
• Griffin et al. (1999): If BGP fails to converge, then there exists a Dispute Wheel.
•Each u_i would rather route clockwise through u_{i+1} than Q_i
Image Source: Levin et al. “Internet Routing and Games,” 2008.
Approach #2 for Convergence
• Gao and Rexford (2001): Identified reasonable conditions based on economic structure of the Internet that guarantee No Dispute Wheel and hence convergence. (No bounds on convergence rate given).
•But limited progress made until recently on conditions for guaranteeing that BGP is IC.
Approach #2 for Incentive Compatibility
• Theorem 3: Assuming non-convergence after n^3 rounds is penalized, and No Dispute Wheel holds, then routing games are UMS.
•Corollary: Under the above conditions, best-reply strategies are IC in collusion-proof ex-post NE.
•Corollary: Under the Gao-Rexford conditions, BGP converges in O(n^3) time and is IC.
Theorem 3
• Proof sketch: The case of finding the first universally max-dominated action set is general.
•Find a node a1 with at least 2 actions. Let R be a1’s most preferred existing route. One of two cases must occur:
Theorem 3
1. Every node a2 on R prefers the suffix of R leading from a2 to d. In this case, if u is the closest node to d on R with at least two actions, then (u, d) universally max-dominates all other actions of u, and we’re done.
2. Some node a2 on R prefers some other path over the suffix of R leading from a2 to d. In this case, we repeat the analysis at a2. Eventually we either form a dispute wheel or find ourselves in Case 1.
What’s left in Routing?
•Complete characterization of BGP convergence (No Dispute Wheel sufficient, not necessary).
•Conditions for convergence to globally optimal solution. Can it even be efficiently found?
•Do mechanism design and/or $$$ have a role to play?
•Changes in network topology?
Other applications
•Congestion control
• Criticism: Best-reply dynamics are only somewhat descriptive of how TCP works in practice.
•Cost sharing games
•Matching games (stable-roommate, intern assignment)
•Auctions (unit demand bidders, GSP)
• Relies a lot on VCG results
• Main contribution is proof of convergence! (opposite of BGP)
Relationship to DSIC
[Diagram labels: θ, Play s(θ), Ex-post NE, Outcome]
Given UMS game, best-replying is a strategy that gives ex-post NE.
Get a direct-revelation, dominant strategy IC mechanism.
Good: New way to create DSIC mechanisms.
Bad: Impossibility results limit the class of problems amenable to this approach (at least without money or limits on preferences).
Discussion
•What is the main contribution?
1. Sufficient conditions for IC convergence of best-reply dynamics. General enough to encompass many applications, esp. BGP.
2. Bounds on time to convergence.
3. New framework for developing IC mechanisms?
Next Steps
1. Necessary conditions for best-reply dynamics to converge? To be IC (under what definition?)?
2. Better-reply dynamics? Other types of dynamics aka algorithms? What types of dynamics are reasonable or “natural”?
Economists and Complexity
See recent blog post by Noam Nisan: Does complexity of equilibria matter?
Kamal Jain: “If your laptop can’t find it then neither can the market.”
Jeff Ely: “Solving the n-body problem is beyond the capabilities of the world’s smartest mathematicians. How do those rocks-for-brains planets manage to pull it off?”