Best Reply Mechanisms


Justin Thaler and Victor Shnayder

What are best-reply dynamics?

•Start with an arbitrary strategy profile

•In each step let some player switch his strategy to be a best reply to the current strategies of the others.


Definition: A repeated-reply mechanism for a private-information game G is an extensive-form game with perfect recall (with the same players) in which:

• There are at most M steps. In each step, a single player announces an element of $A_i$.

• Players move in round-robin order.

• The game stops when all players "pass" in n consecutive steps, and the action profile of the most recently announced actions is enforced.

• If M steps go by without stopping, the players are penalized.


•A penalty is needed so that non-convergence is never in any player's best interest.

•Realistic modeling assumption for BGP, TCP, etc.

•Best-reply dynamics is the strategy profile of a repeated-reply mechanism in which each player i updates to i’s best-reply to the other players’ strategies each time it is i’s turn.
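As an illustration only, here is a minimal Python sketch of best-reply dynamics inside the repeated-reply mechanism defined above, for a two-player game given as a payoff matrix. Assumptions not in the slides: a player whose current action is already a best reply "passes", ties are broken toward the lowest-index action, and the helper names (`best_replies`, `run_best_reply_dynamics`) are hypothetical. Returning `None` stands in for the penalty after M steps. The payoffs are those of Example 2 below.

```python
from typing import List, Optional, Tuple

# payoffs[r][c] = (row player's payoff, column player's payoff)
Game = List[List[Tuple[float, float]]]
Profile = Tuple[int, int]


def best_replies(game: Game, player: int, profile: Profile) -> List[int]:
    """All actions of `player` that maximize its payoff against the other player's current action."""
    r, c = profile
    if player == 0:  # row player chooses a row, column held fixed
        values = [game[i][c][0] for i in range(len(game))]
    else:  # column player chooses a column, row held fixed
        values = [game[r][j][1] for j in range(len(game[0]))]
    best = max(values)
    return [a for a, v in enumerate(values) if v == best]


def run_best_reply_dynamics(game: Game, start: Profile, max_steps: int) -> Optional[Tuple[Profile, int]]:
    """Round-robin best replies; returns (enforced profile, steps used), or None if
    max_steps (the M of the definition) pass without n consecutive passes."""
    n = 2
    profile = list(start)
    consecutive_passes = 0
    for step in range(max_steps):
        player = step % n  # players move in round-robin order
        replies = best_replies(game, player, (profile[0], profile[1]))
        if profile[player] in replies:
            consecutive_passes += 1  # already best-replying: "pass"
            if consecutive_passes == n:
                return (profile[0], profile[1]), step + 1
        else:
            profile[player] = replies[0]  # hypothetical tie-break: lowest index
            consecutive_passes = 0
    return None  # the mechanism would penalize the players here


EXAMPLE_2 = [[(0, 1), (0, 1)], [(1, 1), (1, 0)]]
for start in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(start, run_best_reply_dynamics(EXAMPLE_2, start, max_steps=10))
# every start ends at profile (1, 0), i.e. payoffs (1, 1)
```

A game with no pure NE, such as matching pennies, never produces n consecutive passes, so the function returns `None`: exactly the case the penalty is meant to cover.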

Why best reply dynamics?

•If convergence occurs, we have a highly justifiable Nash equilibrium

•Computationally simple

•Players only need private information

•Feasible in distributed, asynchronous settings

•Prescribed by existing protocols (Ex: BGP)

Why best reply dynamics?

•In light of Theorems 1 and 2 (which we’ll see soon):

•Often gives a non-VCG way of creating incentive-compatible mechanisms (?), sometimes without $$$.

•Often also yields collusion-proofness and Pareto efficiency

Outline

•When do best reply dynamics work?

•Universal max-solvability (UMS)

•Thm: UMS implies convergence to unique NE, collusion-proofness

•Example applications (correlated markets, BGP, etc.)

•Connections to strategy-proofness

•Discussion

Universal max-dominance

•A subset T of S is universally max-dominated if:

•Very strong condition!

•Existence of a max-dominated set is strictly stronger than existence of a strictly dominated strategy:

•There exist $s_i, s_i'$ such that $u_i(s_i, s_{-i}) < u_i(s_i', s_{-i})$ for all $s_{-i}$

Universal max-solvability (UMS)

•A game G is universally max-solvable if we can iteratively remove universally max-dominated strategy sets and get to a single strategy for each player.

•Stronger condition than solvable by iterated removal of strictly dominated strategies (IRSDS)
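For comparison, a sketch of IRSDS itself (the weaker condition named above) for two-player games, built on the strict-dominance test stated on the previous slide. It checks only pure-strategy dominance, the function names are hypothetical, and it is run on the payoffs of Example 1.

```python
from typing import List, Tuple

# payoffs[r][c] = (row player's payoff, column player's payoff)
Game = List[List[Tuple[float, float]]]


def payoff(game: Game, player: int, own: int, other: int) -> float:
    """Payoff to `player` when it plays `own` and the opponent plays `other`."""
    return game[own][other][0] if player == 0 else game[other][own][1]


def strictly_dominated(game: Game, player: int, s: int, s_prime: int, others: List[int]) -> bool:
    """True if s_prime does strictly better than s against every remaining opponent action."""
    return all(payoff(game, player, s, o) < payoff(game, player, s_prime, o) for o in others)


def irsds(game: Game) -> Tuple[List[int], List[int]]:
    """Iterated removal of strictly dominated pure strategies (two players)."""
    remaining = [list(range(len(game))), list(range(len(game[0])))]
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            others = remaining[1 - player]
            for s in list(remaining[player]):
                if any(strictly_dominated(game, player, s, t, others)
                       for t in remaining[player] if t != s):
                    remaining[player].remove(s)
                    changed = True
    return remaining[0], remaining[1]


EXAMPLE_1 = [[(5, 5), (0, 0)], [(10, 0), (4, 4)]]
print(irsds(EXAMPLE_1))  # ([1], [1]): bottom row, right column
```

IRSDS leaves a single profile here even though, per the next slide, Example 1 is not UMS.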

Example 1

5, 5 0, 0

10, 0 4, 4

Solvable by IRSDS, but not UMS: neither player has a universally max-dominated set. Note that the unique NE, with payoffs (4, 4), is not PE (both players prefer (5, 5)), and best-reply dynamics are not incentive compatible for the row player.
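A small Python check of the two claims above, assuming the column player always best-replies to the row the row player keeps announcing (helper names are hypothetical). If the row player best-replies, it announces the bottom row (strictly dominant) and the outcome is (4, 4); if it instead keeps announcing the top row, the column player's best reply yields (5, 5), which is better for both.

```python
EXAMPLE_1 = [[(5, 5), (0, 0)], [(10, 0), (4, 4)]]


def column_best_reply(game, row):
    """Index of the column maximizing the column player's payoff against `row`."""
    values = [cell[1] for cell in game[row]]
    return values.index(max(values))


def outcome_if_row_keeps_announcing(game, row):
    """Payoffs once the column player has best-replied to a fixed row."""
    return game[row][column_best_reply(game, row)]


truthful = outcome_if_row_keeps_announcing(EXAMPLE_1, 1)   # bottom row: best-replying
deviation = outcome_if_row_keeps_announcing(EXAMPLE_1, 0)  # top row: not a best reply
print(truthful, deviation)  # (4, 4) (5, 5)
assert deviation[0] > truthful[0]  # the row player gains by not best-replying
```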

Example 2

0, 1 0, 1

1, 1 1, 0

UMS


Example 3 (UMS)

      L       M       R
   1, 9    2, 9    2, 9
   3, 1    3, 2    3, 2
   3, 1    4, 3    5, 4

Row strategies: A, C, B; column strategies: L, M, R.


Theorems

Theorem 1: If G is UMS, then G has a unique pure NE, and it is collusion-proof.

Corollary: Collusion-proof NE ⇒ NE is Pareto optimal
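Theorem 1 and the corollary can be spot-checked on the slide examples by brute force. The sketch below (hypothetical helper names) enumerates the pure NE of a two-player matrix game and tests Pareto optimality against all outcomes. It does not test UMS itself; it only checks the conclusions on Example 3 (UMS) and Example 1 (not UMS).

```python
from itertools import product

EXAMPLE_1 = [[(5, 5), (0, 0)], [(10, 0), (4, 4)]]
EXAMPLE_3 = [[(1, 9), (2, 9), (2, 9)],
             [(3, 1), (3, 2), (3, 2)],
             [(3, 1), (4, 3), (5, 4)]]


def pure_nash_equilibria(game):
    """All (row, col) cells where neither player has a profitable unilateral deviation."""
    rows, cols = len(game), len(game[0])
    return [(r, c) for r, c in product(range(rows), range(cols))
            if all(game[r][c][0] >= game[r2][c][0] for r2 in range(rows))
            and all(game[r][c][1] >= game[r][c2][1] for c2 in range(cols))]


def is_pareto_optimal(game, r, c):
    """True if no outcome is weakly better for both players and strictly better for one."""
    u = game[r][c]
    return not any(v[0] >= u[0] and v[1] >= u[1] and (v[0] > u[0] or v[1] > u[1])
                   for row in game for v in row)


for name, game in [("Example 1", EXAMPLE_1), ("Example 3", EXAMPLE_3)]:
    ne = pure_nash_equilibria(game)
    print(name, ne, [is_pareto_optimal(game, r, c) for r, c in ne])
# Example 1 [(1, 1)] [False]
# Example 3 [(2, 2)] [True]
```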

Theorems

Note that solvability by IRSDS already suffices for a unique pure NE; UMS is what is needed for collusion-proofness and PE.

Proof of Theorem 1:

•By contradiction. Since G is UMS, fix an elimination sequence of universally max-dominated strategy sets.

•Let $s^*$ be the final strategy profile.

•If $s^*$ is not a collusion-proof NE, some set of players T can jointly deviate and all be better off.

•Let s be the new profile, in which the players in T change their strategies from $s^*$.

•Let $s_i$ be the first of these strategies to be eliminated. It was eliminated as part of a universally max-dominated set, so $s_i^*$ is strictly better for i, and i cannot be better off: a contradiction.


Theorems

Theorem 2: If G is UMS with private information, then best-reply dynamics are incentive-compatible in ex-post NE and converge to the unique NE of the induced full-information game.

Theorems

Proof: Similar to Theorem 1. The main idea is that a strategy eliminated in the $t$-th step of the UMS elimination process can never be used after the $(nt)$-th step of the best-reply mechanism, where n is the number of players.
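The proof idea can be checked on Example 3, assuming it is eliminated in four steps (top row, L, middle row, M), so $t \le 4$ and, with $n = 2$, play should settle on the bottom-right cell by step 8 from any start. The sketch assumes row-first round-robin order, passing when already best-replying, and lowest-index tie-breaking; `best_reply_set` and `run_and_trace` are hypothetical names.

```python
from itertools import product

EXAMPLE_3 = [[(1, 9), (2, 9), (2, 9)],
             [(3, 1), (3, 2), (3, 2)],
             [(3, 1), (4, 3), (5, 4)]]


def best_reply_set(game, player, r, c):
    """Best replies of `player` to the other player's current action."""
    if player == 0:
        vals = [game[i][c][0] for i in range(len(game))]
    else:
        vals = [game[r][j][1] for j in range(len(game[0]))]
    top = max(vals)
    return [a for a, v in enumerate(vals) if v == top]


def run_and_trace(game, start, max_steps=50):
    """Round-robin best replies; returns (final profile, last step that changed it)."""
    profile, last_change, passes = list(start), 0, 0
    for step in range(1, max_steps + 1):
        player = (step - 1) % 2
        replies = best_reply_set(game, player, *profile)
        if profile[player] in replies:
            passes += 1
            if passes == 2:  # n consecutive passes: stop
                return tuple(profile), last_change
        else:
            profile[player] = replies[0]
            last_change, passes = step, 0
    raise RuntimeError("no convergence within max_steps")


n, T = 2, 4  # players; assumed number of UMS elimination steps for Example 3
for start in product(range(3), range(3)):
    final, last = run_and_trace(EXAMPLE_3, start)
    assert final == (2, 2) and last <= n * T
```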

Correlated two-sided markets

•Agents: buyers and sellers

•Game: weighted bipartite graph -- buyers on one side, sellers on the other

•Buyers have preference order over sellers (higher edge weight = higher preference)

•Sellers prefer buyers connected by heavier edges

Correlated two-sided markets are UMS

•Let e be a maximum-weight edge. Choosing it universally max-dominates all other strategies of both its endpoints.

•Remove the two endpoints of e and all their incident edges, and repeat.

•Therefore, by Theorem 2, best-reply dynamics converge to an ex-post NE.
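The elimination argument above is a greedy procedure. A minimal sketch, assuming distinct edge weights and a hypothetical representation of the market as a dict from (buyer, seller) pairs to weights; the pairs it returns are the profile that best-reply dynamics should reach.

```python
def greedy_market_outcome(edges):
    """Repeatedly take the heaviest remaining edge and drop both its endpoints.

    edges: dict mapping (buyer, seller) -> weight (weights assumed distinct).
    Returns the chosen (buyer, seller) pairs in elimination order.
    """
    remaining = dict(edges)
    pairs = []
    while remaining:
        (buyer, seller), _ = max(remaining.items(), key=lambda item: item[1])
        pairs.append((buyer, seller))
        remaining = {(b, s): w for (b, s), w in remaining.items()
                     if b != buyer and s != seller}
    return pairs


# Hypothetical market with two buyers and two sellers.
EDGES = {("b1", "s1"): 6, ("b1", "s2"): 9, ("b2", "s1"): 3, ("b2", "s2"): 7}
print(greedy_market_outcome(EDGES))  # [('b1', 's2'), ('b2', 's1')]
```

Here the greedy pairs have total weight 12, while pairing b1 with s1 and b2 with s2 would give 13: the outcome is the one the elimination predicts, not necessarily a maximum-weight matching.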

Extended Example: BGP

Internet routing: BGP

• Receive update messages from neighbors announcing routes to d.

• Choose a single neighbor, whose route you prefer most, to send traffic through.

• Announce your new route to all your neighbors.

Figure: destination d and nodes 1 and 2. Route preferences (most preferred first): node 1: 12d, 1d; node 2: 21d, 2d.
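A toy Python model of the three BGP steps above on this two-node gadget (often called DISAGREE). Assumptions: a route is a string of single-character node labels ending in d, any node may extend any other node's announced route, routes missing from a node's preference list are unacceptable, and nodes are activated one at a time in a fixed order. `best_route` and `run_sequential` are hypothetical names.

```python
# Toy model (assumption): a route is a string of single-character node labels ending in "d".
DISAGREE = {"1": ["12d", "1d"], "2": ["21d", "2d"]}  # preferences, most preferred first


def best_route(node, prefs, routes):
    """Most preferred acceptable route `node` can build from current announcements."""
    available = {node + "d"}  # the direct route to d is always available
    for other, route in routes.items():
        # extend another node's announced route unless that would create a loop
        if other != node and route is not None and node not in route:
            available.add(node + route)
    for candidate in prefs[node]:
        if candidate in available:
            return candidate
    return None  # no acceptable route


def run_sequential(prefs, order=None, max_rounds=10):
    """Activate nodes one at a time; return the routes once a full round changes nothing."""
    order = order or sorted(prefs)
    routes = {v: None for v in prefs}
    for _ in range(max_rounds):
        changed = False
        for v in order:
            new = best_route(v, prefs, routes)  # receive updates, choose best route
            if new != routes[v]:
                routes[v] = new  # announce the new route to the others
                changed = True
        if not changed:
            return routes
    return None  # did not converge within max_rounds


print(run_sequential(DISAGREE))              # {'1': '1d', '2': '21d'}
print(run_sequential(DISAGREE, ["2", "1"]))  # {'1': '12d', '2': '2d'}
```

The activation order decides which of the two stable outcomes is reached.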

Internet routing: BGP

•BGP is asynchronous and distributed

•Prescribes best-reply dynamics

•But does BGP converge?

•And is BGP “incentive compatible”? Do ASes have an incentive to deviate from the protocol?

Does BGP Converge?

•We can break this into two questions:

•Does a stable solution even exist in the static game?

•If so, will BGP find such a solution?

•But we only need one answer.

Does a Stable Solution Exist?

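Figure: destination d and nodes 1, 2 and 3. Route preferences (most preferred first): node 1: 13d, 1d; node 2: 21d, 2d; node 3: 32d, 3d.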

No stable solution exists!

It is actually NP-complete to determine whether a stable solution exists in general networks.
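The claim can be confirmed by brute force on this three-node gadget (often called BAD GADGET). Under toy assumptions (single-character node labels, routes ending in d, unlisted routes unacceptable; `best_route` and `stable_states` are hypothetical names), the sketch tries every assignment of routes and keeps those in which every node already holds its best available route. This exhaustive search is exponential in the number of nodes.

```python
from itertools import product

BAD_GADGET = {"1": ["13d", "1d"], "2": ["21d", "2d"], "3": ["32d", "3d"]}


def best_route(node, prefs, routes):
    """Most preferred acceptable route `node` can build from the others' current routes."""
    available = {node + "d"}  # the direct route to d is always available
    for other, route in routes.items():
        if other != node and route is not None and node not in route:  # avoid loops
            available.add(node + route)
    for candidate in prefs[node]:
        if candidate in available:
            return candidate
    return None


def stable_states(prefs):
    """Every route assignment in which each node already holds its best available route."""
    nodes = sorted(prefs)
    stable = []
    for choice in product(*[prefs[v] + [None] for v in nodes]):
        routes = dict(zip(nodes, choice))
        if all(best_route(v, prefs, routes) == routes[v] for v in nodes):
            stable.append(routes)
    return stable


print(stable_states(BAD_GADGET))  # []
```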

Does BGP Converge When A Stable Solution Exists?

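Figure: destination d and nodes 1 and 2. Route preferences (most preferred first): node 1: 12d, 1d; node 2: 21d, 2d.

•Notice that multiple NE exist.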

•And asynchronous best-reply dynamics do not necessarily converge.

•So the game must not be UMS.
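To see the non-convergence concretely, a toy model can activate both nodes at once in every round, which is one possible asynchronous schedule. Same toy assumptions as before (single-character labels, routes ending in d, unlisted routes unacceptable); `run_simultaneous` and `best_route` are hypothetical names. Starting from no routes, the nodes flap between (1d, 2d) and (12d, 21d) forever, even though two stable outcomes, (12d, 2d) and (1d, 21d), exist.

```python
DISAGREE = {"1": ["12d", "1d"], "2": ["21d", "2d"]}


def best_route(node, prefs, routes):
    """Most preferred acceptable route `node` can build from the others' current routes."""
    available = {node + "d"}
    for other, route in routes.items():
        if other != node and route is not None and node not in route:  # avoid loops
            available.add(node + route)
    for candidate in prefs[node]:
        if candidate in available:
            return candidate
    return None


def run_simultaneous(prefs, rounds):
    """Every node best-replies to the previous round's routes at the same time."""
    routes = {v: None for v in prefs}
    history = []
    for _ in range(rounds):
        routes = {v: best_route(v, prefs, routes) for v in prefs}
        history.append(dict(routes))
    return history


for state in run_simultaneous(DISAGREE, 4):
    print(state)  # alternates between {'1': '1d', '2': '2d'} and {'1': '12d', '2': '21d'}
```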

So What Do We Do?

• Approach #1: Use mechanism design to achieve IC convergence, but the solution must be distributed.

• Approach #2: Identify conditions (on network topology and/or AS preferences) under which BGP converges and is IC.

• Both approaches are canonical problems in Distributed Algorithmic Mechanism Design.

Approach #2 for Convergence

• Griffin et al. (1999): If BGP fails to converge, then there exists a Dispute Wheel.

•Each $u_i$ would rather route clockwise through $u_{i+1}$ than through $Q_i$

Image Source: Levin et al. “Internet Routing and Games,” 2008.

Approach #2 for Convergence

• Gao and Rexford (2001): Identified reasonable conditions based on economic structure of the Internet that guarantee No Dispute Wheel and hence convergence. (No bounds on convergence rate given).

•But until recently, little progress had been made on conditions guaranteeing that BGP is IC.

Approach #2 for Incentive Compatibility

• Theorem 3: If non-convergence after $n^3$ rounds is penalized and No Dispute Wheel holds, then routing games are UMS.

•Corollary: Under the above conditions, best-reply strategies are IC in collusion-proof ex-post NE.

•Corollary: Under the Gao-Rexford conditions, BGP converges in $O(n^3)$ time and is IC.

Theorem 3

• Proof sketch: It suffices to show how to find the first universally max-dominated action set; every later elimination step uses the same argument.

•Find a node $a_1$ with at least 2 actions. Let R be $a_1$'s most preferred available route. One of two cases must occur:

Theorem 3

1. Every node $a_2$ on R prefers the suffix of R leading from $a_2$ to d. In this case, if u is the closest node to d on R with at least two actions, then (u, d) universally max-dominates all other actions of u, and we're done.

2. Some node $a_2$ on R prefers some other path over the suffix of R leading from $a_2$ to d. In this case, we repeat the analysis at $a_2$. Eventually we either form a dispute wheel or find ourselves in Case 1.

What’s left in Routing?

•Complete characterization of BGP convergence (No Dispute Wheel is sufficient, but not necessary).

•Conditions for convergence to globally optimal solution. Can it even be efficiently found?

•Do mechanism design and/or $$$ have a role to play?

•Changes in network topology?

Other applications

•Congestion control

• Criticism: Best-reply dynamics are only somewhat descriptive of how TCP works in practice.

•Cost sharing games

•Matching games (stable-roommate, intern assignment)

•Auctions (unit demand bidders, GSP)

• Relies a lot on VCG results

• Main contribution is the proof of convergence! (the reverse of BGP, where convergence was already known and incentive compatibility was the new result)

Relationship to DSIC

Diagram: θ → Play s(θ) → Outcome (ex-post NE)

In a UMS game, best-replying is a strategy profile that forms an ex-post NE.

Get a direct-revelation, dominant-strategy IC mechanism.

Good: a new way to create DSIC mechanisms.

Bad: impossibility results limit the class of problems amenable to this approach (at

least without money or limits on preferences).

Discussion

•What is the main contribution?

1. Sufficient conditions for IC convergence of best-reply dynamics. General enough to encompass many applications, esp. BGP.

2. Bounds on time to convergence.

3. New framework for developing IC mechanisms?

Next Steps

1. Necessary conditions for best-reply dynamics to converge? To be IC (under what definition?)?

2. Better-reply dynamics? Other types of dynamics, aka algorithms? What types of dynamics are reasonable or “natural”?

Economists and Complexity

See recent blog post by Noam Nisan: Does complexity of equilibria matter?

Kamal Jain: “If your laptop can’t find it then neither can the market.”

Jeff Ely: “Solving the n-body problem is beyond the capabilities of the world’s smartest mathematicians. How do those rocks-for-brains planets manage to pull it off?”
