Regret Minimization against Strategic Buyersmohri/talks/NIPSStrategicWkshp2017.pdfrevenue of modern search engine and popular online sites. billions of transactions every day. key

Regret Minimization against Strategic Buyers

Mehryar Mohri Courant Institute & Google Research

Andrés Muñoz Medina Google Research

Motivation• Online advertisement:

✦ revenue of modern search engine and popular online sites.

✦ billions of transactions every day.

✦ key role of revenue optimization algorithms.

Motivation• Second-price auctions with reserve:

✦ widely adopted mechanism in Ad Exchanges.

✦ many transactions admit a single bidder posted-price auctions.

✦ study of posted-price auctions with strategic buyers.

Related Work• Revenue optimization in second-price auctions [Cui et al. 2011; He et al., 2013;

Cesa-Bianchi et al., 2013; MM and Muñoz 2014].

• Revenue optimization in generalized second-price auctions (GSP) [MM and Muñoz, 2015; Varian, 2007; Lucier et al., 2014; Sun et al, 2014; Rudolph et al. 2016; Charles et al., 2016;

Roughgarden and Wang, 2016].

• Dynamic pricing [Kanoria and Nazerzadeh 2014, Bikhchandani and McCardle 2012; den Boer, 2015;

Chen et al. 2015].

• Pricing with strategic and patient buyers [Feldman et al., 2016].

• Preference reconstruction [Blum et al. 2014].

• Learning optimal auctions [Huang et al., 2015; Morgenstern and Roughgarden, 2015].

This Talk• Scenarios:

✦ Fixed valuation.

✦ Random valuation.

Setup• Repeated posted-price auctions:

✦ good repeatedly offered for sale by a seller to a single buyer over rounds.

✦ buyer holds private valuation .

✦ seller offers price and buyer accepts, , or rejects, , at each round .

pt

T

v 2 [0, 1]

at = 1at = 0 t 2 [T ]

Setup• Seller: pricing algorithm .

✦ total revenue:

✦ regret:

• Buyer: discounting factor .

✦ surplus:

PTt=1 atpt.

A

� 2 (0, 1]

RegT (A) = vT �PT

t=1 atpt.

SurT (A) =PT

t=1 �t�1at(v � pt).

Strategic Setting• Seller announces his algorithm .

• Buyer acts strategically: he seeks to maximize his surplus .

• Seller seeks to minimize his strategic regret, that is regret against strategic buyer.

• Question: can we design algorithms for minimizing strategic regret?

A

SurT (A)

RegT (A)

[Amin et al., 2013]

Truthful Setting• Fast Search (FS) algorithm:

✦ keeps track of feasible interval , starting with and parameter .

✦ in each phase, offers prices until a price is rejected.

✦ if price is rejected, new phase with interval and parameter .

✦ until size of the interval less than .

[Kleinberg and Leighton, 2003]

[a, b]

[0, 1] ✏ = 12

a+ ✏, a+ 2✏, . . .

[a+ (k � 1)✏, a+ k✏]a+ k✏

✏2

1T

Truthful Setting• Fast Search (FS) algorithm:

✦ at most phases.

✦ regret in .

✦ lower bound: .

[Kleinberg and Leighton, 2003]

dlog2 log2 T )e+ 1

O(log log T )

⌦(log log T )

Example

v = 16$8?$4?

$2?!!!$1?

NoNoNoYES!

Monotone algorithms• Algorithm:

✦ offer price ( ) until it is accepted.

✦ offer accepted price thereafter.

✦ idea: slow enough decrease inconvenient for the buyer.

• Strategic regret in .

[Amin et al., 2013]

pt = �t � < 1

O

⇣qT

1��

⌘

Monotone algorithms• Theorem: the strategic regret of any monotone

decreasing convex algorithm is in .⌦(pT )

[MM and Muñoz, 2014]

Proof idea• Fix monotone function.

• Choose at random.

• Let , then

• Tradeoff optimized for .pt � pt+1 ⇠ 1pT

E[]E[v � p] � c.

= inf{t : pt < v}

v 2 [ 12 , 1]

Lower Bound• Theorem: for any pricing algorithm , the

following lower bound holds:

for some universal constant .

[Amin et al. 2013; Kleinberg and Leighton, 2003; MM and Muñoz, 2014]

A

RegT (A) � max

✓1

12(1� �), C log log T

◆

C

Idea• Lie buyer when rejecting or accepting

when .

• Can we dissuade the buyer from lying?

✦ buyer’s weakness: time (discounted surplus).

✦ penalization: if buyer rejects price, reoffer the price for another rounds.

✦ choice of subject to a trade-off.

v < pt

v > pt⌘

(r � 1)

r

Pricing Strategies• Any deterministic strategy can be represented

by a tree.1/2

1/4

1/8

3/4

3/8 5/8 7/8

Meta-Algorithm1/2

1/4

1/8

3/4

3/8 5/8 7/8 1/2

1/2

1/4

3/4

7/83/4

Truthful

Strategic

r rounds

PFS Guarantees• Theorem: let ; the following strategic

regret guarantees hold for Penalized Fast Search (PFS):

✦ for ;

✦ for .

RegT (PFS) = O(log log T ) � 2 (0, 12 ]

RegT (PFS) = O(log T log log T ) � 2 ( 12 , �0)

�0 2 ( 12 , 1)


Proof Idea

�t�1(v � pt)

Surplus of accepted path at least

Surplus of rejected path at most�t+r�1

1� �

(v � pt) �r

1� �

pt

Horizon-Indep. Regret• Extension of PFS via ‘exponentiating trick’ to

horizon-independent algorithm :

✦ length of th epoch verifies .

✦ for ;

✦ for .

gPFS

log2 log2 Ti = 2i�1i

� 2 (0, 12 ]

� 2 ( 12 , �0)

RegT (gPFS) = O(log log T )

RegT (gPFS) = O(log T log log T )

[Drutsa, 2017]

Further Improvement• PRRFES algorithm:

✦ truthful FES algorithm: modified FS; after rejection, reoffer last accepted price g times.

✦ same lie penalization as in PFS.

✦ continue to offer until rejection.

✦ strategic regret: for .

[Drutsa, 2017]

RegT (PRRFES) = O(log log T )� 2 (0, �0]

Random valuations

Setup• Repeated posted-price auctions:

✦ good repeatedly offered for sale by a seller to a single buyer over rounds.

✦ buyer receives valuation , .

✦ seller offers price and buyer accepts, , or rejects, , at each round .

pt

T

at = 1at = 0

[Amin et al., 2013]

t 2 [T ]

vt 2 [0, 1] vt ⇠ D

Setup• Seller: pricing algorithm .

✦ total revenue: .

✦ regret:

• Buyer: discounting factor .

✦ surplus:

A

� 2 (0, 1]

E⇥PT

t=1 atpt⇤

RegT (A) = maxp2P

pP(v > p)T � E TX

t=1

atpt

�.

SurT (A) = E TX

t=1

�t�1at(v � pt)

�.

Strategic buyers• Simple tree structure for fixed valuation.

• Seller offers price from distribution .

• Surplus of state :

• Solution found in time .

Pt

st = (Pt, Ht�1, vt, pt)

St(st) = maxat2{0,1}

�t�1

at(vt � pt)

+ E(vt+1,pt+1)

⇠D⇥ft(Pt,Ht�1)

⇥St+1(ft(Pt, Ht�1), Ht, vt+1, pt+1

�⇤.

⌦�T |P|�

-strategic buyers• Stop optimizing if all future surplus is at most .

• Behave truthfully otherwise,

• Optimize for rounds.

• Tractable MDP.

✏

✏

⇠log[ 1

✏(1��) ]log

�1�

�⇡

Bandit Formulation• Only observe reward of price offered.

• Minimize pseudo-regret

• Problem: rewards not i.i.d. (strategic buyer).

RegT (A) = maxp2P

pP(v > p)T � E TX

t=1

atpt

�.

Theorem: Let be a finite set of prices. Let be the number of time the buyer lies. Let and

. For any ,

where is the number of times price has been offered up to time .

Regret bound

RegT E[L] +X

p : �p>�

E[Tp(t)]�p + T �

LP

Tp(t) pt

p⇤ = argmaxp2P

pP(v > p)

�p = p⇤ P(v > p⇤)� pP(v > p) � > 0


R-UCB• Make UCB robust to lies.

• Use different upper confidence bounds

+Lp

Tp(t)+

s2 log t

Tp(t)bµp(t) =

1

Tp(t)

tX

i=1

aipi1pi=p

Regret analysis• Proposition: The regret of the R-UCB algorithm is

bounded by

where

RegT E[L] +X

p:�p>�

4Lp+32 log T

�p+ 2�p + T � +

TX

t=1

Pt(p, L),

Pt(p, L) := P⇣��

Lt(p)

Tp(t)

��+��Lt(p⇤)

T ⇤(t)

�� L⇣ p

Tp(t)+

p⇤

T ⇤(t)

⌘⌘.

Bound on Lies• An -strategic buyer lies at most .

• Regret of R-UCB in .

• Extension to continuous set of prices by discretization.

• Regret in .

O

⇣log T +

P1� �

⌘

O

⇣pT +

T1/4

1� �

⌘

l log(1/✏(1� �))

log(1/�)

m✏

Conclusion• Analysis of strategic regret.

✦ Fixed and random valuation scenarios.

✦ Simple algorithms extending truthful scenario.

• Many questions:

✦ Can we extend results to other types of buyers?

✦ What about if the buyer learns too?

✦ Extension to general auctions?

Other Related Questions• Can the buyer trust the algorithm announced?

✦ testing incentive-compatibility [Lahaie, Muñoz, Sivan, and

Vassilvitskii, 2017] (Andres’s talk).

• Extend analysis to the case where algorithmic details are not known.

Documents

Regret Minimization against Strategic Buyersmohri/talks/NIPSStrategicWkshp2017.pdfrevenue of modern search engine and popular online sites. billions of transactions every day. key