Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Regret Minimization against Strategic Buyers
Mehryar Mohri Courant Institute & Google Research
Andrés Muñoz Medina Google Research
Motivation• Online advertisement:
✦ revenue of modern search engine and popular online sites.
✦ billions of transactions every day.
✦ key role of revenue optimization algorithms.
Motivation• Second-price auctions with reserve:
✦ widely adopted mechanism in Ad Exchanges.
✦ many transactions admit a single bidder posted-price auctions.
✦ study of posted-price auctions with strategic buyers.
Related Work• Revenue optimization in second-price auctions [Cui et al. 2011; He et al., 2013;
Cesa-Bianchi et al., 2013; MM and Muñoz 2014].
• Revenue optimization in generalized second-price auctions (GSP) [MM and Muñoz, 2015; Varian, 2007; Lucier et al., 2014; Sun et al, 2014; Rudolph et al. 2016; Charles et al., 2016;
Roughgarden and Wang, 2016].
• Dynamic pricing [Kanoria and Nazerzadeh 2014, Bikhchandani and McCardle 2012; den Boer, 2015;
Chen et al. 2015].
• Pricing with strategic and patient buyers [Feldman et al., 2016].
• Preference reconstruction [Blum et al. 2014].
• Learning optimal auctions [Huang et al., 2015; Morgenstern and Roughgarden, 2015].
This Talk• Scenarios:
✦ Fixed valuation.
✦ Random valuation.
Setup• Repeated posted-price auctions:
✦ good repeatedly offered for sale by a seller to a single buyer over rounds.
✦ buyer holds private valuation .
✦ seller offers price and buyer accepts, , or rejects, , at each round .
pt
T
v 2 [0, 1]
at = 1at = 0 t 2 [T ]
Setup• Seller: pricing algorithm .
✦ total revenue:
✦ regret:
• Buyer: discounting factor .
✦ surplus:
PTt=1 atpt.
A
� 2 (0, 1]
RegT (A) = vT �PT
t=1 atpt.
SurT (A) =PT
t=1 �t�1at(v � pt).
Strategic Setting• Seller announces his algorithm .
• Buyer acts strategically: he seeks to maximize his surplus .
• Seller seeks to minimize his strategic regret, that is regret against strategic buyer.
• Question: can we design algorithms for minimizing strategic regret?
A
SurT (A)
RegT (A)
[Amin et al., 2013]
Truthful Setting• Fast Search (FS) algorithm:
✦ keeps track of feasible interval , starting with and parameter .
✦ in each phase, offers prices until a price is rejected.
✦ if price is rejected, new phase with interval and parameter .
✦ until size of the interval less than .
[Kleinberg and Leighton, 2003]
[a, b]
[0, 1] ✏ = 12
a+ ✏, a+ 2✏, . . .
[a+ (k � 1)✏, a+ k✏]a+ k✏
✏2
1T
Truthful Setting• Fast Search (FS) algorithm:
✦ at most phases.
✦ regret in .
✦ lower bound: .
[Kleinberg and Leighton, 2003]
dlog2 log2 T )e+ 1
O(log log T )
⌦(log log T )
Example
v = 16$8?$4?
$2?!!!$1?
NoNoNoYES!
Monotone algorithms• Algorithm:
✦ offer price ( ) until it is accepted.
✦ offer accepted price thereafter.
✦ idea: slow enough decrease inconvenient for the buyer.
• Strategic regret in .
[Amin et al., 2013]
pt = �t � < 1
O
⇣qT
1��
⌘
Monotone algorithms• Theorem: the strategic regret of any monotone
decreasing convex algorithm is in .⌦(pT )
[MM and Muñoz, 2014]
Proof idea• Fix monotone function.
• Choose at random.
• Let , then
• Tradeoff optimized for .pt � pt+1 ⇠ 1pT
E[]E[v � p] � c.
= inf{t : pt < v}
v 2 [ 12 , 1]
Lower Bound• Theorem: for any pricing algorithm , the
following lower bound holds:
for some universal constant .
[Amin et al. 2013; Kleinberg and Leighton, 2003; MM and Muñoz, 2014]
A
RegT (A) � max
✓1
12(1� �), C log log T
◆
C
Idea• Lie buyer when rejecting or accepting
when .
• Can we dissuade the buyer from lying?
✦ buyer’s weakness: time (discounted surplus).
✦ penalization: if buyer rejects price, reoffer the price for another rounds.
✦ choice of subject to a trade-off.
v < pt
v > pt⌘
(r � 1)
r
Pricing Strategies• Any deterministic strategy can be represented
by a tree.1/2
1/4
1/8
3/4
3/8 5/8 7/8
Meta-Algorithm1/2
1/4
1/8
3/4
3/8 5/8 7/8 1/2
1/2
1/4
3/4
7/83/4
Truthful
Strategic
r rounds
PFS Guarantees• Theorem: let ; the following strategic
regret guarantees hold for Penalized Fast Search (PFS):
✦ for ;
✦ for .
RegT (PFS) = O(log log T ) � 2 (0, 12 ]
RegT (PFS) = O(log T log log T ) � 2 ( 12 , �0)
�0 2 ( 12 , 1)
[MM and Muñoz, 2014]
Proof Idea
�t�1(v � pt)
Surplus of accepted path at least
Surplus of rejected path at most�t+r�1
1� �
(v � pt) �r
1� �
pt
Horizon-Indep. Regret• Extension of PFS via ‘exponentiating trick’ to
horizon-independent algorithm :
✦ length of th epoch verifies .
✦ for ;
✦ for .
gPFS
log2 log2 Ti = 2i�1i
� 2 (0, 12 ]
� 2 ( 12 , �0)
RegT (gPFS) = O(log log T )
RegT (gPFS) = O(log T log log T )
[Drutsa, 2017]
Further Improvement• PRRFES algorithm:
✦ truthful FES algorithm: modified FS; after rejection, reoffer last accepted price g times.
✦ same lie penalization as in PFS.
✦ continue to offer until rejection.
✦ strategic regret: for .
[Drutsa, 2017]
RegT (PRRFES) = O(log log T )� 2 (0, �0]
Random valuations
Setup• Repeated posted-price auctions:
✦ good repeatedly offered for sale by a seller to a single buyer over rounds.
✦ buyer receives valuation , .
✦ seller offers price and buyer accepts, , or rejects, , at each round .
pt
T
at = 1at = 0
[Amin et al., 2013]
t 2 [T ]
vt 2 [0, 1] vt ⇠ D
Setup• Seller: pricing algorithm .
✦ total revenue: .
✦ regret:
• Buyer: discounting factor .
✦ surplus:
A
� 2 (0, 1]
E⇥PT
t=1 atpt⇤
RegT (A) = maxp2P
pP(v > p)T � E TX
t=1
atpt
�.
SurT (A) = E TX
t=1
�t�1at(v � pt)
�.
Strategic buyers• Simple tree structure for fixed valuation.
• Seller offers price from distribution .
• Surplus of state :
• Solution found in time .
Pt
st = (Pt, Ht�1, vt, pt)
St(st) = maxat2{0,1}
�t�1
at(vt � pt)
+ E(vt+1,pt+1)
⇠D⇥ft(Pt,Ht�1)
⇥St+1(ft(Pt, Ht�1), Ht, vt+1, pt+1
�⇤.
⌦�T |P|�
-strategic buyers• Stop optimizing if all future surplus is at most .
• Behave truthfully otherwise,
• Optimize for rounds.
• Tractable MDP.
✏
✏
⇠log[ 1
✏(1��) ]log
�1�
�⇡
Bandit Formulation• Only observe reward of price offered.
• Minimize pseudo-regret
• Problem: rewards not i.i.d. (strategic buyer).
RegT (A) = maxp2P
pP(v > p)T � E TX
t=1
atpt
�.
Theorem: Let be a finite set of prices. Let be the number of time the buyer lies. Let and
. For any ,
where is the number of times price has been offered up to time .
Regret bound
RegT E[L] +X
p : �p>�
E[Tp(t)]�p + T �
LP
Tp(t) pt
p⇤ = argmaxp2P
pP(v > p)
�p = p⇤ P(v > p⇤)� pP(v > p) � > 0
[MM and Muñoz, 2015]
R-UCB• Make UCB robust to lies.
• Use different upper confidence bounds
+Lp
Tp(t)+
s2 log t
Tp(t)bµp(t) =
1
Tp(t)
tX
i=1
aipi1pi=p
Regret analysis• Proposition: The regret of the R-UCB algorithm is
bounded by
where
RegT E[L] +X
p:�p>�
4Lp+32 log T
�p+ 2�p + T � +
TX
t=1
Pt(p, L),
Pt(p, L) := P⇣���
Lt(p)
Tp(t)
���+���Lt(p⇤)
T ⇤(t)
��� � L⇣ p
Tp(t)+
p⇤
T ⇤(t)
⌘⌘.
Bound on Lies• An -strategic buyer lies at most .
• Regret of R-UCB in .
• Extension to continuous set of prices by discretization.
• Regret in .
O
⇣log T +
P1� �
⌘
O
⇣pT +
T1/4
1� �
⌘
l log(1/✏(1� �))
log(1/�)
m✏
Conclusion• Analysis of strategic regret.
✦ Fixed and random valuation scenarios.
✦ Simple algorithms extending truthful scenario.
• Many questions:
✦ Can we extend results to other types of buyers?
✦ What about if the buyer learns too?
✦ Extension to general auctions?
Other Related Questions• Can the buyer trust the algorithm announced?
✦ testing incentive-compatibility [Lahaie, Muñoz, Sivan, and
Vassilvitskii, 2017] (Andres’s talk).
• Extend analysis to the case where algorithmic details are not known.