
Page 1: Strategic Experimentation


Strategic Experimentation

Lecture 1: Poisson Bandits

S. Rady, Yonsei University, 2012

Page 2: Strategic Experimentation

Outline

• Introduction (Optimal Learning, Strategic Issues, Bandits)

• The Model

• The Single-Agent Problem

• The Cooperative Problem

• The Strategic Problem

• Conclusion

Page 3: Strategic Experimentation

A Model of Rational Learning


People are uncertain about their environment

They learn from experience

Bayesian learning

• Agents possess a prior belief about some unknown parameter(s) of the environment

• They know the distribution of possible outcomes conditional on the parameter(s)

• They use Bayes’ rule to update their belief on the basis of what they observe

Page 4: Strategic Experimentation

A Single-Agent Example


Optimal learning typically involves experimentation:
– Sacrifice current rewards for better information, which will generate better rewards in the future
– Exploitation versus exploration

Examples:
– Seller of a new good
– Farmer choosing between a traditional crop and a gene-modified one
– Researcher pursuing a new research agenda

Page 5: Strategic Experimentation

A Single-Agent Example


Multi-period monopolist learning about the demand curve

No physical links between periods

• Rothschild (1974)
• Aghion, Bolton, Harris & Jullien (1991)
• Keller & Rady (1999)

Optimal learning may be incomplete

Identical sellers may charge different prices in the long run because of different sequences of observations

Page 6: Strategic Experimentation

Strategic Experimentation


Agents learn from the experiments of others, as well as from their own

Examples:
– Oligopolists in a new market
– Farmers with neighboring fields
– Researchers pursuing a common research agenda

With publicly observable actions and outcomes, there is an informational externality

Page 7: Strategic Experimentation

Strategic Experimentation


• Aghion, Espinosa & Jullien (1993), Harrington (1995), Keller & Rady (2003):
  – Differentiated-goods duopoly

• Bergemann & Valimaki (1996, 1997, 2000):
  – Sellers and buyers learning about a new product

• Bolton & Harris (1999, 2000), Keller, Rady & Cripps (2005):
  – Pure informational externality
  – Two-armed bandits in continuous time

Page 8: Strategic Experimentation

Multi-Armed Bandit Problem


Stylized model of the exploitation-exploration trade-off
– Agent decides repeatedly which slot machine to play
– Each machine produces a random stream of payoffs
– Uncertainty about the distribution of payoff streams

Sequential design of experiments

• Robbins (1952)
• Bellman (1956), Bradt, Johnson & Karlin (1956)
• Gittins & Jones (1974)
• Karatzas (1984), Presman (1990)
• Bergemann & Valimaki (2006)

Page 9: Strategic Experimentation

Strategic Experimentation with Bandits


Bolton & Harris (1999):
– Two-armed bandits in continuous time
– One arm is “safe”
– The other arm is either “good” or “bad” (Brownian motion with high or low drift)
– Type of risky arm identical across players
– Publicly observable actions and outcomes
– Pure informational externality
– Unique symmetric Markov perfect equilibrium
– Free-riding versus encouragement

Page 10: Strategic Experimentation

Strategic Experimentation with Bandits


Keller, Rady & Cripps (2005), Keller & Rady (2010):
– Poisson process with high or low arrival rate
– Unique symmetric Markov perfect equilibrium
– Multiplicity of asymmetric Markov perfect equilibria
– No equilibria in cutoff strategies
– Inefficiency
– Encouragement effect

Page 11: Strategic Experimentation

The Model


Page 12: Strategic Experimentation

Poisson Bandits


Two-armed bandit in continuous time

One arm is safe (S): generates a known flow payoff s

The other arm is risky (R): yields i.i.d. lump-sums of known mean h, which arrive according to a Poisson process

if good (θ = 1), Poisson intensity is λ1 (≡ flow payoff λ1h)

if bad (θ = 0), Poisson intensity is λ0 (≡ flow payoff λ0h)

s > 0 and λ1 > λ0 ≥ 0 known to players

True value of θ initially unknown to players

Assumption: λ1h > s > λ0h
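To make the payoff structure concrete, here is a minimal simulation sketch of the two arms in discretized time. All parameter values are hypothetical (chosen so that λ1h > s > λ0h holds), and the exponential lump-sum distribution is an assumption — the slides pin down only the mean h:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters satisfying lambda1*h > s > lambda0*h
s, h = 1.0, 1.0          # safe flow payoff, mean of a lump-sum
lam1, lam0 = 1.5, 0.5    # Poisson intensity of a good / bad risky arm
dt, T = 0.001, 10.0      # time step and horizon of the discretization

def risky_payoff(theta):
    """One sample path of the total payoff from playing R exclusively until T."""
    lam = lam1 if theta == 1 else lam0
    n = int(T / dt)
    arrivals = rng.random(n) < lam * dt     # at most one lump-sum per small interval
    lumps = rng.exponential(h, size=n)      # exponential lump-sums are an assumption;
    return float(np.sum(arrivals * lumps))  # only their mean h matters for payoffs

print("safe arm pays exactly         ", s * T)
print("good risky arm pays on average", lam1 * h * T)
print("one sample path of a good arm ", risky_payoff(theta=1))
```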

Page 13: Strategic Experimentation

Exponential Bandits


Keller, Rady & Cripps (2005):

• λ0 = 0

• First lump-sum resolves all uncertainty about θ

• Arrives at an exponentially distributed random time (→ “exponential bandit”)

Much simpler than the case λ0 > 0 in Keller & Rady (2010):

• No encouragement effect

• Backward induction from single-agent cut-off

Page 14: Strategic Experimentation

Actions and Payoffs


One unit of a perfectly divisible resource per unit of time

If fraction kt ∈ [0, 1] is allocated to R on [t, t + dt[:

S yields (1 − kt) s dt

R pays a lump-sum with probability kt λθ dt

Expected payoff increment conditional on θ:

[(1 − kt) s + kt λθ h] dt

Page 15: Strategic Experimentation

Payoffs and Beliefs


pt = posterior probability that θ equals 1

Et[λθ] = pt λ1 + (1 − pt) λ0 = λ(pt)

Expected payoff increment conditional on observations up to time t:

[(1 − kt) s + kt λ(pt) h] dt

Total payoff in per-period units:

\[ E\left[ \int_0^\infty r e^{-rt} \left[ (1 - k_t) s + k_t \lambda(p_t) h \right] dt \right] \]
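The leading factor r normalises total payoffs to per-period (flow) units: a constant flow s, for example, yields

\[ \int_0^\infty r e^{-rt}\, s\, dt = s \]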

Page 16: Strategic Experimentation

Common Posterior Beliefs


Each player has a replica two-armed bandit

– same θ

– i.i.d. arrival times

Common prior

Observable actions and outcomes

Hence common posterior

Page 17: Strategic Experimentation

Applying Bayes’ Rule


Let player n = 1, . . . , N allocate kn,t to R on [t, t + dt[

Define

\[ K_t = \sum_{n=1}^{N} k_{n,t} \]

Conditional on θ, the probability of none of the players receiving a lump-sum in [t, t + dt[ is

\[ \prod_{n=1}^{N} e^{-k_{n,t} \lambda_\theta\, dt} = \prod_{n=1}^{N} \left( 1 - k_{n,t} \lambda_\theta\, dt \right) = 1 - K_t \lambda_\theta\, dt \]

(to first order in dt)

Page 18: Strategic Experimentation

No News is Bad News


Given pt at time t and no lump-sum in [t, t + dt[:

\[ p_t + dp_t = \frac{p_t \left( 1 - K_t \lambda_1\, dt \right)}{(1 - p_t)\left( 1 - K_t \lambda_0\, dt \right) + p_t \left( 1 - K_t \lambda_1\, dt \right)} \]

or

\[ dp_t = -K_t\, \Delta\lambda\, p_t (1 - p_t)\, dt \quad \text{with } \Delta\lambda = \lambda_1 - \lambda_0 \]

When there is a lump-sum on any of the risky arms, the posterior belief jumps up from pt− to

\[ p_t = j(p_{t-}) = \frac{\lambda_1\, p_{t-}}{\lambda(p_{t-})} \]
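A short sketch of these belief dynamics in discretized time (hypothetical parameter values as in the earlier sketch; the total intensity Kt is held fixed at K purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

lam1, lam0 = 1.5, 0.5        # hypothetical intensities, lambda1 > lambda0 >= 0
dlam = lam1 - lam0
dt, T = 0.001, 10.0
K = 2.0                      # total experimentation intensity, held fixed here
theta = 1                    # true state, unknown to the players

p = 0.5                      # common prior belief that theta = 1
lam_true = lam1 if theta == 1 else lam0
for _ in range(int(T / dt)):
    if rng.random() < K * lam_true * dt:
        # a lump-sum arrives on some risky arm: belief jumps up to j(p)
        p = lam1 * p / (lam1 * p + lam0 * (1 - p))
    else:
        # no news is bad news: belief drifts down deterministically
        p -= K * dlam * p * (1 - p) * dt

print("posterior belief after time T:", p)
```

With θ = 1 the upward jumps dominate on average and the posterior tends towards 1; with θ = 0 the no-news drift dominates and it tends towards 0.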

Page 19: Strategic Experimentation

Stationary Markov Strategies


With pt− as the natural state variable:

kt = k(pt−)

where k : [0, 1] → [0, 1] is left-continuous and piecewise Lipschitz-continuous

Call the strategy simple if k(p) ∈ {0, 1} for all p

Page 20: Strategic Experimentation

Cut-off Strategies


For some cut-off p̄ ∈ [0, 1]:

If p > p̄, then k(p) = 1 (play R exclusively)
If p ≤ p̄, then k(p) = 0 (play S exclusively)

Example:

Myopic agent ignores the information content of own actions (as if r = ∞)

Payoff = max {s, λ(p)h}

Cut-off:

\[ p^m = \frac{s - \lambda_0 h}{\lambda_1 h - \lambda_0 h} \]
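For instance, with the hypothetical parameters s = 1, h = 1, λ1 = 1.5, λ0 = 0.5 used in the sketches above,

\[ p^m = \frac{1 - 0.5}{1.5 - 0.5} = 0.5, \]

and indeed λ(p)h = 0.5 + p exceeds s = 1 exactly when p > 0.5.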

Page 21: Strategic Experimentation

The Single-Agent Problem


Page 22: Strategic Experimentation

Principle of Optimality


\[ u(p) = \max_{k \in [0,1]} \Big\{ r \big[ (1 - k) s + k \lambda(p) h \big] dt + e^{-r\, dt}\, E\big[ u(p + dp) \mid p, k \big] \Big\} \]

With probability k λ(p) dt: lump-sum, and the continuation value is

\[ u(j(p)) \]

With probability 1 − k λ(p) dt: no lump-sum, and the continuation value is

\[ u(p) - k\, \Delta\lambda\, p (1 - p)\, u'(p)\, dt \]

Insert and simplify to get Bellman equation!
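Carrying out that step (a sketch: expand e^{−r dt} ≈ 1 − r dt, substitute the two continuation values with their probabilities, and drop all terms of order dt²):

\[ u(p) = \max_{k} \Big\{ r\big[(1-k)s + k\lambda(p)h\big] dt + (1 - r\, dt)\Big[ u(p) + k \lambda(p) \big( u(j(p)) - u(p) \big) dt - k\, \Delta\lambda\, p(1-p) u'(p)\, dt \Big] \Big\} \]

Subtracting u(p) from both sides, dividing by r dt and rearranging yields the Bellman equation of the next slide:

\[ u(p) = s + \max_{k} k \left\{ \frac{\lambda(p)\big(u(j(p)) - u(p)\big) - \Delta\lambda\, p(1-p) u'(p)}{r} - \big(s - \lambda(p)h\big) \right\} \]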

Page 23: Strategic Experimentation

Bellman Equation


\[ u(p) = s + \max_{k \in [0,1]} k \left\{ b(p, u) - c(p) \right\} \]

with the expected short-term opportunity cost

\[ c(p) = s - \lambda(p) h \]

and the expected long-term learning benefit

\[ b(p, u) = \frac{\lambda(p) \big[ u(j(p)) - u(p) \big] - \Delta\lambda\, p (1 - p)\, u'(p)}{r} \]

(left-hand derivative!)

⟹ k = 1 or k = 0 is optimal (the maximand is linear in k)

Page 24: Strategic Experimentation

Single-Agent Optimum


When k = 1 is optimal, u satisfies a differential-difference equation (DDE) that admits an explicit solution

Optimal cut-off p∗1 < pm determined by

• value matching: u(p∗1) = s

• smooth pasting: u′(p∗1) = 0

[Figure: the single-agent value function u(p), equal to s up to the cut-off p∗1 < pm and rising towards λ1h as p → 1]
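The slides rely on the explicit DDE solution; as a numerical cross-check, a rough discrete-time value-iteration sketch can approximate u and the cut-off p∗1 (all parameter values hypothetical; the grid and dt control the accuracy):

```python
import numpy as np

# Hypothetical parameters, as in the earlier sketches
s, h, r = 1.0, 1.0, 1.0
lam1, lam0 = 1.5, 0.5
dlam = lam1 - lam0
dt = 0.001

grid = np.linspace(0.0, 1.0, 2001)
lam = lam0 + dlam * grid                      # lambda(p)
jump = lam1 * grid / np.maximum(lam, 1e-12)   # j(p): belief after a lump-sum
drift = grid - dlam * grid * (1 - grid) * dt  # belief after dt with no news (k = 1)

u = np.maximum(s, lam * h)                    # initialise at the myopic payoff
for _ in range(20000):                        # discounted contraction: converges
    u_jump = np.interp(jump, grid, u)
    u_drift = np.interp(drift, grid, u)
    safe = r * s * dt + np.exp(-r * dt) * u   # k = 0: belief stays put
    risky = (r * lam * h * dt                 # k = 1: experiment fully
             + np.exp(-r * dt) * (lam * dt * u_jump + (1 - lam * dt) * u_drift))
    u = np.maximum(safe, risky)

p_star = grid[np.argmax(risky > safe)]        # first grid belief at which R is optimal
print("approximate single-agent cut-off p*_1:", p_star)
```

The same recursion with the jump probability and the no-news drift scaled by N (keeping the flow payoff per player) would approximate the cooperative cut-off p∗N of the next section.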

Page 25: Strategic Experimentation

The Cooperative Problem


Page 26: Strategic Experimentation

Efficient Experimentation


Bellman equation for N players jointly maximising their average payoff:

\[ u(p) = s + \max_{K \in [0, N]} K \left\{ b(p, u) - c(p)/N \right\} \]

⟹ K = N or K = 0 is optimal

⟹ Optimal cut-off p∗N < p∗1, decreasing in N

[Figure: the jointly optimal intensity K(p): K = 0 below the cut-off p∗N, K = N above it]

Page 27: Strategic Experimentation

The Strategic Problem


Page 28: Strategic Experimentation

Characterising Best Responses


Bellman equation for player n, given K¬n = K − kn:

\[ u_n(p) = s + K_{\neg n}\, b(p, u_n) + \max_{k_n \in [0,1]} k_n \left\{ b(p, u_n) - c(p) \right\} \]

Indifference diagonal:

\[ D_{K_{\neg n}} = \left\{ (p, u) \in [0,1] \times \mathbb{R}_+ : u = s + K_{\neg n}\, c(p) \right\} \]

[Figure: the indifference diagonal DK¬n in (p, u) space: R is the best response to K¬n at value pairs above the diagonal, S below it]

Page 29: Strategic Experimentation

General Results for Markov Perfect Equilibria


(1) In any MPE, each player obtains at least the single-agent optimum payoff

(2) All MPE are inefficient (free-riding)

(3) There is no MPE where all players use cut-off strategies

(4) In any MPE, at least one player experiments at some beliefs below p∗1 (encouragement) if and only if λ0 > 0

Page 30: Strategic Experimentation

No MPE in Cut-Off Strategies


[Figure: in (p, u) space, the diagonal D1 separates the region where R is dominant from the region where S is the best response to K¬n = 1; candidate cut-offs p1 < p2 are shown with the corresponding value functions u1 and u2]

Player 1 must change action at p2 as well as at p1

Page 31: Strategic Experimentation

Encouragement Effect for λ0 > 0


Suppose all players play S at all beliefs p ≤ p∗1 (write V∗1 for the single-agent value function). Then

\[ u_n(p_1^*) = V_1^*(p_1^*) = s, \qquad u_n'(p_1^*) = (V_1^*)'(p_1^*) = 0 \]

\[ b(p_1^*, u_n) \le c(p_1^*) = b(p_1^*, V_1^*) \;\Longrightarrow\; u_n(j(p_1^*)) \le V_1^*(j(p_1^*)) \]

Since u_n ≥ V∗1 everywhere (general result (1)), in fact

\[ u_n(j(p_1^*)) = V_1^*(j(p_1^*)) \]

So u_n − V∗1 is minimal at j(p∗1), whence

\[ u_n'(j(p_1^*)) \le (V_1^*)'(j(p_1^*)), \qquad u_n(j^2(p_1^*)) \ge V_1^*(j^2(p_1^*)) \]

\[ \Longrightarrow\; b(j(p_1^*), u_n) \ge b(j(p_1^*), V_1^*) > c(j(p_1^*)) \]

So all players must use R at the belief j(p∗1)

Page 32: Strategic Experimentation

Encouragement Effect for λ0 > 0


Each player’s Bellman equation now yields

\begin{align*}
u_n(j(p_1^*)) &= s + N\, b(j(p_1^*), u_n) - c(j(p_1^*)) \\
&\ge s + N\, b(j(p_1^*), V_1^*) - c(j(p_1^*)) \\
&> s + b(j(p_1^*), V_1^*) - c(j(p_1^*)) \\
&= V_1^*(j(p_1^*))
\end{align*}

— a contradiction with u_n(j(p∗1)) = V∗1(j(p∗1)) from the previous slide

Page 33: Strategic Experimentation

No Encouragement Effect for λ0 = 0


Encouragement effect rests on two conditions

• Experimentation by a pioneer must increase the likelihood that others will return to the risky arm

• These future experiments must be valuable to the pioneer

The second condition is not met for λ0 = 0

So for λ0 = 0, experimentation in any MPE stops at p∗1

Page 34: Strategic Experimentation

Symmetric Equilibrium


No symmetric MPE in pure strategies

Indifference between R and S requires

b(p, u) = c(p)

— DDE with closed-form solution

Bellman equation then implies

\[ k(p) = \frac{u(p) - s}{(N - 1)\, c(p)} \]
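To see the step (a sketch): in a symmetric equilibrium each of the N − 1 opponents also plays k(p), so K¬n = (N − 1)k(p); substituting the indifference condition b(p, u) = c(p) into the Bellman equation for player n above gives

\[ u(p) = s + (N-1)\, k(p)\, b(p, u) = s + (N-1)\, k(p)\, c(p), \]

which rearranges to the expression for k(p) above.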

Page 35: Strategic Experimentation

Symmetric MPE


Construction and uniqueness by elementary methods

[Figure: the symmetric-MPE value function against the diagonal DN−1: R is played with probability 0 below pN, with strictly mixed (interior) probability between pN and p†N, and with probability 1 above p†N]

Smooth pasting at pN

Comparative statics as N ↑: common payoffs ↑, pN ↓, p†N ↑

Page 36: Strategic Experimentation

Symmetric MPE


Interior allocation between cut-offs pN < p†N

[Figure: total equilibrium intensity K(p): zero below pN, interior on ]pN, p†N], equal to N above p†N; the cut-offs p∗N and p∗1 are marked for comparison]

Inefficiency: pN > p∗N (not reached in finite time)

Encouragement effect: pN < p∗1 if λ0 > 0

Page 37: Strategic Experimentation

Asymmetric Equilibria


No asymmetric MPE were demonstrated in Bolton & Harris (1999)

Keller, Rady & Cripps (2005):

• λ0 = 0 (no encouragement effect)

• variety of asymmetric MPE in simple strategies

• complete classification for N = 2

• most inequitable MPE for arbitrary N

• higher average payoff than symmetric MPE

Keller, Rady & Cripps (2010):

• λ0 > 0 (encouragement effect)

• construction of a particular asymmetric MPE

• Pareto improvement over symmetric MPE

Page 38: Strategic Experimentation

The Most Inequitable MPE for λ0 = 0 and N = 2


[Figure: the most inequitable MPE for λ0 = 0 and N = 2 in (p, u) space: the two players’ value functions against the diagonal D1, with switch points p1 and p2 and the cut-offs p∗1 and pm marked]

Average payoff higher than in symmetric MPE

Average payoff increases as the burden of experimentation is shared more equitably

Page 39: Strategic Experimentation

Constructing Asymmetric MPE for λ0 > 0


Two ideas:

• Give all players identical continuation values after a success on a risky arm

• Let players alternate between the roles of experimenter and free-rider before all experimentation stops

Construction below for N = 2 generalises to arbitrary N

Page 40: Strategic Experimentation

Asymmetric MPE for λ0 > 0 and N = 2


[Figure: belief regions of the equilibrium against the diagonals D1 and D1/2: K = 2 above p‡, 1 < K < 2 on ]p♯, p‡], K = 1 on ]p♭, p♯], K = 0 on [0, p♭]]

• On ]p♯, 1] both players play symmetrically

• On ]p♭, p♯] players take turns playing R

• On [0, p♭] both players play S

Strictly higher average payoff on ]p♭, 1] than in the symmetric equilibrium

Pareto improvement for sufficiently frequent turns on ]p♭, p♯]

Page 41: Strategic Experimentation

Why is Taking Turns More Efficient?


Under symmetry:

• A player who deviates to the safe action slows down the gradual slide of beliefs towards more pessimism

• This makes the other players experiment more than they would on the equilibrium path

In the alternation phase of an asymmetric MPE:

• A deviation from the risky to the safe action freezes the belief in its current state

• This delays the time at which another player takes over

Page 42: Strategic Experimentation

Best, Worst and Most Inequitable MPE


Keller, Rady & Cripps (2005):

• λ0 = 0

• Symmetric MPE yields uniformly lowest average payoffs

• For N = 2, the limits as λ0 ↓ 0 of the MPE just constructed yield uniformly highest average payoffs

• For N = 2, the most inequitable MPE yield uniformly extremal individual payoffs; these are in simple strategies

Do such MPE exist for λ0 > 0?

Consider simple MPE for N = 2 and small λ0 > 0

Page 43: Strategic Experimentation

Large Jumps


For λ0 > 0 small enough:

• j(p∗2) ≥ pm

• Everybody plays R after a success

• We can again give the players common continuation values after a success on a risky arm

Precise condition:

\[ \lambda_0 \left( \frac{\lambda_0}{\lambda_1} \right)^{\lambda_0 / \Delta\lambda} \le \frac{r}{2} \]

Sufficient condition:

\[ \lambda_0 \le \frac{r}{2} \]
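With the hypothetical values λ0 = 0.5 and λ1 = 1.5 from the earlier sketches, λ0/∆λ = 1/2, so

\[ \lambda_0 \left( \frac{\lambda_0}{\lambda_1} \right)^{\lambda_0 / \Delta\lambda} = 0.5 \cdot \left( \tfrac{1}{3} \right)^{1/2} \approx 0.29; \]

the precise condition then holds for all r above roughly 0.58, while the cruder sufficient condition λ0 ≤ r/2 requires r ≥ 1.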

Page 44: Strategic Experimentation

A Simple MPE for N = 2


[Figure: (p, u) space with the diagonal D1, the region where R is dominant, the region where S and R are mutual best responses, and the switch points p̄, ps and p̲ of the equilibrium described below]

• on ]p̄, 1] both players play R
• on ]ps, p̄] player 1 plays S, player 2 plays R
• on ]p̲, ps] player 1 plays R, player 2 plays S
• on [0, p̲] both players play S

Lower average payoff than in the previous MPE

◦ Monotonicity? ‘Anticipation’

Page 45: Strategic Experimentation

Rewarding the Last Experimenter


• Maintain assumption of ‘large jumps’

• Relax payoff symmetry above D1

– Give player 1 higher payoff at j(p)

– Two effects:

◦ Lower intensity just to the right of p

◦ Player 1 is willing to play R left of p

– Average payoff

◦ decreases at high beliefs

◦ increases around p

Page 46: Strategic Experimentation

Rewarding the Last Experimenter


• Consequences:

– Can construct equilibria with progressively increasing payoff asymmetry, approaching the most inequitable MPE for λ0 = 0

– There exist no uniformly best or worst MPE when λ0 > 0

Page 47: Strategic Experimentation

Rewarding the Last Experimenter


[Figure: the construction in (p, u) space: points marked on and above the diagonal D1 at the beliefs p2, p1, p and ps, between p∗2 and pm]

• Fix (p2, u1) above D1; find p1, p and ps
• Fill in u2 above D1 and just below
• Fill in u2 above p . . . does it join up?
• Adjust (p2, u1) if necessary

Page 48: Strategic Experimentation

Conclusion


Page 49: Strategic Experimentation

Concluding Remarks


• Poisson bandits

– Alternative to Brownian setup

– News comes in ‘lumps’

• Asymmetric equilibria with encouragement

– Construction by elementary methods

– Taking turns is superior to ‘mixing’

– No uniformly ‘best’ or ‘worst’ MPE unless λ0 = 0

Page 50: Strategic Experimentation

Concluding Remarks


• What else?

– Ongoing work on the ‘bad news’ scenario
– Smooth pasting does not hold
– Characterization of symmetric MPE
– Viscosity solutions of the Bellman equation

• What next?

– Introduce negative correlation of the type of the risky arm across players

– Consider non-Markovian equilibria