Normative solutions to the speed/accuracy trade-off in perceptual decision-making
Jan Drugowitsch, University of Geneva
FENS-Hertie Winter School 2015: The neuroscience of decision making
Ruben Moreno-Bote, Alex Pouget, Mike Shadlen
Perceptual decision-making?
Normative solutions?
Normative = how things ought to be. Identify:
- the information on which decisions should be based
- the computations that need to be carried out on that information
Normative solutions can be parameterised to fit behaviour.
Deviations from normative solutions give insight into specific deficits or biases.
Resources
Notes, derivations, and figure code (MATLAB) are available at
https://github.com/jdrugo/FENS2015
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Normative evidence accumulation
51.2% coherence
12.8% coherence
Random dot motion task
“right”? “left”?
Hidden state (“left” vs. “right”): z \in \{-1, 1\}
Evidence (observation/stimulus): p(x \mid z) = \mathcal{N}(x \mid z, \sigma_\varepsilon^2)
Posterior belief (likelihood × prior):
p(z \mid x) \propto p(x \mid z)\, p(z) \propto \frac{1}{1 + e^{-2xz/\sigma_\varepsilon^2}}
[Figure: likelihoods p(x | z = 1), p(x | z = −1) and posterior p(z = 1 | x) as functions of x — 01evacc/single_obs.m]
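As a minimal sketch of this computation (illustrative, not the repo's 01evacc/single_obs.m; all parameter values are assumptions):

% posterior from a single observation
sig_e2 = 1;                                   % evidence noise variance sigma_eps^2
x = linspace(-4, 4, 201);                     % grid of possible observations
gauss = @(x, m, s2) exp(-(x - m).^2 ./ (2*s2)) ./ sqrt(2*pi*s2);
lik_right = gauss(x,  1, sig_e2);             % p(x | z =  1)
lik_left  = gauss(x, -1, sig_e2);             % p(x | z = -1)
% flat prior p(z) = 1/2, so the posterior reduces to the logistic form above
post_right = 1 ./ (1 + exp(-2 * x / sig_e2)); % p(z = 1 | x)
plot(x, lik_right, x, lik_left, x, post_right);
legend('p(x|z=1)', 'p(x|z=-1)', 'p(z=1|x)');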
Evidence accumulation with multiple observations
Hidden state (“left” vs. “right”): z \in \{-1, 1\}
n-th evidence (observation/stimulus): p(x_n \mid z) = \mathcal{N}(x_n \mid z, \sigma_\varepsilon^2)
Posterior belief after N observations (prior × i.i.d. likelihoods):
p(z \mid x_{1:N}) \propto p(z) \prod_{n=1}^{N} p(x_n \mid z) \propto \frac{1}{1 + e^{-2 X_N z / \sigma_\varepsilon^2}}
with sufficient statistic X_N = \sum_{n=1}^{N} x_n
[Figure: observations x_n, accumulated evidence X_n, and belief p(z = 1 | x_{1:n}) over observations n = 1, …, 10 — 01evacc/discrete_obs.m]
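The accumulation itself is only a few lines; a minimal simulation (illustrative, not the repo's 01evacc/discrete_obs.m):

% accumulate N i.i.d. observations
sig_e2 = 1; N = 10; z = 1;                  % assumed noise and true state
x = z + sqrt(sig_e2) * randn(1, N);         % observations x_n ~ N(z, sig_e2)
X = cumsum(x);                              % sufficient statistic X_n
post = 1 ./ (1 + exp(-2 * X / sig_e2));     % p(z = 1 | x_{1:n})
subplot(3,1,1); stem(x);    ylabel('x_n');
subplot(3,1,2); plot(X);    ylabel('X_n');
subplot(3,1,3); plot(post); ylabel('p(z=1|x_{1:n})'); xlabel('observation n');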
Evidence accumulation with stream of evidence
Discretise the stream into small chunks of length δt.
Likelihood of momentary evidence δx_n:
p(\delta x_n \mid z) = \mathcal{N}(\delta x_n \mid z\,\delta t, \sigma_\varepsilon^2\,\delta t)
The information in δx_n about z goes to 0 as δt → 0.
[Figure: momentary evidence dx/dt, accumulated evidence X(t), and belief p(z = 1 | δx_{0:t}) over time t — 01evacc/continuous_obs.m]
Posterior upon observing δx_1, δx_2, … for t seconds (taking δt → 0; prior × i.i.d. likelihoods):
p(z \mid \delta x_{0:t}) \propto p(z) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid z) \propto \frac{1}{1 + e^{-2 X(t) z / \sigma_\varepsilon^2}}
with sufficient statistic X(t) = \int_0^t \delta x(s)
X(t) \sim \mathcal{N}(z t, \sigma_\varepsilon^2 t) — a diffusion model
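In simulation the continuous stream becomes a random walk with drift; a minimal sketch (illustrative, not the repo's 01evacc/continuous_obs.m):

% diffusion approximation of the evidence stream
sig_e2 = 1; dt = 0.01; T = 2; z = 1;              % assumed parameters
t = dt:dt:T;
dx = z * dt + sqrt(sig_e2 * dt) * randn(size(t)); % momentary evidence
X = cumsum(dx);                                   % X(t) ~ N(z t, sig_e2 t)
g = 1 ./ (1 + exp(-2 * X / sig_e2));              % belief p(z = 1 | dx_{0:t})
subplot(2,1,1); plot(t, X); ylabel('X(t)');
subplot(2,1,2); plot(t, g); ylabel('p(z=1|\delta x_{0:t})'); xlabel('time t');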
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Maximizing 0-1 reward
To get from posteriors to choices we need a loss/reward function, reward(choice a, hidden state z), and perform the choice that maximises expected reward.
0-1 reward:
                 z = −1   z = 1
a = “right”        0        1
a = “left”         1        0
Choose the more likely correct option:
p(z = 1 \mid X(t)) \ge \tfrac{1}{2}  →  choose “right”
p(z = 1 \mid X(t)) < \tfrac{1}{2}  →  choose “left”
[Figure: belief p(z = 1 | δx_{0:t}) and expected reward over time t — 02lossfn/loss_01.m]
How long to accumulate? Forever?
A cost for accumulating evidence
Cost c for accumulating evidence:
- internal (e.g., effort / attention)
- external (e.g., lost future opportunities)
[Figure: belief and expected reward over time t for costs 0.0, 0.1, 0.2 — 02lossfn/loss_cost.m]
After some time, the marginal increase in choice accuracy no longer justifies the associated cost.
Expected reward:
ER(PC, RT) = PC - c\,RT
with PC = p(correct) and RT = reaction time.
This is the reward function that we will use for the rest of the tutorial.
Maximising reward rate rather than expected reward
Loss of future reward is explicit in reward rate
RR(PC, RT) = \frac{PC - c\,RT}{RT + t_i + (1 - PC)\,t_p}
with inter-trial interval t_i and penalty time t_p (for wrong choices).
[Figure: belief and expected reward rate over time t for t_i = 0.0, 2.0, 4.0 — 02lossfn/loss_rr.m]
Even c = 0 leads to early choices, as a large RT reduces the reward rate through the denominator.
We will come back to this as a possible extension.
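To make the two objectives concrete, a small sketch evaluating both (the numerical values of c, t_i, t_p, PC, and RT are arbitrary assumptions):

% expected reward vs. reward rate
c = 0.1; ti = 2; tp = 0;                 % cost, inter-trial interval, penalty time
ER = @(PC, RT) PC - c * RT;              % expected reward per trial
RR = @(PC, RT) (PC - c * RT) ./ (RT + ti + (1 - PC) * tp);  % reward per unit time
ER(0.9, 1.5)                             % = 0.75
RR(0.9, 1.5)                             % = 0.75 / 3.5, about 0.21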
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Dynamic Programming in a nutshell
We find the optimal speed/accuracy trade-off by Dynamic Programming
Ingredients (Markov Decision Process):
- set of states s ∈ S
- set of actions a ∈ A
- state transition probabilities p(s' \mid s, a)
- reward function r(s, a)
- (discount factor γ = 1)
[Figure: example MDP with states s1–s6 and actions a = 1, 2; rewards r(s1,1) = −1, r(s2,{1,2}) = −1, r(s3,1) = −1, r(s4,1) = −1, r(s5,1) = 10, r(s6,1) = 0]
Aim: find the policy π : S → A that maximises the total reward
V^\pi(s) = \left\langle \sum_{n=1}^{\infty} r(s_n, \pi(s_n)) \right\rangle_{p(s_1, s_2, \ldots \mid s_1 = s, \pi)}
from each state s.
It can be found recursively by solving Bellman's equation:
V(s) = \max_a \left[ r(s, a) + \sum_{s'} V(s')\, p(s' \mid s, a) \right]
The optimal action (i.e., the optimal policy) is the action that maximises the right-hand side.
[Figure: values solving the example MDP — V(s1) = 8, V(s2) = 9, V(s3) = 8, V(s4) = 9, V(s5) = 10, V(s6) = 0]
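The figure's exact transition graph is not fully recoverable from these notes, so here is a generic value-iteration sketch on a made-up deterministic chain (transitions and rewards are hypothetical, not the figure's):

% Bellman backup / value iteration on a toy MDP
nS = 4; nA = 2;
next = [2 3; 3 4; 4 4; 4 4];        % next(s,a): hypothetical successor states
R    = [-1 -1; -1 -1; 10 -1; 0 0];  % R(s,a): hypothetical rewards; s4 absorbing
V = zeros(nS, 1);
for iter = 1:100                    % gamma = 1 is fine: all paths end in s4 (r = 0)
    for s = 1:nS
        q = zeros(1, nA);
        for a = 1:nA
            q(a) = R(s, a) + V(next(s, a));  % right-hand side of Bellman's equation
        end
        V(s) = max(q);
    end
end
disp(V')                            % converged state values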
States and actions in perceptual decision-making
States: all observations so far → sufficient statistic X ∈ (−∞, ∞).
To have a bounded state space, use the belief g ∈ [0, 1] instead:
g(X) \equiv p(z = 1 \mid X) = \frac{1}{1 + e^{-2X/\sigma_\varepsilon^2}}
[Figure: likelihoods and posterior as before, illustrating the mapping from X ∈ (−∞, ∞) to g ∈ [0, 1] — 01evacc/single_obs.m]
Actions, each contributing r(s, a) + \sum_{s'} V(s')\, p(s' \mid s, a):
- choose “right” (z = 1): g \cdot 1 + (1 - g) \cdot 0 = g
- choose “left” (z = −1): (1 - g) \cdot 1 + g \cdot 0 = 1 - g
- accumulate for another δt: -c\,\delta t + \langle V(g') \rangle_{p(g' \mid g)}
Bellman's equation for perceptual decision-making:
V(g) = \max\left[ g,\; 1 - g,\; \langle V(g') \rangle_{p(g' \mid g)} - c\,\delta t \right]
[Figure: value function over belief g, showing where max(g, 1 − g) intersects ⟨V⟩ − c δt — 03knownreliab/plot_valueintersect.m]
optimal decision-making by boundaries on belief g
An example decision
[Figure: example belief trajectory g over time t, terminating at one of the two belief boundaries — 03knownreliab/plot_diffusion_example.m]
Finding the optimal bounds numerically
Find belief transition probability p(g’|g) for accumulating δt more evidence
Discretise the belief and value function into k = 1, …, K, and solve numerically by value iteration (see the sketch after the figure below):
V_{k,n} = \max\left[ g_k,\; 1 - g_k,\; \sum_{j=1}^{K} p(g_j \mid g_k)\, V_{j,n-1} - c\,\delta t \right]
[Figure: belief transition probabilities p(g′ | g), plotted as current belief g vs. belief g′ after another δt, for decreasing δt (0.100, 0.010, 0.001) and increasing task difficulty σε² (0.25, 1, 25) — 03knownreliab/plot_belieftrans.m]
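A compact sketch of the whole procedure (illustrative, not the repo's implementation; grid size, δt, σε², and c are assumptions). Under belief g the momentary evidence is the mixture g N(δx | δt, σε²δt) + (1 − g) N(δx | −δt, σε²δt), which yields p(g′ | g):

% optimal belief bound by value iteration
K = 200; dt = 0.01; sig2 = 1; c = 0.2;        % assumed parameters
g = linspace(0.001, 0.999, K);                % belief grid
X = sig2/2 * log(g ./ (1 - g));               % corresponding sufficient statistic
P = zeros(K, K);                              % P(k,j) approximates p(g_j | g_k)
for k = 1:K
    dx = X - X(k);                            % evidence step reaching each grid point
    lik = g(k) * exp(-(dx - dt).^2 / (2*sig2*dt)) ...
        + (1 - g(k)) * exp(-(dx + dt).^2 / (2*sig2*dt));
    lik = lik .* gradient(X);                 % weight by uneven X spacing of the g grid
    P(k, :) = lik / sum(lik);                 % normalise on the grid
end
V = max(g, 1 - g);                            % initialise with immediate reward
for n = 1:5000                                % iterate to (approximate) convergence
    V = max(max(g, 1 - g), (P * V')' - c * dt);
end
k_bound = find(g > 0.5 & V <= max(g, 1 - g) + 1e-9, 1);
fprintf('optimal bound in belief: g = %.3f\n', g(k_bound));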
Optimal boundary with cost, task difficulty
[Figure: optimal boundary in belief g as a function of the evidence accumulation cost c (03knownreliab/plot_bound_with_cost.m) and as a function of task difficulty σε, easy → hard (03knownreliab/plot_bound_with_sig2.m); legends compare “direct” maximisation with DP]
An alternative to DP for finding the optimal boundaries: maximise
ER(PC, RT) = PC - c\,RT
using the known expressions for diffusion models with bound ±θ on X,
PC = \frac{1}{1 + e^{-2\theta/\sigma_\varepsilon^2}}, \qquad RT = \theta \tanh\!\left(\frac{\theta}{\sigma_\varepsilon^2}\right)
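This direct maximisation takes a few lines (illustrative sketch; parameter values are assumptions):

% optimal bound by direct ER maximisation
sig2 = 1; c = 0.2;
PC = @(th) 1 ./ (1 + exp(-2 * th / sig2));  % accuracy with bound +/- theta on X
RT = @(th) th .* tanh(th / sig2);           % mean decision time
negER = @(th) -(PC(th) - c * RT(th));       % maximise ER by minimising -ER
theta = fminbnd(negER, 0, 10);
fprintf('theta = %.3f, i.e. bound in belief g = %.3f\n', theta, PC(theta));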
Optimal decision-making with diffusion models
Boundaries on belief g correspond to boundaries on X (a diffusion model).
[Figure: the mapping between X ∈ (−∞, ∞) and g ∈ [0, 1] (01evacc/single_obs.m), and example trajectories of the sufficient statistic X and belief g over time t with constant bounds in both spaces — 03knownreliab/plot_diffusion_example.m]
Can perform optimal decision-making without ever computing belief g
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Evidence accumulation with reliability changing across trials
[Figure: likelihoods and posterior as before; the per-trial evidence mean defines the trial difficulty for fixed σε = 1 — 01evacc/single_obs.m]
Hidden state: z ∈ {−1, 1}
Trial difficulty: α ∈ [0, ∞) (small = hard)
Per-trial evidence mean μ = αz, drawn for each trial from
p(\mu) = \mathcal{N}(\mu \mid 0, \sigma_\alpha^2)
with overall task difficulty σα (small = hard); hard trials are more likely, easy trials less likely.
With this evidence, accumulation proceeds by δx_n \sim \mathcal{N}(\mu\,\delta t, \delta t).
The posterior over μ (prior × i.i.d. likelihoods) is
p(\mu \mid \delta x_{0:t}) \propto p(\mu) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid \mu) \propto \mathcal{N}\!\left(\mu \,\middle|\, \frac{X(t)}{\sigma_\alpha^{-2} + t},\; \frac{1}{\sigma_\alpha^{-2} + t}\right)
such that the belief becomes
g(X, t) \equiv p(z = 1 \mid \delta x_{0:t}) = p(\mu \ge 0 \mid X(t), t) = \Phi\!\left(\frac{X(t)}{\sqrt{\sigma_\alpha^{-2} + t}}\right)
and the sufficient statistics are now both X(t) and t.
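A short simulation of this setting (illustrative; Φ is written via erfc to stay in base MATLAB, and σα², δt, T are assumptions):

% belief with unknown evidence reliability
sa2 = 1; dt = 0.01; T = 2;                          % assumed parameters
t = dt:dt:T;
mu = sqrt(sa2) * randn;                             % per-trial evidence mean mu = alpha*z
X = cumsum(mu * dt + sqrt(dt) * randn(size(t)));    % accumulated evidence
Phi = @(x) 0.5 * erfc(-x / sqrt(2));                % standard normal CDF
g = Phi(X ./ sqrt(1/sa2 + t));                      % belief g(X, t)
subplot(2,1,1); plot(t, X); ylabel('X(t)');
subplot(2,1,2); plot(t, g); ylabel('g(X,t)'); xlabel('time t');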
The resulting optimal policy
Bellman's equation over belief g and time t:
V(g, t) = \max\left[ g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c\,\delta t \right]
[Figure: example belief trajectory, and the value function over belief g at t = 0.00, 0.99, 1.98, showing max(g, 1 − g) against ⟨V⟩ − c δt — 04knownreliab/plot_valuefn.m]
The future expected value takes this time-dependence into account.
Boundaries in belief g for each fixed t → time-dependent boundaries.
Optimal decision-making with diffusion models
Using the time-dependent mapping between belief g and the diffusing particle X,
g(X, t) = \Phi\!\left(\frac{X}{\sqrt{\sigma_\alpha^{-2} + t}}\right), \qquad X(g, t) = \sqrt{\sigma_\alpha^{-2} + t}\;\Phi^{-1}(g)
we can map a bound in belief g to a bound on the diffusing particle X.
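A minimal sketch of this mapping (the belief bound g_b(t) below is a made-up placeholder; the real one comes out of the dynamic programme):

% map a time-dependent belief bound to a bound on X
sa2 = 1; t = 0:0.01:3;                    % assumed parameters
gb = 0.5 + 0.4 * exp(-t / 2);             % hypothetical decaying bound in belief
PhiInv = @(p) -sqrt(2) * erfcinv(2 * p);  % inverse standard normal CDF
Xb = sqrt(1/sa2 + t) .* PhiInv(gb);       % corresponding (collapsing) bound on X
plot(t, Xb, t, -Xb); xlabel('time t'); ylabel('bound on X');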
[Figure: example trajectories of the sufficient statistic X and belief g over time t, with time-dependent bounds — 04knownreliab/plot_diffusion_example.m]
As before, optimal decision-making without ever directly computing the belief.
This naturally leads to slower errors than correct choices.
Urgency signal
“urgency signal” as a collapsing bound?
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Maximising reward rate instead of expected reward
Maximising reward rate = maximising reward over an infinite sequence of structurally equal trials.
Problem: the value of the initial choice would be infinite (infinitely many, possibly rewarding, choices follow).
Solution: use the average-adjusted value, discounting each time step δt by −ρδt, where ρ is the reward rate (reward per unit time). This value is relative to the reward that can be achieved on average.
For example, Bellman's equation for known evidence reliability becomes
V(g) = \max\left[ g - \left(t_i + (1 - g)\,t_p\right)\rho + V(\tfrac{1}{2}),\;\; 1 - g - \left(t_i + g\,t_p\right)\rho + V(\tfrac{1}{2}),\;\; \langle V(g') \rangle_{p(g' \mid g)} - (\rho + c)\,\delta t \right]
where t_i + (1 − g) t_p is the expected time from the decision to the onset of the next trial.
The policy is invariant to shifts in the value function, so we can choose V(1/2) = 0, and find both the value function and the reward rate ρ using this consistency criterion.
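In practice one can root-find ρ so that the converged V(1/2) is exactly zero; a sketch reusing the belief transitions from before (all parameters, the bisection bracket, and the iteration counts are assumptions):

% reward-rate optimal policy: bisection on rho with consistency V(1/2) = 0
dt = 0.01; sig2 = 1; c = 0; ti = 2; tp = 0; K = 200;   % assumed parameters
g = linspace(0.001, 0.999, K); X = sig2/2 * log(g ./ (1 - g));
P = zeros(K, K);                                       % belief transitions, as before
for k = 1:K
    dx = X - X(k);
    lik = (g(k) * exp(-(dx - dt).^2 / (2*sig2*dt)) ...
        + (1 - g(k)) * exp(-(dx + dt).^2 / (2*sig2*dt))) .* gradient(X);
    P(k, :) = lik / sum(lik);
end
lo = 0.01; hi = 0.45;                                  % assumed bracket on rho
for b = 1:20                                           % bisection on rho
    rho = (lo + hi) / 2;
    V = zeros(1, K);
    for n = 1:2000                                     % average-adjusted value iteration
        Vdec = max(g - (ti + (1-g)*tp)*rho, 1 - g - (ti + g*tp)*rho); % V(1/2) = 0
        V = max(Vdec, (P * V')' - (rho + c)*dt);
    end
    if interp1(g, V, 0.5) > 0, lo = rho; else, hi = rho; end
end
fprintf('optimal reward rate rho = %.4f\n', rho);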
Generalisations to evidence model
A time-varying accumulation cost
Accumulation cost might rise/drop over time
V(g, t) = \max\left[ g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c(t)\,\delta t \right]
Supported by human/animal behaviour (Drugowitsch et al., 2012).
Evidence reliability that varies within individual trials
In real-world decisions, the evidence reliability is practically never constant.
This requires simultaneous estimation of the hidden state and the momentary evidence reliability; the sufficient statistics become at least two-dimensional.
It leads to a reliability-dependent bound on the decision-maker's belief; see Drugowitsch, Moreno-Bote & Pouget (NIPS, 2014).
That paper also introduces a faster way to find the expected future reward, using PDE solvers.
Summary
Notes, derivations, and figure code (MATLAB) are available at
https://github.com/jdrugo/FENS2015
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions