Normative solutions to the speed/accuracy trade-off in perceptual decision-making
Jan Drugowitsch, University of Geneva
FENS-Hertie Winter School 2015: The neuroscience of decision making
Ruben Moreno-Bote, Alex Pouget, Mike Shadlen
Perceptual decision-making?
Normative solutions?
Normative = how things ought to be. Identify:
- the information on which decisions should be based
- the computations that need to be carried out on that information
Normative solutions can be parameterised to fit behaviour.
Deviations from normative solutions give insight into specific deficits or biases.
Resources
Notes, derivations, and figure code (MATLAB) are available at
https://github.com/jdrugo/FENS2015
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Normative evidence accumulation
51.2% coherence
12.8% coherence
Random dot motion task
“right”? “left”?
Hidden state (“left” vs. “right”): z \in \{-1, 1\}
Evidence (observation/stimulus): p(x \mid z) = \mathcal{N}(x \mid z, \sigma_\varepsilon^2)
Posterior belief (likelihood × prior):
p(z \mid x) \propto p(x \mid z)\, p(z) \propto \frac{1}{1 + e^{-2xz/\sigma_\varepsilon^2}}
[Figure: likelihoods p(x | z = 1), p(x | z = −1) and posterior p(z = 1 | x) as functions of x — 01evacc/single_obs.m]
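As a minimal sketch of this computation (illustrative, not the repo's 01evacc/single_obs.m; all parameter values are assumptions):

% posterior from a single observation
sig_e2 = 1;                                   % evidence noise variance sigma_eps^2
x = linspace(-4, 4, 201);                     % grid of possible observations
gauss = @(x, m, s2) exp(-(x - m).^2 ./ (2*s2)) ./ sqrt(2*pi*s2);
lik_right = gauss(x,  1, sig_e2);             % p(x | z =  1)
lik_left  = gauss(x, -1, sig_e2);             % p(x | z = -1)
% flat prior p(z) = 1/2, so the posterior reduces to the logistic form above
post_right = 1 ./ (1 + exp(-2 * x / sig_e2)); % p(z = 1 | x)
plot(x, lik_right, x, lik_left, x, post_right);
legend('p(x|z=1)', 'p(x|z=-1)', 'p(z=1|x)');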
Evidence accumulation with multiple observations
Hidden state (“left” vs. “right”): z \in \{-1, 1\}
n-th evidence (observation/stimulus): p(x_n \mid z) = \mathcal{N}(x_n \mid z, \sigma_\varepsilon^2)
Posterior belief after N observations (prior × i.i.d. likelihoods):
p(z \mid x_{1:N}) \propto p(z) \prod_{n=1}^{N} p(x_n \mid z) \propto \frac{1}{1 + e^{-2 X_N z / \sigma_\varepsilon^2}}
with sufficient statistic X_N = \sum_{n=1}^{N} x_n
[Figure: observations x_n, accumulated evidence X_n, and belief p(z = 1 | x_{1:n}) over observations n = 1, …, 10 — 01evacc/discrete_obs.m]
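The accumulation itself is only a few lines; a minimal simulation (illustrative, not the repo's 01evacc/discrete_obs.m):

% accumulate N i.i.d. observations
sig_e2 = 1; N = 10; z = 1;                  % assumed noise and true state
x = z + sqrt(sig_e2) * randn(1, N);         % observations x_n ~ N(z, sig_e2)
X = cumsum(x);                              % sufficient statistic X_n
post = 1 ./ (1 + exp(-2 * X / sig_e2));     % p(z = 1 | x_{1:n})
subplot(3,1,1); stem(x);    ylabel('x_n');
subplot(3,1,2); plot(X);    ylabel('X_n');
subplot(3,1,3); plot(post); ylabel('p(z=1|x_{1:n})'); xlabel('observation n');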
Evidence accumulation with stream of evidence
Discretise the stream into small chunks of length δt.
Likelihood of momentary evidence δx_n:
p(\delta x_n \mid z) = \mathcal{N}(\delta x_n \mid z\,\delta t, \sigma_\varepsilon^2\,\delta t)
The information in δx_n about z goes to 0 as δt → 0.
[Figure: momentary evidence dx/dt, accumulated evidence X(t), and belief p(z = 1 | δx_{0:t}) over time t — 01evacc/continuous_obs.m]
Posterior upon observing δx_1, δx_2, … for t seconds (taking δt → 0; prior × i.i.d. likelihoods):
p(z \mid \delta x_{0:t}) \propto p(z) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid z) \propto \frac{1}{1 + e^{-2 X(t) z / \sigma_\varepsilon^2}}
with sufficient statistic X(t) = \int_0^t \delta x(s)
X(t) \sim \mathcal{N}(z t, \sigma_\varepsilon^2 t) — a diffusion model
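In simulation the continuous stream becomes a random walk with drift; a minimal sketch (illustrative, not the repo's 01evacc/continuous_obs.m):

% diffusion approximation of the evidence stream
sig_e2 = 1; dt = 0.01; T = 2; z = 1;              % assumed parameters
t = dt:dt:T;
dx = z * dt + sqrt(sig_e2 * dt) * randn(size(t)); % momentary evidence
X = cumsum(dx);                                   % X(t) ~ N(z t, sig_e2 t)
g = 1 ./ (1 + exp(-2 * X / sig_e2));              % belief p(z = 1 | dx_{0:t})
subplot(2,1,1); plot(t, X); ylabel('X(t)');
subplot(2,1,2); plot(t, g); ylabel('p(z=1|\delta x_{0:t})'); xlabel('time t');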
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Maximizing 0-1 reward
To get from posteriors to choices we need a loss/reward function, reward(choice a, hidden state z), and perform the choice that maximises expected reward.
0-1 reward:
                 z = −1   z = 1
a = “right”        0        1
a = “left”         1        0
Choose the more likely correct option:
p(z = 1 \mid X(t)) \ge \tfrac{1}{2}  →  choose “right”
p(z = 1 \mid X(t)) < \tfrac{1}{2}  →  choose “left”
[Figure: belief p(z = 1 | δx_{0:t}) and expected reward over time t — 02lossfn/loss_01.m]
How long to accumulate? Forever?
A cost for accumulating evidence
Cost c for accumulating evidence:
- internal (e.g., effort / attention)
- external (e.g., lost future opportunities)
[Figure: belief and expected reward over time t for costs 0.0, 0.1, 0.2 — 02lossfn/loss_cost.m]
After some time, the marginal increase in choice accuracy no longer justifies the associated cost.
Expected reward:
ER(PC, RT) = PC - c\,RT
with PC = p(correct) and RT = reaction time.
This is the reward function that we will use for the rest of the tutorial.
Maximising reward rate rather than expected reward
Loss of future reward is explicit in reward rate
RR(PC, RT) = \frac{PC - c\,RT}{RT + t_i + (1 - PC)\,t_p}
with inter-trial interval t_i and penalty time t_p (for wrong choices).
[Figure: belief and expected reward rate over time t for t_i = 0.0, 2.0, 4.0 — 02lossfn/loss_rr.m]
Even c = 0 leads to early choices, as a large RT reduces the reward rate through the denominator.
We will come back to this as a possible extension.
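To make the two objectives concrete, a small sketch evaluating both (the numerical values of c, t_i, t_p, PC, and RT are arbitrary assumptions):

% expected reward vs. reward rate
c = 0.1; ti = 2; tp = 0;                 % cost, inter-trial interval, penalty time
ER = @(PC, RT) PC - c * RT;              % expected reward per trial
RR = @(PC, RT) (PC - c * RT) ./ (RT + ti + (1 - PC) * tp);  % reward per unit time
ER(0.9, 1.5)                             % = 0.75
RR(0.9, 1.5)                             % = 0.75 / 3.5, about 0.21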
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Dynamic Programming in a nutshell
We find the optimal speed/accuracy trade-off by Dynamic Programming
Ingredients (Markov Decision Process):
- set of states s ∈ S
- set of actions a ∈ A
- state transition probabilities p(s' \mid s, a)
- reward function r(s, a)
- (discount factor γ = 1)
[Figure: example MDP with states s1–s6 and actions a = 1, 2; rewards r(s1,1) = −1, r(s2,{1,2}) = −1, r(s3,1) = −1, r(s4,1) = −1, r(s5,1) = 10, r(s6,1) = 0]
Aim: find the policy π : S → A that maximises the total reward
V^\pi(s) = \left\langle \sum_{n=1}^{\infty} r(s_n, \pi(s_n)) \right\rangle_{p(s_1, s_2, \ldots \mid s_1 = s, \pi)}
from each state s.
It can be found recursively by solving Bellman's equation:
V(s) = \max_a \left[ r(s, a) + \sum_{s'} V(s')\, p(s' \mid s, a) \right]
The optimal action (i.e., the optimal policy) is the action that maximises the right-hand side.
[Figure: values solving the example MDP — V(s1) = 8, V(s2) = 9, V(s3) = 8, V(s4) = 9, V(s5) = 10, V(s6) = 0]
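The figure's exact transition graph is not fully recoverable from these notes, so here is a generic value-iteration sketch on a made-up deterministic chain (transitions and rewards are hypothetical, not the figure's):

% Bellman backup / value iteration on a toy MDP
nS = 4; nA = 2;
next = [2 3; 3 4; 4 4; 4 4];        % next(s,a): hypothetical successor states
R    = [-1 -1; -1 -1; 10 -1; 0 0];  % R(s,a): hypothetical rewards; s4 absorbing
V = zeros(nS, 1);
for iter = 1:100                    % gamma = 1 is fine: all paths end in s4 (r = 0)
    for s = 1:nS
        q = zeros(1, nA);
        for a = 1:nA
            q(a) = R(s, a) + V(next(s, a));  % right-hand side of Bellman's equation
        end
        V(s) = max(q);
    end
end
disp(V')                            % converged state values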
States and actions in perceptual decision-making
States: all observations so far → sufficient statistic X ∈ (−∞, ∞).
To have a bounded state space, use the belief g ∈ [0, 1] instead:
g(X) \equiv p(z = 1 \mid X) = \frac{1}{1 + e^{-2X/\sigma_\varepsilon^2}}
[Figure: likelihoods and posterior as before, illustrating the mapping from X ∈ (−∞, ∞) to g ∈ [0, 1] — 01evacc/single_obs.m]
Actions, each contributing r(s, a) + \sum_{s'} V(s')\, p(s' \mid s, a):
- choose “right” (z = 1): g \cdot 1 + (1 - g) \cdot 0 = g
- choose “left” (z = −1): (1 - g) \cdot 1 + g \cdot 0 = 1 - g
- accumulate for another δt: -c\,\delta t + \langle V(g') \rangle_{p(g' \mid g)}
Bellman's equation for perceptual decision-making:
V(g) = \max\left[ g,\; 1 - g,\; \langle V(g') \rangle_{p(g' \mid g)} - c\,\delta t \right]
[Figure: value function over belief g, showing where max(g, 1 − g) intersects ⟨V⟩ − c δt — 03knownreliab/plot_valueintersect.m]
optimal decision-making by boundaries on belief g
An example decision
[Figure: example belief trajectory g over time t, terminating at one of the two belief boundaries — 03knownreliab/plot_diffusion_example.m]
Finding the optimal bounds numerically
Find belief transition probability p(g’|g) for accumulating δt more evidence
Discretise the belief and value function into k = 1, …, K, and solve numerically by value iteration (see the sketch after the figure below):
V_{k,n} = \max\left[ g_k,\; 1 - g_k,\; \sum_{j=1}^{K} p(g_j \mid g_k)\, V_{j,n-1} - c\,\delta t \right]
[Figure: belief transition probabilities p(g′ | g), plotted as current belief g vs. belief g′ after another δt, for decreasing δt (0.100, 0.010, 0.001) and increasing task difficulty σε² (0.25, 1, 25) — 03knownreliab/plot_belieftrans.m]
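A compact sketch of the whole procedure (illustrative, not the repo's implementation; grid size, δt, σε², and c are assumptions). Under belief g the momentary evidence is the mixture g N(δx | δt, σε²δt) + (1 − g) N(δx | −δt, σε²δt), which yields p(g′ | g):

% optimal belief bound by value iteration
K = 200; dt = 0.01; sig2 = 1; c = 0.2;        % assumed parameters
g = linspace(0.001, 0.999, K);                % belief grid
X = sig2/2 * log(g ./ (1 - g));               % corresponding sufficient statistic
P = zeros(K, K);                              % P(k,j) approximates p(g_j | g_k)
for k = 1:K
    dx = X - X(k);                            % evidence step reaching each grid point
    lik = g(k) * exp(-(dx - dt).^2 / (2*sig2*dt)) ...
        + (1 - g(k)) * exp(-(dx + dt).^2 / (2*sig2*dt));
    lik = lik .* gradient(X);                 % weight by uneven X spacing of the g grid
    P(k, :) = lik / sum(lik);                 % normalise on the grid
end
V = max(g, 1 - g);                            % initialise with immediate reward
for n = 1:5000                                % iterate to (approximate) convergence
    V = max(max(g, 1 - g), (P * V')' - c * dt);
end
k_bound = find(g > 0.5 & V <= max(g, 1 - g) + 1e-9, 1);
fprintf('optimal bound in belief: g = %.3f\n', g(k_bound));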
Optimal boundary with cost, task difficulty
[Figure: optimal boundary in belief g as a function of the evidence accumulation cost c (03knownreliab/plot_bound_with_cost.m) and as a function of task difficulty σε, easy → hard (03knownreliab/plot_bound_with_sig2.m); legends compare “direct” maximisation with DP]
An alternative to DP for finding the optimal boundaries: maximise
ER(PC, RT) = PC - c\,RT
using the known expressions for diffusion models with bound ±θ on X,
PC = \frac{1}{1 + e^{-2\theta/\sigma_\varepsilon^2}}, \qquad RT = \theta \tanh\!\left(\frac{\theta}{\sigma_\varepsilon^2}\right)
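This direct maximisation takes a few lines (illustrative sketch; parameter values are assumptions):

% optimal bound by direct ER maximisation
sig2 = 1; c = 0.2;
PC = @(th) 1 ./ (1 + exp(-2 * th / sig2));  % accuracy with bound +/- theta on X
RT = @(th) th .* tanh(th / sig2);           % mean decision time
negER = @(th) -(PC(th) - c * RT(th));       % maximise ER by minimising -ER
theta = fminbnd(negER, 0, 10);
fprintf('theta = %.3f, i.e. bound in belief g = %.3f\n', theta, PC(theta));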
Optimal decision-making with diffusion models
Boundaries on belief g correspond to boundaries on X (a diffusion model).
[Figure: the mapping between X ∈ (−∞, ∞) and g ∈ [0, 1] (01evacc/single_obs.m), and example trajectories of the sufficient statistic X and belief g over time t with constant bounds in both spaces — 03knownreliab/plot_diffusion_example.m]
Can perform optimal decision-making without ever computing belief g
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Evidence accumulation with reliability changing across trials
[Figure: likelihoods and posterior as before; the per-trial evidence mean defines the trial difficulty for fixed σε = 1 — 01evacc/single_obs.m]
Hidden state: z ∈ {−1, 1}
Trial difficulty: α ∈ [0, ∞) (small = hard)
Per-trial evidence mean μ = αz, drawn for each trial from
p(\mu) = \mathcal{N}(\mu \mid 0, \sigma_\alpha^2)
with overall task difficulty σα (small = hard); hard trials are more likely, easy trials less likely.
With this evidence, accumulation proceeds by δx_n \sim \mathcal{N}(\mu\,\delta t, \delta t).
The posterior over μ (prior × i.i.d. likelihoods) is
p(\mu \mid \delta x_{0:t}) \propto p(\mu) \prod_{n=1}^{t/\delta t} p(\delta x_n \mid \mu) \propto \mathcal{N}\!\left(\mu \,\middle|\, \frac{X(t)}{\sigma_\alpha^{-2} + t},\; \frac{1}{\sigma_\alpha^{-2} + t}\right)
such that the belief becomes
g(X, t) \equiv p(z = 1 \mid \delta x_{0:t}) = p(\mu \ge 0 \mid X(t), t) = \Phi\!\left(\frac{X(t)}{\sqrt{\sigma_\alpha^{-2} + t}}\right)
and the sufficient statistics are now both X(t) and t.
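A short simulation of this setting (illustrative; Φ is written via erfc to stay in base MATLAB, and σα², δt, T are assumptions):

% belief with unknown evidence reliability
sa2 = 1; dt = 0.01; T = 2;                          % assumed parameters
t = dt:dt:T;
mu = sqrt(sa2) * randn;                             % per-trial evidence mean mu = alpha*z
X = cumsum(mu * dt + sqrt(dt) * randn(size(t)));    % accumulated evidence
Phi = @(x) 0.5 * erfc(-x / sqrt(2));                % standard normal CDF
g = Phi(X ./ sqrt(1/sa2 + t));                      % belief g(X, t)
subplot(2,1,1); plot(t, X); ylabel('X(t)');
subplot(2,1,2); plot(t, g); ylabel('g(X,t)'); xlabel('time t');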
The resulting optimal policy
Bellman's equation over belief g and time t:
V(g, t) = \max\left[ g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c\,\delta t \right]
[Figure: example belief trajectory, and the value function over belief g at t = 0.00, 0.99, 1.98, showing max(g, 1 − g) against ⟨V⟩ − c δt — 04knownreliab/plot_valuefn.m]
The future expected value takes this time-dependence into account.
Boundaries in belief g for each fixed t → time-dependent boundaries.
Optimal decision-making with diffusion models
Using the time-dependent mapping between belief g and the diffusing particle X,
g(X, t) = \Phi\!\left(\frac{X}{\sqrt{\sigma_\alpha^{-2} + t}}\right), \qquad X(g, t) = \sqrt{\sigma_\alpha^{-2} + t}\;\Phi^{-1}(g)
we can map a bound in belief g to a bound on the diffusing particle X.
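A minimal sketch of this mapping (the belief bound g_b(t) below is a made-up placeholder; the real one comes out of the dynamic programme):

% map a time-dependent belief bound to a bound on X
sa2 = 1; t = 0:0.01:3;                    % assumed parameters
gb = 0.5 + 0.4 * exp(-t / 2);             % hypothetical decaying bound in belief
PhiInv = @(p) -sqrt(2) * erfcinv(2 * p);  % inverse standard normal CDF
Xb = sqrt(1/sa2 + t) .* PhiInv(gb);       % corresponding (collapsing) bound on X
plot(t, Xb, t, -Xb); xlabel('time t'); ylabel('bound on X');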
[Figure: example trajectories of the sufficient statistic X and belief g over time t, with time-dependent bounds — 04knownreliab/plot_diffusion_example.m]
As before, optimal decision-making without ever directly computing the belief.
This naturally leads to slower errors than correct choices.
Urgency signal
“urgency signal” as a collapsing bound?
Road map
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions
Maximising reward rate instead of expected reward
Maximising reward rate = maximising reward over an infinite sequence of structurally equal trials.
Problem: the value of the initial choice would be infinite (infinitely many, possibly rewarding, choices follow).
Solution: use the average-adjusted value, discounting each time step δt by −ρδt, where ρ is the reward rate (reward per unit time). This value is relative to the reward that can be achieved on average.
For example, Bellman's equation for known evidence reliability becomes
V(g) = \max\left[ g - \left(t_i + (1 - g)\,t_p\right)\rho + V(\tfrac{1}{2}),\;\; 1 - g - \left(t_i + g\,t_p\right)\rho + V(\tfrac{1}{2}),\;\; \langle V(g') \rangle_{p(g' \mid g)} - (\rho + c)\,\delta t \right]
where t_i + (1 − g) t_p is the expected time from the decision to the onset of the next trial.
The policy is invariant to shifts in the value function, so we can choose V(1/2) = 0, and find both the value function and the reward rate ρ using this consistency criterion.
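In practice one can root-find ρ so that the converged V(1/2) is exactly zero; a sketch reusing the belief transitions from before (all parameters, the bisection bracket, and the iteration counts are assumptions):

% reward-rate optimal policy: bisection on rho with consistency V(1/2) = 0
dt = 0.01; sig2 = 1; c = 0; ti = 2; tp = 0; K = 200;   % assumed parameters
g = linspace(0.001, 0.999, K); X = sig2/2 * log(g ./ (1 - g));
P = zeros(K, K);                                       % belief transitions, as before
for k = 1:K
    dx = X - X(k);
    lik = (g(k) * exp(-(dx - dt).^2 / (2*sig2*dt)) ...
        + (1 - g(k)) * exp(-(dx + dt).^2 / (2*sig2*dt))) .* gradient(X);
    P(k, :) = lik / sum(lik);
end
lo = 0.01; hi = 0.45;                                  % assumed bracket on rho
for b = 1:20                                           % bisection on rho
    rho = (lo + hi) / 2;
    V = zeros(1, K);
    for n = 1:2000                                     % average-adjusted value iteration
        Vdec = max(g - (ti + (1-g)*tp)*rho, 1 - g - (ti + g*tp)*rho); % V(1/2) = 0
        V = max(Vdec, (P * V')' - (rho + c)*dt);
    end
    if interp1(g, V, 0.5) > 0, lo = rho; else, hi = rho; end
end
fprintf('optimal reward rate rho = %.4f\n', rho);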
Generalisations to evidence model
A time-varying accumulation cost
Accumulation cost might rise/drop over time
V(g, t) = \max\left[ g,\; 1 - g,\; \langle V(g', t + \delta t) \rangle_{p(g' \mid g, t)} - c(t)\,\delta t \right]
Supported by human/animal behaviour (Drugowitsch et al., 2012).
Evidence reliability that varies within individual trials
In real-world decisions, the evidence reliability is practically never constant.
This requires simultaneous estimation of the hidden state and the momentary evidence reliability; the sufficient statistics become at least two-dimensional.
It leads to a reliability-dependent bound on the decision-maker's belief; see Drugowitsch, Moreno-Bote & Pouget (NIPS, 2014).
That paper also introduces a faster way to find the expected future reward, using PDE solvers.
Summary
Notes, derivations, and figure code (MATLAB) are available at
https://github.com/jdrugo/FENS2015
Normative evidence accumulation
What do we want to maximise?
Speed/accuracy trade-off for known reliability of evidence
Speed/accuracy trade-off for unknown reliability of evidence
Extensions