Institute for Theoretical Physics and Mathematics, Tehran
January 2006
Value based decision making: behavior and theory
Greg Corrado
Leo Sugrue
SENSORY INPUT
DECISION MECHANISMS
ADAPTIVE BEHAVIOR
low level sensory analyzers
motor output structures
SENSORY INPUT
DECISION MECHANISMS
ADAPTIVE BEHAVIOR
low level sensory analyzers
motor output structures
REWARD HISTORY
representation of stimulus/action value
How do we measure value?
Herrnstein RJ, 1961
The Matching Law
[Figure: choice fraction plotted against reward fraction]
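Herrnstein's relation can be stated with a toy calculation: the fraction of responses allocated to an option equals the fraction of rewards earned from it. A minimal sketch, with all counts invented for illustration:

```python
# Herrnstein's matching law: choice fraction ~ reward fraction.
# Hypothetical response/reward counts for two options.
responses = {"A": 720, "B": 240}
rewards = {"A": 90, "B": 30}

choice_fraction_A = responses["A"] / (responses["A"] + responses["B"])
reward_fraction_A = rewards["A"] / (rewards["A"] + rewards["B"])

print(choice_fraction_A, reward_fraction_A)  # → 0.75 0.75: behavior "matches"
```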
Behavior: What computation does the monkey use to ‘match’?
Theory: Can we build a model that replicates the monkeys' behavior on the matching task? How can we validate the model's performance? Why is a model useful?
Physiology: What are the neural circuits and signal transformations within the brain
that implement the computation?
An eye movement matching task
[Figure: baiting fraction over time, with block ratios 1:1, 6:1, 1:6, 6:1, 1:2, 2:1, 1:2]
Dynamic Matching Behavior
Rewards
Dynamic Matching Behavior
Responses and Rewards
Dynamic Matching Behavior
Relation Between Reward and Choice is Local
Responses and Rewards
How do they do this?
What local mechanism underlies the monkey’s choices in this game?
To estimate this mechanism we need a modeling framework.
Linear-Nonlinear-Poisson (LNP) Models
of choice behavior
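The LNP idea can be sketched in code: a linear stage filters the recent reward history of each target (here an illustrative sum of two exponentials; the time constants, weights, and sigma below are placeholders, not fits to the monkey data), a nonlinear stage maps the resulting differential value through a sigmoid to a choice probability, and a stochastic stage emits the choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear stage: weight past rewards with a sum of two exponentials
# (tau1, tau2, and the 0.5/0.5 mixture are illustrative, not fitted).
tau1, tau2 = 2.0, 20.0
trials_back = np.arange(1, 101)
kernel = 0.5 * np.exp(-trials_back / tau1) + 0.5 * np.exp(-trials_back / tau2)
kernel /= kernel.sum()

def lnp_choice_prob(rewards_red, rewards_green, sigma=0.3):
    """Differential value -> probability of choosing red via a sigmoid."""
    n = min(len(rewards_red), len(kernel))
    v_red = np.dot(kernel[:n], rewards_red[-n:][::-1])     # most recent first
    v_green = np.dot(kernel[:n], rewards_green[-n:][::-1])
    dv = v_red - v_green                                   # differential value
    return 1.0 / (1.0 + np.exp(-dv / sigma))               # nonlinear stage

# Stochastic stage: emit a choice from the model's probability.
p_red = lnp_choice_prob(np.array([1, 0, 1, 1]), np.array([0, 0, 1, 0]))
choice_is_red = rng.random() < p_red
```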
Strategy estimation is straightforward
How do animals weigh past rewards in determining current choice?
Estimating the form of the linear stage
How is differential value mapped onto the animal’s instantaneous probability of
choice?
Estimating the form of the nonlinear stage
[Figure: probability of choice (red) as a function of differential value (rewards), for Monkey F and Monkey G]
Our LNP Model of Choice Behavior
Model Validation
• Can the model predict the monkey's next choice?
• Can the model generate behavior on its own?
Can the model predict the monkey’s next choice?
Predicting the next choice: single experiment
Predicting the next choice: all experiments
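One way to score next-choice prediction, assuming the model outputs a per-trial probability of choosing one target: predict whichever option the model deems more likely and compare against the recorded choice, and also compute the log-likelihood of the full choice sequence (a stricter, graded criterion). All numbers below are invented:

```python
import numpy as np

# Hypothetical model probabilities of choosing "red" and actual choices (1 = red).
p_red = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.7])
choices = np.array([1, 1, 0, 1, 0, 0])

# Hard prediction: take the more likely option on each trial.
predicted = (p_red > 0.5).astype(int)
accuracy = (predicted == choices).mean()

# Log-likelihood of the observed choices under the model.
log_lik = np.sum(choices * np.log(p_red) + (1 - choices) * np.log(1 - p_red))
```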
Can the model generate behavior on its own?
Model generated behavior: single experiment
Distribution of stay durations summarizes behavior across all experiments
Stay Duration (trials)
Model generated behavior: all experiments
Stay Duration (trials)
Model generated behavior: all experiments
Stay Duration (trials)
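The stay-duration summary can be computed mechanically: extract run lengths (consecutive choices of the same target) from a choice sequence, then measure the percent overlap of two normalized histograms, one from the monkey and one from model-generated behavior. A sketch with toy sequences:

```python
import numpy as np

def stay_durations(choices):
    """Lengths of runs of consecutive identical choices."""
    durations, run = [], 1
    for prev, cur in zip(choices, choices[1:]):
        if cur == prev:
            run += 1
        else:
            durations.append(run)
            run = 1
    durations.append(run)
    return durations

def histogram_overlap(a, b, bins):
    """Percent overlap of two normalized stay-duration histograms."""
    ha, _ = np.histogram(a, bins=bins)
    hb, _ = np.histogram(b, bins=bins)
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return 100.0 * np.minimum(ha, hb).sum()

monkey = [1, 1, 1, 0, 0, 1, 1, 1, 1, 0]   # toy data, not real behavior
model = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
overlap = histogram_overlap(stay_durations(monkey), stay_durations(model),
                            bins=np.arange(0.5, 10.5))
```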
Ok, now that you have a reasonable model, what can you do with it?
1. Explore second order behavioral questions
2. Explore neural correlates of valuation
Choice of Model Input
choice history: 0000111110011
reward history: 0000010100001

Surely 'not getting a reward' also has some influence on the monkey's behavior?

Choice of Model Input
choice history: 0000111110011
reward history: 0000010100001
hybrid history: 0000010100001 (unrewarded choices carry the value of an unrewarded choice)
• Systematically vary the value of an unrewarded choice
• Estimate new L and N stages for the model
• Test each new model's ability to a) predict choice and b) generate behavior
hybrid history: 0000010100001
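The hybrid-history construction can be made concrete. A hypothetical parameter (called `kappa` below; the talk's own symbol was lost in extraction) sets the value carried by a chosen-but-unrewarded trial, with `kappa = 0` recovering the pure reward history:

```python
def hybrid_history(choice_history, reward_history, kappa):
    """Assign value 1 to rewarded choices and kappa to unrewarded ones.

    kappa = 0 recovers the pure reward history; kappa = 1 would treat
    every choice as if it had been rewarded.
    """
    return [r if r else (kappa if c else 0.0)
            for c, r in zip(choice_history, reward_history)]

# The slide's example histories for one target (1 = chosen / rewarded).
choices = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1]
rewards = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
hybrid = hybrid_history(choices, rewards, kappa=0.5)
```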
Can we build a better model by taking unrewarded choices into account?
[Figure: predictive and generative performance as a function of the value assigned to unrewarded choices]
Unrewarded choices: The value of nothin’
[Figure: predictive performance and generative performance (stay duration histogram overlap, %) as a function of the value assigned to unrewarded choices]
Unrewarded choices: The value of nothin’
Contrary to our intuition, including information about unrewarded choices does not improve model performance
Choice of Model Input
Optimality of Parameters
Weighting of past rewards
Is there an ‘optimal’ weighting function to maximize the rewards a player can harvest in this game?
• The tuning of the τ2 (long) component of the L-stage affects foraging efficiency. Monkeys have found this optimum.
Weighting of past rewards
• The τ1 (short) component of the L-stage does not affect foraging efficiency. Why do monkeys overweight recent rewards?
• The tuning of σ, the nonlinear function relating value to p(choice), affects foraging efficiency. The monkeys have found this optimum also.
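The foraging-efficiency question can be explored by simulation: bait two targets with rewards that persist until collected (the feature that makes matching pay off), let an LNP-style forager with a single exponential memory (its time constant written `tau2` here) play many trials, and measure the harvest rate. Everything below, baiting probabilities, sigma, kernel length, is an illustrative assumption, not the talk's task parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def foraging_efficiency(tau2, p_bait=(0.3, 0.05), n_trials=2000, sigma=0.2):
    """Rewards harvested per trial by a sigmoid-on-differential-value forager.

    Rewards are baited independently at each target and persist until the
    target is next chosen.
    """
    kernel_len = 200
    t = np.arange(1, kernel_len + 1)
    kernel = np.exp(-t / tau2)
    kernel /= kernel.sum()

    baited = [False, False]
    history = [np.zeros(kernel_len), np.zeros(kernel_len)]  # most recent first
    harvested = 0
    for _ in range(n_trials):
        for i in (0, 1):
            baited[i] = baited[i] or (rng.random() < p_bait[i])
        v = [np.dot(kernel, h) for h in history]            # leaky reward rates
        p_choose_0 = 1.0 / (1.0 + np.exp(-(v[0] - v[1]) / sigma))
        choice = 0 if rng.random() < p_choose_0 else 1
        reward = baited[choice]
        baited[choice] = False
        harvested += reward
        for i in (0, 1):
            history[i] = np.roll(history[i], 1)
            history[i][0] = 1.0 if (i == choice and reward) else 0.0
    return harvested / n_trials

# Sweep tau2 to trace out the efficiency curve; two example points:
eff_short = foraging_efficiency(tau2=1.0)
eff_mid = foraging_efficiency(tau2=15.0)
```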
The differential model is a better predictor of monkey choice
• Monkeys match; best LNP model
• Model predicts and generates choices
• Monkeys find optimal τ2 and σ; τ1 not critical
• Unrewarded choices have no effect
• Differential value predicts choices better than fractional value
Best LNP model:
Candidate decision variable, differential value:
pc = g(v1 - v2)
Aside: what would Bayes do?
1) Maintain beliefs over baiting probabilities
2) Be greedy or use dynamic programming
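Step 1 of the Bayesian aside, maintaining beliefs over a baiting probability, can be sketched with a discrete grid posterior; the greedy/dynamic-programming step is omitted here:

```python
import numpy as np

# Grid of candidate baiting probabilities and a uniform prior.
grid = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(grid) / grid.size

def update(belief, rewarded):
    """Bayes' rule for one observation of the chosen target."""
    likelihood = grid if rewarded else (1.0 - grid)
    posterior = belief * likelihood
    return posterior / posterior.sum()

belief = prior
for outcome in [1, 0, 1, 1]:          # three rewards in four samples
    belief = update(belief, outcome)

posterior_mean = np.dot(grid, belief)  # ≈ Beta(4, 2) mean = 2/3
```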
Firing rates in LIP are related to target value on a trial-by-trial basis
LIP
http://brainmap.wustl.edu/vanessen.html
[Figure: firing rate of LIP cell gm020b for saccades into vs. out of the response field, sorted by target value]
The differential model also accounts for more variance in LIP firing rates
What I've told you:
• How we control/measure value: the matching law
• An experimental task based on that principle: a dynamic foraging task
• A simple model of value based choice: our Linear-Nonlinear-Poisson model
• How we validate that model: predictive and generative validation
• How we use the model to explore behavior: hybrid models, optimality of reward weights
• How we use the model to explore value related signals in the brain: neural firing in area LIP correlates with 'differential value' on a trial-by-trial basis
Foraging Efficiency Varies as a Function of τ2
Foraging Efficiency Does Not Vary as a Function of τ1
What do animals do?
Matching is a probabilistic policy:
p_choose(1) = f(p_bait(1), p_bait(2))
Matching is almost optimal within the set of probabilistic policies.
Animals match.
+ the changeover delay
Greg Corrado
How do we implement the changeover delay?
only one ‘live’ target at a time
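One common changeover-delay rule, withholding reward on the first trial after a switch, can be sketched as below; the talk's own implementation ("only one 'live' target at a time") may differ in detail:

```python
def apply_changeover_delay(choices, baited):
    """Withhold reward on any trial where the animal just switched targets.

    A minimal sketch of one common changeover-delay rule: a switch trial
    is never rewarded, so rapid alternation cannot harvest both targets.
    """
    harvested = []
    for i, (c, b) in enumerate(zip(choices, baited)):
        switched = i > 0 and c != choices[i - 1]
        harvested.append(b and not switched)
    return harvested

choices = [0, 0, 1, 1, 0]
baited = [True, True, True, False, True]
outcomes = apply_changeover_delay(choices, baited)
# The first choice of a new target goes unrewarded even if bait is available.
```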