action evaluation in appetitive and aversive learning

nathaniel daw

princeton university

leuven, 2019

learning for decisions

• how you compute has important consequences for what you choose
• eg: which of many possible outcomes you consider
• better worked out in appetitive domain but likely extend to avoidance

algorithms for computing expected utility over candidate actions

1. habits vs deliberation: model-based vs. model-free RL

2. psychiatry: disorders involving compulsivity and avoidance

3. stress and opportunity cost

sequential decision tasks

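\[ Q(s_t, a_t) = r(s_t) + \sum_{s_{t+1}} P(s_{t+1} \mid s_t, a_t) \left[ r(s_{t+1}) + \sum_{s_{t+2}} \cdots \right] \]

[Figure: decision tree; outcome labels $0, $25; terminal states C, D, E, F]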

Markov decision process: consequences of actions are delayed, contingent
• connect actions to consequences over space and time
• hard to estimate; hard to learn; “temporal credit assignment”
• maximizing utility unites both seeking reward and avoiding punishment

“model-based” learning

[Figure: decision tree over successor states s_{t+1}, s_{t+2}; outcome labels $0, $25; terminal states C, D, E, F]

• learn one-step reward & transition “map”
• iterative, tree-structured computation
• hippocampal “preplay”? (Mattar & Daw 2018)

(Pfeiffer and Foster, 2013)
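The iterative, tree-structured computation can be sketched as a recursion over a learned one-step map. This is a minimal illustration, not the model from the talk: the state names, rewards and transition probabilities are hypothetical, and taking the best action at each successor state is an assumption that the slide's equation leaves implicit.

```python
# Hypothetical one-step "map": rewards and transition probabilities.
# State names and all numbers are illustrative, not taken from the slides.
REWARD = {"A": 0.0, "B": 0.0, "G": 0.0, "C": 0.0, "D": 25.0, "E": 10.0, "F": 5.0}
TRANSITIONS = {
    "A": {"left": [(0.7, "B"), (0.3, "G")], "right": [(0.3, "B"), (0.7, "G")]},
    "B": {"left": [(1.0, "C")], "right": [(1.0, "D")]},
    "G": {"left": [(1.0, "E")], "right": [(1.0, "F")]},
}


def state_value(state):
    """Value of a state: its own reward if terminal, else its best action value."""
    if state not in TRANSITIONS:
        return REWARD[state]
    return max(q_value(state, action) for action in TRANSITIONS[state])


def q_value(state, action):
    """Q(s, a) = r(s) + sum over s' of P(s' | s, a) * V(s'), by recursion down the tree."""
    expected_future = sum(p * state_value(nxt) for p, nxt in TRANSITIONS[state][action])
    return REWARD[state] + expected_future


q_left = q_value("A", "left")
q_right = q_value("A", "right")
print(q_left, q_right)
```

Because the values are recomputed from the map at choice time, editing an entry of `REWARD` or `TRANSITIONS` changes the resulting choice immediately, without any relearning.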

“model-free” learning

[Figure: decision tree over successor states s_{t+1}, s_{t+2}]

shortcut: store endpoints of computation (long-run action values)
• these can be learned directly, “model free” (TD learning)
• simplifies choice-time computation (just retrieve) – but may not reflect all available information
• standard theory of dopamine, reward prediction errors etc

(Schultz et al 1997)
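The shortcut can be sketched as a temporal-difference update on a table of cached action values. The version below is a Q-learning-style update chosen for brevity; the learning rate, discount factor, and state and action names are illustrative assumptions rather than anything specified in the talk.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative)
GAMMA = 1.0   # no discounting in this toy example (an assumption)


def td_update(q, state, action, reward, next_state, next_actions):
    """One temporal-difference update of a cached action value.

    Returns the reward prediction error that drove the update.
    """
    # Best cached value available at the next state (0 if the episode ended).
    next_value = max((q[(next_state, a)] for a in next_actions), default=0.0)
    delta = reward + GAMMA * next_value - q[(state, action)]
    q[(state, action)] += ALPHA * delta
    return delta


q = defaultdict(float)
# Repeatedly experience: at state "B", choosing "right" yields 25 and ends the episode.
for _ in range(200):
    td_update(q, "B", "right", 25.0, None, [])

# At choice time the agent only retrieves the cached value.
cached = q[("B", "right")]
```

The cached value converges on the experienced reward, but it only changes through further experience of that state and action, which is why it may not reflect all available information.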

model-based and model-free learning

• these ideas propose to formalize rodent work on the goal-directed vs habitual distinction in instrumental behavior (Daw et al. 2005)

• these are most often studied in reward domain (e.g. via reward devaluation)

• but to the extent known, largely paralleled in avoidance (LeDoux & Daw 2018)

• lots to do (e.g.: Cain; Cano this meeting)

learned decision making in humans


“bandit” tasks

e.g. Daw et al 2006

Wittmann et al 2008

Gershman et al 2009

Schonberg et al 2010

Glascher et al 2010

Wimmer et al 2012

Seymour et al 2012

Kovach et al 2012

Madlon-Kay et al 2013

behavioral analysis: characterize the function relating outcomes to future choices (trial by trial learning model)

multinomial logistic regression: outcomes → choices

decay form characteristic of error-driven learning

[Figure: regression weights as a function of lag (trials); panels: reward, shock, choice]

(Seymour et al. J Neuro 2012)
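Why a decaying form is characteristic of error-driven learning: unrolling a delta rule shows that the outcome k trials back carries weight alpha * (1 - alpha)^(k - 1). The sketch below checks this numerically; the learning rate and the outcome sequence are made up for illustration.

```python
ALPHA = 0.3  # learning rate (illustrative)


def delta_rule(outcomes, alpha=ALPHA, v0=0.0):
    """Run V <- V + alpha * (outcome - V) over a sequence and return the final V."""
    v = v0
    for outcome in outcomes:
        v += alpha * (outcome - v)
    return v


def lag_weights(n_lags, alpha=ALPHA):
    """Weight on the outcome k trials back: alpha * (1 - alpha) ** (k - 1)."""
    return [alpha * (1 - alpha) ** (k - 1) for k in range(1, n_lags + 1)]


outcomes = [1, 0, 0, 1, 1, 0, 1]        # made-up outcome history, oldest first
weights = lag_weights(len(outcomes))    # lag 1 = most recent outcome
weighted_sum = sum(w * o for w, o in zip(weights, reversed(outcomes)))

# The two computations agree (up to rounding) when the starting value is 0.
assert abs(delta_rule(outcomes) - weighted_sum) < 1e-12
```

A lagged regression of choices on past outcomes recovers weights of this shape if choices track such a value, so the fitted weights should fall off geometrically with lag.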

sequential decision task

with prob: 26% 57% 41% 28% (all slowly changing)

(Daw et al Neuron 2011)

extend experiment to probe map learning:

is choice guided by anticipated states or previous actions?

How does bottom-stage feedback affect top-stage choices?

Example: rare transition at top level, followed by win

• Which top-stage action is now favored?

predictions

model-free: ignores transition structure

model-based: respects transition structure

individual subs x 201 trials each

(Daw et al Neuron 2011)
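The two predictions can be worked through for the example trial (rare transition at the top level, followed by a win). This is a simplified sketch, not the fitted model from the paper: the 0.7 common-transition probability, the learning rate, the starting values, and the names a1, a2, X, Y are all assumptions made for illustration.

```python
ALPHA = 0.5      # learning rate (illustrative)
P_COMMON = 0.7   # assumed common-transition probability; not stated on the slide

# Assumed transition structure: a1 usually leads to X, a2 usually leads to Y.
TRANSITIONS = {
    "a1": {"X": P_COMMON, "Y": 1 - P_COMMON},
    "a2": {"X": 1 - P_COMMON, "Y": P_COMMON},
}


def model_free_update(q_top, action, reward):
    """Credit the reward directly to the top-stage action that was taken."""
    q = dict(q_top)
    q[action] += ALPHA * (reward - q[action])
    return q


def model_based_values(v_bottom):
    """Evaluate each top-stage action by the bottom-stage values it tends to reach."""
    return {a: sum(p * v_bottom[s] for s, p in probs.items())
            for a, probs in TRANSITIONS.items()}


# Example trial: choose a1, rare transition to Y, then a win (reward 1).
q_top = {"a1": 0.5, "a2": 0.5}
v_bottom = {"X": 0.5, "Y": 0.5}
v_bottom["Y"] += ALPHA * (1.0 - v_bottom["Y"])   # bottom-stage value learns the win

mf = model_free_update(q_top, "a1", 1.0)
mb = model_based_values(v_bottom)
mf_favors = max(mf, key=mf.get)   # "a1": repeat the rewarded action
mb_favors = max(mb, key=mb.get)   # "a2": the action that usually reaches Y
```

The model-free learner stays with a1 because that action was followed by reward; the model-based learner switches to a2 because the rewarded bottom-stage state is more reliably reached from there.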

17 subs x 201 trials each

reward: p<1e-8; reward x rare: p<5e-5 (mixed effects logit)

results reject pure reinforcement models and suggest a mixture of planning and reinforcement processes
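The reward and reward x rare effects come from a mixed-effects logit on whether the top-stage choice is repeated. A simpler descriptive analog is to tabulate stay proportions by the previous trial's reward and transition type, sketched below. The trial format and the data are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict


def stay_probabilities(trials):
    """Proportion of repeated top-stage choices, split by the previous trial's
    reward and transition type. Each trial is a (choice, rare, reward) tuple."""
    counts = defaultdict(lambda: [0, 0])  # (rewarded, rare) -> [stays, total]
    for prev, nxt in zip(trials, trials[1:]):
        choice, rare, reward = prev
        key = (bool(reward), bool(rare))
        counts[key][0] += int(nxt[0] == choice)
        counts[key][1] += 1
    return {key: stays / total for key, (stays, total) in counts.items()}


# Made-up data for illustration only.
trials = [
    ("a1", False, 1), ("a1", True, 1), ("a2", False, 0),
    ("a1", True, 0), ("a1", False, 1), ("a1", False, 0),
]
probs = stay_probabilities(trials)
```

A purely model-free learner would show a main effect of reward only (stay after wins regardless of transition), while a model-based learner would show the reward x rare interaction (stay after common wins and rare losses).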


MB and MF with shock

in prep w/ Neil Garrett, Marijn Kroes, Liz Phelps

shock: p<5e-7; shock x rare: p<.01

interference (Otto et al Psych Science, 2013)

single task vs dual task; dual x model-based: p < .05

stress (Otto et al PNAS, 2013)

[Figure: axis label: log cortisol delta (Z score)]

Also:

Individual differences
• Development (Decker ea, 2016)
• Aging (Eppinger ea 2013)
• IQ (Schad ea 2014; Gillan ea 2016)
• cognitive control (Otto ea 2015)
• Psychopathology (more later…)

Dopamine & PFC
• PFC TMS (Smittenaar ea 2013)
• COMT (PFC DA) genotype (Doll ea 2016)
• Parkinson’s disease & DA meds (Sharp ea 2016; Wunderlich ea 2012)
• dopamine PET (Deserno ea 2015)

what are the neural mechanisms underlying MB evaluation?

Is model-based learning really decision by simulation?

decodable stimuli

(Doll, Duncan, Simon, Shohamy & Daw Nature Neuroscience 2015)

RPE (ventral putamen)

behavior MB MF


prospection (category selective ctx)

Signatures of two dissociable neural evaluation mechanisms

1. forward search
2. error-driven updating

which have the expected relationships to choice behavior

Is model-based learning related to disorders of compulsion?

Binge eating disorder, n=30

Healthy volunteers, n=106

OCD, n=35

Stimulant abusers, n=36

(Voon et al., Biological Psychiatry, 2014)

Methamphetamine/cocaine; abstinent at least 1 wk

3 questions

1) what to make of inflexible goal-pursuit (like anorexia nervosa)?

2) are decision making effects actually acute due to illness?

3) are patients MB for object of compulsion?

what causes MB/MF imbalance in AN?

idea: food restriction behaviors are like avoidance habits
• 2-factor theory: avoidance habits can only be reinforced if safety is reframed as goal

• suggestion: AN are particularly prone to such reframing

• preliminary evidence from Palminteri et al. (2015) reframing task

prelim w/ Karin Foerde, Daphna Shohamy, Joanna Steinglass

train:

probe:

objectively better; worse in training frame

objectively worse; better in training frame

anxiety: a puzzle and a model
• anxiety disorders are characterized by persistent and overgeneralized fear and avoidance
• why should this be, given that avoidance is protective?
• in models, approach propagates opportunity and avoidance contains danger: due to assumption you will avoid in future

in prep w/ Sam Zorowitz

suggestion
• in general, sequential evaluation requires assumptions about future events
• suggestion: a core dysfunction in anxiety is pessimistic expectations about future choices

consequences

idea ties together many disparate aspects of anxiety
• overgeneralization of avoidance
• control & self-efficacy
• transition to depression
• unbalanced approach-avoid conflict (eg BART)

prediction
• attenuated (or reversed) free-choice bias (eg Leotti & Delgado 2011)

• psychopathology may reflect dysfunction of underlying evaluation and choice mechanisms

• compulsion & MB/MF imbalance

• anorexia and anxiety potentially reflecting more unique aspects of avoidance

stress and opportunity cost

Why does stress favor habits?

How can we reason formally about the range of effects of the stress response?

• so far: transient, action- or stimulus-linked evaluations
• also: more global evaluations
  • stress, mood, schemas, tonic neuromodulators
  • average reward, controllability, priors

[Figure: axis label: log cortisol delta (Z score)]

(Otto et al PNAS, 2013)

opportunity cost of inaction

• deliberation can improve rewards (better choices)

• but takes time (delaying rewards, failing to avoid punishments)

• in appetitive circumstances, the opportunity cost of inaction is proportional to the average reward of the environment (Niv et al., 2007)
• deliberation should be modulated by long-run average reward in the environment (Keramati et al., 2011)
• also the average opportunity to avoid (Cools et al. 2011; Dayan 2012)
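The trade-off can be written as a one-line decision rule: deliberate only when the expected improvement in outcome exceeds the reward forgone while thinking. This is a hedged sketch of the general logic, not the specific model of any cited paper; the function name, its arguments and the numbers are hypothetical.

```python
def should_deliberate(expected_gain, avg_reward_rate, deliberation_time):
    """Deliberate only if the expected improvement in outcome exceeds the
    reward forgone while thinking (average reward rate x deliberation time)."""
    opportunity_cost = avg_reward_rate * deliberation_time
    return expected_gain > opportunity_cost


# Same decision problem in a poor and a rich environment (illustrative numbers).
poor = should_deliberate(expected_gain=2.0, avg_reward_rate=0.5, deliberation_time=2.0)
rich = should_deliberate(expected_gain=2.0, avg_reward_rate=3.0, deliberation_time=2.0)
```

With these numbers the same deliberation is worthwhile in the poor environment and not in the rich one, since time is more expensive when the average reward rate is high.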

Same basic logic plays out across:

• Decisions
  • vigor (Niv et al 2007)
  • foraging (Charnov 1977)
  • speed-accuracy tradeoffs (Otto & Daw 2018)
  • time discounting (Kacelnik)

• Meta-decisions / control (Boureau et al. 2015)
  • deliberation (Keramati et al. 2011)
  • action chunking (Dezfouli & Balleine 2012)
  • cognitive effort, ego depletion (Kurzban; Shenhav)
  • thresholds for signal detection / DDMs (Gold & Shadlen 2003)
  • explore/exploit

… the average reward as a ubiquitous decision variable

Charnov (1976); Stephens & Krebs (1986)

• serially visit reward patches
• choose to harvest or exit
• harvesting earns diminishing rewards
• exiting leads to a new patch (takes time; no going

principle of lost opportunity
• balance between reward and opportunity cost of harvesting
• many problems can be expressed in this stay-switch form

patch foraging

marginal value theorem

exit when: next reward < average reward per harvest (opportunity cost of foraging)

Charnov (1976); Stephens & Krebs (1986)
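The exit rule can be checked against a brute-force search in a deterministic toy patch. The sketch assumes each harvest takes a fixed time and multiplies the next reward by a constant depletion rate; the initial reward (10) and harvest time (3 s) are hypothetical, while the travel times and depletion rates used in the example calls are the values named in the prediction slides of the foraging experiment.

```python
def reward_rate(n_harvests, r0, depletion, harvest_time, travel_time):
    """Long-run reward per second when leaving every patch after n harvests."""
    total = sum(r0 * depletion ** k for k in range(n_harvests))
    return total / (travel_time + n_harvests * harvest_time)


def best_exit(r0, depletion, harvest_time, travel_time, max_harvests=200):
    """Brute-force search for the number of harvests that maximizes reward rate."""
    return max(range(1, max_harvests + 1),
               key=lambda n: reward_rate(n, r0, depletion, harvest_time, travel_time))


def mvt_exit(r0, depletion, harvest_time, travel_time):
    """Harvest while the next reward beats the average reward per harvest period."""
    n_star = best_exit(r0, depletion, harvest_time, travel_time)
    avg_per_harvest = reward_rate(n_star, r0, depletion, harvest_time, travel_time) * harvest_time
    n = 0
    while r0 * depletion ** n > avg_per_harvest:   # reward of harvest number n + 1
        n += 1
    return n


# Illustrative settings: r0 and harvest_time are assumptions.
short_travel = best_exit(r0=10.0, depletion=0.89, harvest_time=3.0, travel_time=6.0)
long_travel = best_exit(r0=10.0, depletion=0.89, harvest_time=3.0, travel_time=13.5)
steep = best_exit(r0=10.0, depletion=0.68, harvest_time=3.0, travel_time=6.0)
```

Here `mvt_exit` borrows the optimal rate from the brute-force search purely as a consistency check; a learner would instead have to estimate the average reward from experience. Longer travel lowers the average reward and so prolongs harvesting, while steeper depletion shortens it.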

patch foraging in undergraduates

[Figure: trial timeline: decision, wait 9s, decision, wait 3s, decision]

Constantino & Daw 2015

predictions: travel time

[Figure: sample subject, plotted over time (minutes); travel time: long (13.5s) vs short (6s)]

predictions: depletion rate

[Figure: sample subject, plotted over time (minutes); depletion rate: steep (.68) vs shallow (.89)]

group data

[Figure: travel time (long vs short), n=11; depletion rate (steep vs shallow), n=11]

chronic stress (p < .01)

acute stress (p < .05)

(Lenow, Constantino, Daw & Phelps J Neurosci 2017)

stress and evaluation

• global decision variables like average reward might provide a more fundamental interpretation for a range of stress effects

• including effects on MB/MF tradeoff, via opportunity cost of time

• not fully worked out for aversive events and avoidance (probably most relevant to stress)

conclusions

how we compute decision variables influences what we choose

two strategies for computing decision variables underlying goal-habit conflict

• distinct neural and behavioral signatures
• links to psychopathology eg compulsion

principles and mechanisms likely equally applicable to avoidance

• though much still to explore

asymmetries between approach and avoidance also important

• famous: two-factor theory
• novel: different effects in sequential behavior, anxiety

Ida Momennejad (now Columbia)

Ross Otto (now McGill)

Claire Gillan (now TCD)

Brad Doll (now about.com)

Sara Constantino (now Princeton)

Dylan Rich

Marcelo Mattar

Collaborators:

Ken Norman

Liz Phelps

Sam Gershman

Daphna Shohamy

Valerie Voon

Jennifer Lenow

Joanna Steinglass

Funding:

NIMH, NIDA, NINDS, NSF, McDonnell Foundation, Templeton Foundation, US DOD, Google DeepMind

Lindsay Hunter

Evan Russek

Oliver Vikbladh

Neil Garrett

Kevin Lloyd

Flora Bouchacourt

Sam Zorowitz

Peter Dayan

Yael Niv

Deborah Talmi

Ming Hsu

Mate Lengyel

Karin Foerde
