action evaluation in appetitive and aversive learning

nathaniel daw

princeton university

leuven, 2019


learning for decisions

• how you compute has important consequences for what you choose
• e.g. which of many possible outcomes you consider
• better worked out in the appetitive domain, but likely extends to avoidance

algorithms for computing expected utility over candidate actions

1. habits vs deliberation: model-based vs. model-free RL

2. psychiatry: disorders involving compulsivity and avoidance

3. stress and opportunity cost


sequential decision tasks

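$$Q(s_t, a_t) = r(s_t) + \sum_{s_{t+1}} P(s_{t+1} \mid s_t, a_t) \Big[ r(s_{t+1}) + \sum_{s_{t+2}} \cdots \Big]$$

[figure: decision tree with states A, B, C, D, E, F and outcomes of $0, $25 and $10]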

Markov decision process: consequences of actions are delayed and contingent
• connect actions to consequences over space and time
• hard to estimate; hard to learn (“temporal credit assignment”)
• maximizing utility unites seeking reward and avoiding punishment


“model-based” learning

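$$Q(s_t, a_t) = r(s_t) + \sum_{s_{t+1}} P(s_{t+1} \mid s_t, a_t) \Big[ r(s_{t+1}) + \sum_{s_{t+2}} \cdots \Big]$$

[figure: decision tree with states A, B, C, D, E, F and outcomes of $0, $25 and $10]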

• learn a one-step reward and transition “map”
• iterative, tree-structured computation
• hippocampal “preplay”? (Mattar & Daw 2018)

(Pfeiffer and Foster, 2013)
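The tree-structured computation can be sketched as a recursive search over a learned one-step model. This is a minimal sketch, assuming a small hypothetical model (the state names, rewards and the 0.7/0.3 transition probabilities are illustrative, not taken from the slides) and assuming the agent will take the best action at every future step; `REWARD`, `MODEL`, `q_value` and `state_value` are hypothetical names.

```python
from functools import lru_cache

# Hypothetical learned model (illustrative values, not taken from the slides).
# REWARD[s]: one-step reward received in state s.
# MODEL[s][a]: successor state -> transition probability P(s' | s, a).
REWARD = {"root": 0.0, "left": 0.0, "right": 0.0,
          "leaf1": 0.0, "leaf2": 25.0, "leaf3": 10.0, "leaf4": 0.0}
MODEL = {
    "root": {"a": {"left": 0.7, "right": 0.3}, "b": {"left": 0.3, "right": 0.7}},
    "left": {"a": {"leaf1": 1.0}, "b": {"leaf2": 1.0}},
    "right": {"a": {"leaf3": 1.0}, "b": {"leaf4": 1.0}},
}


@lru_cache(maxsize=None)
def state_value(s: str) -> float:
    """V(s): r(s) at a terminal state, otherwise the value of the best action."""
    if s not in MODEL:
        return REWARD[s]
    return max(q_value(s, a) for a in MODEL[s])


def q_value(s: str, a: str) -> float:
    """Q(s, a) = r(s) + sum_{s'} P(s' | s, a) * V(s'), expanded recursively down the tree."""
    return REWARD[s] + sum(p * state_value(s_next) for s_next, p in MODEL[s][a].items())


print(q_value("root", "a"), q_value("root", "b"))
```

With these numbers, action a is worth about 20.5 and b about 14.5, because a more often reaches the branch holding the 25-unit leaf. Each evaluation re-expands the tree from the one-step model, which is what makes this approach flexible but costly at choice time.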


“model-free” learning

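[figure: choice between A and B, with outcomes of $25 and $10]

$$Q(s_t, a_t) = r(s_t) + \sum_{s_{t+1}} P(s_{t+1} \mid s_t, a_t) \Big[ r(s_{t+1}) + \sum_{s_{t+2}} \cdots \Big]$$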

shortcut: store the endpoints of the computation (long-run action values)
• these can be learned directly, “model-free” (TD learning)
• simplifies choice-time computation (just retrieve), but may not reflect all available information
• standard theory of dopamine, reward prediction errors, etc.

(Schultz et al 1997)
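The model-free shortcut can be sketched as a temporal-difference (TD) update of cached action values. This is a minimal SARSA-style TD(0) sketch, assuming the reward is attached to each transition and using an illustrative learning rate `alpha` and discount `gamma`; `td_update` and the two-step chain are hypothetical. The error term `delta` plays the role of the reward prediction error in the standard dopamine account.

```python
from collections import defaultdict


def td_update(q, s, a, r, s_next=None, a_next=None, alpha=0.1, gamma=1.0):
    """One TD(0) update of the cached value Q(s, a); returns the reward prediction error.

    s_next=None marks the end of an episode, whose value is taken to be 0.
    """
    next_value = 0.0 if s_next is None else q[(s_next, a_next)]
    delta = r + gamma * next_value - q[(s, a)]  # reward prediction error
    q[(s, a)] += alpha * delta                  # nudge the stored value toward the target
    return delta


# Hypothetical two-step chain: s0 --go--> s1 (no reward), s1 --go--> end (reward 1).
q = defaultdict(float)
for _ in range(500):
    td_update(q, "s0", "go", 0.0, "s1", "go")
    td_update(q, "s1", "go", 1.0)

print(round(q[("s0", "go")], 3), round(q[("s1", "go")], 3))
```

After a few hundred episodes both cached values approach 1: the reward has been propagated back to s0 without any model of the transitions, and choosing later only requires looking up the stored value.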


model-based and model-free learning
• these ideas were proposed to formalize rodent work on the goal-directed vs. habitual distinction in instrumental behavior (Daw et al. 2005)

• these are most often studied in the reward domain (e.g. via reward devaluation)

• but to the extent known, largely paralleled in avoidance (LeDoux & Daw 2018)

• lots to do (e.g. Cain; Cano, this meeting)


learned decision making in humans

[figure: probability (0 to 0.5) for each option vs. trial (0 to 300)]

“bandit” tasks, e.g. Daw et al 2006

Wittmann et al 2008

Gershman et al 2009

Schonberg et al 2010

Glascher et al 2010

Wimmer et al 2012

Seymour et al 2012

Kovach et al 2012

Madlon-Kay et al 2013


behavioral analysis: characterize the function relating outcomes to future choices (trial-by-trial learning model)

multinomial logistic regression: outcomes → choices

decay form characteristic of error-driven learning

[figure: regression weights (<- avoid | choose ->) vs. lag (trials, -1 to -6); panels: reward, shock, choice]

(Seymour et al. J Neuro 2012)
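The decay form has a simple explanation: an error-driven (delta-rule) learner's value is an exponentially weighted sum of past outcomes, with weight alpha * (1 - alpha)^(k - 1) on the outcome k trials back, so weights over lags should fall off geometrically. A minimal sketch, assuming a single option, an initial value of 0, a random hypothetical outcome sequence and an illustrative learning rate; the function names are hypothetical.

```python
import random


def delta_rule_value(outcomes, alpha, q0=0.0):
    """Apply Q <- Q + alpha * (r - Q) to each outcome in order; return the final Q."""
    q = q0
    for r in outcomes:
        q += alpha * (r - q)
    return q


def lag_weights(alpha, n_lags):
    """Weight the delta rule implicitly places on the outcome k trials back (k = 1..n_lags)."""
    return [alpha * (1 - alpha) ** (k - 1) for k in range(1, n_lags + 1)]


# Hypothetical outcome sequence; most recent outcome last.
rng = random.Random(0)
outcomes = [rng.choice([0.0, 1.0]) for _ in range(40)]
alpha = 0.3

weighted_sum = sum(w * r for w, r in zip(lag_weights(alpha, len(outcomes)), reversed(outcomes)))
print(delta_rule_value(outcomes, alpha), weighted_sum)  # equal up to rounding
print([round(w, 3) for w in lag_weights(alpha, 6)])     # geometric decay over lags 1..6
```

A real analysis, as on the slide, goes the other way: it regresses choices on lagged outcomes and compares the fitted weights with this curve. Under a softmax choice rule the log-odds of a choice are roughly linear in these lagged outcomes, which is one motivation for using logistic regression.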


sequential decision task

second-stage options rewarded with prob: 26%, 57%, 41%, 28%

(all slowly changing) (Daw et al Neuron 2011)

extend experiment to probe map learning:

is choice guided by anticipated states or previous actions?


idea

[figure: task schematic, rare transition probability 30%]

How does bottom-stage feedback affect top-stage choices?

Example: rare transition at top level, followed by win

• Which top-stage action is now favored?
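The rare-transition case can be worked through in a few lines. This is a simplified sketch, assuming one option per second-stage state (so each state's value is just that option's value), a 70%/30% common/rare transition structure, a learning rate of 0.5, and a model-free learner that credits the first-stage choice directly with the final reward; `P`, `model_based_q`, `update_after_trial` and `hybrid_choice_prob` are hypothetical names. The last function mixes the two valuations with a weight `w`, in the spirit of the hybrid models fit to this task.

```python
import math

# Hypothetical two-step task: each first-stage action reaches its "common"
# second-stage state with probability 0.7 and the other state with probability 0.3 (rare).
P = {"A1": {"X": 0.7, "Y": 0.3},
     "A2": {"X": 0.3, "Y": 0.7}}


def model_based_q(q2):
    """Q_MB(a) = sum_s P(s | a) * V(s), using the learned transition structure."""
    return {a: sum(p * q2[s] for s, p in P[a].items()) for a in P}


def update_after_trial(q_mf, q2, choice, state, reward, alpha=0.5):
    """Learn from one trial: the value of the reached second-stage state, plus a
    model-free first-stage update that ignores which transition occurred."""
    q2[state] += alpha * (reward - q2[state])
    q_mf[choice] += alpha * (reward - q_mf[choice])


def hybrid_choice_prob(q_mb, q_mf, w, beta=5.0):
    """Softmax over w * Q_MB + (1 - w) * Q_MF; w = 1 is purely model-based."""
    net = {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}
    z = sum(math.exp(beta * v) for v in net.values())
    return {a: math.exp(beta * v) / z for a, v in net.items()}


# Rare transition (A1 -> Y), followed by a win.
q_mf = {"A1": 0.5, "A2": 0.5}
q2 = {"X": 0.5, "Y": 0.5}
update_after_trial(q_mf, q2, choice="A1", state="Y", reward=1.0)
q_mb = model_based_q(q2)

print("model-free:", q_mf)   # A1 now higher: repeat the rewarded choice
print("model-based:", q_mb)  # A2 now higher: it more often leads to Y
print("w=0:", hybrid_choice_prob(q_mb, q_mf, w=0.0))
print("w=1:", hybrid_choice_prob(q_mb, q_mf, w=1.0))
```

A model-free learner repeats a rewarded first-stage choice regardless of the transition, whereas a model-based learner switches after a rare-transition win; intermediate values of `w` produce a blend of the two patterns.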


predictions

model-free: ignores transition structure

model-based: respects transition structure


data

model-free model-based

individual subs x 201 trials each

(Daw et al Neuron 2011)


data

model-free model-based

17 subs x 201 trials each

(Daw et al Neuron 2011)

reward: p < 1e-8; reward × rare: p < 5e-5 (mixed-effects logit)

results reject pure reinforcement models and suggest a mixture of planning and reinforcement processes
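The two regression effects correspond to a simple summary of the data: the probability of repeating the previous first-stage choice, split by whether the previous trial was rewarded and whether its transition was rare. A minimal sketch, assuming per-trial lists of choices, reward flags and rare-transition flags; it computes raw stay probabilities and two contrasts rather than the mixed-effects logistic regression used in the study, and the function names and synthetic data are hypothetical.

```python
from collections import defaultdict


def stay_probabilities(choices, rewarded, rare):
    """P(repeat previous first-stage choice), keyed by (previous rewarded, previous rare)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number of stays, number of trials]
    for t in range(1, len(choices)):
        key = (rewarded[t - 1], rare[t - 1])
        counts[key][0] += int(choices[t] == choices[t - 1])
        counts[key][1] += 1
    return {key: stays / n for key, (stays, n) in counts.items()}


def signatures(p):
    """Return (reward main effect, reward x transition interaction) on stay probability.

    A model-free learner mainly shows the main effect; a model-based learner shows the interaction.
    """
    reward_effect = (p[(True, False)] + p[(True, True)] - p[(False, False)] - p[(False, True)]) / 2
    interaction = (p[(True, False)] - p[(False, False)]) - (p[(True, True)] - p[(False, True)])
    return reward_effect, interaction


# Hypothetical data from an idealized model-based agent: it stays after common-rewarded
# and rare-unrewarded trials and switches otherwise.
pattern = [(True, False, True), (True, True, False), (False, False, False), (False, True, True)] * 5
rewarded = [r for r, _, _ in pattern] + [False]
rare = [x for _, x, _ in pattern] + [False]
choices = ["A"]
for _, _, stay in pattern:
    choices.append(choices[-1] if stay else ("B" if choices[-1] == "A" else "A"))

p = stay_probabilities(choices, rewarded, rare)
print(p)
print(signatures(p))
```

For this idealized agent the reward main effect is zero and the interaction is maximal; real subjects, as the results above indicate, show both.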



MB and MF with shock

[figure: task schematic with choices L and R]

in prep w/ Neil Garrett, Marijn Kroes, Liz Phelps

shock: p < 5e-7; shock × rare: p < .01


interference (Otto et al Psych Science, 2013)

[figure: model-based, single task vs. dual task]; dual × model-based: p < .05

stress (Otto et al PNAS, 2013)

[figure: model-based vs. log cortisol delta (Z score)]

Also:

Individual differences
• Development (Decker ea, 2016)
• Aging (Eppinger ea 2013)
• IQ (Schad ea 2014; Gillan ea 2016)
• cognitive control (Otto ea 2015)
• Psychopathology (more later…)

Dopamine & PFC
• PFC TMS (Smittenaar ea 2013)
• COMT (PFC DA) genotype (Doll ea 2016)
• Parkinson’s disease & DA meds (Sharp ea 2016; Wunderlich ea 2012)
• dopamine PET (Deserno ea 2015)


what are the neural mechanisms underlying MB evaluation?

Is model-based learning really decision by simulation?


decodable stimuli

(Doll, Duncan, Simon, Shohamy & Daw Nature Neuroscience 2015)


RPE (ventral putamen)

[figure: putamen prediction error by behavior (MB, MF)]; P < .01

(Doll, Duncan, Simon, Shohamy & Daw Nature Neuroscience 2015)

prospection (category selective ctx)

[figure: prospective activation by behavior (MB, MF)]; P = .02


Signatures of two dissociable neural evaluation mechanisms

1. forward search
2. error-driven updating

which have the expected relationships to choice behavior


Is model-based learning related to disorders of compulsion?


Binge eating disorder, n=30

Healthy volunteers, n=106

OCD, n=35

Stimulant abusers, n=36 (methamphetamine/cocaine; abstinent at least 1 wk)

(Voon et al., Biological Psychiatry, 2014)


3 questions

1) what to make of inflexible goal-pursuit (like anorexia nervosa)?

2) are decision-making effects actually acute, due to illness?

3) are patients MB for object of compulsion?


what causes MB/MF imbalance in AN?

idea: food restriction behaviors are like avoidance habits
• 2-factor theory: avoidance habits can only be reinforced if safety is reframed as a goal
• suggestion: people with AN are particularly prone to such reframing
• preliminary evidence from the Palminteri et al. (2015) reframing task

prelim w/ Karin Foerde, Daphna Shohamy, Joanna Steinglass

[figure: train and probe phases]

• objectively better / worse in training frame
• objectively worse / better in training frame
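The reframing idea can be illustrated with a context-relative learner in the spirit of the Palminteri et al. (2015) task: outcomes are coded relative to a learned context value, so avoiding a loss yields a positive relative outcome, much as two-factor theory needs safety to act as a goal. This is a deliberately simplified, deterministic sketch, assuming expected outcomes in place of sampled ones, equal sampling of all options, and illustrative payoffs; it is not the fitted model from that study, and `train_relative` is a hypothetical name.

```python
def train_relative(options, n_trials=200, alpha=0.1):
    """Context-relative value learning, using each option's expected outcome on every trial.

    options: {context: {option: expected outcome}}.
    V[context] tracks the average outcome in that context; Q[(context, option)] learns the
    outcome relative to V, so a good option in a bad context can acquire positive value.
    """
    v = {c: 0.0 for c in options}
    q = {(c, o): 0.0 for c, opts in options.items() for o in opts}
    for _ in range(n_trials):
        for c, opts in options.items():
            v[c] += alpha * (sum(opts.values()) / len(opts) - v[c])
            for o, r in opts.items():
                q[(c, o)] += alpha * ((r - v[c]) - q[(c, o)])
    return q


# Illustrative payoffs: win 1 with prob .75 / .25 in the gain context,
# lose 1 with prob .25 / .75 in the loss context (expected values shown).
options = {"gain": {"good": 0.75, "bad": 0.25},
           "loss": {"good": -0.25, "bad": -0.75}}
q = train_relative(options)

# Probe: gain/"bad" is objectively better (0.25 > -0.25), yet loss/"good" has the higher learned value.
print(q[("gain", "bad")], q[("loss", "good")])
```

A learner coding outcomes in absolute terms would learn Q equal to each expected outcome and prefer the gain-context option, so probe choices between such pairs separate the two accounts.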


anxiety: a puzzle and a model
• anxiety disorders are characterized by persistent and overgeneralized fear and avoidance
• why should this be, given that avoidance is protective?
• in models, approach propagates opportunity and avoidance contains danger, due to the assumption that you will avoid in the future

in prep w/ Sam Zorowitz


suggestion
• in general, sequential evaluation requires assumptions about future events
• suggestion: a core dysfunction in anxiety is pessimistic expectations about future choices

in prep w/ Sam Zorowitz
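One simple way to make the suggestion concrete, offered only as an illustrative assumption and not as the model in preparation: when evaluating a state, assume your future self takes the best action with probability 1 - `pessimism` and the worst with probability `pessimism`. With `pessimism` = 0 an avoidable threat costs nothing (avoidance contains the danger); with modest pessimism the danger leaks back into the states that lead toward it. The environment, payoffs and names are hypothetical.

```python
# Hypothetical environment (illustrative names and payoffs).
# MODEL[s][a]: successor state -> probability; states absent from MODEL are terminal.
MODEL = {
    "start": {"stay": {"home": 1.0}, "explore": {"field": 1.0}},
    "field": {"eat": {"food": 1.0}, "approach_threat": {"shock": 1.0}},
}
REWARD = {"start": 0.0, "home": 0.0, "field": 0.0, "food": 1.0, "shock": -10.0}


def value(s, pessimism):
    """V(s) if the future self takes the best action with prob 1 - pessimism, else the worst."""
    if s not in MODEL:
        return REWARD[s]
    qs = [q_value(s, a, pessimism) for a in MODEL[s]]
    return (1 - pessimism) * max(qs) + pessimism * min(qs)


def q_value(s, a, pessimism):
    """Q(s, a) = r(s) + sum_{s'} P(s' | s, a) * V(s') under the given pessimism."""
    return REWARD[s] + sum(p * value(s_next, pessimism) for s_next, p in MODEL[s][a].items())


for pessimism in (0.0, 0.2):
    print(pessimism, q_value("start", "explore", pessimism), q_value("start", "stay", pessimism))
```

Exploring is worth 1 with no pessimism but about -1.2 with pessimism 0.2, so the whole field, food included, is avoided even though the threat itself could always be avoided.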


consequences

idea ties together many disparate aspects of anxiety
• overgeneralization of avoidance
• control & self-efficacy
• transition to depression
• unbalanced approach-avoid conflict (e.g. BART)

prediction
• attenuated (or reversed) free-choice bias (e.g. Leotti & Delgado 2011)

in prep w/ Sam Zorowitz


recap

• psychopathology may reflect dysfunction of underlying evaluation and choice mechanisms

• compulsion & MB/MF imbalance

• anorexia and anxiety potentially reflecting more unique aspects of avoidance


stress and opportunity cost

Why does stress favor habits?

How can we reason formally about the range of effects of the stress response?

• so far: transient, action- or stimulus-linked evaluations

• also: more global evaluations
  • stress, mood, schemas, tonic neuromodulators
  • average reward, controllability, priors

[figure: model-based vs. log cortisol delta (Z score)]

(Otto et al PNAS, 2013)


opportunity cost of inaction

• deliberation can improve rewards (better choices)

• but takes time (delaying rewards, failing to avoid punishments)

• in appetitive circumstances, the opportunity cost of inaction is proportional to the average reward of the environment (Niv et al., 2007)

deliberation should be modulated by the long-run average reward in the environment (Keramati et al., 2011)

and also by the average opportunity to avoid (Cools et al. 2011; Dayan 2012)
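The opportunity-cost argument reduces to a comparison: spending time t deliberating forgoes roughly rho * t reward, where rho is the average reward per unit time, so deliberation pays only if its expected improvement exceeds that amount. A minimal sketch, assuming rho is tracked with a simple running average of reward per second; the class, function and numbers are hypothetical, and the cited models (Niv et al., 2007; Keramati et al., 2011) are richer than this.

```python
class AverageRewardTracker:
    """Running estimate of reward per second (rho), updated after each interval."""

    def __init__(self, alpha=0.1, rho0=0.0):
        self.alpha = alpha
        self.rho = rho0

    def update(self, reward, duration):
        # Move rho toward the reward rate observed over this interval.
        self.rho += self.alpha * (reward / duration - self.rho)
        return self.rho


def worth_deliberating(expected_gain, deliberation_time, rho):
    """Deliberate only if the expected improvement exceeds the reward forgone meanwhile."""
    opportunity_cost = rho * deliberation_time
    return expected_gain > opportunity_cost


rich, poor = AverageRewardTracker(), AverageRewardTracker()
for _ in range(100):
    rich.update(reward=4.0, duration=2.0)   # about 2 units per second
    poor.update(reward=1.0, duration=2.0)   # about 0.5 units per second

# The same 1-second deliberation with the same expected gain of 1.5 units:
print(worth_deliberating(1.5, 1.0, rich.rho), worth_deliberating(1.5, 1.0, poor.rho))
```

In the rich environment the same deliberation is not worth its time, so a cached response would be preferred; this is one sense in which the average reward could shift the model-based/model-free balance. The same comparison could be made with an average rate of avoidable punishment.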


Same basic logic plays out across:

• Decisions
  • vigor (Niv et al 2007)
  • foraging (Charnov 1976)
  • speed-accuracy tradeoffs (Otto & Daw 2018)
  • time discounting (Kacelnik)

• Meta-decisions / control (Boureau et al. 2015)
  • deliberation (Keramati et al. 2011)
  • action chunking (Dezfouli & Balleine 2012)
  • cognitive effort, ego depletion (Kurzban; Shenhav)
  • thresholds for signal detection / DDMs (Gold & Shadlen 2003)
  • explore/exploit

… the average reward as a ubiquitous decision variable


Charnov (1976); Stephens & Krebs (1986)

• serially visit reward patches
• choose to harvest or exit
• harvesting earns diminishing rewards
• exiting leads to a new patch (takes time; no going back)

principle of lost opportunity
• balance between reward and opportunity cost of harvesting
• many problems can be expressed in this stay-switch form

patch foraging


marginal value theorem

[figure: apples per harvest vs. time, with the average reward per harvest (opportunity cost of foraging) marked]

exit when

$$\text{next reward} < \text{average reward}$$

Charnov (1976); Stephens & Krebs (1986)
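The marginal value theorem can be checked numerically for a simple patch: each harvest takes `harvest_time` seconds and yields a reward that shrinks by a factor `depletion`, and leaving costs `travel_time` seconds. A minimal sketch, assuming deterministic rewards and a fresh patch of the same quality after each departure; the starting reward of 10 and the 3-second harvest time are illustrative assumptions, while the travel times (6 s, 13.5 s) and depletion rates (.68, .89) are borrowed from the conditions of Constantino & Daw 2015. It finds the best number of harvests by brute force, then reads off the MVT threshold rho * harvest_time (the average reward per harvest).

```python
def reward_rate(n_harvests, r0, depletion, harvest_time, travel_time):
    """Long-run reward per second if the forager harvests n times in every patch, then leaves."""
    total = sum(r0 * depletion ** k for k in range(n_harvests))
    return total / (n_harvests * harvest_time + travel_time)


def optimal_policy(r0, depletion, harvest_time, travel_time, max_harvests=200):
    """Brute-force the number of harvests per patch that maximizes the reward rate."""
    best_n = max(range(1, max_harvests + 1),
                 key=lambda n: reward_rate(n, r0, depletion, harvest_time, travel_time))
    return best_n, reward_rate(best_n, r0, depletion, harvest_time, travel_time)


def exit_threshold(r0, depletion, harvest_time, travel_time):
    """MVT: leave once the next harvest would earn less than the average reward per harvest."""
    _, rho = optimal_policy(r0, depletion, harvest_time, travel_time)
    return rho * harvest_time


R0, HARVEST_TIME = 10.0, 3.0  # illustrative assumptions
for label, depletion, travel in [("short travel", 0.89, 6.0), ("long travel", 0.89, 13.5),
                                 ("shallow", 0.89, 6.0), ("steep", 0.68, 6.0)]:
    n, rho = optimal_policy(R0, depletion, HARVEST_TIME, travel)
    print(f"{label:12s} harvests={n:2d} threshold={rho * HARVEST_TIME:.2f}")
```

With these assumptions the exit threshold falls from about 5.7 to about 4.4 when travel lengthens, and from about 5.7 to about 4.3 when depletion steepens: both manipulations lower the average reward rate, so the forager should stay longer in each patch.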


patch foraging in undergraduates

[figure: timeline of successive stay/exit decisions, separated by waits (3 s, 9 s)]

Constantino & Daw 2015


predictions: travel time

[figure: apples per harvest vs. time]


sample subject

[figure: exit threshold (mean last reward) vs. time (minutes); travel time: long (13.5 s) vs. short (6 s)]

Constantino & Daw 2015


predictions: depletion rate

[figure: apples per harvest vs. time]


sample subject

[figure: exit threshold (mean last reward) vs. time (minutes); depletion rate: steep (.68) vs. shallow (.89)]

Constantino & Daw 2015


group data

[figure: exit threshold (mean last reward) by travel time (long vs. short, n=11) and by depletion rate (steep vs. shallow, n=11)]

Constantino & Daw 2015


chronic stress

(Lenow, Constantino, Daw & Phelps J Neurosci 2017)

[figure: harvesting relative to optimal, from underharvesting to overharvesting]; p < .01


acute stress (p < .05)

[figure: harvesting relative to optimal, from underharvesting to overharvesting]

(Lenow, Constantino, Daw & Phelps J Neurosci 2017)


stress and evaluation

• global decision variables like average reward might provide a more fundamental interpretation for a range of stress effects

• including effects on MB/MF tradeoff, via opportunity cost of time

• not fully worked out for aversive events and avoidance (probably most relevant to stress)


conclusions

how we compute decision variables influences what we choose

two strategies for computing decision variables underlying goal-habit conflict

• distinct neural and behavioral signatures
• links to psychopathology, e.g. compulsion

principles and mechanisms likely equally applicable to avoidance

• though much still to explore

asymmetries between approach and avoidance also important

• famous: two-factor theory
• novel: different effects in sequential behavior, anxiety


Lab:

Ida Momennejad (now Columbia)

Ross Otto (now McGill)

Claire Gillan (now TCD)

Brad Doll (now about.com)

Sara Constantino (now Princeton)

Dylan Rich

Marcelo Mattar

Lindsay Hunter

Evan Russek

Oliver Vikbladh

Neil Garrett

Kevin Lloyd

Flora Bouchacourt

Sam Zorowitz

Collaborators:

Ken Norman

Liz Phelps

Sam Gershman

Daphna Shohamy

Valerie Voon

Jennifer Lenow

Joanna Steinglass

Peter Dayan

Yael Niv

Deborah Talmi

Ming Hsu

Mate Lengyel

Karin Foerde

Funding:

NIMH, NIDA, NINDS, NSF, McDonnell Foundation, Templeton Foundation, US DOD, Google DeepMind