Human Reward / Stimulus / Response Signal Experiment: Data and Analysis

Draws on:

• Alan and Bill’s experiment
• Usher & McClelland model and experiments
• Patrick Simen’s model
• Sam and Phil’s analysis
• Juan’s further analysis

Human experiment examining the reward bias effect, with the response signal given at different times after target onset

• Target stimuli are rectangles shifted 1, 3, or 5 pixels L or R of fixation.

• Reward cue occurs 750 msec before stimulus.

– Small arrowhead pointing L or R, visible for 250 msec.
– Only biased reward conditions (2 vs 1 and 1 vs 2) are used.

• Response signal occurs at different times (msec) after target onset:

0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000

• Participant receives reward only if the response is correct and occurs within 250 msec of the response signal.

• Participants were run for 15-25 sessions to provide stable data.

• Data shown are from later sessions, in which effects were all stable.
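The reward rule above can be written down as a small function. This is only a sketch: it assumes that RT is measured from the response signal, that the 250 msec window is inclusive, that responses before the signal earn nothing, and that rewards are counted in points (2 or 1). The names `reward_earned` and `RESPONSE_SIGNAL_LAGS_MS` are hypothetical.

```python
# Response-signal lags listed on the slide, in msec after target onset.
RESPONSE_SIGNAL_LAGS_MS = [0, 75, 150, 225, 300, 450, 600, 900, 1200, 2000]


def reward_earned(correct, rt_after_signal_ms, reward_if_correct, window_ms=250):
    """Points earned on one trial.

    Assumptions (not stated on the slide): RT is measured from the response
    signal, the window is inclusive, and anticipations (negative RT) earn 0.
    """
    in_window = 0 <= rt_after_signal_ms <= window_ms
    return reward_if_correct if (correct and in_window) else 0


if __name__ == "__main__":
    # A correct, in-time response on the high-reward side earns 2 points.
    print(reward_earned(True, 180, 2))
    # A correct but late response earns nothing.
    print(reward_earned(True, 300, 2))
```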

A participant with very little reward bias

• Top panel shows the probability of the response giving the larger reward as a function of actual response time, for combinations of:

– Stimulus shift (1, 3, 5 pixels)
– Reward-stimulus compatibility

• Lower panel shows the data transformed to z scores, which correspond to the theoretical construct:

$$\frac{\mathrm{mean}(x_1(t) - x_2(t)) + \mathrm{bias}(t)}{\mathrm{sd}(x_1(t) - x_2(t))}$$

where x1 represents the state of the accumulator associated with the greater reward, x2 the same for the lesser reward, and S is thought to choose the larger reward if x1(t) - x2(t) + bias(t) > 0.
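A minimal sketch of the z-score transform, assuming the accumulator difference is Gaussian so that the choice probability is the normal CDF of the construct above. The clipping constant `eps` is an assumption added here to keep z finite when an observed proportion is exactly 0 or 1; the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, sd 1


def p_choose_high(mean_diff, sd_diff, bias):
    """P(x1 - x2 + bias > 0) when x1 - x2 is Gaussian (assumed)."""
    return _STD_NORMAL.cdf((mean_diff + bias) / sd_diff)


def z_score(p_high, eps=1e-3):
    """Inverse-normal transform of a choice proportion.

    eps clips proportions away from 0 and 1 (an assumption, not from the slide).
    """
    p = min(max(p_high, eps), 1.0 - eps)
    return _STD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    # Illustrative numbers only: the transform recovers (mean + bias) / sd.
    p = p_choose_high(mean_diff=1.0, sd_diff=2.0, bias=0.5)
    print(round(z_score(p), 3))
```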

Participants Showing Reward Bias

Analysis Assumptions

• Decision variable x varies as a function of t.
• Choice is made at some time t = signal lag + RT.
• At the time the choice is made:

– For a single difficulty level, there are two distributions with means +μ and -μ and equal sd σ, set to 1. Choose high reward if decision variable x > -Xc.
– For three difficulty levels, fixed σ = 1, means μi (i = 1, 2, 3); assume the same Xc for all difficulty levels.
– Xc can be regarded as a positive increment to the state of the decision variable; high reward is chosen if x > 0 in this case.

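[Figure: two distributions of the decision variable x, centered at -μ and +μ, with the criterion marked at -Xc.]

Only one difficulty level (H and L denote the high- and low-reward alternatives; P(H | L) is the probability of choosing H when the stimulus favors L):

$$Z_1 = \mathrm{invNorm}(P(H \mid H)) = \frac{\mu + X_c}{\sigma}, \qquad Z_2 = \mathrm{invNorm}(P(H \mid L)) = \frac{-\mu + X_c}{\sigma}$$

$$\frac{\mu}{\sigma} = \frac{Z_1 - Z_2}{2}, \qquad \frac{X_c}{\sigma} = \frac{Z_1 + Z_2}{2}$$

Three difficulty levels:

$$Z_{1i} = \mathrm{invNorm}(P(H \mid H_i)) = \frac{\mu_i + X_c}{\sigma}, \qquad Z_{2i} = \mathrm{invNorm}(P(H \mid L_i)) = \frac{-\mu_i + X_c}{\sigma}$$

$$\frac{\mu_i}{\sigma} = \frac{Z_{1i} - Z_{2i}}{2}, \qquad \frac{X_c}{\sigma} = \frac{\sum_{i=1}^{3} (Z_{1i} + Z_{2i})}{2 \cdot 3}$$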
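The estimates above can be computed directly from choice proportions. The sketch below assumes σ = 1 and one pair of proportions per difficulty level; `sdt_estimates` is a hypothetical helper name and the numbers in the usage example are made up for illustration.

```python
from statistics import NormalDist

inv_norm = NormalDist().inv_cdf  # inverse of the standard normal CDF


def sdt_estimates(p_h_given_h, p_h_given_l):
    """Return ([mu_i / sigma], Xc / sigma) from per-level choice proportions.

    p_h_given_h[i]: P(choose high | stimulus favors high), difficulty level i
    p_h_given_l[i]: P(choose high | stimulus favors low),  difficulty level i
    """
    z1 = [inv_norm(p) for p in p_h_given_h]
    z2 = [inv_norm(p) for p in p_h_given_l]
    mu = [(a - b) / 2 for a, b in zip(z1, z2)]
    # One shared criterion: average (Z1i + Z2i) / 2 over the difficulty levels.
    xc = sum(a + b for a, b in zip(z1, z2)) / (2 * len(z1))
    return mu, xc


if __name__ == "__main__":
    # Made-up proportions for three difficulty levels (not experimental data).
    mu, xc = sdt_estimates([0.80, 0.92, 0.98], [0.45, 0.25, 0.10])
    print([round(m, 2) for m in mu], round(xc, 2))
```

With a single difficulty level the same function reduces to the one-level formulas, since the sum then has one term.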

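Subject’s sensitivity, as defined in the theory of signal detectability:

$$d'_i = \frac{\mu_i - (-\mu_i)}{\sigma} = \frac{2\mu_i}{\sigma}$$

When the response signal delay varies, sensitivity becomes a function of time: $d'_i(t)$.

For each subject, fit with the function from UM’01 (Usher & McClelland, 2001):

$$d'_{\mathrm{fit},i}(t) = d'_{\mathrm{asym},i}\left(1 - e^{-(t - t_0)/\tau}\right)$$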
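As an illustration of fitting a shifted-exponential approach to asymptote, d'(t) = d'asym (1 - exp(-(t - t0)/τ)), the sketch below uses a coarse least-squares grid search. The grid search is a stand-in chosen for simplicity, not the fitting method used in the analysis; setting d' to 0 for t ≤ t0 is an assumption, and the data in the usage example are synthetic.

```python
import math
from itertools import product


def d_prime_fit(t, d_asym, t0, tau):
    """Shifted exponential approach to asymptote; 0 before t0 (assumed)."""
    if t <= t0:
        return 0.0
    return d_asym * (1.0 - math.exp(-(t - t0) / tau))


def fit_grid(times, d_obs, d_asyms, t0s, taus):
    """Return the (d_asym, t0, tau) on the grid with the smallest squared error."""
    best = None
    for d_asym, t0, tau in product(d_asyms, t0s, taus):
        sse = sum((d_prime_fit(t, d_asym, t0, tau) - d) ** 2
                  for t, d in zip(times, d_obs))
        if best is None or sse < best[0]:
            best = (sse, d_asym, t0, tau)
    return best[1:]


if __name__ == "__main__":
    # Synthetic, noise-free data generated from known parameters.
    times = [0.3, 0.5, 0.8, 1.2, 2.0]
    data = [d_prime_fit(t, 2.0, 0.25, 0.3) for t in times]
    print(fit_grid(times, data, [1.5, 2.0, 2.5], [0.2, 0.25, 0.3], [0.2, 0.3, 0.4]))
```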

Subject Sensitivity

[Figure: d′ versus RT + response cue delay, one panel per participant (cm, ja, sl), showing data and fits for diff = 5, 3, and 1.]

[Figure: three panels of values plotted against stimulus (diff) level for participants cm, ja, and sl; the surviving axis labels include RT and d′ asym.]

Optimal “bias” Xc/σ, based on observed sensitivity data.

Observed “bias”, treated as a positive offset favoring the response associated with high reward:

$$\frac{X_c}{\sigma} = \frac{\sum_{i} (Z_{1i} + Z_{2i})}{2 \cdot 3}$$
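One way to obtain an optimal Xc/σ from sensitivity values is to maximize expected reward numerically. The sketch below assumes σ = 1, equally likely difficulty levels and stimulus directions, rewards of 2 and 1, and a single shared criterion; these assumptions and the grid search are illustrative and may differ from how the optimal curve in the slides was computed.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def expected_reward(xc, mus, r_high=2.0, r_low=1.0):
    """Mean reward per trial when 'high' is chosen if x > -xc (sigma = 1)."""
    total = 0.0
    for mu in mus:
        p_correct_high = PHI(mu + xc)  # stimulus mean +mu, respond high
        p_correct_low = PHI(mu - xc)   # stimulus mean -mu, respond low
        total += 0.5 * (r_high * p_correct_high + r_low * p_correct_low)
    return total / len(mus)


def optimal_xc(mus, lo=0.0, hi=5.0, steps=5000):
    """Grid search for the criterion shift that maximizes expected reward."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda xc: expected_reward(xc, mus))


if __name__ == "__main__":
    # For one level the analytic optimum is log(2) / (2 * mu).
    print(optimal_xc([1.0]))
    # Made-up sensitivities for three difficulty levels.
    print(optimal_xc([0.5, 1.0, 1.5]))
```

For a single difficulty level the grid result can be checked against log(2)/(2μ), which follows from setting the likelihood ratio equal to the inverse reward ratio.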

[Figure: decision-variable distributions with the criterion marked at -Xc/σ.]

[Figure: normalized threshold Xc/σ, real versus optimal, one panel per participant (cm, ja, sl).]

Some possible models

• OU process (λ < 0, σ0 = 0) following F&H, with reward bias effect implemented as:

1. An alteration in initial condition, subject to decay
2. Optimal time-varying decision boundary outside of the OU process
3. An input ‘current’ starting at presentation of reward signal
   3.1. Noise from reward onset
   3.2. Noise from stimulus onset
4. A constant offset or criterion shift unaffected by time

1. Reward as a change in initial condition, subject to decay

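Note:
1. Effect of the bias decays away for λ < 0.
2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{aC + \lambda\mu_0}{aC}$$

3. At t = 0, p = 1.

[Figure: P of choice toward larger reward versus Time (s), for RSC 1 and RSC 0 at diff 5, 3, and 1.]

Feng & Holmes notes:

$$\mu(t, C) = \mu_0 e^{\lambda t} + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right)$$

where μ0 is the initial condition set by the reward cue.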
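A sketch of model 1, assuming the OU mean and variance take the standard form: mean μ0·exp(λt) + (aC/λ)(exp(λt) - 1) and variance (σ²/2λ)(exp(2λt) - 1), with choice probability given by the normal CDF of mean/sd. The names `mu0` (initial offset) and `sigma` (noise scale) are labels chosen here, and the parameter values are arbitrary.

```python
import math
from statistics import NormalDist


def ou_mean_var(t, a_c, lam, sigma, mu0):
    """Mean and variance of the OU decision variable at time t (lam < 0)."""
    mean = mu0 * math.exp(lam * t) + (a_c / lam) * (math.exp(lam * t) - 1.0)
    var = (sigma ** 2 / (2.0 * lam)) * (math.exp(2.0 * lam * t) - 1.0)
    return mean, var


def p_high(t, a_c, lam, sigma, mu0):
    """Probability of choosing the larger reward at time t."""
    mean, var = ou_mean_var(t, a_c, lam, sigma, mu0)
    if var == 0.0:
        # No variance at t = 0: the choice follows the sign of the mean.
        return 1.0 if mean > 0 else (0.0 if mean < 0 else 0.5)
    return NormalDist().cdf(mean / math.sqrt(var))


def dip_time(a_c, lam, mu0):
    """Time at which mean/sd is stationary: (1/lam) * log((aC + lam*mu0) / aC)."""
    return math.log((a_c + lam * mu0) / a_c) / lam


if __name__ == "__main__":
    a_c, lam, sigma, mu0 = 2.0, -1.0, 1.0, 1.0  # arbitrary illustrative values
    t_dip = dip_time(a_c, lam, mu0)
    print(t_dip, p_high(t_dip, a_c, lam, sigma, mu0))
    # The initial offset no longer matters at long times (bias decays away).
    print(p_high(20.0, a_c, lam, sigma, mu0) - p_high(20.0, a_c, lam, sigma, 0.0))
```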

2. Time-varying optimal bias (Outside of OU process)


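Note:
1. Effect of the bias persists.
2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{4a^2C^2 + \lambda\sigma^2\log 2}{4a^2C^2 - \lambda\sigma^2\log 2}$$

3. At t = 0, p = 1.
4. The smaller the stimulus effect, the larger the bias.
5. The harder the stimulus condition, the later the dip.

$$b(t) = \frac{\sigma^2\log 2}{4aC}\left(1 + e^{\lambda t}\right)$$

$$\mu(t, C) = b(t) + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right)$$

[Figure: P of choice toward larger reward versus Time (s).]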
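The time-varying bias in model 2 can be checked numerically. The sketch assumes the optimal criterion for equal-variance Gaussians with a 2:1 reward ratio, Xc = v·log(2)/(2m), where m and v are the OU mean and variance without bias; substituting the OU expressions gives b(t) = σ²·log(2)/(4aC)·(1 + exp(λt)). The `ratio` parameter and function names are hypothetical, and a_c is taken as positive (stimulus magnitude).

```python
import math


def optimal_bias(t, a_c, lam, sigma, ratio=2.0):
    """b(t) = sigma^2 * log(ratio) / (4 * aC) * (1 + exp(lam * t))."""
    return sigma ** 2 * math.log(ratio) / (4.0 * a_c) * (1.0 + math.exp(lam * t))


def bias_from_mean_var(t, a_c, lam, sigma, ratio=2.0):
    """Same quantity computed as v(t) * log(ratio) / (2 * m(t)), for t > 0."""
    mean = (a_c / lam) * (math.exp(lam * t) - 1.0)
    var = (sigma ** 2 / (2.0 * lam)) * (math.exp(2.0 * lam * t) - 1.0)
    return var * math.log(ratio) / (2.0 * mean)


def dip_time(a_c, lam, sigma, ratio=2.0):
    """Stationary point of (b(t) + m(t)) / sd(t) under the assumptions above."""
    k = lam * sigma ** 2 * math.log(ratio)
    return math.log((4.0 * a_c ** 2 + k) / (4.0 * a_c ** 2 - k)) / lam


if __name__ == "__main__":
    lam, sigma = -1.0, 1.0  # arbitrary illustrative values
    for a_c in (1.0, 2.0):
        print(a_c, optimal_bias(0.5, a_c, lam, sigma), dip_time(a_c, lam, sigma))
```

With these illustrative values, the weaker stimulus (smaller aC) gets both the larger bias and the later dip, in line with notes 4 and 5.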


3.1. Reward acts as input “current”, stays on from reward signal to end of trial, noise starts at reward onset

Reward signal comes τ seconds before stimulus.

Note:
1. Effect of the bias persists.
2. There is no dip.
3. At t = 0, p < 1.

Feng & Holmes notes

[Figure: P of choice toward larger reward versus Time (s).]

3.2. Same as 3.1 but variability is introduced only at stimulus onset


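Note:
1. Effect of the bias persists.
2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{aC + b\,e^{\lambda\tau}}{aC + b}$$

where b is the reward input ‘current’ and τ is the interval between reward signal and stimulus.

3. At t = 0, p = 1, since all accumulators have no variance.

[Figure: P of choice toward larger reward versus Time (s).]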

4. Reward as a constant offset

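Note:
1. Equivalent to 3.2 for large τ.
2. There is a dip at

$$t = \frac{1}{\lambda}\log\frac{aC}{aC - \lambda\mu_0}$$

3. At t = 0, p = 1.

[Figure: P of choice toward larger reward versus Time (s).]

$$\mu(t, C) = \mu_0 + \frac{aC}{\lambda}\left(e^{\lambda t} - 1\right); \qquad v(t) = \frac{\sigma^2}{2\lambda}\left(e^{2\lambda t} - 1\right)$$

where μ0 is here the constant offset.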
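The claim that model 4 matches model 3.2 when the reward signal leads the stimulus by a long interval can be illustrated numerically. The sketch assumes the OU mean obeys dm/dt = λm + input, with input b from reward onset and aC + b after stimulus onset; under that assumption the equivalent constant offset is -b/λ. The symbols `b`, `tau`, and `mu0` and all values are labels and numbers chosen here for illustration.

```python
import math


def mean_model_3_2(t, a_c, b, lam, tau):
    """Mean at time t after stimulus onset; reward input b started tau earlier."""
    m0 = (b / lam) * (math.exp(lam * tau) - 1.0)  # mean reached by stimulus onset
    return m0 * math.exp(lam * t) + ((a_c + b) / lam) * (math.exp(lam * t) - 1.0)


def mean_model_4(t, a_c, lam, mu0):
    """Mean with a constant offset mu0 that is unaffected by time."""
    return mu0 + (a_c / lam) * (math.exp(lam * t) - 1.0)


if __name__ == "__main__":
    a_c, b, lam = 2.0, 0.5, -1.0  # arbitrary illustrative values
    offset = -b / lam             # equivalent constant offset (assumed mapping)
    for tau in (0.75, 5.0, 30.0):
        gap = max(abs(mean_model_3_2(t, a_c, b, lam, tau)
                      - mean_model_4(t, a_c, lam, offset))
                  for t in (0.0, 0.5, 2.0))
        print(tau, gap)
```

Both models share the same variance, since noise starts at stimulus onset in each, so agreement of the means is enough for the choice probabilities to agree.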

Some possible models

• OU models (λ < 0, σ0 = 0) following F&H, with reward bias effect implemented as:

1. An alteration in initial condition, subject to decay
2. Optimal time-varying decision boundary outside of the OU process
3. An input ‘current’ starting at presentation of reward signal
   3.1. Noise from reward onset
   3.2. Noise from stimulus onset
4. A constant offset or criterion shift unaffected by time

• While none fit perfectly, starting point variability (σ0 > 0) would potentially improve 3.2 and 4.

Jay’s favorite mechanistic story (draws from Simen’s model)

• Participant learns to inject waves of activation that prime response accumulators; waves peak just after stimulus onset and have a residual.
– Wave is higher for the high-reward response.
• Stimulus activation accumulates as in LCAM.
• Response signal initiates added drive to both accumulators equally.
• First accumulator to reach a fixed threshold initiates the response.
