Social Reinforcement Learning and its Neural Modulation by Oxytocin …wgas-autismus.org/wp-content/uploads/2015/05/JanaKruppa_Poster8... · Social Reinforcement Learning and its

Social Reinforcement Learning and its Neural Modulation by Oxytocin in Autism Spectrum Disorder

Kruppa, JA1,2,3, Gossen, A1,2,3, Großheinrich, N1,3, Schopf, H2, Kohls, G1, Fink, GR2, Herpertz-Dahlmann, B4,

Konrad, K1,2, Schulte-Rüther, M1,2,3

1 Child Neuropsychology Section, Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany 2 Institute of Neuroscience and Medicine (INM-3), Jülich Research Center, Germany 3 Translational Brain Research in Psychiatry and Neurology, Department of Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany 4 Child and Adolescent Psychiatry, Psychosomatics, and Psychotherapy, University Hospital Aachen, Germany

Introduction BACKGROUND

OBJECTIVES

PARTICIPANTS

METHODS

BEHAVIORAL RESULTS (PRELIMINARY)

CONCLUSION/DISCUSSION

References

Contact: [email protected]

A modi'ied social reinforcement learning task5 was used, in which participants selected whether stimuli belong to category A or B. Upon correct choice, they were given feedback with a probability of 75% to be rewarding and 25% to be neutral. Upon incorrect choice they were given feedback with a probability of 75% to be neutral and 25% to be rewarding. Conditions: (1) NN -‐ neutral cue & neutral feedback, (2) NS -‐ neutral cue & social feedback, (3) SN – social cue & neutral feedback

Reward Learning Model (Q-‐learning)6: On each trial t, subjects make a choice ct (button A or B). An expected outcome value is de'ined for each cue-‐response combination Qt(A) and Qt(B), which are initialized with “0” for all possible cue-‐response combinations. On each trial, the Q-‐values are updated as follows: Qt+1(ct) = Qt (ct)+α∙δt , where 0≤α≤1 is a free learning rate parameter and δt = rt-‐Qt (ct) is the prediction error. rt re'lects the value of the obtained reward for trial t. Assuming that subjects choose probablistically between A and B based on the acquired Q-‐values according to a Softmax-‐distribution, the individual learning parameter can be estimated individually for each subject from the behavioral choice data using maximum likelihood estimation. In addition to alpha, the initial Q-‐values, reward values for positive reinforcement and for neutral feedback, could be estimated from the data, but were kept constant here (Qinit=0, rpos=1, rneut=0).

Group N Mean Age Age Range IQ

TDC 22 22.00 18 -‐ 25 120.64

ASD 11 21.27 18-‐ 25 113.27

Currently, no pharmacological treatment of the core social symptoms of autism spectrum disorder (ASD) is available. Treatments of choice are behavioral interventions, mostly based on operant reinforcement learning1,2. However, training is very time-‐consuming, usually not covered by the health insurance system and treatment effects are only modest. Recently, the neuropeptide oxytocin (OXT) has been shown to enhance motivation and attention to social stimuli, by modulating the brain reward circuitry in social situations3. Likely, these effects have the potential to enhance social reinforcement learning, the core mechanism of behavioral interventions. The addition of OXT to behavioral interventions may prove a fruitful combination for the treatment of the social symptoms of ASD4. No functional imaging studies are available that explicitly investigated the in'luence of OXT on socially reinforced learning.

1Virues-‐Ortega J (2010). Applied behavior analytic intervention for autism in early childhood: meta-‐analysis, meta-‐regression and dose-‐response meta-‐analysis of multiple outcomes. Clinical psychology review, 30(4): 387-‐68. 2Williams White S, Keonig K, & Scahill L (2007). Social skills development in children with autism spectrum disorders: a review of the intervention research. Journal of autism and developmental disorders, 37(10): 1858-‐68. 3Gamer M (2010). Does the amygdala mediate oxytocin effects pn socially reinforced learning? The Journal of Neuroscience: the ofIicial journal of the Society for Neuroscience, 30(28): 9347-‐8. 4Bartz JA, Zaki J, Bolger N, & Ochsner KN (2011). Social effects of oxytocin in humans: context and person matter. Trends in cognitive neurosciences, 15(7): 201-‐309. 5Hurlemann R, Patin A, Onur O, et al. (2010). Oxytocin enhances amygdala-‐dependent, socially reinforced learning and emotional empathy in humans. The Journal of Neuroscience: the of'icial journal of the Society for Neuroscience, 30(14): 4999-‐5007. 6Daw, N (2011). Trial-‐by-‐trial data analysis using computational models. In Delgado MR, Phelps EA, & Robbins TW (Eds.), Decision making, affect, and learning: Attention and Performance XXIII (pp.3-‐38). Oxford, UK: Oxford University Press. 7Haruno M & Kawato M (2006). Heterarchical reinforcement-‐learning model for integration of multiple cortico-‐striatal loops: fMRI examination in stimulus-‐action-‐reward association learning. Neural Netw. 19(8): 1242-‐54.

² What are the behavioral and neural correlates of OXT induced enhancement of socially reinforced learning in TDC and ASD? •  Difference between percentage correct responses at the beginning and end of the task (improvement in performance)

•  OXT-‐induced enhancement of socially reinforced learning re'lected by brain activation in the ventral striatum (VS)

p2

c. Improvement in Performance

a. TDC Group – Placebo

Behavioral level: ²  TDC subjects perform better than ASD subjects across tasks and OXT/PLC condition ²  The ASD group bene'its from OXT in the SN task: participants perform better under OXT than PLC, and achieve a comparable learning level to the TDC group

•  Attention to social stimuli seems to be enhanced in ASD under OXT3 ²  All subjects show an improvement in performance during the course of the task

•  In the SN task, ASD subjects perform worse than TDC subjects during early stages of learning particularly under PLC, but show comparable performance under OXT

Neural level: ²  Preliminary analyses of the healthy control group showed a correlation of brain activity in the ventral striatum with the reward prediction error in both the PLC and OXT condition, but no difference between both conditions

²  Estimation of model parameters may need to be optimized to provide higher sensitivity (e.g., simultaneous estimation of further parameters)

² More extensive analyses including data of the ASD group will follow

190$

200$

210$

220$

230$

240$

250$

260$

NN$ NS$ SN$

AUC$

ASD_PLC$

ASD_OXT$

40#

50#

60#

70#

80#

90#

100#

110#

1# 2# 3# 4# 5# 6# 7# 8#

percen

tage#co

rrect#h

its#(%

)#

Learning#interval#

TDC_PLC_SN#

TDC_OXT_SN#

ASD_OXT_SN#

ASD_PLC_SN#

190$

200$

210$

220$

230$

240$

250$

260$

NN$ NS$ SN$

AUC$

TDC_PLC$

TDC_OXT$

ASD_PLC$

ASD_OXT$

IMAGING RESULTS (PRELIMINARY)

Each participant took part in two measurements on two consecutive days. A double-‐blind within-‐subjects cross-‐over design was used to detect OXT induced differences in comparison to the PLC condition. Each measurement consisted of (1) nasal administration of OXT/PLC, (2) fMRI scan, and (3) neuropsychological assessments. Blood samples were drawn twice at predetermined time points.

Overview of the study protocol. B1-B2: blood draws at predetrmined time points. Dashed line illustrates the accumulation of OXT concentration within the brain.

a.  Main effect of group: F(1) = 5.25, p = 0.03, ηp2 = 0.15

b.  Condition*task interaction: F(2,9) =3.73,

p = .07, ηp2 = 0.45 (marginal signiIicant) c.  Main effect of Interval: F(7,25) = 26.33,

p= .000, ηp2 = 0.88 Main effect of group: F(1) = 3.23, p = 0.08, ηp2 = 0.09 (marginal signiIicant)

b. TDC Group – Oxytocin

a. ASD vs. TDC b. ASD Group

Data analysis: For each participant, the learning parameter was estimated from the behavioral data. The resulting individual learning models were used to calculate trial-‐wise reward-‐prediction error values. Neural data was preprocessed and statistically analyzed with SPM8. For 'irst-‐level analyses, cue and response events were modeled separately using stick functions, convolved with the haemodynamic response function. Response events were parametrically modulated by the trial-‐wise individual reward-‐prediction error values. Beta values representing this modulation were taken to the second level, with all conditions modeled separately in a 'lexible factorial ANOVA model (random-‐effects analysis). A ROI of the ventral striatum was de'ined functionally, using a 10 mm sphere around coordinates adopted from Haruno & Kawato (2006)7.

Signi'icant correlation of brain activation in the ventral striatum with the reward prediction error signal from individual q-‐learning models (p<.001 unc.)

Documents

Social Reinforcement Learning and its Neural Modulation by Oxytocin …wgas-autismus.org/wp-content/uploads/2015/05/JanaKruppa_Poster8... · Social Reinforcement Learning and its