Application of Supervised Learning to Neuroimaging Janaina Mourao-Miranda

Supervised Learning applied to Neuroimaging2 › staff › J.Shawe-Taylor › courses › J2.pdf · 2009-12-14 · Supervised Learning Input: X1 X2 X3 Output y1 y2 y3 Learning/Training


Page 1:

Application of Supervised Learning to Neuroimaging

Janaina Mourao-Miranda

Page 2:

Supervised Learning

Input: X1, X2, X3. Output: y1, y2, y3.

Learning/Training: generate a function or hypothesis f such that f(Xi) = yi.

Training examples: (X1, y1), (X2, y2), ..., (Xn, yn)

Test/Prediction: for a new test example Xi, the learnt function predicts f(Xi) -> yi.

Methodology: automatic procedures that learn a task from a series of examples, used when no mathematical model is available.
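The train/predict loop sketched above can be written in a few lines; scikit-learn's `SVC` is an assumption here (the slides do not name a toolbox), standing in for any supervised learner:

```python
# Minimal supervised-learning loop: training examples (Xi, yi), a learnt
# function f, and a prediction f(Xi) -> yi for unseen test examples.
from sklearn.svm import SVC

X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]  # inputs X1..X4
y_train = [-1, -1, +1, +1]                                   # outputs y1..y4

f = SVC(kernel="linear")   # learning/training: generate a hypothesis f
f.fit(X_train, y_train)

X_test = [[0.1, 0.0], [1.0, 0.9]]   # unseen test examples
print(f.predict(X_test))            # predictions f(Xi)
```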

Page 3:

Machine Learning Methods

• Artificial Neural Networks

• Decision Trees

• Bayesian Networks

• Gaussian Processes

• Support Vector Machines

• ...

• SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis

• SVMs were introduced by Boser, Guyon, and Vapnik at COLT-92

• A powerful tool for statistical pattern recognition

Page 4:

Advantages of pattern recognition analysis in Neuroimaging

Explore the multivariate nature of neuroimaging data

• MRI/fMRI data are multivariate by nature, since each scan contains information about brain activity at thousands of measured locations (voxels).

• Considering that most brain functions are distributed processes involving a network of brain regions, it seems advantageous to use the spatially distributed information contained in the data to gain a better understanding of brain function.

• Can yield greater sensitivity than conventional analysis.

Can be used to make predictions for new examples

• Enables clinical applications: previously acquired data can be used to make diagnostic or prognostic predictions for new subjects.

Page 5:

fMRI Data Analysis

Inputs: (1) voxel time series (BOLD signal intensity over time), (2) experimental design.

Classical approach: mass-univariate analysis (e.g. GLM). Output: map of activated regions, task 1 vs. task 2.

Pattern recognition approach: multivariate analysis.

• SVM training. Input: volumes from task 1 and volumes from task 2. Output: map of discriminating regions between task 1 and task 2.

• SVM test. Input: a new example. Output: prediction, task 1 or task 2.

Page 6:

fMRI data as input to a classifier

Each fMRI volume is treated as a vector in an extremely high-dimensional space (~200,000 voxels, or dimensions, after applying the brain mask). Example of a vector representing the pattern of brain activation:

[2 8 4 2 5 4 8 4 8]
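As a sketch of how a volume becomes such a vector, the following applies a boolean brain mask to a toy 3-D array with plain numpy (a real pipeline would load the image with a neuroimaging library, which is an assumption here):

```python
import numpy as np

volume = np.arange(64, dtype=float).reshape(4, 4, 4)  # toy 4x4x4 "volume"
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True                            # toy "in-brain" region

x = volume[mask]   # 1-D pattern vector: one entry per in-mask voxel
print(x.shape)     # (8,) here; ~(200000,) for a real whole-brain mask
```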

Page 7:

Using pattern recognition to distinguish between object categories

[Figure: data matrix of voxels × time (trials or scans), split into a training set and a test set; each test input yields a classification decision.]

Page 8:

Classification in Neuroimaging: 2D toy example

[Figure: volumes at t1-t4 plotted in the (voxel 1, voxel 2) plane with discriminating direction w. The volumes at t1-t4 are labeled task 1 or task 2; a volume from a new subject ("task ?") falls on one side of the boundary and is assigned to task 1 or task 2.]

Page 9:

Classification in High Dimensions

Data: <xi, yi>, i = 1, ..., N
Observations: xi ∈ R^d
Labels: yi ∈ {-1, +1}

All hyperplanes in R^d are parameterized by a vector w and a constant b. With a feature map φ: R^d → R^N, they can be expressed as:

⟨w, φ(x)⟩ + b = 0

In high dimensions there are many possible hyperplanes separating the data, e.g. the points (x1, +1) and (x2, -1). Our aim is to find a hyperplane/decision function

f(x) = sgn(⟨w, φ(x)⟩ + b)

that correctly classifies our data: f(x1) = +1 and f(x2) = -1.

Page 10:

Simplest Approach: Fisher Linear Discriminant

[Figure: data points X1(t1), X1(t3), X2(t2), X2(t4) in the (voxel 1, voxel 2) plane, projected onto the learnt weight vector w; the class means m1 and m2 and a threshold thr define the decision. The FLD direction is shown with and without the regularization correction.]

Page 11:

Fisher Linear Discriminant

The Fisher Discriminant is a classification function

f(x) = sgn(⟨w, φ(x)⟩ + b)

where the weight vector w is chosen to maximize the quotient

J(w) = (μ+ − μ−)² / (σ+² + σ−²)

μ+, μ−: means of the projections of the positive/negative examples
σ+, σ−: corresponding standard deviations

That is, find the direction w that maximizes the separation of the means, scaled according to the variances in that direction.

Regularized version:

J(w) = (μ+ − μ−)² / (σ+² + σ−² + λ)
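The direction maximizing the regularized quotient has the usual closed form w ∝ (S+ + S− + λI)⁻¹(μ+ − μ−); a minimal numpy sketch on toy 2-D data (the toy data and the λ value are assumptions, not from the slides):

```python
import numpy as np

def fld_weights(X_pos, X_neg, lam=1e-3):
    """Regularized Fisher Linear Discriminant direction."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter (sum of class covariances), regularized by lam*I.
    S = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    S += lam * np.eye(S.shape[0])
    return np.linalg.solve(S, mu_p - mu_n)

rng = np.random.default_rng(0)
X_pos = rng.normal([2.0, 2.0], 0.5, size=(20, 2))  # positive examples
X_neg = rng.normal([0.0, 0.0], 0.5, size=(20, 2))  # negative examples
w = fld_weights(X_pos, X_neg)
# Projections of the class means onto w are separated as intended.
print(X_pos.mean(axis=0) @ w > X_neg.mean(axis=0) @ w)
```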

Page 12:

Optimal Hyperplane: largest margin classifier

Among all hyperplanes separating the data there is a unique optimal hyperplane: the one with the largest margin γ (the distance of the closest points to the hyperplane).

Suppose all test points are generated by adding bounded noise r to the training examples (test and training data are assumed to have been generated by the same distribution). If the optimal hyperplane has margin γ > r, it will correctly separate the test points. As r is unknown, the best we can do is maximize the margin γ.

Page 13:

Support Vector Machine: the maximal margin classifier

Data: <xi, yi>, i = 1, ..., N
Observations: xi ∈ R²
Labels: yi ∈ {-1, +1}

Optimization problem (convex quadratic program):

min over w, b, γ, ξ:  −γ + C Σ_{i=1..N} ξi
subject to:  yi(⟨w, φ(xi)⟩ + b) ≥ γ − ξi,  ξi ≥ 0,  i = 1, ..., N,  and  ‖w‖² = 1

γ: margin; ξi: slack variables; w: weight vector.

C controls the trade-off between the margin and the size of the slack variables. In practice C is chosen by cross-validation. As the parameter C varies, the margin varies smoothly through a corresponding range.

For details on the SVM formulation see Kernel Methods for Pattern Analysis, J. Shawe-Taylor & N. Cristianini.
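Choosing C by cross-validation, as recommended above, can be sketched with a simple grid search (scikit-learn and the toy data are assumptions; the slides do not prescribe a toolbox):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 5)),    # class -1
               rng.normal(1.5, 1.0, (30, 5))])   # class +1
y = np.array([-1] * 30 + [+1] * 30)

# 5-fold cross-validation over a grid of C values.
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print(search.best_params_["C"], round(search.best_score_, 2))
```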

Page 14:

SVM decision function:

f(x) = sgn( Σ_{i=1..N} αi yi K(xi, x) + b )

SVM weights:

w = Σ_{i=1..N} αi yi φ(xi)

In the linear case:

φ(xi) = xi,  K(xi, xj) = ⟨xi, xj⟩

αi ≠ 0 only for inputs that lie on the margin (i.e. the support vectors).

The trade-off parameter C between accuracy and regularization directly controls the size of the αi.
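In the linear case the relation w = Σ αi yi xi can be checked directly on a fitted model: scikit-learn's `SVC` (an assumption, as before) exposes αi yi as `dual_coef_` and the support vectors as `support_vectors_`:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)),
               rng.normal(2.0, 1.0, (20, 3))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
# dual_coef_ holds alpha_i * y_i for the support vectors (alpha_i = 0
# elsewhere), so w is their weighted sum over the support vectors.
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))   # True: matches the model's own weights
```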

Page 15:

How to interpret the SVM weight vector?

[Figure: 2-D toy example. Training examples from task 1 and task 2 in the (voxel 1, voxel 2) plane, with separating hyperplane H and weight vector (discriminating volume) W = [0.45 0.89].]

• The value of each voxel in the discriminating volume indicates the importance of that voxel in differentiating between the two classes or brain states.

Page 16:

Pattern Recognition Method: General Procedure

• Standard fMRI pre-processing: realignment, normalization, smoothing

• Dimensionality reduction and/or feature selection

• Split data into training and test sets

• Compute the kernel matrix

• ML training and test

• Output: accuracy and discriminating maps (weight vector)

Page 17:

Kernel

A kernel is a function that, given two patterns X and X*, returns a real number characterizing their similarity:

K: 𝒳 × 𝒳 → ℝ,  (X, X*) → K(X, X*)

A simple type of similarity measure between two vectors is the dot product ⟨X, X*⟩.

Page 18:

Kernel Matrix

[Figure: kernel matrix for the training examples X1, X2, ..., shown as a heat map; entry (i, j) is ⟨Xi, Xj⟩. In the linear case φ(xi) = xi and K(xi, xj) = ⟨xi, xj⟩.]

Page 19:

Kernel Approaches and Feature Space

• The original input space can be mapped to some higher-dimensional feature space where the training set is separable:

φ: x → φ(x)

Page 20:

Kernel trick

Instead of using two steps:

1. Mapping to a high-dimensional space: xi → φ(xi)

2. Computing the dot product in the high-dimensional space: ⟨φ(xi), φ(xj)⟩

one can use the kernel trick and compute these two steps together. A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:

K(xi, xj) := ⟨φ(xi), φ(xj)⟩
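For p = 2 in two dimensions the trick can be verified by hand: K(x, z) = (1 + xᵀz)² equals a plain dot product after the explicit degree-2 feature map φ (the concrete map below is a standard textbook construction, not from the slides):

```python
import numpy as np

def phi(x):
    # Explicit feature map whose dot product reproduces (1 + x.z)^2 in 2-D.
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
lhs = (1 + x @ z) ** 2    # kernel trick: one step, 2-D dot product
rhs = phi(x) @ phi(z)     # two steps: map to 6-D, then dot product
print(np.isclose(lhs, rhs))   # True
```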

Page 21:

• Examples of commonly used kernel functions:

– Linear kernel: K(xi, xj) = xiᵀxj

– Polynomial kernel: K(xi, xj) = (1 + xiᵀxj)^p

– Gaussian (Radial Basis Function, RBF) kernel: K(xi, xj) = exp(−‖xi − xj‖² / 2σ²)

– Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1)

• In general, functions that satisfy Mercer's condition can be kernel functions.
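The four kernels can be written as numpy one-liners (the hyperparameter values p, σ, β0, β1 below are illustrative assumptions):

```python
import numpy as np

linear  = lambda xi, xj: xi @ xj
poly    = lambda xi, xj, p=2: (1 + xi @ xj) ** p
rbf     = lambda xi, xj, sigma=1.0: np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma**2))
sigmoid = lambda xi, xj, b0=1.0, b1=0.0: np.tanh(b0 * (xi @ xj) + b1)

x = np.array([1.0, 2.0])
# rbf(x, x) is always 1: every pattern is maximally similar to itself.
print(linear(x, x), poly(x, x), rbf(x, x), sigmoid(x, x))
```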

Page 22:

How to give the data as input to the classifier?

Page 23:

First Approach: Training with the whole brain

- Additional pre-processing: removal of the baseline and low-frequency components of each voxel

- Advantages: can predict single events

- Disadvantages: low signal-to-noise ratio (SNR), stationarity assumptions

Data matrix: voxels × single volumes (C1 C1 C1 BL BL BL C2 C2 C2 BL BL BL)

Page 24:

Second Approach: Training with temporally compressed data

- Additional pre-processing: removal of the baseline and low-frequency components of each voxel

- Advantages: high SNR

- Disadvantages: stationarity assumptions

Data matrix: voxels × mean volumes or betas. Average the volumes (over blocks or over the experiment) or use the parameter estimates (betas) of the GLM model.
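The compression step itself is a reshape-and-average over the scans within each block; a numpy sketch with toy sizes (the 6 blocks × 7 scans mirror a block design, but the numbers are illustrative):

```python
import numpy as np

n_blocks, scans_per_block, n_voxels = 6, 7, 100
data = np.random.rand(n_blocks * scans_per_block, n_voxels)  # scans x voxels

# Group scans by block and average, giving one high-SNR example per block.
block_means = data.reshape(n_blocks, scans_per_block, n_voxels).mean(axis=1)
print(block_means.shape)   # (6, 100)
```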

Page 25:

Third Approach: Training with regions of interest (ROIs)

- Additional pre-processing: removal of the baseline and low-frequency components of each voxel

- Advantages: lower dimensionality

- Disadvantages: stationarity assumptions; needs an a priori hypothesis to define the ROI; does not use the whole-brain information

Data matrix: a feature-selection method reduces voxels × single volumes to selected voxels × single volumes.

Page 26:

Fourth Approach: Spatiotemporal information

- Additional pre-processing: removal of the baseline and low-frequency components of each voxel

- Advantages: uses temporal and spatial information; no stationarity assumptions

- Disadvantages: low signal-to-noise ratio (SNR)

Data matrix: voxels × spatiotemporal observations; each observation concatenates the volumes at T1, T2, T3.

Page 27:

Examples of Applications

Page 28:

Can we classify brain states using the whole-brain information from different subjects?

Page 29:

Application I: Classifying cognitive states

We applied an SVM classifier to predict from the fMRI scans whether a subject was looking at an unpleasant or a pleasant image.

Experimental design:

• Number of subjects: 16

• Tasks: viewing unpleasant and pleasant pictures (6 blocks of 7 scans per block)

Pre-processing procedures:

• Realignment, normalization to standard space, spatial filter.

• Mask to select voxels inside the brain.

Training examples:

• Mean volume per block

Leave-one-subject-out test:

• Training: 15 subjects

• Test: 1 subject

This procedure was repeated 16 times and the results (error rate) were averaged.
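The leave-one-subject-out loop maps onto grouped cross-validation, where each subject's blocks form one group (scikit-learn and the toy data below are assumptions):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_subjects, blocks_per_subject, n_voxels = 16, 6, 50
X = rng.normal(0.0, 1.0, (n_subjects * blocks_per_subject, n_voxels))
y = np.tile([-1, +1], n_subjects * blocks_per_subject // 2)  # two conditions
X[y == +1] += 0.5                                            # injected signal
groups = np.repeat(np.arange(n_subjects), blocks_per_subject)

# 16 folds: train on 15 subjects, test on the held-out one, then average.
scores = cross_val_score(SVC(kernel="linear"), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(len(scores), scores.mean())
```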

Page 30:

[Figure: scans of the training subjects, labeled "brain looking at a pleasant stimulus" or "brain looking at an unpleasant stimulus", train the machine learning method (Support Vector Machine); for a test subject's scan the model answers, e.g., "the subject was viewing a pleasant stimulus".]

Page 31:

Results

[Figure: spatial weight vector maps at z = -18, -6, 6, 18, 30, 42, with a color scale from -1.00 (unpleasant) to 1.00 (pleasant).]

Mourao-Miranda et al. 2006

Page 32:

Can we make use of the temporal dimension in decoding?

Page 33:

Experiment: Emotional Images (Pleasant vs. Unpleasant)

Page 34:

[Figure: one duty cycle of the block design, fixation followed by unpleasant or pleasant stimuli, spanning volumes vt1-vt14.]

Spatiotemporal observation: Vi = [v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14]

Page 35:

Spatiotemporal SVM: Block Design

Training example: the whole duty cycle (T1-T14) for unpleasant and pleasant blocks.

[Figure: discriminating maps over T1-T14, with a color scale from -1.00 (unpleasant) to 1.00 (pleasant).]

Mourao-Miranda et al. 2007

Page 36:

Spatial-Temporal weight vector: Dynamic discriminating map

Page 37:

Spatial-Temporal weight vector: Dynamic discriminating map

Page 38:

[Figure: discriminating map at T5, z = -18, panels A-D, color scale from -1.00 (unpleasant) to 1.00 (pleasant).]

Page 39:

[Figure: discriminating map at T5, z = -6, panels A and B, color scale from -1.00 (unpleasant) to 1.00 (pleasant).]

Page 40:

[Figure: discriminating map at T5, z = -6, panels C-F, color scale from -1.00 (unpleasant) to 1.00 (pleasant).]

Page 41:

Can we classify groups using the whole-brain information from different subjects?

Page 42:

Application II: Classifying groups of subjects

We applied SVM to classify depressed patients vs. healthy controls based on their pattern of activation for emotional stimuli (sad faces).

Experimental design:

• 19 medication-free depressed patients vs. 19 healthy controls

• The event-related fMRI paradigm consisted of affective processing of sad facial stimuli, with modulation of the intensity of the emotional expression (low, medium, and high intensity).

Pre-processing procedures:

• Realignment, normalization to standard space, spatial filter.

• GLM analysis.

Training examples:

• GLM coefficients, i.e. one example per subject

Leave-one-pair-out cross-validation test.

Page 43:

Pattern Classification of Brain Activity in Depression

(train and test with GLM coefficients)

Collaboration with Cynthia H.Y. Fu

Fu et al. 2008

Page 44:

SVM weight – Low intensity (Hap 0)

Page 45:

SVM weight – Medium intensity (Sad 2)

Page 46:

SVM – High intensity (Sad 4)

Page 47:

Can we decode subjective pain from whole-brain patterns of fMRI? (Andre Marquand)

Page 48:

Application IV: Decoding Pain Perception

We applied GP methods to predict subjective pain levels in an fMRI experiment investigating subjective responses to thermal pain.

Experimental design:

• 15 subjects scanned 6 times over three visits (repeated-measures design)

• Thermal stimulation was delivered via a thermode attached to the subjects' right forearm

• Stimulation was individually calibrated to three different subjective intensity thresholds:
  1. Sensory detection threshold (SDT; temperature at which stimulation is detectable)
  2. Pain detection threshold (PDT; temperature at which it becomes painful)
  3. Pain tolerance threshold (PTT; maximum tolerable temperature)

• Subjects rated the perceived intensity of the stimulus using a visual analogue scale (VAS): 0 = "no sensation", 100 = "worst pain imaginable"

• After calibration, the actual temperature applied was invariant throughout the experiment (within subjects and stimulus classes)

Predictive model:

• GPR was used to predict the subjective pain rating (VAS score)

• Whole-brain fMRI volumes were used as input to the model
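A minimal GP-regression sketch in the same spirit (scikit-learn's `GaussianProcessRegressor`, the kernel choice, and the toy data are all assumptions; the study used whole-brain volumes and real VAS scores):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (40, 5))                   # 40 scans x 5 "voxels"
y = X[:, 0] * 30.0 + 50.0 + rng.normal(0, 2.0, 40)  # toy continuous "VAS" rating

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X[:30], y[:30])        # train on 30 scans
pred = gpr.predict(X[30:])     # predict the rating for 10 held-out scans
print(pred.shape)
```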

Page 49:

Results: GP Regression

For every stimulus class, GPR provided very accurate predictions of subjective pain intensity (SMSE = 0.51*, p < 1×10⁻¹⁰ by permutation).

[Figure: scatter plots of true vs. predicted VAS for each stimulus class: SDT ρS = 0.60, PDT ρS = 0.73, PTT ρS = 0.87.]

Marquand et al. 2009

Page 50:

Results: GP Regression

Relating brain activity to subjective pain intensity is not a novel finding; several brain regions have been shown to encode subjective pain intensity. We compared the strength of the correlation between GPR predictions and VAS scores to correlations derived from a number of intensity-coding brain regions:

Primary somatosensory cortex:

• Left: ρS = 0.26

• Right: ρS = 0.12

Secondary somatosensory cortex:

• Left: ρS = 0.27*

• Right: ρS = 0.32*

Anterior cingulate cortex:

• Left: ρS = 0.42*

• Right: ρS = 0.41*

Insula:

• Left: ρS = 0.37*

• Right: ρS = 0.36*

No single brain region produced a correlation as strong as the GPR predictions derived from the whole brain: 'The whole is greater than any of the parts.'

Marquand et al. 2009