Statistical Data Analysis in Neuroinformatics (Well, really it is: Independent component analysis with applications in brain imaging). Aapo Hyvärinen, Helsinki Institute for Information Technology and Depts of Computer Science and Psychology, University of Helsinki


Page 1

Statistical Data Analysis in Neuroinformatics

(Well, really it is: Independent component analysis with applications in brain imaging)

Aapo Hyvärinen

Helsinki Institute for Information Technology and

Depts of Computer Science and Psychology

University of Helsinki

Page 2

Types of multivariate data analysis

• Supervised / confirmatory

– Determine if a specified effect is there or not

– Formally: we observe “inputs” x_1, x_2, ... and “outputs” y_1, y_2, ...

– Regression, statistical hypothesis testing, analysis of variance

• Unsupervised / exploratory

– Find something in your data

∗ New things in old data

∗ Anything interesting in new data

– Formally: we only observe x_1, x_2, ...

– Clustering, factor analysis, independent component analysis

Page 3

What this lecture is about

• Exploratory data analysis

– using independent component analysis (ICA) and related methods

• What is ICA, and what is its relationship to factor analysis?

• Application to

– Analysis of evoked brain activity

– Analysis of spontaneous brain activity

– Artefact removal

– Modelling of visual processing

Page 4

Example: Independent components of resting-state BOLD

(From Beckmann et al., Phil. Trans. Royal Soc. B, 2005)

Decomposition of resting-state activity into components. No stimulus sequence can be used because none exists!

Page 5

First, some theory...

Page 6

Problem of blind source separation

There are a number of “source signals”:

Due to some external circumstances, only linear mixtures of the source signals are observed.

Estimate (separate) original signals!

Page 7

Principal component analysis does not recover original signals

A solution is possible

Use information on statistical independence to recover:

Page 8

Independent Component Analysis (Hérault and Jutten, 1984-1991)

• Observed random variables x_i are modelled as linear sums of hidden variables:

x_i = \sum_{j=1}^{m} a_{ij} s_j,   i = 1, ..., n   (1)

• Mathematical formulation of blind source separation problem

• A form of factor analysis

• The matrix of a_{ij} is constant (the factor loadings), called the “mixing matrix”.

• The s_j are hidden random factors called “independent components”, or “source signals”.

• Problem: estimate both the a_{ij} and the s_j, observing only the x_i.
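As a sketch of the generative model in eq. (1), the following builds observed signals as linear mixtures; the sources and mixing matrix are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 10_000                      # number of components, number of samples

# Hypothetical nongaussian source signals s_j (for illustration only)
s = np.vstack([
    rng.laplace(size=T),              # supergaussian
    rng.uniform(-1, 1, size=T),       # subgaussian
    np.sign(rng.normal(size=T)),      # binary, strongly nongaussian
])

A = rng.normal(size=(n, n))           # unknown constant mixing matrix a_ij
x = A @ s                             # observed variables x_i = sum_j a_ij s_j
```

ICA observes only `x` and must estimate both `A` and `s`.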

Page 9

When can the ICA model be estimated?

• Must assume:

– The s_i are mutually statistically independent

– The s_i are nongaussian (non-normal)

– (Optional:) The number of independent components is equal to the number of observed variables

• Then: the mixing matrix and components can be identified (Comon, 1994)

A very surprising result!

Page 10

Related method: Principal component analysis

• Basic idea: find directions \sum_i w_i x_i of maximum variance

• We must constrain the norm of w: \sum_i w_i^2 = 1, otherwise the solution is that the w_i are infinite.

• For more than one component: find the direction of maximum variance orthogonal to the components previously found.

• Principal goal: explain maximal variance with a limited number of components
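This variance-maximization view can be checked numerically (random data, chosen only for illustration): the unit-norm direction of maximum variance is the leading eigenvector of the data covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 2000))
x[0] *= 3.0                            # give one variable much larger variance

C = np.cov(x)                          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
w = eigvecs[:, -1]                     # unit-norm direction of maximum variance

# variance of the projection sum_i w_i x_i equals the largest eigenvalue
proj_var = np.var(w @ x, ddof=1)
```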

Page 11

Another related method: Classic factor analysis

• Like in ICA, the observed x_i are linear sums of hidden variables:

x_i = \sum_{j=1}^{m} a_{ij} s_j + n_i,   i = 1, ..., n   (2)

• But:

– the number of hidden variables is small

– the s_j can be (and usually are) gaussian

– (and there is a noise term n_i)

• Like PCA, explains variance using a small number of components

• But: any rotation of factors explains the same amount of variance:

E.g., s_1 + s_2 and s_1 − s_2 explain the same variance as s_1 and s_2

⇒ “Factor rotation problem”

Page 12

Comparison of ICA, factor analysis and principal component analysis

• ICA is nongaussian FA with no separate noise or specific factors: so many components are used that all the variance is explained by them.

• No factor rotation is left unknown, because of the identifiability result

• In contrast to FA and PCA, in ICA the components really give the original source signals or underlying hidden variables

• PCA and FA are successful only in reducing the number of variables

• Crucial question is whether your data is nongaussian

– Many “psychological” hidden variables (e.g. “intelligence”) may be (practically) gaussian because they are sums of many independent variables (central limit theorem).

– But signals measured by sensors are usually quite nongaussian

Page 13

Some examples of nongaussianity

[Figure: three example signals plotted over time, and their probability densities, showing different types of nongaussianity]

Page 14

Why classic methods cannot find original components or sources

• In PCA and FA: find components y_i which are uncorrelated,

cov(y_i, y_j) = E{y_i y_j} − E{y_i} E{y_j} = 0   (3)

and maximize explained variance (or the variance of the components)

• Such methods need only the covariances cov(x_i, x_j)

• However, there are many different component sets that are uncorrelated, because

– The number of covariances is ≈ n^2/2 due to symmetry

– So, we cannot solve for the n^2 parameters a_{ij}: not enough information! (“More unknowns than equations”)

• This is why PCA and FA cannot find the underlying components (in general)

Page 15

Nongaussianity, combined with independence, gives more information

• For independent variables we have, for any functions h_1 and h_2,

E{h_1(y_1) h_2(y_2)} − E{h_1(y_1)} E{h_2(y_2)} = 0.   (4)

• For nongaussian variables, nonlinear covariances give more information than just the covariances.

• This is not true for the multivariate gaussian distribution

– The distribution is completely determined by the covariances (and means)

– Uncorrelated gaussian variables are independent, and their (standardized) distribution is the same in all directions (see below)

⇒ ICA model cannot be estimated for gaussian data.

• In practice, it is simpler to look at properties of linear combinations \sum_i w_i x_i. PCA maximizes the variance of \sum_i w_i x_i; can we do something better? Yes, see below.
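Equation (4) can be checked numerically. Below, uniform sources and a 45-degree mixing are chosen only for illustration, with h(y) = y^2 as the nonlinearity: the independent sources have (near-)zero nonlinear covariance, while the uncorrelated but dependent mixtures do not.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200_000
s1, s2 = rng.uniform(-1, 1, size=(2, T))      # independent sources

# a 45-degree rotation: y1, y2 are uncorrelated but NOT independent
y1 = (s1 + s2) / np.sqrt(2)
y2 = (s1 - s2) / np.sqrt(2)

def nonlin_cov(a, b, h=np.square):
    # E{h(a) h(b)} - E{h(a)} E{h(b)}, cf. eq. (4)
    return np.mean(h(a) * h(b)) - np.mean(h(a)) * np.mean(h(b))

print(nonlin_cov(s1, s2))   # close to 0: independence
print(nonlin_cov(y1, y2))   # clearly nonzero: uncorrelated yet dependent
```

Ordinary covariance cannot distinguish the two cases; the nonlinear covariance can.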

Page 16

Illustration

Two components with uniform distributions:

Original components, observed mixtures, PCA, ICA

PCA does not find original coordinates, ICA does!

Page 17

Illustration of problem with gaussian distributions

Original components, observed mixtures, PCA

Distribution after PCA is the same as distribution before mixing!

“Factor rotation problem” in classic FA

Page 18

Basic intuitive principle of ICA estimation

• Inspired by the Central Limit Theorem:

– The average of many independent random variables will have a distribution that is close(r) to gaussian

– In the limit of an infinite number of random variables, the distribution tends to gaussian

• Consider a linear combination \sum_i w_i x_i = \sum_i q_i s_i

• Because of the theorem, \sum_i q_i s_i should be more gaussian than the s_i themselves.

• By maximizing the nongaussianity of \sum_i w_i x_i, we can find the s_i.

• Also known as projection pursuit.

• Cf. principal component analysis: maximize the variance of \sum_i w_i x_i.
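This principle can be demonstrated with a brute-force sketch (uniform sources and an orthogonal mixing, chosen for illustration): sweeping over projection directions w and keeping the one with maximal |kurtosis| recovers one source.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000
s = rng.uniform(-1, 1, size=(2, T))            # independent subgaussian sources

theta0 = np.deg2rad(30)                        # orthogonal mixing = pure rotation
A = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
x = A @ s

def kurt(y):
    y = y / y.std()                            # normalize variance to 1
    return np.mean(y**4) - 3.0

# sweep candidate directions w(theta); keep the most nongaussian projection
angles = np.linspace(0, np.pi, 360, endpoint=False)
kurts = [kurt(np.cos(a) * x[0] + np.sin(a) * x[1]) for a in angles]
best = angles[np.argmax(np.abs(kurts))]
y = np.cos(best) * x[0] + np.sin(best) * x[1]

# the recovered projection matches one source up to sign and scale
corr = max(abs(np.corrcoef(y, s[0])[0, 1]), abs(np.corrcoef(y, s[1])[0, 1]))
```

Practical ICA algorithms replace the sweep with gradient or fixed-point updates, but the objective is the same.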

Page 19

Illustration of changes in nongaussianity

[Figure: histogram and scatterplot of the original uniform distributions]

[Figure: histogram and scatterplot of the mixtures given by PCA]

Page 20

Combining ICA with FA/PCA

• In practice, it is useful to combine ICA with classic PCA or FA

– First, find a small number of factors with PCA or FA

– Then, perform ICA on those factors

• ICA is then a method of factor rotation

• Very different from varimax etc., which do not use statistical structure and cannot find the original components (in most cases)

• Reduces noise in the signals, reduces computation

• (Simplifies algorithms because we can constrain the mixing matrix to be orthogonal.)
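A sketch of the first half of this pipeline (random data for illustration): PCA reduces four observed variables to two whitened components, after which the remaining mixing matrix is (near) orthogonal, so ICA only has to find a rotation.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 100_000
s = rng.laplace(scale=1/np.sqrt(2), size=(2, T))   # unit-variance sources
A = rng.normal(size=(4, 2))                        # 4 observed variables, 2 components
x = A @ s

# PCA step: keep the 2 leading eigenvectors and whiten
d, E = np.linalg.eigh(np.cov(x))                   # eigenvalues in ascending order
V = np.diag(d[-2:] ** -0.5) @ E[:, -2:].T          # whitening + dimension reduction
z = V @ x

M = V @ A                                          # remaining mixing after whitening
```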

Page 21

Preprocessing of data

• Prefiltering is possible: the ICA model still holds with the same matrix A. Filtering each observed signal,

x_i^*(t) = f(t) ∗ x_i(t) = \sum_{\tau} f(\tau) x_i(t − \tau)   (5)

gives mixtures of the equally filtered sources, with the same mixing matrix:

x_i^*(t) = \sum_j a_{ij} s_j^*(t)   (7)

One can try to find a frequency band in which the source signals are as independent and nongaussian as possible.

• (And: dimension reduction by PCA)
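This filtering invariance is easy to verify numerically, since convolution is linear; the filter and signals below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.laplace(size=(2, 500))                     # source signals
A = rng.normal(size=(2, 2))                        # mixing matrix
x = A @ s                                          # observed mixtures

f = np.array([0.25, 0.5, 0.25])                    # some prefilter f(tau)

def filt(z):
    # apply the same filter to every row (every signal)
    return np.apply_along_axis(np.convolve, 1, z, f, mode="same")

# filtering the mixtures = mixing the filtered sources, with the SAME matrix A
lhs = filt(x)
rhs = A @ filt(s)
```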

Page 22

Reliability analysis

• Algorithmic reliability: are there local minima? (see a) below)

• Statistical reliability: is the result just accidental? Can be analyzed by bootstrap, but this changes the local minima (see b) below)

• We have developed a package, Icasso, that uses computationally intensive methods to visualize and analyze these:

[Figure: A) objective-function curves illustrating local minima, a) without and b) with bootstrapping; B) Icasso cluster visualization of 20 estimated components]

Page 23

Applications

Page 24

Application to MEG (Vigário et al., NIPS, 1998)

[Figure: raw MEG channels together with EOG and ECG reference channels (scales: 500 µV, 1000 fT/cm), during saccades, blinking and biting]

Page 25

ICA of “spontaneous” MEG (Vigário et al., NIPS, 1998)

[Figure: nine independent components IC1–IC9 of the spontaneous MEG, 10 s segment]

Finds artefacts, useful for removing them

Page 26

How to do ICA of imaging data

• Assume we observe several brain images

– at different time points, or

– under different imaging conditions

• ICA expresses observed images as linear sums of “source images”:

[Schematic: each observed image = a_{i1} · (source image 1) + a_{i2} · (source image 2) + ... + a_{in} · (source image n)]

• Reverses the roles of observations and variables
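This role reversal can be shown with a toy data matrix (sizes and “source maps” are made up): each observed image is a row and each voxel a column, so spatial ICA treats the voxels as samples and the images as variables.

```python
import numpy as np

rng = np.random.default_rng(6)
n_images, n_voxels, k = 50, 4000, 3

maps = rng.laplace(size=(k, n_voxels))        # hypothetical "source images"
tc = rng.normal(size=(n_images, k))           # time courses act as mixing coefficients
X = tc @ maps                                 # each observed image = linear sum of maps

# temporal ICA: variables = voxels, samples = time points -> use X.T
# spatial ICA:  variables = images, samples = voxels      -> use X
print(X.shape, X.T.shape)
```

Either way, the data matrix has low rank: only k underlying components generate all the images.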

Page 27

ICA of BOLD responses (fMRI)

Subject performing a Stroop colour-naming task. (McKeown et al., PNAS, 1998)

Page 28

ICA of resting-state BOLD (1)

(From Beckmann et al., Phil. Trans. Royal Soc. B, 2005)

a) Medial and b) lateral visual areas, c) Auditory system, d) Sensory-motor system,

e) Visuo-spatial system, f) Executive control, g) Dorsal visual stream

Page 29

ICA of resting-state BOLD (2)

(From van de Ven et al., NeuroImage, 2005)

Bilateral components corresponding to auditory cortex (AC), visual cortex (PVC), sensorimotor cortex (SMC). Parts of inferotemporal cortex (IPC), posterior cingulate cortex (PCC), central sulcus (CS).

Page 30

ICA in modelling visual cortex

[Schematic: image patch = s_1 · (feature 1) + s_2 · (feature 2) + · · · + s_k · (feature k)]

• Why are the receptive fields in visual cortex the way they are?

• Statistical-ecological approach

– What is important in a real environment?

– Natural images have statistical regularities; “explain” receptive fields by the statistical properties of natural images

– ICA gives the “best” features of natural images

Page 31

ICA / sparse coding of natural images (Olshausen and Field, 1996; Bell and Sejnowski, 1997)

Features similar to receptive fields of simple cells in V1

Page 32

More theory

Page 33

Complication (1): Noisy ICA

• Assume there is (gaussian) sensor noise

x_i = \sum_j a_{ij} s_j + n_i   (8)

• Very difficult problem in general

• But trivial if the noise covariance is the same as the signal covariance:

x = A(s + A^{-1} n) = A s̃   (9)

the transformed components s̃ = s + A^{-1} n are independent!

• Or: if the noise can be modelled by some components in s.

• In practice, maybe the best thing to do: reduce noise by time filtering and/or PCA, and use ordinary (noise-free) ICA algorithms.

Page 34

Complication (2): different numbers of components and variables

• In the theoretical analysis, we assume the numbers are equal

• In practice, we often have more variables than components

– simple solution (1): reduce the dimension by PCA

– simple solution (2): estimate only the k “first” components

• Another very difficult case: fewer variables than independent components

Page 35

Nongaussianity measures: kurtosis

• Problem: how to measure nongaussianity?

• Definition:

kurt(x) = E{x^4} − 3 (E{x^2})^2   (10)

• If the variance is constrained to unity, this is essentially the 4th moment.

• Simple algebraic properties because it is a cumulant: for independent s_1 and s_2,

kurt(s_1 + s_2) = kurt(s_1) + kurt(s_2)   (11)

kurt(α s_1) = α^4 kurt(s_1)   (12)

• Zero for a gaussian RV, non-zero for most nongaussian RVs.

• Positive vs. negative kurtosis correspond to typical forms of pdf.

• Its absolute value is a classic measure of nongaussianity.
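The definition (10) and the cumulant properties (11)-(12) can be checked directly; sample sizes and distributions below are chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

def kurt(y):
    # eq. (10), assuming zero-mean data
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

g  = rng.normal(size=n)                               # gaussian: kurt near 0
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)     # uniform, unit var: kurt near -1.2
s2 = rng.laplace(scale=1/np.sqrt(2), size=n)          # Laplacian, unit var: kurt near +3
```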

Page 36

Left: Laplacian pdf, positive kurt (“supergaussian”).

Right: Uniform pdf, negative kurt (“subgaussian”).

Page 37

[Figure: kurtosis of the projection \sum_i w_i x_i as a function of the angle of w]

Kurtosis is minimized, and its absolute value maximized, in the directions of the independent components.

Page 38

Why kurtosis is not optimal

• Sensitive to outliers: consider a sample of 1000 values with unit variance, and one value equal to 10. The kurtosis then equals at least 10^4/1000 − 3 = 7.

• For supergaussian variables, statistical performance is not optimal even without outliers.

• Other measures of nongaussianity should be considered.
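The outlier sensitivity is easy to reproduce with a synthetic sample (for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)

def kurt(y):
    y = (y - y.mean()) / y.std()        # standardize to zero mean, unit variance
    return np.mean(y**4) - 3.0

z = rng.normal(size=1000)
clean = kurt(z)                         # close to 0 for gaussian data
z[0] = 10.0                             # a single erroneous value
dirty = kurt(z)                         # jumps to a large positive value
```

One bad sample out of a thousand dominates the whole estimate.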

Page 39

Differential entropy as nongaussianity measure

• Generalization of the ordinary discrete Shannon entropy:

H(x) = E{− log p(x)}   (13)

• For fixed variance, it is maximized by the gaussian distribution.

• Often normalized to give the negentropy:

J(x) = H(x_gauss) − H(x)   (14)

• Good statistical properties, but computationally difficult.

Page 40

Approximation of negentropy

• Approximations of negentropy (Hyvärinen, 1998):

J_G(x) = (E{G(x)} − E{G(x_gauss)})^2   (15)

where G is a nonquadratic function.

• Generalization of (the square of) kurtosis (which is G(x) = x^4).

• A good compromise?

– statistical properties not bad (for a suitable choice of G)

– computationally simple

• Further possibility: Skewness (for nonsymmetric ICs)
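A sketch of eq. (15) with the nonquadratic choice G(y) = log cosh y (a standard choice in this context); the gaussian reference term is approximated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(9)
G = lambda y: np.log(np.cosh(y))        # nonquadratic G

# E{G(x_gauss)} estimated once by Monte Carlo
EG_gauss = np.mean(G(rng.normal(size=1_000_000)))

def negentropy_approx(y):
    # eq. (15): compare E{G} at zero mean, unit variance
    y = (y - y.mean()) / y.std()
    return (np.mean(G(y)) - EG_gauss) ** 2

J_lap   = negentropy_approx(rng.laplace(size=100_000))   # nongaussian: clearly positive
J_gauss = negentropy_approx(rng.normal(size=100_000))    # gaussian: near zero
```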

Page 41

Conclusions

• ICA is very simple as a model: a linear nongaussian latent variable model.

• Solves the factor rotation and blind source separation problems, if the data (components) are nongaussian

• Estimate by maximizing nongaussianity of components.

• Radically different from PCA both in theory and practice.

• Can be applied in almost any field where we have continuous-valued variables, e.g.

– electro/magnetoencephalograms

– functional magnetic resonance imaging

– modelling of vision