
ICA and PCA


Page 1: ICA  and  PCA

ICA and PCA

Student: 周節
Professor: 王聖智

Page 2: ICA  and  PCA

Outline

• Introduction

• PCA

• ICA

• Reference

Page 3: ICA  and  PCA

Introduction

• Why use these methods? A: For computational and conceptual simplicity, and because the transformed data are more convenient to analyze.

• What are these methods? A: Methods that seek a "representation" of the data as a linear transformation of the original data.

• Well-known linear transformation methods include PCA, ICA, factor analysis, and projection pursuit.

Page 4: ICA  and  PCA

What is PCA?

• Principal Component Analysis

• It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences.

• Reducing the number of dimensions

Page 5: ICA  and  PCA

example

       X        Y
    2.5000   2.4000
    0.5000   0.7000
    2.2000   2.9000
    1.9000   2.2000
    3.1000   3.0000
    2.3000   2.7000
    2.0000   1.6000
    1.0000   1.1000
    1.5000   1.6000
    1.1000   0.9000

• Original data

Page 6: ICA  and  PCA

example

       X         Y
     0.6900    0.4900
    -1.3100   -1.2100
     0.3900    0.9900
     0.0900    0.2900
     1.2900    1.0900
     0.4900    0.7900
     0.1900   -0.3100
    -0.8100   -0.8100
    -0.3100   -0.3100
    -0.7100   -1.0100

(1) Get some data and subtract the mean

Page 7: ICA  and  PCA

example

(2) Get the covariance matrix:

    Covariance = [ 0.6166   0.6154
                   0.6154   0.7166 ]

(3) Get its eigenvectors & eigenvalues:

    eigenvectors = [ -0.7352   0.6779
                      0.6779   0.7352 ]

    eigenvalues  = [ 0.0491     0
                       0      1.2840 ]
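As a rough NumPy sketch of steps (1)-(3) on the data above (the variable names are illustrative, and the eigenvector signs returned by the library may differ from the slides):

    import numpy as np

    # The ten (X, Y) points from the example.
    data = np.array([
        [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
    ])

    centered = data - data.mean(axis=0)              # (1) subtract the mean
    cov = np.cov(centered, rowvar=False)             # (2) covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # (3) eigen-decomposition

    print(cov)           # ~ [[0.6166, 0.6154], [0.6154, 0.7166]]
    print(eigenvalues)   # ~ [0.0491, 1.2840]
    print(eigenvectors)  # columns ~ the two eigenvectors above, up to sign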

Page 8: ICA  and  PCA

example

eigenvectors (same as on the previous slide):

    [ -0.7352   0.6779
       0.6779   0.7352 ]

Page 9: ICA  and  PCA

Example

(4) Choosing components and forming a feature vector

    eigenvectors:   A = [ -0.7352 ]      B = [ 0.6779 ]
                        [  0.6779 ]          [ 0.7352 ]

    eigenvalues:    A -> 0.0491          B -> 1.2840

    B's eigenvalue is bigger, so B is the principal component.

Page 10: ICA  and  PCA

Example

• Then we choose two feature vector sets:

  (a) A+B:

      feature vector_1 = [ -0.7352   0.6779
                            0.6779   0.7352 ]

  (b) Only B (the principal component):

      feature vector_2 = [ 0.6779
                           0.7352 ]

• Modified_data = feature_vector * old_data
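A sketch of this projection step in NumPy, taking the eigenvectors from the slides as the rows of the feature vector so that the stated product works out (the variable names are illustrative):

    import numpy as np

    # Mean-adjusted data from step (1), as 2 x 10 columns.
    old_data = np.array([
        [ 0.69, -1.31, 0.39, 0.09, 1.29, 0.49,  0.19, -0.81, -0.31, -0.71],
        [ 0.49, -1.21, 0.99, 0.29, 1.09, 0.79, -0.31, -0.81, -0.31, -1.01],
    ])

    A = np.array([-0.7352, 0.6779])
    B = np.array([ 0.6779, 0.7352])            # principal component
    feature_vector_1 = np.vstack([A, B])       # (a) keep both components
    feature_vector_2 = B[np.newaxis, :]        # (b) keep only B

    # Modified_data = feature_vector * old_data
    modified_1 = feature_vector_1 @ old_data   # 2 x 10 (next slide's table)
    modified_2 = feature_vector_2 @ old_data   # 1 x 10 (the single-column table later)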

Page 11: ICA  and  PCA

example

       X         Y
    -0.1751    0.8280
     0.1429   -1.7776
     0.3844    0.9922
     0.1304    0.2742
    -0.2095    1.6758
     0.1753    0.9129
    -0.3498   -0.0991
     0.0464   -1.1446
     0.0178   -0.4380
    -0.1627   -1.2238

(a) feature vector_1

Page 12: ICA  and  PCA

example

Page 13: ICA  and  PCA

example

       x
     0.8280
    -1.7776
     0.9922
     0.2742
     1.6758
     0.9129
    -0.0991
    -1.1446
    -0.4380
    -1.2238

(b) feature vector_2

Page 14: ICA  and  PCA

Example

(5) Deriving the new data set from the feature vector, for

  (a) feature vector_1
  (b) feature vector_2

• New_data = feature_vector_transpose * Modified_data
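A matching sketch of step (5), rebuilding 2-D points from the single retained component (names are illustrative; the projections are the values from the feature vector_2 slide):

    import numpy as np

    modified_2 = np.array([[ 0.8280, -1.7776, 0.9922, 0.2742, 1.6758,
                             0.9129, -0.0991, -1.1446, -0.4380, -1.2238]])
    feature_vector_2 = np.array([[0.6779, 0.7352]])   # the principal component B

    # New_data = feature_vector_transpose * Modified_data
    new_data = feature_vector_2.T @ modified_2        # 2 x 10

    print(np.round(new_data[:, 0], 4))   # ~ [0.5613, 0.6088], matching the (b) table below up to rounding
    # Adding the original mean (1.81, 1.91) back would return to the original axes.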

Page 15: ICA  and  PCA

example

       X         Y
     0.6900    0.4900
    -1.3100   -1.2100
     0.3900    0.9900
     0.0900    0.2900
     1.2900    1.0900
     0.4900    0.7900
     0.1900   -0.3100
    -0.8100   -0.8100
    -0.3100   -0.3100
    -0.7100   -1.0100

(a) feature vector_1 (recovers the mean-adjusted data exactly)

Page 16: ICA  and  PCA

example

       X         Y
     0.5613    0.6087
    -1.2050   -1.3068
     0.6726    0.7294
     0.1859    0.2016
     1.1360    1.2320
     0.6189    0.6712
    -0.0672   -0.0729
    -0.7759   -0.8415
    -0.2969   -0.3220
    -0.8296   -0.8997

(b) feature vector_2

Page 17: ICA  and  PCA

example

Page 18: ICA  and  PCA

Sum Up

• PCA can reduce the dimensionality of the data
• It works best when the data are correlated
• Geometric meaning: projection onto the principal vectors

Page 19: ICA  and  PCA

What is ICA?

• Independent Component Analysis

• A method for separating blind (unknown) sources

• Start with “A cocktail-party problem”

Page 20: ICA  and  PCA

ICA

• The Principle of ICA:

A cocktail-party problem

x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)

x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)

x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)

Page 21: ICA  and  PCA

ICA

Sources S1, S2, S3  --(linear transformation)-->  observed X1, X2, X3

Page 22: ICA  and  PCA

Math model

• Given: x1(t), x2(t), x3(t)

• Want to find: s1(t), s2(t), s3(t)

    x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
    x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
    x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)

    <=>  X = AS

Page 23: ICA  and  PCA

Math model

• Because A and S are both unknown, we need some assumptions:

  (1) S is statistically independent
  (2) S has non-Gaussian distributions

• Goal: given X = AS, find a W such that S = WX
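To make the model concrete, a small synthetic sketch of X = AS (the sources, the mixing matrix, and the random seed are invented purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8, 2000)

    # Three independent, non-Gaussian sources s1(t), s2(t), s3(t).
    S = np.vstack([
        np.sign(np.sin(3 * t)),        # square wave
        np.sin(5 * t),                 # sinusoid (sub-Gaussian)
        rng.laplace(size=t.size),      # Laplacian noise (super-Gaussian)
    ])

    # Unknown mixing matrix A and the observed mixtures X = A S.
    A = rng.uniform(0.5, 2.0, size=(3, 3))
    X = A @ S

    # ICA's goal: from X alone, estimate W with S_hat = W X ~ S
    # (recovered only up to permutation and scaling).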

Page 24: ICA  and  PCA

Theorem

• The central limit theorem: the distribution of a sum of independent random
  variables tends toward a Gaussian distribution.

• Observed signal = a1 S1 + a2 S2 + ... + an Sn

  The weighted sum tends toward Gaussian, even though each source Si is non-Gaussian.

Page 25: ICA  and  PCA

Theorem

• Given x = As, let y = w^T x and z = A^T w; then

    y = w^T A s = z^T s

• Observed signal:

    y = w1 X1 + w2 X2 + ... + wn Xn
      = z1 S1 + z2 S2 + ... + zn Sn

  The weighted sum of the sources tends toward Gaussian, while each Si is non-Gaussian.

Page 26: ICA  and  PCA

Theorem

• Find a w that maximizes the non-Gaussianity of

    y = w^T x = w1 X1 + w2 X2 + ... + wn Xn

• But how do we measure non-Gaussianity?

Page 27: ICA  and  PCA

Theorem

• A measure of non-Gaussianity: kurtosis

    F(y) = E{ y^4 } - 3 [ E{ y^2 } ]^2

• As y tends toward Gaussian, F(y) gets closer to zero:

    Super-Gaussian: kurtosis > 0
    Gaussian:       kurtosis = 0
    Sub-Gaussian:   kurtosis < 0
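The kurtosis measure is easy to check numerically; a small sketch (the function name and test distributions are illustrative):

    import numpy as np

    def F(y):
        """F(y) = E{y^4} - 3 * (E{y^2})^2, computed on zero-meaned samples."""
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2) ** 2

    rng = np.random.default_rng(0)
    n = 100_000
    print(F(rng.uniform(-1, 1, n)))   # < 0  (sub-Gaussian)
    print(F(rng.normal(0, 1, n)))     # ~ 0  (Gaussian)
    print(F(rng.laplace(0, 1, n)))    # > 0  (super-Gaussian)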

Page 28: ICA  and  PCA

Steps

• (1) centering & whitening process

• (2) FastICA algorithm

Page 29: ICA  and  PCA

Steps

Sources S1, S2, S3  --(linear transformation)-->   X1, X2, X3   (correlated)

X1, X2, X3  --(centering & whitening)-->  Z1, Z2, Z3   (uncorrelated)

Z1, Z2, Z3  --(FastICA)-->  S1, S2, S3   (independent)

Page 30: ICA  and  PCA

example

• Original data

Page 31: ICA  and  PCA

example

• (1) centering & whitening process

Page 32: ICA  and  PCA

example

(2) FastICA algorithm

Page 33: ICA  and  PCA

example

(2) FastICA algorithm

Page 34: ICA  and  PCA

Sum up

• A linear transformation that minimizes the statistical dependence between the resulting components

• It can solve the problem of decomposing unknown signals (Blind Source Separation)

Page 35: ICA  and  PCA

Reference

• Lindsay I. Smith, "A Tutorial on Principal Components Analysis", February 26, 2002.

• Aapo Hyvärinen and Erkki Oja, "Independent Component Analysis: Algorithms and Applications", Neural Networks Research Centre, Helsinki University of Technology.

• http://www.cis.hut.fi/projects/ica/icademo/

Page 36: ICA  and  PCA

Centering & whitening process

• Let x be zero mean, with x = A s.

• Let D and E be the eigenvalue and eigenvector matrices of the covariance
  matrix of x, i.e.

    E{x x^T} = E D E^T

• Then V = D^(-1/2) E^T is a whitening matrix:

    z = V x = D^(-1/2) E^T x

    E{z z^T} = V E{x x^T} V^T
             = D^(-1/2) E^T (E D E^T) E D^(-1/2)
             = I
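A small NumPy sketch of these whitening formulas (the function name and the synthetic test mixtures are illustrative):

    import numpy as np

    def center_and_whiten(x):
        """x: (n_signals, n_samples). Returns z = V x with E{z z^T} = I, and V."""
        x = x - x.mean(axis=1, keepdims=True)      # centering: make x zero mean
        cov = (x @ x.T) / x.shape[1]               # E{x x^T}
        d, E = np.linalg.eigh(cov)                 # E{x x^T} = E D E^T
        V = np.diag(d ** -0.5) @ E.T               # V = D^(-1/2) E^T
        return V @ x, V

    # Quick check on some mixed signals x = A s:
    rng = np.random.default_rng(0)
    s = rng.laplace(size=(3, 5000))                # independent non-Gaussian sources
    x = rng.uniform(0.5, 2.0, size=(3, 3)) @ s     # observed mixtures
    z, V = center_and_whiten(x)
    print(np.round((z @ z.T) / z.shape[1], 3))     # ~ 3 x 3 identity matrix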

Page 37: ICA  and  PCA

Centering & whitening process

• For the whitened data z, find a vector w such that the linear combination
  y = w^T z has maximum non-Gaussianity under the constraint E{y^2} = 1.

• Since z is whitened,

    1 = E{y^2} = E{ w^T z z^T w } = w^T E{z z^T} w = w^T w = ||w||^2

• Then: maximize | kurt(w^T z) | under the simpler constraint that ||w|| = 1.
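A quick numerical check that E{y^2} = w^T w once z is whitened (the synthetic unit-variance signals are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)

    # Whitened data: independent, zero-mean, unit-variance signals, so E{z z^T} = I.
    z = rng.laplace(scale=1 / np.sqrt(2), size=(3, 200_000))

    w = rng.standard_normal(3)     # an arbitrary (not unit-norm) direction
    y = w @ z                      # y = w^T z

    # E{y^2} = w^T E{z z^T} w = w^T w, so E{y^2} = 1  <=>  ||w|| = 1.
    print(np.mean(y**2), w @ w)    # the two values agree up to sampling error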

Page 38: ICA  and  PCA

FastICA

1. Centering:  x <- x~ - E{x~}

2. Whitening:  z = V x, so that E{z z^T} = I

3. Choose m, the number of ICs to estimate. Set counter p <- 1.

4. Choose an initial guess of unit norm for wp, e.g. randomly.

5. Let  wp <- E{ z (wp^T z)^3 } - 3 wp

6. Do deflation decorrelation:  wp <- wp - sum_{j=1}^{p-1} (wp^T wj) wj

7. Let  wp <- wp / ||wp||

8. If wp has not converged ( |<wp^(k+1), wp^(k)>| not close to 1 ), go back to step 5.

9. Set p <- p + 1. If p <= m, go back to step 4.
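A compact sketch of these steps in NumPy (kurtosis-based FastICA with deflation; the function and variable names are illustrative, and practical FastICA implementations often use other nonlinearities):

    import numpy as np

    def fastica_kurtosis(z, m, max_iter=200, tol=1e-6, seed=0):
        """Kurtosis-based FastICA with deflation, for whitened data z (n x T)."""
        rng = np.random.default_rng(seed)
        n = z.shape[0]
        W = np.zeros((m, n))
        for p in range(m):
            w = rng.standard_normal(n)                        # step 4: random unit-norm start
            w /= np.linalg.norm(w)
            for _ in range(max_iter):
                w_old = w
                w = (z * (w @ z) ** 3).mean(axis=1) - 3 * w   # step 5
                w -= W[:p].T @ (W[:p] @ w)                    # step 6: deflation
                w /= np.linalg.norm(w)                        # step 7: renormalize
                if abs(w @ w_old) > 1 - tol:                  # step 8: converged?
                    break
            W[p] = w                                          # step 9: next component
        return W

    # On whitened data z (see the centering & whitening sketch): S_hat = fastica_kurtosis(z, m=3) @ z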