Principal component analysis

MIT Department of Brain and Cognitive Sciences
9.641J, Spring 2005 - Introduction to Neural Networks
Instructor: Professor Sebastian Seung



Hypothesis: Hebbian synaptic plasticity enables perceptrons to perform principal component analysis.


Outline

• Variance and covariance
• Principal components
• Maximizing parallel variance
• Minimizing perpendicular variance
• Neural implementation
– covariance rule


Principal component

• direction of maximum variance in the input space

• principal eigenvector of the covariance matrix

• goal: relate these two definitions


Variance

• A random variable fluctuating about its mean value.

• Average of the square of the fluctuations.

$\delta x = x - \langle x \rangle$

$\langle (\delta x)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2$
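A quick numerical check of this identity, as a minimal NumPy sketch (the distribution and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)   # random variable with nonzero mean

dx = x - x.mean()                      # fluctuation about the mean
print(np.mean(dx**2))                  # average squared fluctuation
print(np.mean(x**2) - x.mean()**2)     # <x^2> - <x>^2: same value (up to rounding)
```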


Covariance

• Pair of random variables, each fluctuating about its mean value.

• Average of product of fluctuations.

$\delta x_1 = x_1 - \langle x_1 \rangle$

$\delta x_2 = x_2 - \langle x_2 \rangle$

$\langle \delta x_1\, \delta x_2 \rangle = \langle x_1 x_2 \rangle - \langle x_1 \rangle \langle x_2 \rangle$


Covariance examples


Covariance matrix

• N random variables
• N×N symmetric matrix
• Diagonal elements are variances

$C_{ij} = \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle$
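A minimal sketch of this definition in NumPy (the helper name and the data layout, one data vector per column, are illustrative assumptions):

```python
import numpy as np

def covariance_matrix(X):
    """C_ij = <x_i x_j> - <x_i><x_j> for X of shape (N, m): N variables, m samples."""
    mean = X.mean(axis=1, keepdims=True)           # <x_i>, shape (N, 1)
    return (X @ X.T) / X.shape[1] - mean @ mean.T

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 10_000))
C = covariance_matrix(X)
print(np.allclose(C, np.cov(X, bias=True)))   # agrees with NumPy's biased covariance
print(np.diag(C))                             # diagonal elements are the variances
```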


Principal components

• eigenvectors with the k largest eigenvalues

• Now you can calculate them, but what do they mean?
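One way to calculate them, as a sketch (the function name and `k` are illustrative; `eigh` is the standard routine for symmetric matrices):

```python
import numpy as np

def principal_components(C, k):
    """Eigenvectors of covariance matrix C with the k largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # one component per column

C = np.array([[2.0, 0.8],
              [0.8, 1.0]])
print(principal_components(C, 1))   # direction of maximum variance in this 2d example
```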


Handwritten digits


Principal component in 2d


One-dimensional projection


Covariance to variance

• From the covariance, the variance of any projection can be calculated.

• Let w be a unit vector

$\langle (w^T x)^2 \rangle - \langle w^T x \rangle^2 = w^T C w = \sum_{ij} w_i C_{ij} w_j$
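A numerical check of this identity (a sketch; the data and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 10_000))   # 3 variables, one data vector per column
C = np.cov(X, bias=True)

w = rng.normal(size=3)
w /= np.linalg.norm(w)             # unit vector

print((w @ X).var())               # variance of the 1d projection onto w
print(w @ C @ w)                   # w^T C w: same value
```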


Maximizing parallel variance

• Principal eigenvector of C
– the one with the largest eigenvalue

$w^* = \arg\max_{w:\|w\|=1} w^T C w$

$\lambda_{\max}(C) = \max_{w:\|w\|=1} w^T C w = w^{*T} C w^*$
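A sketch of one standard way to compute this maximizer, power iteration (the function name and iteration count are illustrative):

```python
import numpy as np

def principal_eigenvector(C, n_iter=1000):
    """Repeated multiplication by C rotates any starting vector toward the
    eigenvector with the largest eigenvalue; renormalize to keep ||w|| = 1."""
    w = np.random.default_rng(3).normal(size=C.shape[0])
    for _ in range(n_iter):
        w = C @ w
        w /= np.linalg.norm(w)
    return w

C = np.array([[2.0, 0.8], [0.8, 1.0]])
w_star = principal_eigenvector(C)
print(w_star @ C @ w_star)            # parallel variance at the maximizer
print(np.linalg.eigvalsh(C).max())    # equals lambda_max(C)
```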


Orthogonal decomposition


Total variance is conserved

• Maximizing parallel variance = Minimizing perpendicular variance


Rubber band computer


Correlation/covariance rule

• presynaptic activity x
• postsynaptic activity y
• Hebbian

$\Delta w = \eta\, y x$ (correlation rule)

$\Delta w = \eta\, (y - \langle y \rangle)(x - \langle x \rangle)$ (covariance rule)
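A minimal sketch of one covariance-rule step (the learning rate and the mean estimates passed in are illustrative assumptions):

```python
import numpy as np

def covariance_rule_step(w, x, y, x_mean, y_mean, eta=0.01):
    """Weight change proportional to the product of presynaptic and
    postsynaptic fluctuations about their mean activities."""
    return w + eta * (y - y_mean) * (x - x_mean)
```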


Stochastic gradient ascent

• Assume data has zero mean
• Linear perceptron

$y = w^T x$

$E = \frac{1}{2} w^T C w, \qquad C = \langle x x^T \rangle$

$\Delta w = \eta \frac{\partial E}{\partial w} = \eta C w$

Replacing $C$ by a single-sample estimate $x x^T$ gives the stochastic update $\Delta w = \eta\, x x^T w = \eta\, y x$, which is just the Hebbian rule.
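A sketch of this ascent on zero-mean data (learning rate and sample count are illustrative); note that the weight norm grows without bound, which motivates the next slide:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=500)

w, eta = rng.normal(size=2), 0.005
for x in X:
    y = w @ x              # linear perceptron output
    w += eta * y * x       # Hebbian update = stochastic gradient of E
print(np.linalg.norm(w))   # ||w|| has grown by orders of magnitude: divergence
```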


Preventing divergence

• Bound constraints


Oja’s rule

• Converges to principal component
• Normalized to unit vector

$\Delta w = \eta\, (y x - y^2 w)$
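A minimal simulation of Oja's rule (learning rate, seed, and sample count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
C_true = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal([0, 0], C_true, size=20_000)

w, eta = rng.normal(size=2), 0.01
for x in X:
    y = w @ x
    w += eta * (y * x - y**2 * w)   # Hebbian term plus self-normalizing decay

top = np.linalg.eigh(C_true)[1][:, -1]   # principal eigenvector of true covariance
print(np.linalg.norm(w))                 # close to 1: normalized to a unit vector
print(abs(w @ top))                      # close to 1: aligned with principal component
```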


Multiple principal components

• Project out principal component
• Find principal component of the remaining variance
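A sketch of this deflation procedure (the function name is illustrative; `eigh` stands in for whatever finds the principal eigenvector, e.g. Oja's rule):

```python
import numpy as np

def top_k_by_deflation(C, k):
    """Repeat: find the principal eigenvector, then project its variance out of C."""
    C = C.copy()
    components = []
    for _ in range(k):
        eigvals, eigvecs = np.linalg.eigh(C)
        w = eigvecs[:, -1]                  # principal component of remaining variance
        components.append(w)
        C -= eigvals[-1] * np.outer(w, w)   # remove the variance along w
    return np.stack(components)

C = np.array([[2.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 0.5]])
print(top_k_by_deflation(C, 2))
```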


clustering vs. PCA

• Hebb: output × input
• binary output
– first-order statistics
• linear output
– second-order statistics


Data vectors

• $x^a$ means the a-th data vector
– the a-th column of the matrix X
• $X_{ia}$ means the matrix element $X_{ia}$
– the i-th component of $x^a$
• x is a generic data vector
• $x_i$ means the i-th component of x


Correlation matrix

• Generalization of second moment

$\langle x_i x_j \rangle = \frac{1}{m} \sum_{a=1}^{m} X_{ia} X_{ja}$

$\langle x x^T \rangle = \frac{1}{m} X X^T$
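As a closing sketch, the same identity in NumPy (shapes follow the notation above: X is N × m with one data vector per column):

```python
import numpy as np

rng = np.random.default_rng(6)
N, m = 4, 10_000
X = rng.normal(size=(N, m))

elementwise = np.einsum('ia,ja->ij', X, X) / m   # (1/m) sum_a X_ia X_ja
matrix_form = (X @ X.T) / m                      # (1/m) X X^T
print(np.allclose(elementwise, matrix_form))     # True: the two forms agree
```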