
Principal Component Analysis

CS5240 Theoretical Foundations in Multimedia

Leow Wee Kheng

Department of Computer Science

School of Computing

National University of Singapore


Motivation

How wide is the widest part of NGC 1300?


How thick is the thickest part of NGC 4594?

Use principal component analysis.

Maximum Variance Estimate

Consider a set of points xi, i = 1, . . . , n, in an m-dimensional space such that their mean µ = 0, i.e., the centroid is at the origin.

[Figure: data points x1, . . . , x4 and a line l through the origin O with unit vector q; yi is the projection of xi on l and di is the perpendicular distance of xi from l.]

We want to find a line l through the origin that maximizes the variance of the projections yi of the points xi on l.


Let q denote the unit vector along line l. Then, the projection yi of xi on l is

$$y_i = x_i^\top q. \tag{1}$$

The mean squared projection, which is the variance V over all points, is

$$V = \frac{1}{n}\sum_{i=1}^{n} y_i^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i^\top q\right)^2 \ge 0. \tag{2}$$

Because $x_i^\top q = q^\top x_i$, expanding Eq. 2 gives

$$V = \frac{1}{n}\sum_{i=1}^{n} \left(q^\top x_i\right)\left(x_i^\top q\right) = q^\top \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top\right] q. \tag{3}$$

The middle factor is the covariance matrix C of the data points

$$C = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top. \tag{4}$$


We want to find a unit vector q that maximizes the variance V , i.e.,

maximize V = q⊤Cq subject to ‖q‖ = 1. (5)

This is a constrained optimization problem.

Use the Lagrange multiplier method: combine V and the constraint using a Lagrange multiplier λ

maximize V ′ = q⊤Cq− λ(q⊤q− 1). (6)

Lagrange Multiplier Method

The Lagrange multiplier method is a method for solving constrained optimization problems.

Consider this problem:

maximize f(x) subject to g(x) = c. (7)

The Lagrange multiplier method introduces a Lagrange multiplier λ to combine f(x) and g(x):

L(x, λ) = f(x) + λ (g(x)− c). (8)

The sign of λ can be positive or negative.

Then, solve for the stationary point of L(x, λ):

$$\frac{\partial L(x, \lambda)}{\partial x} = 0. \tag{9}$$


◮ If x0 is a solution of the original problem, then there is a λ0 such that (x0, λ0) is a stationary point of L.

◮ Not all stationary points yield solutions of the original problem.

◮ The same method applies to minimization problems.

◮ Multiple constraints can be combined by adding multiple terms.

minimize f(x) subject to g1(x) = c1, g2(x) = c2 (10)

is solved with

L(x) = f(x) + λ1(g1(x)− c1) + λ2(g2(x)− c2). (11)
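A quick worked example (my own, not from the slides) may help: maximize f(x, y) = x + y subject to g(x, y) = x² + y² = 1.

$$L(x, y, \lambda) = x + y + \lambda\left(x^2 + y^2 - 1\right), \qquad \frac{\partial L}{\partial x} = 1 + 2\lambda x = 0, \quad \frac{\partial L}{\partial y} = 1 + 2\lambda y = 0.$$

So x = y = −1/(2λ). Substituting into the constraint gives 1/(2λ²) = 1, i.e., λ² = 1/2, and the stationary points are (x, y) = ±(1/√2, 1/√2). The point (1/√2, 1/√2) attains the maximum f = √2; the other stationary point is the minimum, which illustrates that not every stationary point of L solves the original maximization.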


Now, we differentiate V ′ with respect to q and set to 0:

$$\frac{\partial V'}{\partial q} = 2 q^\top C - 2 \lambda q^\top = 0. \tag{12}$$

Rearranging the terms gives

q⊤C = λq⊤. (13)

Since the covariance matrix C is symmetric (homework), C⊤ = C. So, transposing both sides of Eq. 13 gives

Cq = λq. (14)

◮ Eq. 14 is called an eigenvector equation.

◮ q is the eigenvector.

◮ λ is the eigenvalue.

Thus, the eigenvector q of C gives the line that maximizes variance V .
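A quick numerical check of this result (a sketch of mine, not part of the slides): the variance q⊤Cq attained by the eigenvector of the largest eigenvalue should be at least that of any other unit direction.

import numpy as np

rng = np.random.default_rng(0)
# Zero-mean 2-D points, stretched along one axis so the answer is known.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])
X -= X.mean(axis=0)

C = X.T @ X / len(X)                       # covariance matrix, Eq. 4
eigvals, eigvecs = np.linalg.eigh(C)       # eigh returns eigenvalues in ascending order
q = eigvecs[:, -1]                         # eigenvector of the largest eigenvalue

def proj_var(u):
    u = u / np.linalg.norm(u)
    return float(u @ C @ u)                # q^T C q, the objective of Eq. 5

# The variance along q is at least the variance along random unit directions.
others = rng.normal(size=(100, 2))
assert all(proj_var(q) >= proj_var(u) - 1e-12 for u in others)
print(proj_var(q), eigvals[-1])            # both equal the largest eigenvalue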


The perpendicular distance di of xi from the line l is

$$d_i = \left\| x_i - y_i q \right\| = \left\| x_i - \left(x_i^\top q\right) q \right\|. \tag{15}$$


The squared distance is

$$d_i^2 = \left\| x_i - \left(x_i^\top q\right) q \right\|^2 = \left(x_i - \left(x_i^\top q\right) q\right)^\top \left(x_i - \left(x_i^\top q\right) q\right) = x_i^\top x_i - x_i^\top \left(x_i^\top q\right) q - \left(x_i^\top q\right) q^\top x_i + \left(x_i^\top q\right) q^\top \left(x_i^\top q\right) q.$$

With $x_i^\top q$ being a scalar and $x_i^\top q = q^\top x_i$, we obtain

$$d_i^2 = x_i^\top x_i - \left(x_i^\top q\right)^2 - \left(x_i^\top q\right)^2 + \left(x_i^\top q\right)^2 \left(q^\top q\right) = x_i^\top x_i - \left(x_i^\top q\right)^2.$$

Averaging over all i gives

$$D = \frac{1}{n}\sum_{i=1}^{n} d_i^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^\top x_i - V. \tag{16}$$

So, maximizing V means minimizing D.

The eigenvalue λ = V (homework); so λ ≥ 0.


Name q as the first eigenvector q1. The component of xi orthogonal to q1 is $x_i' = x_i - \left(x_i^\top q_1\right) q_1$.

Repeat the previous method on $x_i'$ in the (m−1)-D subspace orthogonal to q1 to get q2. Then, repeat to get q3, . . . , qm.

[Figure: the first eigenvector q1 and the subspace orthogonal to q1 in which the search is repeated.]

Eigendecomposition

In general, an m×m covariance matrix C has m eigenvectors:

Cqj = λj qj , j = 1, . . . ,m. (17)

Transpose both sides of the equations to get

$$q_j^\top C = \lambda_j q_j^\top, \quad j = 1, \ldots, m, \tag{18}$$

which are row vectors. Stack the row vectors vertically to get

$$\begin{bmatrix} q_1^\top \\ \vdots \\ q_m^\top \end{bmatrix} C =
\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_m \end{bmatrix}
\begin{bmatrix} q_1^\top \\ \vdots \\ q_m^\top \end{bmatrix}. \tag{19}$$

Denote the matrices Q and Λ as

Q = [q1 · · · qm], Λ = diag(λ1, . . . , λm). (20)


Then, Eq. 19 becomes

$$Q^\top C = \Lambda Q^\top \tag{21}$$

or

$$C = Q \Lambda Q^\top. \tag{22}$$

This is the matrix equation for eigendecomposition.

Properties:

◮ The eigenvectors are orthonormal:

$$q_j^\top q_j = 1, \qquad q_j^\top q_k = 0 \ \text{ for } k \ne j. \tag{23}$$

So, the eigenmatrix Q is orthogonal:

Q⊤Q = QQ⊤ = I. (24)

◮ The eigenvalues are arranged in descending order: λj ≥ λj+1.
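As a small sanity check (a sketch of mine, not from the slides), numpy's eigendecomposition can be used to confirm Eq. 22–24 on a random covariance matrix:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X -= X.mean(axis=0)
C = X.T @ X / len(X)                       # covariance of zero-mean data

eigvals, Q = np.linalg.eigh(C)             # eigh suits symmetric matrices; ascending order
order = np.argsort(eigvals)[::-1]          # re-sort so that λ_1 ≥ ... ≥ λ_m
eigvals, Q = eigvals[order], Q[:, order]
Lam = np.diag(eigvals)

assert np.allclose(C, Q @ Lam @ Q.T)       # C = QΛQ^T   (Eq. 22)
assert np.allclose(Q.T @ Q, np.eye(4))     # Q is orthogonal  (Eq. 24)
assert np.all(np.diff(eigvals) <= 1e-12)   # eigenvalues in descending order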

General PCA

In general, the mean µ of the data points xi is not at the origin. In this case, we subtract µ from each xi, obtaining shifted or zero-mean data points xi − µ.

PCA transforms xi into a new vector yi through Q as follows:

$$y_i = Q^\top (x_i - \mu) = \begin{bmatrix} q_1^\top (x_i - \mu) \\ \vdots \\ q_m^\top (x_i - \mu) \end{bmatrix} = \sum_{j=1}^{m} (x_i - \mu)^\top q_j \, q_j, \tag{25}$$

Each component of yi is

yij = (xi − µ)⊤qj . (26)

This is the projection of xi − µ on qj .


The original xi can be recovered from yi:

xi = Qyi + µ. (27)

Caution!

◮ xi ≠ yi + µ.

◮ xi is in the original input space but yi is in the eigenspace. Let xj denote the unit vectors that form the input space; qj are the eigenvectors that form the eigenspace. Then,

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{im}) = \sum_{j=1}^{m} x_{ij}\, x_j, \qquad y_i = (y_{i1}, y_{i2}, \ldots, y_{im}) = \sum_{j=1}^{m} y_{ij}\, q_j.$$
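The caution above is easy to exercise in code. A small sketch (my own, using the eigendecomposition from the previous slides) showing that x is recovered by Qy + µ, not by y + µ:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) + np.array([5.0, -2.0, 1.0])   # data with non-zero mean
mu = X.mean(axis=0)
C = (X - mu).T @ (X - mu) / len(X)
eigvals, Q = np.linalg.eigh(C)
Q = Q[:, np.argsort(eigvals)[::-1]]        # columns q_1, ..., q_m, largest variance first

x = X[0]
y = Q.T @ (x - mu)                         # Eq. 25: coordinates in the eigenspace
x_back = Q @ y + mu                        # Eq. 27: reconstruction in the input space

assert np.allclose(x_back, x)              # exact recovery when all m eigenvectors are kept
assert not np.allclose(y + mu, x)          # y + µ is NOT the original point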

Properties of PCA

◮ The mean µy over all yi is 0 (homework).

◮ The variance σj² along qj is λj (homework).

◮ Since λ1 ≥ · · · ≥ λm ≥ 0, we have σ1 ≥ · · · ≥ σm ≥ 0.

◮ q1 gives the orientation of the largest variance.

◮ q2 gives the orientation of the largest variance orthogonal to q1 (2nd largest variance).

◮ qj gives the orientation of the largest variance orthogonal to q1, . . . , qj−1 (j-th largest variance).

◮ qm is orthogonal to all other eigenvectors (least variance).

[Figure sequence: a set of data points; the same points with the centroid shifted to the origin; the principal components overlaid. The sequence is repeated for another example data set.]

Special Cases: Points on a rectangle.

[Plots: four configurations of points on rectangles.]

Blue line: direction of 1st principal component.

To verify the results, run the program in Programming Homework 3 on these data.


[Plots: four more configurations of points on rectangles.]

Blue line: direction of 1st principal component.


Special Cases: Points on a square / diamond.

[Plots: four points at the corners of a square and of a diamond.]

Blue line: direction of 1st principal component.

This is a surprise! But it is easy to see why: the 4 points are equidistant from the origin, so they actually lie on a sphere!

A sphere has an infinite number of possible first principal componentsalong any diameter.


Special Cases: Points on a diamond.

[Plot: points on a diamond.]

What about this case? Try it out yourself!
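One way to try these special cases (a sketch of mine, not the Programming Homework 3 code) is to run PCA on the corner points directly and compare the eigenvalues:

import numpy as np

def pca(X):
    """Return eigenvalues (descending) and eigenvectors (columns) of the covariance of X."""
    mu = X.mean(axis=0)
    C = (X - mu).T @ (X - mu) / len(X)
    eigvals, Q = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], Q[:, order]

square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0], [-1.0, 1.0]])
rect = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, -1.0], [-2.0, 1.0]])

print(pca(rect)[0])     # distinct eigenvalues: the 1st principal component is well defined
print(pca(square)[0])   # equal eigenvalues: any direction is a valid 1st principal component

Equal eigenvalues reproduce the degenerate case above: the variance is the same in every direction, so whichever eigenvector the solver returns is an arbitrary but valid choice.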

PCA Algorithm 1

Let xi denote m-dimensional vectors (data points), i = 1, ..., n.

1. Compute the mean vector of the data points

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{28}$$

2. Compute the covariance matrix

$$C = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top. \tag{29}$$

3. Perform eigendecomposition of C

C = QΛQ⊤. (30)
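A direct numpy sketch of these three steps (my own illustration of the algorithm above; variable names are chosen for readability):

import numpy as np

def pca_algorithm_1(X):
    """X: n x m array, one data point per row. Returns (mu, eigenvalues, Q)."""
    n = len(X)
    mu = X.mean(axis=0)                          # step 1, Eq. 28
    A = X - mu
    C = A.T @ A / n                              # step 2, Eq. 29
    eigvals, Q = np.linalg.eigh(C)               # step 3, Eq. 30 (eigh suits symmetric C)
    order = np.argsort(eigvals)[::-1]            # sort so that λ_1 ≥ ... ≥ λ_m
    return mu, eigvals[order], Q[:, order]

# Example: 200 points in 5-D.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
mu, lam, Q = pca_algorithm_1(X)
C = (X - mu).T @ (X - mu) / len(X)
assert np.allclose(C, Q @ np.diag(lam) @ Q.T)    # C = QΛQ^T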


◮ Some books and papers use sample covariance

$$C = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top. \tag{31}$$

Eq. 31 differs from Eq. 29 by only a constant factor, which scales the eigenvalues but leaves the eigenvectors unchanged.


Notes:

◮ The covariance matrix C is an m×m matrix.

◮ Eigendecomposition of a very large matrix is inefficient.

◮ Example:
  ◮ A 256×256 colour image has 256×256×3 values.
  ◮ Number of dimensions m = 196608.
  ◮ C has a size of 196608×196608!!
  ◮ Number of images n is usually ≪ m, e.g., 1000.

PCA Algorithm 2

1. Compute mean µ of data points xi.

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

2. Form an m×n matrix A, n ≪ m:

A = [(x1 − µ) · · · (xn − µ)]. (32)

3. Compute A⊤A, which is just an n×n matrix.

4. Apply eigendecomposition on A⊤A:

(A⊤A)qj = λjqj . (33)

5. Pre-multiply Eq. 33 by A, giving

AA⊤Aqj = Aλjqj . (34)


6. Recover eigenvectors and eigenvalues.

Since AA⊤ = nC (homework), Eq. 34 is

$$n C\,(A q_j) = \lambda_j (A q_j) \tag{35}$$

$$C\,(A q_j) = \frac{\lambda_j}{n} (A q_j). \tag{36}$$

Therefore, the eigenvectors of C are Aqj (normalized to unit length), and the eigenvalues of C are λj/n.
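A numpy sketch of this trick (my own illustration; it assumes n ≪ m so that the n×n eigendecomposition is the cheap step):

import numpy as np

def pca_algorithm_2(X):
    """X: n x m data array with n << m. Returns (mu, eigenvalues of C, eigenvectors of C)."""
    n, m = X.shape
    mu = X.mean(axis=0)
    A = (X - mu).T                             # m x n matrix, Eq. 32
    lam, V = np.linalg.eigh(A.T @ A)           # eigendecomposition of the small n x n matrix, Eq. 33
    order = np.argsort(lam)[::-1]
    lam, V = lam[order], V[:, order]
    keep = lam > 1e-10 * lam[0]                # drop near-zero eigenvalues (at most n-1 remain)
    lam, V = lam[keep], V[:, keep]
    Q = A @ V                                  # Aq_j are eigenvectors of AA^T = nC, Eq. 35-36
    Q /= np.linalg.norm(Q, axis=0)             # normalize each column to unit length
    return mu, lam / n, Q                      # eigenvalues of C are λ_j / n

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 1000))                # n = 20 points in m = 1000 dimensions
mu, lam, Q = pca_algorithm_2(X)
C = (X - mu).T @ (X - mu) / len(X)
assert np.allclose(C @ Q[:, 0], lam[0] * Q[:, 0])   # Cq_1 = λ_1 q_1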

PCA Algorithm 3

1. Compute mean µ of data points xi.

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

2. Form an m×n matrix A:

A = [(x1 − µ) · · · (xn − µ)].

3. Apply singular value decomposition (SVD) on A:

A = UΣV⊤.

4. Recover eigenvectors and eigenvalues (see below).

Singular Value Decomposition

Singular value decomposition (SVD) decomposes a matrix A into

A = UΣV⊤. (37)

◮ The column vectors of U are the left singular vectors uj; the uj are orthonormal.

◮ The column vectors of V are the right singular vectors vj; the vj are orthonormal.

◮ Σ is diagonal and contains singular values sj .

◮ Rank of A = number of non-zero singular values.

Refer to Appendix for more details.


Notice that

$$A A^\top = U \Sigma V^\top \left(U \Sigma V^\top\right)^\top = U \Sigma V^\top V \Sigma^\top U^\top = U \Sigma \Sigma^\top U^\top$$

and eigendecomposition of AA⊤ is

AA⊤ = QΛQ⊤.

So, the eigenvectors of AA⊤ are the uj, and the eigenvalues of AA⊤ are sj².

On the other hand,

$$A^\top A = \left(U \Sigma V^\top\right)^\top U \Sigma V^\top = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^\top \Sigma V^\top.$$

Compare with the eigendecomposition of A⊤A:

A⊤A = QΛQ⊤.

So, the eigenvectors of A⊤A are the vj, and the eigenvalues of A⊤A are sj².


4. Recover eigenvectors and eigenvalues.

Compare AA⊤ = nC with eigendecomposition of C:

$$A A^\top = U \Sigma \Sigma^\top U^\top, \qquad C = Q \Lambda Q^\top.$$

Therefore, the eigenvectors qj of C are the uj in U, and the eigenvalues λj of C are sj²/n, for sj in Σ.
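A numpy sketch of Algorithm 3 (my own illustration; numpy.linalg.svd returns singular values in descending order, so no re-sorting is needed):

import numpy as np

def pca_algorithm_3(X):
    """X: n x m array. Returns (mu, eigenvalues of C, eigenvectors of C as columns of U)."""
    n = len(X)
    mu = X.mean(axis=0)
    A = (X - mu).T                                    # m x n matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: A = U Σ V^T
    return mu, s**2 / n, U                            # λ_j = s_j^2 / n, q_j = u_j

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 8))
mu, lam, Q = pca_algorithm_3(X)
C = (X - mu).T @ (X - mu) / len(X)
assert np.allclose(C, Q @ np.diag(lam) @ Q.T)         # C = QΛQ^T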

Application Examples

Apply PCA for line fitting in 2-D.

Compute PCA of the points. Then,

◮ mean of points = a point on line

◮ 1st eigenvector = unit vector along line
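A small sketch of PCA line fitting (my own illustration, not from the slides):

import numpy as np

rng = np.random.default_rng(6)
t = rng.uniform(-5, 5, size=100)
direction = np.array([2.0, 1.0]) / np.sqrt(5.0)               # true line direction
points = np.array([1.0, 3.0]) + np.outer(t, direction)        # points on the line
points += rng.normal(scale=0.05, size=points.shape)           # small noise

mu = points.mean(axis=0)                                      # a point on the fitted line
C = (points - mu).T @ (points - mu) / len(points)
eigvals, Q = np.linalg.eigh(C)
line_dir = Q[:, np.argmax(eigvals)]                           # 1st eigenvector = unit vector along the line

print(mu, line_dir)    # line_dir is parallel (up to sign) to the true direction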


Apply PCA for plane fitting in 3-D.

Compute PCA of the points. Then,

◮ mean of points = a point on plane

◮ 3rd eigenvector = unit normal vector of plane
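Analogously for a plane in 3-D (again a hedged sketch of mine): the eigenvector with the smallest eigenvalue is the plane normal, and projections onto it measure how far points lie from the plane.

import numpy as np

rng = np.random.default_rng(7)
uv = rng.uniform(-3, 3, size=(200, 2))
basis = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])         # two directions spanning the plane
points = np.array([0.5, -1.0, 2.0]) + uv @ basis
points += rng.normal(scale=0.02, size=points.shape)           # slight off-plane noise

mu = points.mean(axis=0)                                      # a point on the fitted plane
C = (points - mu).T @ (points - mu) / len(points)
eigvals, Q = np.linalg.eigh(C)
normal = Q[:, np.argmin(eigvals)]                             # 3rd eigenvector = plane normal

distances = (points - mu) @ normal                            # signed point-to-plane distances
print(normal, np.abs(distances).max())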

Dimensionality Reduction

If a point represents a 256×256 colour image, then the dimensionality is 256×256×3 = 196608!!

But, not all 196608 eigenvalues are non-zero.


Case 1: Number of data vectors n ≤ number of dimensions m.

◮ Data vectors are all independent. Then, the number of non-zero eigenvalues (eigenvectors) = n − 1, i.e., the rank of the covariance matrix = n − 1. Why not = n?

◮ Data vectors are not independent. Then, the rank of the covariance matrix < n − 1.

Case 2: Number of data vectors n > m.

◮ m or more independent data vectors. Then, the rank of the covariance matrix = m.

◮ Fewer than m data vectors are independent. In this case, what is the rank of the covariance matrix?

In practice, it is often possible to reduce the dimensionality.


The eigenmatrix Q is

Q = [q1 · · · qm]. (38)

PCA maps a data point x to a vector y in the eigenspace as

$$y = Q^\top (x - \mu) = \sum_{j=1}^{m} (x - \mu)^\top q_j \, q_j, \tag{39}$$

which has m dimensions.

Pick the l eigenvectors with the largest eigenvalues to form the truncated eigenmatrix $\tilde{Q}$:

$$\tilde{Q} = [q_1 \ \cdots \ q_l], \tag{40}$$

which spans a subspace of the eigenspace.

Then, $\tilde{Q}$ maps x to $\tilde{y}$, an estimate of y:

$$\tilde{y} = \tilde{Q}^\top (x - \mu) = \sum_{j=1}^{l} (x - \mu)^\top q_j \, q_j, \tag{41}$$

which has l < m dimensions.
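A hedged numpy sketch of the truncation (my own; l and the data are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(8)
# 3-D data that is almost flat along the third direction.
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.05]) + np.array([1.0, 2.0, 3.0])

mu = X.mean(axis=0)
C = (X - mu).T @ (X - mu) / len(X)
eigvals, Q = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, Q = eigvals[order], Q[:, order]

l = 2
Q_trunc = Q[:, :l]                          # truncated eigenmatrix, Eq. 40
x = X[0]
y_full = Q.T @ (x - mu)                     # Eq. 39: m-D coordinates
y_trunc = Q_trunc.T @ (x - mu)              # Eq. 41: l-D coordinates
x_approx = Q_trunc @ y_trunc + mu           # approximate reconstruction (cf. Eq. 43)

print(np.linalg.norm(x - x_approx))         # small, because the discarded variance is small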


[Diagram: $Q^\top$ maps x = (x1, . . . , xm) to y = (y1, . . . , ym) and Q maps y back to x; keeping only the first l components gives $\tilde{y} = (y_1, \ldots, y_l)$, and $\tilde{Q}$ maps $\tilde{y}$ back to an approximation $\tilde{x}$ (dimensionality reduction).]

The difference between y and $\tilde{y}$ is

$$y - \tilde{y} = \sum_{j=l+1}^{m} (x - \mu)^\top q_j \, q_j. \tag{42}$$


Beware:

$$y = Q^\top (x - \mu)$$

and, therefore,

$$x = Q y + \mu.$$

But,

$$\tilde{y} = \tilde{Q}^\top (x - \mu),$$

whereas

$$\tilde{x} = \tilde{Q}\, \tilde{y} + \mu \ne x. \tag{43}$$

Why?

With n data points xi, the sum-squared error E between xi and its estimate $\tilde{x}_i$ is

$$E = \sum_{i=1}^{n} \left\| x_i - \tilde{x}_i \right\|^2. \tag{44}$$


When all m dimensions are kept, the total variance is

$$\sum_{j=1}^{m} \sigma_j^2 = \sum_{j=1}^{m} \lambda_j. \tag{45}$$

When l dimensions are used, the ratio R′ of unaccounted variance is:

$$R'(l) = \frac{\sum_{j=l+1}^{m} \sigma_j^2}{\sum_{j=1}^{m} \sigma_j^2} = \frac{\sum_{j=l+1}^{m} \lambda_j}{\sum_{j=1}^{m} \lambda_j}. \tag{46}$$


A sample plot of R′ vs. number of dimensions used:

[Plot: R′ starts at 1 and decreases toward 0 as the number of dimensions grows; too few dimensions are "not enough", and beyond some l the curve flattens ("enough").]

How to choose an appropriate l? Choose l such that using more than l dimensions does not reduce R′ significantly:

$$R'(l) - R'(l+1) < \epsilon. \tag{47}$$


Alternatively, compute ratio R of accounted variance:

$$R(l) = \frac{\sum_{j=1}^{l} \sigma_j^2}{\sum_{j=1}^{m} \sigma_j^2} = \frac{\sum_{j=1}^{l} \lambda_j}{\sum_{j=1}^{m} \lambda_j}. \tag{48}$$

l is large enough if

$$R(l+1) - R(l) < \epsilon.$$

[Plot: R starts at 0 and increases toward 1 with the number of dimensions; l is large enough once the curve flattens.]
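A small sketch (mine, with an arbitrary 95% threshold) of choosing l from the eigenvalues using the accounted-variance ratio R(l):

import numpy as np

def choose_l(eigvals, threshold=0.95):
    """Smallest l such that the first l eigenvalues account for at least `threshold` of the variance."""
    lam = np.sort(np.asarray(eigvals))[::-1]      # λ_1 ≥ ... ≥ λ_m
    R = np.cumsum(lam) / lam.sum()                # R(l) for l = 1, ..., m  (Eq. 48)
    return int(np.searchsorted(R, threshold) + 1)

eigvals = [9.0, 4.5, 0.3, 0.1, 0.1]
print(choose_l(eigvals))   # 2: the first two components account for more than 95% of the variance

This is the cumulative-threshold variant; the marginal criterion R(l + 1) − R(l) < ε above can be implemented the same way from the cumulative sums.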

Summary

◮ Eigendecomposition of covariance matrix = PCA.

◮ In practice, PCA is computed using SVD.

◮ PCA maximizes variances along the eigenvectors.

◮ Eigenvalues = variances along eigenvectors.

◮ Best fitting plane computed by PCA minimizes distance to points.

◮ PCA can be used for dimensionality reduction.

Probing Questions

◮ When applying PCA to plane fitting, how to check whether the points really lie close to the plane?

◮ If you apply PCA to a set of points on a curved surface, where do you expect the eigenvectors to point?

◮ In dimensionality reduction, the m-D vector y is reduced to an l-D vector $\tilde{y}$. Since l ≠ m, vector subtraction is undefined. But, subtracting Eq. 41 from Eq. 39 gives

$$y - \tilde{y} = \sum_{j=l+1}^{m} (x - \mu)^\top q_j \, q_j.$$

How is this possible? Is there a contradiction?

◮ Show that when n ≤ m, there are at most n− 1 eigenvectors.

Homework I

1. Describe the essence of PCA in one sentence.

2. Show that the covariance matrix C of a set of data points xi, i = 1, . . . , n, is symmetric about its diagonal.

3. Show that the eigenvalue λ of covariance matrix C equals thevariance V as defined in Eq. 2.

4. Show that the mean µy over all yi in the eigenspace is 0.

5. Show that the variance σj² along eigenvector qj is λj.

6. Show that AA⊤ = nC.

7. Derive the difference $Qy - \tilde{Q}\tilde{y}$ in dimensionality reduction.

Homework II

8. Suppose n ≤ m, and the truncated eigenmatrix $\tilde{Q}$ contains all the eigenvectors with non-zero eigenvalues, $\tilde{Q} = [q_1 \ \cdots \ q_{n-1}]$.

Now, map an input vector x to y and $\tilde{y}$ respectively by the complete Q (Eq. 39) and the truncated $\tilde{Q}$ (Eq. 41). What is the error $y - \tilde{y}$?

What is the result of mapping $\tilde{y}$ back to the input space by $\tilde{Q}$ (Eq. 43)?

9. Q3 of AY2015/16 Final Evaluation.

Appendix

Consider a general m×n matrix A, where m > n. The SVD of A gives

A = UΣV⊤, (49)

◮ U is an m×m orthogonal matrix,

◮ Σ is an m×n diagonal matrix, and

◮ V is an n×n orthogonal matrix.

This is called the full SVD.

Since m > n, Σ has the structure

$$\Sigma = \begin{bmatrix} \Sigma_n \\ 0_{m-n} \end{bmatrix},$$

◮ Σn is an n×n diagonal matrix,

◮ 0m−n is the (m−n)×n zero matrix.


So, Eq. 49 can be written in this form

$$A = \begin{bmatrix} U_n & U_{m-n} \end{bmatrix} \begin{bmatrix} \Sigma_n \\ 0_{m-n} \end{bmatrix} V^\top. \tag{50}$$

That is,

$$A = U_n \Sigma_n V^\top. \tag{51}$$

This is called Thin SVD.

In practice, we usually compute the thin SVD instead of the full SVD because there is no need to compute Um−n.

If the rank r of A is less than n, then we can also compute the compact SVD as follows:

$$A = U_r \Sigma_r V_r^\top, \tag{52}$$

where Ur is m×r, Σr is r×r diagonal, and Vr is n×r.
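In numpy, the thin SVD is obtained with full_matrices=False (a quick sketch of mine illustrating the shapes):

import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(1000, 20))                       # m x n with m > n

U_full, s, Vt = np.linalg.svd(A)                      # full SVD: U is 1000 x 1000
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)   # thin SVD: U_n is 1000 x 20

print(U_full.shape, U_thin.shape)                     # (1000, 1000) (1000, 20)
assert np.allclose(A, U_thin @ np.diag(s_thin) @ Vt_thin)   # A = U_n Σ_n V^T, Eq. 51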
