
Dimensionality Reduction

Slides by Jure Leskovec

Dimensionality Reduction

• High-dimensional == many features
• Find concepts/topics/genres:
  – Documents:
    • Features: thousands of words, millions of word pairs
  – Surveys – Netflix: 480k users x 17.7k movies


Dimensionality Reduction

• Compress / reduce dimensionality:
  – 10^6 rows; 10^3 columns; no updates
  – Random access to any cell(s); small error: OK


Dimensionality Reduction

• Assumption: Data lies on or near a low d-dimensional subspace

• Axes of this subspace are effective representation of the data


Why Reduce Dimensions?

• Discover hidden correlations/topics
  – Words that occur commonly together
• Remove redundant and noisy features
  – Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data


SVD - Definition

A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T

• A: Input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: Singular values
  – r x r diagonal matrix (strength of each ‘concept’)
    (r: rank of the matrix A)
• V: Right singular vectors
  – n x r matrix (n terms, r concepts)
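A minimal sketch of these shapes (assuming NumPy is available), using the reduced SVD that np.linalg.svd returns:

```python
import numpy as np

m, n = 7, 5
A = np.random.rand(m, n)

# full_matrices=False gives the reduced ("economy") SVD:
# U is m x r', s holds the r' singular values, Vt is r' x n,
# where r' = min(m, n) >= rank(A) = r.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)           # (7, 5) (5,) (5, 5)
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: A = U Sigma V^T
```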


SVD

[Figure: A (m x n) factored as U (m x r) x Σ (r x r) x V^T (r x n)]


SVD

A = σ1 u1 v1^T + σ2 u2 v2^T + …

σi … scalar
ui … vector
vi … vector


SVD - Properties

It is always possible to decompose a real matrix A into A = U Σ V^T, where:

• U, Σ, V: unique
• U, V: column orthonormal:
  – U^T U = I; V^T V = I (I: identity matrix)
  – (Columns are orthogonal unit vectors)
• Σ: diagonal
  – Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
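A quick numerical check of these properties (again a sketch assuming NumPy):

```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # True: U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # True: V^T V = I
print(np.all(np.diff(s) <= 0) and np.all(s >= 0))   # True: sorted, non-negative
```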


SVD – Example: Users-to-Movies

• A = U Σ V^T - example:

             Matrix Alien Serenity Casablanca Amelie
SciFi      [ 1  1  1  0  0 ]     [ 0.18  0    ]
SciFi      [ 2  2  2  0  0 ]     [ 0.36  0    ]
SciFi      [ 1  1  1  0  0 ]     [ 0.18  0    ]
SciFi      [ 5  5  5  0  0 ]  =  [ 0.90  0    ]  x  [ 9.64  0    ]  x  [ 0.58  0.58  0.58  0     0    ]
Romance    [ 0  0  0  2  2 ]     [ 0     0.53 ]     [ 0     5.29 ]     [ 0     0     0     0.71  0.71 ]
Romance    [ 0  0  0  3  3 ]     [ 0     0.80 ]
Romance    [ 0  0  0  1  1 ]     [ 0     0.27 ]
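The example can be reproduced numerically; a sketch assuming NumPy (singular vectors may come back with flipped signs):

```python
import numpy as np

# Rows: 7 users; columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 2))         # [9.64 5.29 0. 0. 0.] -- A has rank 2
print(np.round(U[:, :2], 2))  # the two user-to-concept columns (up to sign)
print(np.round(Vt[:2], 2))    # the two movie-to-concept rows (up to sign)
```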


SVD – Example: Users-to-Movies

• A = U Σ V^T - same decomposition as above:

  – The first column of U / first row of V^T, with strength σ1 = 9.64, is the SciFi-concept; the second, with strength σ2 = 5.29, is the Romance-concept.


SVD - Example

• A = U Σ V^T - same decomposition as above:

  – U is the “user-to-concept” similarity matrix: the four SciFi users load on the SciFi-concept column, the three Romance users on the Romance-concept column.


SVD - Example

• A = U Σ V^T - same decomposition as above:

  – The singular value σ1 = 9.64 is the ‘strength’ of the SciFi-concept.


SVD - Example

• A = U Σ V^T - same decomposition as above:

  – V is the “movie-to-concept” similarity matrix: Matrix, Alien, and Serenity load on the SciFi-concept; Casablanca and Amelie on the Romance-concept.



SVD - Interpretation #1

‘movies’, ‘users’ and ‘concepts’:

• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept


SVD - Interpretation #2

SVD gives the best axis to project on:

• ‘best’ = min sum of squares of projection errors
• minimum reconstruction error

[Figure: rating points in the (Movie 1 rating, Movie 2 rating) plane; v1, the first right singular vector, is the best-fit axis]


SVD - Interpretation #2

• A = U Σ V^T - same decomposition as above:

[Figure: the rating points of A plotted in the (Movie 1 rating, Movie 2 rating) plane; v1, the first right singular vector, is the axis they project onto best]


SVD - Interpretation #2

• A = U Σ V^T - same decomposition as above:

  – σ1 = 9.64 measures the variance (‘spread’) of the data on the v1 axis.


SVD - Interpretation #2

More details:

• Q: How exactly is dim. reduction done?


SVD - Interpretation #2

More details:

• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero.


SVD - Interpretation #2

• A: Set the smallest singular values to zero. In the example, zero out σ2 = 5.29:

A  ≈  U  x  [ 9.64  0 ]  x  V^T
            [ 0     0 ]



SVD - Interpretation #2

• A: Set the smallest singular values to zero, then drop the corresponding column of U and row of V^T:

      [ 0.18 ]
      [ 0.36 ]
      [ 0.18 ]
A  ≈  [ 0.90 ]  x  [ 9.64 ]  x  [ 0.58  0.58  0.58  0  0 ]
      [ 0    ]
      [ 0    ]
      [ 0    ]
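A sketch (assuming NumPy) of this truncation step as a reusable helper; rank_k_approx is a hypothetical name for illustration, not a library function:

```python
import numpy as np

def rank_k_approx(A, k):
    """Keep the k largest singular values/vectors and rebuild A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

B = rank_k_approx(A, k=1)
print(np.round(B, 2))  # the SciFi block survives; the Romance block goes to ~0
```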


SVD - Interpretation #2

• A: Set the smallest singular values to zero. The result:

      [ 1 1 1 0 0 ]          [ 1 1 1 0 0 ]
      [ 2 2 2 0 0 ]          [ 2 2 2 0 0 ]
      [ 1 1 1 0 0 ]          [ 1 1 1 0 0 ]
A  =  [ 5 5 5 0 0 ]   ≈ B =  [ 5 5 5 0 0 ]
      [ 0 0 0 2 2 ]          [ 0 0 0 0 0 ]
      [ 0 0 0 3 3 ]          [ 0 0 0 0 0 ]
      [ 0 0 0 1 1 ]          [ 0 0 0 0 0 ]

Frobenius norm: ‖M‖_F = √(Σ_ij M_ij²)

‖A − B‖_F = √(Σ_ij (A_ij − B_ij)²) is “small”


[Figure: A = U Σ V^T with all r components, versus B = U S V^T with only the top-k components kept: B is approximately A]

SVD – Best Low Rank Approx.

• Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = r, Σ = diag(σ_11, …, σ_rr)), and let B = U S V^T, where
  – S = diagonal r x r matrix with s_i = σ_i (i = 1…k), else s_i = 0.
  Then B is a best rank-k approximation to A:
  – B is a solution to min_B ‖A − B‖_F where rank(B) = k

• We will need 2 facts:
  – ‖M‖_F = √(Σ_i q_ii²), where M = P Q R is the SVD of M
  – U Σ V^T − U S V^T = U (Σ − S) V^T


SVD – Best Low Rank Approx.

• We will need 2 facts:
  – ‖M‖_F = √(Σ_i q_ii²), where M = P Q R is the SVD of M
  – U Σ V^T − U S V^T = U (Σ − S) V^T

We apply these with:
  – P column orthonormal
  – R row orthonormal
  – Q diagonal


SVD – Best Low Rank Approx.

• A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
  – S = diagonal matrix with s_i = σ_i (i = 1…k), else s_i = 0
  Then B is a solution to min_B ‖A − B‖_F with rank(B) = k. Why?

• Using the second fact, A − B = U Σ V^T − U S V^T = U (Σ − S) V^T, and then the first fact gives:

  min_{B, rank(B)=k} ‖A − B‖_F = min_S ‖Σ − S‖_F = min_{s_i} √( Σ_{i=1}^r (σ_i − s_i)² )

• We want to choose the s_i (at most k of them nonzero) to minimize

  Σ_{i=1}^k (σ_i − s_i)² + Σ_{i=k+1}^r σ_i²

  so we set s_i = σ_i for i = 1…k (matching the k largest σ_i) and s_i = 0 otherwise, leaving error √( Σ_{i=k+1}^r σ_i² ).
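A numerical spot-check of the theorem (a sketch assuming NumPy): the Frobenius error of the rank-k truncation should equal √(Σ_{i>k} σ_i²):

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Truncation error in Frobenius norm vs. the dropped singular values.
print(np.isclose(np.linalg.norm(A - B, 'fro'),
                 np.sqrt(np.sum(s[k:]**2))))  # True
```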


SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1^T + σ2 u2 v2^T

(here u1, u2 are the columns of U, v1, v2 the rows of V^T, and σ1, σ2 the diagonal entries of Σ from the example above)


SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1^T + σ2 u2 v2^T + …

Each term σi ui vi^T is an (m x 1) x (1 x n) rank-1 matrix; keep only the first k terms.

Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0

Why is setting the small σs to zero the thing to do? The vectors ui and vi are unit length, so it is σi that scales them. Hence, zeroing the small σi introduces the least error.
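A sketch (assuming NumPy) of this spectral form: rebuilding A as a truncated sum of rank-1 terms:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of rank-1 terms sigma_i * u_i * v_i^T, truncated after k terms.
k = 2
B = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
print(np.allclose(A, B))  # True: A has rank 2, so two terms reproduce it exactly
```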


SVD - Interpretation #2

Q: How many σs to keep?
A: Rule of thumb: keep 80-90% of the ‘energy’ (= Σ σi²); see the sketch below.

A = σ1 u1 v1^T + σ2 u2 v2^T + …   (assume σ1 ≥ σ2 ≥ σ3 ≥ …)
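A sketch (assuming NumPy) of this rule of thumb; choose_k is a hypothetical helper for illustration:

```python
import numpy as np

def choose_k(s, energy=0.90):
    """Smallest k whose top-k singular values hold >= `energy` of sum(s_i^2)."""
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

s = np.array([9.64, 5.29])
print(choose_k(s))  # 2 -- sigma_1^2 alone is only ~77% of the energy
```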


SVD - Complexity

• To compute the full SVD:
  – O(nm²) or O(n²m) (whichever is less)

• But there is less work:
  – if we just want the singular values
  – or if we only want the first k singular vectors
  – or if the matrix is sparse (see the sketch below)

• Implemented in linear algebra packages like:
  – LINPACK, Matlab, SPlus, Mathematica, ...


SVD - Conclusions so far

• SVD: A = U Σ V^T: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept

• Dimensionality reduction:
  – keep the few largest singular values (80-90% of ‘energy’)
  – SVD picks up linear correlations


Case study: How to query?

Q: Find users that like ‘Matrix’ and ‘Alien’

(Recall the users-to-movies decomposition A = U Σ V^T from the example above.)

blan

ca

Am

elie

Slides by Jure Leskovec 36

Case study: How to query?

Q: Find users that like ‘Matrix’
A: Map the query into a ‘concept space’ – how?


Case study: How to query?

Q: Find users that like ‘Matrix’
A: Map query vectors into ‘concept space’ – how?

       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]

[Figure: q plotted in the (Matrix, Alien) plane together with the concept axes v1 and v2]

Project into concept space: take the inner product (dot product) of q with each ‘concept’ vector vi.


Case study: How to query?

Q: Find users that like ‘Matrix’
A: Map the vector into ‘concept space’ – how?

       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]

[Figure: q in the (Matrix, Alien) plane; its projection onto the v1 axis has length q·v1]

Project into concept space: take the inner product (dot product) of q with each ‘concept’ vector vi.


Case study: How to query?

Compactly, we have: q_concept = q V

E.g.:
       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]

        [ 0.58  0    ]
        [ 0.58  0    ]
q  x    [ 0.58  0    ]  =  [ 2.9  0 ]
        [ 0     0.71 ]
        [ 0     0.71 ]

(V holds the movie-to-concept similarities; 2.9 is q’s weight on the SciFi-concept)
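A sketch (assuming NumPy) of this projection, with V hard-coded from the example:

```python
import numpy as np

# V: movie-to-concept matrix from the example (columns = concepts).
V = np.array([[0.58, 0.00],
              [0.58, 0.00],
              [0.58, 0.00],
              [0.00, 0.71],
              [0.00, 0.71]])

q = np.array([5, 0, 0, 0, 0], dtype=float)  # rated 'Matrix' only
print(q @ V)  # [2.9 0.] -- q sits on the SciFi-concept
```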


Case study: How to query?

How would the user d that rated (‘Alien’, ‘Serenity’) be handled?
d_concept = d V

E.g.:
       Matrix  Alien  Serenity  Casablanca  Amelie
d =  [ 0       4      5         0           0     ]

        [ 0.58  0    ]
        [ 0.58  0    ]
d  x    [ 0.58  0    ]  =  [ 5.22  0 ]
        [ 0     0.71 ]
        [ 0     0.71 ]

(V holds the movie-to-concept similarities; 5.22 is d’s weight on the SciFi-concept)


Case study: How to query?

Observation: User d, who rated (‘Alien’, ‘Serenity’), is similar to the query “user” q, who rated (‘Matrix’), even though d did not rate ‘Matrix’!

       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]   →   q_concept = [ 2.9   0 ]
d =  [ 0       4      5         0           0     ]   →   d_concept = [ 5.22  0 ]

In movie space: similarity(q, d) = 0.   In concept space: similarity(q, d) ≠ 0.
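The same observation numerically (a sketch assuming NumPy; cos_sim is a hypothetical helper):

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

V = np.array([[0.58, 0.00], [0.58, 0.00], [0.58, 0.00],
              [0.00, 0.71], [0.00, 0.71]])

q = np.array([5, 0, 0, 0, 0], dtype=float)  # rated 'Matrix'
d = np.array([0, 4, 5, 0, 0], dtype=float)  # rated 'Alien', 'Serenity'

print(cos_sim(q, d))          # 0.0 in raw movie space
print(cos_sim(q @ V, d @ V))  # 1.0 in concept space
```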


SVD: Drawbacks

+ Optimal low-rank approximation:
  • in terms of the Frobenius norm

- Interpretability problem:
  – a singular vector specifies a linear combination of all input columns or rows

- Lack of sparsity:
  – singular vectors are dense!
