
Dimensionality Reduction

Slides by Jure Leskovec

Dimensionality Reduction

• High-dimensional == many features
• Find concepts/topics/genres:
  – Documents:
    • Features: thousands of words, millions of word pairs
  – Surveys – Netflix: 480k users x 17.7k movies


Dimensionality Reduction

• Compress / reduce dimensionality:
  – 10^6 rows; 10^3 columns; no updates
  – Random access to any cell(s); small error: OK


Dimensionality Reduction

• Assumption: Data lies on or near a low d-dimensional subspace

• Axes of this subspace are effective representation of the data


Why Reduce Dimensions?

• Discover hidden correlations/topics
  – Words that occur commonly together
• Remove redundant and noisy features
  – Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data


SVD - Definition

A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T

• A: Input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: Singular values
  – r x r diagonal matrix (strength of each ‘concept’)
    (r: rank of the matrix A)
• V: Right singular vectors
  – n x r matrix (n terms, r concepts)
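A minimal sketch of these shapes (assuming NumPy is available), using the reduced SVD that np.linalg.svd returns:

```python
import numpy as np

m, n = 7, 5
A = np.random.rand(m, n)

# full_matrices=False gives the reduced ("economy") SVD:
# U is m x r', s holds the r' singular values, Vt is r' x n,
# where r' = min(m, n) >= rank(A) = r.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)           # (7, 5) (5,) (5, 5)
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: A = U Sigma V^T
```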


SVD

[Figure: A (m x n) factored as U (m x r) x Σ (r x r) x V^T (r x n)]


SVD

A = σ1 u1 v1^T + σ2 u2 v2^T + …

σi … scalar
ui … vector
vi … vector


SVD - Properties

It is always possible to decompose a real matrix A into A = U Σ V^T, where:

• U, Σ, V: unique
• U, V: column orthonormal:
  – U^T U = I; V^T V = I (I: identity matrix)
  – (Columns are orthogonal unit vectors)
• Σ: diagonal
  – Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
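A quick numerical check of these properties (again a sketch assuming NumPy):

```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # True: U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # True: V^T V = I
print(np.all(np.diff(s) <= 0) and np.all(s >= 0))   # True: sorted, non-negative
```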


SVD – Example: Users-to-Movies

• A = U Σ V^T - example:

             Matrix Alien Serenity Casablanca Amelie
SciFi      [ 1  1  1  0  0 ]     [ 0.18  0    ]
SciFi      [ 2  2  2  0  0 ]     [ 0.36  0    ]
SciFi      [ 1  1  1  0  0 ]     [ 0.18  0    ]
SciFi      [ 5  5  5  0  0 ]  =  [ 0.90  0    ]  x  [ 9.64  0    ]  x  [ 0.58  0.58  0.58  0     0    ]
Romance    [ 0  0  0  2  2 ]     [ 0     0.53 ]     [ 0     5.29 ]     [ 0     0     0     0.71  0.71 ]
Romance    [ 0  0  0  3  3 ]     [ 0     0.80 ]
Romance    [ 0  0  0  1  1 ]     [ 0     0.27 ]
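The example can be reproduced numerically; a sketch assuming NumPy (singular vectors may come back with flipped signs):

```python
import numpy as np

# Rows: 7 users; columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 2))         # [9.64 5.29 0. 0. 0.] -- A has rank 2
print(np.round(U[:, :2], 2))  # the two user-to-concept columns (up to sign)
print(np.round(Vt[:2], 2))    # the two movie-to-concept rows (up to sign)
```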


SVD – Example: Users-to-Movies

• A = U Σ V^T - same decomposition as above:

  – The first column of U / first row of V^T, with strength σ1 = 9.64, is the SciFi-concept; the second, with strength σ2 = 5.29, is the Romance-concept.


SVD - Example

• A = U Σ V^T - same decomposition as above:

  – U is the “user-to-concept” similarity matrix: the four SciFi users load on the SciFi-concept column, the three Romance users on the Romance-concept column.


SVD - Example

• A = U Σ V^T - same decomposition as above:

  – The singular value σ1 = 9.64 is the ‘strength’ of the SciFi-concept.


SVD - Example

• A = U Σ V^T - same decomposition as above:

  – V is the “movie-to-concept” similarity matrix: Matrix, Alien, and Serenity load on the SciFi-concept; Casablanca and Amelie on the Romance-concept.



SVD - Interpretation #1

‘movies’, ‘users’ and ‘concepts’:

• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept


SVD - Interpretation #2

SVD gives the best axis to project on:

• ‘best’ = min sum of squares of projection errors
• minimum reconstruction error

[Figure: rating points in the (Movie 1 rating, Movie 2 rating) plane; v1, the first right singular vector, is the best-fit axis]


SVD - Interpretation #2

• A = U Σ V^T - same decomposition as above:

[Figure: the rating points of A plotted in the (Movie 1 rating, Movie 2 rating) plane; v1, the first right singular vector, is the axis they project onto best]


SVD - Interpretation #2

• A = U Σ V^T - same decomposition as above:

  – σ1 = 9.64 measures the variance (‘spread’) of the data on the v1 axis.


SVD - Interpretation #2

More details:

• Q: How exactly is dim. reduction done?


SVD - Interpretation #2

More details:

• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero.


SVD - Interpretation #2

• A: Set the smallest singular values to zero. In the example, zero out σ2 = 5.29:

A  ≈  U  x  [ 9.64  0 ]  x  V^T
            [ 0     0 ]



SVD - Interpretation #2

• A: Set the smallest singular values to zero, then drop the corresponding column of U and row of V^T:

      [ 0.18 ]
      [ 0.36 ]
      [ 0.18 ]
A  ≈  [ 0.90 ]  x  [ 9.64 ]  x  [ 0.58  0.58  0.58  0  0 ]
      [ 0    ]
      [ 0    ]
      [ 0    ]
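A sketch (assuming NumPy) of this truncation step as a reusable helper; rank_k_approx is a hypothetical name for illustration, not a library function:

```python
import numpy as np

def rank_k_approx(A, k):
    """Keep the k largest singular values/vectors and rebuild A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

B = rank_k_approx(A, k=1)
print(np.round(B, 2))  # the SciFi block survives; the Romance block goes to ~0
```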


SVD - Interpretation #2

• A: Set the smallest singular values to zero. The result:

      [ 1 1 1 0 0 ]          [ 1 1 1 0 0 ]
      [ 2 2 2 0 0 ]          [ 2 2 2 0 0 ]
      [ 1 1 1 0 0 ]          [ 1 1 1 0 0 ]
A  =  [ 5 5 5 0 0 ]   ≈ B =  [ 5 5 5 0 0 ]
      [ 0 0 0 2 2 ]          [ 0 0 0 0 0 ]
      [ 0 0 0 3 3 ]          [ 0 0 0 0 0 ]
      [ 0 0 0 1 1 ]          [ 0 0 0 0 0 ]

Frobenius norm: ‖M‖_F = √(Σ_ij M_ij²)

‖A − B‖_F = √(Σ_ij (A_ij − B_ij)²) is “small”


[Figure: A = U Σ V^T with all r components, versus B = U S V^T with only the top-k components kept: B is approximately A]

SVD – Best Low Rank Approx.

• Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = r, Σ = diag(σ_11, …, σ_rr)), and let B = U S V^T, where
  – S = diagonal r x r matrix with s_i = σ_i (i = 1…k), else s_i = 0.
  Then B is a best rank-k approximation to A:
  – B is a solution to min_B ‖A − B‖_F where rank(B) = k

• We will need 2 facts:
  – ‖M‖_F = √(Σ_i q_ii²), where M = P Q R is the SVD of M
  – U Σ V^T − U S V^T = U (Σ − S) V^T


SVD – Best Low Rank Approx.

• We will need 2 facts:
  – ‖M‖_F = √(Σ_i q_ii²), where M = P Q R is the SVD of M
  – U Σ V^T − U S V^T = U (Σ − S) V^T

We apply these with:
  – P column orthonormal
  – R row orthonormal
  – Q diagonal


SVD – Best Low Rank Approx.

• A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
  – S = diagonal matrix with s_i = σ_i (i = 1…k), else s_i = 0
  Then B is a solution to min_B ‖A − B‖_F with rank(B) = k. Why?

• Using the second fact, A − B = U Σ V^T − U S V^T = U (Σ − S) V^T, and then the first fact gives:

  min_{B, rank(B)=k} ‖A − B‖_F = min_S ‖Σ − S‖_F = min_{s_i} √( Σ_{i=1}^r (σ_i − s_i)² )

• We want to choose the s_i (at most k of them nonzero) to minimize

  Σ_{i=1}^k (σ_i − s_i)² + Σ_{i=k+1}^r σ_i²

  so we set s_i = σ_i for i = 1…k (matching the k largest σ_i) and s_i = 0 otherwise, leaving error √( Σ_{i=k+1}^r σ_i² ).
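A numerical spot-check of the theorem (a sketch assuming NumPy): the Frobenius error of the rank-k truncation should equal √(Σ_{i>k} σ_i²):

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Truncation error in Frobenius norm vs. the dropped singular values.
print(np.isclose(np.linalg.norm(A - B, 'fro'),
                 np.sqrt(np.sum(s[k:]**2))))  # True
```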


SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1^T + σ2 u2 v2^T

(here u1, u2 are the columns of U, v1, v2 the rows of V^T, and σ1, σ2 the diagonal entries of Σ from the example above)


SVD - Interpretation #2

Equivalent: ‘spectral decomposition’ of the matrix:

A = σ1 u1 v1^T + σ2 u2 v2^T + …

Each term σi ui vi^T is an (m x 1) x (1 x n) rank-1 matrix; keep only the first k terms.

Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0

Why is setting the small σs to zero the thing to do? The vectors ui and vi are unit length, so it is σi that scales them. Hence, zeroing the small σi introduces the least error.
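A sketch (assuming NumPy) of this spectral form: rebuilding A as a truncated sum of rank-1 terms:

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of rank-1 terms sigma_i * u_i * v_i^T, truncated after k terms.
k = 2
B = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
print(np.allclose(A, B))  # True: A has rank 2, so two terms reproduce it exactly
```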


SVD - Interpretation #2

Q: How many σs to keep?
A: Rule of thumb: keep 80-90% of the ‘energy’ (= Σ σi²); see the sketch below.

A = σ1 u1 v1^T + σ2 u2 v2^T + …   (assume σ1 ≥ σ2 ≥ σ3 ≥ …)
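A sketch (assuming NumPy) of this rule of thumb; choose_k is a hypothetical helper for illustration:

```python
import numpy as np

def choose_k(s, energy=0.90):
    """Smallest k whose top-k singular values hold >= `energy` of sum(s_i^2)."""
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

s = np.array([9.64, 5.29])
print(choose_k(s))  # 2 -- sigma_1^2 alone is only ~77% of the energy
```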


SVD - Complexity

• To compute the full SVD:
  – O(nm²) or O(n²m) (whichever is less)

• But there is less work:
  – if we just want the singular values
  – or if we only want the first k singular vectors
  – or if the matrix is sparse (see the sketch below)

• Implemented in linear algebra packages like:
  – LINPACK, Matlab, SPlus, Mathematica, ...


SVD - Conclusions so far

• SVD: A = U Σ V^T: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept

• Dimensionality reduction:
  – keep the few largest singular values (80-90% of ‘energy’)
  – SVD picks up linear correlations


Case study: How to query?

Q: Find users that like ‘Matrix’ and ‘Alien’

(Recall the users-to-movies decomposition A = U Σ V^T from the example above.)

blan

ca

Am

elie

Slides by Jure Leskovec 36

Case study: How to query?

Q: Find users that like ‘Matrix’
A: Map the query into a ‘concept space’ – how?


Case study: How to query?

Q: Find users that like ‘Matrix’
A: Map query vectors into ‘concept space’ – how?

       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]

[Figure: q plotted in the (Matrix, Alien) plane together with the concept axes v1 and v2]

Project into concept space: take the inner product (dot product) of q with each ‘concept’ vector vi.


Case study: How to query?

Q: Find users that like ‘Matrix’
A: Map the vector into ‘concept space’ – how?

       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]

[Figure: q in the (Matrix, Alien) plane; its projection onto the v1 axis has length q·v1]

Project into concept space: take the inner product (dot product) of q with each ‘concept’ vector vi.


Case study: How to query?

Compactly, we have: q_concept = q V

E.g.:
       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]

        [ 0.58  0    ]
        [ 0.58  0    ]
q  x    [ 0.58  0    ]  =  [ 2.9  0 ]
        [ 0     0.71 ]
        [ 0     0.71 ]

(V holds the movie-to-concept similarities; 2.9 is q’s weight on the SciFi-concept)
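A sketch (assuming NumPy) of this projection, with V hard-coded from the example:

```python
import numpy as np

# V: movie-to-concept matrix from the example (columns = concepts).
V = np.array([[0.58, 0.00],
              [0.58, 0.00],
              [0.58, 0.00],
              [0.00, 0.71],
              [0.00, 0.71]])

q = np.array([5, 0, 0, 0, 0], dtype=float)  # rated 'Matrix' only
print(q @ V)  # [2.9 0.] -- q sits on the SciFi-concept
```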


Case study: How to query?

How would the user d that rated (‘Alien’, ‘Serenity’) be handled?
d_concept = d V

E.g.:
       Matrix  Alien  Serenity  Casablanca  Amelie
d =  [ 0       4      5         0           0     ]

        [ 0.58  0    ]
        [ 0.58  0    ]
d  x    [ 0.58  0    ]  =  [ 5.22  0 ]
        [ 0     0.71 ]
        [ 0     0.71 ]

(V holds the movie-to-concept similarities; 5.22 is d’s weight on the SciFi-concept)


Case study: How to query?

Observation: User d, who rated (‘Alien’, ‘Serenity’), is similar to the query “user” q, who rated (‘Matrix’), even though d did not rate ‘Matrix’!

       Matrix  Alien  Serenity  Casablanca  Amelie
q =  [ 5       0      0         0           0     ]   →   q_concept = [ 2.9   0 ]
d =  [ 0       4      5         0           0     ]   →   d_concept = [ 5.22  0 ]

In movie space: similarity(q, d) = 0.   In concept space: similarity(q, d) ≠ 0.
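The same observation numerically (a sketch assuming NumPy; cos_sim is a hypothetical helper):

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

V = np.array([[0.58, 0.00], [0.58, 0.00], [0.58, 0.00],
              [0.00, 0.71], [0.00, 0.71]])

q = np.array([5, 0, 0, 0, 0], dtype=float)  # rated 'Matrix'
d = np.array([0, 4, 5, 0, 0], dtype=float)  # rated 'Alien', 'Serenity'

print(cos_sim(q, d))          # 0.0 in raw movie space
print(cos_sim(q @ V, d @ V))  # 1.0 in concept space
```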


SVD: Drawbacks

+ Optimal low-rank approximation:
  • in terms of the Frobenius norm

- Interpretability problem:
  – a singular vector specifies a linear combination of all input columns or rows

- Lack of sparsity:
  – singular vectors are dense!
