
Page 1

Non-linear Dimensionality Reduction

CMPUT 466/551
Nilanjan Ray

Prepared from materials in the book "Non-linear Dimensionality Reduction" by Lee and Verleysen, Springer, 2007

Page 2

Agenda

• What is dimensionality reduction?
• Linear methods
  – Principal components analysis (PCA)
  – Metric multidimensional scaling (MDS)
• Non-linear methods
  – Distance preserving
  – Topology preserving
  – Auto-encoders (deep neural networks)

Page 3

Dimensionality Reduction

• Mapping d-dimensional data points y to p-dimensional vectors x, with p < d.
• Purposes
  – Visualization
  – Classification/regression
• Most of the time we are interested only in the forward mapping from y to x.
• The backward mapping is difficult in general.
• If both the forward and the backward mappings are linear, the method is called linear; otherwise it is called a non-linear dimensionality reduction technique.
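For concreteness, a minimal NumPy sketch (my own; the sizes and variable names are arbitrary) of a linear forward mapping and its approximate backward mapping:

```python
import numpy as np

# Minimal sketch of a linear forward mapping x = W^T y (illustrative only).
# W is a d x p matrix with orthonormal columns; the data below are made up.
rng = np.random.default_rng(0)
d, p, N = 10, 2, 100

Y = rng.normal(size=(d, N))                    # d-dimensional data, one column per point
W, _ = np.linalg.qr(rng.normal(size=(d, p)))   # orthonormal d x p mapping matrix

X = W.T @ Y                                    # forward mapping: p-dimensional representation
Y_back = W @ X                                 # backward mapping (only approximate when p < d)
```

Because p < d, the backward mapping can only approximate y, which is why the reverse direction is difficult in general.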

Page 4

Two Benchmark Manifolds

Page 5

Distance Preserving Methods

Let's say the points y_i are mapped to x_i, i = 1, 2, …, N.

Distance-preserving methods try to preserve pairwise distances, i.e., d(y_i, y_j) = d(x_i, x_j), or pairwise dot products, <y_i, y_j> = <x_i, x_j>.

What is a distance?

Nondegeneracy: d(a, b) = 0 if and only if a = b

Triangle inequality: for any three points a, b, and c, d(a, b) ≤ d(c, a) + d(c, b)

The other two properties, nonnegativity and symmetry, follow from these two.
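For concreteness, a small sketch (my own) of the two quantities that distance-preserving methods try to match:

```python
import numpy as np

# Small illustration (my own) of the two quantities distance-preserving
# methods try to match: pairwise Euclidean distances and pairwise dot products.
def pairwise_distances(Y):
    """Euclidean distance matrix for the columns of a d x N matrix Y."""
    sq = (Y * Y).sum(axis=0)
    # d^2(y_i, y_j) = <y_i, y_i> - 2 <y_i, y_j> + <y_j, y_j>
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y)
    return np.sqrt(np.maximum(D2, 0.0))

def pairwise_dot_products(Y):
    """Gram matrix [<y_i, y_j>] for the columns of Y."""
    return Y.T @ Y
```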

Page 6

Metric MDS

A multidimensional scaling (MDS) method is a linear generative model, like PCA:

y = W x

The y's are d-dimensional observed variables and the x's are p-dimensional latent variables. W is a matrix with the property:

W^T W = I_p

Let Y = [..., y_i, ..., y_j, ...], X = [..., x_i, ..., x_j, ...], and s_ij = <y_i, y_j>. Then

S = [s_ij]_{i,j=1}^N = Y^T Y = (W X)^T (W X) = X^T W^T W X = X^T I_p X = X^T X

So, dot products are preserved. How about Euclidean distances?

d²(y_i, y_j) = <y_i, y_i> − 2 <y_i, y_j> + <y_j, y_j>

and each of these dot products equals the corresponding dot product of the x's.

So, Euclidean distances are preserved too!
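A quick numeric check (my own) of the claim above: if y = W x with W^T W = I_p, the Gram matrices, and hence all pairwise distances, coincide.

```python
import numpy as np

# Quick numeric check (illustrative) of the slide's claim: if y = W x with
# W^T W = I_p, then Y^T Y = X^T X, so dot products (and hence Euclidean
# distances) are identical in the two spaces.
rng = np.random.default_rng(1)
d, p, N = 8, 3, 50

X = rng.normal(size=(p, N))                    # latent p-dimensional points
W, _ = np.linalg.qr(rng.normal(size=(d, p)))   # d x p with orthonormal columns
Y = W @ X                                      # observed d-dimensional points

assert np.allclose(Y.T @ Y, X.T @ X)           # Gram matrices match
```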

Page 7

Metric MDS Algorithm

Center the data matrix Y and compute the dot product matrix S = Y^T Y.

If the data matrix is not available and only the distance matrix D is available, do double centering to form the scalar product matrix:

S = −(1/2) (I − (1/N) 1 1^T) D^(2) (I − (1/N) 1 1^T)

where D^(2) contains the squared pairwise distances and 1 is the all-ones vector.

Compute the eigenvalue decomposition S = U Λ U^T.

Construct the p-dimensional representation as:

X̂ = I_{p×N} Λ^{1/2} U^T

where I_{p×N} selects the p rows corresponding to the largest eigenvalues.

Metric MDS is essentially PCA, and it is a linear method.
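A minimal NumPy sketch of this algorithm (the function name and the choice of a distance matrix as input are my own):

```python
import numpy as np

# Minimal sketch of the metric MDS algorithm on this page (variable names and
# the use of NumPy are my own choices).
def metric_mds(D, p):
    """Embed points into p dimensions given an N x N Euclidean distance matrix D."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix I - (1/N) 1 1^T
    S = -0.5 * H @ (D ** 2) @ H                  # double centering -> dot product matrix
    evals, U = np.linalg.eigh(S)                 # eigendecomposition S = U diag(evals) U^T
    order = np.argsort(evals)[::-1][:p]          # keep the p largest eigenvalues
    L = np.sqrt(np.maximum(evals[order], 0.0))
    return (U[:, order] * L).T                   # p x N embedding: Lambda^{1/2} U^T
```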

Page 8

Metric MDS Result

Page 9

Sammon’s Nonlinear Mapping (NLM)

NLM minimizes the energy function:

E_NLM = (1/c) Σ_{i<j} (d_y(i,j) − d_x(i,j))² / d_y(i,j),   where c = Σ_{i<j} d_y(i,j)

and d_y(i,j) = d(y_i, y_j), d_x(i,j) = d(x_i, x_j) are the pairwise distances in the original and in the reduced space.

Start with initial x's.

Update the x's by

x_{k,i} ← x_{k,i} − (∂E_NLM / ∂x_{k,i}) / |∂²E_NLM / ∂x²_{k,i}|

where x_{k,i} is the kth component of vector x_i   (quasi-Newton update).
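A simplified sketch of Sammon's NLM (assumption: it uses plain gradient descent on E_NLM rather than the diagonal quasi-Newton update shown above; the step size and iteration count are made up):

```python
import numpy as np

# Simplified sketch of Sammon's NLM: plain gradient descent on the stress
# E_NLM instead of the quasi-Newton update on the slide.
def sammon(Dy, p=2, iters=500, alpha=0.3, eps=1e-12, seed=0):
    """Dy: N x N matrix of pairwise distances in the original space."""
    N = Dy.shape[0]
    c = Dy[np.triu_indices(N, 1)].sum()               # normalizing constant c
    X = np.random.default_rng(seed).normal(scale=1e-2, size=(N, p))
    for _ in range(iters):
        diff = X[:, None, :] - X[None, :, :]          # pairwise differences x_i - x_j
        Dx = np.sqrt((diff ** 2).sum(-1)) + eps       # distances in the reduced space
        np.fill_diagonal(Dx, 1.0)
        W = (Dy - Dx) / (np.maximum(Dy, eps) * Dx)    # per-pair gradient weight
        np.fill_diagonal(W, 0.0)
        grad = -(2.0 / c) * (W[:, :, None] * diff).sum(axis=1)  # dE/dx_i
        X -= alpha * grad                             # gradient step on E_NLM
    return X
```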

Page 10

Sammon’s NLM

Page 11

A Basic Issue with Metric Distance Preserving Methods

Geodesic distances seem to be better suited.

Page 12

Graph Distance: Approximation to Geodesic Distance

Page 13

ISOMAP

ISOMAP = MDS with graph distance

One needs to decide how the graph is constructed, i.e., who is the neighbor of whom: the K-closest rule or the ε-distance rule can be used to build the graph.
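A minimal Isomap sketch along these lines (my own wiring: it builds a K-nearest-neighbour graph, computes graph distances with Dijkstra's algorithm, and reuses the metric_mds() sketch from the Metric MDS Algorithm page; it assumes the K-NN graph is connected):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

# Minimal Isomap sketch: Isomap = metric MDS on graph (approx. geodesic) distances.
def isomap(Y, p=2, K=10):
    """Y: N x d data matrix. Returns a p x N embedding."""
    D = cdist(Y, Y)                                    # Euclidean distances
    N = D.shape[0]
    G = np.full((N, N), np.inf)                        # inf marks a non-edge
    nn = np.argsort(D, axis=1)[:, 1:K + 1]             # K closest neighbours of each point
    rows = np.repeat(np.arange(N), K)
    G[rows, nn.ravel()] = D[rows, nn.ravel()]          # keep only neighbour edges
    Dg = shortest_path(G, method='D', directed=False)  # graph distances (Dijkstra)
    return metric_mds(Dg, p)                           # MDS with graph distance
```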

Page 14

KPCA

Closely related to the MDS algorithm.

KPCA using Gaussian kernel
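A minimal kernel-PCA sketch with a Gaussian kernel (my own illustration; the bandwidth sigma is an assumed parameter). It mirrors the MDS algorithm above, with the centered kernel matrix playing the role of the dot-product matrix S:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Minimal kernel PCA sketch with a Gaussian (RBF) kernel.
def kpca_gaussian(Y, p=2, sigma=1.0):
    """Y: N x d data matrix. Returns the N x p kernel-PCA embedding."""
    K = np.exp(-cdist(Y, Y, 'sqeuclidean') / (2.0 * sigma ** 2))  # Gaussian kernel matrix
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    Kc = H @ K @ H                                    # center the kernel matrix
    evals, U = np.linalg.eigh(Kc)
    order = np.argsort(evals)[::-1][:p]               # leading components
    return U[:, order] * np.sqrt(np.maximum(evals[order], 0.0))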

Page 15

Topology Preserving Techniques

• Topology: neighborhood relationships
• Topology preservation means that two neighboring points in d dimensions should map to two neighboring points in p dimensions
• Distance preservation is often too rigid; topology-preserving techniques can sometimes stretch or shrink point clouds
• More flexible, but algorithmically more complex

Page 16

TP Techniques

• Can be categorized broadly into
  – Methods with a predefined topology
    • SOM (Kohonen's self-organizing map)
  – Methods with a data-driven lattice
    • LLE (locally linear embedding)
    • Isotop, …

Page 17

Kohonen’s Self-Organizing Maps (SOM)

Step 1: Define a 2D lattice indexed by (l, k), with l, k = 1, …, K.

Step 2: For a set of data vectors y_i, i = 1, 2, …, N, find a set of prototypes m(l, k). Note that through this indexing (l, k), the prototypes are mapped onto the 2D lattice.

Step 3: Iterate over the data points y_i:

1. Find the closest prototype (using Euclidean distance in the d-dimensional space):

   (l*, k*) = argmin_{l,k} || y_i − m(l, k) ||²

2. Update the prototypes:

   m(l, k) ← m(l, k) + α h((l, k), (l*, k*)) (y_i − m(l, k))

(prepared from the [HTF] book)

Page 18

Neighborhood Function for SOM

A hard threshold function:

h((l, k), (l*, k*)) = 1,  if |l − l*| + |k − k*| ≤ λ
                      0,  otherwise

Or, a soft threshold function:

h((l, k), (l*, k*)) = exp( −((l − l*)² + (k − k*)²) / (2 λ²) )
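A compact sketch combining the SOM update rule and the Gaussian neighborhood function from the last two pages (the decay schedules for α and λ, and the variable names, are my own assumptions):

```python
import numpy as np

# Compact SOM sketch: find the best-matching prototype for each data point and
# pull it and its lattice neighbors toward the data point.
def som(Y, K=10, iters=20, alpha0=0.5, lam0=None, seed=0):
    """Y: N x d data matrix. Returns prototypes m of shape (K, K, d)."""
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    lam0 = lam0 if lam0 is not None else K / 2.0
    m = Y[rng.choice(N, size=K * K, replace=True)].reshape(K, K, d)   # init prototypes
    L, Kk = np.meshgrid(np.arange(K), np.arange(K), indexing='ij')    # lattice coordinates
    for t in range(iters):
        alpha = alpha0 * (1.0 - t / iters)           # learning rate decreases
        lam = lam0 * (1.0 - t / iters) + 1e-3        # neighborhood width decreases
        for i in rng.permutation(N):
            dist2 = ((m - Y[i]) ** 2).sum(axis=2)    # distance of y_i to every prototype
            ls, ks = np.unravel_index(np.argmin(dist2), (K, K))   # winner (l*, k*)
            h = np.exp(-((L - ls) ** 2 + (Kk - ks) ** 2) / (2.0 * lam ** 2))
            m += alpha * h[:, :, None] * (Y[i] - m)  # pull lattice neighbors toward y_i
    return m
```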

Page 19

Example: Simulated data

Page 20

SOM for “Swiss Roll” and “Open Box”

Page 21

Remarks

• SOM is actually a constrained K-means
  – It constrains the K-means cluster prototypes to lie on a smooth manifold
  – If only one neighbor (the prototype itself) is allowed, SOM reduces to K-means
• The learning rate (α) and the distance threshold (λ) usually decrease over the training iterations
• Mostly useful as a visualization tool: typically it cannot map to more than 3 dimensions
• Convergence is hard to assess

Page 22

Locally Linear Embedding

• Data-driven lattice, unlike SOM, which uses a predefined lattice
• Topology preserving: it is based on conformal mapping, a transformation that preserves angles; LLE is invariant to rotation, translation, and scaling
• To some extent similar to preserving dot products
• A data point y_i is assumed to be a linear combination of its neighbors

Page 23

LLE Principle

Each data point y_i is approximated by a local linear combination of its neighbors. The weights minimize:

E(W) = Σ_{i=1}^{N} || y_i − Σ_j w_ij y_j ||²

Neighborhood of y_i: determined by a graph.

Constraints on the w_ij: Σ_j w_ij = 1, for every i.

LLE first computes the matrix W by minimizing E. Then it assumes that the same local linear combinations hold in the low-dimensional space:

F(x_1, …, x_N) = Σ_{i=1}^{N} || x_i − Σ_j w_ij x_j ||²

So, it minimizes F with respect to the x's and obtains the low-dimensional mapping!
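A minimal LLE sketch following the two steps above (the regularization constant and the K-nearest-neighbour graph construction are assumed details):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Minimal LLE sketch: step 1 fits local reconstruction weights W,
# step 2 finds the embedding that minimizes F(x) = ||(I - W) X||^2.
def lle(Y, p=2, K=10, reg=1e-3):
    """Y: N x d data matrix. Returns the N x p embedding."""
    N = Y.shape[0]
    nn = np.argsort(cdist(Y, Y), axis=1)[:, 1:K + 1]      # K nearest neighbors of each point
    W = np.zeros((N, N))
    for i in range(N):                                    # step 1: reconstruction weights
        Z = Y[nn[i]] - Y[i]                               # neighbors relative to y_i
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(K)                # regularized local Gram matrix
        w = np.linalg.solve(C, np.ones(K))
        W[i, nn[i]] = w / w.sum()                         # enforce sum_j w_ij = 1
    M = (np.eye(N) - W).T @ (np.eye(N) - W)               # step 2: quadratic form of F
    evals, U = np.linalg.eigh(M)
    return U[:, 1:p + 1]                                  # skip the constant eigenvector
```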

Page 24

LLE Results

Let’s visit: http://www.cs.toronto.edu/~roweis/lle/