

Manifold learning

P. Agius – L13, Spring 2008

Relevant Reading: Algorithms for Manifold Learning

Lawrence Cayton

http://vis.lbl.gov/~romano/mlgroup/papers/manifold-learning.pdf

Handling the curse of dimensionality via dimensionality reduction

High-dimensional data is often simpler than its high dimension suggests.

Want: a simplified, non-overlapping representation of the data whose features can be identified with the underlying patterns of the data in its original format. In other words, we want to discover a manifold structure in the data.

Most popular: PCA, which finds the directions of maximum variance and a basis for a linear subspace. But it is only appropriate if the data lies in a linear subspace.

Manifold learning algorithms are non-linear analogs of PCA.


High-dimensional inputs: X = {x_1, …, x_n} ⊂ R^D

Low-dimensional outputs: Y = {y_1, …, y_n} ⊂ R^d

n = number of points, D = number of input dimensions, d = number of manifold dimensions


From 3 dimensions to 1 dimension … this 3-dim curve can be represented as a line in 1 dimension.

In topology, the definition of a homeomorphism is: a continuous bijection whose inverse is also continuous.

If we can find such a function from the high-dimensional space to the low-dimensional space, then we can say the two spaces are homeomorphic.


Manifold learning … properly defined

Some algorithms:

Isomap

Locally Linear Embedding

Laplacian Eigenmaps

Semidefinite Embedding

Parameter for k-nearest neighbors


Isomap – isometric feature mapping

Reminiscent of MDS … indeed it is an extension of MDS

Two main steps:

- Estimate the geodesic distances (i.e., distances along the manifold)

- Use MDS to find points in low-dim Euclidean space


Estimating geodesic distances

Two assumptions for Isomap:
- There is a mapping that preserves distances.
- The manifold is smooth enough that distances between nearby points are approximately linear.

For points that are far apart on the manifold, the linear approximation is no good. For such points:

- Build a k-nearest neighbor graph weighted by the Euclidean distances between points.

- Then find the distance between the far-apart points using a shortest-path algorithm (e.g., Dijkstra).


Isomap then uses MDS to find low-dimensional points whose Euclidean distances match the estimated geodesic distances.
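To make the two steps concrete, here is a minimal sketch in Python (not from the lecture; it assumes NumPy, SciPy, and scikit-learn are available, and the helper name isomap_embed is ours):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_embed(X, n_neighbors=7, d=2):
    """Rough Isomap sketch: kNN graph -> geodesic distances -> classical MDS."""
    # Step 1: k-nearest-neighbor graph weighted by Euclidean distances;
    # all-pairs shortest paths (Dijkstra) approximate the geodesic distances.
    # (Assumes the kNN graph is connected, otherwise some distances are infinite.)
    knn = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    geodesic = shortest_path(knn, method="D", directed=False)

    # Step 2: classical MDS on the squared geodesic distances.
    n = X.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * P @ (geodesic ** 2) @ P       # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]
    return V[:, top] * np.sqrt(np.maximum(w[top], 0))
```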


Locally Linear Embedding (LLE)

Intuition: visualize the manifold as a collection of overlapping coordinate patches. If the neighborhoods are sufficiently small and the manifold is sufficiently smooth, then the patches are approximately linear.

Goal: identify these linear patches, characterize their geometry, and find a mapping accordingly.
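For experimentation, scikit-learn ships an LLE implementation; the snippet below is a small usage sketch (the S-curve data set, neighborhood size, and seed are arbitrary choices, not from the lecture):

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import LocallyLinearEmbedding

# A 3-D "S"-shaped manifold with intrinsic dimension 2.
X, t = make_s_curve(n_samples=1000, noise=0.05, random_state=0)

# n_neighbors sets the size of the locally linear patches;
# n_components is the target manifold dimension d.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)   # Y has shape (1000, 2)
```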



Laplacian Eigenmaps

Based on spectral graph theory …

Given a graph with weight matrix W, define the graph Laplacian as

L = D − W

where D is the diagonal degree matrix with D_ii = Σ_j W_ij.

The eigenvalues and eigenvectors of L reveal a lot of information about the graph.


(The weights W are defined on the k-nearest neighbors of each point x_j.)
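A minimal sketch of how L = D − W yields an embedding, assuming a heat-kernel weighted k-nearest-neighbor graph (the kernel width and other choices below are ours, not from the slides):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, n_neighbors=10, d=2, sigma=1.0):
    """Embed X using the bottom non-trivial eigenvectors of the graph Laplacian."""
    # Heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / sigma) on the kNN graph.
    dist = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance").toarray()
    W = np.where(dist > 0, np.exp(-dist ** 2 / sigma), 0.0)
    W = np.maximum(W, W.T)                 # symmetrize the neighborhood relation

    D = np.diag(W.sum(axis=1))             # degree matrix, D_ii = sum_j W_ij
    L = D - W                              # graph Laplacian

    # Generalized eigenproblem L v = lambda D v; drop the constant eigenvector.
    vals, vecs = eigh(L, D)
    return vecs[:, 1:d + 1]
```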


Semidefinite Embedding

Intuition: imagine the k-nearest neighbors of each point being connected by rigid rods. Now take this structure and pull it apart as far as possible … can we properly unravel it?


The optimization has a constraint that ensures distances between neighboring points are preserved; B is a positive semi-definite (psd) matrix.
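For reference, this intuition corresponds to a semidefinite program (the standard maximum-variance-unfolding formulation; reconstructed here rather than copied from the slide):

$$\max_{B \succeq 0}\ \operatorname{tr}(B) \quad \text{s.t.} \quad \sum_{i,j} B_{ij} = 0, \qquad B_{ii} - 2B_{ij} + B_{jj} = \|x_i - x_j\|^2 \ \text{ for all neighbors } i, j.$$

Here B plays the role of the Gram matrix of the embedded points: the neighbor constraints are the rigid rods, maximizing the trace pulls the structure apart, and the final coordinates are read off from the top eigenvectors of B, as in MDS.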


Other algorithms

Hessian LLE (hLLE) – replaces the graph Laplacian with a Hessian estimator

Local Tangent Space Alignment (LTSA) – similar to hLLE; estimates the tangent space at each point by performing PCA on its neighbors

The paper discusses a total of six algorithms … and there are variations … so many! Why so many?

Evaluating these algorithms is difficult. Choosing the best is also difficult.


Comparisons

Isometric embedding (assumption of distance preservation) – Isomap, hLLE, Semidefinite Embedding

versus

Conformal embedding (preservation of angles) – C-Isomap

Local versus global:
- Isomap is global: all point pairs are considered during embedding.
- LLE is local: the cost function only considers k-nearest neighbors.

Time complexity – spectral decomposition is costly!


Open questions

Another issue not much addressed: the choice of k in k-nearest neighbor approaches.

Manifold Learning

http://www-ee.stanford.edu/~gray/birs/slides/hero.pdf


Manifold Learning

Locally Linear Embedding (LLE)




Arranging words: each word was initially represented by a high-dimensional vector that counted the number of times it appeared in different encyclopedia articles. Words with similar contexts are collocated.




Manifold learning: MDS and Isomap

Jieping Ye

Department of Computer Science and Engineering, Arizona State University

http://www.public.asu.edu/~jye02

Review

• Clustering

• Classification

• Semi-supervised learning

• Feature reduction

• Kernel methods


Manifold learning

Outline of lecture

• Intuition

• Linear method: PCA

• Linear method: MDS

• Nonlinear method: Isomap

• Summary


Why Dimensionality Reduction

• The curse of dimensionality

• Number of potential features can be huge

– Image data: each pixel of an image. A 64×64 image → 4,096 features.

– Genomic data: expression levels of the genes. Several thousand features.

– Text categorization: frequencies of phrases in a document or in a web page. More than ten thousand features.

Why Dimensionality Reduction

• Two approaches to reduce the number of features:

– Feature selection: select the salient features by some criteria.

– Feature extraction: obtain a reduced set of features by a transformation of all features.

• Data visualization and exploratory data analysis also need to reduce dimension.

– Usually reduce to 2D or 3D.


Deficiencies of Linear Methods

• Data may not be best summarized by a linear combination of features.

– Example: PCA cannot discover the 1D structure of a helix.


Intuition: how does your brain store these pictures?


Brain Representation

• Every pixel?

• Or perceptually meaningful structure?

– Up-down pose

– Left-right pose

– Lighting direction

So, your brain successfully reduced the high-dimensional inputs to an intrinsically 3-dimensional manifold!


Manifold Learning

• A manifold is a topological space which is locally Euclidean.

• An example of a nonlinear manifold:

Manifold Learning

• Discover low-dimensional representations (smooth manifolds) for data in high dimension.

• Linear approaches (PCA, MDS)

• Non-linear approaches (Isomap, LLE, others)

Latent: y_i ∈ R^d (the coordinates Y); observed: x_i ∈ R^D (the data X).


Linear Approach- PCA

• PCA finds subspace linear projections of the input data.

Linear approach- PCA

• Main steps for computing the PCs:

– Form the covariance matrix S.

– Compute its eigenvectors $\{a_i\}_{i=1}^{p}$.

– The first d eigenvectors $\{a_i\}_{i=1}^{d}$ form the d PCs.

– The transformation G consists of these d PCs: $G \leftarrow [a_1, a_2, \ldots, a_d]$.
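These steps translate into a few lines of Python; a minimal sketch (function name and conventions are ours):

```python
import numpy as np

def pca(X, d):
    """Project the rows of X (n x p) onto the top-d principal components."""
    Xc = X - X.mean(axis=0)                # center the data
    S = np.cov(Xc, rowvar=False)           # p x p covariance matrix S
    w, A = np.linalg.eigh(S)               # eigenvectors a_1, ..., a_p (ascending order)
    order = np.argsort(w)[::-1]            # sort by decreasing eigenvalue
    G = A[:, order[:d]]                    # G = [a_1, ..., a_d]
    return Xc @ G                          # n x d projected data
```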


Linear Approach- classical MDS

• MDS: Multidimensional scaling

• Borg and Groenen, 1997

• MDS takes a matrix of pairwise distances and gives a mapping to R^d. It finds an embedding that preserves the interpoint distances, and is equivalent to PCA when those distances are Euclidean.

• Low-dimensional data for visualization.

Linear Approach- classical MDS

Centering matrix: $P_e = I - \frac{1}{n} e e^T$, where $e$ is the vector of all ones.

$P_e X$: subtract the column mean from each column.

$X P_e$: subtract the row mean from each row.

Example: (a small numeric matrix illustrating X, its row means, and X P_e)
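Since the slide's numeric example did not survive extraction, here is a quick check of the centering matrix with arbitrary numbers (the values are ours):

```python
import numpy as np

n = 3
X = np.array([[1., 3., 2.],
              [0., 2., 1.],
              [2., 1., 0.]])

e = np.ones((n, 1))
P_e = np.eye(n) - (e @ e.T) / n   # centering matrix P_e = I - (1/n) e e^T

print(P_e @ X)   # each column of X has its column mean subtracted
print(X @ P_e)   # each row of X has its row mean subtracted
```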


Linear Approach- classical MDS

D: squared distance matrix, $D_{ij} = \|x_i - x_j\|^2$

$\Rightarrow\ -\tfrac{1}{2} P_e D P_e = \big[\,(x_i - \mu) \cdot (x_j - \mu)\,\big]_{ij}$


Linear Approach- classical MDS

D: squared distance matrix, $D_{ij} = \|x_i - x_j\|^2$

$\Rightarrow\ -\tfrac{1}{2} P_e D P_e = \big[\,(x_i - \mu) \cdot (x_j - \mu)\,\big]_{ij}$

Problem: given D, how to find the $x_i$?

$-\tfrac{1}{2} P_e D P_e = U_d \Sigma_d U_d^T = \big(U_d \Sigma_d^{0.5}\big)\big(\Sigma_d^{0.5} U_d^T\big)$

where $\Sigma_d = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$ and $U_d = [u_1, u_2, \ldots, u_d]$

$\Rightarrow\ x_i = \sqrt{\lambda_i}\, u_i$, for $i = 1, \ldots, d$.
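The derivation maps directly onto code; a sketch of classical MDS given a squared-distance matrix (the function name is ours):

```python
import numpy as np

def classical_mds(D2, d):
    """Recover d-dimensional coordinates from a squared-distance matrix D2 (n x n)."""
    n = D2.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n     # centering matrix P_e
    B = -0.5 * P @ D2 @ P                   # = [(x_i - mu) . (x_j - mu)]_ij
    lam, U = np.linalg.eigh(B)              # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:d]         # keep Sigma_d and U_d
    # Coordinates: the i-th embedding dimension is sqrt(lambda_i) * u_i.
    return U[:, top] * np.sqrt(np.maximum(lam[top], 0))
```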

Linear Approach- classical MDS

• So far, we have focused on classical MDS, assuming D is the squared distance matrix.

– Metric scaling: $-\tfrac{1}{2} P_e D P_e = \big[\,(x_i - \mu) \cdot (x_j - \mu)\,\big]_{ij}$

• How to deal with more general dissimilarity measures?

– Non-metric scaling: $-\tfrac{1}{2} P_e D P_e$ may not be positive semi-definite.

Solutions: (1) Add a large constant to its diagonal. (2) Find its nearest positive semi-definite matrix by setting all negative eigenvalues to zero.


Nonlinear Dimensionality Reduction

• Many data sets contain essential nonlinear structures that are invisible to PCA and MDS.

• This calls for nonlinear dimensionality reduction approaches.

– Kernel methods

• Depend on the kernels

• Most kernels are not data dependent

Nonlinear Approaches- Isomap

• Construct the neighbourhood graph G.

• For each pair of points in G, compute the shortest-path distances: the geodesic distances.

• Use classical MDS with the geodesic distances.

Euclidean distance → Geodesic distance

Josh Tenenbaum, Vin de Silva, John Langford, 2000


Sample points with Swiss Roll

• Altogether there are 20,000 points in the “Swiss roll” data set. We sample 1,000 out of the 20,000.

Construct the neighborhood graph G

k-nearest neighborhood (k = 7)

D_G is the 1000 × 1000 matrix of Euclidean distances between neighbors (figure A).


Compute all-pairs shortest paths in G

Now D_G is the 1000 × 1000 matrix of geodesic distances between arbitrary points along the manifold (figure B).

Use MDS to embed the graph in R^d: find a d-dimensional Euclidean space Y (figure C) that preserves the pairwise distances.
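The same three-step pipeline is available off the shelf; a usage sketch mirroring the walkthrough above (1,000 Swiss-roll samples, k = 7; the noise level and seed are arbitrary choices):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Sample 1,000 points from the Swiss roll and embed them with Isomap (k = 7, d = 2).
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
Y = Isomap(n_neighbors=7, n_components=2).fit_transform(X)   # Y has shape (1000, 2)
```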


The Isomap algorithm

PCA, MDS vs. Isomap


Isomap: Advantages

• Nonlinear.

• Globally optimal: it still produces a globally optimal low-dimensional Euclidean representation even though the input space is highly folded, twisted, or curved.

• Guaranteed asymptotically to recover the true dimensionality.

Isomap: Disadvantages

• May not be stable; depends on the topology of the data.

• Guaranteed asymptotically to recover the geometric structure of nonlinear manifolds:

– As N increases, pairwise distances provide better approximations to geodesics, but cost more computation.

– If N is small, geodesic distances will be very inaccurate.


Applications

• Isomap and Nonparametric Models of Image Deformation

• LLE and Isomap Analysis of Spectra and Colour Images

• Image Spaces and Video Trajectories: Using Isomap to Explore Video Sequences

• Mining the structural knowledge of high-dimensional medical data using Isomap

Isomap webpage: http://isomap.stanford.edu/

Next class

• Topics

– Locally Linear Embedding (LLE)

• Readings

– Nonlinear Dimensionality Reduction by Locally Linear Embedding (Roweis and Saul)