Manifold learning
P. Agius – L13, Spring 2008
Relevant Reading: Algorithms for Manifold Learning
Lawrence Cayton
http://vis.lbl.gov/~romano/mlgroup/papers/manifold-learning.pdf
Handling the curse of dimensionality via dimensionality reduction
High dimensional data is often simpler than its high dimension suggests.
Want: a simplified, non-overlapping representation of the data whose features are identifiable with the underlying patterns of the data in its original format … we want to discover a manifold structure in the data.
Most popular: PCA finds the directions of maximum variance and a basis for a linear subspace. But it is only appropriate if the data lie in a linear subspace.
Manifold learning algorithms are non-linear analogues of PCA.
High dimensional inputs: X = {x_1, …, x_n} ⊂ R^D
Low dimensional outputs: Y = {y_1, …, y_n} ⊂ R^d
n = number of points, D = number of input dimensions, d = number of manifold dimensions
From 3 dimensions to 1 dimension …
this 3-dim curve can be represented as a line in 1 dimension
In topology … the definition of a homeomorphism is: a continuous bijection whose inverse is also continuous.
If we can find such a function from the high-dimensional space to the low-dimensional space, then we can say they are homeomorphic.
Manifold learning … properly defined
Some algorithms:
Isomap, Locally Linear Embedding, Laplacian Eigenmaps, Semidefinite Embedding
(each takes a parameter k for the k-nearest neighbors graph)
Isomap – isometric feature mapping
Reminiscent of MDS … indeed it is an extension of MDS
Two main steps:
- Estimate the geodesic distances (i.e., distances along the manifold)
- Use MDS to find points in low-dim Euclidean space
Estimating geodesic distances
Two assumptions for Isomap:
- there is a mapping that preserves distances
- the manifold is smooth enough that, between nearby points, geodesic distances are approximately linear (Euclidean)
For points that are far apart in the manifold, the linear approximation is no good. For such points:
- build a k-nearest neighbor graph weighted by Euclidean distances between points
- then find the distance between the far-apart points using a shortest path algorithm (e.g., Dijkstra), as sketched below
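A minimal sketch of this step (not from the slides), assuming the data sit in an array X with one point per row, and using scikit-learn's k-NN graph together with SciPy's Dijkstra shortest-path routine:

```python
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def estimate_geodesic_distances(X, k=7):
    """Approximate geodesic distances by shortest paths on a k-nearest-neighbor graph."""
    # Edge weights are Euclidean distances to the k nearest neighbors of each point.
    knn = kneighbors_graph(X, n_neighbors=k, mode='distance')
    # Treat the graph as undirected and run Dijkstra from every node.
    return shortest_path(knn, method='D', directed=False)
```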
Isomap then uses MDS to find low-dimensional points whose Euclidean distances match the estimated geodesic distances.
Locally Linear Embedding (LLE)
Intuition: visualize the manifold as a collection of overlapping coordinate patches. If the neighborhoods are sufficiently small and the manifold sufficiently smooth, then the patches are approximately linear.
Goal: identify these linear patches, characterize their geometry, and find a mapping accordingly.
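As a usage-level illustration only (the algorithmic details are in the Roweis and Saul reading cited later), scikit-learn ships an LLE implementation; the Swiss-roll data and parameter values below are arbitrary choices, not from the slides:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)     # 3-D points on a rolled-up sheet
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
Y = lle.fit_transform(X)                                    # (1000, 2) coordinates of the unrolled patches
```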
Laplacian Eigenmaps
Based on spectral graph theory …
Given a graph with weights W, define the graph Laplacian as
L = D − W
where D is the diagonal degree matrix with D_ii = Σ_j W_ij, and W_ij is nonzero only when x_i is among the k-nearest neighbors of x_j.
The eigenvalues and eigenvectors of L reveal a lot of information about the graph.
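A minimal sketch of the resulting embedding (not from the slides), assuming binary k-NN weights and the unnormalized Laplacian L = D − W; the original method also allows heat-kernel weights and a generalized eigenproblem:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmap(X, k=7, d=2):
    """Embed points X (n x D) into R^d using the bottom eigenvectors of the graph Laplacian."""
    # Symmetrize the k-NN adjacency so that W_ij = W_ji in {0, 1}.
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))        # degree matrix, D_ii = sum_j W_ij
    L = D - W                         # unnormalized graph Laplacian
    # Eigenvectors for the smallest nonzero eigenvalues give the coordinates
    # (assumes the k-NN graph is connected, so only one zero eigenvalue).
    vals, vecs = eigh(L)
    return vecs[:, 1:d + 1]
```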
Semidefinite Embedding
Intuition: imagine the k-nearest neighbors of each point to be connected by rigid rods. Now take this structure and pull it apart as far as possible … can we properly unravel it?
The embedding is obtained from B, a positive semidefinite (Gram) matrix, subject to the constraint that ensures distances between neighboring points are preserved:
B_ii + B_jj − 2 B_ij = ||x_i − x_j||² whenever x_i and x_j are neighbors.
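As an illustration of the optimization, a sketch using the cvxpy modeling library; the maximize-trace objective is the standard Semidefinite Embedding (maximum variance unfolding) formulation rather than something spelled out on the slide, and the approach is only practical for small n since B has n² entries:

```python
import numpy as np
import cvxpy as cp
from sklearn.neighbors import kneighbors_graph

def sde_gram(X, k=4):
    """Solve the Semidefinite Embedding SDP for the Gram matrix B of the 'unrolled' points."""
    n = X.shape[0]
    # Neighbor pairs whose pairwise distances must be preserved (the rigid rods).
    G = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    G = np.maximum(G, G.T)
    B = cp.Variable((n, n), PSD=True)            # B = psd Gram matrix of the embedding
    constraints = [cp.sum(B) == 0]               # center the embedded points
    for i in range(n):
        for j in range(i + 1, n):
            if G[i, j]:
                d2 = float(np.sum((X[i] - X[j]) ** 2))
                constraints.append(B[i, i] + B[j, j] - 2 * B[i, j] == d2)
    cp.Problem(cp.Maximize(cp.trace(B)), constraints).solve()
    return B.value                                # top eigenvectors of B give the embedding, as in MDS
```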
Other algorithms
Hessian LLE (hLLE) – replace graph Laplacian with a Hessian estimator
Local Tangent Space Alignment (LTSA) – similar to hLLE, estimates the tangent space at each point by performing PCA
The paper discusses a total of six algorithms … and there are variations … so many! Why so many?
Evaluating these algorithms is difficult. Choosing the best is also difficult.
Comparisons
Isometric embedding (assumption of distance preservation)
- Isomap, hLLE, Semidefinite Embedding
versus
Conformal embedding (preservation of angles)
- c-Isomap

Local versus Global
- Isomap is global: all point pairs considered during embedding
- LLE is local: cost function only considers k-nearest neighbors
Time complexity – spectral decomposition is costly!
Open questions
Another issue not much addressed: the choice of k in k-nearest neighbor approaches
Manifold Learning
http://www-ee.stanford.edu/~gray/birs/slides/hero.pdf
Locally Linear Embedding (LLE)
Arranging words: each word was initially represented by a high-dimensional vector that counted the number of times it appeared in different encyclopedia articles. Words with similar contexts are collocated in the embedding.
Manifold learning: MDS and Isomap
Jieping Ye
Department of Computer Science and Engineering
Arizona State University
http://www.public.asu.edu/~jye02
Review
• Clustering
• Classification
• Semi-supervised learning
• Feature reduction
• Kernel methods
Manifold learning
Outline of lecture
• Intuition
• Linear method- PCA
• Linear method- MDS
• Nonlinear method- Isomap
• Summary
Why Dimensionality Reduction
• The curse of dimensionality
• Number of potential features can be huge
– Image data: each pixel of an image
• A 64×64 image → 4096 features
– Genomic data: expression levels of the genes
• Several thousand features
– Text categorization: frequencies of phrases in a document or in a web page
• More than ten thousand features
Why Dimensionality Reduction
• Two approaches to reduce the number of features
– Feature selection: select the salient features by some criteria
– Feature extraction: obtain a reduced set of features by a transformation of all features
• Data visualization and exploratory data analysis also need to reduce dimension
– Usually reduce to 2D or 3D
Deficiencies of Linear Methods
• Data may not be best summarized by a linear combination of features
– Example: PCA cannot discover the 1D structure of a helix
Intuition: how does your brain
store these pictures?
Brain Representation
• Every pixel?
• Or perceptually meaningful structure?
– Up-down pose
– Left-right pose
– Lighting direction
So, your brain successfully reduced the high-dimensional inputs to an intrinsically 3-dimensional manifold!
Manifold Learning
• A manifold is a topological space which is locally Euclidean
• An example of nonlinear manifold:
Manifold Learning
• Discover low dimensional representations (smooth manifold) for data in high dimension.
• Linear approaches (PCA, MDS)
• Non-linear approaches (ISOMAP, LLE, others)

Latent: Y, with y_i ∈ R^d
Observed: X, with x_i ∈ R^D
Linear Approach- PCA
• PCA finds subspace linear projections of the input data.

Linear approach- PCA
• Main steps for computing the PCs
– Form the covariance matrix S.
– Compute its eigenvectors: {a_i}_{i=1}^p
– The first d eigenvectors {a_i}_{i=1}^d form the d PCs.
– The transformation G consists of the d PCs: G ← [a_1, a_2, …, a_d]
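A minimal sketch of these steps (not from the slides), assuming the data are the rows of an (n, p) NumPy array:

```python
import numpy as np

def pca(X, d):
    """Project n points in R^p onto the top-d principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    S = np.cov(Xc, rowvar=False)            # p x p covariance matrix S
    vals, vecs = np.linalg.eigh(S)          # eigenvalues/eigenvectors in ascending order
    G = vecs[:, ::-1][:, :d]                # G = [a_1, ..., a_d], the d PCs
    return Xc @ G                           # low-dimensional coordinates
```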
Linear Approach- classical MDS
• MDS: Multidimensional scaling
• Borg and Groenen, 1997
• MDS takes a matrix of pairwise distances and gives a mapping to R^d. It finds an embedding that preserves the interpoint distances, equivalent to PCA when those distances are Euclidean.
• Low dimensional data for visualization
Linear Approach- classical MDS
Centering matrix: P_e = I − (1/n) e e^T, where e is the vector of all ones
P_e X : subtract the column mean from each column
X P_e : subtract the row mean from each row
Example: X = [[1, 3, 2], [0, 2, 1]] has row means 2 and 1, so X P_e = [[−1, 1, 0], [−1, 1, 0]]
Linear Approach- classical MDS
Distance matrix: D_ij = ||x_i − x_j||²
⇒ −(1/2) P_e D P_e has entries (x_i − µ)·(x_j − µ), the centered inner products
Linear Approach- classical MDS
Distance matrix: D_ij = ||x_i − x_j||²
⇒ −(1/2) P_e D P_e has entries (x_i − µ)·(x_j − µ)

Problem: given D, how to find the x_i?

−(1/2) P_e D P_e = U_d Σ_d U_d^T = (U_d Σ_d^{0.5})(U_d Σ_d^{0.5})^T,
where U_d = [u_1, u_2, …, u_d] and Σ_d = diag(λ_1, λ_2, …, λ_d)

⇒ x_i = λ_i^{0.5} u_i, for i = 1, …, d (the embedded points are the rows of U_d Σ_d^{0.5})
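A minimal sketch of this recipe (not from the slides), assuming D already holds squared pairwise distances:

```python
import numpy as np

def classical_mds(D, d):
    """Embed n points in R^d from an (n, n) matrix D of squared pairwise distances."""
    n = D.shape[0]
    P = np.eye(n) - np.ones((n, n)) / n        # centering matrix P_e
    B = -0.5 * P @ D @ P                       # centered Gram matrix -(1/2) P_e D P_e
    vals, vecs = np.linalg.eigh(B)             # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:d]           # keep the top-d eigenpairs
    scale = np.sqrt(np.maximum(vals[idx], 0))  # guard against tiny negative eigenvalues
    return vecs[:, idx] * scale                # rows of U_d Sigma_d^{0.5}
```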
Linear Approach- classical MDS
• So far, we have focused on classical MDS, assuming D is the squared distance matrix.
– Metric scaling
• How to deal with more general dissimilarity measures?
– Non-metric scaling
Metric scaling: −(1/2) P_e D P_e has entries (x_i − µ)·(x_j − µ)
Nonmetric scaling: −(1/2) P_e D P_e may not be positive semi-definite
Solutions: (1) Add a large constant to its diagonal.
(2) Find its nearest positive semi-definite matrix
by setting all negative eigenvalues to zero.
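A sketch of solution (2), projecting onto the positive semi-definite cone by zeroing negative eigenvalues (solution (1) would instead add a constant to the diagonal before the eigendecomposition); not from the slides:

```python
import numpy as np

def nearest_psd(B):
    """Nearest positive semi-definite matrix to a symmetric B: set negative eigenvalues to zero."""
    vals, vecs = np.linalg.eigh(B)
    return (vecs * np.maximum(vals, 0)) @ vecs.T
```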
Nonlinear Dimensionality Reduction
• Many data sets contain essential nonlinear structures that are invisible to PCA and MDS
• Resort to nonlinear dimensionality reduction approaches.
– Kernel methods
• Depend on the kernels
• Most kernels are not data dependent
Nonlinear Approaches- Isomap
• Construct neighbourhood graph G
• For each pair of points in G, compute shortest path distances ---- geodesic distances.
• Use classical MDS with geodesic distances.
Euclidean distance → Geodesic distance
Josh Tenenbaum, Vin de Silva, John Langford, 2000
Sample points with Swiss Roll
• Altogether there are 20,000 points in the “Swiss roll” data set. We sample 1000 out of 20,000.
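For experimentation, a comparable data set can be drawn with scikit-learn's Swiss-roll generator; this stands in for the slides' 20,000-point set (the sample size matches, the exact points do not):

```python
from sklearn.datasets import make_swiss_roll

# 1000 points on a 3-D Swiss roll; t is the position along the unrolled sheet.
X, t = make_swiss_roll(n_samples=1000, random_state=0)
print(X.shape)   # (1000, 3)
```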
Construct neighborhood graph G
K-nearest neighbors (K = 7)
D_G is a 1000 × 1000 (Euclidean) distance matrix between neighboring points (figure A)
Compute all-pairs shortest paths in G
Now D_G is a 1000 × 1000 geodesic distance matrix between arbitrary points along the manifold (figure B)

Find a d-dimensional Euclidean space Y (figure C) that preserves the pairwise distances.
Use MDS to embed the graph in R^d
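Putting the three steps together, scikit-learn's Isomap runs the k-NN graph, shortest paths, and MDS internally; k = 7 and d = 2 below mirror the walkthrough above, while the data come from the generator shown earlier (an assumed stand-in for the slides' data):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
Y = Isomap(n_neighbors=7, n_components=2).fit_transform(X)   # 2-D "unrolled" coordinates
print(Y.shape)   # (1000, 2)
```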
The Isomap algorithm
PCA, MDS vs. ISOMAP
Isomap: Advantages
• Nonlinear
• Globally optimal: still produces a globally optimal low-dimensional Euclidean representation even though the input space is highly folded, twisted, or curved.
• Guaranteed asymptotically to recover the true dimensionality.
Isomap: Disadvantages
• May not be stable; dependent on the topology of the data
• Guaranteed asymptotically to recover the geometric structure of nonlinear manifolds
– As N increases, pairwise distances provide better approximations to geodesics, but cost more computation
– If N is small, geodesic distances will be very inaccurate.
Applications
• Isomap and Nonparametric Models of Image Deformation
• LLE and Isomap Analysis of Spectra and Colour Images
• Image Spaces and Video Trajectories: Using Isomap to Explore Video Sequences
• Mining the structural knowledge of high-dimensional medical data using Isomap
Isomap webpage: http://isomap.stanford.edu/
Next class
• Topics
– Locally Linear Embedding (LLE)
• Readings
– Nonlinear Dimensionality Reduction by Locally Linear Embedding (Roweis and Saul)