EE 290A: Generalized Principal Component Analysis
René Vidal
Center for Imaging Science
Institute for Computational Medicine
Johns Hopkins University
Principal Component Analysis (PCA)
• Given a set of points x1, x2, …, xN
– Geometric PCA: find a subspace S passing through them
– Statistical PCA: find projection directions that maximize the variance
• Solution (Beltrami 1873, Jordan 1874, Hotelling '33, Eckart-Householder-Young '36): given by the singular value decomposition of the data matrix
• Applications: data compression, regression, computer vision (eigenfaces), pattern recognition, genomics
[Figure: data points with a fitted subspace S and its basis]
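The slides do not include code, so the following is a minimal NumPy sketch of the SVD solution; the names (pca, X, d) are illustrative assumptions, not the lecture's notation.

```python
import numpy as np

def pca(X, d):
    """Fit a d-dimensional subspace S to the columns of X (K x N)."""
    mu = X.mean(axis=1, keepdims=True)            # center the data
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    B = U[:, :d]                                  # orthonormal basis for S
    Y = B.T @ (X - mu)                            # principal components
    return mu, B, Y
```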
Extensions of PCA
• Higher-order SVD (Tucker '66, Davis '02)
• Independent Component Analysis (Comon '94)
• Probabilistic PCA (Tipping-Bishop '99)
– Identify a subspace from noisy data
– Gaussian noise: standard PCA
– Noise in the exponential family (Collins et al. '01)
• Nonlinear dimensionality reduction
– Multidimensional scaling (Torgerson '58)
– Locally linear embedding (Roweis-Saul '00)
– Isomap (Tenenbaum '00)
• Nonlinear PCA (Schölkopf-Smola-Müller '98)
– Identify a nonlinear manifold by applying PCA to data embedded in a high-dimensional space
• Principal Curves and Principal Geodesic Analysis (Hastie-Stuetzle '89, Tibshirani '92, Fletcher '04)
Generalized Principal Component Analysis
• Given a set of points lying in multiple subspaces, identify
– The number of subspaces and their dimensions
– A basis for each subspace
– The segmentation of the data points
• "Chicken-and-egg" problem
– Given the segmentation, estimate the subspaces
– Given the subspaces, segment the data
Prior work on subspace clustering
• Iterative algorithms
– K-subspaces (Ho et al. '03)
– RANSAC; subspace selection and growing (Leonardis et al. '02)
• Probabilistic approaches: learn the parameters of a mixture model using e.g. EM
– Mixtures of PPCA (Tipping-Bishop '99)
– Multi-Stage Learning (Kanatani '04)
• Initialization
– Geometric approaches: 2 planes in R3 (Shizawa-Mase '91)
– Factorization approaches: independent subspaces of equal dimension (Boult-Brown '91, Costeira-Kanade '98, Kanatani '01)
– Spectral-clustering-based approaches (Yan-Pollefeys '06)
Basic ideas behind GPCA
• Towards an analytic solution to subspace clustering
– Can we estimate ALL models simultaneously using ALL data?
– When can we do so analytically? In closed form?
– Is there a formula for the number of models?
• Will consider the most general case
– Subspaces of unknown and possibly different dimensions
– Subspaces that may intersect arbitrarily (not only at the origin)
• GPCA is an algebraic-geometric approach to data segmentation
– Number of subspaces = degree of a polynomial
– Subspace basis = derivatives of a polynomial
– Subspace clustering is algebraically equivalent to
• Polynomial fitting
• Polynomial differentiation
Applications of GPCA in computer vision
• Geometry
– Vanishing points
• Image compression
• Segmentation
– Intensity (black-white)
– Texture
– Motion (2-D, 3-D)
– Video (host-guest)
• Recognition
– Faces (Eigenfaces): man vs. woman
– Human gaits
– Dynamic textures: water vs. bird
• Biomedical imaging
• Hybrid system identification
Introductory example: algebraic clustering in 1D
• Number of groups?
Introductory example: algebraic clustering in 1D
• How to compute n, c, and the b's?
– Every data point satisfies p(x) = (x − b1)(x − b2)⋯(x − bn) = x^n + c_{n−1}x^{n−1} + … + c_0 = 0
– Number of clusters: the degree n of p
– Cluster centers: the roots b1, …, bn of p
– The solution is unique if the embedded data matrix has rank n (at least n distinct values appear in the data)
– The solution is in closed form if n ≤ 4 (polynomials of degree up to 4 can be solved in radicals)
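A minimal NumPy sketch of this 1D algorithm, under the assumption that n is known; the sample data and names are illustrative. The coefficient vector c is found as the null space of a Vandermonde-style embedded data matrix, and the centers are the roots of p.

```python
import numpy as np

def cluster_1d(x, n):
    """Algebraic 1D clustering: cluster centers = roots of the fitted polynomial."""
    V = np.vander(x, n + 1)                 # rows [x^n, x^(n-1), ..., x, 1]
    _, _, Vt = np.linalg.svd(V)
    c = Vt[-1]                              # least-squares null vector = coefficients of p
    centers = np.real(np.roots(c))          # roots b1, ..., bn (order arbitrary)
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return centers, labels

x = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
centers, labels = cluster_1d(x, n=2)        # centers close to 1.0 and 5.0
```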
Introductory example: algebraic clustering in 2D
• What about dimension 2?
– Identify each point (x, y) in R2 with the complex number z = x + iy; clustering again reduces to finding the roots of a polynomial in z
• What about higher dimensions?
– Complex numbers in higher dimensions?
– How to find the roots of a polynomial over the quaternions?
• Instead
– Project the data onto a one- or two-dimensional space
– Apply the same algorithm to the projected data
Representing one subspace
• One plane: the set of points x with b^T x = 0, where b is the normal vector
• One line: the set of points x with b1^T x = 0 and b2^T x = 0
• One subspace can be represented with
– A set of linear equations
– A set of polynomials of degree 1
Representing n subspaces
• Two planes: by De Morgan's rule, x lies in their union iff (b1^T x = 0) or (b2^T x = 0), i.e. iff (b1^T x)(b2^T x) = 0
• One plane and one line
– Plane: b^T x = 0
– Line: b1^T x = 0 and b2^T x = 0
– Union: (b^T x)(b1^T x) = 0 and (b^T x)(b2^T x) = 0
• A union of n subspaces can be represented with a set of homogeneous polynomials of degree n
• Veronese map: νn sends x = (x1, …, xK) to the vector of all monomials of degree n in x1, …, xK
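A minimal NumPy sketch of the Veronese map; the helper name veronese is an assumption and is reused by the later sketches.

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(X, n):
    """Degree-n Veronese embedding of the columns of X (K x N)."""
    K, N = X.shape
    exps = list(combinations_with_replacement(range(K), n))  # monomial index tuples
    V = np.empty((len(exps), N))
    for i, idx in enumerate(exps):
        V[i] = np.prod(X[list(idx), :], axis=0)              # monomial x_{i1}...x_{in}
    return V  # one row per monomial; M_n(K) = C(n+K-1, n) rows in total
```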
Fitting polynomials to data points
• Polynomials can be written linearly in terms of the vector of coefficients c by using the polynomial embedding: p(x) = c^T νn(x)
• Coefficients of the polynomials can be computed from the null space of the embedded data matrix
– Solve νn(X)^T c = 0 using least squares
– N = number of data points
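A sketch of the fitting step, reusing the veronese helper above; the rank tolerance is an assumption. Each vanishing polynomial satisfies c^T νn(xj) = 0 for every point xj, so the coefficient vectors span the left null space of the embedded data matrix.

```python
import numpy as np

def fit_vanishing_polynomials(X, n, tol=1e-8):
    """Coefficients of the degree-n polynomials vanishing on the columns of X."""
    Vn = veronese(X, n)                 # embedded data matrix, M_n(K) x N
    U, S, _ = np.linalg.svd(Vn)         # least-squares null space via the SVD
    rank = np.sum(S > tol * S[0])       # numerical rank of the embedded data
    return U[:, rank:]                  # columns = coefficient vectors c
```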
Finding a basis for each subspace
• Case of hyperplanes:
– Only one polynomial: p(x) = (b1^T x)(b2^T x)⋯(bn^T x)
– Number of subspaces: the degree n of p
– Basis: the normal vectors b1, …, bn, obtained by factoring p
• Problems
– Computing roots may be sensitive to noise
– The estimated polynomial may not factor perfectly with noisy data
– Cannot be applied to subspaces of different dimensions
• Polynomials are estimated up to a change of basis, hence they may not factor, even with perfect data
Polynomial Factorization (GPCA-PFA) [CVPR 2003]
• Find the roots of a polynomial of degree n in one variable
• Solve linear systems in n variables
• Solution obtained in closed form for n ≤ 4
Finding a basis for each subspace
• To learn a mixture of subspaces we just need one positive example per class
Polynomial Differentiation (GPCA-PDA) [CVPR'04]
• Instead of factoring p, differentiate it: the gradient of p at a point y lying on subspace Si is normal to Si
Choosing one point per subspace
• With noise and outliers
– The zero set of the polynomials may not be a perfect union of subspaces
– Normals can be estimated correctly by choosing the points at which to differentiate optimally
• How to compute the distance to the closest subspace without knowing the segmentation?
GPCA for hyperplane segmentation
• Coefficients of the polynomial can be computed from the null space of the embedded data matrix
– Solve νn(X)^T c = 0 using least squares
– N = number of data points
• The number of subspaces can be computed from the rank of the embedded data matrix
• The normals to the subspaces can be computed from the derivatives of the polynomial
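A sketch of the hyperplane pipeline, reusing veronese above; the gradient helper and the greedy seeding of reference normals are illustrative choices, not the slides' exact procedure.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_gradient(c, y, n):
    """Gradient at y of the degree-n polynomial with coefficients c,
    in the monomial order produced by veronese()."""
    K = len(y)
    g = np.zeros(K)
    for coef, idx in zip(c, combinations_with_replacement(range(K), n)):
        for pos in range(n):                     # product rule over the n factors
            rest = idx[:pos] + idx[pos + 1:]
            g[idx[pos]] += coef * np.prod(y[list(rest)])
    return g

def segment_hyperplanes(X, n):
    """Assign each column of X (K x N) to one of n hyperplanes."""
    Vn = veronese(X, n)                          # embedded data matrix
    U, _, _ = np.linalg.svd(Vn)
    c = U[:, -1]                                 # best-fit vanishing polynomial p
    normals = np.array([poly_gradient(c, y, n) for y in X.T])
    normals /= np.maximum(np.linalg.norm(normals, axis=1, keepdims=True), 1e-12)
    refs = [normals[0]]                          # greedily seed n reference normals
    for _ in range(n - 1):
        sims = np.max(np.abs(normals @ np.array(refs).T), axis=1)
        refs.append(normals[np.argmin(sims)])    # normal most dissimilar so far
    labels = np.argmax(np.abs(normals @ np.array(refs).T), axis=1)
    return np.array(refs), labels
```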
GPCA for subspaces of different dimensions
• There are multiple polynomials fitting the data
• The derivative of each polynomial gives a different normal vector
• A basis for the subspace can be obtained by applying PCA to the normal vectors, as sketched below
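A sketch of this step: stack the gradients of all fitted polynomials at a point y (using the poly_gradient helper above) and take the orthogonal complement of their span; the rank tolerance is an assumption.

```python
import numpy as np

def subspace_basis_at(y, C, n, tol=1e-6):
    """Basis of the subspace through y, from the gradients of the polynomials
    whose coefficient vectors are the columns of C."""
    G = np.column_stack([poly_gradient(C[:, j], y, n) for j in range(C.shape[1])])
    U, S, _ = np.linalg.svd(G)
    r = np.sum(S > tol * S[0])   # dimension of the normal space at y
    return U[:, r:]              # orthogonal complement = basis of the subspace
```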
GPCA for subspaces of different dimensions
• Apply the polynomial embedding to the projected data
• Obtain a multiple-subspace model by polynomial fitting
– Solve νn(X)^T c = 0 to obtain the coefficients
– Need to know the number of subspaces n
• Obtain bases and dimensions by polynomial differentiation
• Optimally choose one point per subspace using the distance to the zero set of the polynomials
An example
• Given data lying in the union of two subspaces
• We can write membership in the union as products of the subspaces' defining linear equations
• Therefore, the union can be represented with two polynomials
An example
• The polynomials can be computed from the null space of the embedded data matrix
• The normals can be computed from the derivatives of the polynomials, evaluated at a point on each subspace
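The slide's concrete subspaces and formulas were in a figure, so the following instance, a plane {x3 = 0} and a line {x1 = x2 = 0} in R3, is an assumption chosen for illustration; it reuses the sketches above.

```python
import numpy as np

rng = np.random.default_rng(0)
plane = np.vstack([rng.standard_normal((2, 50)), np.zeros((1, 50))])  # x3 = 0
line = np.vstack([np.zeros((2, 20)), rng.standard_normal((1, 20))])   # x1 = x2 = 0
X = np.hstack([plane, line])                                          # 3 x 70 data

C = fit_vanishing_polynomials(X, n=2)   # 2-D null space, spanning {x1*x3, x2*x3}
B_plane = subspace_basis_at(X[:, 0], C, n=2)    # 3 x 2 basis of the plane
B_line = subspace_basis_at(X[:, -1], C, n=2)    # 3 x 1 basis of the line
```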
Dealing with high-dimensional data
• The minimum number of points needed for the fit depends on
– K = dimension of the ambient space
– n = number of subspaces
• In practice the dimension of each subspace ki is much smaller than K
– The number and dimensions of the subspaces are preserved by a generic linear projection onto a subspace of dimension max(ki) + 1
– Outliers can be removed by robustly fitting the subspace
• Open problem: how to choose the projection?
– PCA?
[Figure: two subspaces and their projection onto a low-dimensional space]
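A minimal sketch of the projection step; choosing the projection by PCA is just the option the slide raises, and the target dimension d = max(ki) + 1 comes from the preservation result above.

```python
import numpy as np

def project_data(X, d):
    """Project the columns of X (K x N) onto their top-d principal subspace."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :d].T @ X        # d x N projected data
```

A random projection of the same dimension would also generically preserve the number and dimensions of the subspaces, which is part of why the choice of projection remains open.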
GPCA with spectral clustering
• Spectral clustering
– Build a similarity matrix between pairs of points
– Use its eigenvectors to cluster the data
• How to define a similarity for subspaces?
– Want points in the same subspace to be close
– Want points in different subspaces to be far
• Use GPCA to get a basis at each point
• Distance: subspace angles between the bases
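A sketch of one such similarity, assuming a local basis bases[i] has been estimated at each point (e.g. with subspace_basis_at above); the Gaussian form and sigma are assumptions. The singular values of Bi^T Bj are the cosines of the principal angles between the two subspaces.

```python
import numpy as np

def subspace_affinity(bases, sigma=0.1):
    """Similarity matrix from the principal angles between per-point bases."""
    N = len(bases)
    W = np.eye(N)
    for i in range(N):
        for j in range(i + 1, N):
            cos = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            d2 = np.sum(1.0 - np.clip(cos, 0.0, 1.0) ** 2)  # sum of squared sines
            W[i, j] = W[j, i] = np.exp(-d2 / sigma ** 2)
    return W
```

The matrix W can then be fed to any standard spectral clustering routine (eigenvectors of the normalized Laplacian) to segment the data.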
Comparison of PFA, PDA, K-sub, EM
Summary
• GPCA: an algorithm for clustering subspaces
– Deals with unknown and possibly different dimensions
– Deals with arbitrary intersections among the subspaces
• Our approach is based on
– Projecting the data onto a low-dimensional subspace
– Fitting polynomials to the projected subspaces
– Differentiating the polynomials to obtain a basis
• Applications in image processing and computer vision
– Image segmentation: intensity and texture
– Image compression
– Face recognition under varying illumination
For more information: Vision, Dynamics and Learning Lab @ Johns Hopkins University
Thank You!