Robust Multiple Manifolds Structure Learning

Dian Gong, Xuemei Zhao, Gérard Medioni    DIANGONG, XUEMEIZ, MEDIONI@USC.EDU

University of Southern California, Institute for Robotics and Intelligent Systems, Los Angeles, CA 90089, USA

Abstract

We present a robust multiple manifolds structure learning (RMMSL) scheme to robustly estimate data structures under the multiple low intrinsic dimensional manifolds assumption. In the local learning stage, RMMSL efficiently estimates local tangent space by weighted low-rank matrix factorization. In the global learning stage, we propose a robust manifold clustering method based on local structure learning results. The proposed clustering method is designed to get the flattest manifolds clusters by introducing a novel curved-level similarity function. Our approach is evaluated and compared to state-of-the-art methods on synthetic data, handwritten digit images, human motion capture data and motorbike videos. We demonstrate the effectiveness of the proposed approach, which yields higher clustering accuracy, and produces promising results for challenging tasks of human motion segmentation and motion flow learning from videos.

1. Introduction

The concept of manifold has been extensively used in almost all aspects of machine learning, such as nonlinear dimension reduction (Saul & Roweis, 2003), visualization (Lawrence, 2005; van der Maaten, 2009), semi-supervised learning (Singh et al., 2009), multi-task learning (Agarwal et al., 2010) and regression (Steinke & Hein, 2009). Related methods have been applied to many real-world problems in computer vision, computer graphics, web data mining and more. Despite the success of manifold learning (in this paper, manifold learning refers to any learning technique that explicitly assumes data have manifold structures), several fundamental challenges remain in real applications:

(1) Multiple manifolds with possible intersections: in many circumstances there is no unique (global) manifold but a number of manifolds with possible intersections. For instance, in handwritten digit images, each digit forms its own manifold in the observed feature space. For human motion, joint positions (angles) of key points in the body skeleton form low-dimensional manifolds for each specific action (Urtasun et al., 2008). In these situations, modeling data as a union of (linear or non-linear) manifolds rather than a single one gives a better foundation for many tasks such as semi-supervised learning (Goldberg et al., 2009) and denoising (Hein & Maier, 2007).

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

(2) Noise and outliers: one critical issue of manifold learning is whether the method is robust to noise and outliers. This has been pointed out in the pioneering work on nonlinear dimension reduction (Saul & Roweis, 2003). For nonlinear manifolds, since it is not possible to leverage all data to estimate local data structure, more samples from the manifolds are required as the noise level increases.

(3) High curvature and the local linearity assumption: typical manifold learning algorithms approximate manifolds by a union of locally linear patches (possibly overlapping). These local patches are estimated by linear methods such as principal component analysis (PCA) and factor analysis (FA) (Teh & Roweis, 2003). However, for high curvature manifolds, many smaller patches are needed, but this often conflicts with the limited number of data samples.

In this paper, we investigate the problem of robustly estimating data structure under the multiple manifolds assumption. In particular, data are assumed to be sampled from multiple smooth submanifolds of (possibly different) low intrinsic dimensionalities, with noise and outliers. The proposed scheme, named Robust Multiple Manifolds Structure Learning (RMMSL), is composed of two stages. In the local stage, we estimate local manifold structure taking noise and curvature into account. The global stage, i.e., manifold clustering and outlier detection, is performed by constructing a multiple kernel similarity graph from the local structure learning results, introducing a novel curved-level similarity function. Thus, issue (1) is explicitly addressed, and (2) and (3) are partially investigated. A demonstration of the proposed approach is given in Fig. 1. It is worth noting that other types of global learning tasks such as dimension reduction, denoising and semi-supervised learning can be handled on each individual manifold cluster by existing methods (Lawrence, 2005; van der Maaten, 2009; Hein & Maier, 2007; Sinha & Belkin, 2010).

Figure 1. A demonstration of RMMSL. From left to right: noisy data points sampled from two intersecting circles with outliers, outlier detection results, and manifold clustering results after outlier detection.

Our problem statement is rigorously expressed as follows. Data $\{x_i\}_{i=1}^{n_1+n_2} \in \mathbb{R}^{D\times(n_1+n_2)}$ consist of inliers $\{x_i\}$ ($1 \le i \le n_1$) and outliers $\{x_j\}$ ($n_1+1 \le j \le n_1+n_2$). Inlier points $\{x_i\}_{i=1}^{n_1} \in \mathbb{R}^{D\times n_1}$ are assumed to be sampled from multiple submanifolds $\mathcal{M}_c$ ($1 \le c \le n_c$) as follows,

$$x_i = f_{C(x_i)}(\tau_i) + n_i, \quad i = 1, 2, \dots, n_1 \qquad (1)$$

where $C(x_i) \in \{1, 2, \dots, n_c\}$ is the manifold label function for $x_i$, $n_i$ is inlier noise in $\mathbb{R}^D$, and $f_c(\cdot)$ is the smooth mapping function that maps the latent variable $\tau_i$ from the latent space $\mathbb{R}^{d_c}$ to the ambient space $\mathbb{R}^D$ for manifold $\mathcal{M}_c$. $d_c$ $(< D)$ is the intrinsic dimensionality of $\mathcal{M}_c$ and it can vary for different manifolds. The task of local structure learning is to estimate the tangent space $T_{x_i}\mathcal{M}$ and the local intrinsic dimensionality $d_{C(x_i)}$ at each $x_i$. This is equivalent to estimating $J(f_{c_i}(\cdot); \tau_i) \in \mathbb{R}^{D\times d_{c_i}}$, i.e., the Jacobian matrix of $f_{c_i}(\cdot)$ at $\tau_i$. Details are given in Section 3. The tasks of global structure learning addressed in this paper are to detect the outliers $\{x_j\}$ ($n_1+1 \le j \le n_1+n_2$) and to assign a manifold cluster label $c_i \in \{1, 2, \dots, n_c\}$ to each inlier $x_i$ ($1 \le i \le n_1$) (Fig. 2). This is given in Section 4. Experimental results are shown in Section 5, followed by the conclusion in Section 6.
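To make the generative model of eq. 1 concrete, the following minimal sketch (an added illustration, not the authors' code; the radii, noise level and outlier count are assumptions chosen to mimic the two-circles example of Fig. 1) draws noisy samples from two intersecting circle manifolds in $\mathbb{R}^2$ plus uniform background outliers:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_circle(center, radius, n, sigma):
    """Sample x_i = f_c(tau_i) + n_i for a 1D circle manifold in R^2."""
    tau = rng.uniform(0.0, 2.0 * np.pi, n)               # latent coordinates tau_i
    clean = center + radius * np.stack([np.cos(tau), np.sin(tau)], axis=1)
    return clean + sigma * rng.standard_normal((n, 2))   # add inlier noise n_i

# Two intersecting circles (n_c = 2), plus uniform background outliers.
x1 = sample_circle(np.array([-0.5, 0.0]), 1.0, 1000, 0.03)
x2 = sample_circle(np.array([0.5, 0.0]), 1.0, 1000, 0.03)
outliers = rng.uniform(-2.0, 2.0, (100, 2))
X = np.vstack([x1, x2, outliers])                        # shape (n1 + n2, 2)
```

Here each circle plays the role of one submanifold $\mathcal{M}_c$, $\tau$ is the latent coordinate, and the Gaussian term is the inlier noise $n_i$.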

2. Related Work

Recent works on manifold learning and latent variable modeling focus on low-dimensional embedding and label propagation for high dimensional data, which are usually assumed to be sampled from a single manifold (Saul & Roweis, 2003; Sinha & Belkin, 2010; Lawrence, 2005; van der Maaten, 2009). To address the multiple manifolds problem, (Goldberg et al., 2009) gives a theoretical analysis and a practical algorithm for semi-supervised learning under the multi-manifold assumption. As a complementary approach to previous works, RMMSL mainly focuses on unsupervised learning under the multi-manifold assumption and it can be combined with existing approaches; it can also be applied to supervised learning with minor modifications.

In the global structure learning stage, RMMSL focuses on clustering and outlier detection. Clustering is a long-standing problem and we only review manifold-related clustering methods. If data lie on a low-dimensional submanifold, distance on the manifold is used to replace the Euclidean distance in the clustering process. This leads to spectral clustering (Ng et al., 2002; Zelnik-Manor & Perona, 2005; Maier et al., 2009), one of the most popular modern clustering algorithms; we refer to (von Luxburg, 2007) for a survey. However, most of these works do not explicitly consider multiple intersecting manifolds.

State-of-the-art multiple subspace learning methods such as (Elhamifar & Vidal, 2009; Vidal et al., 2010; Yan & Pollefeys, 2006) achieve good performance on multiple linear (intersecting) subspace segmentation under the assumption that the intrinsic manifolds have linear structures. However, real data often have nonlinear intrinsic structures, which poses difficulties for these methods. Nonlinear manifold clustering is investigated in (Goh & Vidal, 2007), which mainly focuses on separated manifolds. As an extension of ISOMAP, (Souvenir & Pless, 2005) proposes an EM algorithm to perform multiple manifolds clustering, but the results are sensitive to initialization and the E-step is heuristic. Recently, (Wang et al., 2011) proposed an elegant solution to manifold clustering by constructing the affinity matrix based on the estimated local tangent space.

3. Local Manifold Structure Estimation

Correctly and efficiently estimating local data structure is a crucial step for data analysis. As explained before, problems like noise and high curvature make local structure estimation challenging. This section addresses how to model and represent local structure information on manifolds. We start from a local Taylor expansion, and the local manifold tangent space is represented by the Jacobian matrix under the local isometry assumption.

Local Taylor Expansion. Without additional assumptions, the model in eq. 1 (Section 1) is not well defined. For instance, if $f(\cdot)$ and the point set $\{\tau_i\}$ satisfy eq. 1, then $f(g^{-1}(\cdot))$ and the point set $\{g(\tau_i)\}$ satisfy it too, where $g(\cdot)$ is any invertible and differentiable mapping function from $\mathbb{R}^d$ to $\mathbb{R}^d$. Thus, the local isometry assumption is enforced at point $\tau_i \in \mathbb{R}^d$,

$$\|f(\tau) - f(\tau_i)\|_2 = \|\tau - \tau_i\|_2 + o(\|\tau - \tau_i\|_2) \qquad (2)$$

where $\tau$ is in the $\varepsilon$-neighborhood of $\tau_i$. The above condition implies that $J(f(\cdot); \tau_i) \in \mathbb{R}^{D\times d}$ is an orthonormal matrix (Smith et al., 2009), i.e., $J^T(f(\cdot); \tau_i)\, J(f(\cdot); \tau_i) = I_d$.
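As a quick numerical check of this implication (an added illustration, not part of the original derivation), the arc-length parametrization of the unit circle is a local isometry, and its finite-difference Jacobian indeed satisfies $J^T J = I_d$:

```python
import numpy as np

def f(tau):
    """Arc-length (isometric) embedding of the circle into R^2."""
    return np.array([np.cos(tau), np.sin(tau)])

tau_i, h = 0.7, 1e-6
J = ((f(tau_i + h) - f(tau_i - h)) / (2 * h)).reshape(2, 1)  # D x d Jacobian
print(J.T @ J)  # ~[[1.0]], i.e., J^T J = I_d, as eq. 2 implies
```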



From eq. 1, by using a Taylor expansion to incorporate both the Taylor approximation error and the inlier noise, we get,

$$x_{ij} - x_i = J(f(\cdot); \tau_i)(\tau_{ij} - \tau_i) + e_{ij} + n'_{ij}, \quad e_{ij} \sim o(\|\tau_{ij} - \tau_i\|_2) \qquad (3)$$

where $x_{ij}$ ($1 \le j \le m_i$) are the $m_i$ elements in the $\varepsilon$ ball of $x_i$ and the noise vector $n'_{ij}$ denotes $n_{ij} - n_i$. In matrix notation, we have,

$$X_i - x_i 1_{m_i}^T = J_i (T_i - \tau_i 1_{m_i}^T) + E_i + N_i \qquad (4)$$

Assume there are $m_i$ points $\{\tau_{ij}, x_{ij}\}_{j=1}^{m_i}$ in the $\varepsilon$-neighborhood of $\{\tau_i, x_i\}$; the local data matrix $X_i$ is defined as $[x_{i1}, x_{i2}, \dots, x_{im_i}] \in \mathbb{R}^{D\times m_i}$. The corresponding latent coordinate matrix $T_i$ for $X_i$ is $[\tau_{i1}, \tau_{i2}, \dots, \tau_{im_i}] \in \mathbb{R}^{d\times m_i}$, the local Taylor approximation error matrix $E_i$ is $[e_{i1}, e_{i2}, \dots, e_{im_i}] \in \mathbb{R}^{D\times m_i}$, and $N_i$ is the local inlier noise matrix $[n'_{i1}, n'_{i2}, \dots, n'_{im_i}] \in \mathbb{R}^{D\times m_i}$. $J_i$ is short for $J(f(\cdot); \tau_i)$.

It is natural to assume the noise $n'_{ij}$ to be i.i.d. with homogeneous Gaussian distribution $N(0, \sigma_n^2 I_D)$ ($1 \le j \le m_i$). Furthermore, we treat the errors as independent Gaussian random vectors with different covariance matrices, i.e., $e_{ij} \sim N(0, \sigma^2 \alpha_{ij} I_D)$. Adding noise and error together, an integrated error vector $\varepsilon_{ij}$ is defined as,

$$\varepsilon_{ij} = n'_{ij} + e_{ij} \sim N(0, \sigma_n^2 I_D + \sigma^2 \alpha_{ij} I_D), \quad j = 1, 2, \dots, m_i \qquad (5)$$

where $\sigma_n^2$ indicates the scale of the noise covariance and $\sigma^2$ indicates the scale of the error. $\alpha_{ij}$ reflects the inhomogeneous property of the Taylor expansion error at the points $x_{ij}$ (based at $x_i$). Thus, noise and error are jointly considered. Instead of treating the error $e_{ij}$ as homogeneous across different points, we argue that the error is proportional to the relative distance $\|\tau_{ij} - \tau_i\|_2$, based on the natural property of the Taylor expansion.

Inference. To reflect the inhomogeneous property of the error $\varepsilon_{ij}$ (which combines the Taylor approximation error and the inlier noise), we propose the following objective function to estimate $J_i$,

$$\mathcal{L}(J_i) = \|E_i S\|_F^2 = \|\{(X_i - x_i 1_{m_i}^T) - J_i (T_i - \tau_i 1_{m_i}^T)\} S\|_F^2 \qquad (6)$$

where $S \in \mathbb{R}^{m_i \times m_i}$ is the diagonal weight matrix with $s_{jj} = s(\tau_{ij}, \tau_i)$ indicating the importance of minimizing the error $\varepsilon_{ij}$ at $x_{ij}$. Intuitively, we emphasize error minimization at $x_{ij}$ more than at $x_{ik}$ if $s_{jj}$ is larger than $s_{kk}$.

We now discuss how to choose $\alpha_{ij}$ and determine $S$ accordingly. First, we choose $\alpha_{ij} = \alpha(\|\tau_{ij} - \tau_i\|_2)$, where $\alpha(\cdot)$ is a monotonically non-decreasing function on the non-negative domain. Supported by local isometry (eq. 2), we have $\alpha_{ij} \approx \alpha(\|x_{ij} - x_i\|_2)$. So,

$$\varepsilon_{ij} \sim N(0, (\sigma_n^2 + \sigma^2 \alpha(\|x_{ij} - x_i\|_2)) I_D), \quad j = 1, 2, \dots, m_i \qquad (7)$$

Eq. 7 immediately suggests a weight function, $s_{jj} = s(\|\tau_{ij} - \tau_i\|_2) = 1/(\sigma_n^2 + \sigma^2 \alpha(\|x_{ij} - x_i\|_2))$. Given the function $\alpha(\cdot)$, $s(\cdot)$ is determined and we obtain the weight matrix $S$. Then, eq. 6 can be effectively solved by the following optimization framework,

$$\arg\max_{J_i} \|J_i^T \tilde{X}_i S S^T \tilde{X}_i^T J_i\|_*, \quad \text{s.t.} \; J_i^T J_i = I_d \qquad (8)$$

Essentially, the solution of eq. 8 is just the largest $d$ eigenvectors of the matrix $\tilde{X}_i S S^T \tilde{X}_i^T$ (with $\tilde{X}_i = X_i - x_i 1_{m_i}^T$), which is called the local structure matrix $T_i \in \mathbb{R}^{D\times D}$. The local intrinsic dimensionality $d$ can be estimated by finding the largest gap between the eigenvalues of $T_i$ (Mordohai & Medioni, 2010).
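Since the solution of eq. 8 reduces to a plain eigen-decomposition, the local learning stage is easy to sketch in code. The snippet below is a minimal illustration under the model above (a k-nearest-neighbor ball standing in for the $\varepsilon$-ball, a quadratic $\alpha(\cdot)$, and unit $\sigma_n^2$, $\sigma^2$ as placeholder parameters, all of which are assumptions):

```python
import numpy as np

def local_tangent(X, i, k=20, sigma_n2=1.0, sigma2=1.0, d=None):
    """Weighted local tangent space at X[i] (eqs. 6-8).

    X: (n, D) data matrix. Returns (J_i, eigvals), where J_i is D x d.
    """
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]            # k-NN ball, excluding x_i itself
    Xt = (X[nbrs] - X[i]).T                      # D x m_i, centered at x_i
    alpha = dists[nbrs] ** 2                     # quadratic alpha(||x_ij - x_i||)
    s = 1.0 / (sigma_n2 + sigma2 * alpha)        # diagonal weights s_jj from eq. 7
    M = (Xt * s**2) @ Xt.T                       # local structure matrix X~ S S^T X~^T
    eigvals, eigvecs = np.linalg.eigh(M)         # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    if d is None:                                # largest eigen-gap picks d
        d = int(np.argmax(eigvals[:-1] - eigvals[1:])) + 1
    return eigvecs[:, :d], eigvals
```

Calling local_tangent at every point yields the bases $J_i$ consumed by the global stage in Section 4.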

Analysis. By modeling the inhomogeneous error $e_{ij}$, the curvature effect (Hessian) is implicitly considered without higher-order terms. Furthermore, for a large range of $\alpha(\cdot)$ (such as a high-order polynomial), the weight $s_{jj}$ is quite small when $\|x_{ij} - x_i\|_2$ is large. Thus, outliers do not affect the estimation results much.

It is also interesting to compare local learning methods with different kernel functions $\alpha(\cdot)$. For standard low-rank matrix approximation by Singular Value Decomposition (SVD), $\alpha(\cdot)$ is a constant function, since SVD assumes data lie on a linear subspace without considering curvature or outliers. SVD can be improved to Robust SVD by introducing a robust influence function to handle outliers (De la Torre & Black, 2003). However, Robust SVD is still constrained by the linear model and its computational cost is high because of the iterative computation. On the other hand, Tensor Voting (TV) uses a Gaussian kernel (Mordohai & Medioni, 2010), which can be viewed as a special case of $s(\cdot)$ when $\sigma_n = 0$, because standard Tensor Voting does not consider inlier noise.
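To make the comparison concrete, the three choices can all be written as weight functions over the squared neighbor distance (a hedged paraphrase of the paragraph above, with $\sigma_n^2$ and $\sigma^2$ as free parameters), each pluggable into the local_tangent sketch from the previous section:

```python
import numpy as np

def s_svd(r2, sigma_n2=1.0, sigma2=1.0):
    """Constant alpha(.): plain SVD/PCA, no curvature or outlier handling."""
    return np.full_like(r2, 1.0 / (sigma_n2 + sigma2))

def s_rmmsl(r2, sigma_n2=1.0, sigma2=1.0):
    """Quadratic alpha(.): RMMSL-style down-weighting of distant neighbors."""
    return 1.0 / (sigma_n2 + sigma2 * r2)

def s_tv(r2, sigma2=1.0):
    """Gaussian kernel as in Tensor Voting; one reading of the sigma_n = 0 case."""
    return np.exp(-r2 / sigma2)
```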

4. Global Manifold Structure Learning

In this global stage of RMMSL, we focus on multiple smooth manifolds clustering based on the local structure learning results.

In contrast to previous works, the clustering stage in RMMSL can handle multiple non-linear (possibly intersecting) manifolds with different dimensionalities, and it explicitly considers outlier filtering, which is addressed as one step in the clustering process. Compared to the standard assumptions in clustering, i.e., that the intra-class distance should be small while the inter-class distance should be large, we further argue that each cluster should be a smooth manifold component. As shown in Fig. 2, when two manifolds intersect, there exist multiple possible clustering solutions, while only the rightmost is the result we want. The underlying assumption we make is that the local manifold has relatively low curvature, i.e., it changes smoothly and slowly as spatial distance increases. In order to get the flattest manifold clustering result, it is natural to incorporate a flatness measure into the clustering objective function.

Figure 2. An example of multiple smooth manifolds clustering. The first panel shows the input data samples and the other three are possible clustering results. Only the rightmost is the result we want, because the underlying manifolds are smooth.

Curved-Level Measure. As an approximation of the (continuous) Laplace-Beltrami operator (Hein et al., 2005), the (discrete) graph Laplacian can measure the smoothness of the underlying manifold. However, computing the graph Laplacian is a global process and most of the theoretical results cannot be easily adapted to the multi-manifold setting. On the other hand, curvature is a local measurement indicating the curved degree of a geometric object. As a generalization of the curvature of a 1D curve and a 2D surface, the Ricci curvature tensor represents the amount by which the volume element of a geodesic ball in a curved (Riemannian) manifold deviates from that of the standard ball in Euclidean space.

Inspired by the idea of curvature, the following curved-level measurement $R(x)$ is considered,

$$R(x) = \sum_{x_i \in N(x)} \frac{\|\theta(J_i, J)\|}{d(x_i, x)} \qquad (9)$$

where $\theta(J_i, J)$ measures the principal angle between the tangent spaces $J_i \in \mathbb{R}^{D\times d_i}$ ($T_{x_i}\mathcal{M}$) and $J \in \mathbb{R}^{D\times d}$ ($T_x\mathcal{M}$), $d(x_i, x)$ is the geodesic distance between $x_i$ and $x$, and $N(x)$ is the set of spatial neighborhood points of $x$. Intuitively, $R(x)$ is analogous (up to a constant variation) to the integration of the unsigned principal curvatures along different directions at $x$. The theoretical analysis of the connection between $R(x)$ and curvature (such as mean curvature), as well as the asymptotic behavior when the number of data samples goes to infinity, is left for future investigation. Based on eq. 9, we can measure the (approximate) total curved level of one manifold cluster $\mathcal{M}_k$ by summing up $R(\cdot)$ over all data samples $x_i$ belonging to this cluster,

$$R(\mathcal{M}_k) = \sum_{x_i \in \mathcal{M}_k} R(x_i) = \sum_{x_i \in \mathcal{M}_k, (i,j) \in G} \frac{\|\theta(J_i, J_j)\|}{d(x_i, x_j)} \qquad (10)$$

where $G$ is the undirected neighborhood graph built on the data samples ($\varepsilon$-neighborhood or $K$-nearest neighbor graph).
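The angle term $\theta(J_i, J_j)$ is computable from the singular values of $J_i^T J_j$; a short sketch (an added illustration, assuming the tangent bases have orthonormal columns as eq. 2 implies):

```python
import numpy as np

def principal_angles(Ji, Jj):
    """Principal angles between two tangent spaces with orthonormal columns.

    Ji: (D, di), Jj: (D, dj). Returns angles in [0, pi/2], one per
    min(di, dj) pair of principal directions.
    """
    svals = np.linalg.svd(Ji.T @ Jj, compute_uv=False)
    return np.arccos(np.clip(svals, -1.0, 1.0))

def curved_term(Ji, Jj):
    """||theta(Ji, Jj)|| as used in eqs. 9-10: the norm of the angle vector."""
    return np.linalg.norm(principal_angles(Ji, Jj))
```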

Objective Function. Eq. 10 provides an empirical way to measure the total curved degree of one particular manifold cluster $\mathcal{M}_k$. This can be viewed as one type of intra-cluster dissimilarity in the standard clustering framework. In order to get balanced clustering results, we also consider the inter-cluster dissimilarity function as follows,

$$R(\mathcal{M}_k, \mathcal{M}_l) = \sum_{x_i \in \mathcal{M}_k, \, x_j \in \mathcal{M}_l, \, (i,j) \in G} \frac{\|\theta(J_i, J_j)\|}{d(x_i, x_j)} \qquad (11)$$

In practice, the value of $\|\theta(J_i, J)\|/d(x_i, x)$ in eq. 9 is unbounded and numerically unstable. Thus, it is straightforward to compound eqs. 9 and 11 with a standard similarity kernel function (such as a Gaussian) to get a normalized curved measurement. In particular, we use the standard minimization framework by putting $\|\theta(J_i, J)\|/d(x_i, x)$ into the similarity function together with an additional distance similarity function,

$$J_{RMMSL}(\{\mathcal{M}_k\}_{k=1}^{n_c}) = \sum_{k=1}^{n_c} \frac{W(\mathcal{M}_k, \overline{\mathcal{M}}_k)}{W(\mathcal{M}_k)} \qquad (12)$$

where $W(\cdot)$ is the contrary version of the curved measure function $R(\cdot)$, i.e., the flatter the manifold is, the larger the value of $W(\cdot)$. $\overline{\mathcal{M}}_k$ is the complementary set of $\mathcal{M}_k$. The formulation of $W(\cdot)$ is,

$$W(\mathcal{M}_k) = \sum_{x_i \in \mathcal{M}_k, \, (i,j) \in G} w_1\!\left(\frac{\|\theta(J_i, J_j)\|}{d(x_i, x_j)}\right) w_2(d(x_i, x_j)) \qquad (13)$$

where $w_1(\cdot)$ and $w_2(\cdot)$ are similarity kernel functions that can be chosen as Gaussian or other standard formulations (similarly for $W(\mathcal{M}_k, \mathcal{M}_l)$). Due to the shrinkage effect of the kernel, $d(\cdot)$ is further approximated by the Euclidean distance. Intuitively, $W(\mathcal{M}_k)$ is one type of intra-class similarity function on one manifold cluster and $W(\mathcal{M}_k, \mathcal{M}_l)$ is the inter-class similarity function between two clusters. The optimal $n_c$-class clustering results are obtained by minimizing eq. 12.

Algorithm. Indeed, eq. 12 can be viewed as a multi-class normalized cut with a novel similarity measurement provided by the local curved similarity and distance similarity functions (von Luxburg, 2007). Directly minimizing eq. 12 is an NP-hard problem, but it can be relaxed by graph spectral clustering in the following procedure.


Step 1. Before (global) manifold clustering, the local structure learning of RMMSL in Section 3 is performed to estimate the local tangent space $J_i \in \mathbb{R}^{D\times d_i}$ at each point $x_i$. $d_i$ can be locally estimated in the local learning stage of RMMSL or chosen as a fixed value in advance. Also, the neighborhood graph $G$ is built on all input data samples $\{x_i\}_{i=1}^{n}$ ($n = n_1 + n_2$).

Step 2. Construct the similarity matrix $W = [w(x_i, x_j)]_{i,j} \in \mathbb{R}^{n\times n}$ from the following two kernels. The first is the pairwise distance kernel, widely used in graph spectral clustering and defined as $w_1(x_i, x_j) = \exp\{-\|x_i - x_j\|^2/(\sigma_i \sigma_j)\}$. In particular, we use the idea from self-tuning spectral clustering to select the local bandwidths $\sigma_i$ and $\sigma_j$ (Zelnik-Manor & Perona, 2005). The second is the curved-level kernel $w_2(x_i, x_j) = \exp\{-\theta(J_i, J_j)^2/(\|x_i - x_j\|^2 (\sigma_c^2/\sigma_i\sigma_j))\}$, where $\sigma_c$ is used to control the effect of this curved similarity. Then, $w(x_i, x_j)$ is set as

$$w_{ij} = w_1(x_i, x_j)\, w_2(x_i, x_j) = \exp\left\{-\left(\frac{\|x_i - x_j\|^2}{\sigma_i \sigma_j} + \frac{\theta(J_i, J_j)^2}{\|x_i - x_j\|^2\, \sigma_c^2/(\sigma_i \sigma_j)}\right)\right\} \qquad (14)$$
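A direct sketch of Step 2 (an illustration, not the authors' code; the self-tuning bandwidth is taken as the distance to the K-th nearest neighbor following Zelnik-Manor & Perona (2005), and the angle vector is summarized by its squared norm, which is one reading of $\theta(J_i, J_j)^2$ in eq. 14):

```python
import numpy as np

def similarity_matrix(X, J, K=7, sigma_c=1.0):
    """Multiple-kernel similarity W of eq. 14.

    X: (n, D) points; J: list of (D, d_i) local tangent bases.
    """
    n = len(X)
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared dists
    # Self-tuning local bandwidth: distance to the K-th nearest neighbor.
    sigma = np.sort(np.sqrt(D2), axis=1)[:, K]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            svals = np.linalg.svd(J[i].T @ J[j], compute_uv=False)
            theta2 = np.sum(np.arccos(np.clip(svals, -1.0, 1.0)) ** 2)
            w1 = np.exp(-D2[i, j] / (sigma[i] * sigma[j]))
            # Small epsilon guards against coincident points (zero distance).
            w2 = np.exp(-theta2 / (D2[i, j] * sigma_c**2 / (sigma[i] * sigma[j]) + 1e-12))
            W[i, j] = W[j, i] = w1 * w2
    return W
```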

Step 3 (optional). Based on $W$, outlier detection (filtering) can be done as described later.

Step 4. Once we have the similarity matrix $W$, the standard spectral clustering technique can be applied. Specifically, we compute the (unnormalized) Laplacian matrix $L = D - W$, where $D$ is a diagonal matrix whose elements equal the row sums of $W$. We select the first $n_c$ (number of clusters) eigenvectors of the generalized eigenproblem $Le = \lambda De$. Finally, the K-means algorithm is applied to the rows of these eigenvectors. After identifying manifold cluster labels, many tasks such as embedding and denoising can be performed on each manifold cluster.
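Step 4 is standard spectral clustering; a compact sketch with assumed SciPy/scikit-learn dependencies:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_cluster(W, n_clusters):
    """Relaxed normalized cut: solve L e = lambda D e, then K-means on rows.

    Assumes the graph is connected so D is positive definite.
    """
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                                    # unnormalized graph Laplacian
    # Generalized symmetric eigenproblem; keep the n_c smallest eigenvectors.
    _, E = eigh(L, D, subset_by_index=[0, n_clusters - 1])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(E)
```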

Outlier Detection. We formulate the outlier detection problem as a manifold saliency ranking problem via the random walk model proposed in (Zhou et al., 2004). We define a random walk graph on $\{x_i\}_{i=1}^{n}$ with the transition probability matrix $P = I - L_{rw} = D^{-1}W$. It can be shown that, if $W$ is symmetric, the stationary probability $\pi$ of this random walk can be calculated directly without a complex eigen-decomposition (Zhou et al., 2004),

$$\pi = 1_n^T D / \|W\|_1 \qquad (15)$$

where $1_n$ is an $n\times 1$ column vector of ones and $\|\cdot\|_1$ is the entry-wise 1-norm of a matrix. It is straightforward that ranking data according to $\pi$ is the same as ranking by the unnormalized distribution $1_n^T D$. After ranking, the bottom points can be filtered out as outliers. The ratio of outliers can be given as a prior or be estimated by performing K-means on $1_n^T D$.
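The ranking itself reduces to sorting graph degrees; a minimal sketch (the 10% outlier ratio is an arbitrary placeholder for the prior or the K-means estimate mentioned above):

```python
import numpy as np

def rank_outliers(W, outlier_ratio=0.10):
    """Rank points by the unnormalized stationary distribution 1^T D (eq. 15).

    Low-degree points sit in sparse regions of the similarity graph and are
    flagged as outliers.
    """
    degree = W.sum(axis=0)                   # 1_n^T D, up to normalization
    n_out = int(outlier_ratio * len(degree))
    return np.argsort(degree)[:n_out]        # indices of bottom-ranked points
```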


Figure 3. Visualization of part of the clustering results in Table 1. First row: one noisy sphere inside another noisy sphere in $\mathbb{R}^3$. Second row: two intersecting noisy spheres in $\mathbb{R}^3$. Third row: two intersecting noisy planes in $\mathbb{R}^3$. For each part, from left to right: K-means, self-tuning spectral clustering (Zelnik-Manor & Perona, 2005), Generalized PCA (Vidal et al., 2010) and RMMSL.

Analysis. The key idea of this algorithm is a novel way to construct the similarity matrix $W$ in a multiple kernel setting (based on local structure estimates) that encourages flatter clustering results. It is worth noting that if two points have different tangent spaces, the similarity (eq. 14) becomes smaller as the two points get closer (within a range). This gives an intuitive explanation of why RMMSL can handle multiple intersecting manifolds. From a high-level point of view, the pairwise similarity incorporates information from two local point sets rather than from two single points only. Similar ideas were proposed in (Elhamifar & Vidal, 2009; Goldberg et al., 2009; von Luxburg et al., 2011).

5. Experiments

We evaluate the performance of RMMSL on synthetic data, USPS digits, CMU Motion Capture data (MoCap) and motorbike videos. These data are chosen to demonstrate the general capability of RMMSL as well as its advantages on nonlinear manifolds. We investigate the performance of manifold clustering and further applications such as human action segmentation and motion flow modeling. We also perform quantitative comparisons of the local tangent space estimation. In particular, the weighted low-rank matrix decomposition (the local structure learning in RMMSL) is compared with local SVD (or local PCA (Teh & Roweis, 2003)) and ND-TV (Mordohai & Medioni, 2010). Results show that RMMSL is more robust to curvature and outliers than the other methods. Also, if the manifold has relatively low curvature and is outlier free, then RMMSL and local SVD give similar results. Details are omitted due to the space limit.


Data / Methods             K-means   NJW Clustering   Self-tuning Clustering   GPCA   SSC    RMMSL
Big-small spheres          0.50      1.00             1.00                     0.51   0.56   1.00
Two-intersecting spheres   0.76      0.78             0.84                     0.50   0.53   0.95
Two-intersecting planes    0.51      0.60             0.62                     0.85   0.93   0.95
USPS-2200                  0.80      0.89             0.89                     -      -      0.90
USPS-5500                  0.78      0.88             0.88                     -      -      0.88
CMU MoCap                  0.69      0.81             0.81                     -      -      0.89
Motorbike Video            0.72      0.84             0.85                     0.85   0.87   0.96

Table 1. Rand index scores of clustering on synthetic data, USPS digits, CMU MoCap sequences and Motorbike videos.


Figure 4. Clustering results of RMMSL on two manifolds with outliers. From left to right: ground truth, outlier detection, clustering after outlier filtering, and clustering without outlier filtering.

5.1. Multiple Manifolds Clustering

In this task, quantitative comparisons of manifold clustering are provided on three data sets. We compare RMMSL (global structure learning) to K-means, spectral clustering (the NJW algorithm (Ng et al., 2002)), self-tuning spectral clustering (Zelnik-Manor & Perona, 2005), Generalized PCA (GPCA) (Vidal et al., 2010) and Sparse Subspace Clustering (SSC) (Elhamifar & Vidal, 2009). These methods (except K-means) are chosen because they are related to manifold learning or subspace learning. We also performed comparisons with other spectral clustering and subspace clustering methods such as (Yan & Pollefeys, 2006) and (Goh & Vidal, 2007) and reached the same conclusion, but those results are omitted due to the space limit. The Rand index score is used as the evaluation metric. The kernel bandwidth $\sigma$ in spectral clustering is tuned over {1, 5, 10, 20, 50, 100, 200} for synthetic data, MoCap and videos, and {100, 500, 1000, 2000, 5000} for USPS. The value of the K-th neighborhood in self-tuning spectral clustering and RMMSL is chosen from {5, 10, 15, 20, 30, 50, 100}. $\sigma_c$ in the global stage of RMMSL is chosen from {0.2, 0.5, 1, 1.5, 2}. In the local stage, a quadratic $\alpha(\cdot)$ is used and $\sigma_n$ is set to 1. The sparse regularization parameter of SSC is tuned over {0.001, 0.002, 0.005, 0.01, 0.1}. For synthetic data with random noise, the parameters for all methods are selected on 5 trials and then the average performance on another 50 trials is reported. For real data, parameters are selected by picking the best Rand index. For all methods containing K-means, 100 replicates are performed.

Synthetic Data. Since most clustering algorithms do not consider outliers explicitly, we first perform a comparison on 3 outlier-free synthetic data sets, each containing 2000 noisy samples from two manifolds in $\mathbb{R}^3$ ($d = 2$ and $D = 3$). Results are shown in Fig. 3, and Rand index scores are given in the first three rows of Table 1. For all methods, the number of clusters $n_c$ is fixed at 2. The results (Table 1) show that RMMSL achieves comparable and often superior performance relative to the other candidate methods. In particular, when two manifolds are nonlinear and intersect, such as the two intersecting spheres (second row), the advantage of RMMSL is clearest.

To verify the robustness, we further evaluate RMMSL on synthetic data with outliers. We add 100 outliers and a 2D plane (1000 samples) to the Swiss roll (2000 samples) to generate two intersecting manifolds in $\mathbb{R}^3$. The results of RMMSL are shown in Fig. 4. RMMSL effectively filters out outliers and achieves a 0.96 Rand score ($n_c = 2$) and a 0.99 F-measure for outlier detection when the ratio is given. The Rand score drops to 0.78 if we cluster without outlier filtering ($n_c = 3$), which suggests that the outlier detection step is helpful when outliers exist. It is worth noting that spectral clustering methods cover broader cases than RMMSL, whose main advantage is on multiple low-dimensional manifolds embedded in a high dimensional space. For instance, in the case of two 2D Gaussian-distributed clusters in $\mathbb{R}^2$, RMMSL reduces to self-tuning spectral clustering (all local tangent spaces are ideally identical). Compared with multiple subspace learning methods such as GPCA and SSC, which are the state of the art for linear manifold clustering, our approach is better when the underlying manifolds are nonlinear.

USPS Digits. We choose two subsets of the USPS handwritten digit images. The first contains 1100 samples each for digits 1 and 2 (USPS-2200) and the second contains 1100 samples each for digits 1 to 5 (USPS-5500). $D$ is reduced from 256 (16×16 images) to 50 by PCA and $d$ is fixed at 5. Due to the highly nonlinear image structure, results of the subspace clustering methods are not reported. For USPS data, the possibly high intrinsic dimensionality versus the limited number of samples brings difficulties for data structure learning, especially the local learning stage of RMMSL. Nevertheless, RMMSL achieves comparable results.


Figure 5. An example of human action segmentation results on CMU MoCap. Top left: the 3D visualization of the sequence. Top right: labeled ground truth and clustering results comparison. Bottom: 9 uniformly sampled human poses.

MoCap Data. The automatic clustering of human motion sequences into different action units is a necessary step for many tasks such as action recognition and video annotation; this is usually referred to as temporal segmentation. In order to make a fair comparison among different clustering methods, we focus on the non-temporal setting, i.e., the temporal index is removed and sequences are treated as collections of static human poses. We choose 5 mixed-action sequences from subject 86 in the CMU MoCap. We use joint-position features (a 45-dimensional representation of 15 human body markers in $\mathbb{R}^3$), centralized to remove the global motion. The average Rand scores are reported in Table 1, showing that RMMSL achieves higher clustering accuracy than the other candidate methods.

One motion sequence (500 frames) and the corresponding results are visualized in Fig. 5. The subject walks, then slightly turns around and sits down. By combining the local learning results from RMMSL with (Teh & Roweis, 2003), the joint-position features ($\mathbb{R}^{45}$) from this sequence are visualized in $\mathbb{R}^3$. This figure supports the assumption that there are low-dimensional manifolds in the joint-position space. In fact, this MoCap sequence can be viewed as three connected nonlinear motion manifolds, corresponding to walking, turning around and sitting down respectively.

Motion Flow Modeling. Unsupervised motion flow modeling is performed on videos (Lin et al., 2011). The goal is to analyze coordinated movements formed by multiple objects, extract semantic-level information, and understand what is happening in the scene. Given motorbike videos as shown in Fig. 6 (from YouTube), the global motion pattern is learned from low-level motion features. This is an important task for video analysis and can serve as a foundation for many applications such as object tracking and abnormal event detection (Lin et al., 2010; 2011).

Differing from the probabilistic approaches in (Lin et al., 2011), we formulate motion flow learning as a manifold clustering problem. In the experiments, optical flows at salient feature points are estimated by the Lucas-Kanade algorithm. Every feature point carries the 4D information $(x, y, v_x, v_y)$. The motion direction $\theta$ is then calculated, and every point is embedded into $(x, y, \theta)$ space. We observe that coordinated group movements form manifold structures in $(x, y, \theta)$ space; therefore, these points are used as the input. The first video contains $n = 9266$ points and the second one contains $n = 8684$ points. We use RMMSL to learn the global motion manifolds by manifold clustering ($n_c = 2$) and report the best Rand scores in Table 1. Since optical flow results are noisy, outlier filtering is performed before clustering. The motion manifold learning results for the two motorbike videos are shown in Fig. 6, where the motion manifolds are visualized on the images after kernel density interpolation. The results show that the clustered manifolds have clear semantic meanings, since each manifold corresponds to a coordinated movement formed by a group of motorbikes. Therefore, RMMSL correctly learns global motion and helps understand the video scenes.
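The feature extraction step can be sketched with OpenCV (a hedged reconstruction; the corner detector and pyramidal Lucas-Kanade parameters below are assumptions, not the settings used in the paper):

```python
import cv2
import numpy as np

def flow_features(prev_gray, next_gray):
    """Embed salient points into (x, y, theta) space from Lucas-Kanade flow."""
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=2000,
                                 qualityLevel=0.01, minDistance=5)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
    good = status.ravel() == 1                   # keep successfully tracked points
    xy = p0.reshape(-1, 2)[good]
    v = p1.reshape(-1, 2)[good] - xy             # flow vectors (v_x, v_y)
    theta = np.arctan2(v[:, 1], v[:, 0])         # motion direction
    return np.column_stack([xy, theta])          # rows are (x, y, theta)
```

The resulting $(x, y, \theta)$ rows are what the clustering stage of RMMSL would consume.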

6. Conclusion

Robust Multiple Manifolds Structure Learning is proposed to effectively learn data structure under a multiple manifolds assumption, taking noise and curved level into account. In particular, the estimated local structure is used to assist the global structure learning tasks of clustering and outlier detection. The algorithm is evaluated against other state-of-the-art clustering methods. The results are encouraging: on both synthetic and real data, the proposed algorithm yields smaller clustering errors in challenging cases, especially when multiple nonlinear manifolds intersect. Furthermore, the results on action segmentation and motion flow modeling demonstrate RMMSL's capability for broad and challenging real-world applications.


Figure 6. Two examples of motion flow modeling results on motorbike videos. From left to right: two image samples, optical flow results, and learned motion manifolds with highlighted motion directions.

Acknowledgements

This work was supported in part by NIH Grant EY016093 and by Grant DE-FG52-08NA28775 from the U.S. Department of Energy. We thank Fei Sha for helpful discussions.

References

Agarwal, A., Daume III, H., and Gerber, S. Learning multiple tasks using manifold regularization. In NIPS, pp. 46–54, 2010.

De la Torre, F. and Black, M. J. A framework for robust subspace learning. IJCV, 54:117–142, 2003.

Elhamifar, E. and Vidal, R. Sparse subspace clustering. In CVPR, 2009.

Goh, A. and Vidal, R. Segmenting motions of different types by unsupervised manifold clustering. In CVPR, 2007.

Goldberg, A., Zhu, X., Singh, A., Xu, Z., and Nowak, R. Multi-manifold semi-supervised learning. In AISTATS, 2009.

Hein, M. and Maier, M. Manifold denoising. In NIPS, pp. 561–568, 2007.

Hein, M., Audibert, J., and von Luxburg, U. From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In COLT, pp. 470–485, 2005.

Lawrence, N. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. JMLR, 6:1783–1816, November 2005.

Lin, D., Grimson, E., and Fisher, J. Modeling and estimating persistent motion with geometric flows. In CVPR, 2010.

Lin, D., Grimson, E., and Fisher, J. Construction of dependent Dirichlet processes based on Poisson processes. In NIPS, 2011.

Maier, M., von Luxburg, U., and Hein, M. Influence of graph construction on graph-based clustering measures. In NIPS, pp. 351–358, 2009.

Mordohai, P. and Medioni, G. Dimensionality estimation, manifold learning and function approximation using tensor voting. JMLR, 11:411–450, 2010.

Ng, Andrew Y., Jordan, Michael I., and Weiss, Yair. On spectral clustering: Analysis and an algorithm. In NIPS, 2002.

Saul, L. K. and Roweis, S. T. Think globally, fit locally: unsupervised learning of low dimensional manifolds. JMLR, 4:119–155, 2003.

Singh, A., Nowak, R., and Zhu, X. Unlabeled data: Now it helps, now it doesn't. In NIPS, pp. 1513–1520, 2009.

Sinha, K. and Belkin, M. Semi-supervised learning using sparse eigenfunction bases. In NIPS, 2010.

Smith, A., Huo, X., and Zha, H. Convergence and rate of convergence of a manifold-based dimension reduction algorithm. In NIPS, 2009.

Souvenir, R. and Pless, R. Manifold clustering. In ICCV, 2005.

Steinke, F. and Hein, M. Non-parametric regression between manifolds. In NIPS, pp. 1561–1568, 2009.

Teh, Y. W. and Roweis, S. Automatic alignment of local representations. In NIPS, pp. 841–848, 2003.

Urtasun, R., Fleet, D. J., Geiger, A., Popovic, J., Darrell, T., and Lawrence, N. D. Topologically-constrained latent variable models. In ICML, pp. 1080–1087, 2008.

van der Maaten, L. J. P. Learning a parametric embedding by preserving local structure. In AISTATS, pp. 384–391, 2009.

Vidal, R., Ma, Y., and Sastry, S. Generalized Principal Component Analysis (GPCA). Springer Verlag, 2010.

von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 17:395–416, 2007.

von Luxburg, U., Radl, A., and Hein, M. Getting lost in space: Large sample analysis of the commute distance. In NIPS, pp. 2622–2630, 2011.

Wang, Y., Jiang, Y., Wu, Y., and Zhou, Z.-H. Spectral clustering on multiple manifolds. IEEE Transactions on Neural Networks, 22(7):1149–1161, 2011.

Yan, J. and Pollefeys, M. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, pp. 94–106, 2006.

Zelnik-Manor, L. and Perona, P. Self-tuning spectral clustering. In NIPS, pp. 1601–1608, 2005.

Zhou, D., Weston, J., Gretton, A., Bousquet, O., and Schölkopf, B. Ranking on data manifolds. In NIPS, 2004.