
IEEE TRANSACTIONS ON CYBERNETICS 1

Incomplete Multiview Spectral Clustering With Adaptive Graph Learning

Jie Wen, Yong Xu, Senior Member, IEEE, and Hong Liu, Member, IEEE

Abstract—In this paper, we propose a general framework for incomplete multiview clustering. The proposed method is the first work that exploits the graph learning and spectral clustering techniques to learn the common representation for incomplete multiview clustering. First, owing to the good performance of low-rank representation in discovering the intrinsic subspace structure of data, we adopt it to adaptively construct the graph of each view. Second, a spectral constraint is used to achieve the low-dimensional representation of each view based on spectral clustering. Third, we further introduce a co-regularization term to learn the common representation of samples for all views, and then use k-means to partition the data into their respective groups. An efficient iterative algorithm is provided to optimize the model. Experimental results on seven incomplete multiview datasets show that the proposed method achieves the best performance in comparison with several state-of-the-art methods, which proves its effectiveness for incomplete multiview clustering.

Index Terms—Co-regularization, graph learning, incomplete multiview clustering, low-rank representation.

I. INTRODUCTION

In the real world, an image can be represented by various descriptors, such as SIFT, local binary pattern (LBP), and HOG [1]. A webpage can be described by its images, links, texts, etc. [2]. Blood test values and magnetic resonance images can be viewed as two references for disease diagnosis [3]. These phenomena indicate that almost all things can be represented from different perspectives (views). Features acquired from different views are regarded as multiview features [4], [5]. Generally, multiview features can

Manuscript received July 9, 2018; revised October 8, 2018; accepted November 23, 2018. This work was supported in part by the Guangdong Province High-Level Personnel of Special Support Program under Grant 2016TX03X164, and in part by the National Natural Science Foundation of Guangdong Province under Grant 2017A030313384. This paper was recommended by Associate Editor S. Ventura. (Corresponding author: Yong Xu.)

J. Wen and Y. Xu are with the Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, China, and also with the Shenzhen Medical Biometrics Perception and Analysis Engineering Laboratory, Harbin Institute of Technology, Shenzhen, Shenzhen 518055, China (e-mail: [email protected]; [email protected]).

H. Liu is with the Engineering Laboratory on Intelligent Perception for Internet of Things, Shenzhen Graduate School, Peking University, Shenzhen 518055, China (e-mail: [email protected]).

This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2018.2884715

represent data more comprehensively because features of different views provide much complementary information. Thus, exploiting multiview features has the potential to improve the performance of different machine learning tasks [6]–[12].

Multiview clustering is one of the hottest research directions in this field; it focuses on adaptively partitioning data points into their respective groups without any label information [13]–[16]. In the past few years, many multiview clustering methods have been proposed, such as multiview k-means clustering [17], the canonical correlation analysis (CCA)-based method [18], co-regularized multiview spectral clustering [19], the low-rank tensor-based method [20], the deep matrix factorization-based method [4], etc. For multiview clustering, ensuring clustering agreement among all views is the key to achieving good performance [21]. To this end, Wang et al. [21] introduced a views-agreement constraint to guarantee that the data-cluster graphs learned from all views are consistent with each other. In [14], a structured low-rank matrix factorization-based method is proposed to learn more consistent low-dimensional data-cluster representations for all views by jointly exploiting the manifold structures and introducing a divergence constraint term. In [22], based on multiview spectral clustering, a novel angular-based regularizer is imposed on the sparse representation vectors of all views to learn the consensus similarity graph shared by all views for clustering. In [8], a more robust multiview spectral clustering model is proposed, which can learn the optimal similarity graphs and data-cluster representations with views agreement. These multiview clustering methods commonly require all views of the data to be complete. However, this requirement is often impossible to satisfy because some views of samples are frequently missing in real-world applications, especially disease diagnosis [3] and webpage clustering [23]. This incompleteness of views leads to the failure of conventional multiview methods [24].

To address this issue, many efforts have been made in recent years, which can be generally categorized into two groups in terms of the exploited techniques, i.e., matrix factorization-based incomplete multiview clustering (MFIMC) and graph-based incomplete multiview clustering (GIMC). MFIMC focuses on directly learning a low-dimensional consensus representation for all views via the matrix factorization technique. For example, partial multiview clustering (PMVC) seeks to learn a common latent subspace for all views, in which the instances of different views are enforced to have the same representation [3]. Different from PMVC,

2168-2267 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


multi-incomplete-view clustering first fills in the missing views with the average of all instances in the corresponding views, and then uses the weighted non-negative matrix factorization technique to jointly learn the latent representations of different views and the consensus representation for all views [25]. The common problem of these matrix factorization-based methods is that they focus only on learning the consensus representation while ignoring the intrinsic structure of the data, which cannot guarantee the compactness and discriminability of the learned representation.

GIMC focuses on learning the low-dimensional representation from different graphs that reveal the relationships of all samples. Compared with the matrix factorization-based methods, GIMC can effectively exploit the geometric structures of the data. For GIMC, graph construction is crucial because it directly influences the clustering performance. However, since some views are missing, it is impossible to construct graphs that completely connect all samples. To address this issue, Trivedi et al. [23] proposed to complete the incomplete graph of the view with missing instances by referring to the Laplacian matrix of the complete view, and then learned the low-dimensional representations of different views via kernel CCA. The biggest shortcoming of this method is that it requires at least one complete view. Gao et al. [26] proposed to fill in the missing view with the average of instances in the corresponding view for graph construction and subspace learning. However, when the multiview data has a large number of missing instances in all views, this approach fails because the filled missing views dominate the subspace learning [25]. Thus, completing the graph by filling in the missing views is not a good choice for incomplete multiview clustering. Zhao et al. [27] proposed a novel graph learning method for incomplete multiview clustering, in which a robust consensus graph is adaptively learned from the low-dimensional consensus representation. Different from the above graph-based methods, the method of Zhao et al. does not need to complete the graph or fill in the missing instances of any view.

Although many methods have been proposed to address the incompleteness problem of multiview clustering, they still have several limitations. The first is that they require at least a few samples to have the features of all views; in other words, these methods fail to deal with the case in which no sample contains the features of all views. The second is that these graph-based methods cannot learn the globally optimal consensus representation for clustering because subspace learning and graph construction are treated independently in two separate steps. To solve these issues, in this paper we propose a more general framework for multiview scenarios, which is suitable for all kinds of multiview data, including arbitrary incomplete cases and the complete case. The proposed method aims to jointly learn the low-dimensional consensus representation and the similarity graphs of all views, which enables our method to obtain the globally optimal consensus representation and thus has the potential to perform better. To this end, we exploit the low-rank representation technique to adaptively learn the similarity graph of each view owing to

its good performance in discovering the intrinsic relationships of data [28], [29]. Meanwhile, considering that graphs constructed from different incomplete views generally differ greatly in magnitude, structure, and dimension owing to the diversity of missing views, it is impossible to directly learn a unified graph or representation for all views [30]. To tackle this problem, we provide an ingenious approach that indirectly learns the consensus representation from the low-dimensional representations of all views rather than from the graphs, by introducing a co-regularization term. Meanwhile, the proposed method is robust to noise owing to a sparse error term introduced to compensate for the noise. Overall, this paper has the following contributions.

1) In this paper, we provide a more general multiview clustering framework, which can deal with both complete and incomplete multiview cases.

2) The proposed method is a pioneering work that integrates graph construction and consensus representation learning into a joint optimization framework for incomplete multiview clustering. Compared with the other methods, the proposed method is able to learn the optimal similarity graph of each view and the consensus cluster representation of all views and, thus, has the potential to obtain better performance.

3) The proposed method is robust to noise to some extent. In particular, by introducing the sparse error term to compensate for the noise, the proposed method can greatly reduce the negative influence of noise, which is beneficial for discovering the intrinsic structure of noisy data.

The rest of this paper is organized as follows. Section II briefly introduces some works related to the proposed method. In Section III, we present the proposed method and its optimization algorithm in detail. Section IV gives a deep analysis of the proposed method. In Section V, several experiments are conducted to prove the effectiveness of the proposed method. Section VI offers the conclusions of this paper.

II. RELATED WORKS

A. Single-View Spectral Clustering

For a dataset X = [x_1, x_2, . . . , x_n] ∈ R^{m×n} with n samples, spectral clustering solves the following problem to learn the low-dimensional representation F ∈ R^{n×c} for clustering [31], [32]:

min_{F^T F = I} Tr(F^T L F)    (1)

where Tr(·) denotes the trace operation. L ∈ R^{n×n} is the Laplacian graph, which is generally calculated as L = D − W in the ratio cut method [33] and L = I − D^{−1/2} W D^{−1/2} in the normalized cut method [31], where D ∈ R^{n×n} is a diagonal matrix whose ith diagonal element is D_{i,i} = Σ_j (W_{i,j} + W_{j,i})/2, W (W ≥ 0) is the symmetric similarity graph with non-negative elements constructed from the data, and I is the identity matrix.

Equation (1) is a typical eigenvalue decomposition problem, and its solution is the set of eigenvectors corresponding to the c smallest eigenvalues of L. For F, each row


WEN et al.: IMSC_AGL 3


Fig. 1. Two types of incomplete multiview data. (a) Some samples contain the features of all views and the remaining samples contain only one view. (b) Views are arbitrarily missing, including the case that no sample contains the features of all views.

can be viewed as the new representation of the corresponding original sample. Then spectral clustering applies the k-means algorithm to partition the rows of F into several clusters.
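The pipeline above can be sketched as follows. This is a minimal illustration using numpy, with a small hand-built similarity graph W standing in for one learned from data (the graph construction itself, e.g., via k-nearest neighbors, is assumed):

```python
import numpy as np

def spectral_embedding(W, c):
    """Return F in R^{n x c}: eigenvectors of the normalized Laplacian
    L = I - D^{-1/2} W D^{-1/2} for its c smallest eigenvalues, as in (1)."""
    W = (W + W.T) / 2.0                    # symmetrize, matching D_ii = sum_j (W_ij + W_ji)/2
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, :c]                  # rows are the new sample representations

# Two obvious clusters: {0, 1} strongly connected, {2, 3} strongly connected.
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
F = spectral_embedding(W, c=2)
# Rows of F for samples in the same cluster nearly coincide,
# so running k-means on the rows of F recovers the two groups.
```

In practice the final step would run k-means (e.g., `sklearn.cluster.KMeans`) on the rows of F; the sketch only computes the embedding.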

B. Multiview Subspace Clustering

Generally, due to the different feature distributions of different views, graphs constructed from different views will have large differences. In this case, it is difficult to find the intrinsic consensus graph of multiple views for clustering. Fortunately, it is possible to find an optimal and unique representation for all views. Inspired by this motivation, Gao et al. [30] proposed multiview subspace clustering (MVSC), which focuses on learning a consensus cluster indicator matrix for clustering. For a dataset with k views whose vth view is represented as X^(v) = [x^(v)_1, x^(v)_2, . . . , x^(v)_n] ∈ R^{m_v×n}, the learning model of MVSC is designed as follows:

min_{Z^(v), E^(v), F} Σ_v ||X^(v) − X^(v)Z^(v) − E^(v)||_F^2 + Σ_v (λ_1 Tr(F^T L^(v) F) + λ_2 ||E^(v)||_1)

s.t. Z^(v)T 1 = 1, Z^(v)_{i,i} = 0, F^T F = I    (2)

where 1 is a column vector of all ones, Z^(v)_{i,i} = 0 means that all diagonal elements of matrix Z^(v) are 0, and λ_1 and λ_2 are penalty parameters. L^(v) is the Laplacian graph of the vth view, calculated as L^(v) = D^(v) − W^(v), where W^(v) = (|Z^(v)| + |Z^(v)|^T)/2 and D^(v)_{i,i} = Σ_j W^(v)_{i,j}. E^(v) is the error matrix used to model different noises. In (2), ||·||_F and ||·||_1 are the Frobenius norm and the l_1 norm, respectively [34], [35]. Tr(·) denotes the trace operator.

MVSC integrates graph construction and low-dimensional representation learning into a joint learning framework, which enables it to search for the globally optimal indicator matrix for clustering. By alternately solving for all variables, MVSC finds a local optimum of the consensus representation F. Then MVSC performs k-means on the consensus representation to obtain the final clustering results.
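The graph construction used under model (2), turning a learned representation matrix Z into a symmetric similarity graph W and its Laplacian L = D − W, can be sketched as follows (the toy Z is an assumed illustration, not data from the paper):

```python
import numpy as np

def laplacian_from_Z(Z):
    """Build W = (|Z| + |Z|^T)/2 and L = D - W, with D_ii = sum_j W_ij,
    following the definitions under model (2)."""
    W = (np.abs(Z) + np.abs(Z).T) / 2.0
    D = np.diag(W.sum(axis=1))
    return D - W, W

Z = np.array([[0.0, 0.8, 0.1],
              [0.7, 0.0, 0.2],
              [0.1, 0.3, 0.0]])
L, W = laplacian_from_Z(Z)
# L is symmetric, each row sums to zero, and L is positive semidefinite,
# which is what makes Tr(F^T L F) a valid smoothness penalty.
```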

III. PROPOSED METHOD

Fig. 1 shows two cases of incomplete multiview data. In the first case, only a few samples contain the features of all views and the other samples contain only one view. In the second case, the views are arbitrarily missing, including the case that no sample contains the features of all views. For the first case, many methods, such as PMVC [3] and incomplete multimodal grouping (IMG) [27], have been proposed based on matrix factorization. However, these methods fail to deal with the second incomplete case. To address this issue, in this section we propose a more general incomplete multiview clustering framework that can handle all kinds of incomplete cases.

A. Incomplete Multiview Spectral Clustering With Adaptive Graph Learning

From the above discussion, we can see that all conventional graph-based multiview clustering methods, including MVSC, learn the subspace based on the similarity graph of each view. These methods require the graphs constructed from all views to be complete, i.e., the graphs not only reveal the similarity relationships of all samples but also have the same size of n × n for multiview data with n samples. However, as shown in Fig. 1, since all views are incomplete, graphs constructed from these incomplete views have different sizes and cannot reveal the relationships of all samples, which leads to the failure of the existing GIMC methods.

Some researchers propose to fill in the missing views with the average of the instances of the corresponding view, and then construct the similarity graph of each view independently [26]. Although this approach makes the graphs of different views the same size, it cannot produce the correct representation of each view and is thus harmful to multiview clustering. This is mainly because the missing instances will be regarded as belonging to the same class and be connected with the same weight (a weight of 1 in the binary nearest neighbor graph), which in turn pulls these missing instances together in the low-dimensional subspace whether they are from the same cluster or not. Therefore, this graph completion approach is unreasonable for incomplete multiview clustering, especially for the case with a large number of missing views. A more reasonable way to avoid this issue is to set the connection weights


Fig. 2. Framework of the proposed incomplete multiview clustering method. The proposed method focuses on learning a consensus representation U from multiple views for clustering.

corresponding to these missing instances to 0 in the similarity graph of the corresponding view. In this way, the uncertain similarity information corresponding to the missing views does not play a negative role in learning the data-cluster representation. Instead, only the authentic similarity information of the available instances is exploited to guide the representation learning, which helps achieve a more reliable data-cluster representation and reduces the negative influence of the missing views. Based on the above analysis, we rewrite MVSC as follows for incomplete multiview learning:

min_{Z^(v), E^(v), F} Σ_v ||Y^(v) − Y^(v)Z^(v) − E^(v)||_F^2 + Σ_v (λ_1 Tr(F^T L̄^(v) F) + λ_2 ||E^(v)||_1)

s.t. Z^(v)T 1 = 1, Z^(v)_{i,i} = 0, F^T F = I    (3)

where Y^(v) = [y^(v)_1, y^(v)_2, . . . , y^(v)_{n_v}] ∈ R^{m_v×n_v} denotes the set of un-missing instances in the vth view, and m_v and n_v are the number of features and un-missing instances of the vth view, respectively. The original instance set, including the missing and un-missing instances of the vth view, is still represented as X^(v) = [x^(v)_1, x^(v)_2, . . . , x^(v)_n] ∈ R^{m_v×n} (n_v < n). L̄^(v) = D̄^(v) − W̄^(v), W̄^(v) = (|Z̄^(v)| + |Z̄^(v)|^T)/2, and D̄^(v)_{i,i} = Σ_j W̄^(v)_{i,j}. Z̄^(v) ∈ R^{n×n} is the complete graph that connects all instances, including the missing and un-missing instances of the vth view. In our method, we can exploit the following formula to obtain the completed graph Z̄^(v) based on the graph Z^(v) ∈ R^{n_v×n_v} learned from the un-missing instances:

Z̄^(v) = G^(v)T Z^(v) G^(v)    (4)

where G^(v) ∈ R^{n_v×n} is an index matrix used to complete the graph, whose elements related to the missing instances are enforced to be zero. Specifically, matrix G^(v) is defined as follows:

G^(v)_{i,j} = 1 if y^(v)_i is the original instance x^(v)_j, and 0 otherwise.    (5)

It is simple to prove that the following equation is satisfied:

L̄^(v) = G^(v)T L^(v) G^(v)    (6)

where L^(v) is the Laplacian matrix of graph Z^(v).

Considering that the data is generally drawn from several low-rank subspaces, the learned graph should discover the low-rank structure of the data [28]. Moreover, inspired by the observation that learning a non-negative graph is beneficial for improving the clustering performance and makes the learned graph more interpretable [36], we rewrite model (3) as follows to learn multiple non-negative graphs for multiview subspace learning:

min_{Z^(v), E^(v), F} Σ_v (||Z^(v)||_* + λ_2 ||E^(v)||_1) + λ_1 Σ_v Tr(F^T G^(v)T L^(v) G^(v) F)

s.t. Y^(v) = Y^(v)Z^(v) + E^(v), Z^(v)1 = 1, 0 ≤ Z^(v) ≤ 1, Z^(v)_{i,i} = 0, F^T F = I    (7)

where ||Z^(v)||_* is the nuclear norm of matrix Z^(v), calculated as the sum of all singular values of Z^(v) [37]–[39]. The constraint Z^(v)1 = 1 is used to prevent any sample from having no contribution to the joint representation, where 1 is a column vector with all elements equal to 1 [40]. In model (7), F is an n × c matrix, each row of which denotes the representation of the corresponding sample.
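The graph-completion mechanism of (4)–(6) can be sketched as follows. The observed-index list `present` and the toy graph Z are assumptions for illustration:

```python
import numpy as np

def index_matrix(present, n):
    """G in R^{n_v x n} per (5): G[i, j] = 1 iff the ith observed instance
    of this view is original sample j; all other entries are 0."""
    G = np.zeros((len(present), n))
    for i, j in enumerate(present):
        G[i, j] = 1.0
    return G

# View over 5 original samples, of which samples 0, 2, 3 are observed.
n = 5
present = [0, 2, 3]
G = index_matrix(present, n)

Z = np.array([[0.0, 0.5, 0.5],
              [0.4, 0.0, 0.6],
              [0.3, 0.7, 0.0]])   # graph over the observed instances only
Z_bar = G.T @ Z @ G               # completed n x n graph, eq. (4)
# Rows/columns of Z_bar for the missing samples (1 and 4) are all zero,
# so the missing instances contribute no (possibly wrong) similarity.
```

The same sandwiching by G^(v) produces the completed Laplacian in (6), so only the observed instances regularize the representation.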

In model (7), the third term is equivalent to (1/2) Σ_{j=1}^n Σ_{i=1}^n ||F_{i,:} − F_{j,:}||_2^2 Σ_v W̄^(v)_{i,j}. This demonstrates that the regularization weights on the target cluster indicator matrix F are the summation of the similarity weights of the multiple graphs. For complete data, this approach may be effective for learning the optimal cluster indicator of each sample. However, for incomplete multiview data, this approach fails because the regularization weights on the missing instances and un-missing instances are too unfair. It may occur that, through the summation of the weights of the multiple graphs, the regularization weights of


samples from different classes become larger than those of samples from the same class, which leads to incorrect cluster indicators. To solve this problem, we propose to learn the consensus representation from the cluster indicator matrices of all views as follows:

min_{Z^(v), E^(v), F^(v), U} Σ_v (||Z^(v)||_* + λ_1 Tr(F^(v)T G^(v)T L^(v) G^(v) F^(v))) + Σ_v (λ_2 ||E^(v)||_1 + (λ_3/2) Φ(F^(v), U))

s.t. Y^(v) = Y^(v)Z^(v) + E^(v), Z^(v)1 = 1, 0 ≤ Z^(v) ≤ 1, Z^(v)_{i,i} = 0, F^(v)T F^(v) = I, U^T U = I    (8)

where λ_3 is also a penalty parameter and matrix U ∈ R^{n×c} is the target cluster indicator matrix (or consensus representation) to be learned. The function Φ(F^(v), U) measures the disagreement between the consensus representation U and the representation F^(v). Φ(F^(v), U) is defined as follows [19]:

Φ(F^(v), U) = || K_U / ||K_U||_F^2 − K_{F^(v)} / ||K_{F^(v)}||_F^2 ||_F^2    (9)

where K_U and K_{F^(v)} are the similarity matrices of U and F^(v), respectively. For simplicity, we also choose the linear kernel, i.e., K_U = UU^T, as the similarity metric [19]. Based on the facts that ||K_U||_F^2 = c and ||K_{F^(v)}||_F^2 = c, (9) can be rewritten as follows:

Φ(F^(v), U) = 2(c − Tr(F^(v)F^(v)T UU^T)) / c^2.    (10)

As a result, our final model is expressed as follows:

min_{Z^(v), E^(v), F^(v), U} Σ_v (||Z^(v)||_* + λ_3 (c − Tr(F^(v)F^(v)T UU^T))) + Σ_v λ_1 Tr(F^(v)T G^(v)T L^(v) G^(v) F^(v)) + Σ_v λ_2 ||E^(v)||_1

s.t. Y^(v) = Y^(v)Z^(v) + E^(v), Z^(v)1 = 1, 0 ≤ Z^(v) ≤ 1, Z^(v)_{i,i} = 0, F^(v)T F^(v) = I, U^T U = I.    (11)

Since the proposed IMC framework is based on adaptive graph learning and spectral clustering, we refer to the proposed method as incomplete multiview spectral clustering with adaptive graph learning (IMSC_AGL). The framework of our method is briefly outlined in Fig. 2.

B. Solution to IMSC_AGL

For the optimization problem (11), we choose the alternating direction method of multipliers (ADMM) to compute a local optimal solution [41]. First, we introduce several auxiliary variables to make problem (11) separable:

min_{Z^(v), E^(v), F^(v), U, S^(v), W^(v)} Σ_v (||S^(v)||_* − λ_3 Tr(F^(v)F^(v)T UU^T)) + Σ_v λ_1 Tr(F^(v)T G^(v)T L_w^(v) G^(v) F^(v)) + Σ_v λ_2 ||E^(v)||_1

s.t. Y^(v) = Y^(v)Z^(v) + E^(v), Z^(v) = S^(v), Z^(v) = W^(v), W^(v)1 = 1, 0 ≤ W^(v) ≤ 1, W^(v)_{i,i} = 0, F^(v)T F^(v) = I, U^T U = I    (12)

where L_w^(v) denotes the Laplacian graph of matrix W^(v). Note that since c is a constant, we can ignore it in (11). The augmented Lagrangian function of (12) is formulated as follows:

L = Σ_v (||S^(v)||_* + λ_1 Tr(F^(v)T G^(v)T L_w^(v) G^(v) F^(v))) − Σ_v λ_3 Tr(F^(v)F^(v)T UU^T) + Σ_v λ_2 ||E^(v)||_1
  + (μ/2) Σ_v (||Z^(v) − S^(v) + C_2^(v)/μ||_F^2 + ||Z^(v) − W^(v) + C_3^(v)/μ||_F^2)
  + (μ/2) Σ_v ||Y^(v) − Y^(v)Z^(v) − E^(v) + C_1^(v)/μ||_F^2
  − Σ_v ψ^(v)T (W^(v)1 − 1)    (13)

where matrices C_1^(v), C_2^(v), and C_3^(v) and vector ψ^(v) are Lagrange multipliers, and μ is a penalty parameter.

Then we can iteratively solve all unknown variables one-by-one as follows.

Step 1 (Update Variable Z^(v)): With all of the other variables fixed, the problem for Z^(v) reduces to minimizing:

L(Z^(v)) = ||Z^(v) − S^(v) + C_2^(v)/μ||_F^2 + ||Z^(v) − W^(v) + C_3^(v)/μ||_F^2 + ||Y^(v) − Y^(v)Z^(v) − E^(v) + C_1^(v)/μ||_F^2.    (14)

Then we obtain Z^(v) by setting the derivative of L(Z^(v)) with respect to Z^(v) to zero:

∂L(Z^(v))/∂Z^(v) = 2Z^(v) − 2M_1^(v) + 2Z^(v) − 2M_2^(v) + 2Y^(v)T (Y^(v)Z^(v) − M_3^(v)) = 0
⇔ Z^(v) = (Y^(v)T Y^(v) + 2I)^{−1} (Y^(v)T M_3^(v) + M_1^(v) + M_2^(v))    (15)

where M_1^(v) = S^(v) − C_2^(v)/μ, M_2^(v) = W^(v) − C_3^(v)/μ, and M_3^(v) = Y^(v) − E^(v) + C_1^(v)/μ.

Step 2 (Update Variable S^(v)): With the other variables fixed, the subproblem for S^(v) reduces to:

min_{S^(v)} ||S^(v)||_* + (μ/2) ||Z^(v) − S^(v) + C_2^(v)/μ||_F^2.    (16)

Problem (16) can be solved by the singular value thresholding (SVT) shrinkage operator as follows [42]-[44]:

$$
S^{(v)} = \Theta_{1/\mu}\Big(Z^{(v)} + \frac{C_2^{(v)}}{\mu}\Big)
\tag{17}
$$

where $\Theta$ denotes the SVT shrinkage operator.
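The SVT operator shrinks the singular values of its argument by the threshold and reconstructs the matrix. A minimal numpy sketch, assuming a dense matrix and a full SVD (not the authors' implementation):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink the singular values of A by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

With this helper, the update in (17) would read `S = svt(Z + C2 / mu, 1.0 / mu)`.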


Step 3 (Update Variables $W^{(v)}$ and $\psi^{(v)}$): Fixing the other variables, the optimization problem with respect to $W^{(v)}$ is formulated as follows:

$$
\min_{W^{(v)} \ge 0,\ W_{i,i}^{(v)} = 0} \lambda_1 \operatorname{Tr}\big(F^{(v)T}G^{(v)T}L_w^{(v)}G^{(v)}F^{(v)}\big) + \frac{\mu}{2}\Big\|Z^{(v)} - W^{(v)} + \frac{C_3^{(v)}}{\mu}\Big\|_F^2 - \psi^{(v)T}\big(W^{(v)}\mathbf{1} - \mathbf{1}\big).
\tag{18}
$$

Defining $P^{(v)} = G^{(v)}F^{(v)}$ and $R^{(v)} = Z^{(v)} + C_3^{(v)}/\mu$, (18) is equivalent to the following optimization problem:

$$
\min_{W^{(v)} \ge 0,\ W_{i,i}^{(v)} = 0} \frac{\lambda_1}{2}\sum_{i,j}^{n_v}\big\|P_{i,:}^{(v)} - P_{j,:}^{(v)}\big\|_2^2\, W_{i,j}^{(v)} + \frac{\mu}{2}\big\|W^{(v)} - R^{(v)}\big\|_F^2 - \psi^{(v)T}\big(W^{(v)}\mathbf{1} - \mathbf{1}\big)
\tag{19}
$$

where $P_{i,:}^{(v)}$ and $P_{j,:}^{(v)}$ denote the $i$th and $j$th row vectors of matrix $P^{(v)}$, respectively. Problem (19) is clearly independent across rows. Defining $H_{i,j}^{(v)} = \|P_{i,:}^{(v)} - P_{j,:}^{(v)}\|_2^2$, we can simplify problem (19) into the following row-wise problem:

$$
\begin{aligned}
&\min_{W_{i,:}^{(v)} \ge 0,\ W_{i,i}^{(v)} = 0} \frac{\lambda_1}{2}W_{i,:}^{(v)}H_{i,:}^{(v)T} + \frac{\mu}{2}\big\|W_{i,:}^{(v)} - R_{i,:}^{(v)}\big\|_2^2 - \psi_i^{(v)}\big(W_{i,:}^{(v)}\mathbf{1} - 1\big) \\
\Leftrightarrow\ &\min_{W_{i,:}^{(v)} \ge 0,\ W_{i,i}^{(v)} = 0} \frac{\mu}{2}\Big\|W_{i,:}^{(v)} - \Big(R_{i,:}^{(v)} - \frac{\lambda_1}{2\mu}H_{i,:}^{(v)}\Big)\Big\|_2^2 - \psi_i^{(v)}\big(W_{i,:}^{(v)}\mathbf{1} - 1\big)
\end{aligned}
\tag{20}
$$

where $\psi_i^{(v)}$ is the $i$th element of vector $\psi^{(v)}$. The optimal solution to problem (20) is as follows [45]:

$$
W_{i,j}^{(v)} =
\begin{cases}
0, & j = i \\[4pt]
R_{i,j}^{(v)} - \dfrac{\lambda_1}{2\mu}H_{i,j}^{(v)} + \dfrac{\psi_i^{(v)}}{\mu}, & \text{otherwise.}
\end{cases}
\tag{21}
$$

Then we further enforce the nonnegativity of $W^{(v)}$ by $W^{(v)} = \max(W^{(v)}, 0)$, which sets all negative elements of $W^{(v)}$ to zero and preserves the remaining elements. According to the constraint $W_{i,:}^{(v)}\mathbf{1} = 1$, the Lagrange multiplier $\psi_i^{(v)}$ is updated as follows:

$$
\psi_i^{(v)} = \mu\Bigg(1 - \sum_{j=1, j \ne i}^{n_v}\Big(R_{i,j}^{(v)} - \frac{\lambda_1}{2\mu}H_{i,j}^{(v)}\Big)\Bigg)\Big/(n_v - 1).
\tag{22}
$$
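The row-wise updates (21) and (22), followed by the clipping $W^{(v)} = \max(W^{(v)}, 0)$, can be sketched as below; the function name and the dense-loop formulation are illustrative only.

```python
import numpy as np

def update_W(R, H, lam1, mu):
    """Row-wise W update of Eqs. (21)-(22), followed by W = max(W, 0)."""
    n = R.shape[0]
    T = R - lam1 / (2.0 * mu) * H               # R_{i,j} - lam1/(2 mu) H_{i,j}
    W = np.zeros_like(T)
    for i in range(n):
        off = np.delete(T[i], i)                # off-diagonal entries of row i
        psi_i = mu * (1.0 - off.sum()) / (n - 1)  # Eq. (22)
        row = T[i] + psi_i / mu                 # Eq. (21), j != i
        row[i] = 0.0                            # W_{i,i} = 0
        W[i] = np.maximum(row, 0.0)             # nonnegativity clipping
    return W
```

When no entry is clipped, each row of the returned matrix sums to one, as the constraint $W_{i,:}^{(v)}\mathbf{1} = 1$ requires.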

Step 4 (Update Variable $E^{(v)}$): Fixing the other variables, the subproblem with respect to $E^{(v)}$ is as follows:

$$
\min_{E^{(v)}} \lambda_2\big\|E^{(v)}\big\|_1 + \frac{\mu}{2}\Big\|Y^{(v)} - Y^{(v)}Z^{(v)} - E^{(v)} + \frac{C_1^{(v)}}{\mu}\Big\|_F^2.
\tag{23}
$$

Problem (23) is a typical sparsity-constrained optimization problem and has the following closed-form solution [46]-[48]:

$$
E^{(v)} = \vartheta_{\lambda_2/\mu}\Big(Y^{(v)} - Y^{(v)}Z^{(v)} + \frac{C_1^{(v)}}{\mu}\Big)
\tag{24}
$$

where $\vartheta$ denotes the element-wise shrinkage operator.
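The element-wise shrinkage (soft-thresholding) operator used in (24) admits a one-line sketch:

```python
import numpy as np

def soft_threshold(A, tau):
    """Element-wise shrinkage: sign(a) * max(|a| - tau, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)
```

With this helper, (24) would read `E = soft_threshold(Y - Y @ Z + C1 / mu, lam2 / mu)`.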

Step 5 (Update Variable $F^{(v)}$): Fixing the other variables, the cluster indicator matrix $F^{(v)}$ of each view can be calculated by minimizing the following formula:

$$
\begin{aligned}
&\min_{F^{(v)T}F^{(v)} = I} \lambda_1 \operatorname{Tr}\big(F^{(v)T}G^{(v)T}L_w^{(v)}G^{(v)}F^{(v)}\big) - \lambda_3 \operatorname{Tr}\big(F^{(v)}F^{(v)T}UU^{T}\big) \\
\Leftrightarrow\ &\max_{F^{(v)T}F^{(v)} = I} \operatorname{Tr}\Big(F^{(v)T}\big(\lambda_3 UU^{T} - \lambda_1 G^{(v)T}L_w^{(v)}G^{(v)}\big)F^{(v)}\Big).
\end{aligned}
\tag{25}
$$

Problem (25) can be solved by eigenvalue decomposition, where the $c$ eigenvectors corresponding to the $c$ largest eigenvalues of matrix $\big(\lambda_3 UU^{T} - \lambda_1 G^{(v)T}L_w^{(v)}G^{(v)}\big)$ are chosen as the optimal solution for $F^{(v)}$.
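This eigenvector extraction can be sketched with numpy's symmetric eigensolver; the explicit symmetrization is a small numerical-stability assumption, since the matrix in (25) is symmetric only up to rounding.

```python
import numpy as np

def update_F(U, G, Lw, lam1, lam3, c):
    """F update of Eq. (25): top-c eigenvectors of lam3*U U^T - lam1*G^T Lw G."""
    A = lam3 * (U @ U.T) - lam1 * (G.T @ Lw @ G)
    A = (A + A.T) / 2.0                 # symmetrize against rounding error
    vals, vecs = np.linalg.eigh(A)      # eigh returns ascending eigenvalues
    return vecs[:, ::-1][:, :c]         # columns for the c largest eigenvalues
```

The returned columns are orthonormal, so the constraint $F^{(v)T}F^{(v)} = I$ holds by construction.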

Step 6 (Update Variable $U$): Fixing the other variables, the problem to obtain the consensus cluster representation $U$ reduces to the following problem:

$$
\min_{U^{T}U = I} -\sum_v \lambda_3 \operatorname{Tr}\big(F^{(v)}F^{(v)T}UU^{T}\big) \ \Leftrightarrow\ \max_{U^{T}U = I} \operatorname{Tr}\Big(U^{T}\Big(\sum_v F^{(v)}F^{(v)T}\Big)U\Big).
\tag{26}
$$

Problem (26) can also be solved by eigenvalue decomposition. The optimal solution for $U$ is the set of eigenvectors corresponding to the $c$ largest eigenvalues of matrix $\big(\sum_v F^{(v)}F^{(v)T}\big)$.
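A sketch of this consensus step, again via the symmetric eigensolver; the list-of-views calling convention is an illustrative assumption.

```python
import numpy as np

def update_U(F_list, c):
    """U update of Eq. (26): top-c eigenvectors of sum_v F^(v) F^(v)T."""
    A = sum(F @ F.T for F in F_list)
    A = (A + A.T) / 2.0                 # symmetrize against rounding error
    vals, vecs = np.linalg.eigh(A)      # ascending eigenvalues
    return vecs[:, ::-1][:, :c]         # c largest eigenvalues
```

With a single view and an orthonormal $F^{(v)}$, the recovered $U$ spans the same subspace as $F^{(v)}$, which is what (26) demands.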

Step 7 (Update Variables $C_1^{(v)}$, $C_2^{(v)}$, $C_3^{(v)}$, and $\mu$): These four variables are updated as follows, respectively:

$$
C_1^{(v)} = C_1^{(v)} + \mu\big(Y^{(v)} - Y^{(v)}Z^{(v)} - E^{(v)}\big)
\tag{27}
$$

$$
C_2^{(v)} = C_2^{(v)} + \mu\big(Z^{(v)} - S^{(v)}\big)
\tag{28}
$$

$$
C_3^{(v)} = C_3^{(v)} + \mu\big(Z^{(v)} - W^{(v)}\big)
\tag{29}
$$

$$
\mu = \min(\rho\mu, \mu_0)
\tag{30}
$$

where $\rho$ and $\mu_0$ are constants. Algorithm 1 summarizes the computation procedures presented above. After obtaining the consensus representation $U$, we perform the k-means algorithm on it to obtain the final clustering results.
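The final assignment step, k-means on the rows of $U$, can be sketched as a plain Lloyd-style iteration. This minimal version seeds the centers with evenly spaced rows for determinism; in practice a standard library implementation with multiple random restarts would be used.

```python
import numpy as np

def kmeans(X, c, n_iter=50):
    """Minimal Lloyd-style k-means on the rows of X, returning cluster labels.
    Centers are seeded with evenly spaced rows for determinism."""
    idx = np.linspace(0, len(X) - 1, c).astype(int)
    centers = X[idx].astype(float).copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared distances from every row to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(c):
            mask = labels == j
            if mask.any():
                centers[j] = X[mask].mean(0)   # recompute center j
    return labels
```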

IV. ANALYSIS OF THE PROPOSED METHOD

A. Computational Complexity

For simplicity, we do not take into account the computational costs of matrix multiplication, element-wise matrix division, matrix addition and subtraction, etc., since these operations are very cheap in comparison with the other matrix operations. For the proposed algorithm listed in Algorithm 1, the major computational costs come from operations such as matrix inversion, singular value decomposition, and eigenvalue decomposition. Generally, the computational complexities of these three operations are $O(n^3)$ for an $n \times n$ matrix, $O(mn^2)$ for an $m \times n$ matrix, and $O(n^3)$ for an $n \times n$ matrix, respectively. Therefore, the computational complexities of steps 2, 5, and 6 are about $O(n_v^3)$, $O(n^3)$, and $O(n^3)$, respectively. For step 1, although it needs to calculate the inverse operation,


Algorithm 1 IMSC_AGL [Solving (12)]
Input: Incomplete multiview data $Y^{(v)}$, index matrix $G^{(v)}$, $v \in [1, k]$; parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$.
Initialization: Initialize $Z^{(v)}$ with the k-nearest-neighbor graph of each view; initialize $F^{(v)}$ via eigenvalue decomposition of the Laplacian graph of each view; initialize the cluster matrix $U$ using (26); $\mu_0 = 10^8$, $\rho = 1.1$, $\mu = 0.01$.
while not converged do
    for $v$ from 1 to $k$
        1. Update variable $Z^{(v)}$ via (15);
        2. Update variable $S^{(v)}$ via (17);
        3. Update variables $\psi^{(v)}$ and $W^{(v)}$ via (22) and (21), respectively, and then set $W^{(v)} = \max(W^{(v)}, 0)$;
        4. Update variable $E^{(v)}$ via (24);
        5. Update variable $F^{(v)}$ by solving (25);
        6. Update variables $C_1^{(v)}$, $C_2^{(v)}$, and $C_3^{(v)}$ via (27), (28), and (29), respectively.
    end for
    7. Update $U$ by solving (26);
    8. Update $\mu$ via (30).
end while
Output: $U$
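The initialization of $Z^{(v)}$ with a k-nearest-neighbor graph can be sketched as follows. The binary 0/1 weighting and the one-column-per-instance layout are illustrative assumptions, since the paper does not specify the weighting scheme.

```python
import numpy as np

def knn_graph(Y, k_nn):
    """Binary k-nearest-neighbor affinity over the columns of Y
    (one column per instance)."""
    X = Y.T                                      # rows = instances
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                  # exclude self-loops
    Z = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k_nn]         # k_nn nearest neighbors per row
    rows = np.repeat(np.arange(len(X)), k_nn)
    Z[rows, nn.ravel()] = 1.0
    return Z
```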

we can still ignore its computational complexity because the inverse $\big(Y^{(v)T}Y^{(v)} + 2I\big)^{-1}$ can be precomputed before the iterations. The computational complexities of the remaining steps can also be ignored since they only involve basic matrix operations. Therefore, the overall computational complexity of the proposed method is about $O\big(\tau\big(kn^3 + n^3 + \sum_v n_v^3\big)\big)$, where $\tau$ denotes the number of iterations, $k$ denotes the number of views, and $n_v$ denotes the number of un-missing instances in the $v$th view.

B. Convergence Analysis

For an ADMM-style optimization approach with more than two unknown variables, it is difficult to prove a strong convergence property. Fortunately, we can prove a weak convergence property of the proposed method based on the following theorem [42], [49].

Theorem 1: Let the solution of the optimization problem (12) at the $t$th iteration be $\Upsilon_t = \big(Z_t^{(v)}, S_t^{(v)}, W_t^{(v)}, E_t^{(v)}, F_t^{(v)}, U_t, (C_1^{(v)})_t, (C_2^{(v)})_t, (C_3^{(v)})_t, \psi_t^{(v)}\big)$, $v \in [1, k]$. If the sequence of solutions $\{\Upsilon_t\}_{t=1}^{\infty}$ of problem (12) is bounded and satisfies the condition $\lim_{t\to\infty}(\Upsilon_{t+1} - \Upsilon_t) = 0$, then any accumulation point of the sequence $\{\Upsilon_t\}_{t=1}^{\infty}$ is a Karush-Kuhn-Tucker (KKT) point of problem (12). Whenever $\{\Upsilon_t\}_{t=1}^{\infty}$ converges, it converges to a KKT point.

Proof: The detailed proof of Theorem 1 is provided in the supplementary material.

Theorem 1 provides some assurance of the convergence of the proposed algorithm. In the subsequent section, we will conduct several experiments to further verify it.

V. EXPERIMENTS AND ANALYSES

In this section, we conduct experiments to demonstrate the effectiveness of the proposed method. The MATLAB code of our IMSC_AGL is released at: https://sites.google.com/view/jerry-wen-hit/publications.

A. Experimental Settings

1) Baseline Algorithms and Evaluation Metrics: The following six methods are selected for comparison with the proposed method.

1) Best Single View (BSV) [27]: For BSV, the missing instances in every view are first filled with the average instance of the corresponding view. Then it performs k-means on all views independently and reports the best clustering results.

2) Concat [27]: Concat adopts the same approach as BSV to fill in the missing instances. The difference is that it concatenates all views into a single long-dimensional view, and then performs k-means to obtain the clustering results.

3) PMVC [3]: Based on nonnegative matrix factorization, PMVC learns a common latent representation for all views, then performs k-means on the learned representation to obtain the clustering results.

4) IMG [27]: IMG also uses the matrix factorization technique to learn a common latent representation for all views. Different from PMVC, IMG simultaneously learns a graph from the common representation and finally exploits the spectral clustering algorithm to obtain the clustering results.

5) Double Constrained Non-Negative Matrix Factorization (DCNMF) [50]: DCNMF uses the local geometric structure of each view to guide the representation learning.

6) Graph Regularized PMVC (GPMVC) [51]: GPMVC can be viewed as a variant of DCNMF, which learns the common representation from the normalized individual representations of all views.

IMG, DCNMF, and GPMVC each have more than two tunable parameters. In our experiments, we implement these methods with a wide candidate parameter set, i.e., $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}\}$, and report their best clustering results for a fair comparison.

There are many criteria for evaluating the clustering performance of different methods [52]. In our experiments, we choose the three best-known evaluation criteria, i.e., clustering accuracy (ACC), normalized mutual information (NMI), and purity, to compare the above methods [17]. For all three criteria, larger values indicate better clustering performance.
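For reference, NMI can be computed from the contingency table of the predicted and ground-truth labels. A minimal sketch, using the arithmetic-mean normalization (one of several common variants):

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two label vectors
    (arithmetic-mean normalization)."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    ua, ub = np.unique(a), np.unique(b)
    # contingency table of joint label counts
    C = np.array([[np.sum((a == x) & (b == y)) for y in ub] for x in ua], dtype=float)
    pa, pb = C.sum(1) / n, C.sum(0) / n          # marginal distributions
    P = C / n                                    # joint distribution
    mi = np.sum(P[P > 0] * np.log(P[P > 0] / np.outer(pa, pb)[P > 0]))
    ha = -np.sum(pa * np.log(pa[pa > 0])) if (pa > 0).any() else 0.0
    hb = -np.sum(pb * np.log(pb[pb > 0])) if (pb > 0).any() else 0.0
    return mi / ((ha + hb) / 2) if ha + hb > 0 else 1.0
```

NMI is invariant to label permutations, which is why it is suitable for comparing clusterings whose cluster indices are arbitrary.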

2) Databases: Seven real-world databases listed in Table I are adopted for evaluation. Their detailed information is as follows.

1) Handwritten Digit Database [53]: Following the experimental settings in [53], we conduct the experiments on the multiview handwritten digit database1 with 2000 samples in total.

1https://www.dropbox.com/s/b00cdxc2s96o1to/handwritten.mat

Page 8: IEEE TRANSACTIONS ON CYBERNETICS 1 Incomplete Multiview ...static.tongtianta.site/paper_pdf/d700ba0e-90fc-11e9-9b23-00163e08… · clustering methods have been proposed, such as the

8 IEEE TRANSACTIONS ON CYBERNETICS

TABLE I
DESCRIPTION OF THE USED BENCHMARK DATASETS

There are ten digits, i.e., 0-9, in the used handwritten digit database. In our experiments, we only use two views, i.e., pixel average features and Fourier coefficient features, in which the pixel average view has 240 features per sample and the Fourier coefficient view contains 76 features per sample.

2) BUAA-VisNir Face Database (BUAA) [54]: Following the experimental settings in [27], we implement the experiments on a subset of the BUAA face database, which contains 90 visual images and 90 near-infrared images of the first 10 classes. The used BUAA face database is available at https://github.com/hdzhao/IMG/tree/master/data. For this subset, the two types of images, i.e., visual images and near-infrared images, are regarded as the two views of a person. All images were resized to 10 × 10 pixels and vectorized in advance.

3) Cornell Database [2], [55]: The Cornell database is one of the popular WebKB databases,2 which is composed of 195 documents over five labels, i.e., student, project, course, staff, and faculty. Each document is described by two views, i.e., 1703 content features and 195 citation features.

4) Caltech101 Database [56]: The Caltech101 database is a popular object database which contains 101 object classes in total. Each class provides 40-800 images. In our experiments, a multiview subset which contains seven classes and 1474 images is adopted [53]. For convenience, we refer to this subset as Caltech7.3 The original multiview Caltech7 database contains five types of features. In our experiments, we only select two views, one of which is the GIST features [57] with 512 dimensions per sample and the other is the LBP features [57] with 928 dimensions per sample.

5) ORL Database:4 The ORL face database is composed of 400 face images taken from 40 individuals. In the experiments, we first resized all images to 32 × 32 pixels, and then extracted the LBP, GIST, and pyramid of histogram of oriented gradients features [58]. Combining the original pixel features of each image, we form the multiview ORL dataset with four views, whose feature dimensions are 1024, 512, 1024, and 1024, respectively.

6) 3 Sources Database:5 This database contains 948 news articles collected from three online news sources: 1) BBC; 2) Reuters; and 3) The Guardian. In our experiments, we select a subset which contains 169 stories reported in all three sources to compare the different methods. The 169 stories are categorized into six topical labels: 1) business; 2) entertainment; 3) health; 4) politics; 5) sport; and 6) technology.

2http://lig-membres.imag.fr/grimal/data.html
3https://www.dropbox.com/s/ulvatoo8gepcdfk/Caltech101-7.mat
4The original ORL face database is available at: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
5http://erdos.ucd.ie/datasets/3sources.html

7) BBCSport Database [59]: The original BBCSport database contains 737 documents about sports news articles collected from the BBC Sport website. These documents are described by two to four views and categorized into five classes. In our experiments, we choose a subset6 with 116 samples described by all four views to validate the effectiveness of our method.

3) Incomplete Multiview Data Construction: In our experiments, we construct two types of incomplete multiview data.

1) Incomplete Case in Which Only Some Samples Have Complete Views: Following the experimental settings in [27], for the handwritten, BUAA, Cornell, and Caltech7 databases, we randomly select 10%, 30%, 50%, 70%, and 90% of the samples as paired samples. We then remove the first view from half of the remaining samples and the second view from the other half to form the incomplete multiview scenarios. Similarly, for the ORL multiview database, we randomly select 10%, 30%, and 50% of the samples as paired samples, and then follow the previous strategy to randomly split the remaining samples into four equal parts (25% each), each of which retains only a single view.

2) Incomplete Case in Which All Samples Have Missing Views: In our experiments, we exploit the ORL, 3 Sources, and BBCSport databases to construct incomplete multiview data with no paired samples, where about 53%, 55%, and 55% of the instances are randomly removed from each view of the three databases, respectively. For fairness, we repeatedly perform all compared methods five times on these databases and report their average clustering results.
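For the two-view databases, the first incomplete setting can be sketched as follows; the mask convention (True = instance observed) and the function name are illustrative.

```python
import numpy as np

def make_incomplete_masks(n, paired_ratio, seed=0):
    """Observation masks for a two-view dataset: masks[v][i] is True iff
    sample i is present in view v."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_paired = int(round(paired_ratio * n))
    rest = order[n_paired:]                    # samples that lose one view
    m1 = np.ones(n, dtype=bool)
    m2 = np.ones(n, dtype=bool)
    m1[rest[: len(rest) // 2]] = False         # first half loses view 1
    m2[rest[len(rest) // 2:]] = False          # second half loses view 2
    return m1, m2
```

Every sample keeps at least one view, and exactly `paired_ratio * n` samples keep both, mirroring the construction described above.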

B. Experimental Results and Analysis

Experimental results of the different methods on the above two types of incomplete multiview databases are enumerated in Tables II-VII and shown in Fig. 3, respectively. From the experimental results, we can draw the following conclusions.

1) In most cases, Concat and BSV perform the worst and second worst among the compared methods. This demonstrates that concatenating all views into one long view is not a good approach to multiview clustering tasks, mainly because Concat not only ignores the differences among views in feature scales and distributions but also cannot exploit the complementary information across views. As for BSV, the missing instances in each view will obviously be clustered into the same group

6https://github.com/GPMVCDummy/GPMVC/tree/master/partialMV/PVC/recreateResults/data

Page 9: IEEE TRANSACTIONS ON CYBERNETICS 1 Incomplete Multiview ...static.tongtianta.site/paper_pdf/d700ba0e-90fc-11e9-9b23-00163e08… · clustering methods have been proposed, such as the

WEN et al.: IMSC_AGL 9

TABLE II
MEAN NMIS (%) AND ACCS (%) OF DIFFERENT METHODS ON THE HANDWRITTEN DIGIT DATABASE. BOLD NUMBERS DENOTE THE BEST RESULT

TABLE III
MEAN NMIS (%) AND ACCS (%) OF DIFFERENT METHODS ON THE BUAA DATABASE. BOLD NUMBERS DENOTE THE BEST RESULT

TABLE IV
MEAN NMIS (%) AND ACCS (%) OF DIFFERENT METHODS ON THE CORNELL DATABASE. BOLD NUMBERS DENOTE THE BEST RESULT

TABLE V
MEAN NMIS (%) AND ACCS (%) OF DIFFERENT METHODS ON THE CALTECH7 DATABASE. BOLD NUMBERS DENOTE THE BEST RESULT

since they are all filled with the same average instance, which leads BSV to very poor performance, especially when a large number of instances are missing in every view. The experimental results of Concat and BSV in the tables and figures also prove that filling in the missing instances with the corresponding average instances is not a good choice for addressing the incompleteness problem in multiview clustering.

2) From Tables II-V and Fig. 3, we can find that PMVC, IMG, DCNMF, GPMVC, and the proposed method achieve much better performance than BSV and Concat in most cases, which proves that exploiting the complementary information of multiview features to learn a common representation is an effective approach to the incompleteness problem.

3) From Tables II-V and Fig. 3, we can find that DCNMF and GPMVC perform better than PMVC on the handwritten digit and BUAA databases, while performing worse than PMVC in some cases, especially on the Caltech7 and ORL databases. Compared with PMVC, DCNMF and GPMVC both try to exploit the geometric structure of each view to guide the common representation learning. These experimental results thus indicate that exploiting the intrinsic geometric structure of the data has the potential to yield a more discriminative and compact common representation for clustering. However, if the constructed geometric structure is not the intrinsic one, it leads to the opposite effect. Capturing the intrinsic structure of each view is therefore crucial for these methods.


Fig. 3. Purities (%) of different methods on the (a) handwritten digit database, (b) BUAA database, (c) Cornell database, and (d) Caltech7 database with different rates of paired samples.

TABLE VI
MEAN NMIS (%), ACCS (%), AND PURITIES (%) OF DIFFERENT METHODS ON THE FIRST INCOMPLETE CASE OF THE ORL DATABASE. BOLD NUMBERS DENOTE THE BEST RESULT

TABLE VII
MEAN NMIS (%), ACCS (%), AND PURITIES (%) OF DIFFERENT METHODS ON THE SECOND INCOMPLETE MULTIVIEW DATABASES. BOLD NUMBERS DENOTE THE BEST RESULT

4) The proposed method significantly outperforms the other methods on the above multiview databases under all kinds of incomplete cases. For instance, on the handwritten digit database (Table II), the proposed method achieves 3% and 6% improvements in ACC and NMI, respectively, in comparison with the second best method, i.e., DCNMF. In Table VII, the NMI, ACC, and purity of the proposed method are about 33%, 26%, and 31% higher than those of BSV and Concat on the BBCSport database. These results strongly prove the effectiveness of the proposed method in dealing with all kinds of incomplete multiview clustering tasks.

C. Analysis of the Parameter Sensitivity

In this section, we mainly focus on analyzing the sensitivity of the three tunable parameters, i.e., λ1, λ2, and λ3, in model (11). We first define a candidate set, i.e., {10−5, 10−4, 10−3, 10−1, 1, 10, 102, 103, 104, 105}, for the three parameters, and then perform the proposed method with different combinations of the three parameters [60]. In Figs. 4 and 5, we show the relationship between NMI (%) and the three parameters on the ORL and BUAA face datasets with 50% and 70% paired samples, respectively. From these figures, it is obvious that the proposed method obtains stable and satisfactory NMIs when the three parameters are located in

Fig. 4. NMI (%) versus (a) parameters λ1 and λ2 by fixing parameter λ3 and (b) parameter λ3 by fixing parameters λ1 and λ2, on the ORL face dataset with 50% paired samples.

Fig. 5. NMI (%) versus (a) parameters λ1 and λ2 by fixing parameter λ3 and (b) parameter λ3 by fixing parameters λ1 and λ2, on the BUAA face dataset with 70% paired samples.

some feasible areas. For example, on the ORL dataset, when parameters λ1, λ2, and λ3 lie in the ranges [10−5, 10−2], [101, 105], and [10−1, 102], respectively, a satisfactory clustering performance can be obtained. This demonstrates that the proposed method is, to some extent, insensitive to the three parameters.

To the best of our knowledge, adaptively selecting the optimal values of these parameters for different databases is still an open problem. In this section, we provide a simple strategy to find a good combination of the three parameters for the experiments. From Figs. 4 and 5, we can find that the proposed method is relatively insensitive to the selection of parameter λ1 within the small range [10−5, 10−2]; thus we can first fix λ1 to a value such as 10−3, and then focus on finding the best combination of parameters λ2 and λ3. According to the experimental results shown in Figs. 4 and 5, we define two candidate sets, {101, 102, 103, 104, 105} and {10−2, 10−1, 100, 101, 102}, for parameters λ2 and λ3, respectively. Then we perform the proposed method with different values


Fig. 6. Objective function value versus the iteration step of the proposed method on the (a) BUAA face and (b) ORL face databases, in which 70% and 50% samples are randomly selected as the paired samples.

of the two parameters. In this way, we can find the best combination of parameters λ2 and λ3 in the 2-D space formed by their candidate values. To obtain the optimal value of parameter λ1, we fix λ2 and λ3 at the obtained optimal combination and then perform the proposed method with different values of λ1. As a result, the optimal combination of the three values can be achieved. We then use the selected parameters to conduct the experiments and report the results for comparison.

D. Convergence Analysis Based on Experiments

In this section, two experiments are conducted to verify the convergence property of the proposed optimization approach listed in Algorithm 1. In Fig. 6, we show the objective function value versus the iteration steps, in which the objective function value is calculated as $\mathrm{obj} = \sum_v \big(\gamma_1^{(v)} + \gamma_2^{(v)}\big) \big/ \sum_v \big\|X^{(v)}\big\|_F^2$ according to the original model (11), where

$$
\gamma_1^{(v)} = \big\|Z^{(v)}\big\|_* + \lambda_1 \operatorname{Tr}\big(F^{(v)T}G^{(v)T}L_w^{(v)}G^{(v)}F^{(v)}\big) + \lambda_2\big\|E^{(v)}\big\|_1 + \lambda_3\big(c - \operatorname{Tr}\big(F^{(v)}F^{(v)T}UU^{T}\big)\big)
$$

and

$$
\gamma_2^{(v)} = \big\|Y^{(v)} - Y^{(v)}Z^{(v)} - E^{(v)}\big\|_F^2 + \big\|Z^{(v)} - S^{(v)}\big\|_F^2 + \big\|Z^{(v)} - W^{(v)}\big\|_F^2.
$$

From Fig. 6, it is obvious that the objective function curve decreases monotonically until it reaches a stable level, which demonstrates that the provided optimization approach monotonically decreases the objective value. This also shows that the proposed method converges to a local optimal point after a few iterations. Moreover, from the objective function values, we can see that the proposed method has very promising convergence efficiency.
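A per-iteration monitor of the normalized objective described above can be sketched as follows. The dictionary-based bookkeeping and the use of the observed data $Y^{(v)}$ in the denominator are illustrative assumptions; the nuclear norm is evaluated through numpy's `'nuc'` matrix norm.

```python
import numpy as np

def objective(views, lam1, lam2, lam3, c):
    """Normalized objective used to monitor convergence; `views` is a list
    of dicts holding Y, Z, S, W, E, F, G, Lw, U per view."""
    num, den = 0.0, 0.0
    for v in views:
        Y, Z, S, W, E = v["Y"], v["Z"], v["S"], v["W"], v["E"]
        F, G, Lw, U = v["F"], v["G"], v["Lw"], v["U"]
        g1 = (np.linalg.norm(Z, "nuc")                                  # nuclear norm
              + lam1 * np.trace(F.T @ G.T @ Lw @ G @ F)                 # graph term
              + lam2 * np.abs(E).sum()                                  # L1 term
              + lam3 * (c - np.trace(F @ F.T @ U @ U.T)))               # co-regularization
        g2 = (np.linalg.norm(Y - Y @ Z - E) ** 2                        # reconstruction
              + np.linalg.norm(Z - S) ** 2 + np.linalg.norm(Z - W) ** 2)
        num += g1 + g2
        den += np.linalg.norm(Y) ** 2
    return num / den
```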

VI. CONCLUSION

In this paper, a novel incomplete multiview clustering framework is proposed. Different from the conventional methods that exploit the matrix factorization technique for incomplete multiview clustering, the proposed method is the first to integrate spectral clustering and adaptive graph learning to tackle this problem. Compared with the conventional methods, the proposed method is more flexible since it is able to handle all kinds of incomplete cases. We have conducted several experiments on the two types of incomplete multiview data. The experimental results consistently show that the proposed method achieves better performance than some state-of-the-art methods, which proves its effectiveness in dealing with incomplete multiview clustering tasks.

REFERENCES

[1] X. Yuan, J. Yu, Z. Qin, and T. Wan, "A SIFT-LBP image retrieval model based on bag of features," in Proc. IEEE Int. Conf. Image Process., 2011, p. 1.

[2] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proc. Annu. Conf. Comput. Learn. Theory, 1998, pp. 92–100.

[3] S.-Y. Li, Y. Jiang, and Z.-H. Zhou, "Partial multi-view clustering," in Proc. AAAI Conf. Artif. Intell., 2014, pp. 1968–1974.

[4] H. Zhao, Z. Ding, and Y. Fu, "Multi-view clustering via deep matrix factorization," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2921–2927.

[5] Z. Zhang et al., "Highly-economized multi-view binary compression for scalable image clustering," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 731–748.

[6] X. Li, M. Chen, F. Nie, and Q. Wang, "A multiview-based parameter free framework for group detection," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 4147–4153.

[7] J. Li et al., "A probabilistic hierarchical model for multi-view and multi-feature classification," in Proc. AAAI Conf. Artif. Intell., 2018, pp. 3498–3505.

[8] Y. Wang and L. Wu, "Beyond low-rank representations: Orthogonal clustering basis reconstruction with optimized graph structure for multi-view spectral clustering," Neural Netw., vol. 103, pp. 1–8, Jul. 2018.

[9] K. Zhan, C. Zhang, J. Guan, and J. Wang, "Graph learning for multiview clustering," IEEE Trans. Cybern., vol. 48, no. 10, pp. 2887–2895, Oct. 2018.

[10] F. Nie, G. Cai, J. Li, and X. Li, "Auto-weighted multi-view learning for image clustering and semi-supervised classification," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1501–1511, Mar. 2018.

[11] J. Li, B. Zhang, G. Lu, and D. Zhang, "Generative multi-view and multi-feature learning for classification," Inf. Fusion, vol. 45, pp. 215–226, Jan. 2019.

[12] G. Luo et al., "Multi-views fusion CNN for left ventricular volumes estimation on cardiac MR images," IEEE Trans. Biomed. Eng., vol. 65, no. 9, pp. 1924–1934, Sep. 2018.

[13] F. Nie, G. Cai, and X. Li, "Multi-view clustering and semi-supervised classification with adaptive neighbours," in Proc. AAAI Conf. Artif. Intell., 2017, pp. 2408–2414.

[14] Y. Wang, L. Wu, X. Lin, and J. Gao, "Multiview spectral clustering via structured low-rank matrix factorization," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 10, pp. 4833–4843, Oct. 2018.

[15] L. Zhao, Z. Chen, Y. Yang, Z. J. Wang, and V. C. M. Leung, "Incomplete multi-view clustering via deep semantic mapping," Neurocomputing, vol. 275, pp. 1053–1062, Jan. 2018.

[16] Z. Zhang, L. Liu, F. Shen, H. T. Shen, and L. Shao, "Binary multi-view clustering," IEEE Trans. Pattern Anal. Mach. Intell., to be published, doi: 10.1109/TPAMI.2018.2847335.

[17] X. Cai, F. Nie, and H. Huang, "Multi-view K-means clustering on big data," in Proc. Int. Joint Conf. Artif. Intell., 2013, pp. 2598–2604.

[18] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan, "Multi-view clustering via canonical correlation analysis," in Proc. Annu. Int. Conf. Mach. Learn., 2009, pp. 129–136.

[19] A. Kumar, P. Rai, and H. Daumé, "Co-regularized multi-view spectral clustering," in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 1413–1421.

[20] C. Zhang, H. Fu, S. Liu, G. Liu, and X. Cao, "Low-rank tensor constrained multiview subspace clustering," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1582–1590.

[21] Y. Wang et al., "Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering," in Proc. Int. Joint Conf. Artif. Intell., 2016, pp. 2153–2159.

[22] Y. Wang et al., "Robust subspace clustering for multi-view data by exploiting correlation consensus," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3939–3949, Nov. 2015.

[23] A. Trivedi, P. Rai, H. Daumé, III, and S. L. Duvall, "Multiview clustering with incomplete views," in Proc. Adv. Neural Inf. Process. Syst. Workshop, 2010, pp. 1–7.

[24] J. Wen, Z. Zhang, Y. Xu, and Z. Zhong, "Incomplete multi-view clustering via graph regularized matrix factorization," in Proc. Eur. Conf. Comput. Vis. Workshop, 2018, p. 1.

[25] W. Shao, L. He, and P. S. Yu, "Multiple incomplete views clustering via weighted nonnegative matrix factorization with L2,1 regularization," in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Disc. Databases, 2015, pp. 318–334.

[26] H. Gao, Y. Peng, and S. Jian, "Incomplete multi-view clustering," in Proc. Int. Conf. Intell. Inf. Process., 2016, pp. 245–255.


[27] H. Zhao, H. Liu, and Y. Fu, “Incomplete multi-modal visual datagrouping,” in Proc. Int. Joint Conf. Artif. Intell., 2016, pp. 2392–2398.

[28] G. Liu et al., “Robust recovery of subspace structures by low-rank rep-resentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1,pp. 171–184, Jan. 2013.

[29] J. Chen and J. Yang, “Robust subspace segmentation via low-rankrepresentation,” IEEE Trans. Cybern., vol. 44, no. 8, pp. 1432–1445,Aug. 2014.

[30] H. Gao, F. Nie, X. Li, and H. Huang, “Multi-view subspace clustering,”in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4238–4246.

[31] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On spectral clustering: Analysisand an algorithm,” in Proc. Adv. Neural Inf. Process. Syst., 2002,pp. 849–856.

[32] Y. Yang, F. Shen, Z. Huang, H. T. Shen, and X. Li, “Discrete nonnega-tive spectral clustering,” IEEE Trans. Knowl. Data Eng., vol. 29, no. 9,pp. 1834–1845, Sep. 2017.

[33] L. Hagen and A. B. Kahng, “New spectral methods for ratio cut par-titioning and clustering,” IEEE Trans. Comput.-Aided Design Integr.Circuits Syst., vol. 11, no. 9, pp. 1074–1085, Sep. 1992.

[34] C. Tian et al., “Enhanced CNN for image denoising,” arXiv preprintarXiv:1810.11834, 2018, pp. 1–8.

[35] S. Zeng, J. Gou, and X. Yang, “Improving sparsity of coefficientsfor robust sparse and collaborative representation-based image clas-sification,” Neural Comput. Appl., vol. 30, no. 10, pp. 2965–2978,2018.

[36] L. Zhuang et al., “Non-negative low rank and sparse graph forsemi-supervised learning,” in Proc. IEEE Conf. Comput. Vis. PatternRecognit., 2012, pp. 2328–2335.

[37] X. Li, G. Cui, and Y. Dong, “Graph regularized non-negative low-rankmatrix factorization for image clustering,” IEEE Trans. Cybern., vol. 47,no. 11, pp. 3840–3853, Nov. 2017.

[38] Y. Lu, C. Yuan, W. Zhu, and X. Li, “Structurally incoherent low-ranknonnegative matrix factorization for image classification,” IEEE Trans.Image Process., vol. 27, no. 11, pp. 5248–5260, Nov. 2018.

[39] L. Fei, Y. Xu, X. Fang, and J. Yang, “Low rank representation withadaptive distance penalty for semi-supervised subspace classification,”Pattern Recognit., vol. 67, pp. 252–262, Jul. 2017.

[40] J. Wen, B. Zhang, Y. Xu, J. Yang, and N. Han, “Adaptive weightednonnegative low-rank representation,” Pattern Recognit., vol. 81,pp. 326–340, Sep. 2018.

[41] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributedoptimization and statistical learning via the alternating direction methodof multipliers,” Found. Trends R© Mach. Learn., vol. 3, no. 1, pp. 1–122,2011.

[42] J. Wen et al., “Low-rank preserving projection via graph reg-ularized reconstruction,” IEEE Trans. Cybern., to be published,doi: 10.1109/TCYB.2018.2799862.

[43] Y. Lu et al., “Low-rank 2-D neighborhood preserving projection forenhanced robust image representation,” IEEE Trans. Cybern., to bepublished, doi: 10.1109/TCYB.2018.2815559.

[44] Z. Zhang, Y. Xu, L. Shao, and J. Yang, “Discriminative block-diagonalrepresentation learning for image recognition,” IEEE Trans. NeuralNetw. Learn. Syst., vol. 29, no. 7, pp. 3111–3125, Jul. 2018.

[45] F. Nie, X. Wang, M. I. Jordan, and H. Huang, “The constrained Laplacian rank algorithm for graph-based clustering,” in Proc. AAAI Conf. Artif. Intell., 2016, pp. 1969–1976.

[46] J. Wen et al., “Robust sparse linear discriminant analysis,” IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2018.2799214.

[47] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, p. 11, 2011.

[48] Y. Lu et al., “Structurally incoherent low-rank 2DLPP for image classification,” IEEE Trans. Circuits Syst. Video Technol., to be published, doi: 10.1109/TCSVT.2018.2849757.

[49] Y. Zhang, “An alternating direction algorithm for nonnegative matrix factorization,” Dept. Comput. Appl. Math., Rice Univ., Houston, TX, USA, Rep. TR10-03, 2010.

[50] B. Qian, X. Shen, Y. Gu, Z. Tang, and Y. Ding, “Double constrained NMF for partial multi-view clustering,” in Proc. IEEE Int. Conf. Digit. Image Comput. Techn. Appl., 2016, pp. 1–7.

[51] N. Rai, S. Negi, S. Chaudhury, and O. Deshmukh, “Partial multi-view clustering using graph regularized NMF,” in Proc. IEEE Int. Conf. Pattern Recognit., 2016, pp. 2192–2197.

[52] E. Amigó, J. Gonzalo, J. Artiles, and F. Verdejo, “A comparison of extrinsic clustering evaluation metrics based on formal constraints,” Inf. Retrieval, vol. 12, no. 4, pp. 461–486, 2009.

[53] Y. Li, F. Nie, H. Huang, and J. Huang, “Large-scale multi-view spectral clustering via bipartite graph,” in Proc. AAAI Conf. Artif. Intell., 2015, pp. 2750–2756.

[54] D. Huang, J. Sun, and Y. Wang, “The BUAA-VisNir face database instructions,” School Comput. Sci. Eng., Beihang Univ., Beijing, China, Tech. Rep. IRIP-TR-12-FR-001, 2012.

[55] Y. Guo, “Convex subspace representation learning from multi-view data,” in Proc. AAAI Conf. Artif. Intell., vol. 1, 2013, pp. 387–393.

[56] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories,” Comput. Vis. Image Understand., vol. 106, no. 1, pp. 59–70, 2007.

[57] A. Oliva and A. Torralba, “Modeling the shape of the scene: A holistic representation of the spatial envelope,” Int. J. Comput. Vis., vol. 42, no. 3, pp. 145–175, 2001.

[58] P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2008, pp. 1–8.

[59] D. Greene and P. Cunningham, “Practical solutions to the problem of diagonal dominance in kernel document clustering,” in Proc. Int. Conf. Mach. Learn., 2006, pp. 377–384.

[60] J. Wen, Y. Xu, Z. Li, Z. Ma, and Y. Xu, “Inter-class sparsity based discriminative least square regression,” Neural Netw., vol. 102, pp. 36–47, Jun. 2018.

Jie Wen received the M.S. degree in detection technology and automatic equipment from Harbin Engineering University, Harbin, China, in 2015. He is currently pursuing the Ph.D. degree in computer science and technology with the Harbin Institute of Technology, Shenzhen, China.

His current research interests include image and video processing, pattern recognition, data clustering, and machine learning.

Yong Xu (M’06–SM’15) received the Ph.D. degree in pattern recognition and intelligence system from the Nanjing University of Science and Technology, Nanjing, China, in 2005.

He is currently with the Harbin Institute of Technology, Shenzhen, Shenzhen, China. His current research interests include pattern recognition, biometrics, machine learning, and video analysis. For more information, please refer to http://www.yongxu.org/lunwen.html.

Hong Liu (M’05) received the Ph.D. degree in electrical control and automation engineering from the Harbin Institute of Technology, Harbin, China, in 1996.

He was a Postdoctoral Fellow in computer science and technology with Peking University, Beijing, China, in 1996. He is currently a Supervisor of doctoral students, the Director of the Research Department of the Shenzhen Graduate School, and the Director of the Intelligent Robot Laboratory, Peking University, Shenzhen, China. He has made exchange visits to many famous universities and research institutions in several countries and regions, including the U.S., Canada, France, The Netherlands, Japan, South Korea, Singapore, and Hong Kong. He has published over 100 papers in important scholarly journals and international conferences. His current research interests include image processing and pattern recognition, intelligent robots and computer vision, and intelligent micro-system hardware and software co-design.

Dr. Liu was a recipient of the An Jung-geun Award of the Korean Academy, the Department of Space Science and Technology Progress Award, the Peking University Teaching Excellence Award, and the Aetna Award, and was a candidate for the Peking University Top Ten Teachers. He is an Executive Director and the Vice Secretary of the Intelligent Automation Committee of the Chinese Automation Association.