
Adv Comput Math (2013) 38:837–858. DOI 10.1007/s10444-011-9261-9

Splitting and linearizing augmented Lagrangian algorithm for subspace recovery from corrupted observations

Yunhai Xiao · Soon-Yi Wu · Dong-Hui Li

Received: 27 September 2011 / Accepted: 13 November 2011 / Published online: 3 December 2011
© Springer Science+Business Media, LLC 2011

Abstract Given a set of corrupted data drawn from a union of multiple subspaces, the subspace recovery problem is to segment the data into their respective subspaces and to correct the possible noise simultaneously. Recently, it has been shown that this task can be characterized, both theoretically and numerically, by solving a convex minimization problem involving the matrix nuclear norm and the $\ell_{2,1}$-mixed norm. The minimization model has a separable structure in both the objective function and the constraint; it thus falls into the framework of the augmented Lagrangian alternating direction approach. In this paper, we propose and investigate an augmented Lagrangian algorithm. We split the augmented Lagrangian function and minimize the subproblems alternately, each with respect to one variable while fixing the other. Moreover, we linearize the subproblem and add a proximal point term so that closed-form solutions are easily derived. Global convergence of the proposed algorithm is established under some technical conditions. Extensive experiments on simulated and real data verify that the proposed method is very effective and faster than the state-of-the-art algorithm LRR.

Communicated by Lixin Shen.

Y. Xiao (B) Institute of Applied Mathematics, College of Mathematics and Information Science, Henan University, Kaifeng 475000, China. e-mail: [email protected]

S.-Y. Wu National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan 700, Taiwan. e-mail: [email protected]

D.-H. Li School of Mathematical Sciences, South China Normal University, Guangzhou 510631, China. e-mail: [email protected]


Keywords Principal component analysis · Low-rank representation · Subspace recovery · Augmented Lagrangian function · Convex optimization · Nuclear norm minimization

Mathematics Subject Classifications (2010) 65L09 · 65K05 · 90C30 · 90C25 · 94A08

1 Introduction

Many applications in various areas can be captured by finding and exploiting low-dimensional structure in high-dimensional data; for example, principal component analysis in computer vision and image/video analysis. Although principal component analysis offers an almost exact estimate of the subspace when the data are corrupted by small Gaussian noise, it breaks down when only a fraction of the data is grossly corrupted. Given a data matrix $X \in \mathbb{R}^{m\times n}$ generated from a low-rank $Z \in \mathbb{R}^{m\times n}$ with additive gross but sparse noise $E \in \mathbb{R}^{m\times n}$ (i.e., impulsive noise), so that $X = Z + E$, it is shown in [33] that, under some mild conditions, the task of recovering the low-rank component and the sparse component can be accomplished by solving a convex minimization problem involving the nuclear norm and the $\ell_1$-norm:

$$\min_{Z,E}\ \|Z\|_* + \lambda\|E\|_1 \qquad \text{s.t.}\quad X = Z + E, \qquad (1.1)$$

where $\|\cdot\|_1$ is defined as the sum of the absolute values of all entries; $\lambda > 0$ is a positive weighting parameter; and $\|\cdot\|_*$ is the well-known nuclear norm (also known as the trace norm or Ky Fan norm), defined as the sum of all singular values. Note that the nuclear norm is the best convex approximation of the rank function over the unit ball of matrices under the spectral norm, and it is widely used as a surrogate of the rank function to induce low-rank solutions in areas such as machine learning, statistics, and engineering. The model (1.1) has also been highlighted in the context of the so-called robust principal component analysis (RPCA) [4].

The formulation of RPCA implicitly assumes that the underlying data are approximately drawn from a single low-rank subspace. However, when the data are drawn from a union of multiple subspaces, denoted $\mathcal{S}_1, \mathcal{S}_2, \ldots, \mathcal{S}_t$, RPCA actually treats the data as being sampled from the single subspace $\mathcal{S} = \sum_{i=1}^t \mathcal{S}_i$. The specifics of the individual subspaces are not well considered, so the recovery may be inaccurate. The subspace segmentation method [8] can explore the correct subspace structures only by assuming that the subspaces are independent and the data are noiseless. Hence, its performance and applicability in real scenarios are limited by a lack of robustness [23]. Given a set of data samples approximately drawn from a union of multiple independent subspaces, with a fraction of the data vectors grossly corrupted by $E_0$, i.e., $X = X_0 + E_0$, the low-rank representation method [20] aims to recover


$Z_0 \in \mathbb{R}^{n\times n}$ and to correct the possible noise $E_0$ simultaneously by solving the following convex minimization problem:

$$\min_{Z,E}\ \|Z\|_* + \lambda\|E\|_{2,1} \qquad \text{s.t.}\quad X = AZ + E, \qquad (1.2)$$

where $A \in \mathbb{R}^{m\times n}$ is a dictionary that linearly spans the data space, and $\|\cdot\|_{2,1}$ is the so-called $\ell_{2,1}$-mixed norm, defined as the sum of the $\ell_2$-norms of the columns of a matrix. The $\ell_{2,1}$-mixed norm encourages the columns of $E$ to be zero, reflecting the assumption that some data vectors are corrupted while the others are kept clean. After the minimizer $(Z^*, E^*)$ is obtained, the original data $X_0$ can be reconstructed as $X - E^*$ (or $AZ^*$). The minimizer $Z^*$ is also called the lowest-rank representation of the data $X$ with respect to the dictionary $A$. Note that formulation (1.2) reduces to (1.1) by setting $A := I$ (the identity matrix) and replacing $\|\cdot\|_{2,1}$ with $\|\cdot\|_1$; hence, LRR is a generalization of RPCA that essentially uses the standard basis as the dictionary. LRR is therefore a substantial improvement over RPCA: it exactly recovers the subspace structures and corrects the possible noise as well.

Problem (1.2) is convex and can be easily recast as an equivalent semidefinite programming (SDP) problem, which can then be solved by SDP solvers such as the popular interior-point packages SDPT3 [30] and SeDuMi [26]. However, the interior-point approach is prohibitively inefficient for large instances of these problems. Fortunately, this difficulty can be greatly alleviated by exploiting the favorable structure in both the objective function and the constraint. In the context of RPCA, where $A = I$, the iterative thresholding approach in [33] solves a relaxed version of (1.1). This algorithm is actually a straightforward combination of the iterative thresholding algorithm for nuclear norm minimization [2] and the well-known fixed-point continuation algorithm for $\ell_1$-norm minimization [13]. The accelerated proximal gradient (APG) method [19] solves a Lagrangian version of (1.1). This method falls within the scope of the classical proximal gradient method [29], and its performance is dramatically improved by incorporating Nesterov's acceleration [22] and a widely used continuation technique. The inexact augmented Lagrangian method [18] is considerably faster than APG because it fully exploits the high-level separable structure; it in fact coincides with the alternating direction method (ADM) developed almost simultaneously and independently in [39]. Similarly to the matrix completion problem (e.g., [2, 3, 21]), Tao and Yuan [28] recently considered model (1.1) under more practical circumstances, where only a part of the entries of the data $X$ can be observed and a fraction of the observed entries is grossly corrupted by large noise while the rest is corrupted by small noise.

Although much progress has been achieved in solving model (1.1), the solvers reviewed above need further modifications or improvements to deal with (1.2). The very recently proposed low-rank representation algorithm (named LRR) reformulates model (1.2) into an equivalent problem by adding a new variable and a constraint, and then uses the ADM for its solution. In fact,


although the convergence of this method for a convex program whose objective function and constraint are separable into two parts is well studied, its convergence for the more general case with three or more separable parts is still open [14, 28]. Hence, the convergence property of LRR remains unclear without additional conditions.

Because the resulting subproblems are often simple enough to have closed-form solutions, the ADM has recently been shown to be a powerful algorithmic tool for solving convex programming problems arising from various applications, such as image processing [9, 12, 31, 35, 36], compressive sensing [38], matrix completion [5, 34, 37], SDP [17, 27, 32], and multi-task feature learning [6]. In this paper, we focus on applying the ADM to the nuclear norm and $\ell_{2,1}$-mixed norm minimization model (1.2), and we demonstrate its remarkable effectiveness in recovering the subspace structure and correcting the noise of given corrupted data. More precisely, we split the task of minimizing the corresponding augmented Lagrangian function into two subproblems, which solve for the variables $E$ and $Z$ in a consecutive order with the other variable fixed, and then update the multiplier. The $E$-subproblem has a closed-form solution that can be easily determined by exploiting the problem's favorable structure. Since a closed-form solution of the $Z$-subproblem is not available, we linearize this subproblem and add a proximal point term so that its closed-form solution can be easily derived. The proposed algorithm is thus easy to implement, since each subproblem admits a closed-form solution. Each iteration of the algorithm is dominated by a linear-time shrinkage, one partial singular value decomposition, and two matrix-matrix multiplications involving $A$ and $A^\top$. Extensive experiments indicate that the proposed algorithm is promising and performs better than the state-of-the-art solver LRR.

We summarize the notation used in this paper. Matrices are written as uppercase letters and vectors as lowercase letters. For a matrix $X$, its $i$-th row and $j$-th column are denoted by $[X]_{i,:}$ and $[X]_{:,j}$, respectively. The $\ell_1$-norm, the Frobenius norm, and the $\ell_{2,1}$-mixed norm of a matrix $X \in \mathbb{R}^{m\times n}$ are defined, respectively, as

$$\|X\|_1 = \sum_{i,j} |x_{i,j}|, \qquad \|X\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n x_{i,j}^2}, \qquad \|X\|_{2,1} = \sum_{j=1}^n \sqrt{\sum_{i=1}^m x_{i,j}^2} = \sum_{j=1}^n \|[X]_{:,j}\|_2,$$

where $x_{i,j}$ is the $(i,j)$-th component of $X$ and $\|\cdot\|_2$ is the Euclidean norm of a vector. For any two matrices $X, Y \in \mathbb{R}^{n\times t}$, we define $\langle X, Y\rangle = \mathrm{tr}(X^\top Y)$ (the standard trace inner product), so that $\|X\|_F = \sqrt{\langle X, X\rangle}$. For a vector $x$, $\mathrm{diag}(x)$ denotes the diagonal matrix with the components of $x$ on its diagonal. For any symmetric $M \in \mathbb{R}^{m\times m}$, $\lambda_{\max}(M)$ denotes the largest eigenvalue of $M$. The symbol $\top$ denotes the transpose of a vector or a matrix. For a convex function $f(x)$, $\partial f(x)$ denotes the subdifferential of $f$ at $x$.

The rest of this paper is organized as follows. In Section 2, we briefly recall some preliminary results used in the subsequent sections and then derive our algorithm based on the primal problem (1.2). In Section 3, we establish the convergence of the proposed algorithm under some technical conditions. In Section 4, we report numerical results on simulated data and a real data set and perform some performance comparisons. Finally, we conclude the paper in Section 5.

2 Algorithm

2.1 Preliminary results

This subsection briefly reviews some well-known results that will be used to construct our algorithm. We first recall the classic alternating direction method, contributed originally by Glowinski and Marrocco [11] and Gabay and Mercier [10], for solving the structured convex optimization problem

$$\min_{x,y}\ f_1(x) + f_2(y) \qquad \text{s.t.}\quad Mx + Ny = c, \qquad (2.1)$$

where $M \in \mathbb{R}^{l\times s}$, $N \in \mathbb{R}^{l\times t}$, and $c \in \mathbb{R}^{l}$; the functions $f_1 : \mathbb{R}^{s} \to \mathbb{R}$ and $f_2 : \mathbb{R}^{t} \to \mathbb{R}$ are convex. The augmented Lagrangian function of (2.1) is

$$\mathcal{L}_1(x, y, \lambda) = f_1(x) + f_2(y) - \lambda^\top(Mx + Ny - c) + \frac{\beta}{2}\|Mx + Ny - c\|_2^2, \qquad (2.2)$$

where $\lambda \in \mathbb{R}^{l}$ is the Lagrangian multiplier associated with the linear constraint and $\beta > 0$ is a penalty parameter for the violation of the linear constraint. It is well known that, starting from $\lambda_0$, the classic augmented Lagrangian method solves

$$\min_{x,y}\ \mathcal{L}_1(x, y, \lambda_k) \qquad (2.3)$$

at the current iteration and updates the multiplier $\lambda$ subsequently. Solving (2.3) for $x$ and $y$ simultaneously can be difficult, since it ignores the favorable separable structure of the objective function and the constraint. The ADM splits the augmented Lagrangian function (2.2) in an alternating manner, so that each of the variables $x$ and $y$ is updated separately while the other is fixed at its latest value. In a word, we follow the framework

$$\begin{cases} x_{k+1} := \arg\min_x \mathcal{L}_1(x, y_k, \lambda_k),\\ y_{k+1} := \arg\min_y \mathcal{L}_1(x_{k+1}, y, \lambda_k),\\ \lambda_{k+1} := \lambda_k - \beta\,[Mx_{k+1} + Ny_{k+1} - c]. \end{cases} \qquad (2.4)$$

In fact, the ADM is closely related to Douglas-Rachford splitting methods [25] and split-Bregman methods [12] in image processing [9].


The theoretical analysis of the ADM can be found in Proposition 4.2 of [1, Chapter 3], and its convergence theorem can be stated as follows:

Theorem 2.1 For fixed $\beta > 0$, the sequence $\{(x_k, y_k, \lambda_k)\}$ generated by iteration (2.4) from any starting point is bounded, and every limit point of $\{(x_k, y_k)\}$ is an optimal solution of (2.1).

Now, we list two important theorems, which will be used in our proposedalgorithm to solve each subproblem.

Theorem 2.2 [2, 21] Given $Y \in \mathbb{R}^{m\times n}$ of rank $r$, let

$$Y = U\Sigma V^\top, \qquad \Sigma = \mathrm{diag}(\{\sigma_i\}_{1\le i\le r}),$$

be the singular value decomposition (SVD) of $Y$. For each $\mu > 0$, let

$$\mathcal{D}_\mu(Y) = U\Sigma_\mu V^\top \quad \text{with} \quad \Sigma_\mu = \mathrm{diag}(\{\sigma_i - \mu\}_+),$$

where $\{\cdot\}_+ = \max(0, \cdot)$. Then $\mathcal{D}_\mu(Y)$ obeys

$$\mathcal{D}_\mu(Y) = \arg\min_X\ \mu\|X\|_* + \frac{1}{2}\|X - Y\|_F^2. \qquad (2.5)$$
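The shrinkage operator $\mathcal{D}_\mu$ is straightforward to obtain from an SVD. The following is a minimal NumPy sketch (the function name svd_shrink is ours, not from the paper):

import numpy as np

def svd_shrink(Y, mu):
    # D_mu(Y) = U diag({sigma_i - mu}_+) V^T, the minimizer of
    # mu*||X||_* + 0.5*||X - Y||_F^2 in (2.5).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - mu, 0.0)) @ Vt

In practice, only the singular values larger than $\mu$ contribute, so a partial SVD suffices; the full SVD above is used only for simplicity.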

Theorem 2.3 [7] Let $Y \in \mathbb{R}^{m\times n}$ be a given matrix, and let $\mathcal{S}_\mu(Y)$ be the optimal solution of

$$\min_X\ \mu\|X\|_{2,1} + \frac{1}{2}\|X - Y\|_F^2;$$

then the $i$-th column of $\mathcal{S}_\mu(Y)$ is

$$[\mathcal{S}_\mu(Y)]_{:,i} = \big\{\|[Y]_{:,i}\|_2 - \mu\big\}_+ \frac{[Y]_{:,i}}{\|[Y]_{:,i}\|_2}.$$
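Analogously, $\mathcal{S}_\mu$ amounts to scaling each column of $Y$ toward zero. A minimal NumPy sketch (again with our own helper name col_shrink):

import numpy as np

def col_shrink(Y, mu):
    # [S_mu(Y)]_{:,i} = {||Y_{:,i}||_2 - mu}_+ * Y_{:,i} / ||Y_{:,i}||_2.
    norms = np.linalg.norm(Y, axis=0)
    # The small constant guards against division by zero for zero columns,
    # whose shrunk value is zero anyway.
    scale = np.maximum(norms - mu, 0.0) / np.maximum(norms, 1e-16)
    return Y * scale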

2.2 Review of the LRR algorithm

In this subsection, we briefly review the very recent algorithm LRR and then state our motivation. The LRR method adds a new variable $J \in \mathbb{R}^{n\times n}$ to model (1.2) and converts it into the equivalent form

$$\min_{Z,E,J}\ \|J\|_* + \lambda\|E\|_{2,1} \qquad \text{s.t.}\quad X = AZ + E, \quad J = Z. \qquad (2.6)$$

The augmented Lagrangian function of (2.6) is

$$\mathcal{L}_2(E, Z, J, Y, L) = \|J\|_* + \lambda\|E\|_{2,1} - \langle Y, AZ + E - X\rangle + \frac{\mu}{2}\|AZ + E - X\|_F^2 - \langle L, Z - J\rangle + \frac{\mu}{2}\|Z - J\|_F^2,$$


where $Y \in \mathbb{R}^{m\times n}$ and $L \in \mathbb{R}^{n\times n}$ are the Lagrangian multipliers and $\mu > 0$ is a penalty parameter. The LRR algorithm minimizes $\mathcal{L}_2(E, Z, J, Y, L)$ first with respect to $J$, then with respect to $Z$, and then with respect to $E$, fixing the other variables at their latest values. More precisely, given $(E_k, Z_k, J_k, Y_k, L_k)$, LRR generates the new iterate $(E_{k+1}, Z_{k+1}, J_{k+1}, Y_{k+1}, L_{k+1})$ via the scheme

$$\begin{cases} J_{k+1} \in \arg\min_J \mathcal{L}_2(E_k, Z_k, J, Y_k, L_k),\\ Z_{k+1} \in \arg\min_Z \mathcal{L}_2(E_k, Z, J_{k+1}, Y_k, L_k),\\ E_{k+1} \in \arg\min_E \mathcal{L}_2(E, Z_{k+1}, J_{k+1}, Y_k, L_k),\\ Y_{k+1} := Y_k - \mu(AZ_{k+1} + E_{k+1} - X),\\ L_{k+1} := L_k - \mu(Z_{k+1} - J_{k+1}). \end{cases}$$

By Theorems 2.2 and 2.3, the scheme can be written in the following more compact form:

$$\begin{cases} J_{k+1} := \mathcal{D}_{1/\mu}(Z_k - L_k/\mu),\\ Z_{k+1} := \big(A^\top A + I\big)^{-1}\big(A^\top(X - E_k) + J_{k+1} + (A^\top Y_k + L_k)/\mu\big),\\ E_{k+1} := \mathcal{S}_{\lambda/\mu}(X - AZ_{k+1} + Y_k/\mu),\\ Y_{k+1} := Y_k - \mu(AZ_{k+1} + E_{k+1} - X),\\ L_{k+1} := L_k - \mu(Z_{k+1} - J_{k+1}), \end{cases} \qquad (2.7)$$

where $I \in \mathbb{R}^{n\times n}$ is the identity matrix.

We clearly see that LRR divides the minimization task into three separable subproblems, one for each variable. In particular, the $J$- and $E$-subproblems are easy to implement and enjoy closed-form solutions. In the special case of the $Z$-subproblem where $A$ has orthonormal rows, i.e., $AA^\top = I$ as in compressive sensing, it is easy to verify that $(A^\top A + I)^{-1} = I - \frac{1}{2}A^\top A$ (since $A^\top A$ is then idempotent). In this case, there is no need to compute a matrix inverse to obtain the next $Z$. However, when $A$ is a generic linear operator, the matrix inverse no longer has an explicit form, and the efficiency of LRR depends heavily on how this harder subproblem is solved. Despite its surprising efficiency, it is still not clear whether LRR converges globally without additional technical conditions. Fortunately, recent developments in the literature show that if the generated iterates $(J_{k+1}, Z_{k+1}, E_{k+1}, Y_{k+1}, L_{k+1})$ are further corrected by a suitable descent direction with an appropriate stepsize, then an iterate closer to the solution set of (1.2) can be easily derived [15]. However, such correction steps may cause critical difficulties; for example, they usually destroy the low-rank feature of the temporary iterate generated by (2.7). An alternative correction-free approach is to update the Lagrangian multipliers immediately once $J_{k+1}$ is computed, and then solve the subproblems involving the latest multipliers in a consecutive order [16, 28].
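For illustration, one sweep of scheme (2.7), as reconstructed above, can be sketched in NumPy as follows; it reuses the svd_shrink and col_shrink helpers sketched after Theorems 2.2 and 2.3 and is not the authors' Matlab implementation.

import numpy as np

def lrr_step(E, Z, J, Y, L, X, A, lam, mu):
    # One sweep of (2.7): J-, Z-, E-updates followed by the multiplier updates.
    J_new = svd_shrink(Z - L / mu, 1.0 / mu)
    # The Z-update requires a solve with (A^T A + I); this is the step that
    # PSLAL avoids by linearization.
    rhs = A.T @ (X - E) + J_new + (A.T @ Y + L) / mu
    Z_new = np.linalg.solve(A.T @ A + np.eye(A.shape[1]), rhs)
    E_new = col_shrink(X - A @ Z_new + Y / mu, lam / mu)
    Y_new = Y - mu * (A @ Z_new + E_new - X)
    L_new = L - mu * (Z_new - J_new)
    return E_new, Z_new, J_new, Y_new, L_new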

2.3 Splitting and linearizing augmented Lagrangian algorithm

Although the global convergence of LRR can be established by using further modifications as in [15, 16, 28], it is still highly desirable to seek an algorithm involving only two variables whose convergence is ensured without additional computational burdens or restrictive assumptions. This motivates us to design an efficient algorithm that solves the original model (1.2) directly, without any equivalent transformation; that is, both the objective function and the constraint are separated into two parts. More precisely, the corresponding augmented Lagrangian function of (1.2) is

$$\mathcal{L}_A(E, Z, \Lambda) = \|Z\|_* + \lambda\|E\|_{2,1} - \langle \Lambda, AZ + E - X\rangle + \frac{\mu}{2}\|AZ + E - X\|_F^2, \qquad (2.8)$$

where $\Lambda \in \mathbb{R}^{m\times n}$ is the Lagrangian multiplier and $\mu > 0$ is a penalty parameter. Let $(Z_k, E_k, \Lambda_k)$ be given. The next iterate $(Z_{k+1}, E_{k+1}, \Lambda_{k+1})$ is generated via the following scheme:

$$\begin{aligned} E_{k+1} &:= \arg\min_{E\in\mathbb{R}^{m\times n}} \mathcal{L}_A(E, Z_k, \Lambda_k), &&(2.9\text{a})\\ Z_{k+1} &:= \arg\min_{Z\in\mathbb{R}^{n\times n}} \mathcal{L}_A(E_{k+1}, Z, \Lambda_k), &&(2.9\text{b})\\ \Lambda_{k+1} &:= \Lambda_k - \mu(AZ_{k+1} + E_{k+1} - X). &&(2.9\text{c}) \end{aligned}$$

Clearly, when solving (1.2) by the alternating direction scheme, the main computational burden per iteration is to solve the two resulting subproblems (2.9a) and (2.9b). First, we deduce that

$$\begin{aligned} E_{k+1} &:= \arg\min_{E\in\mathbb{R}^{m\times n}} \mathcal{L}_A(E, Z_k, \Lambda_k)\\ &= \arg\min_{E\in\mathbb{R}^{m\times n}} \lambda\|E\|_{2,1} - \langle \Lambda_k, AZ_k + E - X\rangle + \frac{\mu}{2}\|AZ_k + E - X\|_F^2\\ &= \arg\min_{E\in\mathbb{R}^{m\times n}} \lambda\|E\|_{2,1} + \frac{\mu}{2}\|E - (X - AZ_k + \Lambda_k/\mu)\|_F^2\\ &= \mathcal{S}_{\lambda/\mu}(X - AZ_k + \Lambda_k/\mu). \end{aligned}$$

Second, for the given $\Lambda_k$ and the latest $E_{k+1}$, the subproblem (2.9b) with respect to $Z$ is equivalent to

$$\begin{aligned} Z_{k+1} &:= \arg\min_{Z\in\mathbb{R}^{n\times n}} \mathcal{L}_A(E_{k+1}, Z, \Lambda_k)\\ &= \arg\min_{Z\in\mathbb{R}^{n\times n}} \|Z\|_* - \langle \Lambda_k, AZ + E_{k+1} - X\rangle + \frac{\mu}{2}\|AZ + E_{k+1} - X\|_F^2\\ &= \arg\min_{Z\in\mathbb{R}^{n\times n}} \|Z\|_* + \frac{\mu}{2}\|AZ + E_{k+1} - X - \Lambda_k/\mu\|_F^2. \end{aligned} \qquad (2.10)$$

However, since $A$ is a generic linear operator, (2.10) does not share the same structure as (2.5); hence, it does not enjoy a closed-form solution. The exact solution of (2.9b) is therefore expensive, which is the main difficulty in applying the standard alternating direction method directly. Roughly speaking, it is not necessary to solve (2.9b) exactly to high precision in order to guarantee the convergence of the iterative scheme (2.9a)-(2.9c).


Alternatively, we solve an approximate model at each iteration, as long as the overall convergence can be guaranteed. More specifically, let

$$G_k = A^\top(AZ_k + E_{k+1} - X - \Lambda_k/\mu) \qquad (2.11)$$

be the gradient of $\frac{1}{2}\|AZ + E_{k+1} - X - \Lambda_k/\mu\|_F^2$ at the current $Z_k$, and consider the approximation

$$\frac{1}{2}\|AZ + E_{k+1} - X - \Lambda_k/\mu\|_F^2 \approx \frac{1}{2}\|AZ_k + E_{k+1} - X - \Lambda_k/\mu\|_F^2 + \langle G_k, Z - Z_k\rangle + \frac{1}{2\tau}\|Z - Z_k\|_F^2, \qquad (2.12)$$

where $\tau > 0$ is a positive scalar and the last term is the so-called proximal point term in the optimization literature [24]. Hence, instead of (2.10), the next iterate is generated by

$$\begin{aligned} Z_{k+1} &:= \arg\min_{Z\in\mathbb{R}^{n\times n}} \|Z\|_* + \frac{\mu}{2}\|AZ + E_{k+1} - X - \Lambda_k/\mu\|_F^2\\ &\approx \arg\min_{Z\in\mathbb{R}^{n\times n}} \|Z\|_* + \frac{\mu}{2}\|AZ_k + E_{k+1} - X - \Lambda_k/\mu\|_F^2 + \mu\langle G_k, Z - Z_k\rangle + \frac{\mu}{2\tau}\|Z - Z_k\|_F^2\\ &= \arg\min_{Z\in\mathbb{R}^{n\times n}} \|Z\|_* + \mu\langle G_k, Z - Z_k\rangle + \frac{\mu}{2\tau}\|Z - Z_k\|_F^2\\ &= \arg\min_{Z\in\mathbb{R}^{n\times n}} \|Z\|_* + \frac{\mu}{2\tau}\|Z - (Z_k - \tau G_k)\|_F^2\\ &= \mathcal{D}_{\tau/\mu}(Z_k - \tau G_k). \end{aligned} \qquad (2.13)$$

To sum up, our algorithm, named the Primal Splitting and Linearizing Augmented Lagrangian algorithm (PSLAL), is described below.

Algorithm 1 (PSLAL). Input positive constants $\lambda$, $\tau$, and $\mu$. Initialize $Z_0$, $\Lambda_0$, and set $k = 0$.
While "not converged", Do
1) Compute $E_{k+1}$ via $\mathcal{S}_{\lambda/\mu}(X - AZ_k + \Lambda_k/\mu)$;
2) Compute $Z_{k+1}$ via $\mathcal{D}_{\tau/\mu}(Z_k - \tau G_k)$;
3) Update $\Lambda_{k+1}$ via $\Lambda_k - \mu(AZ_{k+1} + E_{k+1} - X)$;
4) Let $k = k + 1$.
End Do

Remark 1 We clearly see that the algorithm is easy to implement, in the sense that both subproblems enjoy closed-form solutions. Meanwhile, only two matrix-matrix multiplications involving $A$ and $A^\top$ are required per iteration. Furthermore, each iteration of the algorithm is also dominated by one partial SVD, which computes the singular values larger than the threshold $\tau/\mu$ for the shrinkage stated in Theorem 2.2.


Now we describe the stopping criterion of PSLAL. First, it is easy to see that the optimality condition of (1.2) is characterized by finding a solution $(E^*, Z^*) \in \mathbb{R}^{m\times n} \times \mathbb{R}^{n\times n}$ and the corresponding Lagrangian multiplier $\Lambda^*$ such that

$$\begin{aligned} \Lambda^* &\in \partial\big(\lambda\|E^*\|_{2,1}\big), &&(2.14\text{a})\\ A^\top\Lambda^* &\in \partial\|Z^*\|_*, &&(2.14\text{b})\\ 0 &= AZ^* + E^* - X. &&(2.14\text{c}) \end{aligned}$$

Second, the iterate $(E_{k+1}, Z_{k+1}, \Lambda_{k+1})$ generated by PSLAL is characterized by

$$\begin{aligned} -\mu(E_{k+1} - X + AZ_k - \Lambda_k/\mu) &\in \partial\big(\lambda\|E_{k+1}\|_{2,1}\big), &&(2.15\text{a})\\ -\frac{\mu}{\tau}(Z_{k+1} - Z_k + \tau G_k) &\in \partial\|Z_{k+1}\|_*, &&(2.15\text{b})\\ \Lambda_{k+1} &= \Lambda_k - \mu(AZ_{k+1} + E_{k+1} - X), &&(2.15\text{c}) \end{aligned}$$

which is equivalent to

$$\begin{aligned} \Lambda_{k+1} + \mu A(Z_{k+1} - Z_k) &\in \partial\big(\lambda\|E_{k+1}\|_{2,1}\big), &&(2.16\text{a})\\ A^\top\Lambda_{k+1} - \frac{\mu}{\tau}\big(I - \tau A^\top A\big)(Z_{k+1} - Z_k) &\in \partial\|Z_{k+1}\|_*, &&(2.16\text{b})\\ \Lambda_{k+1} &= \Lambda_k - \mu(AZ_{k+1} + E_{k+1} - X). &&(2.16\text{c}) \end{aligned}$$

Compared with the optimality conditions (2.14a)-(2.14c), the characterization (2.16a)-(2.16c) immediately indicates that the proposed algorithm should terminate once $\|Z_{k+1} - Z_k\|_F$ and $\|\Lambda_{k+1} - \Lambda_k\|_F$ are sufficiently small; that is, for a given small positive scalar $\varepsilon > 0$, the stopping criterion is

$$\min\Big\{\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F,\ \|\Lambda_{k+1} - \Lambda_k\|_F\Big\} \le \varepsilon.$$
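To make Algorithm 1 concrete, here is a minimal NumPy sketch of PSLAL with a fixed penalty $\mu$ and the stopping test above; it reuses the svd_shrink and col_shrink helpers sketched after Theorems 2.2 and 2.3. The experiments in Section 4 use Matlab and an increasing sequence of $\mu$ values, so this sketch is illustrative only.

import numpy as np

def pslal(X, A, lam, mu, tau=None, tol=1e-6, max_iter=500):
    # Algorithm 1 (PSLAL) with fixed mu, following (2.9a)-(2.9c) and (2.13).
    m, n = X.shape
    if tau is None:
        # Step size below 1/lambda_max(A^T A), as required by Theorem 3.2.
        tau = 0.99 / np.linalg.norm(A, 2) ** 2
    Z = np.zeros((A.shape[1], n))
    Lam = np.zeros((m, n))
    E = np.zeros((m, n))
    for k in range(max_iter):
        # E-subproblem: E_{k+1} = S_{lam/mu}(X - A Z_k + Lam_k / mu).
        E = col_shrink(X - A @ Z + Lam / mu, lam / mu)
        # Linearized Z-subproblem: Z_{k+1} = D_{tau/mu}(Z_k - tau G_k),
        # with G_k = A^T (A Z_k + E_{k+1} - X - Lam_k / mu) from (2.11).
        G = A.T @ (A @ Z + E - X - Lam / mu)
        Z_new = svd_shrink(Z - tau * G, tau / mu)
        # Multiplier update: Lam_{k+1} = Lam_k - mu (A Z_{k+1} + E_{k+1} - X).
        Lam_new = Lam - mu * (A @ Z_new + E - X)
        # Stopping test from Section 2.3.
        stop = min(mu / tau * np.linalg.norm(Z_new - Z, 'fro'),
                   np.linalg.norm(Lam_new - Lam, 'fro')) <= tol
        Z, Lam = Z_new, Lam_new
        if stop:
            break
    return Z, E, Lam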

3 Convergence analysis

This section is devoted to establishing the global convergence of Algorithm PSLAL. Since PSLAL applies a linearization technique to formulation (2.10) in order to derive closed-form solutions, by using the definition of $G_k$ in (2.11) we can write (2.15b) as

$$-\mu A^\top(AZ_{k+1} + E_{k+1} - X - \Lambda_k/\mu) - \mu\Big(\frac{1}{\tau}I - A^\top A\Big)(Z_{k+1} - Z_k) \in \partial\|Z_{k+1}\|_*. \qquad (3.1)$$


On the other hand, the optimality condition of the exact subproblem (2.9b) reads

$$-\mu A^\top(AZ_{k+1} + E_{k+1} - X - \Lambda_k/\mu) \in \partial\|Z_{k+1}\|_*. \qquad (3.2)$$

By comparing (3.1) with (3.2), we clearly see that our linearization technique actually adds a proximal point term $\mu(\frac{1}{\tau}I - A^\top A)(Z_{k+1} - Z_k)$ to the exact model (2.9b) at each iteration.

Lemma 3.1 Let $(E_{k+1}, Z_{k+1}, \Lambda_{k+1})$ denote the triplet generated by PSLAL from a given $(E_k, Z_k, \Lambda_k)$. Then it holds that

$$\frac{\mu}{\tau}\langle Z_{k+1} - Z^*, Z_{k+1} - Z_k\rangle + \frac{1}{\mu}\langle \Lambda_{k+1} - \Lambda_k, \Lambda_{k+1} - \Lambda^*\rangle + \langle \Lambda_k - \Lambda_{k+1}, A(Z_k - Z_{k+1})\rangle \le 0. \qquad (3.3)$$

Proof The optimality condition (2.14a) gives $\Lambda^* \in \partial(\lambda\|E^*\|_{2,1})$, and (2.16a) shows

$$\Lambda_{k+1} + \mu A(Z_{k+1} - Z_k) \in \partial\big(\lambda\|E_{k+1}\|_{2,1}\big).$$

By the convexity of $\|\cdot\|_{2,1}$, we have

$$\big\langle E_{k+1} - E^*,\ (\Lambda_{k+1} - \Lambda^*) + \mu A(Z_{k+1} - Z_k)\big\rangle \ge 0. \qquad (3.4)$$

Moreover, it follows from $AZ^* + E^* = X$ and $AZ_{k+1} + E_{k+1} - X = (\Lambda_k - \Lambda_{k+1})/\mu$ that

$$E_{k+1} - E^* = \frac{\Lambda_k - \Lambda_{k+1}}{\mu} - (AZ_{k+1} - AZ^*).$$

Therefore, (3.4) can be rewritten as

$$\big\langle (\Lambda_k - \Lambda_{k+1})/\mu - (AZ_{k+1} - AZ^*),\ (\Lambda_{k+1} - \Lambda^*) + \mu A(Z_{k+1} - Z_k)\big\rangle \ge 0. \qquad (3.5)$$

On the other hand, (2.14b) shows $A^\top\Lambda^* \in \partial\|Z^*\|_*$. Furthermore, by the convexity of $\|\cdot\|_*$ and (2.16b), it holds that

$$\Big\langle Z_{k+1} - Z^*,\ A^\top(\Lambda_{k+1} - \Lambda^*) - \mu\Big(\frac{1}{\tau}I - A^\top A\Big)(Z_{k+1} - Z_k)\Big\rangle \ge 0,$$

or, equivalently,

$$\frac{\mu}{\tau}\big\langle Z_{k+1} - Z^*, Z_k - Z_{k+1}\big\rangle + \big\langle A(Z_{k+1} - Z^*),\ (\Lambda_{k+1} - \Lambda^*) + \mu A(Z_{k+1} - Z_k)\big\rangle \ge 0. \qquad (3.6)$$

Adding (3.5) and (3.6) yields the desired result (3.3). $\square$

The following theorem indicates that the sequence $\{(Z_k, \Lambda_k)\}$ generated by the proposed PSLAL is Fejér monotone, which measures how close the current iterate is to the solution set.


Theorem 3.2 Let the sequence $\{(E_k, Z_k, \Lambda_k)\}$ be generated by Algorithm PSLAL, and let $0 < \tau < 1/\lambda_{\max}(A^\top A)$, where $\lambda_{\max}(A^\top A)$ denotes the largest eigenvalue of $A^\top A$. Then there exists a positive scalar $\nu$ such that

$$\frac{\mu}{\tau}\|Z_{k+1} - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda^*\|_F^2 \le \frac{\mu}{\tau}\|Z_k - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_k - \Lambda^*\|_F^2 - \nu\Big(\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2\Big). \qquad (3.7)$$

Proof Using the identity

$$\|a + b\|_2^2 = \|a\|_2^2 - \|b\|_2^2 + 2(a + b)^\top b,$$

and the facts $Z_{k+1} - Z^* = (Z_k - Z^*) + (Z_{k+1} - Z_k)$ and $\Lambda_{k+1} - \Lambda^* = (\Lambda_k - \Lambda^*) + (\Lambda_{k+1} - \Lambda_k)$, it follows that

$$\frac{\mu}{\tau}\|Z_{k+1} - Z^*\|_F^2 = \frac{\mu}{\tau}\|Z_k - Z^*\|_F^2 - \frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{2\mu}{\tau}\langle Z_{k+1} - Z^*, Z_{k+1} - Z_k\rangle \qquad (3.8)$$

and

$$\frac{1}{\mu}\|\Lambda_{k+1} - \Lambda^*\|_F^2 = \frac{1}{\mu}\|\Lambda_k - \Lambda^*\|_F^2 - \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2 + \frac{2}{\mu}\langle \Lambda_{k+1} - \Lambda^*, \Lambda_{k+1} - \Lambda_k\rangle. \qquad (3.9)$$

Adding (3.8) and (3.9), we get

$$\begin{aligned} &\frac{\mu}{\tau}\|Z_{k+1} - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda^*\|_F^2\\ &\quad= \frac{\mu}{\tau}\|Z_k - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_k - \Lambda^*\|_F^2 - \frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 - \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2\\ &\qquad+ 2\Big[\frac{\mu}{\tau}\langle Z_{k+1} - Z^*, Z_{k+1} - Z_k\rangle + \frac{1}{\mu}\langle \Lambda_{k+1} - \Lambda^*, \Lambda_{k+1} - \Lambda_k\rangle\Big]\\ &\quad\le \frac{\mu}{\tau}\|Z_k - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_k - \Lambda^*\|_F^2 - \Big(\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2\Big)\\ &\qquad- 2\langle \Lambda_k - \Lambda_{k+1}, A(Z_k - Z_{k+1})\rangle, \end{aligned} \qquad (3.10)$$


where the inequality follows from Lemma 3.1. Let $\delta := 1 - \tau\lambda_{\max}(A^\top A) > 0$ and define $\eta := 1/(\mu(1+\delta))$. Then, from the Cauchy-Schwarz inequality $2a^\top b \ge -\eta\|a\|^2 - \|b\|^2/\eta$, it follows that

$$\begin{aligned} &\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2 + 2\langle \Lambda_k - \Lambda_{k+1}, A(Z_k - Z_{k+1})\rangle\\ &\quad\ge \frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \Big(\frac{1}{\mu} - \eta\Big)\|\Lambda_k - \Lambda_{k+1}\|_F^2 - \frac{1}{\eta}\|A(Z_k - Z_{k+1})\|_F^2\\ &\quad\ge \Big(\frac{\mu}{\tau} - \frac{\lambda_{\max}(A^\top A)}{\eta}\Big)\|Z_{k+1} - Z_k\|_F^2 + \Big(\frac{1}{\mu} - \eta\Big)\|\Lambda_{k+1} - \Lambda_k\|_F^2\\ &\quad= \frac{\mu\delta^2}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{\delta}{\mu(1+\delta)}\|\Lambda_{k+1} - \Lambda_k\|_F^2\\ &\quad\ge \nu\Big(\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2\Big), \end{aligned} \qquad (3.11)$$

where $\nu = \min\{\delta^2, \frac{\delta}{1+\delta}\}$. Substituting (3.11) into (3.10), we get the assertion of the theorem. $\square$

The above theorem immediately indicates that $\{\frac{\mu}{\tau}\|Z_k - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_k - \Lambda^*\|_F^2\}$ is a monotonically non-increasing sequence, and thus $\{(Z_k, \Lambda_k)\}$ converges. Furthermore, the theorem also yields the following corollary.

Corollary 3.3 Let the sequence $\{(E_k, Z_k, \Lambda_k)\}$ be generated by PSLAL. Then we have:

1) $\{Z_k\}$ and $\{\Lambda_k\}$ are bounded;
2) $\lim_{k\to\infty}\|Z_{k+1} - Z_k\|_F = 0$ and $\lim_{k\to\infty}\|\Lambda_{k+1} - \Lambda_k\|_F = 0$.

Proof Let $(Z^*, E^*)$ be a solution of (1.2) and let $\Lambda^*$ be the corresponding Lagrangian multiplier. First, from (3.7), we have

$$\frac{\mu}{\tau}\|Z_k - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_k - \Lambda^*\|_F^2 \le \frac{\mu}{\tau}\|Z_0 - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_0 - \Lambda^*\|_F^2,$$

which shows that $\{Z_k\}$ and $\{\Lambda_k\}$ are bounded. Also from (3.7), we obtain

$$\nu\sum_{k=0}^{\infty}\Big(\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2\Big) \le \frac{\mu}{\tau}\|Z_0 - Z^*\|_F^2 + \frac{1}{\mu}\|\Lambda_0 - \Lambda^*\|_F^2 < \infty,$$

and it follows that

$$\lim_{k\to\infty}\Big(\frac{\mu}{\tau}\|Z_{k+1} - Z_k\|_F^2 + \frac{1}{\mu}\|\Lambda_{k+1} - \Lambda_k\|_F^2\Big) = 0,$$

which is the second statement of this corollary. $\square$

We are ready to prove the convergence of the proposed algorithm.


Theorem 3.4 For fixed $\mu > 0$ and $0 < \tau < 1/\lambda_{\max}(A^\top A)$, the sequence $\{(E_k, Z_k, \Lambda_k)\}$ generated by PSLAL from any starting point converges to $(\bar{E}, \bar{Z}, \bar{\Lambda})$, where $(\bar{E}, \bar{Z})$ is an optimal solution of (1.2).

Proof It follows from Corollary 3.3 that there exists an index set $\{k_j\}$ such that $Z_{k_j} \to \bar{Z}$ and $\Lambda_{k_j} \to \bar{\Lambda}$. Additionally, since

$$E_k = \mathcal{S}_{\lambda/\mu}\big(X - AZ_k + \Lambda_k/\mu + A(Z_k - Z_{k-1}) + (\Lambda_{k-1} - \Lambda_k)/\mu\big),$$

and $Z_k - Z_{k-1} \to 0$ and $\Lambda_k - \Lambda_{k-1} \to 0$, it follows that

$$E_{k_j} \to \bar{E} := \mathcal{S}_{\lambda/\mu}(X - A\bar{Z} + \bar{\Lambda}/\mu).$$

Now we verify that $(\bar{E}, \bar{Z}, \bar{\Lambda})$ satisfies the optimality conditions (2.14a)-(2.14c). It follows from (2.9c) that

$$\Lambda_{k+1} = \Lambda_k - \mu\big(AZ_{k+1} + \mathcal{S}_{\lambda/\mu}(X - AZ_k + \Lambda_k/\mu) - X\big),$$

or, equivalently,

$$\frac{\Lambda_k - \Lambda_{k+1}}{\mu} + A(Z_k - Z_{k+1}) = AZ_k + \mathcal{S}_{\lambda/\mu}(X - AZ_k + \Lambda_k/\mu) - X.$$

Taking limits on both sides yields

$$A\bar{Z} + \bar{E} - X = 0. \qquad (3.12)$$

Using $Z_{k+1} - Z_k \to 0$, $\Lambda_{k_j+1} = \Lambda_{k_j} + (\Lambda_{k_j+1} - \Lambda_{k_j}) \to \bar{\Lambda}$, $E_{k_j+1} = E_{k_j} + (E_{k_j+1} - E_{k_j}) \to \bar{E}$, the identity (3.12), and taking the limit of (2.16a) along $\{k_j\}$, we can derive that

$$\bar{\Lambda} \in \partial\big(\lambda\|\bar{E}\|_{2,1}\big). \qquad (3.13)$$

In a similar way, by taking the limit of (2.15b) along $\{k_j\}$, we also obtain

$$A^\top\bar{\Lambda} \in \partial\|\bar{Z}\|_*. \qquad (3.14)$$

Therefore, it follows from (3.12), (3.13), and (3.14) that the cluster point $(\bar{E}, \bar{Z}, \bar{\Lambda})$ satisfies the optimality conditions (2.14a)-(2.14c). $\square$

4 Numerical experiments

The purpose of this section is to demonstrate the feasibility and efficiency of the proposed algorithm. To this end, we perform two types of experiments, on simulated data and on a real data set. In the first test, we also compare our algorithm with the state-of-the-art algorithm LRR, using the Matlab code provided by its authors. All experiments are performed under Windows XP Premium and Matlab v7.8 (2009a), running on a Lenovo laptop with an Intel Atom CPU at 1.6 GHz and 1 GB of memory.


4.1 Test on simulated data

In the first test, we verify the efficiency and stability of the proposed algorithm on simulated data. Following the setting in [20], five independent subspaces $\{\mathcal{S}_i\}_{i=1}^5 \subset \mathbb{R}^{100}$ are constructed, whose bases $\{U_i\}_{i=1}^5$ are generated by $U_{i+1} = TU_i$, $1 \le i \le 4$, where $T$ is a random rotation and $U_1$ is a random column-orthogonal matrix of size $100 \times 4$. Hence, each subspace has rank 4 and the data have an ambient dimension of 100. From each subspace, 40 data vectors are sampled as $X_i^0 = U_iQ_i$ ($1 \le i \le 5$), where $Q_i$ is a $4 \times 40$ matrix with independent and identically distributed $\mathcal{N}(0,1)$ entries. In summary, each sample block satisfies $X_i^0 \in \mathbb{R}^{100\times 40}$, and the whole data matrix is $X^0 = [X_1^0, \ldots, X_5^0] \in \mathbb{R}^{100\times 200}$ with rank $r = 20$. In this experiment, we consider the case where a fraction (say Fr) of the data vectors is grossly corrupted by large noise while the others are kept noiseless. If the $i$-th column $[X]_{:,i}$ is chosen to be corrupted, its components are corrupted by additive Gaussian noise with zero mean and standard deviation $0.2\,\|[X^0]_{:,i}\|_2$. Hence, we have $X = X^0 + E^0$, where the entries of $[E^0]_{:,i}$ are drawn with standard deviation $0.2\,\|[X^0]_{:,i}\|_2$ if the $i$-th column is chosen to be corrupted.
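A NumPy sketch of this data construction, under our reading of the setting (the random rotation is produced via a QR factorization; the exact generation in [20] and in the Matlab experiments may differ in details), is given below. The resulting X can then be fed, with A = X and λ = 0.1, to the PSLAL iteration sketched in Section 2.3.

import numpy as np

def make_synthetic(fr=0.3, seed=0, n_sub=5, ambient=100, sub_dim=4, per_sub=40):
    rng = np.random.default_rng(seed)
    # Basis of the first subspace: 100 x 4 with orthonormal columns.
    U, _ = np.linalg.qr(rng.standard_normal((ambient, sub_dim)))
    # Random rotation T generating the remaining bases U_{i+1} = T U_i.
    T, _ = np.linalg.qr(rng.standard_normal((ambient, ambient)))
    bases = [U]
    for _ in range(n_sub - 1):
        bases.append(T @ bases[-1])
    # 40 samples per subspace with i.i.d. N(0,1) coefficients: X0 is 100 x 200.
    X0 = np.concatenate([Ui @ rng.standard_normal((sub_dim, per_sub))
                         for Ui in bases], axis=1)
    # Corrupt a fraction fr of the columns with column-norm-scaled Gaussian noise.
    E0 = np.zeros_like(X0)
    corrupted = rng.choice(X0.shape[1], size=int(fr * X0.shape[1]), replace=False)
    for i in corrupted:
        E0[:, i] = 0.2 * np.linalg.norm(X0[:, i]) * rng.standard_normal(ambient)
    return X0 + E0, X0, E0

X, X0, E0 = make_synthetic(fr=0.3)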

Given the noise-contaminated observations $X$ and a specially designed dictionary $A$, our goal is to compute a block-diagonal affinity matrix $Z^*$ and to recover the low-rank data matrix $X^* = AZ^*$ using the algorithms PSLAL and LRR. In this test, we take the data matrix itself as the dictionary, i.e., $A = X$. We set the weighting parameter as $\lambda = 0.1$, which we found to yield good solutions, and choose $\tau = 1/(2\lambda_{\max}(A^\top A))$ to ensure convergence. The penalty parameter $\mu$ is taken as a nondecreasing sequence $10^{-6} \le \mu_i \le 10^{10}$ with $\mu_{i+1} = 1.1\mu_i$. In all the tests reported below, we use the zero matrix as the starting point for both algorithms. As described in Section 2, we simply terminate the iterative process when the changes in two consecutive iterations are sufficiently small, i.e.,

$$\min\{\|Z_{k+1} - Z_k\|_1,\ \|\Lambda_{k+1} - \Lambda_k\|_1\} \le \epsilon, \qquad (4.1)$$

where $\epsilon > 0$ is a tolerance, chosen as $10^{-6}$ in all the following tests. We also force the iterative process to stop if the number of iterations exceeds 500 without achieving (4.1). The quality of the restoration $X^*$ is measured by the recovery error $\text{RecErr} = \|X^* - X^0\|_1$, where $X^0$ is the original noiseless data matrix. Besides, the quality of the affinity matrix $Z^*$ is measured by the so-called segmentation accuracy, which is the percentage of correctly classified data vectors (see [20] for more details).

First, we visually illustrate the behavior of the proposed algorithm. In this simple test, we choose $\text{Fr} = 0.3$, which means that 30% of the data vectors are grossly corrupted. For LRR, we use its Matlab package, available at http://sites.google.com/site/guangcanliu/, and set all parameters to their default values except for $\lambda = 0.1$, which extensive preliminary experiments identified as the best choice. The original noiseless data, the noisy observations, and the data recovered and noise removed by PSLAL are


presented in Fig. 1. Moreover, the affinity matrices produced by PSLAL and LRR are shown in Fig. 2.

Comparing the top left plot with the bottom left one in Fig. 1, we observe that the recovered data $X^*$ has five separate parts like the original data $X^0$, which roughly verifies that the proposed algorithm can handle data drawn from a mixture of multiple subspaces. Turning our attention to the affinity matrices produced by LRR and PSLAL, shown in Fig. 2, it is obvious that both affinity matrices are block-diagonal, which further shows that both algorithms can reveal the segmentation of the data. However, this simple test also tells us that the proposed algorithm requires fewer iterations and less computing time to derive results of comparable quality. Altogether, this simple experiment shows that PSLAL performs quite well and is superior to the well-known algorithm LRR in recovering subspace structures from corrupted observations and removing noise.

Although the above limited test illustrates that our algorithm is very promising, it is still not enough to conclude that PSLAL is the winner, because the problem settings may influence the algorithms' performance. To further verify the algorithm's benefits, we now compare the performance of the two algorithms in solving (1.2) at different noise levels. We construct the test data and set all parameter values in the same way as in the previous test. To illustrate the performance of each algorithm, Fig. 3 presents four comparisons, in terms of segmentation accuracy, recovery error, number of iterations, and running time, as Fr varies from 0 to 1.

Fig. 1 Top left: the original noiseless data matrix of size 100 × 200 with 5 subspaces; top right: the noisy observations with 30% of the vectors grossly corrupted; bottom left: the data matrix recovered by PSLAL; bottom right: the noise removed by PSLAL. In this figure, the equality X = X* + E* holds


Fig. 2 Comparison of the affinity matrices. Left: the affinity matrix produced by LRR with 286 iterations and 31.8 seconds of CPU time; right: the affinity matrix produced by PSLAL with 210 iterations and 18.8 seconds of CPU time. In this test, the segmentation accuracy is 100%

As shown in the top row of Fig. 3, both algorithms exhibit good stability, in the sense that they attain the desired results with roughly the same accuracy over different noise levels. Moreover, as the number of corrupted vectors increases, the recovery error becomes larger and the segmentation accuracy becomes lower. It is also easy to observe that, although LRR stably derives comparable solutions, it consumes substantially more iterations as the noise fraction increases. Moreover, from the bottom right plot, we can see that the speed advantage of PSLAL over LRR is significant. PSLAL not only requires fewer iterations to obtain the same accuracy, but is also less computationally expensive at each iteration. Therefore, PSLAL is much faster in terms of CPU time, especially for large-scale cases.

4.2 Test on real data sets

In this subsection, we illustrate PSLAL's practical ability to handle real data sets containing large noise, using the widely used Extended Yale Face Database B. The data set is available at http://vision.ucsd.edu/∼leekc/ExtYaleDatabase/ExtYaleB.html and consists of the frontal face images of 28 human subjects under 9 poses and 64 illumination conditions. The data set partitions these images into 38 classes, each containing 64 face images of size 192 × 168. In this test, we consider only the first 5 classes; in other words, 320 images are used for the experiments. Due to memory limitations, we resize the test images to 48 × 42 and re-scale the pixels into [0, 1]. Altogether, the noisy observations are drawn from 5 independent subspaces, each of dimension 2016 with 64 data vectors.


Fig. 3 Comparison of LRR and PSLAL in terms of the final segmentation accuracy (top left), the recovery error (top right), the number of iterations (bottom left), and the CPU time (bottom right). In each plot, the x-axis represents the fraction of grossly corrupted vectors

For ease of comprehension of the data set used in this test, we display the 64 images of the second class in Fig. 4. As shown in this figure, almost half of the data vectors are corrupted by shadows.

The test aims to segment the noisy data matrix into its individual subspaces and to correct the noise (shadows) as well. In PSLAL, we take the data matrix itself as the dictionary $A$. Unlike the settings of the previous experiments, in this test the weighting parameter is chosen as $\lambda = 10^3$ and the penalty parameter $\mu$

Fig. 4 Frontal face images in the second class of the Extended Yale Database B


Fig. 5 Some examples of using PSLAL to correct the corruptions in face images: the original data (first and fourth columns), the corrected data (second and fifth columns), and the errors (third and last columns). In this test, the number of iterations is 200, the computing time is 240.6 seconds, and the final segmentation accuracy is 85%

is taken as a nondecreasing sequence $10^{-2} \le \mu_i \le 10^{2}$ with $\mu_{i+1} = 1.1\mu_i$. Moreover, the iterative process starts from the zero matrices and terminates when (4.1) with $\epsilon = 10^{-5}$ is met. The algorithm is also stopped when the number of iterations exceeds 200. Some examples of the original corrupted data, the data corrected by PSLAL, and the removed errors are presented in Fig. 5.

Although the face images in the Extended Yale Database B are arbitrarily corrupted by large noise, the experiments illustrate that our proposed algorithm successfully reveals the subspace structures and corrects the possible noise, attaining a relatively high segmentation accuracy of 85%. In conclusion, this second set of tests further illustrates that the proposed method has the ability to recover the subspace structures and to remove the possible noise simultaneously.

5 Conclusions

Assuming that given high-dimensional corrupted data lie near a much lower dimensional subspace, it was recently shown that RPCA can efficiently and accurately estimate this low-dimensional subspace and simultaneously correct the contaminated errors. However, when the data are drawn from a union of multiple subspaces, RPCA fails, because it actually treats the data as lying in a single subspace. The alternative subspace segmentation approach [8] works successfully in revealing the respective structures, but it only handles the case


when the data are noiseless. The very innovative technique LRR can recover the subspaces individually and correct possible noise as well by minimizing a combination of the nuclear norm and the $\ell_{2,1}$-mixed norm. LRR minimizes an equivalent model obtained by adding a new variable and a new constraint, and thus divides the resulting minimization task into three separable subproblems, one for each variable. However, at each iteration, LRR involves a matrix inverse, which prevents it from scaling well to very large problems. Despite its surprising efficiency, it is still not clear whether LRR converges globally without additional conditions. Hence, an efficient algorithm that directly solves the original problem without any equivalent transformation is highly desirable.

In this paper, we proposed, analyzed, and tested an augmented Lagrangian alternating direction algorithm for solving problem (1.2). Since we linearized the subproblem in order to easily derive closed-form solutions, we named the proposed algorithm the Primal Splitting and Linearizing Augmented Lagrangian algorithm (abbreviated PSLAL). At each iteration, the resulting subproblems of PSLAL with respect to each variable can be solved explicitly by using the shrinkage operators defined in Theorems 2.2 and 2.3. Moreover, PSLAL is easy to implement, since only two matrix-matrix multiplications involving $A$ and $A^\top$ are required at each step. Global convergence of the algorithm is established under mild assumptions. Extensive numerical results on simulated data and the Extended Yale Database B showed that PSLAL performs promisingly and is nearly twice as fast as the well-known solver LRR.

The augmented Lagrangian alternating direction algorithm has recently been well studied and verified as a powerful algorithmic tool for solving structured convex minimization problems arising in many applications. This paper mainly emphasized the applicability of this type of algorithm to recovering subspace structures and correcting possible noise in corrupted data, and thus it further broadens its range of application. Although the motivation is simple, the numerical experiments illustrate that our approach performs better than LRR; this is the main numerical contribution of our paper. Furthermore, the global convergence of the proposed algorithm is guaranteed under some technical assumptions, which theoretically reinforces the pioneering work LRR on this topic. This is another contribution of the paper.

Acknowledgements The first author would like to thank Prof. Bingsheng He, Dr. Junfeng Yang, and Min Tao of Nanjing University for their helpful suggestions. The work of Yunhai Xiao is supported in part by the Natural Science Foundation of China grant NSFC-11001075. The work of Dong-Hui Li is supported in part by NSFC-11071087.

References

1. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and distributed computation: numerical methods. Prentice-Hall, Englewood Cliffs, NJ (1989)
2. Cai, J.F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
3. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)
4. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58, 1–37 (2011)
5. Chen, C., He, B.S., Yuan, X.: Matrix completion via alternating direction method. IMA J. Numer. Anal. doi:10.1093/imanum/drq039
6. Deng, W., Yin, W., Zhang, Y.: Group sparse optimization by alternating direction method. Available at http://www.optimization-online.org/DB_HTML/2011/04/3006.html
7. Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10, 2899–2934 (2009)
8. Elhamifar, E., Vidal, R.: Sparse subspace clustering. In: IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2970–2997 (2009)
9. Esser, E.: Applications of Lagrangian-based alternating direction methods and connections to split Bregman. TR. 09-31, CAM, UCLA. Available at ftp://ftp.math.ucla.edu/pub/camreport/cam09-31.pdf (2009)
10. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)
11. Glowinski, R., Marrocco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet nonlinéaires. Revue Française d'automatique, informatique, recherche opérationnelle. Analyse numérique 2, 41–76 (1975)
12. Goldstein, T., Osher, S.: The split Bregman method for ℓ1-regularized problems. SIAM J. Imag. Sci. 2, 323–343 (2009)
13. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for ℓ1-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
14. He, B.S., Tao, M., Yuan, X.M.: A splitting method for separate convex programming with linking linear constraints. Available at http://www.optimization-online.org/DB_HTML/2010/06/2665.html
15. He, B.S., Tao, M., Xu, M.H., Yuan, X.M.: Alternating directions based contraction method for generally separable linearly constrained convex programming problems. Available at http://www.optimization-online.org/DB_HTML/2009/11/2465.html
16. He, B.S., Tao, M., Yuan, X.M.: A splitting method for separate convex programming with linking linear constraints. Available at http://www.optimization-online.org/DB_HTML/2010/06/2665.html
17. He, B.S., Xu, M.H., Yuan, X.M.: Solving large-scale least squares semidefinite programming by alternating direction methods. SIAM J. Matrix Anal. Appl. 32, 136–152 (2011)
18. Lin, Z., Chen, M., Wu, L., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Math. Program. Available at http://arxiv.org/abs/1009.5055
19. Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast convex optimization algorithm for exact recovery of a corrupted low-rank matrix. Available at http://yima.csl.illinois.edu/psfile/rpca_algorithms.pdf
20. Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: Robust recovery of subspace structures by low-rank representation. Available at http://arxiv.org/abs/1010.2955
21. Ma, S., Goldfarb, D., Chen, L.: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. doi:10.1007/s10107-009-0306-5
22. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Sov. Math. Dokl. 27, 372–376 (1983)
23. Rao, S., Tron, R., Vidal, R., Ma, Y.: Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. IEEE Trans. Pattern Anal. 32, 1832–1845 (2010)
24. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
25. Steidl, G., Teuber, T.: Removing multiplicative noise by Douglas-Rachford splitting methods. J. Math. Imag. Vis. 36, 168–184 (2010)
26. Sturm, J.F.: Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optim. Method. Softw. 11–12, 625–653 (1999)
27. Sun, J., Zhang, S.: A modified alternating direction method for convex quadratically constrained quadratic semidefinite programs. Eur. J. Oper. Res. 207, 1210–1220 (2010)
28. Tao, M., Yuan, X.M.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)
29. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. SIAM J. Optim. (2008, submitted)
30. Tütüncü, R.H., Toh, K.C., Todd, M.J.: Solving semidefinite-quadratic-linear programs using SDPT3. Math. Program. 95, 189–217 (2003)
31. Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM J. Imag. Sci. 1, 248–272 (2008)
32. Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Prog. Comp. 2, 203–230 (2010)
33. Wright, J., Ma, Y., Ganesh, A., Rao, S.: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Proceedings of the Conference on Neural Information Processing Systems (NIPS) (2009)
34. Xiao, Y., Jin, Z.F.: An alternative direction method for linear constrained matrix nuclear norm minimization. Numer. Linear Algebra Appl. doi:10.1002/nla.783
35. Xiao, Y., Song, H.N.: An inexact alternating directions algorithm for constrained total variation regularized compressive sensing problems. J. Math. Imag. Vis. doi:10.1007/s10851-011-0314-y
36. Xiao, Y., Yang, J., Yuan, X.M.: Fast algorithms for total variation image reconstruction from random projections. Available at http://arxiv.org/abs/1001.1774v1
37. Yang, J., Yuan, X.M.: Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization (submitted)
38. Yang, J., Zhang, Y.: Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM J. Sci. Comput. 33, 250–278 (2011)
39. Yuan, X.M., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction methods. Pac. J. Optim. (2011, to appear)