
arXiv:1310.4261v3 [cs.IT] 19 Jun 2014

An Online Algorithm for Separating Sparse and Low-dimensional Signal Sequences from their Sum

Han Guo, Chenlu Qiu, Namrata Vaswani, Dept of Electrical and Computer Engineering, Iowa State University, Ames, IA

{hanguo,chenlu,namrata}@iastate.edu

Abstract—This paper designs and extensively evaluates an online algorithm, called practical recursive projected compressive sensing (Prac-ReProCS), for recovering a time sequence of sparse vectors St and a time sequence of dense vectors Lt from their sum, Mt := St + Lt, when the Lt's lie in a slowly changing low-dimensional subspace of the full space. A key application where this problem occurs is in real-time video layering where the goal is to separate a video sequence into a slowly changing background sequence and a sparse foreground sequence that consists of one or more moving regions/objects on-the-fly. Prac-ReProCS is a practical modification of its theoretical counterpart which was analyzed in our recent work. Experimental comparisons demonstrating the advantage of the approach for both simulated and real videos, over existing batch and recursive methods, are shown. The extension to the undersampled case is also developed.

I. INTRODUCTION

This paper designs and evaluates a practical algorithm for recovering a time sequence of sparse vectors St and a time sequence of dense vectors Lt from their sum, Mt := St + Lt, when the Lt's lie in a slowly changing low-dimensional subspace of Rn. The magnitude of the entries of Lt could be larger, roughly equal or smaller than that of the nonzero entries of St. The extension to the undersampled case, Mt := ASt + BLt, is also developed. The above problem can be interpreted as one of online/recursive sparse recovery from potentially large but structured noise. In this case, St is the quantity of interest and Lt is the potentially large but structured low-dimensional noise. Alternatively, it can be posed as a recursive / online robust principal components analysis (PCA) problem. In this case Lt, or in fact the subspace in which it lies, is the quantity of interest while St is the outlier.

A key application where the above problem occurs is in video layering where the goal is to separate a slowly changing background from moving foreground objects/regions [4], [5]. The foreground layer, e.g. moving people/objects, is of interest in applications such as automatic video surveillance, tracking moving objects, or video conferencing. The background sequence is of interest in applications such as background editing (video editing applications). In most static camera videos, the background images do not change much over time and hence the background image sequence is well modeled as lying in a fixed or slowly-changing low-dimensional subspace of Rn [6], [5]. Moreover, the changes are typically global, e.g. due to lighting variations, and hence modeling it as a dense image sequence is valid too [5]. The foreground layer usually consists of one or more moving objects/persons/regions that move in a correlated fashion, i.e. it is a sparse image sequence that often changes in a correlated fashion over time. Other applications where the above problem occurs include solving the video layering problem from compressive video measurements, e.g. those acquired using a single-pixel camera; online detection of brain activation patterns from full or undersampled functional MRI (fMRI) sequences (the "active" part of the brain forms the sparse image, while the rest of the brain, which does not change much over time, forms the low-dimensional part); or sensor networks based detection and tracking of abnormal events such as forest fires or oil spills. The single pixel imaging and undersampled fMRI applications are examples of the compressive case, Mt = ASt + BLt with B = A.

A part of this work was presented at Allerton 2010, Allerton 2011 and ICASSP 2014 [1], [2], [3]. This research was partially supported by NSF grants CCF-0917015, CCF-1117125 and IIS-1117509.

Related Work. Most high dimensional data often approximately lie in a lower dimensional subspace. Principal components' analysis (PCA) is a widely used dimension reduction technique that finds a small number of orthogonal basis vectors (principal components), along which most of the variability of the dataset lies. For a given dimension, r, PCA finds the r-dimensional subspace that minimizes the mean squared error between data vectors and their projections into this subspace [7]. It is well known that PCA is very sensitive to outliers. Computing the PCs in the presence of outliers is called robust PCA. Solving the robust PCA problem recursively as more data comes in is referred to as online or recursive robust PCA. "Outlier" is a loosely defined term that usually refers to any corruption that is not small compared to the true signal (or data vector) and that occurs only occasionally. As suggested in [8], an outlier can be nicely modeled as a sparse vector.

In the last few decades, there has been a large amount of work on robust PCA, e.g. [4], [9], [10], [11], [12], and recursive robust PCA, e.g. [13], [14], [15]. In most of these works, either the locations of the missing/corrupted data points are assumed known [13] (not a practical assumption); or they first detect the corrupted data points and then replace their values using nearby values [14]; or weight each data point in proportion to its reliability (thus soft-detecting and down-weighting the likely outliers) [4], [15]; or just remove the entire outlier vector [11], [12]. Detecting or soft-detecting outliers (St) as in [14], [4], [15] is easy when the outlier magnitude is large, but not when it is of the same order or smaller than that of the Lt's.

In a series of recent works [5], [16], a new and elegant


solution to robust PCA called Principal Components' Pursuit (PCP) has been proposed, that does not require a two step outlier location detection/correction process and also does not throw out the entire vector. It redefines batch robust PCA as a problem of separating a low rank matrix, Lt := [L1, . . . , Lt], from a sparse matrix, St := [S1, . . . , St], using the measurement matrix, Mt := [M1, . . . , Mt] = Lt + St. Other recent works that also study batch algorithms for recovering a sparse St and a low-rank Lt from Mt := Lt + St or from undersampled measurements include [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. It was shown in [5] that by solving PCP:

min_{L,S} ‖L‖∗ + λ‖S‖1 subject to L + S = Mt    (1)

one can recover Lt and St exactly, provided that (a) Lt is "dense"; (b) any element of the matrix St is nonzero w.p. ρ, and zero w.p. 1 − ρ, independent of all others (in particular, this means that the support sets of the different St's are independent over time); and (c) the rank of Lt and the support size of St are small enough. Here ‖A‖∗ is the nuclear norm of a matrix A (sum of singular values of A) while ‖A‖1 is the ℓ1 norm of A seen as a long vector.
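To make (1) concrete, here is a minimal sketch of the PCP program in Python using the CVXPY modeling package. This is our own illustration, not the authors' code; the default λ = 1/√max(n, t) follows the standard choice suggested in [5].

```python
# Minimal sketch of PCP (1): separate M into low-rank L and sparse S.
import numpy as np
import cvxpy as cp

def pcp(M, lam=None):
    n, t = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, t))     # default choice from [5]
    L = cp.Variable((n, t))
    S = cp.Variable((n, t))
    # nuclear norm of L plus entrywise l1 norm of S
    objective = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.abs(S)))
    cp.Problem(objective, [L + S == M]).solve()
    return L.value, S.value
```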

Notice that most applications described above require an online solution. A batch solution would need a long delay; and would also be much slower and more memory-intensive than a recursive solution. Moreover, the assumption that the foreground support is independent over time is not usually valid. To address these issues, in the conference versions of this work [1], [2], we introduced a novel recursive solution called Recursive Projected Compressive Sensing (ReProCS). In recent work [27], [28], [29], we have obtained performance guarantees for ReProCS. Under mild assumptions (denseness, slow enough subspace change of Lt and "some" support change at least every h frames of St), we showed that, with high probability (w.h.p.), ReProCS can exactly recover the support set of St at all times; and the reconstruction errors of both St and Lt are upper bounded by a time invariant and small value. The work of [27], [28] contains a partial result while [29] is a complete correctness result.

Contributions. The contributions of this work are as follows. (1) We design a practically usable modification of the ReProCS algorithm studied in [27], [28], [29]. By "practically usable", we mean that (a) it requires fewer parameters and we develop simple heuristics to set these parameters without any model knowledge; (b) it exploits practically motivated assumptions and we demonstrate that these assumptions are valid for real video data. While denseness and gradual support change are also used in earlier works - [5], [16] and [30] respectively - slow subspace change is the key new (and valid) assumption introduced in ReProCS. (2) We show via extensive simulation and real video experiments that practical-ReProCS is more robust to correlated support change of St than PCP and other existing work. It is also able to recover small magnitude sparse vectors significantly better than other existing recursive as well as batch algorithms. (3) We also develop a compressive practical-ReProCS algorithm that can recover St from Mt := ASt + BLt. In this case A and B can be fat, square or tall.

More Related Work. Other very recent work on recursive / online robust PCA includes [31], [32], [33], [34], [35].

Some other related work includes work that uses structured sparsity models, e.g. [36]. For our problem, if it is known that the sparse vector consists of one or a few connected regions, these ideas could be incorporated into our algorithm as well. On the other hand, the advantage of the current approach that only uses sparsity is that it works both for the case of a few connected regions as well as for the case of multiple small sized moving objects, e.g. see the airport video results at http://www.ece.iastate.edu/∼chenlu/ReProCS/VideoReProCS.htm.

Paper Organization. We give the precise problem definition and assumptions in Sec II. The practical ReProCS algorithm is developed in Sec III. The algorithm for the compressive measurements' case is developed in Sec IV. In Sec V, we demonstrate using real videos that the key assumptions used by our algorithm are true in practice. Experimental comparisons on simulated and real data are shown in Sec VI. Conclusions and future work are discussed in Sec VII.

A. Notation

For a set T ⊆ {1, 2, · · · , n}, we use |T| to denote its cardinality; and we use T^c to denote its complement, i.e. T^c := {i ∈ {1, 2, . . . , n} : i /∈ T}. The symbols ∪, ∩, \ denote set union, set intersection and set difference respectively (recall T1 \ T2 := T1 ∩ T2^c). For a vector v, vi denotes the ith entry of v and vT denotes a vector consisting of the entries of v indexed by T. We use ‖v‖p to denote the ℓp norm of v. The support of v, supp(v), is the set of indices at which v is nonzero, supp(v) := {i : vi ≠ 0}. We say that v is s-sparse if |supp(v)| ≤ s.

For a matrix B, B′ denotes its transpose, and B† denotes its pseudo-inverse. For a matrix with linearly independent columns, B† = (B′B)−1B′. The notation [.] denotes an empty matrix. We use I to denote an identity matrix. For an m × n matrix B and an index set T ⊆ {1, 2, . . . , n}, BT is the sub-matrix of B containing columns with indices in the set T. Notice that BT = B IT. We use B \ BT to denote BT^c. Given another matrix B2 of size m × n2, [B B2] constructs a new matrix by concatenating matrices B and B2 in the horizontal direction. Thus, [(B \ BT) B2] = [BT^c B2]. We use the notation B SVD= UΣV′ to denote the singular value decomposition (SVD) of B with the diagonal entries of Σ being arranged in non-increasing order.

We use the interval notation [t1, t2] := {t1, t1 + 1, · · · , t2} and, similarly, the matrix notation [Lt1, . . . , Lt2] := [Lt1, Lt1+1, · · · , Lt2].

Definition 1.1: The s-restricted isometry constant (RIC) [37], δs, for an n × m matrix Ψ is the smallest real number satisfying (1 − δs)‖x‖2² ≤ ‖ΨT x‖2² ≤ (1 + δs)‖x‖2² for all sets T with |T| ≤ s and all real vectors x of length |T|.

Definition 1.2: For a matrix M,

• range(M) denotes the subspace spanned by the columns of M.
• M is a basis matrix if M′M = I.
• The notation Q = basis(range(M)), or Q = basis(M) for short, means that Q is a basis matrix for range(M), i.e. Q satisfies Q′Q = I and range(Q) = range(M).


Definition 1.3:

• The b% left singular values' set of a matrix M is the smallest set of indices of its singular values that contains at least b% of the total singular values' energy. In other words, if M SVD= UΣV′, it is the smallest set T such that ∑i∈T (Σ)i,i² ≥ (b/100) ∑i=1,...,n (Σ)i,i².
• The corresponding matrix of left singular vectors, UT, is referred to as the b% left singular vectors' matrix.
• The notation [Q, Σ] = approx-basis(M, b%) means that Q is the b% left singular vectors' matrix for M and Σ is the diagonal matrix with diagonal entries equal to the b% left singular values' set.
• The notation Q = approx-basis(M, r) means that Q contains the left singular vectors of M corresponding to its r largest singular values. This is also sometimes referred to as: Q contains the r top singular vectors of M.
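The two approx-basis variants defined above can be sketched in a few lines of NumPy; the function names are ours and this is only an illustration of the definitions.

```python
# Sketch of approx-basis(M, b%) and approx-basis(M, r) as defined above.
import numpy as np

def approx_basis_energy(M, b):
    """Return (Q, Sigma): the b% left singular vectors/values of M."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    # smallest set of top singular values holding >= b% of the energy
    k = int(np.searchsorted(energy, b / 100.0) + 1)
    return U[:, :k], np.diag(s[:k])

def approx_basis_rank(M, r):
    """Return the r top left singular vectors of M."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r]
```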

II. PROBLEM DEFINITION AND ASSUMPTIONS

The measurement vector at time t, Mt, is an n dimensional vector which can be decomposed as

Mt := St + Lt. (2)

Let Tt denote the support set of St, i.e.,

Tt := supp(St) = {i : (St)i ≠ 0}.

We assume that St and Lt satisfy the assumptions given below in the next three subsections. Suppose that an initial training sequence which does not contain the sparse components is available, i.e. we are given Mtrain = [Mt; 1 ≤ t ≤ ttrain] with Mt = Lt. This is used to get an initial estimate of the subspace in which the Lt's lie.^1 At each t > ttrain, the goal is to recursively estimate St and Lt and the subspace in which Lt lies. By "recursively" we mean: use Ŝt−1, L̂t−1 and the previous subspace estimate to estimate St and Lt.

The magnitude of the entries of Lt may be small, of the same order, or large compared to that of the nonzero entries of St. In applications where St is the signal of interest, the case when ‖Lt‖2 is of the same order or larger than ‖St‖2 is the difficult case.

A key application where the above problem occurs is in separating a video sequence into background and foreground layers. Let Imt denote the image at time t, Ft denote the foreground image at t and Bt the background image at t, all arranged as 1-D vectors. Then, the image sequence satisfies

(Imt)i = (Ft)i if i ∈ supp(Ft), and (Imt)i = (Bt)i if i /∈ supp(Ft).    (3)

In fMRI, Ft is the sparse active region image while Bt is the background brain image. In both cases, it is fair to assume that an initial background-only training sequence is available. For video this means there are no moving objects/regions in the foreground. For fMRI, this means some frames are captured without providing any stimulus to the subject.

^1 If an initial sequence without St's is not available, one can use a batch robust PCA algorithm to get the initial subspace estimate as long as the initial sequence satisfies its required assumptions.

Let µ denote the empirical mean of the training background images. If we let Lt := Bt − µ, Mt := Imt − µ, Tt := supp(Ft), and

(St)Tt := (Ft − Bt)Tt, (St)Tt^c := 0,

then, clearly, Mt = St + Lt. Once we get the estimates L̂t, Ŝt, we can also recover the foreground and background as

B̂t = L̂t + µ, T̂t = supp(Ŝt), (F̂t)T̂t = (Imt)T̂t, (F̂t)T̂t^c = 0.

A. Slowly changing low-dimensional subspace change

We assume that for τ large enough, any τ length subsequence of the Lt's lies in a subspace of Rn of dimension less than min(τ, n), and usually much less than min(τ, n). In other words, for τ large enough, maxt rank([Lt−τ+1, . . . , Lt]) ≪ min(τ, n). Also, this subspace is either fixed or changes slowly over time.

One way to model this is as follows [27]. Let Lt = Pt at where Pt is an n × rt basis matrix with rt ≪ n that is piecewise constant with time, i.e. Pt = P(j) for all t ∈ [tj, tj+1), and P(j) changes as

P(j) = [(P(j−1)Rj \ P(j),old), P(j),new]

where P(j),new and P(j),old are basis matrices of size n × cj,new and n × cj,old respectively, with P′(j),new P(j−1) = 0, and Rj is a rotation matrix. Moreover, (a) 0 ≤ ∑i=1,...,j (ci,new − ci,old) ≤ cdif; (b) 0 ≤ cj,new ≤ cmax < r0; (c) (tj+1 − tj) ≫ r0 + cdif; and (d) there are a total of J change times with J ≪ (n − r0 − cdif)/cmax.

Clearly, (a) implies that rt ≤ r0 + cdif := rmax and (d) implies that rmax + J cmax ≪ n. This, along with (b) and (c), helps to ensure that for any τ > rmax + cmax, rt,τ := maxt rank([Lt−τ+1, . . . , Lt]) < min(τ, n), and for τ ≫ rmax + cmax, rt,τ ≪ min(τ, n).^2

By slow subspace change, we mean that: for t ∈ [tj, tj+1), ‖(I − P(j−1)P′(j−1))Lt‖2 is initially small and increases gradually. In particular, we assume that, for t ∈ [tj, tj + α),

‖(I − P(j−1)P′(j−1))Lt‖2 ≤ γnew ≪ min(‖Lt‖2, ‖St‖2)

and that it increases gradually after tj + α. One model for "increases gradually" is as given in [27, Sec III-B]. Nothing in this paper requires the specific model and hence we do not repeat it here.

The above piecewise constant subspace change model is a simplified model for what typically happens in practice. In most cases, Pt changes a little at each t in such a way that the low-dimensional assumption approximately holds. If we try to model this, it would result in a nonstationary model that is difficult to precisely define or to verify (it would require multiple video sequences of the same type to verify).^3

^2 To address a reviewer comment, we explain this in detail here. Notice first that (c) implies that (tj+1 − tj) ≫ rmax. Also, (b) implies that rank([Ltj, . . . , Ltj+k−1]) ≤ rmax + (k − 1)cmax. First consider the case when both t − τ + 1 and t lie in [tj, tj+1 − 1]. In this case, rt,τ ≤ rmax for any τ. Thus for any tj+1 − tj > τ ≫ rmax, rt,τ ≪ min(τ, n). Next consider the case when t − τ + 1 ∈ [tj, tj+1 − 1] and t ∈ [tj+1, tj+2 − 1]. In this case, rt,τ ≤ rmax + cmax. Thus, for any tj+2 − tj > τ ≫ rmax + cmax, rt,τ ≪ min(τ, n). Finally consider the case when t − τ + 1 ∈ [tj, tj+1 − 1] and t ∈ [tj+k+1, tj+k+2 − 1] for a 0 < k < J − 1. In this case, τ can be rewritten as τ = (tj+k+1 − tj+1) + τ1 + τ2 with τ1 := tj+1 − (t − τ + 1) and τ2 := t − (tj+k+1 − 1). Clearly, rt,τ ≤ (rmax + (k − 1)cmax) + min(τ1, cmax) + min(τ2, cmax) < k rmax + min(τ1, cmax) + min(τ2, cmax) ≪ (tj+k+1 − tj+1) + min(τ1, cmax) + min(τ2, cmax) ≤ (tj+k+1 − tj+1) + τ1 + τ2 = τ. Moreover, rt,τ ≤ rmax + (k + 1)cmax ≤ rmax + J cmax ≪ n. Thus, in this case again, for any τ, rt,τ ≪ min(τ, n).

Since background images typically change only a little over time (except in case of a camera viewpoint change or a scene change), it is valid to model the mean-subtracted background image sequence as lying in a slowly changing low-dimensional subspace. We verify this assumption in Sec V.

B. Denseness

To state the denseness assumption, we first need to define the denseness coefficient. This is a simplification of the one introduced in our earlier work [27].

Definition 2.1 (denseness coefficient): For a matrix or a vector B, define

κs(B) = κs(range(B)) := max|T|≤s ‖IT′ basis(B)‖2    (4)

where ‖.‖2 is the vector or matrix 2-norm. Recall that basis(B) is short for basis(range(B)); similarly κs(B) is short for κs(range(B)). Notice that κs(B) is a property of the subspace range(B). Note also that κs(B) is a non-decreasing function of s and of rank(B).

We assume that the subspace spanned by the Lt's is dense, i.e.

κ2s(P(j)) = κ2s([Ltj, . . . , Ltj+1−1]) ≤ κ∗

for a κ∗ significantly smaller than one. Moreover, a similar assumption holds for P(j),new with a tighter bound: κ2s(P(j),new) ≤ κnew < κ∗. This assumption is similar to one of the denseness assumptions used in [38], [5]. In [5], a bound is assumed on κ1(U) and κ1(V) where U and V are the matrices containing the left and right singular vectors of the entire matrix, [L1, L2, . . . , Lt]; and a tighter bound is assumed on maxi,j |(UV′)i,j|. In our notation, U = [P(0), P(1),new, . . . , P(J),new].

The following lemma, proved in [27], relates the RIC of I − PP′, when P is a basis matrix, to the denseness coefficient for range(P). Notice that I − PP′ is an n × n matrix that has rank (n − rank(P)) and so it cannot be inverted.

Lemma 2.2: For a basis matrix, P,

δs(I − PP′) = κs(P)².

Thus, the denseness assumption implies that the RIC of the matrix (I − P(j)P′(j)) is small. Using any of the RIC based sparse recovery results, e.g. [39], this ensures that for t ∈ [tj, tj+1), s-sparse vectors St are recoverable from (I − P(j)P′(j))Mt = (I − P(j)P′(j))St by ℓ1 minimization.
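For very small n, Lemma 2.2 can be checked numerically by brute force (computing κs has exponential complexity, which is why this is only feasible for tiny examples). The script below is our own illustration; all names are ours.

```python
# Brute-force check of Lemma 2.2 on a tiny random basis matrix P:
# kappa_s(P)^2 should equal delta_s(I - P P').
import itertools
import numpy as np

n, r, s = 8, 2, 3
rng = np.random.default_rng(0)
P, _ = np.linalg.qr(rng.standard_normal((n, r)))   # basis matrix: P'P = I

# kappa_s(P) = max over |T| <= s of ||I_T' P||_2 (attained at |T| = s)
kappa_s = max(np.linalg.norm(P[list(T), :], 2)
              for T in itertools.combinations(range(n), s))

# delta_s(Psi) = max over |T| = s of ||Psi_T' Psi_T - I||_2; here
# Psi = I - P P' is a projection, so Psi_T' Psi_T = (I - P P')[T, T].
M = np.eye(n) - P @ P.T
delta_s = max(np.linalg.norm(M[np.ix_(list(T), list(T))] - np.eye(s), 2)
              for T in itertools.combinations(range(n), s))

print(kappa_s**2, delta_s)   # the two numbers agree
```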

Very often, the background images primarily change due to lighting changes (in case of indoor sequences) or due to moving waters or moving leaves (in case of many outdoor sequences) [5], [27]. All of these result in global changes and hence it is valid to assume that the subspace spanned by the background image sequences is dense.

^3 Letting at be a zero mean random variable with a covariance matrix that is constant over sub-intervals within [tj, tj+1), the above model is a piecewise wide sense stationary approximation to the nonstationary model.

C. Small support size, some support change, small support change assumption on St

Let the sets of support additions and removals be

∆t := Tt \ Tt−1, ∆e,t := Tt−1 \ Tt.

(1) We assume that

|Tt| + min(|Tt|, |∆t| + |∆e,t|) ≤ s + s∆ where s∆ ≪ s.

In particular, this implies that we either need |Tt| ≤ s and |∆t| + |∆e,t| ≤ s∆ (St is sparse with support size at most s, and its support changes slowly) or, in cases when the change |∆t| + |∆e,t| is large, we need |Tt| ≤ 0.5(s + s∆) (we need a tighter bound on the support size).

(2) We also assume that there is some support change every few frames, i.e. at least once every h frames, |∆t| > s∆,min. Practically, this is needed to ensure that at least some of the background behind the foreground is visible so that the changes to the background subspace can be estimated.

In the video application, foreground images typically consist of one or more moving objects/people/regions and hence are sparse. Also, typically the objects are not static, i.e. there is some support change at least every few frames. On the other hand, since the objects usually do not move very fast, slow support change is also valid most of the time. The time when the support change is almost comparable to the support size is usually when the object is entering or leaving the image, but these are exactly the times when the object's support size is itself small (being smaller than 0.5(s + s∆) is valid). We show some verification of these assumptions in Sec V.

III. PRAC-REPROCS: PRACTICAL REPROCS

We first develop a practical algorithm based on the basic ReProCS idea from our earlier work [27]. Then we discuss how the sparse recovery and support estimation steps can be improved. The complete algorithm is summarized in Algorithm 1. Finally, we discuss an alternate subspace update procedure in Sec III-D.

A. Basic algorithm

We use Ŝt, T̂t, L̂t to denote estimates of St, its support Tt, and Lt respectively; and we use P̂t to denote the basis matrix for the estimated subspace of Lt at time t. Also, let

Φt := (I − P̂t−1P̂′t−1).    (5)

Given the initial training sequence which does not contain the sparse components, Mtrain = [L1, L2, . . . , Lttrain], we compute P̂0 as an approximate basis for Mtrain, i.e. P̂0 = approx-basis(Mtrain, b%). Let r = rank(P̂0). We need to compute an approximate basis because for real data, the Lt's are only approximately low-dimensional. We use b% = 95% or b% = 99.99% depending on whether the low-rank part is approximately low-rank or almost exactly low-rank. After this, at each time t, ReProCS involves 4 steps: (a) Perpendicular Projection; (b) Sparse Recovery (recover T̂t and Ŝt); (c) Recover L̂t; (d) Subspace Update (update P̂t).


Perpendicular Projection. In the first step, at time t, we project the measurement vector, Mt, into the space orthogonal to range(P̂t−1) to get the projected measurement vector,

yt := Φt Mt.    (6)

Sparse Recovery (Recover T̂t and Ŝt). With the above projection, yt can be rewritten as

yt = Φt St + βt where βt := Φt Lt.    (7)

Because of the slow subspace change assumption, projecting orthogonal to range(P̂t−1) nullifies most of the contribution of Lt and hence βt can be interpreted as small "noise". We explain this in detail in Appendix A.

Thus, the problem of recovering St from yt becomes a traditional noisy sparse recovery / CS problem. Notice that, since the n × n projection matrix, Φt, has rank n − rank(P̂t−1), yt has only this many "effective" measurements, even though its length is n. To recover St from yt, one can use ℓ1 minimization [40], [39], or any of the greedy or iterative thresholding algorithms from the literature. In this work we use ℓ1 minimization: we solve

min_x ‖x‖1 s.t. ‖yt − Φt x‖2 ≤ ξ    (8)

and denote its solution by Ŝt,cs. By the denseness assumption, Pt−1 is dense. Since P̂t−1 approximates it, this is true for P̂t−1 as well [27, Lemma 6.6]. Thus, by Lemma 2.2, the RIC of Φt is small enough. Using [39, Theorem 1], this and the fact that βt is small ensures that St can be accurately recovered from yt. The constraint ξ used in the minimization should equal ‖βt‖2 or its upper bound. Since βt is unknown we set ξ = ‖β̂t‖2 where β̂t := Φt L̂t−1.

By thresholding on Ŝt,cs to get an estimate of its support, followed by computing a least squares (LS) estimate of St on the estimated support and setting it to zero everywhere else, we can get a more accurate estimate, Ŝt, as suggested in [41]. We discuss better support estimation and its parameter setting in Sec III-C.

Recover Lt. The estimate Ŝt is used to estimate Lt as L̂t = Mt − Ŝt. Thus, if St is recovered accurately, so will Lt be.

Subspace Update (Update P̂t). Within a short delay after every subspace change time, one needs to update the subspace estimate, P̂t. To do this in a provably reliable fashion, we introduced the projection PCA (p-PCA) algorithm in [27]. The algorithm studied there used knowledge of the subspace change times tj and of the number of new directions cj,new. Let P̂(j−1) denote the final estimate of a basis for the span of P(j−1). It is assumed that the delay between change times is large enough so that P̂(j−1) is an accurate estimate. At t = tj + α − 1, p-PCA gets the first estimate of the new directions, P̂(j),new,1, by projecting the last α L̂t's perpendicular to P̂(j−1) followed by computing the cj,new top left singular vectors of the projected data matrix. It then updates the subspace estimate as P̂t = [P̂(j−1), P̂(j),new,1]. The same procedure is repeated at every t = tj + kα − 1 for k = 2, 3, . . . , K, and each time we update the subspace as P̂t = [P̂(j−1), P̂(j),new,k]. Here K is chosen so that the subspace estimation error decays down to a small enough value within K p-PCA steps.

In this paper, we design a practical version of p-PCA which does not need knowledge of tj or cj,new. This is summarized in Algorithm 1. The key idea is as follows. We let σmin be the rth largest singular value of the training dataset. This serves as the noise threshold for approximately low rank data. We split projection PCA into two phases: "detect subspace change" and "p-PCA". We are in the detect phase when the previous subspace has been accurately estimated. Denote the basis matrix for this subspace by P̂(j−1). We detect the subspace change as follows. Every α frames, we project the last α L̂t's perpendicular to P̂(j−1) and compute the SVD of the resulting matrix. If there are any singular values above σmin, this means that the subspace has changed. At this point, we enter the "p-PCA" phase. In this phase, we repeat the K p-PCA steps described above with the following change: we estimate cj,new as the number of singular values above σmin, but clipped at ⌈α/3⌉ (i.e. if the number is more than ⌈α/3⌉ then we clip it to ⌈α/3⌉). We stop either when the stopping criterion given in step 4(b)iv is achieved (k ≥ Kmin and the projection of L̂t along P̂new,k is not too different from that along P̂new,k−1) or when k ≥ Kmax.
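The detection test and the clipped estimate of cj,new described above can be sketched as follows; this is our own helper with assumed variable names.

```python
# Sketch of the "detect subspace change" test and the c_{j,new} estimate.
import numpy as np

def detect_and_count_new(L_hat_batch, P_hat_prev, sigma_min, alpha):
    """L_hat_batch: n x alpha matrix of the last alpha L_hat_t's."""
    # project perpendicular to the previously estimated subspace
    proj = L_hat_batch - P_hat_prev @ (P_hat_prev.T @ L_hat_batch)
    s = np.linalg.svd(proj / np.sqrt(alpha), compute_uv=False)
    num_above = int(np.sum(s > sigma_min))
    changed = num_above > 0                      # any singular value above noise level
    c_new_hat = min(num_above, int(np.ceil(alpha / 3)))   # clip at ceil(alpha/3)
    return changed, c_new_hat
```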

For the above algorithm, with theoretically motivated choices of algorithm parameters, under the assumptions from Sec II, it is possible to show that, w.h.p., the support of St is exactly recovered and the subspace of the Lt's is accurately recovered within a finite delay of the change time. We provide a brief overview of the proof from [27], [29] in Appendix A that helps explain why the above approach works.

Remark 3.1: The p-PCA algorithm only allows addition of new directions. If the goal is to estimate the span of [L1, . . . , Lt], then this is what is needed. If the goal is sparse recovery, then one can get a smaller rank estimate of P̂t by also including a step to delete the span of the removed directions, P(j),old. This will result in more "effective" measurements available for the sparse recovery step and hence possibly in improved performance. The simplest way to do this is to do one simple PCA step every few frames. In our experiments, this did not help much though. A provably accurate solution is described in [27, Sec VII].

Remark 3.2: The p-PCA algorithm works on small batches of α frames. This can be made fully recursive if we compute the SVD of (I − P̂(j−1)P̂′(j−1))[L̂t−α+1, . . . , L̂t] using the incremental SVD (inc-SVD) procedure summarized in Algorithm 2 [13] for one frame at a time. As explained in [13] and references therein, we can get the left singular vectors and singular values of any matrix M = [M1, M2, . . . , Mα] recursively by starting with P̂ = [.], Σ̂ = [.] and calling [P̂, Σ̂] = inc-SVD(P̂, Σ̂, Mi) for every column i or for short batches of columns of size α/k. Since we use α = 20, which is a small value, the use of incremental SVD does not speed up the algorithm in practice and hence we do not report results using it.

B. Exploiting slow support change when valid

In [27], [28], we always used ℓ1 minimization followed by thresholding and LS for sparse recovery. However, if slow support change holds, one can replace simple ℓ1 minimization with a recovery method that exploits the previous support estimate, as described next.


Algorithm 1 Practical ReProCS-pPCA

Input: Mt; Output: T̂t, Ŝt, L̂t; Parameters: q, b, α, Kmin, Kmax. We used α = 20, Kmin = 3, Kmax = 10 in all experiments (α needs to only be large compared to cmax); we used b = 95 for approximately low-rank data (all real videos and the lake video with simulated foreground) and used b = 99.99 for almost exactly low rank data (simulated data); we used q = 1 whenever ‖St‖2 was of the same order or larger than ‖Lt‖2 (all real videos and the lake video) and used q = 0.25 when it was much smaller (simulated data with small magnitude St).

Initialization:
• [P̂0, Σ̂0] ← approx-basis((1/√ttrain)[M1, . . . , Mttrain], b%).
• Set r ← rank(P̂0), σmin ← (Σ̂0)r,r, t̂0 ← ttrain, flag ← detect.
• Initialize P̂(ttrain) ← P̂0 and T̂t ← [.].

For t > ttrain do:

1) Perpendicular Projection: compute yt ← Φt Mt with Φt ← I − P̂t−1P̂′t−1.
2) Sparse Recovery (Recover Ŝt and T̂t):
   a) If |T̂t−2 ∩ T̂t−1| / |T̂t−2| < 0.5,
      i) compute Ŝt,cs as the solution of (8) with ξ = ‖Φt L̂t−1‖2;
      ii) T̂t ← Thresh(Ŝt,cs, ω) with ω = q√(‖Mt‖2²/n). Here T ← Thresh(x, ω) means that T = {i : |(x)i| ≥ ω}.
      Else
      i) compute Ŝt,cs as the solution of (9) with T = T̂t−1, λ = |T̂t−2 \ T̂t−1| / |T̂t−1|, ξ = ‖Φt L̂t−1‖2;
      ii) T̂add ← Prune(Ŝt,cs, 1.4|T̂t−1|). Here T ← Prune(x, k) returns the indices of the k largest magnitude elements of x;
      iii) Ŝt,add ← LS(yt, Φt, T̂add). Here x̂ ← LS(y, A, T) means that (x̂)T = (AT′AT)−1AT′ y and (x̂)T^c = 0;
      iv) T̂t ← Thresh(Ŝt,add, ω) with ω = q√(‖Mt‖2²/n).
   b) Ŝt ← LS(yt, Φt, T̂t).
3) Estimate Lt: L̂t ← Mt − Ŝt.
4) Update P̂t: projection PCA.
   a) If flag = detect and mod(t − t̂j + 1, α) = 0 (here mod(t, α) is the remainder when t is divided by α),
      i) compute the SVD of (1/√α)(I − P̂(j−1)P̂′(j−1))[L̂t−α+1, . . . , L̂t] and check if any singular values are above σmin;
      ii) if the number of such singular values is more than zero, then set flag ← pPCA, increment j ← j + 1, set t̂j ← t − α + 1, and reset k ← 1.
      Else P̂t ← P̂t−1.
   b) If flag = pPCA and mod(t − t̂j + 1, α) = 0,
      i) compute the SVD of (1/√α)(I − P̂(j−1)P̂′(j−1))[L̂t−α+1, . . . , L̂t];
      ii) let P̂j,new,k retain all its left singular vectors with singular values above σmin, or the ⌈α/3⌉ top left singular vectors, whichever is fewer;
      iii) update P̂t ← [P̂(j−1) P̂j,new,k] and increment k ← k + 1;
      iv) if k ≥ Kmin and ‖∑t′=t−α+1,...,t (P̂j,new,i−1P̂′j,new,i−1 − P̂j,new,iP̂′j,new,i) L̂t′‖2 / ‖∑t′=t−α+1,...,t P̂j,new,i−1P̂′j,new,i−1 L̂t′‖2 < 0.01 for i = k − 2, k − 1, k; or if k = Kmax; then set K ← k, P̂(j) ← [P̂(j−1) P̂j,new,K] and reset flag ← detect.
      Else P̂t ← P̂t−1.

Modified-CS [30] requires fewer measurements for exact/accurate recovery as long as the previous support estimate, T̂t−1, is an accurate enough predictor of the current support, Tt. In our application, T̂t−1 is likely to contain a significant number of extras and, in this case, a better idea is to solve the following weighted ℓ1 problem [42], [43]:

min_x λ‖xT‖1 + ‖xT^c‖1 s.t. ‖yt − Φt x‖2 ≤ ξ, T := T̂t−1    (9)

with λ < 1 (modified-CS solves the above with λ = 0). Denote its solution by Ŝt,cs. One way to pick λ is to let it be proportional to the estimate of the percentage of extras in T̂t−1. If slow support change does not hold, the previous support estimate is not a good predictor of the current support. In this case, doing the above is a bad idea and one should instead solve simple ℓ1, i.e. solve (9) with λ = 1. As explained in [43], if the support estimate contains at least 50% correct entries, then weighted ℓ1 is better than simple ℓ1. We use this criterion with the true values replaced by estimates. Thus, if |T̂t−2 ∩ T̂t−1| / |T̂t−2| > 0.5, then we solve (9) with λ = |T̂t−2 \ T̂t−1| / |T̂t−1|; else we solve it with λ = 1.
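A sketch of the weighted ℓ1 program (9) in CVXPY, assuming T_prev is a boolean mask for T̂t−1, follows; names are ours.

```python
# Sketch of the weighted l1 problem (9).
import numpy as np
import cvxpy as cp

def weighted_l1(y, Phi, T_prev, lam, xi):
    n = Phi.shape[1]
    on = np.where(T_prev)[0]        # indices in the previous support estimate
    off = np.where(~T_prev)[0]
    x = cp.Variable(n)
    # smaller weight lam < 1 on entries believed to be in the support
    obj = cp.Minimize(lam * cp.norm1(x[on]) + cp.norm1(x[off]))
    cp.Problem(obj, [cp.norm(y - Phi @ x, 2) <= xi]).solve()
    return x.value
```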

C. Improved support estimation

A simple way to estimate the support is by thresholding the solution of (9). This can be improved by using the Add-LS-Del procedure for support and signal value estimation [30]. We proceed as follows. First we compute the set T̂t,add by


thresholding on Ŝt,cs in order to retain its k largest magnitude entries. We then compute a LS estimate of St on T̂t,add while setting it to zero everywhere else. As explained earlier, because of the LS step, Ŝt,add is a less biased estimate of St than Ŝt,cs. We let k = 1.4|T̂t−1| to allow for a small increase in the support size from t − 1 to t. A larger value of k also makes it more likely that elements of the set (Tt \ T̂t−1) are detected into the support estimate.^4

^4 Due to the larger weight on the ‖xT^c‖1 term as compared to that on the ‖xT‖1 term (T := T̂t−1), the solution of (9) is biased towards zero on T^c and thus the solution values along (Tt \ T̂t−1) are smaller than the true ones.

The final estimate of the support, T̂t, is obtained by thresholding on Ŝt,add using a threshold ω. If ω is appropriately chosen, this step helps to delete some of the extra elements from T̂add and this ensures that the size of T̂t does not keep increasing (unless the object's size is increasing). An LS estimate computed on T̂t gives us the final estimate of Ŝt, i.e. Ŝt = LS(yt, Φt, T̂t). We use ω = √(‖Mt‖2²/n) except in situations where ‖St‖ ≪ ‖Lt‖; in this case we use ω = 0.25√(‖Mt‖2²/n). An alternate approach is to let ω be proportional to the noise magnitude seen by the ℓ1 step, i.e. to let ω = q‖β̂t‖∞; however, this approach required different values of q for different experiments (it is not possible to specify one q that works for all experiments).

The complete algorithm with all the above steps is summarized in Algorithm 1.
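For concreteness, the following compact driver loop ties together the helper sketches given earlier in this section (approx_basis_energy, reprocs_sparse_step, detect_and_count_new). It is our own sketch of Algorithm 1's structure; it implements only the simple-ℓ1 branch and leaves the subspace update step as a comment.

```python
# Compact driver sketch for the Prac-ReProCS main loop.
import numpy as np

def prac_reprocs(M, t_train, b=95.0, q=1.0, alpha=20):
    n, T = M.shape
    P_hat, Sigma0 = approx_basis_energy(M[:, :t_train] / np.sqrt(t_train), b)
    sigma_min = np.diag(Sigma0)[-1]        # noise threshold for the update step
    S_hat, L_hat = np.zeros((n, T)), M.copy()
    for t in range(t_train, T):
        s_cs = reprocs_sparse_step(M[:, t], P_hat, L_hat[:, t - 1])
        omega = q * np.sqrt(np.sum(M[:, t] ** 2) / n)   # omega = q sqrt(||M_t||^2/n)
        T_hat = np.abs(s_cs) >= omega                   # Thresh(S_t,cs, omega)
        if T_hat.any():
            Phi = np.eye(n) - P_hat @ P_hat.T
            S_hat[T_hat, t] = np.linalg.lstsq(
                Phi[:, T_hat], Phi @ M[:, t], rcond=None)[0]   # LS on support
        L_hat[:, t] = M[:, t] - S_hat[:, t]
        # every alpha frames, run the detect / p-PCA update here, e.g. via
        # detect_and_count_new(L_hat[:, t-alpha+1:t+1], P_hat, sigma_min, alpha)
    return S_hat, L_hat
```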

Algorithm 2 [P̂, Σ̂] = inc-SVD(P̂, Σ̂, D)

1) set D‖,proj ← P̂′D and D⊥ ← (I − P̂P̂′)D
2) compute the QR decomposition of D⊥, i.e. D⊥ QR= JK (here J is a basis matrix and K is an upper triangular matrix)
3) compute the SVD: [Σ̂ D‖,proj; 0 K] SVD= P̃ Σ̃ Ṽ′
4) update P̂ ← [P̂ J] P̃ and Σ̂ ← Σ̃

Note: As explained in [13], due to numerical errors, step 4 done too often can eventually result in P̂ no longer being a basis matrix. This typically occurs when one tries to use inc-SVD at every time t, i.e. when D is a column vector. This can be addressed using the modified Gram-Schmidt re-orthonormalization procedure whenever loss of orthogonality is detected [13].
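A runnable NumPy version of Algorithm 2, assuming P has orthonormal columns and the singular values are kept as a vector, is given below; this is our translation of the steps above.

```python
# inc-SVD (Algorithm 2): update the rank-r SVD factors with new columns D.
import numpy as np

def inc_svd(P, sigma, D):
    D_proj = P.T @ D                        # step 1: component in range(P)
    D_perp = D - P @ D_proj
    J, K = np.linalg.qr(D_perp)             # step 2: QR of the residual
    r, c = sigma.size, D.shape[1]
    top = np.hstack([np.diag(sigma), D_proj])
    bot = np.hstack([np.zeros((c, r)), K])
    Pt, sigma_new, _ = np.linalg.svd(np.vstack([top, bot]),
                                     full_matrices=False)   # step 3
    P_new = np.hstack([P, J]) @ Pt          # step 4
    return P_new, sigma_new
```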

D. Simplifying subspace update: simple recursive PCA

Even the practical version of p-PCA needs to set Kmin and Kmax besides also setting b and α. Thus, we also experiment with using PCA to replace p-PCA (it is difficult to prove a performance guarantee with PCA but that does not necessarily mean that the algorithm itself will not work). The simplest way to do this is to compute the top r left singular vectors of [L̂1, L̂2, . . . , L̂t] either at each time t or every α frames. While this is simple, its complexity will keep increasing with time t, which is not desirable. Even if we use the last d frames instead of all past frames, d will still need to be large compared to r to get an accurate estimate. To address this issue, we can use the recursive PCA (incremental SVD) algorithm given in Algorithm 2. We give the complete algorithm that uses this and a rank-r truncation step every d frames (motivated by [13]) in Algorithm 3.

Algorithm 3 Practical ReProCS-Recursive-PCA

Input: Mt; Output: T̂t, Ŝt, L̂t; Parameters: q, b, α. Set α = 20 in all experiments; set q, b as explained in Algorithm 1.
Initialization: [P̂0, Σ̂0] ← approx-basis([M1, . . . , Mttrain], b%), r ← rank(P̂0), d ← 3r; initialize P̂tmp ← P̂0, Σ̂tmp ← Σ̂0, P̂(ttrain) ← P̂0 and T̂t ← [.]. For t > ttrain do:

1) Perpendicular Projection: do as in Algorithm 1.
2) Sparse Recovery: do as in Algorithm 1.
3) Estimate Lt: do as in Algorithm 1.
4) Update P̂t: recursive PCA.
   a) If mod(t − ttrain, α) = 0,
      i) [P̂tmp, Σ̂tmp] ← inc-SVD(P̂tmp, Σ̂tmp, [L̂t−α+1, . . . , L̂t]) where inc-SVD is given in Algorithm 2;
      ii) P̂t ← (P̂tmp)1:r.
      Else P̂t ← P̂t−1.
   b) If mod(t − ttrain, d) = 0,
      i) P̂tmp ← (P̂tmp)1:r and Σ̂tmp ← (Σ̂tmp)1:r,1:r.

IV. COMPRESSIVE MEASUREMENTS: RECOVERING St

Consider the problem of recovering St from

Mt := A St + B Lt

when A and B are m × n and m × n2 matrices, St is an n length vector and Lt is an n2 length vector. In general m can be larger, equal or smaller than n or n2. In the compressive measurements' case, m < n. To specify the assumptions needed in this case, we need to define the basis matrix for range(BP(j)) and we need to define a generalization of the denseness coefficient.

Definition 4.1: Let Qj := basis(BP(j)) and let Qj,new := basis((I − Qj−1Q′j−1)BP(j)).

Definition 4.2: For a matrix or a vector M, define

κs,A(M) = κs,A(range(M)) := max|T|≤s ‖AT′ basis(M)‖2    (10)

where ‖.‖2 is the vector or matrix 2-norm. This quantifies the incoherence between the subspace spanned by any set of s columns of A and the range of M.

We assume the following.

1) Lt and St satisfy the assumptions of Sec II-A, II-C.
2) The matrix A satisfies the restricted isometry property [39], i.e. δs(A) ≤ δ∗ ≪ 1.
3) The denseness assumption is replaced by: κ2s,A(Qj) ≤ κ∗ and κ2s,A(Qj,new) ≤ κnew < κ∗ for a κ∗ that is small compared to one. Notice that this depends on the Lt's and on the matrices A and B.

Assume that we are given an initial training sequence that satisfies Mt = BLt for t = 1, 2, . . . , ttrain. The goal is to recover St at each time t. It is not possible to recover Lt unless B is time-varying (this case is studied in [44]). In many imaging applications, e.g. undersampled fMRI or single-pixel video imaging, B = A (B = A is a partial Fourier matrix for MRI and is a random Gaussian or Rademacher matrix for single-pixel imaging). On the other hand, if Lt is large but low-dimensional sensor noise, then B = I (identity matrix), while A is the measurement matrix.

Let ℒt := BLt. It is easy to see that if Lt lies in a slowly changing low-dimensional subspace, the same is also true for the sequence ℒt. Consider the problem of recovering St from Mt := ASt + ℒt when an initial training sequence Mt := ℒt for t = 1, 2, . . . , ttrain is available. Using this sequence, it is possible to estimate its approximate basis Q̂0 as explained earlier. If we then project Mt into the subspace orthogonal to range(Q̂0), the resulting vector yt := Φt Mt satisfies

yt = (Φt A) St + βt

where βt = Φt B Lt is small noise for the same reasons explained earlier. Thus, one can still recover St from yt by ℓ1 or weighted ℓ1 minimization followed by support recovery and LS. Then, ℒt gets recovered as ℒ̂t ← Mt − A Ŝt and this is used for updating its subspace estimate. We summarize the resulting algorithm in Algorithm 4. This is being analyzed in ongoing work [45].

Algorithm 4 Compressive ReProCS
Use Algorithm 1 or 3 with the following changes.

• Replace Φt in step 2 by Φt A.
• Replace step 3 by ℒ̂t ← Mt − A Ŝt.
• Use ℒ̂t in place of L̂t and Q̂ in place of P̂ everywhere.
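The only change to the sparse recovery step in the compressive case is that the CS matrix becomes Φt A; a sketch (with our own names, CVXPY assumed) is:

```python
# Sketch of the compressive ReProCS sparse-recovery step.
import numpy as np
import cvxpy as cp

def compressive_sparse_step(M_t, A, Q_hat, calL_hat_prev):
    m = M_t.shape[0]
    Phi = np.eye(m) - Q_hat @ Q_hat.T        # project off range(Q_hat)
    y = Phi @ M_t
    xi = np.linalg.norm(Phi @ calL_hat_prev) # noise bound from previous estimate
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(x)),
               [cp.norm(y - (Phi @ A) @ x, 2) <= xi]).solve()
    return x.value
```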

The following lemma explains why some of the extra assumptions are needed for this case.

Lemma 4.3 ([45]): For a basis matrix, Q,

δs((I − QQ′)A) ≤ κs,A(Q)² + δs(A).

Using the above lemma with Q ≡ Qj, it is clear that incoherence of Qj w.r.t. any set of 2s columns of A, along with RIP of A, ensures that any s-sparse vector x can be recovered from y := (I − QjQ′j)Ax by ℓ1 minimization. In compressive ReProCS, the measurement matrix uses Q̂j instead of Qj and also involves small noise. With more work, these arguments can be extended to this case as well [see [45]].

V. MODEL VERIFICATION

A. Low-dimensional and slow subspace change assumption

We used two background image sequence datasets. The first was a video of lake water motion. For this sequence, n = 6480 and the number of images was 1500. The second was an indoor video of window curtains moving due to the wind. There was also some lighting variation. The latter part of this sequence also contains a foreground (various persons coming in, writing on the board and leaving). For this sequence, the image size was n = 5120 and the number of background-only images was 1755. Both sequences are posted at http://www.ece.iastate.edu/∼hanguo/PracReProCS.html.

First note that any given background image sequence will never be exactly low-dimensional, but only approximately so. Secondly, in practical data, the subspace does not change in just the simple fashion of the model of Sec II-A. Typically there are some changes to the subspace at every time t. Moreover, with just one training sequence of a given type, it is not possible to estimate the covariance matrix of Lt at each t and thus one cannot detect the subspace change times. The only thing one can do is to assume that there may be a change every τ frames, and that during these τ frames the Lt's are stationary and ergodic; estimate the covariance matrix of Lt for this period using a time average; compute its eigenvectors corresponding to b% energy (or equivalently compute the b% approximate basis of [Lt−τ+1, . . . , Lt]) and use these as P(j). These can be used to test our assumptions.

Testing for slow subspace change can be done in various ways. In [27, Fig 6], we do this after low-rankifying the video data first. This helps to very clearly demonstrate slow subspace change, but then it is not checking what we really do on real video data. In this work, we proceed without low-rankifying the data. We let t0 = 0 and tj = t0 + jτ with τ = 725. Let Lt denote the mean subtracted background image sequence, i.e. Lt = Bt − µ where µ = (1/t1) ∑t=0,...,t1 Bt. We computed P(j) as P(j) = approx-basis([Ltj, . . . , Ltj+1−1], 95%). We observed that rank(P(j)) ≤ 38 for the curtain sequence, while rank(P(j)) ≤ 33 for the lake sequence. In other words, 95% of the energy is contained in only 38 or fewer directions in either case, i.e. both sequences are approximately low-dimensional. Notice that the dimension of the matrix [Ltj, . . . , Ltj+1−1] is n × τ and min(n, τ) = τ = 725 is much larger than 38. To test for slow subspace change, in Fig. 1a, we plot ‖(I − P(j−1)P′(j−1))Lt‖2 / ‖Lt‖2 for t ∈ [tj, tj+1). Notice that, after every change time (tj = 725, 1450), this quantity is initially small for the first 100-150 frames and then increases gradually. It later decreases also, but that is allowed (all we need is that it be small initially and increase slowly).
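The slow-subspace-change test just described can be scripted as below, assuming L is the n × T array of mean-subtracted background frames and reusing the approx_basis_energy sketch given after Definition 1.3; names are ours.

```python
# Sketch of the slow-subspace-change verification curve of Fig. 1a.
import numpy as np

def subspace_change_curve(L, tau=725, b=95.0):
    n, T = L.shape
    curves, P_prev = [], None
    for j in range(T // tau):
        block = L[:, j * tau:(j + 1) * tau]
        if P_prev is not None:
            resid = block - P_prev @ (P_prev.T @ block)
            # ||(I - P_{j-1} P_{j-1}') L_t||_2 / ||L_t||_2 for each frame t
            curves.append(np.linalg.norm(resid, axis=0) /
                          np.linalg.norm(block, axis=0))
        P_prev, _ = approx_basis_energy(block, b)   # b% basis of this block
    return curves
```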

B. Denseness assumption

Exactly verifying the denseness assumption is impossible since computing κs(.) has exponential complexity (one needs to check all sets T of size s). Instead, to get some idea of whether it holds even just for T replaced by Tt, in Fig. 1b, we plot maxi ‖ITt′ (P(j))i‖2 where Tt is the true or estimated support of St at time t. For the lake sequence, Tt is simulated and hence known. For the curtain sequence, we select a part of the sequence in which the person is wearing a black shirt (against a white curtains' background). This part corresponds to t = 35 to t = 80. For this part, ReProCS returns a very accurate estimate of Tt, and we use this estimated support as a proxy for the true support Tt.

C. Support size, support change and slow support change

For real video sequences, it is not possible to get the true foreground support. Thus we used T̂t for the part of the curtain sequence described above in Sec V-B as a proxy for Tt. We plot the support size normalized by the image size, |Tt|/n, and we plot the number of additions and removals normalized by the support size, i.e. |∆t|/|Tt| and |∆e,t|/|Tt|, in Fig. 1c. Notice from the figure that the support size is at most 10.2% of the image size. Notice also that at least every 3 frames, there is at least a 1% support change. Thus there is some support change every few frames, thus exposing the part of the background behind the foreground. Finally, notice that the maximum number of support changes is only 9.9% of the support size, i.e. slow support change holds for this piece.

VI. SIMULATION AND EXPERIMENTAL RESULTS

In this section, we show comparisons of ReProCS with other batch and recursive algorithms for robust PCA. For implementing the ℓ1 or weighted ℓ1 minimizations, we used the YALL1 ℓ1 minimization toolbox [46]; its code is available at http://yall1.blogs.rice.edu/.

Code and data for our algorithms and for all experiments given below are available at http://www.ece.iastate.edu/∼hanguo/ReProCSdemo.rar.

Simulated Data. In this experiment, the measurement at time t, Mt := Lt + St, is an n × 1 vector with n = 100. We generated Lt using the autoregressive model described in [1] with auto-regression parameter 0.1 and decay parameter fd = 0.1. The covariance of a direction decayed to zero before being removed. There was one change time t1. For t < t1, Pt = P0 was a rank r0 = 20 matrix and Cov(at) was a diagonal matrix with entries 10⁴, 0.7079 × 10⁴, 0.7079² × 10⁴, · · · , 14.13. At t = t1, c = c1,new = 2 new directions, P1,new, got added with Cov(at,new) being diagonal with entries 60 and 50. Also, the variance along two existing directions started to decay to zero exponentially. We used ttrain = 2000 and t1 = ttrain + 5. The matrix [P0 P1,new] was generated as the first 22 columns of an n × n random orthonormal matrix (generated by first generating an n × n random Gaussian matrix and then orthonormalizing it). For 1 ≤ t ≤ ttrain, St = 0 and hence Mt = Lt. For t > ttrain, the support set, Tt, was generated in a correlated fashion: St contained one block of size 9 or 27 (small and large support size cases). The block stayed static with probability 0.8 and moved up or down by one pixel with probability 0.1 each, independently at each time. Thus the support sets were highly correlated. The magnitude of the nonzero elements of St was fixed at either 100 (large) or 10 (small).
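The correlated support model described above is simple to simulate; the following sketch (our names) generates the block supports.

```python
# Sketch of the correlated support model: a contiguous block that stays
# put w.p. 0.8 or moves up/down by one pixel w.p. 0.1 each.
import numpy as np

def correlated_supports(n=100, block_size=9, T=500, seed=0):
    rng = np.random.default_rng(seed)
    top = int(rng.integers(0, n - block_size))   # initial block position
    supports = []
    for _ in range(T):
        u = rng.random()
        if u < 0.1:
            top = max(top - 1, 0)                # move up one pixel
        elif u < 0.2:
            top = min(top + 1, n - block_size)   # move down one pixel
        supports.append(np.arange(top, top + block_size))
    return supports
```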

For the small magnitude St case, ‖Lt‖2 ranged from 150 to 250 while ‖St‖2 was equal to 30 and 52, i.e. in this case ‖St‖2 ≪ ‖Lt‖2. For the large magnitude case, ‖St‖2 was 300 and 520. We implemented ReProCS (Algorithm 1) with b = 99.99 since this data is exactly low-rank. We used q = 0.25 for the small magnitude St case and q = 1 for the other case. We compared with three recursive robust PCA methods – incremental robust subspace learning (iRSL) [15], adapted (outlier-detection enabled) incremental SVD (adapted-iSVD) [13] and GRASTA [31] – and with two batch methods – Principal Components' Pursuit (PCP) [5]^5 and robust subspace learning (RSL)^6 [4]. Results are shown in Table I.

^5 We use the Accelerated Proximal Gradient algorithm [47] and Inexact ALM algorithm [48] (designed for large scale problems) to solve PCP (1). The code is available at http://perception.csl.uiuc.edu/matrix-rank/samplecode.html.

From these experiments, we can conclude that ReProCS is able to successfully recover both small magnitude and fairly large support-sized St's; iRSL has very bad performance in both cases; RSL, PCP and GRASTA work to some extent in certain cases, though not as well as ReProCS. ReProCS operates by first approximately nullifying Lt, i.e. computing yt as in (6), and then recovering St by exploiting its sparsity. iRSL and RSL also compute yt the same way, but they directly use yt to detect or soft-detect (and down-weight) the support of St by thresholding. Recall that yt can be rewritten as yt = St + (−P̂t−1P̂′t−1 St) + βt. As the support size of St increases, the interference due to (−P̂t−1P̂′t−1 St) becomes larger, resulting in wrong estimates of St. For the same reason, direct thresholding is also difficult when some entries of St are small while others are not. Adapted-iSVD is our adaptation of iSVD [13] in which we use the approach of iRSL described above to provide the outlier locations to iSVD (iSVD is an algorithm for recursive PCA with missing entries, or what can be called recursive low-rank matrix completion). It fills in the corrupted locations of Lt by imposing that Lt lies in range(P̂t−1). We used a threshold of 0.5 mini∈Tt |(St)i| for both iRSL and adapted-iSVD (we also tried various other options for thresholds but with similar results). Since adapted-iSVD and iRSL are recursive methods, a wrong Ŝt, in turn, results in wrong subspace updates, thus also causing βt to become large and finally causing the error to blow up.

RSL works to some extent for larger support sizes of St but fails when the magnitude of the nonzero St's is small. PCP fails since the support sets are generated in a highly correlated fashion and the support sizes are large (resulting in the matrix St being quite rank deficient as well). GRASTA [31] is a recent recursive method from 2012. It was implemented using code posted at https://sites.google.com/site/hejunzz/grasta. We tried two versions of GRASTA: the demo code as it is, and the demo code modified so that it used as much information as ReProCS used (i.e. we used all available frames for training instead of just 100; we used all measurements instead of just 20% randomly selected pixels; and we used the r returned by ReProCS as the rank input instead of always using rank = 5). In this paper, we show the latter case. Both experiments are shown on our supplementary material page http://www.ece.iastate.edu/∼hanguo/PracReProCS.html.

Partly Simulated Video: lake video with simulated foreground. In the comparisons shown next, we only compare with PCP, RSL and GRASTA. To address a reviewer comment, we also compare with the batch algorithm of [34], [35] (referred to as MG in the figures) implemented using code provided by the authors. There was not enough information in the papers or in the code to successfully implement the recursive algorithm.

We implemented ReProCS (Algorithm 1 and Algorithm 3) with $b = 95$ since the videos are only approximately low-rank, and we used $q = 1$ since the magnitude of $S_t$ is not small

^6 The code of RSL is available at http://www.salleurl.edu/~ftorre/papers/rpca/rpca.zip.



compared to that of $L_t$. The performance of both ReProCS-pPCA (Algorithm 1) and ReProCS-recursive-PCA (Algorithm 3) was very similar. Results using the latter are shown in Fig. 6.

We used the lake sequence described earlier to serve as a real background sequence. A foreground consisting of a rectangular moving object was overlaid on top of it using (3). The use of a real background sequence allows us to evaluate performance for data that only approximately satisfies the low-dimensional and slow subspace change assumptions. The use of a simulated foreground allows us to control its intensity so that the resulting $S_t$ is small or of the same order as $L_t$

(making it a difficult sequence); see Fig. 2b.

The foreground $F_t$ was generated as follows. For $1 \le t \le t_{\text{train}}$, $F_t = 0$. For $t > t_{\text{train}}$, $F_t$ consists of a $45 \times 25$ moving block whose centroid moves horizontally according to a constant velocity model with small random acceleration [49, Example V.B.2]. To be precise, let $p_t$ be the horizontal location of the block's centroid at time $t$ and let $v_t$ denote its horizontal velocity. Then
$$g_t := \begin{bmatrix} p_t \\ v_t \end{bmatrix} \quad \text{satisfies} \quad g_t = G g_{t-1} + \begin{bmatrix} 0 \\ n_t \end{bmatrix}, \quad \text{where } G := \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$
and $n_t$ is a zero-mean truncated Gaussian with variance $Q$, truncated so that $-2\sqrt{Q} < n_t < 2\sqrt{Q}$. The nonzero pixels' intensity is i.i.d. over time and space and distributed as uniform$(b_1, b_2)$, i.e. $(F_t)_i \sim \text{uniform}(b_1, b_2)$ for $i \in T_t$. In our experiments, we generated the data with $t_{\text{train}} = 1420$, $b_1 = 170$, $b_2 = 230$, $p_{t_0+1} = 27$, $v_{t_0+1} = 0.5$ and $Q = 0.02$. With these values of $b_1$, $b_2$, as can be seen from Fig. 2b, $\|S_t\|_2$ is roughly equal to or smaller than $\|L_t\|_2$, making it a difficult sequence. Since it is not much smaller, ReProCS used $q = 1$; since the background data is only approximately low-rank, it used $b = 95$.
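A minimal sketch of this motion model, under the parameter values stated above (the rejection-sampling implementation of the truncated Gaussian is our choice):

```python
import numpy as np

rng = np.random.default_rng(1)
Q = 0.02
G = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # constant-velocity transition matrix

def truncated_gauss(std, rng):
    """Zero-mean Gaussian with the given std, truncated to (-2*std, 2*std)."""
    while True:
        x = std * rng.standard_normal()
        if abs(x) < 2 * std:
            return x

g = np.array([27.0, 0.5])                  # g_t = [p_t, v_t]
centroids = []
for t in range(80):
    g = G @ g + np.array([0.0, truncated_gauss(np.sqrt(Q), rng)])
    centroids.append(g[0])
# At each t, a 45 x 25 block centred (horizontally) at round(centroids[t])
# is filled with i.i.d. uniform(b1, b2) = uniform(170, 230) intensities.
```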

We generated 50 realizations of the video sequence using these parameters and used all the algorithms to estimate $S_t$, $L_t$ and then the foreground and background sequences. We show comparisons of the normalized mean squared error (NMSE) in recovering $S_t$ in Fig. 2a. Visual comparisons of both foreground and background recovery for one realization are shown in Fig. 3. The recovered foreground image is shown as a white-black image of the foreground support: pixels in the support estimate are white. PCP gives large error for this sequence since the object moves in a highly correlated fashion and occupies a large part of the image. GRASTA also does not work. RSL is able to recover a large part of the object correctly; however, it also recovers many more extras than ReProCS. The reason is that the magnitude of the nonzero entries of $S_t$ is quite small (recall that $(S_t)_i = (F_t - B_t)_i$ for $i \in T_t$) and is such that $\|L_t\|_2$ is about as large as $\|S_t\|_2$ or sometimes larger (see Fig. 2b).
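For reference, a sketch of how the NMSE curve of Fig. 2a can be computed from the realizations (the array shapes and the averaging convention are our assumptions):

```python
import numpy as np

def nmse_per_time(S, S_hat):
    """NMSE(t) = E||S_t - S_hat_t||^2 / E||S_t||^2, with the expectations
    approximated by averages over realizations.
    S, S_hat: arrays of shape (num_realizations, n, t_max)."""
    err = ((S - S_hat) ** 2).sum(axis=1).mean(axis=0)
    pwr = (S ** 2).sum(axis=1).mean(axis=0)
    return err / pwr
```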

Real Video Sequences. Next we show comparisons on two real video sequences, originally taken from http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html and http://research.microsoft.com/en-us/um/people/jckrumm/wallflower/testimages.htm, respectively. The first is the curtain sequence described earlier. For $t > 1755$, in the foreground, a person with a black shirt walks in, writes on the board and then walks out; then a second person with a white shirt does the same; and then a third person with a white shirt does the same. This video is challenging because (i) the white shirt color and the curtains' color are quite similar, making the corresponding $S_t$ small in magnitude; and (ii) the background variations are quite large while the foreground person moves slowly. As can be seen from Fig. 4, ReProCS's performance is significantly better than that of the other algorithms for both foreground and background recovery. This is most easily seen from the recovered background images. One or more frames of the background recovered by PCP, RSL and GRASTA contain the person, while none of the ReProCS frames does.

The second sequence consists of a person entering a room containing a computer monitor with a white moving region on its screen. The background changes due to lighting variations and due to the monitor. The person moving in the foreground occupies a very large part of the image, so this is an example of a sequence in which the use of weighted $\ell_1$ is essential (the support size is too large for simple $\ell_1$ to work). As can be seen from Fig. 5, for most frames, ReProCS is able to recover the person correctly. However, for the last few frames, which show the person in a white shirt in front of the white part of the screen, the resulting $S_t$ is too small even for ReProCS to recover correctly. The same is true for the other algorithms. Videos of all the above experiments, and of a few others, are posted at http://www.ece.iastate.edu/~hanguo/PracReProCS.html.

Time Comparisons. The time comparisons are shown in Table II. In terms of speed, GRASTA is the fastest, even though its performance is much worse. ReProCS is the second fastest. We expect that ReProCS can be sped up by using mex files (C/C++ code) for the subspace update step. PCP and RSL are slower because they jointly process the entire image sequence. Moreover, ReProCS and GRASTA have the advantage of being recursive methods, i.e. the foreground/background recovery is available as soon as a new frame appears, while PCP and RSL need to wait for the entire image sequence.

Compressive ReProCS: comparisons for simulated video. We compare compressive ReProCS with SpaRCS [20], which is a batch algorithm for undersampled robust PCA / separation of sparse and low-dimensional parts (its code is downloaded from http://www.ece.rice.edu/~aew2/sparcs.html). No code is available for most of the other compressive batch robust PCA algorithms such as [24], [25]. SpaRCS is a greedy approach that combines ideas from CoSaMP [50] for sparse recovery and ADMiRA [51] for matrix completion. The comparison is done for compressive measurements of the lake sequence with the foreground simulated as explained earlier. The matrix $B = A$ is an $m \times n$ random Gaussian matrix with $m = 0.7n$; recall that $n = 6480$. The SpaRCS code requires the background data rank and the foreground sparsity as inputs. For the rank, we used the $r$ returned by ReProCS; for the sparsity, we used the true size of the simulated foreground. As can be seen from Fig. 7, SpaRCS does not work, while compressive ReProCS is able to recover $S_t$ fairly accurately, though of course the errors are larger than in the fully sampled case. All experiments shown in [20] are for very slowly changing backgrounds and for foregrounds with very small support sizes; neither is true for our data.
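A sketch of this compressive measurement setup, with the dimensions stated above (the Gaussian entries are left unnormalized since the text does not specify a scaling; the frame vectors are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6480
m = int(0.7 * n)                       # m = 0.7 n
A = rng.standard_normal((m, n))        # B = A, random Gaussian

# One compressive frame: M_t = A (S_t + L_t); s and l are n-vectors.
s, l = np.zeros(n), rng.standard_normal(n)
M_t = A @ (s + l)
```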



(a) $(S_t)_i = 100$ for $i \in T_t$ and $(S_t)_i = 0$ for $i \in T_t^c$

          E‖S − Ŝ‖²_F / E‖S‖²_F                     E‖O − Ô‖²_F / E‖O‖²_F
|T_t|/n   ReProCS-pPCA   RSL      PCP      GRASTA        adapted-iSVD   iRSL
9%        1.52 × 10⁻⁴    0.0580   0.0021   3.75 × 10⁻⁴   0.0283         0.9105
27%       1.90 × 10⁻⁴    0.0198   0.6852   0.1043        0.0637         0.9058

(b) $(S_t)_i = 10$ for $i \in T_t$ and $(S_t)_i = 0$ for $i \in T_t^c$

          E‖S − Ŝ‖²_F / E‖S‖²_F                     E‖O − Ô‖²_F / E‖O‖²_F
|T_t|/n   ReProCS-pPCA   RSL      PCP      GRASTA        adapted-iSVD   iRSL
9%        0.0344         8.7247   0.2120   0.1390        0.2346         0.9739
27%       0.0668         3.3166   0.6456   0.1275        0.3509         0.9778

TABLE I: Comparison of reconstruction errors of different algorithms for simulated data. Here, $|T_t|/n$ is the sparsity ratio of $S_t$, $E[\cdot]$ denotes the Monte Carlo average computed over 100 realizations, and $\|\cdot\|_F$ is the Frobenius norm of a matrix. Also, $S = [S_1, S_2, \ldots, S_{t_{\max}}]$ and $\hat{S}$ is its estimate; $(O_t)_i = (M_t)_i$ if $i \in T_t$ and $(O_t)_i = 0$ otherwise, and $\hat{O}_t$ is defined similarly with the estimates; $O$ and $\hat{O}$ are the corresponding matrices. We show the error for $O$ for iRSL and adapted-iSVD since these algorithms can only return an estimate of the outlier support $T_t$; they do not return the background estimate.

[Figure 1 appears here: three plots versus time for the curtain and lake sequences, showing (a) $\|(I - \hat{P}_{j-1}\hat{P}_{j-1}')L_t\|/\|L_t\|$, (b) $\max_i \|I_{T_t}'(P_j)_i\|$, and (c) the normalized support size $|T_t|/n$ and support changes $|\Delta_t|/|T_t|$ and $|\Delta_{e,t}|/|T_t|$.]

Fig. 1: (a) Verification of the slow subspace change assumption. (b) Verification of the denseness assumption. (c) Verification of small support size and small support change.

[Figure 2 appears here: (a) NMSE versus time for ReProCS-pPCA, ReProCS-Recursive-PCA, PCP, RSL, GRASTA and MG; (b) $\|S_t\|_2$ and $\|L_t\|_2$ versus time.]

Fig. 2: Experiments on partly simulated video. (a) Normalized mean squared error in recovering $S_t$ over the realizations. (b) Comparison of $\|S_t\|_2$ and $\|L_t\|_2$ for one realization. MG refers to the batch algorithm of [34], [35] implemented using code provided by the authors. There was not enough information in the papers or in the code to successfully implement the recursive algorithm.

DataSet   Image Size   Sequence Length   ReProCS-pPCA        ReProCS-Recursive-PCA   PCP           RSL          GRASTA
Lake      72 × 90      1420 + 80         2.99 + 19.97 sec    2.99 + 19.43 sec        245.03 sec    213.36 sec   39.47 + 0.42 sec
Curtain   64 × 80      1755 + 1209       4.37 + 159.02 sec   4.37 + 157.21 sec       1079.59 sec   643.98 sec   40.01 + 5.13 sec
Person    120 × 160    200 + 52          0.46 + 42.43 sec    0.46 + 41.91 sec        27.72 sec     121.31 sec   13.80 + 0.64 sec

TABLE II: Comparison of speed of different algorithms. Experiments were done on a 64-bit Windows 8 laptop with a 2.40GHz i7 CPU and 8GB RAM. Sequence length refers to the length of the training sequence plus the length of the sequence used for separation. For ReProCS and GRASTA, the time is shown as training time + recovery time.



[Figure 3 appears here: for each frame, the original image followed by the foreground (fg) and background (bg) recovered by ReProCS, PCP, RSL, GRASTA and MG.]

Fig. 3: Original video at $t = t_{\text{train}} + 30, 60, 70$ and its foreground (fg) and background (bg) layer recovery results using ReProCS (ReProCS-pPCA) and other algorithms. MG refers to the batch algorithm of [34], [35] implemented using code provided by the authors. For the fg, we only show the fg support in white for ease of display.

[Figure 4 appears here: same layout as Fig. 3.]

Fig. 4: Original video sequence at $t = t_{\text{train}} + 60, 120, 199, 475, 1148$ and its foreground (fg) and background (bg) layer recovery results using ReProCS (ReProCS-pPCA) and other algorithms. For the fg, we only show the fg support in white for ease of display.

[Figure 5 appears here: same layout as Fig. 3.]

Fig. 5: Original video sequence at $t = t_{\text{train}} + 42, 44, 52$ and its foreground (fg) and background (bg) layer recovery results using ReProCS (ReProCS-pPCA) and other algorithms. For the fg, we only show the fg support in white for ease of display.

[Figure 6 appears here: (a) $t = 30, 60, 70$; (b) $t = 60, 120, 475$; (c) $t = 42, 44, 52$.]

Fig. 6: Foreground layer estimated by ReProCS-Recursive-PCA for the lake, curtain and person videos shown in Figs. 3, 4 and 5. As can be seen, the recovery performance is very similar to that of ReProCS-pPCA (Algorithm 1).



[Figure 7 appears here: original frames and the foreground recovered by ReProCS and SpaRCS.]

Fig. 7: Original video frames at $t = t_{\text{train}} + 30, 60, 70$ and foreground layer recovery by ReProCS and SpaRCS.

VII. CONCLUSIONS AND FUTURE WORK

This work designed and evaluated Prac-ReProCS, which is a practically usable modification of its theoretical counterpart that was studied in our earlier work [27], [28], [29]. We showed that Prac-ReProCS has excellent performance for both simulated data and a real application (foreground-background separation in videos), and that its performance is better than that of many state-of-the-art algorithms from recent work. Moreover, most of the assumptions used to obtain its guarantees are valid for real videos. Finally, we also proposed and evaluated a compressive Prac-ReProCS algorithm. In ongoing work, on one end, we are working on performance guarantees for compressive ReProCS [45], and on the other end, we are developing and evaluating a related approach for functional MRI. In fMRI, one is allowed to change the measurement matrix at each time. However, if we replace $B = A$ by $A_t$ in Sec. IV, the compressive ReProCS algorithm does not apply because $A_t L_t$ is not low-dimensional [44].

APPENDIX

A. Detailed Discussion of Why ReProCS Works

Define the subspace estimation error as

$$\text{SE}(\hat{P}, P) := \|(I - \hat{P}\hat{P}')P\|_2$$
where both $\hat{P}$ and $P$ are basis matrices. Notice that this quantity is always between zero and one. It is equal to zero when range$(\hat{P})$ contains range$(P)$ and it is equal to one if $\hat{P}'P_i = 0$ for at least one column $P_i$ of $P$.
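A direct NumPy implementation of this quantity, with two sanity checks matching the two extreme cases just mentioned:

```python
import numpy as np

def subspace_error(P_hat, P):
    """SE(P_hat, P) = ||(I - P_hat P_hat') P||_2 for basis matrices
    (orthonormal columns); the 2-norm is the largest singular value."""
    return np.linalg.norm(P - P_hat @ (P_hat.T @ P), 2)

# Sanity checks: 0 when range(P_hat) contains range(P), 1 when orthogonal.
I = np.eye(5)
print(subspace_error(I[:, :3], I[:, :2]))   # 0.0
print(subspace_error(I[:, :2], I[:, 2:4]))  # 1.0
```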

Recall that for $t \in [t_j, t_{j+1} - 1]$, $P_t = [(P_{(j-1)} R_j \setminus P_{(j),\text{old}}),\ P_{(j),\text{new}}]$. Thus, $L_t$ can be rewritten as
$$L_t = P_{(j-1)} a_{t,*} + P_{(j),\text{new}} a_{t,\text{new}}$$
where $a_{t,\text{new}} := P_{(j),\text{new}}' L_t$ and $a_{t,*} := P_{(j-1)}' L_t$. Let $c := c_{\max}$ and $r_j := r_0 + jc$. We explain here the key idea of why ReProCS works [27], [29]. Assume that the following hold, besides the assumptions of Sec. II.

1) Subspace change is detected immediately, i.e. $\hat{t}_j = t_j$, and $c_{j,\text{new}}$ is known.

2) Pick a $\zeta \ll 1$. Assume that $\|L_t\|_2 \le \gamma_*$ for a $\gamma_*$ that satisfies $\gamma_* \le 1/\sqrt{r_J \zeta}$. Since $\zeta$ is very small, $\gamma_*$ can be very large.

3) Assume that $(t_{j+1} - t_j) \ge K\alpha$ for a $K$ as defined below.

4) Assume the following model on the gradual increase of $a_{t,\text{new}}$: for $t \in [t_j + (k-1)\alpha,\ t_j + k\alpha - 1]$, $\|a_{t,\text{new}}\|_2 \le v^{k-1}\gamma_{\text{new}}$ for a $1 < v \le 1.2$ and $\gamma_{\text{new}} \ll \gamma_*$.

5) Assume that projection PCA "works", i.e. its estimates satisfy $\text{SE}(P_{(j),\text{new}}, \hat{P}_{(j),\text{new},k-1}) \le 0.6^{k-1} + 0.4c\zeta$. The proof of this statement is long and complicated and is given in [27], [29].

6) Assume that projection PCA is done $K$ times, with $K$ chosen so that $0.6^{K-1} + 0.4c\zeta \le c\zeta$ (see the sketch after this list).
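For instance, the smallest such $K$ can be computed as follows (the values of $c$ and $\zeta$ are illustrative):

```python
import math

def min_K(c, zeta):
    """Smallest K with 0.6**(K-1) + 0.4*c*zeta <= c*zeta,
    i.e. 0.6**(K-1) <= 0.6*c*zeta."""
    return 1 + math.ceil(math.log(0.6 * c * zeta) / math.log(0.6))

print(min_K(c=2, zeta=1e-6))   # -> 28
```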

Assume that at $t = t_j - 1$, $\text{SE}(\hat{P}_{(j-1)}, P_{(j-1)}) \le r_{j-1}\zeta \ll 1$. We will argue below that $\text{SE}(\hat{P}_{(j)}, P_{(j)}) \le r_j\zeta$. Since $r_j \le r_0 + Jc$ is small, this error is always small and bounded.

First consider a $t \in [t_j, t_j + \alpha)$. At this time, $\hat{P}_{t-1} = \hat{P}_{(j-1)}$. Thus,
$$\|\beta_t\|_2 = \|(I - \hat{P}_{t-1}\hat{P}_{t-1}')L_t\|_2 \le \text{SE}(\hat{P}_{(j-1)}, P_{(j-1)})\|a_{t,*}\|_2 + \|a_{t,\text{new}}\|_2 \le (r_{j-1}\zeta)\gamma_* + \gamma_{\text{new}} \le \sqrt{\zeta} + \gamma_{\text{new}} \quad (11)$$

By construction, $\sqrt{\zeta}$ is very small and hence the second term in the bound is the dominant one. By the slow subspace change assumption, $\gamma_{\text{new}} \ll \|S_t\|_2$. Recall that $\beta_t$ is the "noise" seen by the sparse recovery step. The above shows that this noise is small compared to $\|S_t\|_2$. Moreover, using Lemma 2.2 and simple arguments [see [27, Lemma 6.6]], it can be shown that
$$\delta_s(\Phi_t) \le \kappa_*^2 + r_{j-1}\zeta$$
is small. These two facts, along with any RIP-based result for $\ell_1$ minimization, e.g. [39], ensure that $S_t$ is recovered accurately in this step. If the smallest nonzero entry of $S_t$ is large enough, it is possible to get a support threshold $\omega$ that ensures exact support recovery. Then, the LS step gives a very accurate final estimate of $S_t$, and it allows us to get an exact expression for $e_t := \hat{S}_t - S_t$. Since $\hat{L}_t = M_t - \hat{S}_t$, this means that $L_t$ is also recovered accurately and $e_t = L_t - \hat{L}_t$. This is then used to argue that p-PCA at $t = t_j + \alpha - 1$ "works".

Next consider $t \in [t_j + (k-1)\alpha,\ t_j + k\alpha - 1]$. At this time, $\hat{P}_{t-1} = [\hat{P}_{(j-1)},\ \hat{P}_{(j),\text{new},k-1}]$. Then, it is easy to see that
$$\|\beta_t\|_2 = \|(I - \hat{P}_{t-1}\hat{P}_{t-1}')L_t\|_2 \le \text{SE}(\hat{P}_{(j-1)}, P_{(j-1)})\|a_{t,*}\|_2 + \text{SE}(P_{(j),\text{new}}, \hat{P}_{(j),\text{new},k-1})\|a_{t,\text{new}}\|_2 \le (r_{j-1}\zeta)\gamma_* + (0.6^{k-1} + 0.4c\zeta)v^{k-1}\gamma_{\text{new}} \le \sqrt{\zeta} + 0.72^{k-1}\gamma_{\text{new}} \quad (12)$$

Ignoring the first term, in this interval $\|\beta_t\|_2 \le 0.72^{k-1}\gamma_{\text{new}}$, i.e. the noise seen by the sparse recovery step decreases exponentially with every p-PCA step. This, along with a bound on $\delta_s(\Phi_t)$ (this bound needs a more complicated argument than that for $k = 1$, see [27, Lemma 6.6]), ensures that the recovery error of $S_t$, and hence also of $\hat{L}_t = M_t - \hat{S}_t$, decreases roughly exponentially with $k$. This is then used to argue that the p-PCA error also decays roughly exponentially with $k$.
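As a quick numeric check of the decay in (12) (the values of $\gamma_{\text{new}}$ and $\zeta$ below are illustrative, not taken from the text):

```python
import numpy as np

gamma_new, zeta = 50.0, 1e-6           # illustrative values (assumed)
k = np.arange(1, 11)
beta_bound = np.sqrt(zeta) + 0.72 ** (k - 1) * gamma_new
# The bound decays roughly exponentially with k: 50.0, 36.0, 25.9, ..., ~2.6.
```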

Finally, for $t \in [t_j + K\alpha,\ t_{j+1} - 1]$, because of the choice of $K$, we have that $\text{SE}(P_{(j),\text{new}}, \hat{P}_{(j),\text{new},K}) \le c\zeta$. At this time, we set $\hat{P}_{(j)} = [\hat{P}_{(j-1)},\ \hat{P}_{(j),\text{new},K}]$. Thus,
$$\text{SE}(\hat{P}_{(j)}, P_{(j)}) \le \text{SE}(\hat{P}_{(j-1)}, P_{(j-1)}) + \text{SE}(P_{(j),\text{new}}, \hat{P}_{(j),\text{new},K}) \le r_{j-1}\zeta + c\zeta = r_j\zeta.$$

REFERENCES

[1] C. Qiu and N. Vaswani, "Real-time robust principal components' pursuit," in Allerton Conf. on Communications, Control and Computing, 2010.
[2] C. Qiu and N. Vaswani, "Recursive sparse recovery in large but correlated noise," in Allerton Conf. on Communication, Control, and Computing, 2011.
[3] H. Guo, C. Qiu, and N. Vaswani, "Practical ReProCS for separating sparse and low-dimensional signal sequences from their sum - part 1," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2014.
[4] F. De La Torre and M. J. Black, "A framework for robust subspace learning," International Journal of Computer Vision, vol. 54, pp. 117–142, 2003.
[5] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of ACM, vol. 58, no. 3, 2011.
[6] J. Wright and Y. Ma, "Dense error correction via l1-minimization," IEEE Trans. Info. Th., vol. 56, no. 7, pp. 3540–3560, 2010.
[7] I. T. Jolliffe, Principal Component Analysis, Springer, second edition, 2002.
[8] J. Wright and Y. Ma, "Dense error correction via l1-minimization," IEEE Trans. Info. Th., vol. 56, no. 7, pp. 3540–3560, July 2010.
[9] S. Roweis, "EM algorithms for PCA and SPCA," Advances in Neural Information Processing Systems, pp. 626–632, 1998.
[10] T. Zhang and G. Lerman, "A novel m-estimator for robust PCA," arXiv:1112.4863v3, to appear in Journal of Machine Learning Research, 2013.
[11] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," IEEE Trans. on Information Theory, vol. 58, no. 5, 2012.
[12] M. McCoy and J. A. Tropp, "Two proposals for robust PCA using semidefinite programming," Electronic Journal of Statistics, vol. 5, pp. 1123–1160, 2011.
[13] M. Brand, "Incremental singular value decomposition of uncertain data with missing values," in European Conference on Computer Vision, 2002, pp. 707–720.
[14] D. Skocaj and A. Leonardis, "Weighted and robust incremental method for subspace learning," in IEEE Intl. Conf. on Computer Vision (ICCV), Oct 2003, vol. 2, pp. 1494–1501.
[15] Y. Li, L. Xu, J. Morphett, and R. Jacobs, "An integrated algorithm of incremental and robust PCA," in IEEE Intl. Conf. Image Proc. (ICIP), 2003, pp. 245–248.
[16] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, "Rank-sparsity incoherence for matrix decomposition," SIAM Journal on Optimization, vol. 21, 2011.
[17] M. B. McCoy and J. A. Tropp, "Sharp recovery bounds for convex deconvolution, with applications," arXiv:1205.1580, accepted to J. Found. Comput. Math., 2012.
[18] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky, "The convex geometry of linear inverse problems," Foundations of Computational Mathematics, no. 6, 2012.
[19] Y. Hu, S. Goud, and M. Jacob, "A fast majorize-minimize algorithm for the recovery of sparse and low-rank matrices," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 742–753, Feb 2012.
[20] A. E. Waters, A. C. Sankaranarayanan, and R. G. Baraniuk, "SpaRCS: Recovering low-rank and sparse matrices from compressive measurements," in Proc. of Neural Information Processing Systems (NIPS), 2011.
[21] E. Richard, P.-A. Savalle, and N. Vayatis, "Estimation of simultaneously sparse and low rank matrices," arXiv:1206.6474, appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
[22] D. Hsu, S. M. Kakade, and T. Zhang, "Robust matrix decomposition with sparse corruptions," IEEE Transactions on Information Theory, vol. 57, no. 11, pp. 7221–7234, 2011.
[23] M. Mardani, G. Mateos, and G. B. Giannakis, "Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies," IEEE Trans. Info. Th., 2013.
[24] J. Wright, A. Ganesh, K. Min, and Y. Ma, "Compressive principal component pursuit," Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.
[25] A. Ganesh, K. Min, J. Wright, and Y. Ma, "Principal component pursuit with reduced linear measurements," in IEEE International Symposium on Information Theory (ISIT), 2012, pp. 1281–1285.
[26] M. Tao and X. Yuan, "Recovering low-rank and sparse components of matrices from incomplete and noisy observations," SIAM Journal on Optimization, vol. 21, no. 1, pp. 57–81, 2011.
[27] C. Qiu, N. Vaswani, B. Lois, and L. Hogben, "Recursive robust PCA or recursive sparse recovery in large but structured noise," under revision for IEEE Trans. Info. Th., also at arXiv:1211.3754 [cs.IT]; shorter versions in ICASSP 2013 and ISIT 2013.
[28] C. Qiu and N. Vaswani, "Recursive sparse recovery in large but structured noise - part 2," in IEEE Intl. Symp. Info. Th. (ISIT), 2013.
[29] "Blinded title," in double-blind conference submission, 2014.
[30] N. Vaswani and W. Lu, "Modified-CS: Modifying compressive sensing for problems with partially known support," IEEE Trans. Signal Processing, September 2010.
[31] J. He, L. Balzano, and A. Szlam, "Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video," in IEEE Conf. on Comp. Vis. Pat. Rec. (CVPR), 2012.
[32] J. Feng, H. Xu, and S. Yan, "Online robust PCA via stochastic optimization," in Adv. Neural Info. Proc. Sys. (NIPS), 2013.
[33] J. Feng, H. Xu, S. Mannor, and S. Yan, "Online PCA for contaminated data," in Adv. Neural Info. Proc. Sys. (NIPS), 2013.
[34] G. Mateos and G. Giannakis, "Robust PCA as bilinear decomposition with outlier-sparsity regularization," IEEE Trans. Sig. Proc., Oct 2012.
[35] M. Mardani, G. Mateos, and G. Giannakis, "Dynamic anomalography: Tracking network anomalies via sparsity and low rank," J. Sel. Topics in Sig. Proc., Feb 2013.
[36] K. Jia, T.-H. Chan, and Y. Ma, "Robust and practical face recognition via structured sparsity," in Eur. Conf. on Comp. Vis. (ECCV), 2012.
[37] E. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Info. Th., vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[38] E. J. Candes and B. Recht, "Exact matrix completion via convex optimization," Commun. ACM, vol. 55, no. 6, pp. 111–119, 2012.
[39] E. Candes, "The restricted isometry property and its implications for compressed sensing," Compte Rendus de l'Academie des Sciences, Paris, Serie I, pp. 589–592, 2008.
[40] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal of Scientific Computing, vol. 20, pp. 33–61, 1998.
[41] E. Candes and T. Tao, "The Dantzig selector: statistical estimation when p is much larger than n," Annals of Statistics, vol. 35, no. 6, pp. 2313–2351, 2007.
[42] A. Khajehnejad, W. Xu, A. Avestimehr, and B. Hassibi, "Weighted ℓ1 minimization for sparse recovery with prior information," in IEEE Intl. Symp. Info. Th. (ISIT), 2009.
[43] M. P. Friedlander, H. Mansour, R. Saab, and O. Yilmaz, "Recovering compressively sampled signals using partial support information," IEEE Trans. Info. Th., vol. 58, no. 2, pp. 1122–1134, 2012.
[44] J. Zhan and N. Vaswani, "Separating sparse and low-dimensional signal sequences from time-varying undersampled projections of their sums," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2013.
[45] B. Lois, N. Vaswani, and C. Qiu, "Performance guarantees for undersampled recursive sparse recovery in large but structured noise," in GlobalSIP, 2013.
[46] J. Yang and Y. Zhang, "Alternating direction algorithms for l1 problems in compressive sensing," Tech. Rep., Rice University, June 2010.
[47] Z. Lin, A. Ganesh, J. Wright, L. Wu, M. Chen, and Y. Ma, "Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix," Tech. Rep., University of Illinois at Urbana-Champaign, August 2009.
[48] Z. Lin, M. Chen, and Y. Ma, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," Tech. Rep., University of Illinois at Urbana-Champaign, November 2009.
[49] H. V. Poor, An Introduction to Signal Detection and Estimation, Springer, second edition, 1994.
[50] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comp. Harmonic Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[51] K. Lee and Y. Bresler, "ADMiRA: Atomic decomposition for minimum rank approximation," IEEE Transactions on Information Theory, vol. 56, no. 9, September 2010.