A multilevel iterated-shrinkage approach to l1 penalized least-squares
Eran Treister and Irad Yavneh, Computer Science, Technion (with thanks to Michael Elad)


A Multilevel Approach for Sparse & Redundant Representation of Signals

Part I: Background
- Sparse Representation of Signals
- Applications
- Problem Definition
- Existing Methods

Example: Image Denoising

    y = f + v
    (Noisy Signal = Signal + Additive Noise)

Denoising: recover the signal f from the noisy measurement y.

Example: Image Denoising
Many denoising algorithms minimize a functional of the form

    F(f) = 1/2 ||f - y||_2^2 + mu * rho(f),

where the first term measures fidelity to the noisy measurement y and rho encodes a prior on the signal (regularization).

Sparse Representation Modeling

    f = A x

A - dictionary (a matrix); x - sparse representation; f - signal.
The signal f is represented by only a few columns of A.
The matrix A is redundant (# columns > # rows).
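A minimal numpy sketch of this model (the sizes n, m, k and the column normalization are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 256, 5             # m > n: the dictionary is redundant
A = rng.standard_normal((n, m))  # dictionary (matrix)
A /= np.linalg.norm(A, axis=0)   # normalize the columns

x = np.zeros(m)                                # sparse representation
S = rng.choice(m, size=k, replace=False)       # support: the few active columns
x[S] = rng.standard_normal(k)

f = A @ x   # the signal is a combination of only |S| columns of A
```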


Sparse Representation Modeling

    f = A x = A_S x_S

x_S - support sub-vector; A_S - support sub-matrix.
The support is the set of columns that comprise the signal: S = supp{x} = {i : x_i ≠ 0}.

Denoising by sparse representation

    y = A x + v

Reconstruct the clean signal (image) f from the noisy signal y, where v is additive noise.

Applications
- De-noising
- De-blurring
- In-painting
- De-mosaicing
- Computed Tomography
- Image scale-up & super-resolution
- And more

Formulation 1
The straightforward way to formulate sparse representation is by constrained minimization:

    min_x ||x||_0   subject to   ||A x - y||_2 <= epsilon,

where ||x||_0 counts the nonzeros of x.

The problem is not convex and may have many local minima. Solutions are approximated by greedy algorithms, as sketched below.
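The slides do not fix a particular greedy method; this is a minimal sketch of one standard choice, Orthogonal Matching Pursuit (OMP), assuming normalized columns:

```python
import numpy as np

def omp(A, y, k):
    """Greedily approximate min ||x||_0 s.t. Ax ~ y, with at most k nonzeros."""
    m = A.shape[1]
    r, S = y.copy(), []
    for _ in range(k):
        i = int(np.argmax(np.abs(A.T @ r)))   # column most correlated with the residual
        S.append(i)
        xS, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)  # refit on the current support
        r = y - A[:, S] @ xS                  # update the residual
    x = np.zeros(m)
    x[S] = xS
    return x
```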

Formulation 2 - Basis Pursuit
A convex relaxation of the previous problem:

    min_x ||x||_1   subject to   ||A x - y||_2 <= epsilon.

||x||_1 is the l1 norm; the minimizer x is typically sparse. The problem is convex and has a convex set of equivalent solutions.

Alternative Formulation 2: l1 penalized least-squares

    min_x F(x) = 1/2 ||A x - y||_2^2 + mu ||x||_1

F(x) is convex. A bigger mu yields a sparser minimizer. The gradient is discontinuous wherever x_i = 0.

General-purpose optimization tools struggle with this non-smoothness.
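Iterated-shrinkage methods handle the non-smooth term directly, combining a gradient step on the quadratic term with elementwise soft thresholding. A generic ISTA/SSF-style sketch (the standard template, not this paper's specific scheme):

```python
import numpy as np

def soft(z, t):
    """Elementwise soft thresholding: the proximal map of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, mu, iters=100):
    """Minimize F(x) = 0.5*||Ax - y||^2 + mu*||x||_1 by iterated shrinkage."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient term
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft(x - (A.T @ (A @ x - y)) / L, mu / L)
    return x
```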


Iterated Shrinkage Methods
- Bound-Optimization and EM [Figueiredo & Nowak '03]
- Surrogate-Separable-Function (SSF) [Daubechies et al. '04]
- Parallel-Coordinate-Descent (PCD) [Elad '05], [Matalon et al. '06]
- IRLS-based algorithm [Adeyemi & Davies '06]
- Gradient Projection Sparse Reconstruction (GPSR) [Figueiredo et al. '07]
- Sparse Reconstruction by Separable Approximation (SpaRSA) [Wright et al. '09]

Iterated Shrinkage
- Coordinate Descent (CD): updates each scalar variable in turn so as to minimize the objective.
- Parallel Coordinate Descent (PCD): applies the CD update simultaneously to all variables; based on the projection of the residual, A^T(Ax - y). (One step is sketched below.)
- PCD + (non-linear) Conjugate Gradient (CG-PCD) [Zibulevsky, Elad '10]: uses two consecutive PCD steps to calculate the next one.
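A minimal sketch of one PCD step under these definitions (the backtracking line search is a simplification; the function names are illustrative):

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def F(A, y, mu, x):
    return 0.5 * np.sum((A @ x - y) ** 2) + mu * np.sum(np.abs(x))

def pcd_step(A, y, mu, x):
    """One Parallel Coordinate Descent step: all scalar CD updates applied
    at once via the projected residual A^T(Ax - y), then a line search."""
    d = np.sum(A * A, axis=0)            # diag(A^T A): per-column squared norms
    g = A.T @ (A @ x - y)                # projected residual
    x_hat = soft(x - g / d, mu / d)      # simultaneous scalar CD minimizers
    p = x_hat - x                        # search direction
    a = 1.0                              # crude backtracking on F (illustrative)
    while F(A, y, mu, x + a * p) > F(A, y, mu, x) and a > 1e-8:
        a *= 0.5
    return x + a * p
```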

Part II: A Multilevel Algorithm for Sparse Representation

The main idea:
- Use existing iterated shrinkage methods (as relaxations).
- Improve the current approximation by using a reduced (lower-level) dictionary.


The main idea: reducing the dimension of A
- The solution is sparse, so most columns will not end up in the support!
- At each stage, many columns are highly unlikely to contribute to the minimizer.
- Such columns can be temporarily dropped, resulting in a smaller problem.

Reducing the dimension of A


C - the lower-level subset of columns.

Fine-level problem:

    min_x F(x) = 1/2 ||A x - y||_2^2 + mu ||x||_1

Assume we have a prolongation P that satisfies x = P x_c, where P places the lower-level unknowns x_c into the columns of C and zeros elsewhere. Since P only places entries, ||P x_c||_1 = ||x_c||_1, so substituting for x gives the lower-level problem

    min_{x_c} 1/2 ||A P x_c - y||_2^2 + mu ||x_c||_1,

whose dictionary A_C = A P consists of the columns in C.

Reducing the problem

The choice of C - likelihood to enter the support
The residual is defined by

    r = y - A x.

A column is likely to enter the support if it has a high inner product with r (a greedy approach).

Likely columns: those currently not in the support that have the largest likelihood |a_i^T r|.

Lower-level dictionary: choosing m_c = m/2 columns, as sketched below.
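A sketch of this selection rule (the function name and the handling of an oversized support are assumptions):

```python
import numpy as np

def choose_C(A, y, x, frac=0.5):
    """Select the lower-level subset C: all columns currently in the support,
    plus the off-support columns with the largest |a_i^T r|, up to m_c = frac*m."""
    m = A.shape[1]
    mc = int(frac * m)                  # m_c = m/2 by default
    r = y - A @ x                       # current residual
    supp = np.flatnonzero(x)            # columns already in the support
    score = np.abs(A.T @ r)             # likelihood to enter the support
    score[supp] = -np.inf               # support columns are kept regardless
    k = max(mc - supp.size, 0)
    fill = np.argsort(score)[::-1][:k]  # best off-support candidates
    return np.sort(np.concatenate([supp, fill]))
```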


The multilevel cycle
Repeated iteratively until convergence; a recursive sketch appears below.
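This sketch reuses choose_C and any shrinkage relaxation (e.g., pcd_step) from the sketches above; the number of relaxations per level and the coarsest-level size are illustrative choices, not the paper's:

```python
import numpy as np

def ml_cycle(A, y, mu, x, relax, nu=1, min_cols=32):
    """One multilevel cycle for F(x) = 0.5||Ax - y||^2 + mu||x||_1:
    restrict to the likely subset C, solve the smaller problem recursively,
    prolong the result back (zero outside C), then relax on the full problem."""
    m = A.shape[1]
    if m <= min_cols:                   # lowest level: "solve" by many relaxations
        for _ in range(20):
            x = relax(A, y, mu, x)
        return x
    C = choose_C(A, y, x)               # lower-level subset of columns
    xc = ml_cycle(A[:, C], y, mu, x[C], relax, nu, min_cols)
    x = np.zeros_like(x)
    x[C] = xc                           # prolongation: P places x_c into C
    for _ in range(nu):                 # post-relaxation on this level
        x = relax(A, y, mu, x)
    return x
```

Usage would be, e.g., x = ml_cycle(A, y, mu, x, relax=pcd_step), repeated until the stopping criterion is met.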

Theoretical Properties
Inter-level correspondence: by the substitution above, the lower-level functional agrees with the fine-level one on prolonged vectors, F_c(x_c) = F(P x_c).
Direct solution (two-level): if C contains the support of the fine-level minimizer, then prolonging the lower-level minimizer solves the fine-level problem exactly.


Theoretical Properties
No stagnation (two-level): the two-level cycle cannot stall at a point that is not a minimizer; if the current approximation is not optimal, F is strictly reduced.

Complementary roles: the relaxation and the lower-level correction reduce complementary components of the error.


Theoretical Properties
Monotonicity: no step of the cycle increases the functional, so F(x_new) <= F(x).

Convergence: guaranteed under the assumption that Relax(x) reduces F(x) proportionally to the square of its gradient.

Theoretical Properties
C-selection guarantee. Assume the columns of A are normalized; x is the current approximation, x* the solution, and C is chosen using x with |C| > |supp{x*}|. Under these assumptions, the selection rule is guaranteed to include the solution's support once the approximation is sufficiently accurate.


Initialization
When starting with a zero initial guess, relaxations tend to initially generate supports that are too rich, which might hamper V-cycle efficiency. We adopt a full multigrid (FMG) algorithm: the initial approximation is first computed on the lowest level and then prolonged and refined level by level.


Numerical Results - synthetic denoising experiment
Experiments with various dictionaries A of size n x m; n = 1024, m = 4096.
- Initial support S randomly chosen, of size 0.1n.
- x_S - random vector ~ N(0, I).
- f = A_S x_S.
- Additive noise: v ~ N(0, sigma^2 I), sigma = 0.02.
- y = f + v.
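The same setup transcribed into numpy (the seed and the column normalization are additions for reproducibility; the slides do not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 1024, 4096, 0.02
A = rng.standard_normal((n, m))        # Experiment 1: dense A_ij ~ N(0,1)
A /= np.linalg.norm(A, axis=0)

S = rng.choice(m, size=int(0.1 * n), replace=False)  # random support of size 0.1n
x = np.zeros(m)
x[S] = rng.standard_normal(S.size)     # x_S ~ N(0, I)

f = A @ x                              # clean signal f = A_S x_S
y = f + sigma * rng.standard_normal(n) # y = f + v, v ~ N(0, sigma^2 I)
```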

Numerical Results
Stopping criterion: as in [Loris 2009].
One-level methods:
- CD+ - CD with linesearch.
- PCD; and CG - non-linear CG with PCD [Zibulevsky & Elad 2010].
- SpaRSA [Wright et al. '09].
ML - the multilevel method:
- ML-CD - the multilevel framework with CD+ as the shrinkage iteration.
- ML-CG - the multilevel framework with CG as the shrinkage iteration.


Experiment 1: Random Normal
A random dense n x m matrix, A_ij ~ N(0,1).

Experiment 2: Random ±1

Experiment 3: Ill-conditioned
A random dense n x m matrix, A_ij ~ N(0,1), with the singular values manipulated so that A becomes ill-conditioned [Loris 2009, Zibulevsky & Elad 2010].


Experiment 3: Ill-conditioned (continued)

Experiment 4: Similar columns
A = [B | C], where B is random and C is a perturbed rank-1 matrix.

Conclusions & Future work
- A new multilevel approach was developed.
- It exploits the sparsity of the solution.
- It accelerates existing iterated shrinkage methods.
Future work:
- Improvements: faster lowest-level solutions; more suitable iterated shrinkage schemes.
- Handling non-sparse solutions (different priors).
- A multilevel method for fast-operator dictionaries.

Next step: Covariance Selection
Given a few random vectors

    y_1, ..., y_K ~ N(mu, Sigma),

we wish to estimate the inverse of the covariance, Sigma^{-1}, assuming it is sparse. From probability theory:

    Sigma = E[(y - mu)(y - mu)^T].


Problem formulation
Maximum likelihood (ML) estimation - we maximize the likelihood of the samples,

    max_{mu, Sigma} prod_{k=1..K} p(y_k; mu, Sigma).

Likeliest mean: mu_hat = (1/K) * sum_k y_k.
Likeliest covariance: Sigma_hat = (1/K) * sum_k (y_k - mu_hat)(y_k - mu_hat)^T.
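A quick numpy check of these closed-form estimates (the function and argument names are illustrative):

```python
import numpy as np

def gaussian_mle(Y):
    """ML estimates for i.i.d. samples y_1..y_K ~ N(mu, Sigma).
    Y holds one sample per column (shape n x K)."""
    K = Y.shape[1]
    mu = Y.mean(axis=1, keepdims=True)   # likeliest mean: the sample average
    D = Y - mu
    Sigma = (D @ D.T) / K                # likeliest covariance (1/K, the biased form)
    return mu.ravel(), Sigma
```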


Problem formulation
Setting the gradient of J to zero yields:

However, K