
Research Article
An Analysis Dictionary Learning Algorithm under a Noisy Data Model with Orthogonality Constraint

The Scientific World Journal, Volume 2014, Article ID 852978, 8 pages
http://dx.doi.org/10.1155/2014/852978

Ye Zhang,1 Tenglong Yu,1 and Wenwu Wang2

1 Department of Electronic Information Engineering, Nanchang University, Nanchang 330031, China
2 Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, UK

Correspondence should be addressed to Ye Zhang; zhangye@ncu.edu.cn

Received 25 January 2014; Revised 13 June 2014; Accepted 26 June 2014; Published 13 July 2014

Academic Editor: Francesco Camastra

Copyright © 2014 Ye Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Two common problems are often encountered in analysis dictionary learning (ADL) algorithms. The first one is that the original clean signals for learning the dictionary are assumed to be known, whereas in practice they need to be estimated from noisy measurements. This, however, renders a computationally slow optimization process and potentially unreliable estimation (if the noise level is high), as represented by the Analysis K-SVD (AK-SVD) algorithm. The other problem is the trivial solution to the dictionary, for example, the null dictionary matrix that may be given by a dictionary learning algorithm, as discussed in the learning overcomplete sparsifying transform (LOST) algorithm. Here we propose a novel optimization model and an iterative algorithm to learn the analysis dictionary, where we directly employ the observed data to compute the approximate analysis sparse representation of the original signals (leading to a fast optimization procedure) and enforce an orthogonality constraint on the optimization criterion to avoid trivial solutions. Experiments demonstrate the competitive performance of the proposed algorithm as compared with three baselines, namely, the AK-SVD, LOST, and NAAOLA algorithms.

1. Introduction

Sparse signal representation has been the focus of much recent research in signal processing fields such as image denoising, compression, and source separation. Sparse representation is often established on the synthesis model [1-3]. Considering a signal $\mathbf{x} \in \mathbb{R}^M$, the synthesis sparse representation of $\mathbf{x}$ can be described as

$$\mathbf{x} = \mathbf{D}\mathbf{a} \quad \text{with } \|\mathbf{a}\|_0 = k, \tag{1}$$

where $\|\cdot\|_0$ represents the $\ell_0$ pseudonorm, defined as the number of nonzero elements of a vector, $\mathbf{D} \in \mathbb{R}^{M \times N}$ is a possibly overcomplete dictionary ($N \ge M$), and $\mathbf{a} \in \mathbb{R}^N$, containing the coding coefficients, is assumed to be sparse with $k \ll N$.

Recently, an alternative form of sparse representation model, called the analysis model, was proposed in [4-14]. In this model, an overcomplete analysis dictionary or operator $\mathbf{\Omega} \in \mathbb{R}^{P \times M}$ ($P \ge M$) is sought to transform $\mathbf{x} \in \mathbb{R}^M$ to a high dimensional space; that is,

$$\mathbf{\Omega}\mathbf{x} = \mathbf{z} \quad \text{with } \|\mathbf{z}\|_0 = P - l, \tag{2}$$

where $\mathbf{z} \in \mathbb{R}^P$ is called the analysis representation of $\mathbf{x}$ and is assumed to be sparse, and $l$ is the cosparsity of the analysis model, that is, the number of zeros in the vector $\mathbf{z}$.
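To make the cosparsity in (2) concrete, here is a toy illustration (entirely ours, not from the paper) in Python/NumPy, using a finite-difference operator as the analysis dictionary; a piecewise-constant signal is highly cosparse under this operator:

```python
import numpy as np

# Toy illustration of (2): a piecewise-constant signal is highly cosparse
# under a finite-difference analysis operator (example entirely ours).
M = 8
x = np.array([1., 1., 1., 5., 5., 5., 2., 2.])
Omega = np.eye(M - 1, M, 1) - np.eye(M - 1, M)  # row i computes x[i+1] - x[i]
z = Omega @ x
l = int(np.sum(z == 0))                          # cosparsity l: number of zeros in z
print(z, l)                                      # z has only 2 nonzeros; l = 5
```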

Both models can be used to reconstruct an unknown signal $\mathbf{x}$ from the corrupted measurement $\mathbf{y} \in \mathbb{R}^M$; that is,

$$\mathbf{y} = \mathbf{x} + \mathbf{v}, \tag{3}$$

where $\mathbf{v} \in \mathbb{R}^M$ is a Gaussian noise vector. This is an ill-posed linear inverse problem, which has been studied extensively [15, 16]. Using the synthesis model, the original signal $\mathbf{x}$ can be estimated by solving

$$\hat{\mathbf{a}} = \arg\min_{\mathbf{a}} \|\mathbf{a}\|_0 \quad \text{subject to } \|\mathbf{y} - \mathbf{D}\mathbf{a}\|_2^2 \le \varepsilon, \tag{4}$$

and then calculated as $\hat{\mathbf{x}} = \mathbf{D}\hat{\mathbf{a}}$, where $\varepsilon$ denotes the noise floor.



It is now well known [6] that the original signal can also be recovered by solving the following analysis-model-based optimization problem:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{\Omega}\mathbf{x}\|_0 \quad \text{subject to } \|\mathbf{y} - \mathbf{x}\|_2^2 \le \varepsilon. \tag{5}$$

For a square and invertible dictionary, the synthesis and the analysis models are identical, with $\mathbf{D} = \mathbf{\Omega}^{-1}$ [9]. However, the two models depart if we concentrate on the redundant case ($P > M$ and $N > M$). In the analysis model, the signal $\mathbf{x}$ is described by the zero elements of $\mathbf{z}$; that is, the zero entries of $\mathbf{z}$ define a subspace to which the signal $\mathbf{x}$ belongs, as opposed to the few nonzero entries of $\mathbf{a}$ in the synthesis model.

The performance of both the synthesis and analysis models hinges on the representation of the signals with an appropriately chosen dictionary. In the past decade, a great deal of effort has been dedicated to learning the dictionary for the synthesis model; however, the ADL problem has received less attention, with only a few algorithms proposed recently [5, 6, 10, 17-19]. In these works, the dictionary $\mathbf{\Omega}$ is often learned from the observed signals $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_K] \in \mathbb{R}^{M \times K}$, measured in the presence of additive noise; that is,

$$\mathbf{Y} = \mathbf{X} + \mathbf{V}, \tag{6}$$

where $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K] \in \mathbb{R}^{M \times K}$ contains the original signals, $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_K] \in \mathbb{R}^{M \times K}$ is a Gaussian noise matrix, and $K$ is the number of signals.

Exploiting the fact that each row of the dictionary $\mathbf{\Omega}$ is orthogonal to a subset of the training signals $\mathbf{X}$, a sequential minimal eigenvalue based ADL algorithm is proposed in [5]. Once the subset is found, the corresponding row of the dictionary can be updated with the eigenvector associated with the smallest eigenvalue of the autocorrelation matrix of these signals. However, as $P$ increases, so does the computational cost of the method. In [6-8], an $\ell_1$-norm penalty function is applied to $\mathbf{\Omega}\mathbf{x}$, and a projected subgradient algorithm (called NAAOLA) is proposed for analysis operator learning. These works employ a uniformly normalized tight frame as a constraint on the dictionary to avoid the trivial solution (e.g., $\mathbf{\Omega} = \mathbf{0}$), which, however, limits the range of possible $\mathbf{\Omega}$ to be learned. In [10], the Analysis K-SVD (AK-SVD) algorithm is proposed for ADL. Keeping $\mathbf{\Omega}$ fixed, the optimal backward greedy (OBG) algorithm is employed to estimate a submatrix of $\mathbf{\Omega}$, whose rows are then used to determine a submatrix of $\mathbf{Y}$, and the eigenvector associated with the smallest eigenvalue of this submatrix is then used to update $\mathbf{\Omega}$. A generalized analysis model, that is, the transform model, is proposed in the learning sparsifying transform (LST) algorithm [11]. Unlike the analysis model (2), the sparse representation of a signal in the LST model is not constrained to lie in the range space of $\mathbf{\Omega}$. Such a generalization allows the transform model to accommodate a wider class of signals. A closed-form solution to the LST problem is also developed in [13]. The LST algorithm [11] is further extended to the overcomplete case, leading to the LOST algorithm in [12]. The transform K-SVD algorithm, proposed recently in [14], is essentially a combination of the ideas in the LOST and AK-SVD algorithms.

There are two problems that have been observed or studied in some of the ADL algorithms discussed above. The first is the computational cost of the optimization process: for example, in the AK-SVD algorithm, $\mathbf{X}$ needs to be estimated before learning the dictionary, and as a result the optimization process becomes computationally demanding. The second is the trivial solutions to the dictionary, which may lead to spurious sparse representations. To eliminate such trivial solutions, extra constraints are required, such as the full-rank constraint on $\mathbf{\Omega}$ employed in [11-13] and the mutual coherence constraint on the rows of $\mathbf{\Omega}$ in [12, 14].

In this paper, we propose a new optimization model and algorithm, attempting to provide alternative solutions to the two potential problems mentioned above. More specifically, similar in spirit to our recent work [17], we directly use the observed data to compute the approximate analysis sparse representation of the original signals, without having to preestimate $\mathbf{X}$ for learning the dictionary. This leads to a computationally very efficient algorithm. Moreover, different from the LOST algorithm, we enforce an orthogonality constraint on $\mathbf{\Omega}$, which, as shown later, leads to an improved learning performance.

The paper is organized as follows. In Section 2, we describe the proposed ADL model and algorithm. In Section 3, we show experimental results, before concluding the paper in Section 4.

2. The Proposed ADL Algorithm

In some of the algorithms discussed above, such as [10], the optimization criterion is based on $\mathbf{X}$. In practice, however, $\mathbf{X}$ is unknown and has to be estimated from $\mathbf{Y}$. This, unfortunately, results in a computationally very slow optimization process, as shown in Section 3.1. To address this issue, we introduce a new model and algorithm for ADL, where, similar to [17], we use $\|\mathbf{\Omega}\mathbf{Y}\|_1$ as an approximation to $\|\mathbf{\Omega}\mathbf{X}\|_1$, which is further replaced by $\|\mathbf{Z}\|_1$ and then used as a new constraint for the reconstruction term $\mathbf{Z} = \mathbf{\Omega}\mathbf{Y}$. This leads to the following model of ADL:

$$\min_{\mathbf{\Omega}, \mathbf{Z}} \left( \frac{1}{2}\|\mathbf{Z} - \mathbf{\Omega}\mathbf{Y}\|_F^2 + \lambda\|\mathbf{Z}\|_1 \right), \tag{7}$$

where $\lambda > 0$ is a regularization parameter. According to the analysis model (2), $\mathbf{Z}$ would be an analysis coefficient matrix only if its columns lay in the range space of the analysis dictionary $\mathbf{\Omega}$. However, $\mathbf{Z} = \mathbf{\Omega}\mathbf{Y}$ is not explicitly enforced in (7), and hence $\mathbf{Z}$ is called the transform coefficient matrix, as in [19]. Using (7), the analysis dictionary $\mathbf{\Omega}$ is learned from $\mathbf{Y}$ without having to preestimate $\mathbf{X}$, as opposed to the AK-SVD algorithm, so the computational effort of estimating $\mathbf{X}$ from $\mathbf{Y}$ is avoided.

Table 1: Image denoising results (PSNR in dB).

sigma  Noisy (dB)  ADL method  Lena   House  Peppers
5      34.15       OIHT-ADL    38.26  38.08  37.70
                   NAAOLA      37.36  37.04  35.93
                   AK-SVD      38.44  39.20  37.93
                   LOST        38.16  38.50  37.47
10     28.13       OIHT-ADL    35.02  34.79  33.89
                   NAAOLA      32.73  32.61  31.12
                   AK-SVD      34.85  35.32  33.82
                   LOST        34.75  34.77  33.61
15     24.61       OIHT-ADL    33.17  33.05  31.69
                   NAAOLA      30.87  30.82  29.22
                   AK-SVD      32.59  32.98  31.28
                   LOST        32.88  32.82  31.38
20     22.11       OIHT-ADL    31.83  31.71  30.13
                   NAAOLA      29.65  29.22  27.40
                   AK-SVD      31.38  31.51  29.76
                   LOST        31.64  31.47  29.84

With the model (7), however, there is a trivial solution, that is, $\mathbf{\Omega} = \mathbf{0}$, $\mathbf{Z} = \mathbf{0}$. To avoid such a solution, additional constraints on $\mathbf{\Omega}$ are required. In [12], a full column rank constraint is imposed on $\mathbf{\Omega}$ through the use of a negative log determinant of $\mathbf{\Omega}^T\mathbf{\Omega}$. However, the inverse of $\mathbf{\Omega}^T\mathbf{\Omega}$ needs to be computed in the conjugate gradient method in [12], and this inverse may affect the stability of the numerical computation when the initial $\mathbf{\Omega}^T\mathbf{\Omega}$ is ill conditioned. To address this problem, in our work a function defined on $\mathbf{\Omega}^T\mathbf{\Omega} = \mathbf{I}$ is employed as a constraint term, which enforces $\mathbf{\Omega}$ to be a full column rank matrix, based on the fact that the ranks of $\mathbf{\Omega}$ and of its corresponding Gram matrix are equal. This leads to the following new optimization criterion:

$$\min_{\mathbf{\Omega}, \mathbf{Z}} \left( \frac{1}{2}\|\mathbf{Z} - \mathbf{\Omega}\mathbf{Y}\|_F^2 + \lambda\|\mathbf{Z}\|_1 \right) \quad \text{subject to } \mathbf{\Omega}^T\mathbf{\Omega} = \mathbf{I}. \tag{8}$$

Using a Lagrangian multiplier $\gamma$, the optimization problem can be reformulated as

$$\min_{\mathbf{\Omega}, \mathbf{Z}} \left( \frac{1}{2}\|\mathbf{Z} - \mathbf{\Omega}\mathbf{Y}\|_F^2 + \lambda\|\mathbf{Z}\|_1 + \frac{\gamma}{4}\left\|\mathbf{\Omega}^T\mathbf{\Omega} - \mathbf{I}\right\|_F^2 \right). \tag{9}$$
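For concreteness, the cost in (9) can be evaluated directly; the following is a minimal NumPy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def oiht_adl_objective(Z, Omega, Y, lam, gamma):
    """Evaluate the cost in (9):
    0.5*||Z - Omega Y||_F^2 + lam*||Z||_1 + (gamma/4)*||Omega^T Omega - I||_F^2."""
    M = Omega.shape[1]
    fit = 0.5 * np.linalg.norm(Z - Omega @ Y, 'fro') ** 2
    sparsity = lam * np.abs(Z).sum()
    gram_penalty = (gamma / 4.0) * np.linalg.norm(Omega.T @ Omega - np.eye(M), 'fro') ** 2
    return fit + sparsity + gram_penalty
```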

It is worth noting that our proposed objective function(9) is essentially different from the one used in [7] In [7] theobjective function is defined as follows

min(ΩX1 +120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865+

120582

4

sum

119894

1003817100381710038171003817100381710038171003817

w119879119894w119894minus

119872

119875

1003817100381710038171003817100381710038171003817

2

2

) (10)

where $\mathbf{w}_i$ is the $i$th row of $\mathbf{\Omega}$. With (10), the analysis dictionary is learned from the clean signals $\mathbf{X}$, and the sparsity constraint is enforced on $\mathbf{\Omega}\mathbf{X}$. In our proposed ADL algorithm, however, the analysis dictionary is learned directly from the noisy observed signals $\mathbf{Y}$ and can thus tolerate some sparsification errors, as in [11]. The results of the experiments demonstrate that our proposed approach is more robust.

Note also that, when $\mathbf{Z}$ is sparse, the minimization of $\|\mathbf{\Omega}\mathbf{Y}\|_1$ can be achieved by minimizing $\|\mathbf{Z} - \mathbf{\Omega}\mathbf{Y}\|_F^2$ subject to the sparsity constraint. However, both $\mathbf{Z}$ and $\mathbf{\Omega}$ are unknown. To solve the problem, we propose an iterative method that alternately updates the estimates of $\mathbf{Z}$ and $\mathbf{\Omega}$.

2.1. Estimating Z. Given the dictionary $\mathbf{\Omega}$, we first consider the optimization of $\mathbf{Z}$ only. In this case, the objective function (9) reduces to

$$\hat{\mathbf{Z}} = \arg\min_{\mathbf{Z}} \left( \frac{1}{2}\|\mathbf{Z} - \mathbf{\Omega}\mathbf{Y}\|_F^2 + \lambda\|\mathbf{Z}\|_1 \right). \tag{11}$$

The first-order optimality condition for $\mathbf{Z}$ implies that

$$\mathbf{Z} - \mathbf{\Omega}\mathbf{Y} + \lambda\,\mathrm{sign}(\mathbf{Z}) = \mathbf{0}. \tag{12}$$

Therefore, we have

$$Z_{ij} = \begin{cases} (\mathbf{\Omega}\mathbf{Y})_{ij} - \lambda, & (\mathbf{\Omega}\mathbf{Y})_{ij} > \lambda, \\ (\mathbf{\Omega}\mathbf{Y})_{ij} + \lambda, & (\mathbf{\Omega}\mathbf{Y})_{ij} < -\lambda, \\ 0, & \text{otherwise}, \end{cases} \tag{13}$$

where $i = 1, 2, \ldots, P$ and $j = 1, 2, \ldots, K$ are the indices of the matrix elements. The above solution for $\mathbf{Z}$ is well known as soft thresholding. Indeed, the sparsity constraint $\|\mathbf{Z}\|_1$ is only an approximation to $\|\mathbf{Z}\|_0$. In order to promote sparsity and to improve the approximation, one could instead use the hard thresholding method [20] as an alternative, that is, setting the smallest components of the vectors to zero while retaining the others:

$$Z_{ij} = \begin{cases} (\mathbf{\Omega}\mathbf{Y})_{ij}, & |(\mathbf{\Omega}\mathbf{Y})_{ij}| > \lambda, \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$

As such, the solution obtained using the constraint $\|\mathbf{Z}\|_1$ will be closer to that using $\|\mathbf{Z}\|_0$.
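Both thresholding rules act elementwise on $\mathbf{\Omega}\mathbf{Y}$ and are straightforward to implement; a sketch of (13) and (14) in NumPy (function names are ours):

```python
import numpy as np

def soft_threshold(B, lam):
    """Soft thresholding (13): shrink each element of B = Omega @ Y by lam."""
    return np.sign(B) * np.maximum(np.abs(B) - lam, 0.0)

def hard_threshold(B, lam):
    """Hard thresholding (14): keep elements with |B_ij| > lam, zero the rest."""
    return np.where(np.abs(B) > lam, B, 0.0)
```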

2.2. Dictionary Learning. For a given $\mathbf{Z}$, the objective function (9) is nonconvex with respect to $\mathbf{\Omega}$, similar to the objective function (10); that is,

$$\hat{\mathbf{\Omega}} = \arg\min_{\mathbf{\Omega}} f(\mathbf{\Omega}) = \arg\min_{\mathbf{\Omega}} \left( \frac{1}{2}\|\mathbf{Z} - \mathbf{\Omega}\mathbf{Y}\|_F^2 + \frac{\gamma}{4}\left\|\mathbf{\Omega}^T\mathbf{\Omega} - \mathbf{I}\right\|_F^2 \right). \tag{15}$$

A local minimum of the objective function (15) can be found by using a simple gradient descent method:

$$\mathbf{\Omega}_{t+1} = \mathbf{\Omega}_t - \alpha \nabla f(\mathbf{\Omega}), \tag{16}$$

where $\alpha$ is a step size and $\nabla f(\mathbf{\Omega}) = -(\mathbf{Z} - \mathbf{\Omega}\mathbf{Y})\mathbf{Y}^T + \gamma(\mathbf{\Omega}\mathbf{\Omega}^T - \mathbf{I})\mathbf{\Omega}$. If the rows of $\mathbf{\Omega}$ have different scales of norm, we cannot directly use the hard thresholding method to obtain $\mathbf{Z}$; this phenomenon is called scaling ambiguity. Moreover, the constraint $\mathbf{\Omega}^T\mathbf{\Omega} = \mathbf{I}$ can enforce $\mathbf{\Omega}$ to be of full column rank but cannot prevent a subset of the rows of $\mathbf{\Omega}$ from being zero. Hence, we normalize the rows of $\mathbf{\Omega}$ to remove the scaling ambiguity and replace the zero rows, if any, by normalized random vectors; that is,

$$\mathbf{w}_i = \begin{cases} \dfrac{\mathbf{w}_i}{\|\mathbf{w}_i\|_2}, & \|\mathbf{w}_i\|_2 \neq 0, \\ \mathbf{r}, & \text{otherwise}, \end{cases} \tag{17}$$


where $\mathbf{w}_i$ is the $i$th row of $\mathbf{\Omega}$ and $\mathbf{r}$ is a normalized random vector. There may be other methods to prevent this problem, for example, adding the normalization constraint to the objective function [7] or adding the mutual coherence constraint on the rows of $\mathbf{\Omega}$ to the objective function [12, 14]; however, these are outside the scope of our work. Although, after the row normalization, the columns of the dictionary are no longer close to unit norm, the rank of $\mathbf{\Omega}$ is not changed.
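One full dictionary update, the gradient step (16) followed by the row normalization (17), might be sketched as follows (variable names and the choice of random generator are ours):

```python
import numpy as np

def update_dictionary(Omega, Z, Y, alpha, gamma, rng=None):
    """One gradient step (16) on f(Omega) in (15), then row normalization (17)."""
    rng = np.random.default_rng() if rng is None else rng
    P, M = Omega.shape
    # grad f = -(Z - Omega Y) Y^T + gamma (Omega Omega^T - I) Omega,
    # which equals gamma * Omega (Omega^T Omega - I) for the penalty term.
    grad = -(Z - Omega @ Y) @ Y.T + gamma * (Omega @ Omega.T - np.eye(P)) @ Omega
    Omega = Omega - alpha * grad
    norms = np.linalg.norm(Omega, axis=1)
    for i in range(P):
        if norms[i] == 0:                       # replace a zero row, per (17)
            r = rng.standard_normal(M)
            Omega[i] = r / np.linalg.norm(r)
        else:
            Omega[i] = Omega[i] / norms[i]
    return Omega
```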

2.3. Convergence. In the step of updating $\mathbf{Z}$, the optimization of the objective function (11) is analytical. Thus, the algorithm is guaranteed not to increase (11) and converges to a local minimum of (11) [21]. Furthermore, in the dictionary update step, $\mathbf{\Omega}$ is updated by minimizing a fourth-order function, and thus a monotonic reduction in the cost function (15) is guaranteed.

Our algorithm (called orthogonality-constrained ADL with iterative hard thresholding, OIHT-ADL), which learns the analysis dictionary, is summarized in Algorithm 1.

3. Computer Simulations

To validate the proposed algorithm, we perform two experiments. In the first, we show the performance of the proposed algorithm on synthetic dictionary recovery problems; in the second, we consider natural image denoising. In these experiments, $\mathbf{\Omega}_0 \in \mathbb{R}^{P \times M}$ is the initial dictionary, in which each row is orthogonal to a random set of $M - 1$ training data and is also normalized [10]. For performance comparison, the AK-SVD [10], NAAOLA [8], and LOST [12] algorithms are used as baselines.

3.1. Experiments on Synthetic Data. Following the work in [10], synthetic data are used to demonstrate the performance of the proposed algorithm in recovering an underlying dictionary $\mathbf{\Omega}$. In this experiment, we utilize the methods to recover a dictionary that was used to produce the set of training data. The convergence curves and the dictionary recovery percentage are used to show their performance. A row $\mathbf{w}_j^T$ in the true dictionary $\mathbf{\Omega}$ is regarded as recovered if $\min_i (1 - |\mathbf{w}_i^T\mathbf{w}_j|) < 0.01$, where $\mathbf{w}_i^T$ is an atom of the trained dictionary $\hat{\mathbf{\Omega}}$. $\mathbf{\Omega} \in \mathbb{R}^{50 \times 25}$ is generated with random Gaussian entries, and the dataset consists of $K = 50000$ signals, each residing in a 4-dimensional subspace, under both a noise-free setup and a noisy setup ($\sigma = 0.04$, SNR = 25 dB). The parameters of the proposed algorithm are set empirically by experimental tests, and we choose $\lambda = 0.1$, $\gamma = 10$, and $\alpha = 10^{-4}$, which give the best results in atom recovery rate. We have tested the parameters with different values in the ranges $10^{-2} < \lambda < 10^2$, $10^{-2} < \gamma < 10^2$, and $10^{-5} < \alpha < 10^{-2}$. For the AK-SVD algorithm, the dimension of the subspace is set to $r = 4$, following the setting in [10]. For the NAAOLA algorithm, we set $\lambda = 0.5$, which is close to the default value $\lambda = 0.3$ in [8]. The parameters in the LOST algorithm are set as those in [11], that is, $\alpha = 10^{-4}$, $\lambda = \eta = \mu = 10^5$, $p = 20$, and $s = 29$.

The convergence curves of the error given by the first term of (7) versus the iteration number are shown in Figure 1(a). It can be observed that the compared algorithms take different numbers of iterations to converge (the terms used to measure the rates of convergence differ slightly across these algorithms because different cost functions have been used), approximately 200, 100, 1000, and 300 iterations for the OIHT-ADL, AK-SVD, NAAOLA, and LOST algorithms, respectively. Thus, the recovery percentages of these algorithms are measured at these different iteration numbers (to ensure that each algorithm converges to a stable solution). It is observed from Figure 2 that 76%, 94%, 42%, and 72% of the rows in the true dictionary $\mathbf{\Omega}$ are recovered for the noise-free case, and 72%, 86%, 12%, and 68% for the noisy case, respectively. Note that the recovery rates for the LOST algorithm are obtained with random initialization; with the DCT and identity matrix initializations, the LOST algorithm can recover 74% and 72% of the rows of the true dictionary $\mathbf{\Omega}$ in the noise-free and noisy cases, respectively, after 300 iterations. Although it may not be necessary to recover the true dictionary in practical applications, the use of the atom recovery rate allows us to compare the proposed algorithm with the baseline ADL algorithms, as it has been widely used in these works. The recovery rate of the NAAOLA algorithm is relatively low, mainly because it employs a uniformly normalized tight frame as a constraint on the dictionary, and this constraint limits the possible $\mathbf{\Omega}$ to be learned. The AK-SVD algorithm performs the best in terms of the recovery rate, that is, taking fewer iterations to reach a similar recovery percentage. However, the running time of each iteration of the AK-SVD algorithm is significantly higher than that of our proposed OIHT-ADL algorithm. The total runtime of our algorithm for 200 iterations is about 825 and 832 seconds for the noise-free and noisy cases, respectively; in contrast, the AK-SVD algorithm takes about 11544 or 10948 seconds, respectively, for only 100 iterations (computer: OS Windows 7, CPU Intel Core i5-3210M 2.50 GHz, RAM 4 GB). This is because our algorithm does not need to estimate $\mathbf{X}$ in each iteration, as opposed to the AK-SVD algorithm.
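The atom recovery criterion above can be computed in a vectorized way; a sketch (ours), assuming both dictionaries have unit-norm rows:

```python
import numpy as np

def recovery_percentage(Omega_true, Omega_learned, tol=0.01):
    """Percentage of true rows w_j with min_i (1 - |w_i^T w_j|) < tol,
    where w_i are rows of the learned dictionary (all rows unit-norm)."""
    C = np.abs(Omega_learned @ Omega_true.T)   # |w_i^T w_j| for all pairs (i, j)
    recovered = (1.0 - C.max(axis=0)) < tol    # best match per true row j
    return 100.0 * recovered.mean()
```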

3.2. Experiments on Natural Image Denoising. In this experiment, a training set of 20000 image patches, each of 7 × 7 pixels, obtained from three images (Lena, House, and Peppers) commonly used in denoising [10], is used for learning the dictionaries, each of which corresponds to a particular noisy image. Noise with different levels $\sigma$, varying from 5 to 20, is added to these image patches. A dictionary of size 63 × 49 is learned from the training data, which are normalized to have zero mean. For fair comparison, the dictionary size is the same as that in [10]; a bigger dictionary may lead to a sparser representation of the signal but may have a higher computational cost. Similar to Section 3.1, enough iterations are performed for the tested algorithms to converge. The parameters for the OIHT-ADL algorithm are set the same as those in Section 3.1, except that $\lambda = 0.05$. For the AK-SVD algorithm, the dimension of the subspace is set to $r = 7$, following the setting in [10].


[Figure 1: The convergence curves of the ADL algorithms for both the noise-free and noisy cases (where SNR = 25 dB): (a) OIHT-ADL, plotting $0.5\|\mathbf{X} - \mathbf{\Omega}\mathbf{Y}\|_F^2$; (b) AK-SVD, plotting $\|\mathbf{X} - \mathbf{Y}\|_F/\sqrt{MK}$; (c) NAAOLA, plotting $\log_{10}(\|\mathbf{\Omega}\mathbf{Y}\|_1)$; (d) LOST, plotting $0.5\|\mathbf{X} - \mathbf{\Omega}\mathbf{Y}\|_F^2$.]

Input: Observed data $\mathbf{Y} \in \mathbb{R}^{M \times K}$; the initial dictionary $\mathbf{\Omega}_0 \in \mathbb{R}^{P \times M}$; $\lambda$, $\gamma$, $\alpha$; the number of iterations $c$.
Output: Dictionary $\mathbf{\Omega}$.
Initialization: Set $\mathbf{\Omega} = \mathbf{\Omega}_0$.
For $t = 1, \ldots, c$ do
(i) Fixing $\mathbf{\Omega}$, compute $\mathbf{\Omega}\mathbf{Y}$.
(ii) Update the analysis coefficients $\mathbf{Z}$ using (14).
(iii) Fixing $\mathbf{Z}$, update the dictionary $\mathbf{\Omega}$ using (16).
(iv) Normalize the rows of $\mathbf{\Omega}$ using (17).
End

Algorithm 1: OIHT-ADL.
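Assuming the hard thresholding and dictionary-update sketches given earlier, the loop of Algorithm 1 reduces to a few lines (a sketch, not the authors' reference implementation):

```python
def oiht_adl(Y, Omega0, lam, gamma, alpha, c):
    """OIHT-ADL (Algorithm 1): alternate (14) for Z and (16)-(17) for Omega."""
    Omega = Omega0.copy()
    for _ in range(c):
        Z = hard_threshold(Omega @ Y, lam)                    # steps (i)-(ii)
        Omega = update_dictionary(Omega, Z, Y, alpha, gamma)  # steps (iii)-(iv)
    return Omega
```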

[Figure 2: The recovery percentage curves of the ADL algorithms for both the noise-free and noisy cases (SNR = 25 dB): (a) OIHT-ADL, (b) AK-SVD, (c) NAAOLA, (d) LOST.]

For the NAAOLA algorithm, we set $\lambda = 3$ ($\sigma = 5$), $\lambda = 1$ ($\sigma = 10, 15$), and $\lambda = 0.5$ ($\sigma = 20$). For the LOST algorithm, the parameter values are set as $\alpha = 10^{-11}$, $\lambda = \eta = \mu = 10^5$, $p = 20$, and $s = 21$. The parameters for the NAAOLA and LOST algorithms are chosen empirically. Examples of the learned dictionaries are shown, from the top to the bottom row, in Figure 3, where each atom is shown as a 7 × 7 pixel image patch. The atoms in the dictionaries learned by the AK-SVD algorithm are sorted by the number of training patches orthogonal to the corresponding atoms, and therefore they appear to be more structured; sorting is, however, not used in our proposed algorithm.

[Figure 3: The learned dictionaries of size 63 × 49 obtained with the OIHT-ADL, AK-SVD, NAAOLA, and LOST algorithms on the three images with noise level $\sigma = 5$. The results of OIHT-ADL are shown in the top row, (a) Lena, (b) House, and (c) Peppers, followed by those of AK-SVD, NAAOLA, and LOST in the remaining rows.]

We then use the learned dictionary to denoise each overlapping patch extracted from the noisy image. Each patch is reconstructed individually, and finally the entire image is formed by averaging over the overlapping regions. We use the OBG algorithm [10] to recover each patch. For fair comparison, we do not use the specific image denoising method designed for the corresponding ADL algorithm, such as the one in [12]. The denoising performance is evaluated by the peak signal-to-noise ratio (PSNR), defined as PSNR $= 10\log_{10}\big(255^2 KM / \sum_{i=1}^{K}\sum_{j=1}^{M}(x_{ij} - \hat{x}_{ij})^2\big)$ (dB), where $x_{ij}$ and $\hat{x}_{ij}$ are the $ij$th pixel values in the noisy and denoised images, respectively. The results, averaged over 10 trials, are presented in Table 1, from which we can observe that the performance of the OIHT-ADL algorithm is generally better than that of the NAAOLA and LOST algorithms. It is also better than that of the AK-SVD algorithm when the noise level increases.
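The PSNR definition above reduces to the usual peak-signal-over-MSE form; a one-function sketch (ours):

```python
import numpy as np

def psnr(x, x_hat):
    """PSNR in dB between two 8-bit images, per the definition in the text."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```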

4. Conclusion

We have presented a novel optimization model and algorithm for ADL, where we learn the dictionary directly from the observed noisy data. We have also enforced an orthogonality constraint on the optimization criterion to remove the trivial solutions induced by a null dictionary matrix. The proposed method offers a computational advantage over the AK-SVD algorithm and also provides an alternative to the LOST algorithm in avoiding trivial solutions. The experiments performed have shown its competitive performance as compared to three state-of-the-art algorithms, namely, the AK-SVD, NAAOLA, and LOST algorithms. The proposed algorithm is easy to implement and computationally efficient.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants nos. 61162014 and 61210306074) and the Natural Science Foundation of Jiangxi (Grant no. 20122BAB201025), and was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK (Grant no. EP/K014307/1).

References

[1] K. Engan, S. O. Aase, and J. H. Husoy, "Method of optimal directions for frame design," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 2443-2446, 1999.
[2] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006.
[3] K. Skretting and K. Engan, "Recursive least squares dictionary learning algorithm," IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2121-2130, 2010.
[4] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, "Cosparse analysis modeling: uniqueness and algorithms," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 5804-5807, Prague, Czech Republic, May 2011.
[5] B. Ophir, M. Elad, N. Bertin, and M. D. Plumbley, "Sequential minimal eigenvalues: an approach to analysis dictionary learning," in Proceedings of the 19th European Signal Processing Conference, pp. 1465-1469, 2011.
[6] M. Yaghoobi, S. Nam, R. Gribonval, and M. E. Davies, "Constrained overcomplete analysis operator learning for cosparse signal modelling," IEEE Transactions on Signal Processing, vol. 61, no. 9, pp. 2341-2355, 2013.
[7] M. Yaghoobi and M. E. Davies, "Relaxed analysis operator learning," in Proceedings of the NIPS Workshop on Analysis Operator Learning vs. Dictionary Learning: Fraternal Twins in Sparse Modeling, 2012.
[8] M. Yaghoobi, S. Nam, R. Gribonval, and M. E. Davies, "Noise aware analysis operator learning for approximately cosparse signals," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5409-5412, 2012.
[9] M. Elad, P. Milanfar, and R. Rubinstein, "Analysis versus synthesis in signal priors," Inverse Problems, vol. 23, no. 3, pp. 947-968, 2007.
[10] R. Rubinstein, T. Peleg, and M. Elad, "Analysis K-SVD: a dictionary-learning algorithm for the analysis sparse model," IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 661-677, 2013.
[11] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms for image processing," in Proceedings of the IEEE International Conference on Image Processing, pp. 681-684, 2012.
[12] S. Ravishankar and Y. Bresler, "Learning overcomplete sparsifying transforms for signal processing," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '13), pp. 3088-3092, Vancouver, Canada, May 2013.
[13] S. Ravishankar and Y. Bresler, "Closed-form solutions within sparsifying transform learning," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5378-5382, 2013.
[14] E. M. Eksioglu and O. Bayir, "K-SVD meets transform learning: transform K-SVD," IEEE Signal Processing Letters, vol. 21, no. 3, pp. 347-351, 2014.
[15] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489-509, 2006.
[16] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203-4215, 2005.
[17] Y. Zhang, H. Wang, T. Yu, and W. Wang, "Subset pursuit for analysis dictionary learning," in Proceedings of the European Signal Processing Conference, 2013, http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/ZhangWYW_EUSIPCO_2013.pdf.
[18] Y. Zhang, W. Wang, H. Wang, and S. Sanei, "K-plane clustering algorithm for analysis dictionary learning," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, pp. 1-4, 2013.
[19] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms," IEEE Transactions on Signal Processing, vol. 61, no. 5, pp. 1072-1086, 2013.
[20] M. Elad, M. A. T. Figueiredo, and Y. Ma, "On the role of sparse and redundant representations in image processing," Proceedings of the IEEE, vol. 98, no. 6, pp. 972-982, 2010.
[21] T. Blumensath and M. E. Davies, "Iterative thresholding for sparse approximations," The Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 629-654, 2008.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 2: Research Article An Analysis Dictionary Learning Algorithm

2 The Scientific World Journal

be recovered by solving the following analysis model basedoptimization problem

x = arg min Ωx0 subject to 1003817100381710038171003817y minus x100381710038171003817

1003817

2

2le 120576 (5)

For the squared and invertible dictionary the synthesis andthe analysis models are identical withD = Ωminus1 [9] Howeverthe two models depart if we concentrate on the redundantcase (119875 gt 119872 and 119873 gt 119872) In the analysis model the signalx is described by the zero elements of z that is zero entriesin z that define a subspace which the signal x belongs toas opposed to the few nonzero entries of a in the synthesismodel

The performance of both the synthesis and analysismodels hinges on the representation of the signals with anappropriately chosen dictionary In the past decade a greatdeal of effort has been dedicated to learning the dictionary forthe synthesis model however the ADL problem has receivedless attention with only a few algorithms proposed recently[5 6 10 17ndash19] In these works the dictionary Ω is oftenlearned from the observed signals Y = [y

1 y2 y

119870] isin

119877119872times119870 measured in the presence of additive noise that is

Y = X + V (6)

where X = [x1 x2 x

119870] isin 119877

119872times119870 contains the originalsignals V = [k

1 k2 k

119870] isin 119877

119872times119870 is a Gaussian noisematrix and 119870 is the number of signals

Exploiting the fact that a row of the dictionary Ω isorthogonal to a subset of training signals X a sequentialminimal eigenvalue based ADL algorithm is proposed in [5]Once the subset is found the corresponding row of the dic-tionary can be updated with the eigenvector associated withthe smallest eigenvalue of the autocorrelation matrix of thesesignals However as 119875 increases so does the computationalcost of the method In [6ndash8] an ℓ

1-norm penalty function

is applied to Ωx and a projected subgradient algorithm(called NAAOLA) is proposed for analysis operator learningThese works employ a uniformly normalized tight frame asa constraint on the dictionary to avoid the trivial solution(eg Ω = 0) which however limits the range of possibleΩ to be learned In [10] the Analysis K-SVD (AK-SVD)algorithm is proposed for ADL By keeping Ω fixed theoptimal backward greedy algorithm (OBG) is employed toestimate a submatrix of Ω whose rows are then used todetermine a submatrix of Y and the eigenvector associatedwith the smallest eigenvalue of this submatrix is then used toupdateΩ A generalized analysismodel that is the transformmodel is proposed in the learning sparsifying transform(LST) algorithm [11] Unlike the analysismodel (2) the sparserepresentation of a signal in the LSTmodel is not constrainedto lie in the range space ofΩ Such a generalization allows thetransform model to accommodate a wider class of signals Aclosed-form solution to the LST problem is also developedin [13] The LST algorithm [11] is further extended to theovercomplete case leading to the LOST algorithm in [12]The transform K-SVD algorithm proposed recently in [14] isessentially a combination of the ideas in the LOST and AK-SVD algorithms

There are two problems that have been observed oralready studied in some of the ADL algorithms discussedabove The first one is associated with the computationalcost in the optimization process of the ADL algorithmsFor example in the AK-SVD algorithm X needs to beestimated before learning the dictionary and as a result theoptimization process becomes computationally demandingThe second one is associated with the trivial solutions to thedictionary which may lead to spurious sparse representa-tions To eliminate such trivial solutions extra constraints arerequired such as the full-rank constraint on Ω employed in[11ndash13] and the mutual coherence constraint on the rows ofΩin [12 14]

In this paper we propose a new optimization model andalgorithm attempting to provide alternative solutions to thetwo potential problems mentioned above More specificallysimilar in spirit to our recent work [17] we directly use theobserved data to compute the approximate analysis sparserepresentation of the original signals without having topreestimate X for learning the dictionary This leads to acomputationally very efficient algorithm Moreover differentfrom the LOST algorithm we enforce an orthogonalityconstraint onΩ which as shown later has led to an improvedlearning performance

The paper is organized as follows In Section 2 we discussthe novel ADL model and algorithm In Section 3 we showsome experimental results before concluding the paper inSection 4

2 The Proposed ADL Algorithm

In some algorithms discussed above such as [10] the opti-mization criterion is based on X In practice however Xis unknown and required to be estimated from Y Thisunfortunately results in a computationally very slow opti-mization process as shown in Section 31 To address thisissue we introduce a new model and algorithm for ADLwhere similar to [17] we use ΩY

1as an approximation to

ΩX1 which is further replaced by Z

1and then used as

a new constraint for the reconstruction term Z = ΩY Thisleads to the following model of ADL

min(12

Z minusΩY2119865+ 120582Z1) (7)

where 120582 gt 0 is a regularization parameter According tothe analysis model (2) Z would be an analysis coefficientmatrix only if its columns lie in the range space of the analysisdictionary Ω However Z=ΩY is not explicitly enforced in(7) and hence it is called the transform coefficient matrix asin [19] Using (7) the analysis dictionaryΩ is learned from Ywithout having to preestimate X as opposed to the AK-SVDalgorithm so the computational effort for estimating X fromY is exempted

With the model (7) however there is a trivial solutionthat is Ω = 0 Z = 0 To avoid such a solution additionalconstraints on Ω are required In [12] a full column rankconstraint has been imposed on Ω through the use of anegative log determinant of Ω119879Ω However the inverse of

The Scientific World Journal 3

Table 1 Image denoising results (PSNR in dB)

120590 Noisy in dB ADL method Lena House Peppers

5 3415

OIHT-ADL 3826 3808 3770NAAOLA 3736 3704 3593AK-SVD 3844 3920 3793LOST 3816 3850 3747

10 2813

OIHT-ADL 3502 3479 3389NAAOLA 3273 3261 3112AK-SVD 3485 3532 3382LOST 3475 3477 3361

15 2461

OIHT-ADL 3317 3305 3169NAAOLA 3087 3082 2922AK-SVD 3259 3298 3128LOST 3288 3282 3138

20 2211

OIHT-ADL 3183 3171 3013NAAOLA 2965 2922 2740AK-SVD 3138 3151 2976LOST 3164 3147 2984

Ω119879Ω needs to be computed in the conjugate gradientmethod

in [12] and the inverse of Ω119879Ω may affect the stability ofthe numerical computation when the initial of Ω119879Ω is illconditioned To address this problem in our work a functiondefined on Ω119879Ω = I is employed as a constraint term whichenforcesΩ to be a full column rank matrix based on the factthat the ranks of Ω and its corresponding Gram matrix areequal This leads to the following new optimization criterion

min(12

Z minusΩY2119865+ 120582Z1) subject to Ω119879Ω = I (8)

Using a Lagrangian multiplier 120574 the optimization problemcan be reformulated as

min(12

Z minusΩY2119865+ 120582Z1 +

120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865) (9)

It is worth noting that our proposed objective function(9) is essentially different from the one used in [7] In [7] theobjective function is defined as follows

min(ΩX1 +120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865+

120582

4

sum

119894

1003817100381710038171003817100381710038171003817

w119879119894w119894minus

119872

119875

1003817100381710038171003817100381710038171003817

2

2

) (10)

wherew119894is the 119894th row ofΩWith (10) the analysis dictionary

is learned from the clean signalX and the sparse constraint isenforced on ΩX In our proposed ADL algorithm howeverthe analysis dictionary is directly learned from the noisyobserved signal Y and thus can tolerate some sparsificationerrors as in [11] The results of the experiments demonstratethat our proposed approach is more robust

Note also that whenZ is sparse minimizing ΩY1can be

obtained by the minimization of Z minusΩY2119865 subject to the

sparsity constraint However both Z and Ω are unknownTo solve the problem we propose an iterative method toalternatively update the estimation of Z andΩ

21 Estimating Z Given the dictionary Ω we first considerthe optimization of Z only In this case the objective function(9) can be modified as

Z = arg minZ

(

1

2

Z minusΩY2119865+ 120582Z1) (11)

The first-order optimality condition of Z implies that

Z minusΩY + 120582 sign (Z) = 0 (12)

Therefore we have

119885119894119895=

(ΩY)119894119895 minus 120582 (ΩY)119894119895 gt 120582(ΩY)119894119895 + 120582 (ΩY)119894119895 lt minus1205820 otherwise

(13)

where 119894 = 1 2 119875 and 119895 = 1 2 119870 are the indices of thematrix elements It is well known that the above solution forZ is called soft thresholding Indeed the sparsity constraintZ1is only an approximation to Z

0 In order to promote

sparsity and to improve the approximation one could insteaduse the hard thresholding method [20] as an alternative thatis setting the smallest components of the vectors to be zeroswhile retaining the others

119885119894119895=

(ΩY)11989411989510038161003816100381610038161003816(ΩY)119894119895

10038161003816100381610038161003816gt 120582

0 otherwise(14)

As such the solution obtained using the constraint Z1will

be closer to that using Z0

22 Dictionary Learning For a given Z the objective func-tion (9) is nonconvex with respect to Ω similar to theobjective function (10) that is

Ω = arg minΩ

119891 (Ω)

= arg minΩ

(

1

2

Z minusΩY2119865+

120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865)

(15)

The local minimum of the objective function (15) can befound by using a simple gradient descent method

Ω119905+1= Ω119905minus 120572nabla119891 (Ω) (16)

where 120572 is a step size and nabla119891(Ω) = minus(Z minusΩY)Y119879 + 120574(ΩΩ119879 minusI)Ω If the rows of Ω have different scales of norm wecannot directly use the hard thresholdingmethod to obtainZThe phenomenon is called scaling ambiguity Moreover theconstraint Ω119879Ω = I can enforce Ω to be of full column rankbut cannot avoid a subset of rows of Ω to be possibly zerosHence we normalize the rows of Ω to prevent the scalingambiguity and replace the zero rows if any by the normalizedrandom vectors that is

w119894=

w119894

1003817100381710038171003817w119894

10038171003817100381710038172

1003817100381710038171003817w119894

10038171003817100381710038172= 0

r otherwise(17)

4 The Scientific World Journal

where w119894is the 119894th row of Ω and r is a normalized random

vector There may be other methods to prevent this problemfor example by adding the normalization constraint to theobjective function [7] or by adding the mutual coherenceconstraint on the rows ofΩ to the objective function [12 14]however this is out of the scope of our work Although afterthe row normalization the columns of the dictionary areno longer close to the unit norm the rank of Ω will not bechanged

23 Convergence In the step of updating Z the algorithm tooptimize the objective function (11) is analytical Thus thealgorithm is guaranteed not to increase (11) and converges toa local minimum of (11) [21] Furthermore in the dictionaryupdate step Ω is updated by minimizing a fourth-orderfunction and thus amonotonic reduction in the cost function(15) is guaranteed

Our algorithm (called orthogonality constrained ADLwith iterative hard thresholding OIHT-ADL) to learn anal-ysis dictionary is summarized in Algorithm 1

3 Computer Simulations

To validate the proposed algorithm we perform two exper-iments In the first experiment we show the performanceof the proposed algorithm for synthetic dictionary recoveryproblems In the second experiment we consider the naturalimage denoising problems In these experimentsΩ

0isin 119877119875times119872

is the initial dictionary in which each row is orthogonal to arandom set of119872minus1 training data and is also normalized [10]For performance comparison the AK-SVD [10] NAAOLA[8] and LOST [12] algorithms are used as baselines

31 Experiments on Synthetic Data Following the work in[10] synthetic data are used to demonstrate the performanceof the proposed algorithm in recovering an underlyingdictionary Ω In this experiment we utilize the methods torecover a dictionary that is used to produce the set of trainingdata The convergence curves and the dictionary recoverypercentage are used to show their performance If min

119894(1 minus

|w119879119894w119895|) lt 001 a roww119879

119895in the true dictionaryΩ is regarded

as recovered where w119879119894is an atom of the trained dictionary

Ω isin 11987750times25 is generated with random Gaussian entries and

the dataset consists of 119870 = 50000 signals each residing in a4-dimensional subspace with both the noise-free setup andthe noise setup (120590 = 004 SNR = 25 dB) The parameters ofthe proposed algorithm are set empirically by experimentaltests and we choose the parameters as 120582 = 01 120574 = 10and 120572 = 10minus4 which give the best results in atom recoveryrate We have tested the parameters with different values inthe ranges of 10minus2 lt 120582 lt 10

2 10minus2 lt 120574 lt 102 and

10minus2lt 120572 lt 10

minus5 For the AK-SVD algorithm the dimensionof the subspace is set to be 119903 = 4 following the setting in [10]For the NAAOLA algorithm we set 120582 = 05 which is closeto the default value 120582 = 03 in [8] The parameters in theLOST algorithms are set as those in [11] that is 120572 = 10minus4120582 = 120578 = 120583 = 10

5 119901 = 20 and 119904 = 29 The convergence

curves of the error by the first term of (7) versus the iterationnumber are shown in Figure 1(a) It can be observed that thecompared algorithms take different numbers of iterations toconverge (the terms used tomeasure the rates of convergenceare slightly different in these algorithms because differentcost functions have been used) which are approximately200 100 1000 and 300 iterations for the OIHT-ADL AK-SVD NAAOLA and LOST algorithms respectively Thusthe recovery percentages of these algorithms are measuredat these different iteration numbers (to ensure that eachalgorithm converges to the stable solutions) It is observedfrom Figure 2 that 76 94 42 and 72 of the rows inthe true dictionary Ω are recovered for the noise-free caseand 72 86 12 and 68 for the noisy case respectivelyNote that the recovery rates for the LOST algorithm areobtained with random initialization With the DCT andidentitymatrix initialization the LOST algorithm can recover74 and 72 rows of the true dictionary Ω in noise-freeand noise cases respectively after 300 iterations Althoughit may not be necessary to recover the true dictionary inpractical applications the use of atom recovery rate allowsus to compare the proposed algorithm with the baselineADL algorithms as it has been widely used in these worksThe recovery rate by the NAAOLA algorithm is relativelylow mainly because it employs a uniformly normalized tightframe as a constraint on the dictionary and this constraint haslimited the possibleΩ to be learned The AK-SVD algorithmperforms the best in terms of the recovery rate that istaking fewer iterations to reach a similar recovery percentageHowever the running time in each iteration of the AK-SVDalgorithm is significantly higher than that in our proposedOIHT-ADL algorithm The total runtime of our algorithmfor 200 iterations is about 825 and 832 seconds for the noise-free and noise case respectively In contrast the AK-SVDalgorithm takes about 11544 or 10948 seconds respectivelyfor only 100 iterations (Computer OSWindows 7 CPU IntelCore i5-3210M 250GHz and RAM 4GB)This is becauseour algorithm does not need to estimate X in each iterationas opposed to the AK-SVD algorithm

32 Experiments on Natural Image Denoising In this exper-iment the training set of 20000 image patches each of7 times 7 pixels obtained from three images (Lena Houseand Peppers) that are commonly used in denoising [10]has been used for learning the dictionaries each of whichcorresponds to a particular noisy image Noise with differentlevel 120590 varying from 5 to 20 is added to these imagepatches The dictionary of size 63 times 49 is learned fromthe training data which are normalized to have zero meanFor fair comparison the dictionary size is the same as thatin [10] The dictionary with a bigger size may lead to asparser representation of the signal but may have a highercomputational cost Similar to Section 31 enough iterationsare performed for the tested algorithms to converge Theparameters for the OIHT-ADL algorithm are set the same asthose in Section 31 except that 120582 = 005 For the AK-SVDalgorithm the dimension of the subspace is set to be 119903 = 7following the setting in [10] For the NAAOLA algorithm

The Scientific World Journal 5

0 50 100 150 200800

1000

1200

1400

1600

1800

2000

Iteration number

05X

minusΩY

2 F

(a) OHIT-ADL

0 20 40 60 80 100001

002

003

004

005

006

007

008

Iteration number

XminusY

Fradic

MK

(b) AK-SVD

Noisy SNR = 25

0 500 1000 1500 2000538

539

54

541

542

543

544

545

546

Iteration number

Noise-free

log10(ΩY

1)

(c) NAAOLA

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

1600

Iteration number

Noise-free

05X

minusΩY

2 F

Noisy SNR = 25

(d) LOST

Figure 1 The convergence curves of the ADL algorithms for both the noise-free and noise case (where SNR = 25 dB)

Input Observed data Y isin 119877119872times119870 the initial dictionaryΩ0isin 119877119875times119872 120582 120574 120572 the number of iterations 119888

Output DictionaryΩInitialization SetΩ = Ω

0

For 119905 = 1 119888 do(i) FixingΩ computeΩY(ii) Update the analysis coefficients Z using (14)(iii) Fixing Z update the dictionaryΩ using (16)(iv) Normalize rows inΩ using (17)

End

Algorithm 1 OIHT-ADL

6 The Scientific World Journal

0 50 100 150 2000

20

40

60

80

100

Iteration number

The r

ecov

ered

row

s (

)

(a) OHIT-ADL

0

20

40

60

80

100

0 20 40 60 80 100

Iteration number

The r

ecov

ered

row

s (

)

(b) AK-SVD

0 500 1000 1500 20000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(c) NAAOLA

0 50 100 150 200 250 3000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(d) LOST

Figure 2 The recovery percentage curves of the ADL algorithms

we set 120582 = 3 (120590 = 5) 120582 = 1 (120590 = 10 15) and 120582 = 05(120590 = 20) For the LOST algorithm the parameter values areset as 120572 = 10minus11 120582 = 120578 = 120583 = 105 119901 = 20 and 119904 = 21The parameters for the NAAOLA algorithm and the LOSTalgorithmare chosen empiricallyThe examples of the learneddictionaries are shown from the top to the bottom row inFigure 3 where each atom is shown as a 7 times 7 pixel imagepatch The atoms in the dictionaries learned by the AK-SVDalgorithm are sorted by the number of training patches whichare orthogonal to the corresponding atoms and thereforethe atoms learned by the AK-SVD algorithm appear to bemore structured Sorting is however not used in our proposedalgorithmThenweuse the learneddictionary to denoise eachoverlapping patch extracted from the noisy image Each patchis reconstructed individually and finally the entire image isformed by averaging over the overlapping regions We usethe OBG algorithm [10] to recover each patch image For

fair comparison we do not use the specific image denoisingmethod designed for the corresponding ADL algorithm suchas the one in [12] The denoising performance is evaluatedby the peak signal to noise ratio (PSNR) defined as PSNR =10 log

10(2552 KMsum119870

119894=1sum119872

119895=1(119909119894119895minus 119909119894119895)2) (dB) where 119909

119894119895and

119909119894119895are the 119894119895th pixel value in noisy and denoised images

respectivelyThe results averaged over 10 trials are presentedinTable 1 fromwhichwe can observe that the performance ofthe OIHT-ADL algorithm is generally better than that of theNAAOLA algorithm and the LOST algorithm It is also betterthan that of the AK-SVD algorithm when the noise level isincreased

[Figure 3, panel grid: each row shows the dictionaries learned from (left to right) Lena, House, and Peppers; rows top to bottom: OIHT-ADL (a)–(c), AK-SVD (d)–(f), NAAOLA (g)–(i), LOST (j)–(l).]

Figure 3: The learned dictionaries of size 63 × 49 obtained by the OIHT-ADL, AK-SVD, NAAOLA, and LOST algorithms on the three images with noise level σ = 5. The results of OIHT-ADL are shown in the top row, in (a) Lena, (b) House, and (c) Peppers, followed by AK-SVD, NAAOLA, and LOST in the remaining rows.

4. Conclusion

We have presented a novel optimization model and algorithm for ADL, where we learn the dictionary directly from the observed noisy data. We have also enforced an orthogonality constraint on the optimization criterion to remove the trivial solutions induced by a null dictionary matrix. The proposed method offers a computational advantage over the AK-SVD algorithm and also provides an alternative to the LOST algorithm in avoiding the trivial solutions. The experiments performed have shown its competitive performance as compared with three state-of-the-art algorithms, namely, the AK-SVD, NAAOLA, and LOST algorithms. The proposed algorithm is easy to implement and computationally efficient.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants nos. 61162014 and 61210306074) and the Natural Science Foundation of Jiangxi (Grant no. 20122BAB201025) and was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK (Grant no. EP/K014307/1).

References

[1] K. Engan, S. O. Aase, and J. H. Husoy, "Method of optimal directions for frame design," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 2443–2446, 1999.

[2] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[3] K. Skretting and K. Engan, "Recursive least squares dictionary learning algorithm," IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2121–2130, 2010.

[4] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, "Cosparse analysis modeling – uniqueness and algorithms," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 5804–5807, Prague, Czech Republic, May 2011.

[5] B. Ophir, M. Elad, N. Bertin, and M. D. Plumbley, "Sequential minimal eigenvalues: an approach to analysis dictionary learning," in Proceedings of the 9th European Signal Processing Conference, pp. 1465–1469, 2011.

[6] M. Yaghoobi, S. Nam, R. Gribonval, and M. E. Davies, "Constrained overcomplete analysis operator learning for cosparse signal modelling," IEEE Transactions on Signal Processing, vol. 61, no. 9, pp. 2341–2355, 2013.

[7] M. Yaghoobi and M. E. Davies, "Relaxed analysis operator learning," in Proceedings of the NIPS Workshop on Analysis Operator Learning vs. Dictionary Learning: Fraternal Twins in Sparse Modeling, 2012.

[8] M. Yaghoobi, S. Nam, R. Gribonval, and M. E. Davies, "Noise aware analysis operator learning for approximately cosparse signals," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5409–5412, 2012.

[9] M. Elad, P. Milanfar, and R. Rubinstein, "Analysis versus synthesis in signal priors," Inverse Problems, vol. 23, no. 3, pp. 947–968, 2007.

[10] R. Rubinstein, T. Peleg, and M. Elad, "Analysis K-SVD: a dictionary-learning algorithm for the analysis sparse model," IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 661–677, 2013.

[11] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms for image processing," in Proceedings of the IEEE International Conference on Image Processing, pp. 681–684, 2012.

[12] S. Ravishankar and Y. Bresler, "Learning overcomplete sparsifying transforms for signal processing," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '13), pp. 3088–3092, Vancouver, Canada, May 2013.

[13] S. Ravishankar and Y. Bresler, "Closed-form solutions within sparsifying transform learning," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5378–5382, 2013.

[14] E. M. Eksioglu and O. Bayir, "K-SVD meets transform learning: transform K-SVD," IEEE Signal Processing Letters, vol. 21, no. 3, pp. 347–351, 2014.

[15] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.

[16] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[17] Y. Zhang, H. Wang, T. Yu, and W. Wang, "Subset pursuit for analysis dictionary learning," in Proceedings of the European Signal Processing Conference, 2013, http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/ZhangWYW_EUSIPCO_2013.pdf.

[18] Y. Zhang, W. Wang, H. Wang, and S. Sanei, "K-plane clustering algorithm for analysis dictionary learning," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, pp. 1–4, 2013.

[19] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms," IEEE Transactions on Signal Processing, vol. 61, no. 5, pp. 1072–1086, 2013.

[20] M. Elad, M. A. T. Figueiredo, and Y. Ma, "On the role of sparse and redundant representations in image processing," Proceedings of the IEEE, vol. 98, no. 6, pp. 972–982, 2010.

[21] T. Blumensath and M. E. Davies, "Iterative thresholding for sparse approximations," The Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 629–654, 2008.


Page 3: Research Article An Analysis Dictionary Learning Algorithm

The Scientific World Journal 3

Table 1 Image denoising results (PSNR in dB)

120590 Noisy in dB ADL method Lena House Peppers

5 3415

OIHT-ADL 3826 3808 3770NAAOLA 3736 3704 3593AK-SVD 3844 3920 3793LOST 3816 3850 3747

10 2813

OIHT-ADL 3502 3479 3389NAAOLA 3273 3261 3112AK-SVD 3485 3532 3382LOST 3475 3477 3361

15 2461

OIHT-ADL 3317 3305 3169NAAOLA 3087 3082 2922AK-SVD 3259 3298 3128LOST 3288 3282 3138

20 2211

OIHT-ADL 3183 3171 3013NAAOLA 2965 2922 2740AK-SVD 3138 3151 2976LOST 3164 3147 2984

Ω119879Ω needs to be computed in the conjugate gradientmethod

in [12] and the inverse of Ω119879Ω may affect the stability ofthe numerical computation when the initial of Ω119879Ω is illconditioned To address this problem in our work a functiondefined on Ω119879Ω = I is employed as a constraint term whichenforcesΩ to be a full column rank matrix based on the factthat the ranks of Ω and its corresponding Gram matrix areequal This leads to the following new optimization criterion

min(12

Z minusΩY2119865+ 120582Z1) subject to Ω119879Ω = I (8)

Using a Lagrangian multiplier 120574 the optimization problemcan be reformulated as

min(12

Z minusΩY2119865+ 120582Z1 +

120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865) (9)

It is worth noting that our proposed objective function(9) is essentially different from the one used in [7] In [7] theobjective function is defined as follows

min(ΩX1 +120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865+

120582

4

sum

119894

1003817100381710038171003817100381710038171003817

w119879119894w119894minus

119872

119875

1003817100381710038171003817100381710038171003817

2

2

) (10)

wherew119894is the 119894th row ofΩWith (10) the analysis dictionary

is learned from the clean signalX and the sparse constraint isenforced on ΩX In our proposed ADL algorithm howeverthe analysis dictionary is directly learned from the noisyobserved signal Y and thus can tolerate some sparsificationerrors as in [11] The results of the experiments demonstratethat our proposed approach is more robust

Note also that whenZ is sparse minimizing ΩY1can be

obtained by the minimization of Z minusΩY2119865 subject to the

sparsity constraint However both Z and Ω are unknownTo solve the problem we propose an iterative method toalternatively update the estimation of Z andΩ

21 Estimating Z Given the dictionary Ω we first considerthe optimization of Z only In this case the objective function(9) can be modified as

Z = arg minZ

(

1

2

Z minusΩY2119865+ 120582Z1) (11)

The first-order optimality condition of Z implies that

Z minusΩY + 120582 sign (Z) = 0 (12)

Therefore we have

119885119894119895=

(ΩY)119894119895 minus 120582 (ΩY)119894119895 gt 120582(ΩY)119894119895 + 120582 (ΩY)119894119895 lt minus1205820 otherwise

(13)

where 119894 = 1 2 119875 and 119895 = 1 2 119870 are the indices of thematrix elements It is well known that the above solution forZ is called soft thresholding Indeed the sparsity constraintZ1is only an approximation to Z

0 In order to promote

sparsity and to improve the approximation one could insteaduse the hard thresholding method [20] as an alternative thatis setting the smallest components of the vectors to be zeroswhile retaining the others

119885119894119895=

(ΩY)11989411989510038161003816100381610038161003816(ΩY)119894119895

10038161003816100381610038161003816gt 120582

0 otherwise(14)

As such the solution obtained using the constraint Z1will

be closer to that using Z0

22 Dictionary Learning For a given Z the objective func-tion (9) is nonconvex with respect to Ω similar to theobjective function (10) that is

Ω = arg minΩ

119891 (Ω)

= arg minΩ

(

1

2

Z minusΩY2119865+

120574

4

10038171003817100381710038171003817Ω119879Ω minus I1003817100381710038171003817

1003817

2

119865)

(15)

The local minimum of the objective function (15) can befound by using a simple gradient descent method

Ω119905+1= Ω119905minus 120572nabla119891 (Ω) (16)

where 120572 is a step size and nabla119891(Ω) = minus(Z minusΩY)Y119879 + 120574(ΩΩ119879 minusI)Ω If the rows of Ω have different scales of norm wecannot directly use the hard thresholdingmethod to obtainZThe phenomenon is called scaling ambiguity Moreover theconstraint Ω119879Ω = I can enforce Ω to be of full column rankbut cannot avoid a subset of rows of Ω to be possibly zerosHence we normalize the rows of Ω to prevent the scalingambiguity and replace the zero rows if any by the normalizedrandom vectors that is

w119894=

w119894

1003817100381710038171003817w119894

10038171003817100381710038172

1003817100381710038171003817w119894

10038171003817100381710038172= 0

r otherwise(17)

4 The Scientific World Journal

where w119894is the 119894th row of Ω and r is a normalized random

vector There may be other methods to prevent this problemfor example by adding the normalization constraint to theobjective function [7] or by adding the mutual coherenceconstraint on the rows ofΩ to the objective function [12 14]however this is out of the scope of our work Although afterthe row normalization the columns of the dictionary areno longer close to the unit norm the rank of Ω will not bechanged

23 Convergence In the step of updating Z the algorithm tooptimize the objective function (11) is analytical Thus thealgorithm is guaranteed not to increase (11) and converges toa local minimum of (11) [21] Furthermore in the dictionaryupdate step Ω is updated by minimizing a fourth-orderfunction and thus amonotonic reduction in the cost function(15) is guaranteed

Our algorithm (called orthogonality constrained ADLwith iterative hard thresholding OIHT-ADL) to learn anal-ysis dictionary is summarized in Algorithm 1

3 Computer Simulations

To validate the proposed algorithm we perform two exper-iments In the first experiment we show the performanceof the proposed algorithm for synthetic dictionary recoveryproblems In the second experiment we consider the naturalimage denoising problems In these experimentsΩ

0isin 119877119875times119872

is the initial dictionary in which each row is orthogonal to arandom set of119872minus1 training data and is also normalized [10]For performance comparison the AK-SVD [10] NAAOLA[8] and LOST [12] algorithms are used as baselines

31 Experiments on Synthetic Data Following the work in[10] synthetic data are used to demonstrate the performanceof the proposed algorithm in recovering an underlyingdictionary Ω In this experiment we utilize the methods torecover a dictionary that is used to produce the set of trainingdata The convergence curves and the dictionary recoverypercentage are used to show their performance If min

119894(1 minus

|w119879119894w119895|) lt 001 a roww119879

119895in the true dictionaryΩ is regarded

as recovered where w119879119894is an atom of the trained dictionary

Ω isin 11987750times25 is generated with random Gaussian entries and

the dataset consists of 119870 = 50000 signals each residing in a4-dimensional subspace with both the noise-free setup andthe noise setup (120590 = 004 SNR = 25 dB) The parameters ofthe proposed algorithm are set empirically by experimentaltests and we choose the parameters as 120582 = 01 120574 = 10and 120572 = 10minus4 which give the best results in atom recoveryrate We have tested the parameters with different values inthe ranges of 10minus2 lt 120582 lt 10

2 10minus2 lt 120574 lt 102 and

10minus2lt 120572 lt 10

minus5 For the AK-SVD algorithm the dimensionof the subspace is set to be 119903 = 4 following the setting in [10]For the NAAOLA algorithm we set 120582 = 05 which is closeto the default value 120582 = 03 in [8] The parameters in theLOST algorithms are set as those in [11] that is 120572 = 10minus4120582 = 120578 = 120583 = 10

5 119901 = 20 and 119904 = 29 The convergence

curves of the error by the first term of (7) versus the iterationnumber are shown in Figure 1(a) It can be observed that thecompared algorithms take different numbers of iterations toconverge (the terms used tomeasure the rates of convergenceare slightly different in these algorithms because differentcost functions have been used) which are approximately200 100 1000 and 300 iterations for the OIHT-ADL AK-SVD NAAOLA and LOST algorithms respectively Thusthe recovery percentages of these algorithms are measuredat these different iteration numbers (to ensure that eachalgorithm converges to the stable solutions) It is observedfrom Figure 2 that 76 94 42 and 72 of the rows inthe true dictionary Ω are recovered for the noise-free caseand 72 86 12 and 68 for the noisy case respectivelyNote that the recovery rates for the LOST algorithm areobtained with random initialization With the DCT andidentitymatrix initialization the LOST algorithm can recover74 and 72 rows of the true dictionary Ω in noise-freeand noise cases respectively after 300 iterations Althoughit may not be necessary to recover the true dictionary inpractical applications the use of atom recovery rate allowsus to compare the proposed algorithm with the baselineADL algorithms as it has been widely used in these worksThe recovery rate by the NAAOLA algorithm is relativelylow mainly because it employs a uniformly normalized tightframe as a constraint on the dictionary and this constraint haslimited the possibleΩ to be learned The AK-SVD algorithmperforms the best in terms of the recovery rate that istaking fewer iterations to reach a similar recovery percentageHowever the running time in each iteration of the AK-SVDalgorithm is significantly higher than that in our proposedOIHT-ADL algorithm The total runtime of our algorithmfor 200 iterations is about 825 and 832 seconds for the noise-free and noise case respectively In contrast the AK-SVDalgorithm takes about 11544 or 10948 seconds respectivelyfor only 100 iterations (Computer OSWindows 7 CPU IntelCore i5-3210M 250GHz and RAM 4GB)This is becauseour algorithm does not need to estimate X in each iterationas opposed to the AK-SVD algorithm

32 Experiments on Natural Image Denoising In this exper-iment the training set of 20000 image patches each of7 times 7 pixels obtained from three images (Lena Houseand Peppers) that are commonly used in denoising [10]has been used for learning the dictionaries each of whichcorresponds to a particular noisy image Noise with differentlevel 120590 varying from 5 to 20 is added to these imagepatches The dictionary of size 63 times 49 is learned fromthe training data which are normalized to have zero meanFor fair comparison the dictionary size is the same as thatin [10] The dictionary with a bigger size may lead to asparser representation of the signal but may have a highercomputational cost Similar to Section 31 enough iterationsare performed for the tested algorithms to converge Theparameters for the OIHT-ADL algorithm are set the same asthose in Section 31 except that 120582 = 005 For the AK-SVDalgorithm the dimension of the subspace is set to be 119903 = 7following the setting in [10] For the NAAOLA algorithm

The Scientific World Journal 5

0 50 100 150 200800

1000

1200

1400

1600

1800

2000

Iteration number

05X

minusΩY

2 F

(a) OHIT-ADL

0 20 40 60 80 100001

002

003

004

005

006

007

008

Iteration number

XminusY

Fradic

MK

(b) AK-SVD

Noisy SNR = 25

0 500 1000 1500 2000538

539

54

541

542

543

544

545

546

Iteration number

Noise-free

log10(ΩY

1)

(c) NAAOLA

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

1600

Iteration number

Noise-free

05X

minusΩY

2 F

Noisy SNR = 25

(d) LOST

Figure 1 The convergence curves of the ADL algorithms for both the noise-free and noise case (where SNR = 25 dB)

Input Observed data Y isin 119877119872times119870 the initial dictionaryΩ0isin 119877119875times119872 120582 120574 120572 the number of iterations 119888

Output DictionaryΩInitialization SetΩ = Ω

0

For 119905 = 1 119888 do(i) FixingΩ computeΩY(ii) Update the analysis coefficients Z using (14)(iii) Fixing Z update the dictionaryΩ using (16)(iv) Normalize rows inΩ using (17)

End

Algorithm 1 OIHT-ADL

6 The Scientific World Journal

0 50 100 150 2000

20

40

60

80

100

Iteration number

The r

ecov

ered

row

s (

)

(a) OHIT-ADL

0

20

40

60

80

100

0 20 40 60 80 100

Iteration number

The r

ecov

ered

row

s (

)

(b) AK-SVD

0 500 1000 1500 20000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(c) NAAOLA

0 50 100 150 200 250 3000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(d) LOST

Figure 2 The recovery percentage curves of the ADL algorithms

we set 120582 = 3 (120590 = 5) 120582 = 1 (120590 = 10 15) and 120582 = 05(120590 = 20) For the LOST algorithm the parameter values areset as 120572 = 10minus11 120582 = 120578 = 120583 = 105 119901 = 20 and 119904 = 21The parameters for the NAAOLA algorithm and the LOSTalgorithmare chosen empiricallyThe examples of the learneddictionaries are shown from the top to the bottom row inFigure 3 where each atom is shown as a 7 times 7 pixel imagepatch The atoms in the dictionaries learned by the AK-SVDalgorithm are sorted by the number of training patches whichare orthogonal to the corresponding atoms and thereforethe atoms learned by the AK-SVD algorithm appear to bemore structured Sorting is however not used in our proposedalgorithmThenweuse the learneddictionary to denoise eachoverlapping patch extracted from the noisy image Each patchis reconstructed individually and finally the entire image isformed by averaging over the overlapping regions We usethe OBG algorithm [10] to recover each patch image For

fair comparison we do not use the specific image denoisingmethod designed for the corresponding ADL algorithm suchas the one in [12] The denoising performance is evaluatedby the peak signal to noise ratio (PSNR) defined as PSNR =10 log

10(2552 KMsum119870

119894=1sum119872

119895=1(119909119894119895minus 119909119894119895)2) (dB) where 119909

119894119895and

119909119894119895are the 119894119895th pixel value in noisy and denoised images

respectivelyThe results averaged over 10 trials are presentedinTable 1 fromwhichwe can observe that the performance ofthe OIHT-ADL algorithm is generally better than that of theNAAOLA algorithm and the LOST algorithm It is also betterthan that of the AK-SVD algorithm when the noise level isincreased

4 Conclusion

Wehave presented a novel optimizationmodel and algorithmfor ADL where we learn the dictionary directly from the

The Scientific World Journal 7

(a) Lena (b) House (c) Peppers

(d) Lena (e) House (f) Peppers

(g) Lena (h) House (i) Peppers

(j) Lena (k) House (l) Peppers

Figure 3 The learned dictionaries of size 63 times 49 by using the OIHT-ADL AK-SVD NAAOLA and LOST algorithms on the three imageswith noise level 120590 = 5 The results of the OIHT-ADL are shown in the top row in (a) Lena (b) House and (c) Peppers followed by theAK-SVD NAAOLA and LOST in the remaining rows

observed noisy dataWe have also enforced the orthogonalityconstraint on the optimization criterion to remove the trivialsolutions induced by a null dictionary matrix The proposedmethod offers computational advantage over the AK-SVDalgorithm and also provides an alternative to the LOST

algorithm in avoiding the trivial solutions The experimentsperformed have shown its competitive performance as com-pared to the three state-of-the-art algorithms the AK-SVDNAAOLA and LOST algorithms respectively The proposedalgorithm is easy to implement and computationally efficient

8 The Scientific World Journal

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (Grants nos 61162014 and 61210306074)and the Natural Science Foundation of Jiangxi (Grant no20122BAB201025) andwas supported in part by the Engineer-ing and Physical Sciences Research Council (EPSRC) of theUK (Grant no EPK0143071)

References

[1] K Engan S O Aase and J H Husoy ldquoMethod of optimaldirections for frame designrdquo in IEEE International Conferenceon Acoustics Speech and Signal Processing vol 5 pp 2443ndash2446 1999

[2] M Aharon M Elad and A Bruckstein ldquoK-svd an algorithmfor designing overcomplete dictionaries for sparse representa-tionrdquo IEEE Transactions on Signal Processing vol 54 no 11 pp4311ndash4322 2006

[3] K Skretting and K Engan ldquoRecursive least squares dictionarylearning algorithmrdquo IEEE Transactions on Signal Processing vol58 no 4 pp 2121ndash2130 2010

[4] S Nam M E Davies M Elad and R Gribonval ldquoCosparseanalysis modelingmdashuniqueness and algorithmsrdquo in Proceedingsof the IEEE International Conference on Acoustics Speech andSignal Processing (ICASSP 11) pp 5804ndash5807 Prague CzechRepublic May 2011

[5] B Ophir M Elad N Bertin and M D Plumbley ldquoSequen-tial minimal eigenvalues an approach to analysis dictionarylearningrdquo in Proceedings of the 9th European Signal ProcessingConference pp 1465ndash1469 2011

[6] M Yaghoobi S Nam R Gribonval and M E Davies ldquoCon-strained overcomplete analysis operator learning for cosparsesignal modellingrdquo IEEE Transactions on Signal Processing vol61 no 9 pp 2341ndash2355 2013

[7] M Yaghoobi and M E Davies ldquoRelaxed analysis operatorlearningrdquo in Proceedings of the NIPS Workshop on AnalysisOperator Learning vs Dictionary Learning Fraternal Twins inSparse Modeling 2012

[8] M Yaghoobi S Nam R Gribonval and M E Davies ldquoNoiseaware analysis operator learning for approximately cosparsesignalsrdquo in Proceedings of the IEEE International Conference onAcoustics Speech and Signal Processing pp 5409ndash5412 2012

[9] M Elad P Milanfar and R Rubinstein ldquoAnalysis versussynthesis in signal priorsrdquo Inverse Problems vol 23 no 3 pp947ndash968 2007

[10] R Rubinstein T Peleg and M Elad ldquoAnalysis K-SVD adictionary-learning algorithm for the analysis sparse modelrdquoIEEE Transactions on Signal Processing vol 61 no 3 pp 661ndash677 2013

[11] S Ravishankar and Y Bresler ldquoLearning sparsifying transformsfor image processingrdquo in Proceedings of the IEEE InternationalConference on Image Processing pp 681ndash684 2012

[12] S Ravishankar and Y Bresler ldquoLearning overcomplete sparsi-fying transforms for signal processingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and Signal

Processing (ICASSP 13) pp 3088ndash3092 Vancouver CanadaMay 2013

[13] S Ravishankar and Y Bresler ldquoClosed-form solutions withinsparsifying transform learningrdquo inProceedings of the IEEE Inter-national Conference on Acoustics Speech and Signal Processingpp 5378ndash5382 2013

[14] E M Eksioglu and O Bayir ldquoK-svd meets transform learningtransform k-svdrdquo IEEE Signal Processing Letters vol 21 no 3pp 347ndash351 2014

[15] E J Candes J Romberg and T Tao ldquoRobust uncertaintyprinciples exact signal reconstruction from highly incompletefrequency informationrdquo IEEE Transactions on InformationThe-ory vol 52 no 2 pp 489ndash509 2006

[16] E J Candes and T Tao ldquoDecoding by linear programmingrdquoIEEE Transactions on Information Theory vol 51 no 12 pp4203ndash4215 2005

[17] Y Zhang H Wang T Yu and W Wang ldquoSubsetpursuit for analysis dictionary learningrdquo in Proceedingsof the European Signal Processing Conference 2013httppersonaleesurreyacukPersonalWWangpapersZha-ngWYW EUSIPCO 2013pdf

[18] Y Zhang W Wang H Wang and S Sanei ldquoKplane clusteringalgorithm for analysis dictionary learningrdquo in Proceedings of theIEEE International Workshop on Machine Learning for SignalProcessing pp 1ndash4 2013

[19] S Ravishankar and Y Bresler ldquoLearning sparsifying trans-formsrdquo IEEE Transactions on Signal Processing vol 61 no 5 pp1072ndash1086 2013

[20] M Elad M A T Figueiredo and Y Ma ldquoOn the role ofsparse and redundant representations in image processingrdquoProceedings of the IEEE vol 98 no 6 pp 972ndash982 2010

[21] T Blumensath and M E Davies ldquoIterative thresholding forsparse approximationsrdquo The Journal of Fourier Analysis andApplications vol 14 no 5-6 pp 629ndash654 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 4: Research Article An Analysis Dictionary Learning Algorithm

4 The Scientific World Journal

where w119894is the 119894th row of Ω and r is a normalized random

vector There may be other methods to prevent this problemfor example by adding the normalization constraint to theobjective function [7] or by adding the mutual coherenceconstraint on the rows ofΩ to the objective function [12 14]however this is out of the scope of our work Although afterthe row normalization the columns of the dictionary areno longer close to the unit norm the rank of Ω will not bechanged

23 Convergence In the step of updating Z the algorithm tooptimize the objective function (11) is analytical Thus thealgorithm is guaranteed not to increase (11) and converges toa local minimum of (11) [21] Furthermore in the dictionaryupdate step Ω is updated by minimizing a fourth-orderfunction and thus amonotonic reduction in the cost function(15) is guaranteed

Our algorithm (called orthogonality constrained ADLwith iterative hard thresholding OIHT-ADL) to learn anal-ysis dictionary is summarized in Algorithm 1

3 Computer Simulations

To validate the proposed algorithm we perform two exper-iments In the first experiment we show the performanceof the proposed algorithm for synthetic dictionary recoveryproblems In the second experiment we consider the naturalimage denoising problems In these experimentsΩ

0isin 119877119875times119872

is the initial dictionary in which each row is orthogonal to arandom set of119872minus1 training data and is also normalized [10]For performance comparison the AK-SVD [10] NAAOLA[8] and LOST [12] algorithms are used as baselines

31 Experiments on Synthetic Data Following the work in[10] synthetic data are used to demonstrate the performanceof the proposed algorithm in recovering an underlyingdictionary Ω In this experiment we utilize the methods torecover a dictionary that is used to produce the set of trainingdata The convergence curves and the dictionary recoverypercentage are used to show their performance If min

119894(1 minus

|w119879119894w119895|) lt 001 a roww119879

119895in the true dictionaryΩ is regarded

as recovered where w119879119894is an atom of the trained dictionary

Ω isin 11987750times25 is generated with random Gaussian entries and

the dataset consists of 119870 = 50000 signals each residing in a4-dimensional subspace with both the noise-free setup andthe noise setup (120590 = 004 SNR = 25 dB) The parameters ofthe proposed algorithm are set empirically by experimentaltests and we choose the parameters as 120582 = 01 120574 = 10and 120572 = 10minus4 which give the best results in atom recoveryrate We have tested the parameters with different values inthe ranges of 10minus2 lt 120582 lt 10

2 10minus2 lt 120574 lt 102 and

10minus2lt 120572 lt 10

minus5 For the AK-SVD algorithm the dimensionof the subspace is set to be 119903 = 4 following the setting in [10]For the NAAOLA algorithm we set 120582 = 05 which is closeto the default value 120582 = 03 in [8] The parameters in theLOST algorithms are set as those in [11] that is 120572 = 10minus4120582 = 120578 = 120583 = 10

5 119901 = 20 and 119904 = 29 The convergence

curves of the error by the first term of (7) versus the iterationnumber are shown in Figure 1(a) It can be observed that thecompared algorithms take different numbers of iterations toconverge (the terms used tomeasure the rates of convergenceare slightly different in these algorithms because differentcost functions have been used) which are approximately200 100 1000 and 300 iterations for the OIHT-ADL AK-SVD NAAOLA and LOST algorithms respectively Thusthe recovery percentages of these algorithms are measuredat these different iteration numbers (to ensure that eachalgorithm converges to the stable solutions) It is observedfrom Figure 2 that 76 94 42 and 72 of the rows inthe true dictionary Ω are recovered for the noise-free caseand 72 86 12 and 68 for the noisy case respectivelyNote that the recovery rates for the LOST algorithm areobtained with random initialization With the DCT andidentitymatrix initialization the LOST algorithm can recover74 and 72 rows of the true dictionary Ω in noise-freeand noise cases respectively after 300 iterations Althoughit may not be necessary to recover the true dictionary inpractical applications the use of atom recovery rate allowsus to compare the proposed algorithm with the baselineADL algorithms as it has been widely used in these worksThe recovery rate by the NAAOLA algorithm is relativelylow mainly because it employs a uniformly normalized tightframe as a constraint on the dictionary and this constraint haslimited the possibleΩ to be learned The AK-SVD algorithmperforms the best in terms of the recovery rate that istaking fewer iterations to reach a similar recovery percentageHowever the running time in each iteration of the AK-SVDalgorithm is significantly higher than that in our proposedOIHT-ADL algorithm The total runtime of our algorithmfor 200 iterations is about 825 and 832 seconds for the noise-free and noise case respectively In contrast the AK-SVDalgorithm takes about 11544 or 10948 seconds respectivelyfor only 100 iterations (Computer OSWindows 7 CPU IntelCore i5-3210M 250GHz and RAM 4GB)This is becauseour algorithm does not need to estimate X in each iterationas opposed to the AK-SVD algorithm

32 Experiments on Natural Image Denoising In this exper-iment the training set of 20000 image patches each of7 times 7 pixels obtained from three images (Lena Houseand Peppers) that are commonly used in denoising [10]has been used for learning the dictionaries each of whichcorresponds to a particular noisy image Noise with differentlevel 120590 varying from 5 to 20 is added to these imagepatches The dictionary of size 63 times 49 is learned fromthe training data which are normalized to have zero meanFor fair comparison the dictionary size is the same as thatin [10] The dictionary with a bigger size may lead to asparser representation of the signal but may have a highercomputational cost Similar to Section 31 enough iterationsare performed for the tested algorithms to converge Theparameters for the OIHT-ADL algorithm are set the same asthose in Section 31 except that 120582 = 005 For the AK-SVDalgorithm the dimension of the subspace is set to be 119903 = 7following the setting in [10] For the NAAOLA algorithm

The Scientific World Journal 5

0 50 100 150 200800

1000

1200

1400

1600

1800

2000

Iteration number

05X

minusΩY

2 F

(a) OHIT-ADL

0 20 40 60 80 100001

002

003

004

005

006

007

008

Iteration number

XminusY

Fradic

MK

(b) AK-SVD

Noisy SNR = 25

0 500 1000 1500 2000538

539

54

541

542

543

544

545

546

Iteration number

Noise-free

log10(ΩY

1)

(c) NAAOLA

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

1600

Iteration number

Noise-free

05X

minusΩY

2 F

Noisy SNR = 25

(d) LOST

Figure 1 The convergence curves of the ADL algorithms for both the noise-free and noise case (where SNR = 25 dB)

Input Observed data Y isin 119877119872times119870 the initial dictionaryΩ0isin 119877119875times119872 120582 120574 120572 the number of iterations 119888

Output DictionaryΩInitialization SetΩ = Ω

0

For 119905 = 1 119888 do(i) FixingΩ computeΩY(ii) Update the analysis coefficients Z using (14)(iii) Fixing Z update the dictionaryΩ using (16)(iv) Normalize rows inΩ using (17)

End

Algorithm 1 OIHT-ADL

6 The Scientific World Journal

0 50 100 150 2000

20

40

60

80

100

Iteration number

The r

ecov

ered

row

s (

)

(a) OHIT-ADL

0

20

40

60

80

100

0 20 40 60 80 100

Iteration number

The r

ecov

ered

row

s (

)

(b) AK-SVD

0 500 1000 1500 20000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(c) NAAOLA

0 50 100 150 200 250 3000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(d) LOST

Figure 2 The recovery percentage curves of the ADL algorithms

we set 120582 = 3 (120590 = 5) 120582 = 1 (120590 = 10 15) and 120582 = 05(120590 = 20) For the LOST algorithm the parameter values areset as 120572 = 10minus11 120582 = 120578 = 120583 = 105 119901 = 20 and 119904 = 21The parameters for the NAAOLA algorithm and the LOSTalgorithmare chosen empiricallyThe examples of the learneddictionaries are shown from the top to the bottom row inFigure 3 where each atom is shown as a 7 times 7 pixel imagepatch The atoms in the dictionaries learned by the AK-SVDalgorithm are sorted by the number of training patches whichare orthogonal to the corresponding atoms and thereforethe atoms learned by the AK-SVD algorithm appear to bemore structured Sorting is however not used in our proposedalgorithmThenweuse the learneddictionary to denoise eachoverlapping patch extracted from the noisy image Each patchis reconstructed individually and finally the entire image isformed by averaging over the overlapping regions We usethe OBG algorithm [10] to recover each patch image For

fair comparison we do not use the specific image denoisingmethod designed for the corresponding ADL algorithm suchas the one in [12] The denoising performance is evaluatedby the peak signal to noise ratio (PSNR) defined as PSNR =10 log

10(2552 KMsum119870

119894=1sum119872

119895=1(119909119894119895minus 119909119894119895)2) (dB) where 119909

119894119895and

119909119894119895are the 119894119895th pixel value in noisy and denoised images

respectivelyThe results averaged over 10 trials are presentedinTable 1 fromwhichwe can observe that the performance ofthe OIHT-ADL algorithm is generally better than that of theNAAOLA algorithm and the LOST algorithm It is also betterthan that of the AK-SVD algorithm when the noise level isincreased

4 Conclusion

Wehave presented a novel optimizationmodel and algorithmfor ADL where we learn the dictionary directly from the

The Scientific World Journal 7

(a) Lena (b) House (c) Peppers

(d) Lena (e) House (f) Peppers

(g) Lena (h) House (i) Peppers

(j) Lena (k) House (l) Peppers

Figure 3 The learned dictionaries of size 63 times 49 by using the OIHT-ADL AK-SVD NAAOLA and LOST algorithms on the three imageswith noise level 120590 = 5 The results of the OIHT-ADL are shown in the top row in (a) Lena (b) House and (c) Peppers followed by theAK-SVD NAAOLA and LOST in the remaining rows

observed noisy dataWe have also enforced the orthogonalityconstraint on the optimization criterion to remove the trivialsolutions induced by a null dictionary matrix The proposedmethod offers computational advantage over the AK-SVDalgorithm and also provides an alternative to the LOST

algorithm in avoiding the trivial solutions The experimentsperformed have shown its competitive performance as com-pared to the three state-of-the-art algorithms the AK-SVDNAAOLA and LOST algorithms respectively The proposedalgorithm is easy to implement and computationally efficient

8 The Scientific World Journal

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (Grants nos 61162014 and 61210306074)and the Natural Science Foundation of Jiangxi (Grant no20122BAB201025) andwas supported in part by the Engineer-ing and Physical Sciences Research Council (EPSRC) of theUK (Grant no EPK0143071)

References

[1] K Engan S O Aase and J H Husoy ldquoMethod of optimaldirections for frame designrdquo in IEEE International Conferenceon Acoustics Speech and Signal Processing vol 5 pp 2443ndash2446 1999

[2] M Aharon M Elad and A Bruckstein ldquoK-svd an algorithmfor designing overcomplete dictionaries for sparse representa-tionrdquo IEEE Transactions on Signal Processing vol 54 no 11 pp4311ndash4322 2006

[3] K Skretting and K Engan ldquoRecursive least squares dictionarylearning algorithmrdquo IEEE Transactions on Signal Processing vol58 no 4 pp 2121ndash2130 2010

[4] S Nam M E Davies M Elad and R Gribonval ldquoCosparseanalysis modelingmdashuniqueness and algorithmsrdquo in Proceedingsof the IEEE International Conference on Acoustics Speech andSignal Processing (ICASSP 11) pp 5804ndash5807 Prague CzechRepublic May 2011

[5] B Ophir M Elad N Bertin and M D Plumbley ldquoSequen-tial minimal eigenvalues an approach to analysis dictionarylearningrdquo in Proceedings of the 9th European Signal ProcessingConference pp 1465ndash1469 2011

[6] M Yaghoobi S Nam R Gribonval and M E Davies ldquoCon-strained overcomplete analysis operator learning for cosparsesignal modellingrdquo IEEE Transactions on Signal Processing vol61 no 9 pp 2341ndash2355 2013

[7] M Yaghoobi and M E Davies ldquoRelaxed analysis operatorlearningrdquo in Proceedings of the NIPS Workshop on AnalysisOperator Learning vs Dictionary Learning Fraternal Twins inSparse Modeling 2012

[8] M Yaghoobi S Nam R Gribonval and M E Davies ldquoNoiseaware analysis operator learning for approximately cosparsesignalsrdquo in Proceedings of the IEEE International Conference onAcoustics Speech and Signal Processing pp 5409ndash5412 2012

[9] M Elad P Milanfar and R Rubinstein ldquoAnalysis versussynthesis in signal priorsrdquo Inverse Problems vol 23 no 3 pp947ndash968 2007

[10] R Rubinstein T Peleg and M Elad ldquoAnalysis K-SVD adictionary-learning algorithm for the analysis sparse modelrdquoIEEE Transactions on Signal Processing vol 61 no 3 pp 661ndash677 2013

[11] S Ravishankar and Y Bresler ldquoLearning sparsifying transformsfor image processingrdquo in Proceedings of the IEEE InternationalConference on Image Processing pp 681ndash684 2012

[12] S Ravishankar and Y Bresler ldquoLearning overcomplete sparsi-fying transforms for signal processingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and Signal

Processing (ICASSP 13) pp 3088ndash3092 Vancouver CanadaMay 2013

[13] S Ravishankar and Y Bresler ldquoClosed-form solutions withinsparsifying transform learningrdquo inProceedings of the IEEE Inter-national Conference on Acoustics Speech and Signal Processingpp 5378ndash5382 2013

[14] E M Eksioglu and O Bayir ldquoK-svd meets transform learningtransform k-svdrdquo IEEE Signal Processing Letters vol 21 no 3pp 347ndash351 2014

[15] E J Candes J Romberg and T Tao ldquoRobust uncertaintyprinciples exact signal reconstruction from highly incompletefrequency informationrdquo IEEE Transactions on InformationThe-ory vol 52 no 2 pp 489ndash509 2006

[16] E J Candes and T Tao ldquoDecoding by linear programmingrdquoIEEE Transactions on Information Theory vol 51 no 12 pp4203ndash4215 2005

[17] Y Zhang H Wang T Yu and W Wang ldquoSubsetpursuit for analysis dictionary learningrdquo in Proceedingsof the European Signal Processing Conference 2013httppersonaleesurreyacukPersonalWWangpapersZha-ngWYW EUSIPCO 2013pdf

[18] Y Zhang W Wang H Wang and S Sanei ldquoKplane clusteringalgorithm for analysis dictionary learningrdquo in Proceedings of theIEEE International Workshop on Machine Learning for SignalProcessing pp 1ndash4 2013

[19] S Ravishankar and Y Bresler ldquoLearning sparsifying trans-formsrdquo IEEE Transactions on Signal Processing vol 61 no 5 pp1072ndash1086 2013

[20] M Elad M A T Figueiredo and Y Ma ldquoOn the role ofsparse and redundant representations in image processingrdquoProceedings of the IEEE vol 98 no 6 pp 972ndash982 2010

[21] T Blumensath and M E Davies ldquoIterative thresholding forsparse approximationsrdquo The Journal of Fourier Analysis andApplications vol 14 no 5-6 pp 629ndash654 2008

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Research Article An Analysis Dictionary Learning Algorithm

The Scientific World Journal 5

0 50 100 150 200800

1000

1200

1400

1600

1800

2000

Iteration number

05X

minusΩY

2 F

(a) OHIT-ADL

0 20 40 60 80 100001

002

003

004

005

006

007

008

Iteration number

XminusY

Fradic

MK

(b) AK-SVD

Noisy SNR = 25

0 500 1000 1500 2000538

539

54

541

542

543

544

545

546

Iteration number

Noise-free

log10(ΩY

1)

(c) NAAOLA

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

1600

Iteration number

Noise-free

05X

minusΩY

2 F

Noisy SNR = 25

(d) LOST

Figure 1 The convergence curves of the ADL algorithms for both the noise-free and noise case (where SNR = 25 dB)

Input Observed data Y isin 119877119872times119870 the initial dictionaryΩ0isin 119877119875times119872 120582 120574 120572 the number of iterations 119888

Output DictionaryΩInitialization SetΩ = Ω

0

For 119905 = 1 119888 do(i) FixingΩ computeΩY(ii) Update the analysis coefficients Z using (14)(iii) Fixing Z update the dictionaryΩ using (16)(iv) Normalize rows inΩ using (17)

End

Algorithm 1 OIHT-ADL

6 The Scientific World Journal

0 50 100 150 2000

20

40

60

80

100

Iteration number

The r

ecov

ered

row

s (

)

(a) OHIT-ADL

0

20

40

60

80

100

0 20 40 60 80 100

Iteration number

The r

ecov

ered

row

s (

)

(b) AK-SVD

0 500 1000 1500 20000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(c) NAAOLA

0 50 100 150 200 250 3000

20

40

60

80

100

Iteration number

Noise-free

The r

ecov

ered

row

s (

)

Noisy SNR = 25

(d) LOST

Figure 2 The recovery percentage curves of the ADL algorithms

we set 120582 = 3 (120590 = 5) 120582 = 1 (120590 = 10 15) and 120582 = 05(120590 = 20) For the LOST algorithm the parameter values areset as 120572 = 10minus11 120582 = 120578 = 120583 = 105 119901 = 20 and 119904 = 21The parameters for the NAAOLA algorithm and the LOSTalgorithmare chosen empiricallyThe examples of the learneddictionaries are shown from the top to the bottom row inFigure 3 where each atom is shown as a 7 times 7 pixel imagepatch The atoms in the dictionaries learned by the AK-SVDalgorithm are sorted by the number of training patches whichare orthogonal to the corresponding atoms and thereforethe atoms learned by the AK-SVD algorithm appear to bemore structured Sorting is however not used in our proposedalgorithmThenweuse the learneddictionary to denoise eachoverlapping patch extracted from the noisy image Each patchis reconstructed individually and finally the entire image isformed by averaging over the overlapping regions We usethe OBG algorithm [10] to recover each patch image For

fair comparison we do not use the specific image denoisingmethod designed for the corresponding ADL algorithm suchas the one in [12] The denoising performance is evaluatedby the peak signal to noise ratio (PSNR) defined as PSNR =10 log

10(2552 KMsum119870

119894=1sum119872

119895=1(119909119894119895minus 119909119894119895)2) (dB) where 119909

119894119895and

119909119894119895are the 119894119895th pixel value in noisy and denoised images

respectivelyThe results averaged over 10 trials are presentedinTable 1 fromwhichwe can observe that the performance ofthe OIHT-ADL algorithm is generally better than that of theNAAOLA algorithm and the LOST algorithm It is also betterthan that of the AK-SVD algorithm when the noise level isincreased

4 Conclusion

Wehave presented a novel optimizationmodel and algorithmfor ADL where we learn the dictionary directly from the

The Scientific World Journal 7

(a) Lena (b) House (c) Peppers

(d) Lena (e) House (f) Peppers

(g) Lena (h) House (i) Peppers

(j) Lena (k) House (l) Peppers

Figure 3 The learned dictionaries of size 63 times 49 by using the OIHT-ADL AK-SVD NAAOLA and LOST algorithms on the three imageswith noise level 120590 = 5 The results of the OIHT-ADL are shown in the top row in (a) Lena (b) House and (c) Peppers followed by theAK-SVD NAAOLA and LOST in the remaining rows

observed noisy dataWe have also enforced the orthogonalityconstraint on the optimization criterion to remove the trivialsolutions induced by a null dictionary matrix The proposedmethod offers computational advantage over the AK-SVDalgorithm and also provides an alternative to the LOST

algorithm in avoiding the trivial solutions The experimentsperformed have shown its competitive performance as com-pared to the three state-of-the-art algorithms the AK-SVDNAAOLA and LOST algorithms respectively The proposedalgorithm is easy to implement and computationally efficient

8 The Scientific World Journal

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the National Natural ScienceFoundation of China (Grants nos 61162014 and 61210306074)and the Natural Science Foundation of Jiangxi (Grant no20122BAB201025) andwas supported in part by the Engineer-ing and Physical Sciences Research Council (EPSRC) of theUK (Grant no EPK0143071)

References

[1] K Engan S O Aase and J H Husoy ldquoMethod of optimaldirections for frame designrdquo in IEEE International Conferenceon Acoustics Speech and Signal Processing vol 5 pp 2443ndash2446 1999

[2] M Aharon M Elad and A Bruckstein ldquoK-svd an algorithmfor designing overcomplete dictionaries for sparse representa-tionrdquo IEEE Transactions on Signal Processing vol 54 no 11 pp4311ndash4322 2006

[3] K Skretting and K Engan ldquoRecursive least squares dictionarylearning algorithmrdquo IEEE Transactions on Signal Processing vol58 no 4 pp 2121ndash2130 2010

[4] S Nam M E Davies M Elad and R Gribonval ldquoCosparseanalysis modelingmdashuniqueness and algorithmsrdquo in Proceedingsof the IEEE International Conference on Acoustics Speech andSignal Processing (ICASSP 11) pp 5804ndash5807 Prague CzechRepublic May 2011

[5] B Ophir M Elad N Bertin and M D Plumbley ldquoSequen-tial minimal eigenvalues an approach to analysis dictionarylearningrdquo in Proceedings of the 9th European Signal ProcessingConference pp 1465ndash1469 2011

[6] M Yaghoobi S Nam R Gribonval and M E Davies ldquoCon-strained overcomplete analysis operator learning for cosparsesignal modellingrdquo IEEE Transactions on Signal Processing vol61 no 9 pp 2341ndash2355 2013

[7] M Yaghoobi and M E Davies ldquoRelaxed analysis operatorlearningrdquo in Proceedings of the NIPS Workshop on AnalysisOperator Learning vs Dictionary Learning Fraternal Twins inSparse Modeling 2012

[8] M Yaghoobi S Nam R Gribonval and M E Davies ldquoNoiseaware analysis operator learning for approximately cosparsesignalsrdquo in Proceedings of the IEEE International Conference onAcoustics Speech and Signal Processing pp 5409ndash5412 2012

[9] M Elad P Milanfar and R Rubinstein ldquoAnalysis versussynthesis in signal priorsrdquo Inverse Problems vol 23 no 3 pp947ndash968 2007

[10] R Rubinstein T Peleg and M Elad ldquoAnalysis K-SVD adictionary-learning algorithm for the analysis sparse modelrdquoIEEE Transactions on Signal Processing vol 61 no 3 pp 661ndash677 2013

[11] S Ravishankar and Y Bresler ldquoLearning sparsifying transformsfor image processingrdquo in Proceedings of the IEEE InternationalConference on Image Processing pp 681ndash684 2012

[12] S Ravishankar and Y Bresler ldquoLearning overcomplete sparsi-fying transforms for signal processingrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and Signal

Processing (ICASSP 13) pp 3088ndash3092 Vancouver CanadaMay 2013

[13] S Ravishankar and Y Bresler ldquoClosed-form solutions withinsparsifying transform learningrdquo inProceedings of the IEEE Inter-national Conference on Acoustics Speech and Signal Processingpp 5378ndash5382 2013

[14] E M Eksioglu and O Bayir ldquoK-svd meets transform learningtransform k-svdrdquo IEEE Signal Processing Letters vol 21 no 3pp 347ndash351 2014

[15] E J Candes J Romberg and T Tao ldquoRobust uncertaintyprinciples exact signal reconstruction from highly incompletefrequency informationrdquo IEEE Transactions on InformationThe-ory vol 52 no 2 pp 489ndash509 2006

[16] E J Candes and T Tao ldquoDecoding by linear programmingrdquoIEEE Transactions on Information Theory vol 51 no 12 pp4203ndash4215 2005

[17] Y Zhang H Wang T Yu and W Wang ldquoSubsetpursuit for analysis dictionary learningrdquo in Proceedingsof the European Signal Processing Conference 2013httppersonaleesurreyacukPersonalWWangpapersZha-ngWYW EUSIPCO 2013pdf

[18] Y Zhang W Wang H Wang and S Sanei ldquoKplane clusteringalgorithm for analysis dictionary learningrdquo in Proceedings of theIEEE International Workshop on Machine Learning for SignalProcessing pp 1ndash4 2013

[19] S Ravishankar and Y Bresler ldquoLearning sparsifying trans-formsrdquo IEEE Transactions on Signal Processing vol 61 no 5 pp1072ndash1086 2013

[20] M Elad M A T Figueiredo and Y Ma ldquoOn the role ofsparse and redundant representations in image processingrdquoProceedings of the IEEE vol 98 no 6 pp 972ndash982 2010

[21] T Blumensath and M E Davies ldquoIterative thresholding forsparse approximationsrdquo The Journal of Fourier Analysis andApplications vol 14 no 5-6 pp 629ndash654 2008


Figure 2: The recovery percentage curves of the ADL algorithms: (a) OIHT-ADL; (b) AK-SVD; (c) NAAOLA, with noise-free and noisy (SNR = 25 dB) curves; (d) LOST, with noise-free and noisy (SNR = 25 dB) curves. Each panel plots the number of recovered rows (0–100) against the iteration number.
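As an aside on how such curves can be produced, the sketch below computes a recovery percentage under one common convention (used, e.g., in the Analysis K-SVD experiments [10]): a row of the true dictionary counts as recovered when its absolute inner product with some unit-norm learned row exceeds 1 − ε. The function name and the default ε are our own illustrative choices, not taken from the paper.

import numpy as np

def recovery_percentage(omega_learned, omega_true, eps=0.01):
    # Percentage of rows of omega_true matched by some row of omega_learned.
    # Both dictionaries are row-normalized first to remove scale/sign ambiguity;
    # a true row is counted as recovered when |<w_learned, w_true>| > 1 - eps
    # for at least one learned row (an assumed matching rule, cf. [10]).
    L = omega_learned / np.linalg.norm(omega_learned, axis=1, keepdims=True)
    T = omega_true / np.linalg.norm(omega_true, axis=1, keepdims=True)
    sim = np.abs(L @ T.T)               # |cosine| similarity of every row pair
    recovered = (sim.max(axis=0) > 1 - eps).sum()
    return 100.0 * recovered / T.shape[0]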

we set λ = 3 (σ = 5), λ = 1 (σ = 10, 15), and λ = 0.5 (σ = 20). For the LOST algorithm, the parameter values are set as α = 10⁻¹¹, λ = η = μ = 10⁵, p = 20, and s = 21. The parameters for the NAAOLA algorithm and the LOST algorithm are chosen empirically. Examples of the learned dictionaries are shown from the top to the bottom row in Figure 3, where each atom is shown as a 7 × 7 pixel image patch. The atoms in the dictionaries learned by the AK-SVD algorithm are sorted by the number of training patches that are orthogonal to the corresponding atoms, and therefore the atoms learned by the AK-SVD algorithm appear to be more structured. Sorting is, however, not used in our proposed algorithm. We then use the learned dictionary to denoise each overlapping patch extracted from the noisy image. Each patch is reconstructed individually, and finally the entire image is formed by averaging over the overlapping regions. We use the OBG algorithm [10] to recover each image patch. For a fair comparison, we do not use the specific image denoising method designed for the corresponding ADL algorithm, such as the one in [12]. The denoising performance is evaluated by the peak signal-to-noise ratio (PSNR), defined as PSNR = 10 log₁₀(255²KM / ∑_{i=1}^{K} ∑_{j=1}^{M} (x_{ij} − x̂_{ij})²) (dB), where x_{ij} and x̂_{ij} are the (i, j)th pixel values in the noisy and denoised images, respectively. The results averaged over 10 trials are presented in Table 1, from which we can observe that the performance of the OIHT-ADL algorithm is generally better than that of the NAAOLA algorithm and the LOST algorithm. It is also better than that of the AK-SVD algorithm when the noise level is increased.

Figure 3: The learned dictionaries of size 63 × 49 obtained with the OIHT-ADL, AK-SVD, NAAOLA, and LOST algorithms on the three images with noise level σ = 5. The results of OIHT-ADL are shown in the top row for (a) Lena, (b) House, and (c) Peppers, followed by those of AK-SVD, NAAOLA, and LOST in the remaining rows.

4. Conclusion

We have presented a novel optimization model and algorithm for ADL, where we learn the dictionary directly from the observed noisy data. We have also enforced an orthogonality constraint on the optimization criterion to remove the trivial solutions induced by a null dictionary matrix. The proposed method offers a computational advantage over the AK-SVD algorithm and provides an alternative to the LOST algorithm in avoiding trivial solutions. The experiments performed have shown its competitive performance compared with three state-of-the-art algorithms, namely, the AK-SVD, NAAOLA, and LOST algorithms. The proposed algorithm is easy to implement and computationally efficient.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants nos. 61162014 and 61210306074) and the Natural Science Foundation of Jiangxi (Grant no. 20122BAB201025), and was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK (Grant no. EP/K014307/1).

References

[1] K. Engan, S. O. Aase, and J. H. Husoy, "Method of optimal directions for frame design," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 2443–2446, 1999.
[2] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[3] K. Skretting and K. Engan, "Recursive least squares dictionary learning algorithm," IEEE Transactions on Signal Processing, vol. 58, no. 4, pp. 2121–2130, 2010.
[4] S. Nam, M. E. Davies, M. Elad, and R. Gribonval, "Cosparse analysis modeling—uniqueness and algorithms," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 5804–5807, Prague, Czech Republic, May 2011.
[5] B. Ophir, M. Elad, N. Bertin, and M. D. Plumbley, "Sequential minimal eigenvalues: an approach to analysis dictionary learning," in Proceedings of the 9th European Signal Processing Conference, pp. 1465–1469, 2011.
[6] M. Yaghoobi, S. Nam, R. Gribonval, and M. E. Davies, "Constrained overcomplete analysis operator learning for cosparse signal modelling," IEEE Transactions on Signal Processing, vol. 61, no. 9, pp. 2341–2355, 2013.
[7] M. Yaghoobi and M. E. Davies, "Relaxed analysis operator learning," in Proceedings of the NIPS Workshop on Analysis Operator Learning vs. Dictionary Learning: Fraternal Twins in Sparse Modeling, 2012.
[8] M. Yaghoobi, S. Nam, R. Gribonval, and M. E. Davies, "Noise aware analysis operator learning for approximately cosparse signals," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5409–5412, 2012.
[9] M. Elad, P. Milanfar, and R. Rubinstein, "Analysis versus synthesis in signal priors," Inverse Problems, vol. 23, no. 3, pp. 947–968, 2007.
[10] R. Rubinstein, T. Peleg, and M. Elad, "Analysis K-SVD: a dictionary-learning algorithm for the analysis sparse model," IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 661–677, 2013.
[11] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms for image processing," in Proceedings of the IEEE International Conference on Image Processing, pp. 681–684, 2012.
[12] S. Ravishankar and Y. Bresler, "Learning overcomplete sparsifying transforms for signal processing," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '13), pp. 3088–3092, Vancouver, Canada, May 2013.
[13] S. Ravishankar and Y. Bresler, "Closed-form solutions within sparsifying transform learning," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5378–5382, 2013.
[14] E. M. Eksioglu and O. Bayir, "K-SVD meets transform learning: transform K-SVD," IEEE Signal Processing Letters, vol. 21, no. 3, pp. 347–351, 2014.
[15] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[16] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
[17] Y. Zhang, H. Wang, T. Yu, and W. Wang, "Subset pursuit for analysis dictionary learning," in Proceedings of the European Signal Processing Conference, 2013, http://personal.ee.surrey.ac.uk/Personal/W.Wang/papers/ZhangWYW_EUSIPCO_2013.pdf.
[18] Y. Zhang, W. Wang, H. Wang, and S. Sanei, "K-plane clustering algorithm for analysis dictionary learning," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, pp. 1–4, 2013.
[19] S. Ravishankar and Y. Bresler, "Learning sparsifying transforms," IEEE Transactions on Signal Processing, vol. 61, no. 5, pp. 1072–1086, 2013.
[20] M. Elad, M. A. T. Figueiredo, and Y. Ma, "On the role of sparse and redundant representations in image processing," Proceedings of the IEEE, vol. 98, no. 6, pp. 972–982, 2010.
[21] T. Blumensath and M. E. Davies, "Iterative thresholding for sparse approximations," The Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 629–654, 2008.
