
Label Embedded Dictionary Learning for Image Classification

Shuai Shao a, Yan-Jiang Wang a,∗, Bao-Di Liu a,∗∗, Weifeng Liu a, Rui Xu a

a College of Information and Control Engineering, China University of Petroleum, Qingdao 266580, China

Abstract

Recently, the label consistent K-SVD (LC-KSVD) algorithm has been successfully applied to image classification. The objective function of LC-KSVD consists of the reconstruction error, the classification error, and the discriminative sparse codes error with an `0-norm sparse regularization term. The `0-norm, however, leads to an NP-hard problem. Although some methods such as orthogonal matching pursuit can alleviate this problem to some extent, it remains difficult to find the optimal sparse solution. To overcome this limitation, we propose a label embedded dictionary learning (LEDL) method that utilises the `1-norm as the sparse regularization term, so that we can avoid the hard-to-optimize problem by solving a convex optimization problem. The alternating direction method of multipliers and the blockwise coordinate descent algorithm are then exploited to optimize the corresponding objective function. Extensive experimental results on six benchmark datasets illustrate that the proposed algorithm achieves superior performance compared to several conventional classification algorithms.

Keywords: Dictionary learning, sparse representation, label embedded dictionary learning, image classification

∗ Corresponding author. PaperID: NEUCOM-S-19-01925.
∗∗ Corresponding author. PaperID: NEUCOM-S-19-01925.
Email addresses: [email protected] (Shuai Shao), [email protected] (Yan-Jiang Wang), [email protected] (Bao-Di Liu), [email protected] (Weifeng Liu), [email protected] (Rui Xu)

Preprint submitted to Neurocomputing April 18, 2019

arXiv:1903.03087v2 [cs.CV] 17 Apr 2019


Figure 1: The scheme of LEDL is on the right, while that of LC-KSVD is on the left. The difference between the two methods is the sparse regularization term: LEDL uses the `1-norm regularization term, whereas LC-KSVD uses the `0-norm regularization term. Compared with the `0-norm, the sparsity constraint factor of the `1-norm is not fixed, so the basis vectors can be selected freely for linear fitting. Thus, our proposed LEDL method can attain smaller errors than LC-KSVD.

1. Introduction

In recent years, image classification has been a classical issue in computer vision. Many successful algorithms Yu et al. (2012b); Shi et al. (2018); Wright et al. (2009); Yu et al. (2014); Song et al. (2018); Yu et al. (2013); Wang et al. (2018); Yu et al. (2012a); Yang et al. (2009, 2017); Liu et al. (2014a, 2017); Jiang et al. (2013); Hao et al. (2017); Chan et al. (2015); Nakazawa and Kulkarni (2018); Ji et al. (2014); Xu et al. (2019); Yuan et al. (2016) have been proposed to solve this problem. Among these algorithms, one category that contributes a great deal to image classification is the sparse representation based methods.

Sparse representation is capable of expressing the input sample features as a linear combination of atoms in an overcomplete basis set. Wright et al. (2009) proposed the sparse representation based classification (SRC) algorithm,

2

Page 3: Label Embedded Dictionary Learning for Image Classi cation · 2019. 4. 18. · Label Embedded Dictionary Learning for Image Classi cation Shuai Shao a, Yan-Jiang Wang a,, Bao-Di Liu

which uses the `1-norm regularization term and achieves impressive performance. SRC is the most representative of the sparse representation based methods. However, traditional sparse representation based methods directly exploit the training sample features without considering the discriminative information, which is crucial in real applications. That is to say, sparse representation based methods can attain better performance if the discriminative information is properly harnessed.

To handle this problem, the dictionary learning (DL) method is introduced to preprocess the training sample features before classification. DL is a generative model for sparse representation whose concept was first proposed by Mallat and Zhang (1993). A few years later, Olshausen and Field (1996, 1997) applied DL to natural images, and it has since been widely used in many fields such as image denoising Chang et al. (2000); Li et al. (2012, 2018), image super-resolution Yang et al. (2010); Wang et al. (2012b); Gao et al. (2018), and image classification Liu et al. (2017); Jiang et al. (2013); Chang et al. (2016). A well-learned dictionary can yield a significant boost in classification accuracy. Therefore, DL based classification methods have become more and more popular in recent years.

Specifically, two strategies have been proposed to successfully utilise the discriminative information: i) class specific dictionary learning and ii) class shared dictionary learning. The first strategy learns a specific dictionary for each class, as in Wang et al. (2012a); Yang et al. (2014); Liu et al. (2016). The second strategy learns a shared dictionary for all classes. For example, Zhang and Li (2010) proposed the discriminative K-SVD (D-KSVD) algorithm, which directly adds the discriminative information into the objective function. Furthermore, Jiang et al. (2013) proposed the label consistent K-SVD (LC-KSVD) method, which adds a label consistency term to the objective function of D-KSVD. The motivation for adding this term is to encourage training samples from the same class to have similar sparse codes and those from different classes to have dissimilar sparse codes. Thus, the discriminative ability of the learned dictionary is effectively improved. However, the sparse regularization term in LC-KSVD is the `0-norm, which leads to an NP-hard problem Natarajan (1995). Although greedy methods such as orthogonal matching pursuit (OMP) Tropp and Gilbert (2007) can help solve this problem to some extent, they usually find a suboptimal sparse solution instead of the optimal one. More specifically, greedy methods approach the global optimization problem by selecting basis vectors in order of reconstruction error, from small to large, for up to T (the sparsity constraint factor)

3

Page 4: Label Embedded Dictionary Learning for Image Classi cation · 2019. 4. 18. · Label Embedded Dictionary Learning for Image Classi cation Shuai Shao a, Yan-Jiang Wang a,, Bao-Di Liu

times. Thus, the initialized values are crucial. Consequently, an `0-norm based sparse constraint is not conducive to finding the global minimum and thereby obtaining the optimal sparse solution.

In this paper, we propose a novel dictionary learning algorithm named label embedded dictionary learning (LEDL). This method introduces the `1-norm regularization term to replace the `0-norm regularization of LC-KSVD. Thus, we can freely select the basis vectors for linear fitting to obtain the optimal sparse solution. In addition, `1-norm sparse representation is widely used in many fields, so our proposed LEDL method can be extended and applied easily. We show the difference between our proposed LEDL and LC-KSVD in Figure 1. We adopt the alternating direction method of multipliers (ADMM) Boyd et al. (2011) framework and the blockwise coordinate descent (BCD) Liu et al. (2014b) algorithm to optimize LEDL. Our contributions are threefold.

• We propose a novel dictionary learning algorithm named label embedded dictionary learning, which introduces the `1-norm regularization term as the sparse constraint. The `1-norm sparse constraint helps to easily find the optimal sparse solution.

• We propose to utilize the alternating direction method of multipliers (ADMM) Boyd et al. (2011) algorithm and the blockwise coordinate descent (BCD) Liu et al. (2014b) algorithm to optimize the dictionary learning task.

• We verify the superior performance of our method on six benchmark datasets.

The rest of the paper is organized as follows. Section 2 reviews two conventional methods, SRC and LC-KSVD. Section 3.1 presents the LEDL method for image classification. The optimization approach and its convergence are elaborated in Section 3.2. Section 4 shows experimental results on six well-known datasets. Finally, we conclude this paper in Section 5.

2. Related Work

In this section, we overview two related algorithms: sparse representation based classification (SRC) and label consistent K-SVD (LC-KSVD).


Algorithm 1 Sparse representation based classification
Input: X ∈ R^{D×N}, y ∈ R^{D×1}, α
Output: id(y)
1: Code y with the dictionary X via `1-minimization:
2: s = arg min_s {‖y − Xs‖_2^2 + 2α‖s‖_1}
3: for c = 1; c ≤ C; c++ do
4:   Compute the residual e_c(y) = ‖y − X_c s_c‖_2^2
5: end for
6: id(y) = arg min_c {e_c}
7: return id(y)

2.1. Sparse representation based classification (SRC)

SRC was proposed by Wright et al. (2009). Assume that we have C classes of training samples, denoted by X_c, c = 1, 2, ..., C, where X_c is the training sample matrix of class c. Each column of the matrix X_c is a training sample feature from the cth class. The whole training sample matrix can be denoted as X = [X_1, X_2, ..., X_C] ∈ R^{D×N}, where D represents the dimension of the sample features and N is the number of training samples. Supposing that y ∈ R^{D×1} is a testing sample vector, the sparse representation algorithm aims to solve the following objective function:

s = arg min_s {‖y − Xs‖_2^2 + 2α‖s‖_1}    (1)

where α is the regularization parameter controlling the tradeoff between fitting goodness and sparseness. Sparse representation based classification then finds the minimum residual error over all classes:

id(y) = arg min_c ‖y − X_c s_c‖_2^2    (2)

where id(y) represents the predicted label of y, and s_c is the sparse code associated with the cth class. The procedure of SRC is shown in Algorithm 1. Obviously, the residual e_c is associated with only a few images in class c.
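To make the procedure concrete, the following is a minimal Python sketch of Algorithm 1, assuming a plain iterative soft-thresholding (ISTA) solver for the `1-minimization step; the solver choice, the step size, and the helper names (ista, src_predict) are our own illustrative assumptions, not part of the original algorithm.

```python
import numpy as np

def ista(X, y, alpha, n_iter=200):
    """Solve min_s ||y - X s||_2^2 + 2*alpha*||s||_1 by iterative
    soft-thresholding (ISTA); any l1 solver could be substituted."""
    L = np.linalg.norm(X, 2) ** 2          # squared spectral norm of X
    s = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = s - (X.T @ (X @ s - y)) / L    # gradient step on the quadratic term
        s = np.sign(g) * np.maximum(np.abs(g) - alpha / L, 0.0)  # soft threshold
    return s

def src_predict(X, labels, y, alpha=0.01):
    """Classify y by the class with the smallest reconstruction residual."""
    s = ista(X, y, alpha)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c                 # dictionary columns of class c
        residuals[c] = np.linalg.norm(y - X[:, mask] @ s[mask]) ** 2
    return min(residuals, key=residuals.get)
```

For a feature matrix X of shape (D, N) with one column per training sample and a label vector labels of length N, src_predict(X, labels, y) returns the predicted class of a test vector y.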

2.2. Label Consistent K-SVD (LC-KSVD)

Jiang et al. (2013) proposed LC-KSVD to encourage similarity among the representations of samples belonging to the same class in D-KSVD. The authors combined the discriminative sparse codes error with the reconstruction error and the classification error to form a unified objective function, given the discriminative sparse codes matrix Q = [q_1, q_2, ..., q_N] ∈ R^{K×N}, the label matrix H = [h_1, h_2, ..., h_N] ∈ R^{C×N}, and the training sample matrix X.


Algorithm 2 Label Consistent K-SVD
Input: X ∈ R^{D×N}, H ∈ R^{C×N}, Q ∈ R^{K×N}, λ, ω, T, K
Output: B ∈ R^{D×K}, W ∈ R^{C×K}, A ∈ R^{K×K}, S ∈ R^{K×N}
1: Compute B_0 by combining class-specific dictionary items for each class using K-SVD Aharon et al. (2006);
2: Compute S_0 for X and B_0 using sparse coding;
3: Compute A_0 using A = QS^T(SS^T + I)^{-1};
4: Compute W_0 using W = HS^T(SS^T + I)^{-1};
5: Solve Eq. (3); use [B_0; √ω A_0; √λ W_0] to initialize the dictionary;
6: Normalize B, A, W:
   B ← {b_1/‖b_1‖_2, b_2/‖b_2‖_2, ..., b_K/‖b_K‖_2}
   A ← {a_1/‖b_1‖_2, a_2/‖b_2‖_2, ..., a_K/‖b_K‖_2}
   W ← {w_1/‖b_1‖_2, w_2/‖b_2‖_2, ..., w_K/‖b_K‖_2}
7: return B, W, A, S

RK×N , label matrix H = [h1,h2, · · · ,hN ] ∈ RC×N and training sample ma-trix X. The objective function is defined as follows:

<B, W, A, S> = arg min_{B,W,A,S} ‖X − BS‖_F^2 + λ‖H − WS‖_F^2 + ω‖Q − AS‖_F^2
    s.t. ‖s_i‖_0 < T (i = 1, 2, ..., N)
= arg min_{B,W,A,S} ‖[X; √ω Q; √λ H] − [B; √ω A; √λ W] S‖_F^2
    s.t. ‖s_i‖_0 < T (i = 1, 2, ..., N)    (3)

where [·; ·; ·] denotes vertical concatenation and T is the sparsity constraint factor, ensuring that s_i has no more than T nonzero entries. The dictionary B = [b_1, b_2, ..., b_K] ∈ R^{D×K}, where K (K > D) is the number of atoms in the dictionary, and S = [s_1, s_2, ..., s_N] ∈ R^{K×N} is the sparse code matrix of the training sample matrix X. W = [w_1, w_2, ..., w_K] ∈ R^{C×K} is a classifier learned from the given label matrix H; we expect W to return the most probable class a sample belongs to. A = [a_1, a_2, ..., a_K] ∈ R^{K×K} is a linear transformation matrix that depends on Q. λ and ω are the regularization parameters balancing the contributions of the discriminative sparse codes error and the classification error to the overall objective, respectively. The algorithm is shown in Algorithm 2. Here, we denote m (m = 0, 1, 2, ...) as the iteration number, and (•)_m means the value of matrix (•) after the mth

iteration.


While the LC-KSVD algorithm exploits the `0-norm regularization term to control sparseness, it is difficult to find the optimal sparse solution for general image recognition. The reason is that LC-KSVD uses the OMP method to optimize the objective function, which usually obtains a suboptimal sparse solution unless the initialization happens to be perfect.

3. Methodology

In this section, we first present our proposed label embedded dictionary learning algorithm. Then we elaborate the optimization of the objective function.

3.1. Proposed Label Embedded Dictionary Learning (LEDL)

Motivated by the fact that the optimal sparse solution cannot easily be found with the `0-norm regularization term, we propose a novel dictionary learning method named label embedded dictionary learning (LEDL) for image classification. This method introduces the `1-norm regularization term to replace the `0-norm regularization of LC-KSVD. Thus, we can freely select the basis vectors for linear fitting to obtain the optimal sparse solution. The objective function is as follows:

<B, W, A, S> = arg min_{B,W,A,S} ‖X − BS‖_F^2 + λ‖H − WS‖_F^2 + ω‖Q − AS‖_F^2 + 2ε‖S‖_1
    s.t. ‖B_{•k}‖_2^2 ≤ 1, ‖W_{•k}‖_2^2 ≤ 1, ‖A_{•k}‖_2^2 ≤ 1 (k = 1, 2, ..., K)    (4)

where (•)_{•k} denotes the kth column vector of matrix (•). The `1-norm regularization term is utilized to enforce sparsity, and ε is the regularization parameter, which plays the same role as α in Equation (1).

3.2. Optimization of Objective Function

The optimization problem (4) is not jointly convex in S, B, W, and A, but it is convex in S (with B, W, A fixed), in B (with S, W, A fixed), in W (with S, B, A fixed), and in A (with S, B, W fixed) separately. The optimization problem can therefore be divided into four subproblems: finding the sparse codes (S) and learning the bases (B, W, A). Here, we employ the alternating direction


Figure 2: The complete process of LEDL algorithm

method of multipliers (ADMM) Boyd et al. (2011) framework to solve the first subproblem and the blockwise coordinate descent (BCD) Liu et al. (2014b) algorithm for the remaining subproblems. The complete process of LEDL is shown in Figure 2.

3.2.1. ADMM for finding sparse codes

With B, W, and A fixed, we introduce an auxiliary variable Z and reformulate the LEDL problem as a linear equality-constrained problem in which each subproblem has a closed-form solution. The objective function is as follows:

<B, W, A, C, Z> = arg min_{B,W,A,C,Z} ‖X − BC‖_F^2 + λ‖H − WC‖_F^2 + ω‖Q − AC‖_F^2 + 2ε‖Z‖_1
    s.t. C = Z, ‖B_{•k}‖_2^2 ≤ 1, ‖W_{•k}‖_2^2 ≤ 1, ‖A_{•k}‖_2^2 ≤ 1 (k = 1, 2, ..., K)    (5)

Under the ADMM framework with B, W, and A fixed, the augmented Lagrangian function of problem (5) is written as:

<C, Z, L> = arg min_{C,Z,L} ‖X − BC‖_F^2 + λ‖H − WC‖_F^2 + ω‖Q − AC‖_F^2 + 2ε‖Z‖_1 + 2L^T(C − Z) + ρ‖C − Z‖_F^2    (6)

where L = [l_1, l_2, ..., l_N] ∈ R^{K×N} is the augmented Lagrangian multiplier and ρ > 0 is the penalty parameter. After fixing B, W, and A, we initialize C_0, Z_0, and L_0 to zero matrices. Equation (6) is then solved as follows:


(1) Updating C while fixing Z, L, B, W, and A:

C_{m+1} = arg min_C f(B_m, W_m, A_m, C, Z_m, L_m)    (7)

where f denotes the augmented Lagrangian of Equation (6). The closed-form solution of C is

C_{m+1} = (B_m^T B_m + λW_m^T W_m + ωA_m^T A_m + ρI)^{-1} (B_m^T X + λW_m^T H + ωA_m^T Q + ρZ_m − L_m)    (8)

(2) Updating Z while fixing C, L, B, W, and A:

Z_{m+1} = arg min_Z f(B_m, W_m, A_m, C_{m+1}, Z, L_m)    (9)

The closed-form solution of Z is

Z_{m+1} = max{C_{m+1} + L_m/ρ − (ε/ρ)I, 0} + min{C_{m+1} + L_m/ρ + (ε/ρ)I, 0}    (10)

where I is the identity matrix and 0 is the zero matrix; in effect, Equation (10) applies element-wise soft-thresholding to C_{m+1} + L_m/ρ with threshold ε/ρ.

(3) Updating the Lagrangian multiplier L:

L_{m+1} = L_m + ρ(C_{m+1} − Z_{m+1})    (11)

where the ρ in Equation (11) plays the role of a step size in a gradient ascent update and has no relationship with the penalty parameter ρ in Equation (6). To avoid confusion and to make better use of the ADMM framework, we rewrite the ρ in Equation (11) as θ:

L_{m+1} = L_m + θ(C_{m+1} − Z_{m+1})    (12)
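For concreteness, the following minimal numpy sketch performs one pass of the updates in Equations (8), (10), and (12) with B, W, and A held fixed; the function name and the element-wise reading of the soft-thresholding step in Equation (10) are our assumptions.

```python
import numpy as np

def admm_step(X, H, Q, B, W, A, Z, L, lam, omega, eps, rho, theta):
    """One ADMM pass of the sparse-coding subproblem: Eq. (8) for C,
    Eq. (10) for Z, Eq. (12) for L (B, W, A held fixed)."""
    K = B.shape[1]
    # Eq. (8): ridge-type closed-form update of C.
    G = B.T @ B + lam * (W.T @ W) + omega * (A.T @ A) + rho * np.eye(K)
    R = B.T @ X + lam * (W.T @ H) + omega * (A.T @ Q) + rho * Z - L
    C = np.linalg.solve(G, R)
    # Eq. (10): element-wise soft-thresholding of C + L/rho at level eps/rho.
    V = C + L / rho
    Z = np.maximum(V - eps / rho, 0.0) + np.minimum(V + eps / rho, 0.0)
    # Eq. (12): dual ascent on the multiplier with step size theta.
    L = L + theta * (C - Z)
    return C, Z, L
```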

3.2.2. BCD for learning bases

Without considering the sparseness regularization term in Equation (5), the constrained minimization problem (4) with respect to a single column has a closed-form solution, which can be obtained by the BCD method. The objective function can be rewritten as follows:

<B, W, A> = arg min_{B,W,A} ‖X − BC‖_F^2 + λ‖H − WC‖_F^2 + ω‖Q − AC‖_F^2 + 2ε‖Z‖_1 + 2L^T(C − Z) + ρ‖C − Z‖_F^2
    s.t. ‖B_{•k}‖_2^2 ≤ 1, ‖W_{•k}‖_2^2 ≤ 1, ‖A_{•k}‖_2^2 ≤ 1 (k = 1, 2, ..., K)    (13)


We initialize B_0, W_0, and A_0 to random matrices and normalize them column-wise. After that, we use the BCD method to update B, W, and A.

(1) Updating B while fixing C, L, Z, W, and A:

B_{m+1} = arg min_B f(B, W_m, A_m, C_{m+1}, Z_{m+1}, L_{m+1})    (14)

The closed-form solution for a single column of B is

(B_{•k})_{m+1} = (X[(C_{k•})_{m+1}]^T − (B_k)_m C_{m+1}[(C_{k•})_{m+1}]^T) / ‖X[(C_{k•})_{m+1}]^T − (B_k)_m C_{m+1}[(C_{k•})_{m+1}]^T‖_2    (15)

where B_k is the matrix B with its kth column set to zero, i.e., (B_k)_{•p} = B_{•p} for p ≠ k and (B_k)_{•k} = 0, and (•)_{k•} denotes the kth row vector of matrix (•).

(2) Updating W while fixing C, L, Z, B, and A:

W_{m+1} = arg min_W f(B_{m+1}, W, A_m, C_{m+1}, Z_{m+1}, L_{m+1})    (16)

The closed-form solution for a single column of W is

(W_{•k})_{m+1} = (H[(C_{k•})_{m+1}]^T − (W_k)_m C_{m+1}[(C_{k•})_{m+1}]^T) / ‖H[(C_{k•})_{m+1}]^T − (W_k)_m C_{m+1}[(C_{k•})_{m+1}]^T‖_2    (17)

where W_k is defined analogously to B_k.

(3) Updating A while fixing C, L, Z, B, and W:

A_{m+1} = arg min_A f(B_{m+1}, W_{m+1}, A, C_{m+1}, Z_{m+1}, L_{m+1})    (18)

The closed-form solution for a single column of A is

(A_{•k})_{m+1} = (Q[(C_{k•})_{m+1}]^T − (A_k)_m C_{m+1}[(C_{k•})_{m+1}]^T) / ‖Q[(C_{k•})_{m+1}]^T − (A_k)_m C_{m+1}[(C_{k•})_{m+1}]^T‖_2    (19)

where A_k is defined analogously to B_k.
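The column-wise updates in Equations (15), (17), and (19) share one pattern: refit atom k to the residual that excludes its own contribution, then renormalize to unit `2 norm. A minimal numpy sketch of this pattern for B follows (replace X with H or Q to obtain the W and A updates); the function name, the in-place sequential sweep, and the small constant guarding against a zero norm are our assumptions.

```python
import numpy as np

def bcd_update_columns(X, B, C):
    """One BCD sweep over the columns of B (Eq. (15)). X: (D, N) data,
    B: (D, K) dictionary, C: (K, N) sparse codes."""
    for k in range(B.shape[1]):
        Bk = B.copy()
        Bk[:, k] = 0.0                              # B with the kth atom zeroed
        r = X @ C[k, :] - Bk @ (C @ C[k, :])        # numerator of Eq. (15)
        B[:, k] = r / (np.linalg.norm(r) + 1e-12)   # renormalize to unit l2 norm
    return B
```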


Figure 3: Convergence curve of LEDL Algorithm on four datasets.

3.2.3. Convergence Analysis

Assume that the value of the objective function after the mth iteration is denoted by f(C_m, Z_m, L_m, B_m, W_m, A_m). Since each subproblem is solved exactly by the ADMM and BCD steps, every update monotonically decreases the objective function. Considering that the objective function is bounded below and satisfies Equation (20), it converges; empirically, convergence is reached after about 100 iterations. Figure 3 shows the convergence curves of the proposed LEDL algorithm on four well-known datasets. The results demonstrate that our proposed LEDL algorithm converges quickly with low complexity.

f(C_m, Z_m, L_m, B_m, W_m, A_m) ≥ f(C_{m+1}, Z_{m+1}, L_{m+1}, B_m, W_m, A_m) ≥ f(C_{m+1}, Z_{m+1}, L_{m+1}, B_{m+1}, W_{m+1}, A_{m+1})    (20)
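In practice, the monotone decrease in Equation (20) can be checked by evaluating the augmented Lagrangian of Equation (6) after each iteration; the helper below is a minimal sketch of that check, with the function name our own.

```python
import numpy as np

def ledl_objective(X, H, Q, B, W, A, C, Z, L, lam, omega, eps, rho):
    """Value of the augmented Lagrangian in Eq. (6); it should be
    non-increasing across iterations (cf. Eq. (20))."""
    fro2 = lambda M: np.linalg.norm(M, 'fro') ** 2
    return (fro2(X - B @ C) + lam * fro2(H - W @ C) + omega * fro2(Q - A @ C)
            + 2.0 * eps * np.abs(Z).sum()
            + 2.0 * np.sum(L * (C - Z)) + rho * fro2(C - Z))
```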


3.2.4. Overall Algorithm

The overall updating procedure of the proposed LEDL algorithm is summarized in Algorithm 3. Here, maxiter is the maximum number of iterations, 1 ∈ R^{K×K} is a square matrix with all elements equal to 1, and ⊙ denotes the element-wise (Hadamard) product. By iterating C, Z, L, B, W, and A alternately, the sparse codes are obtained and the corresponding bases are learned.

Algorithm 3 Label Embedded Dictionary Learning
Input: X ∈ R^{D×N}, H ∈ R^{C×N}, Q ∈ R^{K×N}, λ, ω, ε, ρ, θ, K
Output: B ∈ R^{D×K}, W ∈ R^{C×K}, A ∈ R^{K×K}, C ∈ R^{K×N}
1: C_0 ← zeros(K, N), Z_0 ← zeros(K, N), L_0 ← zeros(K, N)
2: B_0 ← rand(D, K), W_0 ← rand(C, K), A_0 ← rand(K, K)
3: B_{•k} ← B_{•k}/‖B_{•k}‖_2, W_{•k} ← W_{•k}/‖W_{•k}‖_2, A_{•k} ← A_{•k}/‖A_{•k}‖_2 (k = 1, 2, ..., K)
4: m = 0
5: while m ≤ maxiter do
6:   m ← m + 1
7:   Update C:
8:   C_{m+1} = (B_m^T B_m + λW_m^T W_m + ωA_m^T A_m + ρI)^{-1}(B_m^T X + λW_m^T H + ωA_m^T Q + ρZ_m − L_m)
9:   Update Z:
10:  Z_{m+1} = max{C_{m+1} + L_m/ρ − (ε/ρ)I, 0} + min{C_{m+1} + L_m/ρ + (ε/ρ)I, 0}
11:  Update L:
12:  L_{m+1} = L_m + θ(C_{m+1} − Z_{m+1})
13:  Update B, W, A:
14:  Compute D_{m+1} = (C_{m+1} C_{m+1}^T) ⊙ (1 − I)
15:  for k = 1; k ≤ K; k++ do
16:    (B_{•k})_{m+1} = (X[(C_{k•})_{m+1}]^T − B_m(D_{•k})_{m+1}) / ‖X[(C_{k•})_{m+1}]^T − B_m(D_{•k})_{m+1}‖_2
17:    (W_{•k})_{m+1} = (H[(C_{k•})_{m+1}]^T − W_m(D_{•k})_{m+1}) / ‖H[(C_{k•})_{m+1}]^T − W_m(D_{•k})_{m+1}‖_2
18:    (A_{•k})_{m+1} = (Q[(C_{k•})_{m+1}]^T − A_m(D_{•k})_{m+1}) / ‖Q[(C_{k•})_{m+1}]^T − A_m(D_{•k})_{m+1}‖_2
19:  end for
20:  Update the objective function:
21:  f = ‖X − BC‖_F^2 + λ‖H − WC‖_F^2 + ω‖Q − AC‖_F^2 + 2ε‖Z‖_1 + 2L^T(C − Z) + ρ‖C − Z‖_F^2
22: end while
23: return B, W, A, C

In the testing stage, the constraint term is the `1-norm sparse constraint. Here, we exploit the learned dictionary B to fit a testing sample y and obtain its sparse code s. Then, we use the trained classifier W to predict the label of y by computing max{Ws}.
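A minimal sketch of this testing stage follows, reusing the illustrative ista solver from the Section 2.1 sketch as the `1 coder (any `1 solver could be substituted).

```python
import numpy as np

def ledl_predict(B, W, y, eps=0.01):
    """Code the test sample y over the learned dictionary B, then pick
    the class with the largest classifier response in Ws."""
    s = ista(B, y, eps)           # ista as defined in the Sec. 2.1 sketch
    return int(np.argmax(W @ s))
```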


Table 1: Classification rates (%) on different datasets

Datasets \ Methods    SRC     CRC     CSDL-SRC    LC-KSVD    LEDL
Extended YaleB        79.1    79.2    80.2        73.5       81.3
CMU PIE               73.7    73.3    77.4        67.1       77.7
UC-Merced             80.4    80.7    80.5        79.4       80.7
AID                   71.6    72.6    71.6        70.2       72.9
Caltech101            89.4    89.4    89.4        88.3       90.1
USPS                  78.4    77.9    78.8        71.1       81.1

4. Experimental results

In this section, we utilize several datasets (Extended YaleB Georghiades et al. (2001), CMU PIE Sim et al. (2002), UC Merced Land Use Yang and Newsam (2010), AID Xia et al. (2017), Caltech101 Fei-Fei et al. (2007), and USPS Hull (1994)) to evaluate the performance of our algorithm and compare it with state-of-the-art methods such as SRC Wright et al. (2009), LC-KSVD Jiang et al. (2013), CRC Zhang et al. (2011), and CSDL-SRC Liu et al. (2016). In the following subsections, we first give the experimental settings, then analyze the experiments on these six datasets, and finally present some discussion.

4.1. Experimental settings

For all the datasets, in order to eliminate randomness, we carry out every experiment 8 times and report the mean classification rate. We randomly select 5 samples per class for training in all the experiments. For the Extended YaleB and CMU PIE datasets, each image is cropped to 32 × 32, pulled into a column vector, and `2-normalized to form the raw `2-normalized features. For the UC Merced Land Use and AID datasets, we use the ResNet model He et al. (2016) to extract features; specifically, the layer pool5 is utilized to extract 2048-dimensional vectors. For the Caltech101 dataset, we use the layer pool5 of the ResNet model and spatial pyramid matching (SPM) with two layers (the second layer includes five parts: left upper, right upper, left lower, right lower, and center) to extract 12288-dimensional vectors. Finally, each image in the USPS dataset is resized to 16 × 16 and vectorized.
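As an illustration of the raw-feature preparation used for the two face datasets, the sketch below (function name ours) vectorizes a cropped 32 × 32 image and `2-normalizes it.

```python
import numpy as np

def raw_l2_feature(img):
    """Pull a cropped 32x32 grayscale image into a column vector and
    l2-normalize it, as described for Extended YaleB and CMU PIE."""
    v = np.asarray(img, dtype=np.float64).reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)
```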

For convenience, the dictionary size K is fixed to twice the number of training samples. In addition, we set ρ = 1 and the initial θ = 0.5, and then


decrease θ in each iteration. Moreover, three other parameters (λ, ω, and ε) need to be adjusted to achieve the highest classification rates. The details are given in the following subsections.

4.2. Extended YaleB Dataset

The Extended YaleB dataset contains 2,432 face images of 38 individuals, each having 64 frontal images taken under varying illumination conditions. Figure 4 shows some images from the dataset.

Figure 4: Examples of the Extended YaleB dataset

In this experiment, we set λ = 2^{-3}, ω = 2^{-11}, ε = 2^{-8}. The experimental results are summarized in Table 1. We can see that our proposed LEDL algorithm achieves superior performance to the other classical classification methods, with an improvement of at least 1.1%. Compared with the `0-norm sparsity constraint based dictionary learning algorithm LC-KSVD, our `1-norm sparsity constraint based LEDL algorithm exceeds it by 7.8%. The reason for the large gap between LC-KSVD and LEDL is that the `0-norm sparsity constraint leads to an NP-hard problem, which is not conducive to finding the optimal sparse solution for the dictionary. To further illustrate the performance of our method, we choose the samples of the first 20 classes as a sub-dataset and show the confusion matrices in Figure 5. As can be seen, our method achieves higher classification rates than LC-KSVD in all of the chosen 20 classes. In particular, on class 1, class 2, class 3, class 10, and class 16, LEDL achieves at least a 10.0% performance gain over LC-KSVD.


Figure 5: Confusion matrices on Extended YaleB dataset

4.3. CMU PIE Dataset

The CMU PIE dataset consists of 41,368 images of 68 individuals under 43 different illumination conditions; each person appears in 13 different poses and with 4 different expressions. In Figure 6, we list several samples from this dataset.

Figure 6: Examples of the CMU PIE dataset

The comparison results are shown in Table 1; our proposed LEDL algorithm outperforms the other well-known methods by an improvement of at least 0.5%. Notably, LEDL exceeds LC-KSVD by 10.6% on this dataset. The optimal parameters are λ = 2^{-3}, ω = 2^{-11}, ε = 2^{-8}.


4.4. UC Merced Land Use Dataset

The UC Merced Land Use dataset is widely used for aerial image classification. It consists of 2,100 land-use images in 21 classes. Some samples are shown in Figure 7.

Figure 7: Examples of the UC Merced dataset

In Table 1, we can see that our proposed LEDL algorithm is on par with CRC and still outperforms the other methods. Compared with LC-KSVD, LEDL achieves a higher accuracy by an improvement of 1.3%. Here, we set λ = 2^0, ω = 2^{-9}, ε = 2^{-6} to obtain the optimal result. The confusion matrices of the UC Merced Land Use dataset for all classes are shown in Figure 8. We can see that, in all classes except tennis, LEDL achieves comparable or better results than LC-KSVD. In several classes such as building, freeway, river, and sparse, our method outperforms LC-KSVD by at least 0.5%.

4.5. AID Dataset

The AID dataset is a new large-scale aerial image dataset collected from Google Earth imagery. It contains 10,000 images of 30 aerial scene types. In Figure 9, we show several images from this dataset.

Table 1 illustrates the effectiveness of LEDL for classifying images. We set λ = 2^{-6}, ω = 2^{-14}, ε = 2^{-12} to achieve the highest accuracy among the five algorithms, an improvement of at least 0.3%; compared with LC-KSVD, LEDL achieves an improvement of 2.7%.


Figure 8: Confusion matrices on the UC Merced dataset

Figure 9: Examples of the AID dataset

4.6. Caltech101 Dataset

The Caltech101 dataset includes 9,144 images in 102 classes in total, consisting of cars, faces, flowers, and so on. Each category has about 40 to 800 images, and most categories have about 50 images. In Figure 10, we show several images from this dataset.

As can be seen in Table 1, with λ = 2^{-4}, ω = 2^{-13}, ε = 2^{-14}, our proposed LEDL algorithm outperforms all the competing approaches, achieving improvements of 1.8% and 0.7% over LC-KSVD and the other methods,


Figure 10: Examples of the Caltech101 dataset

respectively. Here, we also choose the first 20 classes to build the confusion matrices, which are shown in Figure 11.

Figure 11: Confusion matrices on Caltech101 dataset

4.7. USPS Dataset

The USPS dataset contains 9,298 handwritten digit images (0 to 9) collected from the U.S. Postal Service. We list several samples from this dataset in Figure 12.


Figure 12: Examples of the USPS dataset

Table 1 shows the comparison results of the five algorithms; it is easy to see that our proposed LEDL algorithm outperforms the other well-known methods by an improvement of at least 2.3%, and it achieves an improvement of 10.0% over the LC-KSVD method. The optimal parameters are λ = 2^{-4}, ω = 2^{-8}, ε = 2^{-5}.

4.8. Discussion

From the experimental results on the six datasets, we can draw the following conclusions.

(1) All the above experimental results illustrate that our proposed LEDL algorithm is an effective and general classifier, achieving superior performance to state-of-the-art methods on various datasets, especially on the Extended YaleB, CMU PIE, and USPS datasets.

(2) Our proposed LEDL method introduces the `1-norm regularization term to replace the `0-norm regularization of LC-KSVD. As a result, LEDL outperforms LC-KSVD on all six datasets. Moreover, on the two face datasets and the USPS dataset, our method exceeds LC-KSVD by nearly 10.0%.

(3) The confusion matrices of LEDL and LC-KSVD on three datasets are shown in Figures 5, 8, and 11. They clearly illustrate the superiority of our method. Specifically, on the Extended YaleB dataset, our method achieves outstanding performance in five classes (class 1, class 2, class 3, class 10, and class 16). On the UC Merced dataset, LEDL achieves better classification rates than LC-KSVD in almost all classes, the tennis class being the exception. On the Caltech101 dataset, our proposed LEDL method performs much better than the LC-KSVD method in


several classes such as beaver, binocular, brontosaurus, cannon, and ceiling fan.

5. Conclusion

In this paper, we propose a Label Embedded Dictionary Learning (LEDL) algorithm. Specifically, we introduce the `1-norm regularization term to replace the `0-norm regularization term of LC-KSVD, which helps to avoid the NP-hard problem and to find the optimal solution easily. Furthermore, we adopt the ADMM algorithm to solve the `1-norm optimization problem and the BCD algorithm to update the dictionary. Extensive experiments on six well-known benchmark datasets demonstrate the superiority of our proposed LEDL algorithm.

6. Acknowledgment

This research was funded by the National Natural Science Foundation of China (Grant Nos. 61402535 and 61671480), the Natural Science Foundation for Youths of Shandong Province, China (Grant No. ZR2014FQ001), the Natural Science Foundation of Shandong Province, China (Grant No. ZR2018MF017), the Qingdao Science and Technology Project (No. 17-1-1-8-jch), the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (Grant Nos. 16CX02060A and 17CX02027A), and the Innovation Project for Graduate Students of China University of Petroleum (East China) (No. YCX2018063).

References

Aharon, M., Elad, M., Bruckstein, A., et al., 2006. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54 (11), 4311–4322.

Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3 (1), 1–122.

Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y., 2015. PCANet: A simple deep learning baseline for image classification? IEEE Transactions on Image Processing 24 (12), 5017–5032.


Chang, H., Yang, M., Yang, J., 2016. Learning a structure adaptive dictionary for sparse representation based classification. Neurocomputing 190 (19), 124–131.

Chang, S. G., Yu, B., Vetterli, M., 2000. Adaptive wavelet thresholding for image denoising and compression. IEEE Transactions on Image Processing 9 (9), 1532–1546.

Fei-Fei, L., Fergus, R., Perona, P., 2007. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding 106 (1), 59–70.

Gao, D., Hu, Z., Ye, R., 2018. Self-dictionary regression for hyperspectral image super-resolution. Remote Sensing 10 (10), 1574–1596.

Georghiades, A. S., Belhumeur, P. N., Kriegman, D. J., 2001. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (6), 643–660.

Hao, S., Wang, W., Yan, Y., Bruzzone, L., 2017. Class-wise dictionary learning for hyperspectral image classification. Neurocomputing 220 (12), 121–129.

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on. IEEE, pp. 770–778.

Hull, J. J., 1994. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (5), 550–554.

Ji, R., Gao, Y., Hong, R., Liu, Q., Tao, D., Li, X., 2014. Spectral-spatial constraint hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing 52 (3), 1811–1824.

Jiang, Z., Lin, Z., Davis, L. S., 2013. Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (11), 2651–2664.


Li, H., He, X., Tao, D., Tang, Y., Wang, R., 2018. Joint medical image fusion, denoising and enhancement via discriminative low-rank sparse dictionaries learning. Pattern Recognition 79, 130–146.

Li, S., Fang, L., Yin, H., 2012. An efficient dictionary learning algorithm and its application to 3-D medical image denoising. IEEE Transactions on Biomedical Engineering 59 (2), 417–427.

Liu, B.-D., Gui, L., Wang, Y., Wang, Y.-X., Shen, B., Li, X., Wang, Y.-J., 2017. Class specific centralized dictionary learning for face recognition. Multimedia Tools and Applications 76 (3), 4159–4177.

Liu, B.-D., Shen, B., Gui, L., Wang, Y.-X., Li, X., Yan, F., Wang, Y.-J., 2016. Face recognition using class specific dictionary learning for sparse representation and collaborative representation. Neurocomputing 204 (5), 198–210.

Liu, B.-D., Shen, B., Wang, Y.-X., 2014a. Class specific dictionary learning for face recognition. In: Security, Pattern Analysis, and Cybernetics (SPAC), 2014 IEEE International Conference on. IEEE, pp. 229–234.

Liu, B.-D., Wang, Y.-X., Shen, B., Zhang, Y.-J., Wang, Y.-J., 2014b. Blockwise coordinate descent schemes for sparse representation. In: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, pp. 5267–5271.

Mallat, S. G., Zhang, Z., 1993. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing 41 (12), 3397–3415.

Nakazawa, T., Kulkarni, D. V., 2018. Wafer map defect pattern classification and image retrieval using convolutional neural network. IEEE Transactions on Semiconductor Manufacturing 31 (2), 309–314.

Natarajan, B. K., 1995. Sparse approximate solutions to linear systems. SIAM Journal on Computing 24 (2), 227–234.

Olshausen, B. A., Field, D. J., 1996. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381 (6583), 607–609.


Olshausen, B. A., Field, D. J., 1997. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37 (23), 3311–3325.

Shi, H., Zhang, Y., Zhang, Z., Ma, N., Zhao, X., Gao, Y., Sun, J., 2018. Hypergraph-induced convolutional networks for visual classification. IEEE Transactions on Neural Networks and Learning Systems.

Sim, T., Baker, S., Bsat, M., 2002. The CMU pose, illumination, and expression (PIE) database. In: Automatic Face and Gesture Recognition (FG), 2002 IEEE International Conference on. IEEE, pp. 53–58.

Song, Y., Liu, Y., Gao, Q., Gao, X., Nie, F., Cui, R., 2018. Euler label consistent K-SVD for image classification and action recognition. Neurocomputing 310 (8), 277–286.

Tropp, J. A., Gilbert, A. C., 2007. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory 53 (12), 4655–4666.

Wang, H., Yuan, C., Hu, W., Sun, C., 2012a. Supervised class-specific dictionary learning for sparse modeling in action recognition. Pattern Recognition 45 (11), 3902–3911.

Wang, N., Zhao, X., Jiang, Y., Gao, Y., BNRist, K., 2018. Iterative metric learning for imbalance data classification. In: International Joint Conference on Artificial Intelligence (IJCAI), 2018 Morgan Kaufmann Conference on. Morgan Kaufmann, pp. 2805–2811.

Wang, S., Zhang, L., Liang, Y., Pan, Q., 2012b. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, pp. 2216–2223.

Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., Ma, Y., 2009. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2), 210–227.

Xia, G.-S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., Lu, X., 2017. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing 55 (7), 3965–3981.

Xu, J., An, W., Zhang, L., Zhang, D., 2019. Sparse, collaborative, or nonnegative representation: Which helps pattern classification? Pattern Recognition 88, 679–688.

Yang, J., Wright, J., Huang, T. S., Ma, Y., 2010. Image super-resolution via sparse representation. IEEE Transactions on Image Processing 19 (11), 2861–2873.

Yang, J., Yu, K., Gong, Y., Huang, T., 2009. Linear spatial pyramid matching using sparse coding for image classification. In: Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on. IEEE, pp. 1794–1801.

Yang, M., Chang, H., Luo, W., 2017. Discriminative analysis-synthesis dictionary learning for image classification. Neurocomputing 219 (5), 404–411.

Yang, M., Zhang, L., Feng, X., Zhang, D., 2014. Sparse representation based Fisher discrimination dictionary learning for image classification. International Journal of Computer Vision 109 (3), 209–232.

Yang, Y., Newsam, S., 2010. Bag-of-visual-words and spatial extensions for land-use classification. In: Advances in Geographic Information Systems (GIS), 2010 ACM International Conference on. ACM, pp. 270–279.

Yu, J., Feng, L., Seah, H. S., Li, C., Lin, Z., 2012a. Image classification by multimodal subspace learning. Pattern Recognition Letters 33 (9), 1196–1204.

Yu, J., Rui, Y., Tang, Y. Y., Tao, D., 2014. High-order distance-based multiview stochastic learning in image classification. IEEE Transactions on Cybernetics 44 (12), 2431–2442.

Yu, J., Tao, D., Rui, Y., Cheng, J., 2013. Pairwise constraints based multiview features fusion for scene classification. Pattern Recognition 46 (2), 483–496.


Yu, J., Tao, D., Wang, M., 2012b. Adaptive hypergraph learning and its application in image classification. IEEE Transactions on Image Processing 21 (7), 3262–3272.

Yuan, L., Liu, W., Li, Y., 2016. Non-negative dictionary based sparse representation classification for ear recognition with occlusion. Neurocomputing 171 (1), 540–550.

Zhang, L., Yang, M., Feng, X., 2011. Sparse representation or collaborative representation: Which helps face recognition? In: Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, pp. 471–478.

Zhang, Q., Li, B., 2010. Discriminative K-SVD for dictionary learning in face recognition. In: Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, pp. 2691–2698.
