1
Super-resolution Person Re-identification with Semi-coupled Low-rank Discriminant Dictionary Learning Xiao-Yuan Jing 1,3 , Xiaoke Zhu 1 , Fei Wu 1,3 , Xinge You 2 , Qinglong Liu 1 , Dong Yue 3 , Ruimin Hu 4 , Baowen Xu 1 1 State Key Laboratory of Software Engineering, School of Computer, Wuhan University, China 2 School of Electronic Information and Communications, Huazhong University of Science and Technology, China 3 College of Automation, Nanjing University of Posts and Telecommunications, China 4 National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, China Person re-identification [2] has been widely studied due to its importance in surveillance and forensics applications. In practice, gallery images are high-resolution (HR) while probe images are usually low-resolution (LR) in the identification scenarios with large variation of illumination, weather or quality of cameras. Person re-identification in this kind of scenarios, which we call super-resolution (SR) person re-identification, has not been well studied. Motivated by dictionary learning based SR restoration works [5], we propose a semi-coupled low-rank discriminant dictionary learning (SLD 2 L) approach for SR person re-identification in this paper. Specifically, assume that C A is a HR pedestrian image set from camera A and C B is a LR pedes- trian image set from camera B, we aim to learn a pair of HR and LR dictio- naries and a mapping function between features of HR and LR images, such that the features of LR images in C B can be converted into discriminating HR features. To this end, we firstly generate the LR version of C A by performing down-sampling and smoothing operations, which has the same resolution as C B and is denoted by C A . Then we exploit semi-coupled dictionary learning (DL) to learn a pair of HR and LR dictionaries and a mapping matrix be- tween the corresponding features of C A and C A . To ensure that the learned dictionaries and mapping matrix have favorable discriminative capability, we require that HR features of images in C B , which are reconstructed using the learned dictionary pair and mapping matrix, should be close to the fea- tures of images from the same person in C A , but far away from the features of images from different persons in C A . In practice, low resolution has different influences on different patches, e.g., patches with pure color suffer little influence, while patches with com- plex texture suffer more influence. Therefore, learning a common mapping function is not enough to catch all the relationships. Intuitively, we can di- vide images into patches and group patches into several clusters, and then a pair of HR and LR sub-dictionaries and a more stable mapping function can be learned for each cluster. In this paper, we group patches in C A and C B us- ing K-means algorithm according to the similarity of patch features. Then, the patches in C A are grouped according to clustering results of the corre- sponding patches in C A . We require that each cluster-specific sub-dictionary has good representation ability to the patches from the associated cluster but poor representation ability for other clusters. Denote by D i H and D i L the HR and LR sub-dictionaries of the i th cluster, respectively. And V i denotes the mapping of the i th cluster. By separately combining HR and LR sub- dictionaries, we can obtain the structured HR and LR dictionaries, namely D H =[D 1 H , D 2 H , ..., D c H ] and D L =[D 1 L , D 2 L , ..., D c L ], where c is the number of clusters. To ensure that the learned sub-dictionary pairs can well characterize the intrinsic feature spaces of HR and LR images, the noises should be separat- ed from patches in the learning process. Considering that patches from the same cluster are linearly correlated, we can employ low-rank matrix recov- ery technique to separate noises from patches [3, 4]. Figure 1 illustrates the overall flow of SLD 2 L. With the learned dictionaries and mapping matrices, features of LR probe images can be converted into discriminative HR features. Then SR person re-identification can be performed using the features of HR gallery images and the converted HR probe features. Figure 2 reports the matching results of all compared methods on the VIPeR dataset [1] at sampling rate of 1 8. The matching rates of all com- peting methods are significantly lower than those provided in the original pa- This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage. Figure 1: The flowchart of SLD 2 L. pers. The reason is that low resolution results in the loss of useful informa- tion and these methods cannot work well in this scenario. The experimental results of SLD 2 L always outperform these related methods. Experimental results on the i-LIDS and PRID datasets also demonstrate the effectiveness of the proposed approach for SR person re-identification problem. 0 10 20 30 40 50 20 40 60 80 100 Rank Matching Rate (%) VIPeR with sampling rate=1/8 RDC SSCDL RPLM KISSME SLD 2 L Figure 2: Results on the VIPeR dataset. [1] Douglas Gray, Shane Brennan, and Hai Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In Performance Evaluation of Tracking and Surveillance, IEEE Workshop on, 2007. [2] Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, and Jiajun Bu. Semi-supervised coupled dictionary learning for person re- identification. In CVPR, IEEE Conference on, pages 3550–3557, 2014. [3] Long Ma, Chunheng Wang, Baihua Xiao, and Wen Zhou. Sparse repre- sentation for face recognition based on discriminative low-rank dictio- nary learning. In CVPR, IEEE Conference on, pages 2586–2593, 2012. [4] Houwen Peng, Bing Li, Rongrong Ji, Weiming Hu, Weihua Xiong, and Congyan Lang. Salient object detection via low-rank and structured sparse matrix decomposition. In AAAI, pages 796–802, 2013. [5] Shenlong Wang, Lei Zhang, Yan Liang, and Quan Pan. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In CVPR, IEEE Conference on, pages 2216– 2223, 2012.

Super-resolution Person Re-identification with Semi-coupled Low … · 2015. 5. 24. · Super-resolution Person Re-identification with Semi-coupled Low-rank Discriminant Dictionary

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Super-resolution Person Re-identification with Semi-coupled Low … · 2015. 5. 24. · Super-resolution Person Re-identification with Semi-coupled Low-rank Discriminant Dictionary

Super-resolution Person Re-identification with Semi-coupled Low-rank Discriminant DictionaryLearning

Xiao-Yuan Jing1,3, Xiaoke Zhu1, Fei Wu1,3, Xinge You2, Qinglong Liu1, Dong Yue3, Ruimin Hu4, Baowen Xu1

1State Key Laboratory of Software Engineering, School of Computer, Wuhan University, China2School of Electronic Information and Communications, Huazhong University of Science and Technology, China3College of Automation, Nanjing University of Posts and Telecommunications, China4National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, China

Person re-identification [2] has been widely studied due to its importancein surveillance and forensics applications. In practice, gallery images arehigh-resolution (HR) while probe images are usually low-resolution (LR)in the identification scenarios with large variation of illumination, weatheror quality of cameras. Person re-identification in this kind of scenarios,which we call super-resolution (SR) person re-identification, has not beenwell studied.

Motivated by dictionary learning based SR restoration works [5], wepropose a semi-coupled low-rank discriminant dictionary learning (SLD2L)approach for SR person re-identification in this paper. Specifically, assumethat CA is a HR pedestrian image set from camera A and CB is a LR pedes-trian image set from camera B, we aim to learn a pair of HR and LR dictio-naries and a mapping function between features of HR and LR images, suchthat the features of LR images in CB can be converted into discriminatingHR features.

To this end, we firstly generate the LR version of CA by performingdown-sampling and smoothing operations, which has the same resolution asCB and is denoted by C’

A . Then we exploit semi-coupled dictionary learning(DL) to learn a pair of HR and LR dictionaries and a mapping matrix be-tween the corresponding features of CA and C’

A . To ensure that the learneddictionaries and mapping matrix have favorable discriminative capability,we require that HR features of images in CB, which are reconstructed usingthe learned dictionary pair and mapping matrix, should be close to the fea-tures of images from the same person in CA, but far away from the featuresof images from different persons in CA.

In practice, low resolution has different influences on different patches,e.g., patches with pure color suffer little influence, while patches with com-plex texture suffer more influence. Therefore, learning a common mappingfunction is not enough to catch all the relationships. Intuitively, we can di-vide images into patches and group patches into several clusters, and then apair of HR and LR sub-dictionaries and a more stable mapping function canbe learned for each cluster. In this paper, we group patches in C’

A and CB us-ing K-means algorithm according to the similarity of patch features. Then,the patches in CA are grouped according to clustering results of the corre-sponding patches in C’

A . We require that each cluster-specific sub-dictionaryhas good representation ability to the patches from the associated cluster butpoor representation ability for other clusters. Denote by Di

H and DiL the

HR and LR sub-dictionaries of the ith cluster, respectively. And Vi denotesthe mapping of the ith cluster. By separately combining HR and LR sub-dictionaries, we can obtain the structured HR and LR dictionaries, namelyDH= [D1

H ,D2H , ...,D

cH ] and DL= [D1

L,D2L, ...,D

cL], where c is the number of

clusters.To ensure that the learned sub-dictionary pairs can well characterize the

intrinsic feature spaces of HR and LR images, the noises should be separat-ed from patches in the learning process. Considering that patches from thesame cluster are linearly correlated, we can employ low-rank matrix recov-ery technique to separate noises from patches [3, 4]. Figure 1 illustrates theoverall flow of SLD2L.

With the learned dictionaries and mapping matrices, features of LRprobe images can be converted into discriminative HR features. Then SRperson re-identification can be performed using the features of HR galleryimages and the converted HR probe features.

Figure 2 reports the matching results of all compared methods on theVIPeR dataset [1] at sampling rate of 1

/8. The matching rates of all com-

peting methods are significantly lower than those provided in the original pa-

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

Figure 1: The flowchart of SLD2L.

pers. The reason is that low resolution results in the loss of useful informa-tion and these methods cannot work well in this scenario. The experimentalresults of SLD2L always outperform these related methods. Experimentalresults on the i-LIDS and PRID datasets also demonstrate the effectivenessof the proposed approach for SR person re-identification problem.

0 10 20 30 40 50

20

40

60

80

100

Rank

Mat

chin

g R

ate

(%)

VIPeR with sampling rate=1/8

RDCSSCDLRPLMKISSME

SLD2L

Figure 2: Results on the VIPeR dataset.

[1] Douglas Gray, Shane Brennan, and Hai Tao. Evaluating appearancemodels for recognition, reacquisition, and tracking. In PerformanceEvaluation of Tracking and Surveillance, IEEE Workshop on, 2007.

[2] Xiao Liu, Mingli Song, Dacheng Tao, Xingchen Zhou, Chun Chen, andJiajun Bu. Semi-supervised coupled dictionary learning for person re-identification. In CVPR, IEEE Conference on, pages 3550–3557, 2014.

[3] Long Ma, Chunheng Wang, Baihua Xiao, and Wen Zhou. Sparse repre-sentation for face recognition based on discriminative low-rank dictio-nary learning. In CVPR, IEEE Conference on, pages 2586–2593, 2012.

[4] Houwen Peng, Bing Li, Rongrong Ji, Weiming Hu, Weihua Xiong, andCongyan Lang. Salient object detection via low-rank and structuredsparse matrix decomposition. In AAAI, pages 796–802, 2013.

[5] Shenlong Wang, Lei Zhang, Yan Liang, and Quan Pan. Semi-coupleddictionary learning with applications to image super-resolution andphoto-sketch synthesis. In CVPR, IEEE Conference on, pages 2216–2223, 2012.