Face Recognition with Learning-Based Descriptor

Zhimin Cao, The Chinese University of Hong Kong

Qi Yin, ITCS, Tsinghua University

Xiaoou Tang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China

Jian Sun, Microsoft Research Asia

Outline

1. Introduction
2. Overview of framework
3. Learning-based descriptor extraction
4. Pose-adaptive matching
5. Experimental results
6. Conclusion and discussion

1. Introduction(1/2)

LBP, SIFT, and HOG are effective descriptors that use handcrafted encoding.

However, existing handcrafted encoding methods suffer from two drawbacks:
▪ Manually designing an optimal encoding method is difficult.
▪ Handcrafted codes are usually unevenly distributed.

Figure: distribution of code emergence frequency in 1000 face images

1. Introduction(2/2)

The learning-based encoding method uses unsupervised learning to encode the local microstructures of the face into a set of discrete codes.

PCA and a proper normalization mechanism are applied to improve the discriminative ability of the code histogram.

A set of pose-specific classifiers (each for one specific pose combination) is trained to make the final decision.

(1000 face images)

2. Overview of framework

“pose-adaptive face matching” pipeline

“learning-based descriptor” pipeline

3. Learning-based descriptor extraction(1/4)

Sampling and normalization

Sample r×8 neighboring pixels at even intervals on the ring of radius r to form a low-level feature vector.

Normalize the sampled feature vector to unit length.

Sampling patterns: (1) R1 = 1, with center; (2) R1 = 1, R2 = 2, with center; (3) R1 = 3, no center; (4) R1 = 4, R2 = 7, no center.
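The ring sampling above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `ring_offsets` and `sample_pattern` are hypothetical, and nearest-neighbor pixel lookup is assumed because the slides do not say how off-grid positions are read.

```python
import numpy as np

def ring_offsets(radius):
    """Return (dy, dx) offsets for radius*8 points evenly spaced on a ring."""
    n = 8 * radius
    angles = 2.0 * np.pi * np.arange(n) / n
    return np.stack([radius * np.sin(angles), radius * np.cos(angles)], axis=1)

def sample_pattern(image, y, x, radii, with_center):
    """Sample a low-level feature vector around (y, x) and scale it to unit length."""
    values = []
    if with_center:
        values.append(float(image[y, x]))
    for r in radii:
        for dy, dx in ring_offsets(r):
            # Assumption: nearest-neighbor lookup; interpolation is equally plausible.
            yy = int(round(float(y + dy)))
            xx = int(round(float(x + dx)))
            values.append(float(image[yy, xx]))
    vec = np.asarray(values)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

For example, pattern (2) corresponds to `sample_pattern(image, y, x, radii=(1, 2), with_center=True)`, which yields 1 + 8 + 16 = 25 values. The caller is expected to keep (y, x) far enough from the image border for the largest radius.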


Learning-based encoding and histogram representation

Three unsupervised learning methods are considered:
▪ K-means
▪ PCA tree
▪ Random-projection tree

The learned encoding method is applied to encode each normalized feature vector into a discrete code, which gives the local filter response codebook.
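As a rough illustration of the K-means option, the sketch below learns a codebook with plain Lloyd iterations and encodes each vector as the index of its nearest center. The names `learn_codebook` and `encode`, the iteration count, and the random initialization are assumptions for this example, not details taken from the slides.

```python
import numpy as np

def encode(features, centers):
    """Map each feature vector to the index of its nearest center (its code)."""
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def learn_codebook(features, num_codes, iters=20, seed=0):
    """Learn num_codes centers with basic K-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), num_codes, replace=False)
    centers = features[start].astype(float).copy()
    for _ in range(iters):
        codes = encode(features, centers)
        for k in range(num_codes):
            members = features[codes == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers
```

In use, `centers = learn_codebook(train_vectors, 256)` would be run once on sampled training vectors, and `encode(vectors, centers)` would then turn every pixel's normalized vector into a code in the range 0 to 255.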


After the encoding, the input image is turned into a “code” image.

Divide the encoded image into a grid of patches and compute a histogram of the LE codes for each patch.
▪ e.g. 5×7 patches for the holistic face (84×96)

Concatenate all patch histograms to form the descriptor of the whole face image.
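The histogram step can be sketched as follows. The function name `le_descriptor` is hypothetical, and the example assumes the 84×96 face is 84 pixels wide and 96 pixels high, split into 7 rows by 5 columns; the slides give only "5×7" and "84×96".

```python
import numpy as np

def le_descriptor(code_image, grid_rows, grid_cols, num_codes):
    """Concatenate per-patch code histograms into one descriptor."""
    row_edges = np.linspace(0, code_image.shape[0], grid_rows + 1).astype(int)
    col_edges = np.linspace(0, code_image.shape[1], grid_cols + 1).astype(int)
    hists = []
    for i in range(grid_rows):
        for j in range(grid_cols):
            patch = code_image[row_edges[i]:row_edges[i + 1],
                               col_edges[j]:col_edges[j + 1]]
            # One bin per code; minlength keeps every histogram the same size.
            hists.append(np.bincount(patch.ravel(), minlength=num_codes))
    return np.concatenate(hists).astype(float)
```

With 256 codes and 35 patches the descriptor has 256 × 35 = 8,960 entries, and the entries sum to the number of pixels in the code image.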

performance comparison of the different learning methods

Select 1,000 images from the LFW training set

LE descriptors start to beat existing descriptors when the code number reaches 32.


PCA dimension reduction

The resulting face feature may be too large.
▪ e.g. 256 codes × 35 patches = 8,960 dimensions, reduced to 400 dimensions by PCA

Normalization applied after the PCA compression improves the performance.

Multiple LE descriptors

Training a linear SVM to combine the similarity scores generated by different LE descriptors generally achieves a better result.
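A minimal sketch of PCA compression followed by normalization, assuming PCA is computed from the SVD of mean-centered training descriptors. The names `fit_pca` and `compress` and the `norm` parameter are hypothetical.

```python
import numpy as np

def fit_pca(descriptors, num_components):
    """Return the training mean and the top principal directions (as rows)."""
    mean = descriptors.mean(axis=0)
    _, _, vt = np.linalg.svd(descriptors - mean, full_matrices=False)
    return mean, vt[:num_components]

def compress(descriptor, mean, components, norm="l2"):
    """Project one descriptor onto the PCA basis, then apply L1 or L2 normalization."""
    z = components @ (descriptor - mean)
    scale = np.abs(z).sum() if norm == "l1" else np.linalg.norm(z)
    return z / scale if scale > 0 else z
```

Under the default setting described in the slides, `num_components` would be 400 and each input descriptor would have 8,960 dimensions.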

recognition rates with different normalization methods

256 codes and 400 PCA dimensions are chosen as the default setting.

PCA followed by L1 or L2 normalization can reach a higher recognition rate than either no PCA or PCA alone.

comparison curves of different descriptors

the combination of four LE descriptors obtained the best performance on the LFW.

4. Pose-adaptive matching(1/3)

Component-level face alignment

Replace holistic alignment with 9 face components, each aligned separately using a similarity transform.

The face similarity score is the sum of the similarities between corresponding components.

Each component is aligned more accurately without balancing across the whole face, and the negative effect of landmark error is also reduced.

performance comparison of different alignment methods


Pose-adaptive matching

Each component contributes differently to recognition when the pose combination of the matching pair is different.
▪ e.g. the right eye is less effective when matching a frontal face and a right-turned face

Categorize the pose of the input face into one of three poses: frontal (F), left (L), and right (R).

Select three gallery images from the Multi-PIE dataset and measure the similarity between the probe face and each of them.

The pose label of the most similar gallery image is assigned to the probe face.
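The pose labeling step amounts to a nearest-gallery lookup, sketched below. The function name `assign_pose` is hypothetical, and cosine similarity between descriptors is an assumption; the slides do not name the similarity measure.

```python
import numpy as np

def assign_pose(probe, gallery):
    """Return the pose label of the gallery descriptor most similar to the probe.

    gallery maps a pose label ("F", "L", "R") to that gallery image's descriptor.
    """
    def cosine(a, b):
        # Assumed similarity measure for this sketch.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    return max(gallery, key=lambda label: cosine(probe, gallery[label]))
```

Here `gallery` would hold the descriptors of the three Multi-PIE gallery images, and `probe` the descriptor of the input face.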


The pose combination of a face pair is one of {FF, LL, RR, LR (RL), LF (FL), RF (FR)}.

For each pose combination, a linear SVM classifier is trained on the subset of training pairs with that specific pose combination.

The final pose-adaptive classifier consists of 6 linear SVM classifiers.

The “best-fit” classifier, the one with the same pose combination as the input matching pair, makes the final decision.
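The "best-fit" selection can be pictured as a lookup keyed by an order-independent pose combination. In this sketch the names `pose_key` and `pose_adaptive_decision` are hypothetical, each linear SVM is represented only by an assumed `(weights, bias)` pair, and the input is assumed to be the vector of per-component similarity scores.

```python
import numpy as np

def pose_key(pose_a, pose_b):
    """Order-independent key, so that LR and RL select the same classifier."""
    return "".join(sorted((pose_a, pose_b)))

def pose_adaptive_decision(component_sims, pose_a, pose_b, classifiers):
    """Apply the linear classifier trained for this pair's pose combination.

    classifiers maps a pose key to (weights, bias); returns True for "same person".
    """
    weights, bias = classifiers[pose_key(pose_a, pose_b)]
    return float(np.dot(weights, component_sims) + bias) > 0
```

Sorting the two labels gives the six keys FF, FL, FR, LL, LR, and RR, which matches the six classifiers in the slides.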

recognition performances of pose-adaptive matching

3,000 intra-/extra-personal pairs are randomly sampled from LFW for each pose combination.
▪ e.g. the total number of pairs is 3,000 × 6 = 18,000

Before: 76.20% ± 0.41%
After: 78.30% ± 0.42%

5. Experimental results(1/2)

Results on the LFW benchmark

5. Experimental results(2/2)

Results on the Multi-PIE

The default descriptors trained on the LFW benchmark are adopted in the experiments.

10 subsets of face images are randomly generated from Multi-PIE, each with 300 intra-personal and 300 extra-personal image pairs.

6. Conclusion and discussion

Face recognition using the learning-based (LE) descriptor and pose-adaptive matching performs well on the LFW benchmark.

The method shows excellent generalization ability on Multi-PIE.

Replacing the manually designed sampling pattern with an automatically learned one may produce a more powerful descriptor for face recognition.
