20th Computer Vision Winter Workshop
Paul Wohlhart, Vincent Lepetit (eds.)
Seggau, Austria, February 9-11, 2015

Multi-view Facial Expressions Recognition using Local Linear Regression of Sparse Codes

Mahdi Jampour    Thomas Mauthner    Horst Bischof
Institute for Computer Graphics and Vision
Graz University of Technology
{jampour, mauthner, bischof}@icg.tugraz.at

Abstract. We introduce a linear regression-based projection for multi-view facial expression recognition (MFER) based on sparse features. While facial expression recognition (FER) approaches have become popular for frontal or near-frontal views, few papers demonstrate results on arbitrary views of facial expressions. Our model relies on a new method for multi-view facial expression recognition, where we encode appearance-based facial features using sparse codes and learn projections from non-frontal to frontal views using linear regression. We then reconstruct facial features from the projected sparse codes using a common global dictionary. Finally, the reconstructed features are used for facial expression recognition. Our regression-of-sparse-codes approach outperforms the state-of-the-art results on both protocols of the BU3DFE dataset.

1. Introduction

Facial expression recognition (FER) has attracted significant interest in the computer vision community because of its applications in human-computer interaction, education, robotics, games, medicine and psychology [30]. There are six basic classes in facial expression recognition: anger (AN), disgust (DI), fear (FE), happiness (HA), sadness (SA) and surprise (SU). Most existing approaches work on frontal or near-frontal views [18, 13, 28], whereas in real-world applications a frontal view is an unrealistic assumption and limits applicability. For this reason, non-frontal analysis is now one of the active challenges in facial expression recognition, requiring not only an effective recognition approach but also a method for compensating missing information (i.e. the non-frontal counterpart) [24].

This is a challenging problem because some of the facial features necessary for recognition are unavailable, or only partially available, due to the face orientation. For example, eyebrows, which are very important for recognizing facial expressions, may not be visible in a non-frontal face. On the other hand, we have pairwise sets of non-frontal and frontal views, which make it possible to learn transformations between them and benefit from shared features. For these reasons, we would like to estimate the invisible facial features of a non-frontal face from visible frontal views. To this end, we employ linear regression, which provides an elegant approximate transformation between training collections (e.g. non-frontal to frontal facial expressions). The idea of using linear regression is inspired by non-frontal face recognition [16, 3]. In contrast to these works, we apply sparse features instead of raw image data, which proves to be more efficient and less sensitive to viewpoint changes. In this work, we aim to address the gap of missing facial features in non-frontal views using linear regression from non-frontal sparse-coded features to frontal codes. The motivation for employing sparse-coded features is their robustness to viewpoint variations. Moreover, it has been shown that sparse representation is one of the successful feature-based representation models for facial expression recognition [21]. Therefore, we first create a dictionary from raw training data and then transform raw features into sparse codes using this dictionary. Second, we estimate linear regressions to project non-frontal sparse features to frontal ones, and finally we use the reconstructed features for expression recognition. In other words, our linear regression-based transformation can approximate a frontal view given a non-frontal view, using pairwise collections of frontal and non-frontal training data. Moreover, to show the efficiency of our approach, an extensive investigation is provided on the BU3DFE dataset, a popular facial expression dataset introduced in Section 4.1. We show that our approach outperforms the state-of-the-art results on BU3DFE.

Contribution: In this paper we introduce a novel approach for multi-view facial expression recognition. We propose a sparse coding representation for facial expression recognition that is efficient and stable under viewpoint variation. We also introduce linear regression of such sparse features to approximate projections in local feature space.

2. Related works

There is a significant body of work on facial expression recognition, with many interesting applications in human-computer interaction, psychology, games, children's education, etc., which can be broadly divided into three general categories: 1) geometric-based models [17, 19, 7, 2], 2) appearance-based models [32, 6, 14, 9, 21], and 3) hybrid methods which use both texture and shape information [10, 12]. Typically, geometric-based approaches employ shape information (e.g. facial action units), whereas appearance-based approaches use only texture information. Multi-view facial expression recognition has been attracting increasing attention among face researchers, as has facial expression recognition in general. For instance, [17] proposed a geometric-based method that uses 2D facial points to map from non-frontal to frontal views. [7] proposed computing 2D facial feature displacements; they normalized the extracted distances to zero mean and unit variance to make classification more discriminative. Other approaches rely on appearance-based models. For instance, a discriminant analysis method (BDA/GMM) was proposed by [32], which optimizes an upper bound of the Bayes error derived from a Gaussian mixture model. Hesse et al. [6] evaluated different descriptors such as SIFT, LBP and DCT, extracted around facial landmarks, and classified them using an ensemble SVM. A more recent approach was proposed by [15]: a two-step multi-view facial expression recognition model that first estimates the pose orientation directly from the image, after which a pose-dependent expression classifier recognizes the facial expression. Besides appearance-based multi-view facial expression recognition models, transformation-based approaches have also been proposed by [9] and [21], where [9]

proposed a multi-view discriminative framework using multi-set canonical correlation analysis (MCCA) and the multi-view model theorem for facial expression recognition with arbitrary views. Their method respects the intrinsic and discriminant structure of the samples; they obtain discriminative information from facial expression images based on discriminative neighbor preserving embedding (DNPE). [21] improved an existing facial expression recognition model using generic sparse coding features. They applied sparse coding of dense SIFT features on facial images in a three-level spatial pyramid, encoding the local features into sparse codes to enable multi-view processing.

Sparse coding has previously been used for face recognition [31], facial expression recognition [21] and other applications, and it has been shown to be a successful encoding technique [25]. Zhang [31] explained why sparsity can improve discrimination and how regression can be used to solve a classification problem. [25] proposed an efficient sparse-based model and showed that regression transformations can improve time complexity in both global and anchored neighborhood regression, which are much faster than other related methods. An important yet relatively unexplored direction is to employ pose-specific linear regression, which is challenging because the regression is only partially linear. A regression-based approach proposed by [17] employed a global transformation; however, it does not reach the accuracy of local linear regression of sparse features (LLRSF, Section 3.4). Similarly, the approach that used sparse coding features [21] did not profit from a regression transformation. Therefore, to address the above problems, this paper proposes to integrate them in a single pipeline.

3. Multi-view Facial Expression Recognition

In this section, we describe the proposed approach to multi-view facial expression recognition. Our approach consists of three modules:

a) Feature extraction: We apply a concatenation of HOG and LBP features, which are popular in face analysis. HOG [5] is a successful gradient-based descriptor used for various object detection and recognition tasks and is robust to illumination variation. Moreover, it is fast compared to SIFT and LDP (Local Directional Pattern) [11] due to its simple computation. LBP, on the other hand, is a common texture-based descriptor which describes image pixels based on neighborhood intensities. It has been shown that a concatenation of HOG and LBP can improve human detection performance [27]. In our experiments, the extracted features are used as feature vectors for every facial image in every viewpoint, without any assumption about head pose or expression. The basic idea is to concatenate the two descriptors into a single vector: I_i = [H_i; L_i] is a feature vector of size (q × 1) obtained by stacking the HOG (H_i) and LBP (L_i) feature vectors of the i-th face image. The cell size for both HOG and LBP is 25 pixels, giving an overall dimensionality of 5480, of which the first 2232 dimensions come from HOG and the remaining 3248 from LBP.
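As an illustration only, the descriptor concatenation I_i = [H_i; L_i] can be sketched as below. This is not the paper's exact configuration: the LBP here is a basic 8-neighbour variant and the HOG stand-in is a single global orientation histogram, whereas the paper uses cell-based descriptors with 25-pixel cells; all function names are hypothetical.

```python
import numpy as np

def lbp_histogram(img, bins=256):
    """Basic 8-neighbour LBP: threshold each pixel's ring against its centre."""
    c = img[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    # the 8 neighbour offsets, one bit each
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    hist, _ = np.histogram(code, bins=bins, range=(0, bins))
    return hist / max(hist.sum(), 1)

def gradient_histogram(img, bins=9):
    """Crude HOG stand-in: one global histogram of gradient orientations."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / max(hist.sum(), 1e-12)

def face_descriptor(img):
    """I_i = [H_i; L_i]: concatenated HOG-like and LBP feature vector."""
    return np.concatenate([gradient_histogram(img), lbp_histogram(img)])
```

In the paper the two parts are computed per cell and stacked, which only changes the dimensionality, not the principle of concatenation.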

b) Projections: Linear regression is used to estimate projections according to Eq. 2, either as one global projection for the global model (GLR) or as several projections for the local model (LLR); details are given in Sections 3.1 and 3.3, respectively. In addition, all extracted features are transformed into a sparse representation, as described in Sections 3.2 and 3.4. The main motivation for using a sparse representation is its robustness to viewpoint variation. Projections from non-frontal to frontal viewpoints are then learned by linear regression on the sparse codes. We show that sparse-coded facial expression features are much more stable for estimating projections by linear regression than our raw features.

c) Classification: Both the testing and training parts of all non-frontal viewpoints are projected into the frontal feature space, and a global classifier is used for expression recognition. A linear SVM [4] is applied as our basic classifier to find the best facial expression estimate. A nearest-neighbor strategy is also used to improve the overall result, as described in Section 3.5.

3.1. Global Linear Regression

Let X be a set of aligned, vectorized facial features of size (q × 1). X_θi is the subset of facial features in X from viewing angle θi, where X_θi = [I_θi^1, I_θi^2, ..., I_θi^N] is a matrix of size (q × N) whose columns are the N vectorized facial features I_θi^k ∈ R^(q×1). Note that I_0^k and I_θi^k are vectorized features of the k-th facial expression image of the training data, from the same person in different poses. Based on this, we define pairwise sets of training data, X_0 and X_θi, where the former is a set of frontal views and the latter is the corresponding non-frontal view with angle θi. The number of samples in the sets X_0 and X_θi, i = 1, 2, ..., M, is equal. Moreover, X_θ = [X_0, X_θ1, ..., X_θM] contains the M sets of non-frontal views and the one set of frontal views. We need the same number of samples in the frontal and non-frontal sets to train a projection between them with linear regression, so we define X_0^M as the frontal set repeated M+1 times. Then X_0^M and X_θ have the same number of samples, and we can estimate the projection between them. Mathematically, this can be formulated as:

argmin_P ‖X_0^M − P X_θ‖    (1)

where the global linear projection P can be estimated by Eq. 2, which is the closed-form solution of Eq. 1:

P = X_0^M (X_θ^T X_θ)^(−1) X_θ^T    (2)

X̂_θ = P X_θ    (3)

Therefore, Eq. 3 is the global linear regression which approximates the frontal features X̂_θ from non-frontal ones. The overall structure of our global linear regression of sparse features (GLRSF) is introduced in the following section.
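The closed-form projection of Eqs. 1-3 can be sketched on synthetic data as follows; `np.linalg.pinv` is used in place of the explicit normal-equation inverse for numerical robustness, and the shapes, seed and names here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def fit_global_projection(X_frontal, X_nonfrontal):
    """Least-squares projection P such that X_frontal ≈ P @ X_nonfrontal.
    Columns are samples (feature vectors of size q), matching Eq. 2;
    pinv replaces (X^T X)^{-1} X^T for numerical robustness."""
    return X_frontal @ np.linalg.pinv(X_nonfrontal)

rng = np.random.default_rng(0)
q, n = 12, 40                        # feature size, number of training pairs
A = rng.standard_normal((q, q))      # hidden "true" view transform (synthetic)
X_nf = rng.standard_normal((q, n))   # non-frontal features
X_f = A @ X_nf                       # paired frontal features
P = fit_global_projection(X_f, X_nf)
X_hat = P @ X_nf                     # Eq. 3: approximated frontal features
print(np.allclose(X_hat, X_f, atol=1e-6))
```

When the training pairs really are related by one linear map, the closed form recovers it exactly; on real features the relation only holds approximately, which is what motivates the local version in Section 3.3.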

3.2. Global Linear Regression of Sparse Features (GLRSF)

Embedding the feature information in a global code book regularizes the data and therefore increases robustness to outliers. We are interested in finding a reconstructive dictionary given the training features X by minimizing:

‖X − DS‖_2^2  s.t.  ‖s_i‖_0 ≤ Γ    (4)

where D ∈ R^(q×s) is the dictionary, with each column representing a code book vector, and S ∈ R^(s×N) is the matrix of encoding coefficients. Γ is the sparsity constraint factor, defining the maximum number of non-zero coefficients per sample. We apply K-SVD [1] as the dictionary learning algorithm and orthogonal matching pursuit (OMP) [26] as an efficient way to encode new test samples, given a fixed dictionary. Analogously to Section 3.1, S_0^M is the frontal sparse-coded set repeated M+1 times, and S_θ = [S_0, S_θ1, ..., S_θM] is the global collection of the one frontal set and the M sets of sparse features of non-frontal facial expressions, where all sets have the same number of samples. Eqs. 2 and 3 can be rewritten for the sparse representation as:

P = S_0^M (S_θ^T S_θ)^(−1) S_θ^T    (5)

Ŝ_θ = P S_θ    (6)
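The OMP encoding step used throughout this pipeline can be sketched as a minimal greedy pursuit; this is an illustration under the usual unit-norm-atom assumption, not the exact solver of [26], and the toy dictionary below is an assumption.

```python
import numpy as np

def omp(D, x, gamma):
    """Greedy orthogonal matching pursuit: encode x with at most `gamma`
    non-zero coefficients over dictionary D (columns = unit-norm atoms)."""
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(gamma):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        # re-fit all selected atoms jointly (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    s = np.zeros(D.shape[1])
    s[support] = coef
    return s

# Toy check: a signal built from two atoms of an orthonormal dictionary
D = np.eye(5)
x = 3.0 * D[:, 1] + 2.0 * D[:, 3]
s = omp(D, x, gamma=2)
print(np.round(s, 6))   # → [0. 3. 0. 2. 0.]
```

With Γ = 150 and a 200-atom dictionary, as used in Section 4.2, each feature vector is reduced to at most 150 active coefficients over the shared code book.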


Figure 1: The overall structure of Local Linear Regression of Sparse Features (LLRSF). Train: A global dictionary is trained with K-SVD and the facial features are encoded. Projections are estimated using local collections of frontal and non-frontal sparse features, and finally the features are reconstructed using the global dictionary. Test: The input sample is encoded via OMP and the encoded vector is projected to frontal using the appropriate projection. The projected vector is then reconstructed using the dictionary and finally classified for expression recognition. The training step benefits from the j nearest neighbors to improve our learning model, as described in Section 3.5.

Ŝ_θ defines the projected sparse codes, and the approximated features of the projected views can be reconstructed using the global dictionary D with:

X̂_θ = D Ŝ_θ.    (7)

To improve our projections, we introduce locality in the form of a local linear regression approach in the following section.

3.3. Local Linear Regression

As mentioned before, in the global model a huge amount of data with many different properties (viewpoints, expressions, gender, age, skin color, etc.) affects the mapping function. Intuitively, the projection error can be decreased by reasonably increasing the number of projections; that is, splitting the data into several meaningful parts and estimating corresponding projections reduces the overall projection error compared to using one global projection. Therefore, we use supervised classification to split the data, training a logistic regression SVM on the viewpoints of the training data to classify the data into M smaller subsets. Subsequently, linear regressions between each specific non-frontal set X_θi and the frontal set X_0 are estimated for all subsets by:

P_i = X_0 (X_θi^T X_θi)^(−1) X_θi^T,   i = 1, 2, ..., M    (8)

X̂_θi = P_i X_θi    (9)

where X̂_θi refers to the approximation of the frontal features by the i-th linear regression. The local linear regression workflow can thus be summarized as:
Step 1: Classify the facial features into the M subsets.
Step 2: Approximate the linear regression from each non-frontal subset to the frontal subset.
Step 3: Estimate the projected facial features with the approximated projections: X̂_θi = P_i X_θi.
Step 4: Train a global classifier using the projected features X = [X_0, X̂_θ1, X̂_θ2, ..., X̂_θM].
This workflow is the overall structure of our local linear regression (LLR). Next, we describe how sparse coding benefits our approach.
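The per-viewpoint projections of the LLR workflow can be sketched on synthetic data, assuming the viewpoint labels are already known (the paper predicts them with a pose classifier); the shapes, seed and per-view transforms below are illustrative assumptions.

```python
import numpy as np

def fit_local_projections(X_frontal, X_views, view_labels):
    """One projection P_i per viewpoint cluster (Eq. 8); columns are samples.
    view_labels[k] gives the viewpoint cluster of sample k (assumed known
    here; the paper obtains it from a pose classifier)."""
    projections = {}
    for v in np.unique(view_labels):
        idx = view_labels == v
        projections[v] = X_frontal[:, idx] @ np.linalg.pinv(X_views[:, idx])
    return projections

rng = np.random.default_rng(1)
q, n = 8, 30
X_f = rng.standard_normal((q, 2 * n))          # frontal features
labels = np.repeat([0, 1], n)                  # two synthetic viewpoints
A = {0: rng.standard_normal((q, q)), 1: rng.standard_normal((q, q))}
# each viewpoint distorts the frontal features by its own linear map
X_v = np.concatenate([np.linalg.inv(A[0]) @ X_f[:, :n],
                      np.linalg.inv(A[1]) @ X_f[:, n:]], axis=1)
P = fit_local_projections(X_f, X_v, labels)
# Step 3: project every non-frontal sample with its cluster's projection
X_hat = np.concatenate([P[0] @ X_v[:, :n], P[1] @ X_v[:, n:]], axis=1)
print(np.allclose(X_hat, X_f, atol=1e-6))
```

Because each cluster only has to model one view-specific transform, the per-cluster fit is exact here, which mirrors the paper's argument that local projections are more accurate than one global one.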


3.4. Local Linear Regression of Sparse Features (LLRSF)

GLRSF is an efficient approach for MFER and is fairly robust to outliers, but since it uses the raw features, it is expensive in terms of memory due to the large feature vectors. Sparse representation is therefore a successful alternative that improves our solution. We again seek a reconstructive dictionary for the training features X as in Eq. 4, and apply K-SVD [1] and OMP [26] to encode new test samples given the fixed dictionary. As in LLR, we define M local projections which approximate the linear regression for each viewpoint; let S_0 be the set of sparse features of frontal facial expressions and S_θ1, S_θ2, ..., S_θM be the M sets of sparse features of non-frontal facial expressions, all with the same number of samples and computed by OMP. Eqs. 8 and 9 can be rewritten for the sparse representation as:

P_i = S_0 (S_θi^T S_θi)^(−1) S_θi^T,   i = 1, 2, ..., M    (10)

Ŝ_θi = P_i S_θi    (11)

where P_i is the i-th projection, estimated from the corresponding sparse features. Ŝ_θi defines the projected sparse codes, and the approximated features of the projected frontal view can be reconstructed using the global dictionary D with:

X̂_θi = D Ŝ_θi.    (12)

The overall structure of our local linear regression of sparse features (LLRSF) is illustrated in Figure 1.
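Once the sparse codes are available, the test-time flow of Figure 1 reduces to a few matrix operations. In this sketch the codes are random stand-ins for OMP output and all sizes and names are assumptions; it only illustrates Eqs. 10-12.

```python
import numpy as np

rng = np.random.default_rng(2)
q, s_atoms, n = 10, 20, 15
D = rng.standard_normal((q, s_atoms))   # global dictionary (from K-SVD)
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms

# Sparse codes for one viewpoint cluster (stand-ins for OMP output)
S_view = rng.standard_normal((s_atoms, n))
S_frontal = rng.standard_normal((s_atoms, n))

# Eq. 10: per-viewpoint projection in code space (pinv for stability)
P_i = S_frontal @ np.linalg.pinv(S_view)
# Eq. 11: project a new non-frontal code to its frontal counterpart
s_test = rng.standard_normal(s_atoms)
s_hat = P_i @ s_test
# Eq. 12: reconstruct the frontal *feature* vector with the dictionary
x_hat = D @ s_hat                       # this vector is fed to the SVM
print(x_hat.shape)
```

Note that the regression lives entirely in the (small) code space, which is where the memory advantage over GLR on raw features comes from.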

3.5. Soft Learning using Nearest Neighbors

Local linear regression (LLR), or in general, analysis on subsets of the data, is an efficient solution provided the data is split according to meaningful constraints. Nevertheless, supervised learning is sensitive to the number of training samples; thus, while splitting the data into small collections helps the regression approximation, it hurts the supervised classification. To this end, we propose to exploit each cluster's neighborhood. In other words, once all data is classified into the M subsets (clusters), the M cluster centers provide a natural similarity measurement, and we compute and exploit the N nearest neighbors

Figure 2: Multi-view rendered faces of a subject from BU3DFE-P1 (35 viewpoints)

Figure 3: Multi-view rendered faces of a subject from BU3DFE-P2 (5 viewpoints)

for each cluster based on cluster-center similarity. Neighbors are good candidates for contributing training data in our setting because the clusters are split based on viewpoint, and there are only small changes between neighboring head poses. We therefore profit from the neighbors when training each specific subset. The results below show that this improves our overall rate.
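The soft-learning step can be sketched as pooling each cluster's training set with its j nearest clusters, where nearness is measured between cluster centres; the pooling function and toy data below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def soft_training_sets(X, labels, j=1):
    """For each viewpoint cluster, augment its training set with the samples
    of its j nearest clusters (Euclidean distance between cluster centres),
    mimicking the soft-learning step. Columns of X are samples."""
    views = np.unique(labels)
    centres = {v: X[:, labels == v].mean(axis=1) for v in views}
    pooled = {}
    for v in views:
        others = [u for u in views if u != v]
        others.sort(key=lambda u: np.linalg.norm(centres[u] - centres[v]))
        keep = [v] + others[:j]
        pooled[v] = X[:, np.isin(labels, keep)]
    return pooled

X = np.array([[0., 0., 1., 1., 5., 5.]])   # 1-D features, 3 tiny clusters
labels = np.array([0, 0, 1, 1, 2, 2])
pooled = soft_training_sets(X, labels, j=1)
print(pooled[0])                           # → [[0. 0. 1. 1.]]
```

With j = 1 each cluster doubles its effective training set using only its most similar neighboring viewpoint, which is exactly the trade-off the section describes: small clusters for accurate regression, pooled neighbors for enough classification data.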

4. Experimental Results

To demonstrate the performance of our model, we evaluate on BU3DFE, the most popular dataset for multi-view facial expression recognition. We follow the standard evaluation scheme and apply 5-fold cross-validation on the highest level of expression intensity. Details are given below.

4.1. BU3DFE dataset

BU3DFE is a publicly available dataset containing 3D-scanned faces of 100 subjects with the six basic expressions; more details can be found in [29]. We rendered multiple views from the 3D faces at seven pan angles (0°, ±15°, ±30°, ±45°) and five tilt angles (0°, ±15°, ±30°), which means 35 view-


Table 1: Multi-view facial expression recognition: comparison between the proposed approaches

Method        Dataset      Accuracy
SF            BU3DFE-P1    68.14
GLRSF         BU3DFE-P1    69.87
LLRSF         BU3DFE-P1    74.42
Soft-LLRSF    BU3DFE-P1    78.64
SF            BU3DFE-P2    70.33
GLRSF         BU3DFE-P2    75.10
LLRSF         BU3DFE-P2    75.07
Soft-LLRSF    BU3DFE-P2    76.64

points, to compare our results with the related works [21, 22, 23, 20, 32], as shown in Figure 2. In addition, we generated views at 0°, 30°, 45°, 60° and 90°, i.e. 5 viewpoints, as a second protocol to compare our model with papers that use this protocol [9, 8], as shown in Figure 3. Since there are 6 expressions for 100 subjects at the highest expression intensity in 35 viewpoints, we have 21000 samples in the first protocol and 3000 samples in the second protocol of BU3DFE.
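The viewpoint and sample counts of the two protocols can be verified directly:

```python
# Protocol 1: 7 pan angles x 5 tilt angles = 35 viewpoints
pan = [0, 15, -15, 30, -30, 45, -45]
tilt = [0, 15, -15, 30, -30]
viewpoints_p1 = len(pan) * len(tilt)
samples_p1 = 6 * 100 * viewpoints_p1       # 6 expressions x 100 subjects
# Protocol 2: 5 yaw angles
viewpoints_p2 = len([0, 30, 45, 60, 90])
samples_p2 = 6 * 100 * viewpoints_p2
print(viewpoints_p1, samples_p1, samples_p2)   # → 35 21000 3000
```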

4.2. Evaluation of proposed approaches

We proposed two regression-based methods: Global Linear Regression of Sparse Features (GLRSF), introduced in Section 3.2, and Local Linear Regression of Sparse Features (LLRSF), proposed in Section 3.4. A further baseline is provided by SF, which classifies directly on the sparse features with neither global nor local projection; this highlights the impact of our core idea of using linear regression to project non-frontal views to frontal ones. Parameters such as the dictionary size and sparsity in K-SVD and the number of nearest neighbors in soft learning were evaluated; the best result is achieved with a dictionary size of 200, sparsity 150 and N=4 nearest neighbors. Moreover, local linear regression yields much more accurate projections than global regression, because the local distributions are more compact and easier to model than the global feature space. This can be seen in Table 1, where LLRSF has the best overall recognition rate among SF, GLRSF and LLRSF. Our local regression projection of sparse features also gives about a 6% improvement over the non-projection baseline SF, which shows the importance of the proposed idea. Moreover, Soft-LLRSF, which is

Figure 4: Confusion matrices of the Soft-LLRSF method for the two protocols of BU3DFE: (a) Protocol 1, 78.64%; (b) Protocol 2, 76.64%

an extended form of LLRSF that compensates for the low number of training samples using N nearest neighbors, successfully outperforms all other methods in both protocols.

4.3. Results on BU3DFE-P1 (35 viewpoints)

In this section, we present the performance of our approach on the first protocol of BU3DFE. The confusion matrices between expressions are presented in Figure 4, where the largest confusion occurs between AN and SA and between AN and DI. Our overall accuracy is 78.64% using 5-fold cross-validation, averaged across all subjects, expressions and poses at the highest expression intensity on BU3DFE-P1. A comparison of our approaches over the variations in pan and tilt is illustrated in Figure 5; note that the results in Figure 5(a) are averaged across the corresponding pan angles and those in Figure 5(b) across the corresponding tilt angles. As can be seen, our regression models (LLRSF and Soft-LLRSF) are clearly better than the other methods. This matches our expectation that local linear regression outperforms global regression because of the higher projection accuracy.

4.4. Results on BU3DFE-P2 (5 viewpoints)

Some related works evaluate on protocol 2, so we have also applied our model to this protocol. First, we show the performance of our approach across viewpoints and expressions, as reported in Table 1. As can be seen, Soft-LLRSF again outperforms the other methods; however, there is no improvement of LLRSF over GLRSF due to the small problem space (5 viewpoints), although both are clearly better than the SF baseline, which again shows that the proposed regression model over sparse features improves the overall recognition rate. Moreover, from the confusion matrix in Figure 4(b) we find that the


Figure 5: Performance of the proposed methods (SF, GLRSF, LLRSF and Soft-LLRSF) (a) over pan and (b) over tilt

Table 2: Comparison of the proposed model with the state-of-the-art

Method                Dataset      Accuracy
Zheng et al. [32]     BU3DFE-P1    68.20
Tang et al. [20]      BU3DFE-P1    75.30
Tariq et al. [21]     BU3DFE-P1    76.10
Tariq et al. [22]     BU3DFE-P1    76.34
Tariq et al. [23]     BU3DFE-P1    76.60
Soft-LLRSF            BU3DFE-P1    78.64
Huang et al. [9]      BU3DFE-P2    72.47
Hu et al. [8]         BU3DFE-P2    74.46
Soft-LLRSF            BU3DFE-P2    76.64

largest confusion is again between AN and DI, which shows that these two expressions are typically more similar to each other than to the other expressions. The overall facial expression recognition rate in this protocol is 75.07% for LLRSF and 76.64% for Soft-LLRSF, using 5-fold cross-validation averaged across all subjects, expressions and poses at the highest expression intensity on BU3DFE. In the following, we compare our approach with the state-of-the-art.

4.5. Comparison with the state-of-the-art

In this section, we compare our regression-based approach (Soft-LLRSF) with the state-of-the-art. Table 2 shows that our proposed approach outperforms the state-of-the-art on both protocols of BU3DFE. A similar sparse coding approach proposed by [21] achieved 76.10% accuracy on the same dataset, whereas our model clearly outperforms it with 78.64%, thanks to the local regression projection of sparse features.

5. Conclusion

In this paper, we introduced linear regression projection of sparse features for multi-view facial expression recognition, in which all facial features are first encoded as sparse codes, then projected to frontal features, and finally reconstructed from the projected sparse codes. We proposed two methods, global linear regression of sparse features (GLRSF) and local linear regression of sparse features (LLRSF), to solve the multi-view facial expression recognition problem. Both methods are capable of compensating for the facial features of missing parts of the face: the features of non-visible parts are estimated using the regression projection. We have shown that the proposed local regression-based model for multi-view facial expression recognition outperforms not only the SF baseline but also the state-of-the-art approaches on both protocols of BU3DFE. Another advantage of our approach is that it does not need landmark detection, making it more suitable for practical applications. We start from a comparatively low baseline (SF, GLRSF) with respect to related work; therefore, the examination of alternative facial expression features and the investigation of non-linear projections for approximating frontal views are possible directions for future work.

References

[1] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.

[2] B. B. Amor, H. Drira, S. Berretti, M. Daoudi, and A. Srivastava. 4D facial expression recognition by learning geometric deformations. IEEE Transactions on Cybernetics, 44:1–16, 2014.

[3] X. Chai, S. Shan, X. Chen, and W. Gao. Locally linear regression for pose-invariant face recognition. Image Processing, IEEE Transactions on, 16(7):1716–1725, 2007.

[4] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM T. Intell. Syst. Technol., 2(3), 2011.

[5] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.

[6] N. Hesse, T. Gehrig, H. Gao, and H. Ekenel. Multi-view facial expression recognition using local appearance features. In ICPR, pages 3533–3536, 2012.

[7] Y. Hu, Z. Zeng, L. Yin, X. Wei, J. Tu, and T. Huang. A study of non-frontal-view facial expressions recognition. In ICPR, pages 1–4, 2008.

[8] Y. Hu, Z. Zeng, L. Yin, X. Wei, X. Zhou, and T. Huang. Multi-view facial expression recognition. In FGR, pages 1–6, 2008.

[9] X. Huang, G. Zhao, and M. Pietikäinen. Emotion recognition from facial images with arbitrary views. In Proc. the British Machine Vision Conference (BMVC 2013), Bristol, UK, 2013.

[10] X. Huang, G. Zhao, M. Pietikäinen, and W. Zheng. Dynamic facial expression recognition using boosted component-based spatiotemporal features and multi-classifier fusion. ACIVS, pages 312–322, 2010.

[11] T. Jabid, M. H. Kabir, and O. Chae. Facial expression recognition using local directional pattern (LDP). In ICIP, 2010.

[12] I. Kotsia, S. Zafeiriou, and I. Pitas. Texture and shape information fusion for facial expression and facial action unit recognition. Pattern Recognition, 41(3), 2008.

[13] W. Liu, C. Song, and Y. Wang. Facial expression recognition based on discriminative dictionary learning. In ICPR, pages 1839–1842, 2012.

[14] S. Moore and R. Bowden. Multi-view pose and facial expression recognition. In Proc. BMVC, 2010.

[15] S. Moore and R. Bowden. Local binary patterns for multi-view facial expression recognition. Computer Vision and Image Understanding, 115:541–558, 2011.

[16] I. Naseem, R. Togneri, and M. Bennamoun. Linear regression for face recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(11):2106–2112, 2010.

[17] O. Rudovic, I. Patras, and M. Pantic. Regression-based multi-view facial expression recognition. In ICPR, pages 4121–4124, 2010.

[18] O. Rudovic, V. Pavlovic, and M. Pantic. Multi-output Laplacian dynamic ordinal regression for facial expression recognition and intensity estimation. In CVPR, pages 2634–2641, 2012.

[19] S. Taheri, P. K. Turaga, and R. Chellappa. Towards view-invariant expression analysis using analytic shape manifolds. In Automatic Face & Gesture Recognition, 2011.

[20] H. Tang, M. Hasegawa-Johnson, and T. Huang. Non-frontal view facial expression recognition based on ergodic hidden Markov model supervectors. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 1202–1207, 2010.

[21] U. Tariq, J. Yang, and T. Huang. Multi-view facial expression recognition analysis with generic sparse coding feature. ECCV, pages 578–588, 2012.

[22] U. Tariq, J. Yang, and T. Huang. Maximum margin GMM learning for facial expression recognition. FG, pages 1–6, 2013.

[23] U. Tariq, J. Yang, and T. Huang. Supervised super-vector encoding for facial expression recognition. Pattern Recognition Letters, 46:89–95, 2014.

[24] Y. Tian, T. Kanade, and J. Cohn. Facial expression recognition. In Handbook of Face Recognition, pages 487–519. 2011.

[25] R. Timofte, V. De, and L. V. Gool. Anchored neighborhood regression for fast example-based super-resolution. ICCV, pages 1920–1927, 2013.

[26] J. Tropp and A. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. Information Theory, IEEE Transactions on, 53(12):4655–4666, 2007.

[27] X. Wang, T. Han, and S. Yan. An HOG-LBP human detector with partial occlusion handling. ICCV, pages 32–39, 2009.

[28] L. Xu and P. Mordohai. Automatic facial expression recognition using bags of motion words. In BMVC, pages 13.1–13.13, 2010.

[29] L. Yin, X. Wei, Y. Sun, J. Wang, and M. Rosato. A 3D facial expression database for facial behavior research. In Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference on, pages 211–216, 2006.

[30] Z. Zeng, M. Pantic, G. Roisman, and T. Huang. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. PAMI, 31(1):39–58, 2009.

[31] D. Zhang, M. Yang, and X. Feng. Sparse representation or collaborative representation: Which helps face recognition? ICCV, pages 471–478, 2011.

[32] W. Zheng, H. Tang, Z. Lin, and T. Huang. Emotion recognition from arbitrary view facial images. In ECCV, pages 490–503. 2010.