Published in IET Computer Vision
Received on 7th August 2008
Revised on 3rd January 2009
doi: 10.1049/iet-cvi.2008.0043
ISSN 1751-9632
© The Institution of Engineering and Technology 2009

Robust face recognition using posterior union model based neural networks

J. Lin 1,2, J. Ming 2, D. Crookes 2
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, People's Republic of China
2 School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT7 1NN, UK
E-mail: [email protected]

Abstract: Face recognition with unknown, partial distortion and occlusion is a practical problem with a wide range of applications, including security and multimedia information retrieval. The authors present a new approach to face recognition subject to unknown, partial distortion and occlusion. The new approach is based on a probabilistic decision-based neural network, enhanced by a statistical method called the posterior union model (PUM). PUM is an approach for ignoring severely mismatched local features and focusing the recognition mainly on the reliable local features, thereby improving robustness while assuming no prior information about the corruption. We call the new approach the posterior union decision-based neural network (PUDBNN). The new PUDBNN model has been evaluated on three face image databases (XM2VTS, AT&T and AR) using testing images subjected to various types of simulated and realistic partial distortion and occlusion. The new system has been compared with other approaches and has demonstrated improved performance.

1 Introduction

Human face recognition has been widely explored and applied in security, human-computer intelligent interaction, digital libraries and robotics.
There are many methods and techniques that have been applied to face recognition, including principal component analysis (PCA) [1–4], support vector machines (SVM) [5–8], linear discriminant analysis (LDA) [9–11] and neural networks [12–15]. However, most of the systems designed to date work mainly for images captured under controlled conditions. They usually lack robustness when dealing with images involving unexpected mismatches, for example in pose, scale, facial expression and illumination. They are also sensitive to partial distortion and occlusion. In this paper, we focus on the problem of improving robustness against local distortion and occlusion.

A number of techniques have been developed to deal with the problem of face recognition with partial occlusion and distortion. Many of these are based on the idea of 'recognition by parts', also called local matching techniques. These techniques aim to focus the recognition on the parts of the images not affected by the distortion/occlusion, thereby improving robustness. The techniques comprise two stages: (i) dividing a face image into several local parts and representing each part independently of the other parts, and (ii) combining the local matching scores from the individual parts into an overall score to reach a recognition decision. To combine the local matching scores into an overall decision, a common approach is to use a pre-defined voting space (e.g. [16–21]). This approach works well for matched training and testing, but less well in the presence of mismatches between the training and testing features. In a different approach, Martinez [22] presented a probabilistic method in which each partial image is modelled by a Gaussian mixture model (GMM), and the final decision is based on the sum of the local likelihoods from the individual GMMs.
Recently, this approach has been extended to include weights that de-emphasise those local features affected by facial expression changes, where the weights are estimated using an optical flow approach [23]. An alternative approach to that of Martinez, described in [24], uses a self-organising map instead of the GMM to
represent each partial image. More recently, Jongsun et al. [25] proposed a part-based local representation approach, namely locally salient ICA, which calculates robust features for important facial parts as a representation of the face. Su et al. [26] described the selection of discriminative Gabor Fisher patches and the linear combination of multiple classifiers on the selected features for face recognition. Additionally, Mittal and Sasi [27] presented a skin colour preprocessing approach to recover face images from distortion caused by beards. Heisele et al. [28] proposed a component-based method for face recognition with pose changes, which showed improved robustness over a global feature-based system. Other approaches also showing robustness to partial occlusion include the constellation models applied to object (e.g. car, tree, human face) detection (e.g. [29, 30]). In the constellation models, a prior probability for the absence of a local object is estimated during the training stage, and this is used in recognition to accommodate possible partial occlusion using a Bayesian approach.

In this paper, we present a new approach, namely the posterior union decision-based neural network (PUDBNN), as a complement to the above approaches for recognising face images with partial distortion and occlusion. We assume some common types of partial mismatch on the images, for example the addition of sunglasses/beard/scarf, or the blackening/whitening of a randomly selected area of varying size (to simulate occlusion). Furthermore, we assume no advance knowledge about the nature of the mismatch or the affected areas. Our new approach employs a statistical method, called the posterior union model (PUM), to deal with unknown local distortion/occlusion. This is embedded into a probabilistic decision-based neural network (PDBNN) to provide robust recognition. PUM is an approach for focusing the recognition on reliable features, thereby improving mismatch robustness, while assuming no prior information about the corruption. It has previously been applied to speech and speaker recognition to select reliable frequency bands, improving robustness to unknown band-selective noise (e.g. [31]). In the proposed approach, PUM is incorporated into a PDBNN as a hidden layer, to select reliable image parts and thus improve robustness to unknown local mismatches. PDBNN is a neural network with a modular and hierarchical architecture. It has the merits of both neural networks and statistical approaches, and has been used for both face detection and recognition [13]. The proposed combination of PUM and PDBNN, that is, PUDBNN, is a new way of implementing the 'recognition by parts' concept, based not on a voting space but on the probability theory for the union of random events, which is used as a model for the uncertainty of the reliable image parts.

The remainder of this paper is organised as follows. Section 2 introduces the face models used in the research and the theory of PDBNN. The proposed new approach, including PUM and its combination into PDBNN, is described in Sections 3 and 4. The results of experiments are presented in Section 5. This is followed by the conclusions in Section 6.

2 Face modelling and PDBNN

Assume that a person's face image can be divided into N local images (blocks), and that the features of each local image are found independently of the other local images. Let X = (x_1, x_2, ..., x_N) represent an entire face image, where x_n is the feature vector characterising the nth local image. Further, assume that X for the face class (i.e. person) ω can be modelled by a GMM, that is

$$p(X|\omega) = \sum_{r=1}^{R} p(r|\omega)\, p(X|\omega, r) \qquad (1)$$

where p(X|ω, r) is the rth Gaussian component for class ω, p(r|ω) is the prior probability (i.e. mixture weight) for component r, and R is the number of Gaussian components in the model.

In the PDBNN, each face class is assigned a subnet, which calculates the logarithm of p(X|ω). The discriminant function of the subnet for person ω can thus be written as follows [13]

$$f(X, \theta_\omega) = \ln p(X|\omega) \qquad (2)$$

where θ_ω denotes the parameter set of the subnet, which includes the mean vectors, covariance matrices and weights of the individual Gaussian components, and a confidence threshold.

The PDBNN approach, like other approaches that base recognition on the probability p(X|ω), lacks robustness to partial distortion and occlusion. To show this, we can write p(X|ω) as a product of the probabilities of the individual local features p(x_n|ω), assuming that the x_n are independent of one another. That is

$$p(X|\omega) = p(x_1, x_2, \ldots, x_N|\omega) = p(x_1|\omega)\, p(x_2|\omega) \cdots p(x_N|\omega) \qquad (3)$$

The probability p(X|ω) produced by (3) is typically dominated by small probabilities of local features, as a result of the product. This characteristic makes the model effective in discriminating between correct and incorrect classes based on local feature differences, but it also makes the model sensitive to partial distortion or occlusion. When the model is trained on clean face images and applied to a test image with some partial distortion or occlusion, the model probabilities p(x_n|ω) for the corrupted x_n may become small for the correct class, owing to the mismatch caused by the distortion or occlusion. When these small and random local feature probabilities become dominant, the model's ability to produce a high probability for the correct class is destroyed.
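To make this sensitivity concrete, the following toy sketch (our own invented numbers, with a single spherical Gaussian per block rather than the paper's trained mixtures) computes the log-likelihood margin of (3) between a correct class and a competing class, before and after zeroing four of the 16 blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 16, 4   # 16 local blocks, toy 4-dimensional features per block

# One spherical Gaussian per class per block: the product model (3) with R = 1.
means_correct = rng.normal(size=(N, D))
means_wrong = rng.normal(size=(N, D))

def log_product_score(X, means, sigma=1.0):
    """ln p(X|omega) under (3): a sum of per-block spherical-Gaussian log densities."""
    diff = X - means
    return float(np.sum(-0.5 * np.sum(diff**2, axis=1) / sigma**2
                        - 0.5 * D * np.log(2 * np.pi * sigma**2)))

# A clean test image drawn near the correct class.
X = means_correct + 0.1 * rng.normal(size=(N, D))
clean_margin = (log_product_score(X, means_correct)
                - log_product_score(X, means_wrong))

# Occlude 4 of the 16 blocks by zeroing their pixels/features,
# as in the paper's simulated occlusion.
X_occ = X.copy()
X_occ[:4] = 0.0
occ_margin = (log_product_score(X_occ, means_correct)
              - log_product_score(X_occ, means_wrong))

print(clean_margin, occ_margin)
```

The margin between the correct and competing class shrinks once the occluded blocks inject small, random likelihoods into the product, mirroring the behaviour in Fig. 1.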


As an example, Fig. 1 shows the effect of a partial occlusion on f(X, θ_ω), calculated based on (3), associated with the correct person for a task of recognising 20 people (classes). The model shown in the figure uses 16 local feature vectors to characterise an image, that is, X = (x_1, x_2, ..., x_16), where x_n is the feature vector for the nth local image (the model and features are discussed in full detail in Section 5). Fig. 1 shows two curves: f(X, θ_ω) for the correct class ω, and the difference between f(X, θ_ω) and f(X, θ_ω*) for the most competitive incorrect class ω*. Values are calculated both for clean images without corruption and for images with partial occlusion, the latter as a function of the number of occluded local images within the full 16 local images. The occlusion was simulated by setting all the pixels in the affected local areas to zero. The values are averaged over 20 test images in each case, with one test image per person. As indicated in Fig. 1, the partial occlusion leads to a lower value of f(X, θ_ω) for the correct person and, most importantly, to smaller likelihood margins between the correct person and the competing people. These reduced margins lead to a high recognition error rate.

The above problem can be resolved by removing the probabilities corresponding to the unreliable features from the product, leaving a marginal probability including only the reliable features in X for recognition. This is the idea implemented in our PUM, to be described below. PUM is an approach we use to identify the reliable features without assuming information about the corruption.

Figure 1 Effect of a partial occlusion on f(X, θ_ω): (a) log likelihoods; (b) differences with the most competitive person. Both are averaged over 20 persons, as a function of the number of occluded local image parts (including clean images without occlusion), showing that a partial occlusion reduces both the likelihoods of the correct person and the likelihood comparisons with the competitive persons.

3 Posterior union model

Assume an N-part face representation X = (x_1, x_2, ..., x_N), and assume that some of the local features x_n are corrupted, but that knowledge about the number and identities of the corrupted x_n is not available. Express the full feature set as two complementary subsets X = (X_I, X_J), where X_I, indexed by I ⊂ {1, 2, ..., N}, represents the feature set containing reliable local features, and X_J, indexed by J ⊂ {1, 2, ..., N} with J ∩ I = ∅, represents the feature set containing noisy and thus unreliable local features. We aim to base the recognition on X_I, thereby improving the robustness. Without assuming prior information about the corruption, this problem can be expressed as

$$[\hat{\omega}, \hat{X}_I] = \arg\max_{\omega,\; I \subset \{1,2,\ldots,N\}} p(\omega|X_I) \qquad (4)$$

The expression seeks to find the most likely class ω̂ by jointly maximising the posterior probability p(ω|X_I) over all classes ω and all possible local feature subsets X_I. Here X̂_I is the optimal feature subset, indexed by Î, found for the most likely class ω̂. Using Bayes' rule, the posterior probability p(ω|X_I) can be expressed as

$$p(\omega|X_I) = \frac{p(X_I|\omega)\, p(\omega)}{\sum_{\omega'} p(X_I|\omega')\, p(\omega')} \qquad (5)$$

where p(X_I|ω) is the marginal conditional probability of feature set X_I given class ω, that is

$$p(X_I|\omega) = \int p(X_I, X_J|\omega)\, \mathrm{d}X_J \qquad (6)$$

and p(ω) is a prior probability for ω. The summation in the denominator of (5) is over all classes. We can show that, for the correct class ω, the posterior probability p(ω|X_I) is likely to reach its maximum when the reliable subset X_I contains all the reliable local features. To show this, assume that X_I is a feature set consisting of reliable local features and that ω is the correct class, such that p(X_I|ω) ≥ p(X_I|ω') for any ω' ≠ ω. Express X_I as a union of two subsets, X_{I_1} and the complement X_{I_2}, and assume independence between the local features. We then have

$$\frac{p(X_I|\omega)}{p(X_I|\omega')} = \frac{p(X_{I_1}|\omega)\, p(X_{I_2}|\omega)}{p(X_{I_1}|\omega')\, p(X_{I_2}|\omega')} \ge \frac{p(X_{I_1}|\omega)}{p(X_{I_1}|\omega')} \qquad (7)$$

The inequality is obtained because p(X_{I_2}|ω)/p(X_{I_2}|ω') ≥ 1, given the assumption that the reliable feature subset X_{I_2} achieves the maximum likelihood over the correct class ω.


Rewrite the posterior probability (5) in terms of the likelihood ratios, that is

$$p(\omega|X_I) = \frac{p(\omega)}{p(\omega) + \sum_{\omega' \ne \omega} \bigl(p(X_I|\omega')/p(X_I|\omega)\bigr)\, p(\omega')} \qquad (8)$$

Substituting inequality (7) into (8), we obtain

$$p(\omega|X_I) \ge p(\omega|X_{I_1}) \qquad (9)$$

Equation (9) indicates that the posterior probability for the correct class increases when more reliable features are included in the computation. Thus, by maximising the posterior probability for each potential class we may find all the reliable features associated with the correct class. Because of this maximised model-feature match, the correct class is likely to have the maximum posterior probability among all the incorrect classes. This explains (4), which identifies the correct class by choosing the maximum among the posterior probabilities, each maximised over the feature subset for a potential class.
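A tiny numerical illustration of (9), with made-up per-feature likelihoods for three classes (these numbers are ours, not the paper's): adding a second reliable feature, on which the correct class again scores highest, raises the correct class's posterior computed via (5):

```python
import numpy as np

# Per-feature likelihoods p(x_n|omega): rows = classes, columns = local features.
# Both features are "reliable": the correct class (row 0) scores highest on each.
lik = np.array([
    [0.8, 0.7],   # correct class
    [0.3, 0.5],
    [0.2, 0.4],
])
prior = np.ones(3) / 3   # uniform class prior

def posterior(feature_idx, cls=0):
    """p(omega|X_I) via (5), assuming independent local features."""
    joint = prior * np.prod(lik[:, feature_idx], axis=1)
    return float(joint[cls] / joint.sum())

p_one = posterior([0])      # subset X_{I_1}: feature 0 only
p_two = posterior([0, 1])   # X_I: add the second reliable feature
print(p_one, p_two)
```

Including the extra reliable feature sharpens the posterior of the correct class, exactly the monotonicity that (9) asserts.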

Searching for the optimal subset of local features to maximise the posterior probability can be computationally expensive, of complexity O(2^N), for a system using a large number of local features N. This problem can be relieved by replacing the conditional probability p(X_I|ω), for the sought optimal subset X_I for any class ω, with the probability of the union of all feature subsets in X of the same size as X_I. To express this, we assume that there are Q local features in X_I and we indicate this explicitly by rewriting X_I as X_{I_Q}, where Q is the number of local features in X_{I_Q} and I_Q = (n_1, n_2, ..., n_Q) is the index set of these local features, with each n_i ∈ {1, 2, ..., N}. The probability of the union of all feature subsets X_{I_Q}, where I_Q ⊂ {1, 2, ..., N}, can be expressed as

$$p\Bigl(\bigcup_{I_Q \subset \{1,2,\ldots,N\}} X_{I_Q} \Big| \omega\Bigr) \propto \sum_{I_Q \subset \{1,2,\ldots,N\}} p(X_{I_Q}|\omega) = \sum_{n_1 n_2 \ldots n_Q} p(x_{n_1}|\omega)\, p(x_{n_2}|\omega) \cdots p(x_{n_Q}|\omega) \qquad (10)$$

where the last summation is over all possible combinations of n_1, n_2, ..., n_Q with each n_i ∈ {1, 2, ..., N}. In (10), the proportionality is due to ignoring the probabilities of the intersections between different X_{I_Q}, as they are small in comparison with the non-intersection terms; the last equation is obtained by assuming independence between the local features, with p(x_n|ω) being the conditional probability of the nth local feature associated with class ω. An example is provided in Appendix A.

Since (10) is a sum of the marginal conditional probabilities of all Q-sized local feature subsets, it contains the marginal conditional probability of the optimal subset X_{I_Q}. For ω the correct class and X_{I_Q} the feature set containing all the reliable features and no others, we can assume that the sum will be dominated by p(X_{I_Q}|ω), that is

$$p\Bigl(\bigcup_{I_Q \subset \{1,2,\ldots,N\}} X_{I_Q} \Big| \omega\Bigr) \propto p(X_{I_Q}|\omega) \qquad (11)$$

There are two reasons for this domination. First, because of the maximised model-feature match, p(X_{I_Q}|ω) reaches the maximum among all the X_{I_Q}. Second, every other X_{I_Q} contains at least one noisy local feature, with a correspondingly lower p(X_{I_Q}|ω), and hence makes only a small contribution to the sum; so p(X_{I_Q}|ω) dominates (see Appendix A for an example). Therefore the union probability (10) may be used in place of p(X_I|ω) for identifying ω, in the sense that both produce large values for the correct class. Before discussing the computational advantage of this approximation, we show an experiment justifying it. Fig. 2 shows a comparison between two sets of probabilities produced for recognising one person from 100 persons. One set of probabilities was generated based on p(X_I|ω), assuming that the optimal index set I is known a priori (this is an Oracle model, which will be discussed in detail in Section 5); the other set was generated from (10), assuming that I is not known but that the size of I, that is Q, is known, which is all that is needed to calculate the union probability. Each face image was represented using 16 local images, among which different numbers of local images were artificially occluded for the experiment. As shown in Fig. 2, the two sets of probabilities are clearly comparable for the correct class.

Note that the union probability is not a function of the identity of I but only of the size of I, that is, Q. With the approximation, therefore, we effectively reduce the problem of finding the set of optimal local features to that of finding the number of optimal local features, rather than the exact set.

While computing p(X_{I_Q}|ω) for all possible X_{I_Q}, for Q = 1 to N, involves 2^N possible combinations, computing the union probability (10), which concerns the sum of the probabilities of all these combinations, can be done using an efficient recursive algorithm involving much less computation. The algorithm is illustrated in Appendix B. With the approximation, problem (4) can thus be rewritten as

$$[\hat{\omega}, \hat{Q}] = \arg\max_{\omega,\; 1 \le Q \le N} p(\omega|X_{I_Q}) \qquad (12)$$
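The sum in (10) over all Q-subsets of products is an elementary symmetric polynomial in the local likelihoods, so it can be accumulated in O(NQ) operations instead of enumerating the C(N, Q) subsets. The paper's own Appendix B recursion is not reproduced here; the dynamic-programming sketch below is one standard way such a recursion can be realised, checked against brute-force enumeration:

```python
from itertools import combinations
from math import prod, isclose

def union_prob(local_liks, Q):
    """Union probability (10): the sum over all Q-subsets of products of the
    local likelihoods p(x_n|omega), accumulated via the recursion
    e_q <- e_q + p * e_{q-1} (an elementary symmetric polynomial)."""
    e = [0.0] * (Q + 1)
    e[0] = 1.0
    for p in local_liks:
        for q in range(Q, 0, -1):   # descending q: each likelihood used once
            e[q] += p * e[q - 1]
    return e[Q]

liks = [0.9, 0.2, 0.7, 0.05, 0.6]   # toy local likelihoods, N = 5
for Q in range(1, 6):
    brute = sum(prod(c) for c in combinations(liks, Q))
    assert isclose(union_prob(liks, Q), brute)

print(union_prob(liks, 1), union_prob(liks, 5))   # Q=1: sum; Q=N: product
```

Note the two extremes: Q = 1 returns the plain sum of the local likelihoods and Q = N their product, matching the GS-model and product-model special cases discussed after (11).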

where, by definition

$$p(\omega|X_{I_Q}) = \frac{p(X_{I_Q}|\omega)\, p(\omega)}{\sum_{\omega'} p(X_{I_Q}|\omega')\, p(\omega')} \propto \frac{p\bigl(\bigcup_{I_Q \subset \{1,\ldots,N\}} X_{I_Q}|\omega\bigr)\, p(\omega)}{\sum_{\omega'} p\bigl(\bigcup_{I_Q \subset \{1,\ldots,N\}} X_{I_Q}|\omega'\bigr)\, p(\omega')} \qquad (13)$$


Figure 2 Comparing the log probabilities between an Oracle model and a union approximation, for the correct class (No. 11) against 99 incorrect classes. From top to bottom, the face images included one to five occluded local images.

The last equation is obtained by replacing p(X_{I_Q}|ω) with the appropriate union probabilities to allow an efficient, approximate computation for optimising the feature selection. The above model, called the PUM, can be incorporated into a PDBNN to improve the robustness to unknown partial distortion/occlusion, as discussed below.

Note that the union approximation (11) lies between the product model (3) and the Gaussian sum (GS) model [22], and includes both as special cases. The product model is the special case of the union model with Q = N, giving an overall likelihood equal to the product of the individual feature likelihoods. The GS model is the special case with Q = 1, giving an overall likelihood equal to the sum of the individual feature likelihoods. The union model with 1 < Q < N gives an overall likelihood equal to the sum of products of Q individual feature likelihoods, where Q corresponds to the number of reliable local features. Combining reliable features using a product enables each feature to reinforce the others for discrimination. Previously, the union model has been applied to speech and speaker recognition; for more details, see [31–33].

4 Posterior union decision-based neural network

4.1 Model

The above PUM can be incorporated into a PDBNN, which uses a GMM to represent a face (i.e. (1)), by replacing each mixture Gaussian component with a posterior union probability. To obtain the new expression, we rewrite (1) as

$$p(X|\omega) = \frac{p(\omega|X)}{p(\omega)}\, p(X) = \frac{\sum_{r=1}^{R} p(\omega, r|X)}{p(\omega)}\, p(X) \qquad (14)$$

The last term in (14), p(X), is not a function of the class index and thus has no effect on recognition. Assuming an equal prior probability p(ω) for all classes, we obtain

$$p(X|\omega) \propto \sum_{r=1}^{R} p(\omega, r|X) \qquad (15)$$

where p(ω, r|X) is the posterior probability of class index ω and mixture index r given feature set X. Following (13), we can define a posterior union probability to replace each p(ω, r|X). The union probability is a function of the optimal number of local features Q. We write the new posterior as p(ω, r|X_{I_Q}) and, based on (13) and (10), and assuming that all classes have an equal prior probability, it can be expressed as in (16) (shown at the bottom of the page),

where p(x_n|ω, r) is the Gaussian distribution of the nth local feature associated with class index ω and mixture index r, and p(r|ω) is the corresponding mixture weight. Substituting p(ω, r|X_{I_Q}) into (15) as the mixture component, we obtain a mixture model as a function of the optimal number of local features, which can be expressed as

$$p(X|\omega, Q) \propto \sum_{r=1}^{R} p(\omega, r|X_{I_Q}) \qquad (17)$$

The most likely class ω̂ can be estimated by jointly maximising p(X|ω, Q) over all classes and all possible Q, that is

$$[\hat{\omega}, \hat{Q}] = \arg\max_{\omega, Q} p(X|\omega, Q) \qquad (18)$$

The new model (18) implements a joint optimisation overthe feature estimation and class identification. It improvesover the conventional GMM, or the product model, byfocusing the recognition on optimal local features subject toa maximum probability criterion. Ignoring the local featuresproducing small probabilities effectively reduces the effectof partial distortion or occlusion on the recognition.Implementing the model using the union probabilityformulation reduces the problem of finding the set ofoptimal local features to finding the number of optimallocal features. This makes the new model computationallytractable and efficient. Note that the new model does notassume any prior knowledge about the corruption.
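A toy sketch of the joint search in (12) and (18), with invented likelihoods and a single Gaussian component per class rather than the full PUDBNN mixture: one block of the test image is occluded, so the product model (3) picks the wrong class, while maximising the posterior union score jointly over (ω, Q) recovers the correct one:

```python
from itertools import combinations
from math import prod

# Toy per-block likelihoods p(x_n|omega) for two classes, N = 4 blocks.
# Block 3 of the test image is occluded: the correct class (index 0)
# scores almost zero there despite matching well on blocks 0-2.
liks = {
    0: [0.9, 0.9, 0.9, 0.001],    # correct class
    1: [0.25, 0.25, 0.25, 0.3],   # competing class
}
N = 4

def union_score(p, Q):
    """Union probability (10): sum over all Q-subsets of products."""
    return sum(prod(c) for c in combinations(p, Q))

# Joint maximisation over (omega, Q) of the posterior (13), uniform prior.
best = max(
    ((w, Q, union_score(liks[w], Q) / sum(union_score(liks[v], Q) for v in liks))
     for w in liks for Q in range(1, N + 1)),
    key=lambda t: t[2],
)

# The plain product model (3) for comparison: all N blocks always used.
product_winner = max(liks, key=lambda w: prod(liks[w]))

print(best, product_winner)
```

With these numbers the product model is dragged down by the occluded block and decides for class 1, whereas the joint search settles on the correct class with Q = 3, i.e. it discards exactly the corrupted block.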

On the basis of the new model (18), we define a new discriminant function for class ω

$$g(X, Q, \theta_\omega) = \ln p(X|\omega, Q) \qquad (19)$$

where θ_ω denotes the parameter set for class ω, including the mean vectors and covariance matrices of the mixture Gaussian distributions p(x_n|ω, r) of the individual local features x_n, and the weights p(r|ω). A neural subnet is implemented to calculate g(X, Q, θ_ω) for each class ω, and a decision-based network combining the subnets of all the classes is implemented for recognition. We call the new network PUDBNN. Fig. 3 shows the structure of the proposed PUDBNN, in which G(X, Q, ω, r) is a shorthand expression for the following union-based probability

$$G(X, Q, \omega, r) = \sum_{n_1 n_2 \ldots n_Q} p(x_{n_1}|\omega, r)\, p(x_{n_2}|\omega, r) \cdots p(x_{n_Q}|\omega, r) \qquad (20)$$

This can be calculated using the recursive algorithm described in Appendix B.

4.2 Learning algorithms for PUDBNN

The PUDBNN can be trained similarly to the PDBNN, using a scheme with two steps: (i) locally unsupervised learning, and (ii) globally supervised learning. During the locally unsupervised training phase, each subnet is trained individually using the training data for its class.

Figure 3 Schematic structure of the proposed PUDBNN: (a) the network computing a single G(X, Q, ω, r); (b) computing the PUM.

$$p(\omega, r|X_{I_Q}) \propto \frac{p\bigl(\bigcup_{I_Q \subset \{1,\ldots,N\}} X_{I_Q}|\omega, r\bigr)\, p(r|\omega)}{\sum_{\omega'} \sum_{r'=1}^{R} p\bigl(\bigcup_{I_Q \subset \{1,\ldots,N\}} X_{I_Q}|\omega', r'\bigr)\, p(r'|\omega')} \simeq \frac{\sum_{n_1 n_2 \ldots n_Q} p(x_{n_1}|\omega, r)\, p(x_{n_2}|\omega, r) \cdots p(x_{n_Q}|\omega, r)\, p(r|\omega)}{\sum_{\omega'} \sum_{r'=1}^{R} \sum_{n_1 n_2 \ldots n_Q} p(x_{n_1}|\omega', r')\, p(x_{n_2}|\omega', r') \cdots p(x_{n_Q}|\omega', r')\, p(r'|\omega')} \qquad (16)$$


Since the posterior union probabilities (16) are formed from the union-based probabilities (20), in the training stage we only estimate G(X, Q, ω, r). Further, we assume that clean face image data are used in training. Since there is no feature corruption for clean training data, all local features are used in the computation, that is, Q = N. Hence G(X, Q, ω, r) reduces to a conventional Gaussian component p(X|ω, r) as in (1), which uses the full set of features (i.e. the product model (3)). Thus the locally unsupervised training for the PUDBNN can be implemented using the same techniques as for the traditional PDBNN. In our system, this is achieved using an EM algorithm.

Following the locally unsupervised training, globally supervised training is conducted using a decision-based learning rule, which reinforces or anti-reinforces the decision boundaries obtained by the locally unsupervised training. When a training face X belonging to class ω is misclassified to class c, the following reinforced and anti-reinforced learning procedures are applied [12]

$$\theta_\omega^{(j+1)} = \theta_\omega^{(j)} + \eta \nabla g(X, Q = N, \theta_\omega^{(j)}), \qquad \theta_c^{(j+1)} = \theta_c^{(j)} - \eta \nabla g(X, Q = N, \theta_c^{(j)}) \qquad (21)$$

where θ_ω^{(j)} represents an estimate of class ω's parameter set after j iterations, g(X, Q = N, θ_ω^{(j)}) is the subnet discriminant function for class ω as defined in (19), with Q = N meaning that all the local image features are used in the computation for clean image training, ∇ is the gradient operator and η is a small positive constant determining the adaptation step size. In (21), the first equation adapts the parameter set of the correct class ω to increase its discriminant function value, while the second adapts the parameter set of the incorrect class c to decrease its discriminant function value, for the given training image X.
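A minimal sketch of one step of (21), assuming a drastically simplified subnet with a single identity-covariance Gaussian per class, so that ∇_μ g reduces to X − μ (covariances, mixture weights and the confidence threshold of the real PUDBNN are omitted):

```python
import numpy as np

eta = 0.1                          # learning rate (eta in (21))
X = np.array([1.0, 2.0])           # training image features, true class 0
mu = {0: np.array([0.0, 0.0]),     # correct class, currently losing
      1: np.array([1.0, 1.5])}     # class that X was misclassified to

d_before = {w: float(np.linalg.norm(X - mu[w])) for w in mu}

# For g = ln N(X; mu, I), grad_mu g = X - mu, so (21) becomes:
mu[0] = mu[0] + eta * (X - mu[0])  # reinforce the correct class
mu[1] = mu[1] - eta * (X - mu[1])  # anti-reinforce the misrecognised class

d_after = {w: float(np.linalg.norm(X - mu[w])) for w in mu}
print(d_before, d_after)
```

One update moves the correct class's mean towards X (raising its discriminant value) and the incorrect class's mean away from X (lowering its value), which is exactly the boundary-reshaping effect described above.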

In the above globally supervised training with all local features used in the computation, the subnet discriminant function g(X, Q = N, θ_ω) for the new PUDBNN can be written as

$$g(X, Q = N, \theta_\omega) = \ln p(X|\omega, Q = N) \propto \ln \frac{p(X|\omega)\, p(\omega)}{p(X)} = \ln \frac{p(\omega)}{p(X)} + \ln p(X|\omega) \qquad (22)$$

Assuming a uniform prior p(ω), it can be seen that the derivatives of g(X, Q = N, θ_ω) with respect to θ_ω equal the corresponding derivatives of f(X, θ_ω), the subnet discriminant function of the PDBNN defined in (2). So the same gradient ascent algorithm used for the conventional PDBNN can be used in the globally supervised training to learn the parameters of the new PUDBNN. Building a PUDBNN recognition system is thus as efficient as building a normal PDBNN system. The difference between the two approaches lies in the recognition stage, where the PDBNN system uses the full set of local features while the PUDBNN system uses optimally selected features.

5 Experiments

In this section, we describe the experiments conducted to evaluate the effectiveness of the new PUDBNN system in dealing with partial distortion and occlusion. We considered a variety of commonly encountered partial mismatches on the images, for example the addition of sunglasses/beard/scarf, and the blackening/whitening of a randomly selected area of varying size. We validated the new model using different databases. Furthermore, we used both simulated and realistically corrupted images. Controlled simulation was used to supplement realistic data, allowing the new model to be tested under a variety of conditions.

We used three facial image databases in the experiments: XM2VTS [34], AT&T [35] and AR [36]. In addition to our new PUDBNN system, we also implemented four systems for comparison. They are as follows:

1. A PDBNN-based system [13], built on a product model (i.e. (3)) that uses the full set of local features for recognition. As described in (11), this model is included as a special case in the new PUDBNN model when Q = N.

2. A GS model [22] for dealing with partial mismatches, which builds a GMM for each local feature and bases the recognition on the sum of the individual GMM scores. As described in (11), this model is included as a special case in the new PUDBNN model when Q = 1.

3. An oracle model, which assumes full a priori knowledge about the corrupted local features and manually removes these features from recognition. We tried to implement other robust methods but found it difficult to achieve a full optimisation; the oracle model was therefore used instead, to serve as an 'ideal' recognition-by-parts model for comparison.

4. Finally, we implemented a global PCA-based system, serving as an example of the effect of partial mismatch on global feature-based recognition. The PCA coefficients were calculated using the algorithm described in [1], and recognition was performed using an associative memory model [37].

In our experiments, we dealt with gray-scale images. As preprocessing, we localised the face within each image and resized each face image to 100 × 100 pixels. Then we calculated the local image features used by our PUDBNN system for recognition. The local features were obtained by first performing a 2-level db4 wavelet transform on each face image, retaining only the lowest-resolution sub-band with a size of 28 × 28 (with a boundary extension), and then dividing it uniformly into 16 non-overlapping local 7 × 7 transform 'images'. Each local image thus contains a feature vector of 49 coefficients. The combination of all the 16 local feature vectors forms the full feature set X = (x1, x2, ..., xN), where N = 16, for a face image. All the systems implemented in the paper for comparison used the same local feature set, except the PCA system, which used a 160-dimensional global feature vector to represent each face image. Gaussian density functions with diagonal covariances were used in the PDBNN, PUDBNN, GS and oracle models.
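The feature extraction pipeline described above can be sketched as follows. This is a hedged stand-in: the paper's 2-level db4 wavelet transform is approximated here by two levels of 2 × 2 block averaging (a crude low-pass and decimate), and the resulting 25 × 25 array is padded to 28 × 28 so that it divides evenly into 16 non-overlapping 7 × 7 blocks; all function names are illustrative.

```python
import numpy as np

# Sketch of the local-feature extraction pipeline. The paper uses a
# 2-level db4 wavelet transform and keeps the lowest sub-band (28 x 28
# after boundary extension); as a stand-in, two levels of 2 x 2 block
# averaging approximate the low-pass sub-band, and the 25 x 25 result
# is edge-padded to 28 x 28 to divide into 16 local 7 x 7 'images'.

def lowpass_level(img):
    """One level of 2 x 2 block averaging (crude low-pass + decimate)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def extract_local_features(face, n_blocks=4, block=7):
    """Return the N = 16 local feature vectors (49 coefficients each)."""
    sub = lowpass_level(lowpass_level(face))      # 100x100 -> 25x25
    target = n_blocks * block                     # 28
    pad = target - sub.shape[0]
    sub = np.pad(sub, ((0, pad), (0, pad)), mode='edge')  # boundary extension
    feats = [sub[i*block:(i+1)*block, j*block:(j+1)*block].ravel()
             for i in range(n_blocks) for j in range(n_blocks)]
    return feats  # X = (x_1, ..., x_16)
```

With a proper wavelet library, `lowpass_level` would be replaced by the db4 decomposition; the blocking step is unchanged.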

5.1 Experiments on two databases XM2VTS and AT&T with simulated corruption

First, experiments were conducted on two databases with simulated corruption. The first database was the XM2VTS facial database. The database contains 295 persons of different races, genders and ages, with four different images of each person. There are variations in facial expression, such as open/closed eyes and smiling/non-smiling, and in facial details, such as glasses/no glasses. All of the images were taken against a homogeneous illumination and background. Fig. 4 shows examples of the face images used in the experiments. We ran four recognition experiments on the database, each experiment including 100 persons selected randomly from the database. Of the four images for each person, three were used to train the model (Fig. 4a), and the remaining one was used to create two testing sets, one set simulating partial distortion and the second set simulating partial occlusion. Partial distortion was simulated by adding sunglasses, a beard (for males) or a scarf (for females), and their combination, respectively, to the original image. This testing set also included the original clean image, and thus contained four testing conditions (Figs. 4b–4e). Partial occlusion was simulated by setting all the pixels of a randomly selected square of size k × k pixels to 0 and 255, respectively. We tested values of k from a minimum of 10 to a maximum of 50, in steps of 5. This gave a total of 18 testing conditions, with examples shown in Figs. 4f and 4g.
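The blackening/whitening occlusion simulation described above can be sketched as follows; the function and parameter names are illustrative.

```python
import numpy as np

# Sketch of the occlusion simulation: a randomly placed k x k square of
# the test image is set to 0 (blackening) or 255 (whitening).

def occlude(face, k, value, rng):
    """Return a copy of `face` with a random k x k square set to `value`."""
    h, w = face.shape
    top = rng.integers(0, h - k + 1)
    left = rng.integers(0, w - k + 1)
    out = face.copy()
    out[top:top + k, left:left + k] = value
    return out

# The 18 testing conditions: k = 10, 15, ..., 50, each with value 0 and 255.
conditions = [(k, v) for k in range(10, 51, 5) for v in (0, 255)]
```

Nine square sizes times two fill values gives the 18 testing conditions mentioned in the text.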

Table 1 shows the recognition accuracy rates obtained by the various systems on the XM2VTS database, with clean and partially distorted images. The rates are averaged over the four experiments, each involving 100 persons, as described above. Table 1 indicates that, on the clean images, all the systems achieved similar recognition accuracy except the GS model, which performed less well. The reason is that the sum operation in the GS model effectively averages the ability of each local feature to discriminate between correct and incorrect classes, unlike the PDBNN and PUDBNN models, which use a product combination in which each local feature reinforces the others. On the distorted images, the new PUDBNN outperformed all the other systems, especially for the beard/scarf and combined beard/scarf-sunglasses distortions. The PDBNN system used the full set of features in a product model, and its performance was thus impaired by the distorted features. The GS model showed robustness to the distortions, because the noisy local features tended to be ignored in the sum operation due to their low probabilities. However, unlike the new PUDBNN, the GS model averages the contribution of each remaining clean feature, which explains its poorer performance relative to PUDBNN in both clean and noisy testing conditions. The PCA system suffered because the local distortions were spread over the entire feature space. The PUDBNN model also showed the possibility of outperforming the 'binary' oracle model implemented in the experiments, because some local images were only partially affected by the distortion.

Table 1 Recognition accuracy rates (%) for partially distorted images on the XM2VTS database, by the new PUDBNN system, compared to the PDBNN, GS, Oracle and PCA systems

Distortion type   PUDBNN   PDBNN   GS     Oracle   PCA
clean             96.5     95.0    83.5   95.0     93.5
sunglasses        94.2     92.5    80.2   93.5     66.2
beard/scarf       90.0     20.5    78.2   88.2     61.0
combined          89.5     16.7    72.5   85.3     50.5
average           92.5     56.1    78.6   90.5     67.8

Figure 4 Examples of the face images used in the experiments
a Clean training images
b Clean testing images
c Testing images with partial distortion by sunglasses
d Beard/scarf
e Combination
f–g Testing images with partial occlusion

We tried to maximise the oracle model's performance by alternating between the selection and deselection of each partially distorted local image; the performance shown in Table 1 is the best we obtained. Throwing away some partially distorted local features improved the performance, but it also caused a loss of useful information. So a 'soft' rather than a binary decision is preferred when deciding whether to include or exclude a particular local image. The union model (10) provides such a soft-decision mechanism: while the noisy local images with small probabilities can be largely ignored, they are not physically removed from recognition. As such, each local image retains a contribution to recognition, proportional to its probability value.
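The contrast between the product, sum and union combination rules can be illustrated numerically. In this hedged sketch, one of three local features is assumed noisy (a very low likelihood under the correct class, with values invented for the demonstration): the full product collapses, the plain sum averages away the discrimination, and the Q = 2 union score, approximated by the sum of the products of all two-feature combinations as in (10), remains dominated by the clean pair.

```python
from itertools import combinations
from math import prod

# Illustrative local-feature likelihoods under the correct class:
# x1 is corrupted (tiny likelihood), x2 and x3 are clean.
p = {'x1': 1e-6, 'x2': 0.9, 'x3': 0.8}

product_score = prod(p.values())   # PDBNN (Q = N): collapses to ~7.2e-07
sum_score = sum(p.values())        # GS (Q = 1): averages all features

# Union (Q = 2): sum of the products of all 2-feature combinations,
# ignoring the intersection terms, as in (10).
union_score = sum(prod(p[k] for k in c) for c in combinations(p, 2))
# union_score is close to p(x2)p(x3) = 0.72, the clean pair.
```

The union score tracks the clean pair almost exactly, which is the soft-decision behaviour discussed above.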

Fig. 5 shows the accuracy rates for the four systems, PUDBNN, PDBNN, GS and PCA, tested with the images with partial blackening/whitening occlusion, as described above, on the XM2VTS database. The performances are shown as a function of the size of the occluded areas. We observed a similar performance improvement for the new PUDBNN model over the other systems.

Further experiments were conducted on a second database, the AT&T database. The database contains 40 persons, each with 10 face images. We randomly selected five images of each person to train the model, and used the remaining five images of each person for testing. As in the above experiments with the XM2VTS database, we added sunglasses, beard/scarf, and their combination, respectively, to the test images in the AT&T database to simulate the effect of partial distortion, and set the pixels of a randomly selected square of size from 10 × 10 up to 50 × 50 pixels on the test images to 0 and 255, respectively, to simulate the effect of partial occlusion. The recognition results are summarised in Table 2 and Fig. 6. Table 2 shows the recognition accuracy averaged over the four testing conditions, one without distortion and three with partial distortions (sunglasses, beard/scarf and their combination). Fig. 6 shows the recognition accuracy with the test images subject to partial blackening/whitening occlusion. The results on the AT&T database further demonstrate the improved robustness of the new PUDBNN model over the other models. These results extend our previous preliminary studies reported in [38].

Figure 5 Recognition accuracy for partially occluded images on the XM2VTS database, as a function of the size k of the occluded area (k × k pixels), by the new PUDBNN system, compared to the PDBNN, GS and PCA systems

Table 2 Recognition accuracy rates (%) for partially distorted images on the AT&T database, averaged over four testing conditions, one without distortion and three with partial distortions

Distortion type   PUDBNN   PDBNN   GS     PCA
average           84.5     55.2    56.7   53.5

Figure 6 Recognition accuracy for partially occluded images on the AT&T database, as a function of the size k of the occluded area (k × k pixels)

Figure 7 Comparing the new PUDBNN model and an oracle model for recognising partially occluded images, showing the accuracy as a function of the number of corrupted local image parts
a XM2VTS database
b AT&T database

Figure 8 Examples of the images in the AR database used in the experiments
a Training
b Testing

Finally, a further comparison was performed between the new PUDBNN model and the oracle model, using testing images with partial blackening/whitening occlusion. This oracle model experiment is unlike the previous one described in Table 1 for the images shown in Fig. 4, in which a soft decision may be needed for local images only partially affected by the corruption. In this experiment, we first divided each original test image uniformly into 16 non-overlapping local images (each of size 25 × 25), and then simulated the occlusion by setting all the pixels of a number of randomly selected local images to 0 and 255, respectively. Thus, a local image is either completely clean or completely corrupted, and a binary oracle model is therefore close to an ideal model. The number of corrupted local images ranged from one to six out of the total 16 local images of each face. In recognition, the PUDBNN model assumed no prior knowledge about the corruption, while the oracle model assumed exact knowledge about the number and locations of the corrupted areas and removed these from the computation. The comparison results are shown in Fig. 7 for both the XM2VTS and AT&T databases. We see that the PUDBNN model was able to perform as well as the oracle model until six local images had been corrupted. The PUDBNN performed slightly better than the oracle model in some cases (for example, with up to four corrupted parts on the XM2VTS database, and with one, four and five corrupted parts on the AT&T database). This is because, in addition to the corruption, there also exist mismatches in facial expression between the training and testing images. Selecting optimal features in the PUDBNN model reduced the influence on the recognition not only of the corruption, but also of these additional mismatches.

Table 3 Recognition accuracy rates (%) on the AR database

PUDBNN   PDBNN   GS     PCA
67.7     36.2    51.7   51.3
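The block-level corruption and the oracle scoring used in this comparison can be sketched as follows. This is an illustrative sketch: the image is divided into a 4 × 4 grid of 25 × 25 local images, j of which are wholly set to 0 or 255, and the oracle simply drops the known-corrupted blocks from the summed local scores; the function names and the scoring stub are assumptions.

```python
import numpy as np

# Block-level corruption: set j randomly chosen 25 x 25 local images of
# a 100 x 100 face to a constant value (0 or 255). The oracle knows
# which blocks are corrupted and excludes them from the summed score.

def corrupt_blocks(face, j, value, rng, grid=4, block=25):
    """Set j randomly chosen local images of `face` to `value`."""
    out = face.copy()
    chosen = rng.choice(grid * grid, size=j, replace=False)
    for idx in chosen:
        r, c = divmod(int(idx), grid)
        out[r*block:(r+1)*block, c*block:(c+1)*block] = value
    return out, set(int(i) for i in chosen)

def oracle_score(local_log_likelihoods, corrupted):
    """Sum the local scores, excluding the known-corrupted blocks."""
    return sum(s for n, s in enumerate(local_log_likelihoods)
               if n not in corrupted)
```

The PUDBNN, by contrast, receives no `corrupted` set and must select the reliable blocks itself via the union model.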

5.2 Experiments on the AR database with realistic corruption

Further experiments were conducted to evaluate the new PUDBNN model using the AR database, which contains realistic corruptions. The data set used in our experiments contains 900 frontal facial images from 100 subjects (nine images per subject). In addition to containing local distortions and occlusions, these images also exhibit different illumination conditions, scales and degrees of rotation. In our experiments, we used the three images without occlusion to train a model for each person and the remaining six images with occlusions of that person for testing. Fig. 8 shows examples of the face images for two people used in the training and testing.

The recognition results are presented in Table 3. Again, the new PUDBNN system offered improved robustness over the other systems. The improvement appears to be smaller than the improvements on the previous two databases. This is mainly due to the additional variations in the database, as mentioned above, which cause training and testing mismatches that are not necessarily local.

6 Conclusions

Partial distortion and occlusion in face images can cause serious problems for conventional face recognition algorithms, as shown in the paper. In this paper, we described a new approach, namely the PUDBNN, to address the problem. The new model uses a posterior union approach to select optimal local features for the recognition, which improves the mismatch robustness while assuming no prior information about the distortion. The new model has been tested on three databases, XM2VTS, AT&T and AR, involving various types of simulated and realistic partial distortion and occlusion. The experimental results have demonstrated improved performance for the new model on all three databases. Our future work will focus on extending the capability of the new model to deal with other types of image distortion, for example, mismatched illumination, scale or rotation. These mismatches are not necessarily local and may be addressed by combining the new model with existing techniques. Conventional compensation/normalisation techniques may be used to reduce the global mismatch, while the new PUDBNN approach may be used to deal with the remaining partial mismatches due to inaccurate compensation or normalisation. Furthermore, the PUDBNN, like other GMM-based statistical methods, cannot accurately model classes given only a single or a small number of training samples. In our further work, we will study the application of the union principle to other models capable of performing recognition based on a small number of training samples.

7 References

[1] TURK M.A., PENTLAND A.P.: 'Eigenfaces for recognition', J. Cogn. Neurosci., 1991, 3, pp. 71–86

[2] BELHUMEUR P.N., HESPANHA J.P., KRIEGMAN D.J.: 'Eigenfaces vs. fisherfaces: recognition using class specific linear projection', IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, pp. 711–720

[3] RAJAGOPALAN A.N., CHELLAPPA R., KOTERBA N.T.: 'Background learning for robust face recognition with PCA in the presence of clutter', IEEE Trans. Image Process., 2005, 14, pp. 832–843

[4] GUDUR N., ASARI V.: 'Gabor wavelet based modular PCA approach for expression and illumination invariant face recognition'. The 35th Applied Image Pattern Recognition Workshop (AIPR 2006), Washington, DC, USA, 2006, pp. 13–18

[5] GUO G., LI S.Z., CHAN K.: 'Face recognition by support vector machines'. IEEE International Conference on Automatic Face and Gesture Recognition (FG'2000), Grenoble, France, 2000, pp. 196–201

[6] HSU C.W., LIN C.J.: 'A comparison of methods for multi-class support vector machines', IEEE Trans. Neural Netw., 2002, 13, pp. 415–425

[7] LIU Y.H., CHEN Y.T.: 'Face recognition using total margin-based adaptive fuzzy support vector machines', IEEE Trans. Neural Netw., 2007, 18, pp. 178–192

[8] KOTSIA I., PITAS I.: 'Facial expression recognition in image sequences using geometric deformation features and support vector machines', IEEE Trans. Image Process., 2007, 16, pp. 172–187

[9] SWETS D.L., WENG J.: 'Using discriminant eigenfeatures for image retrieval', IEEE Trans. Pattern Anal. Mach. Intell., 1996, 18, pp. 831–836

[10] XIANG C., FAN X.A., LEE T.H.: 'Face recognition using recursive fisher linear discriminant', IEEE Trans. Image Process., 2006, 15, pp. 2097–2105

[11] MARIOS K.: 'Weighted piecewise LDA for solving the small sample size problem in face verification', IEEE Trans. Neural Netw., 2007, 18, pp. 506–519

[12] KUNG S.Y., TAUR J.S.: 'Decision-based neural networks with signal/image classification applications', IEEE Trans. Neural Netw., 1995, 6, pp. 170–181

[13] LIN S.H., KUNG S.Y.: 'Face recognition/detection by probabilistic decision-based neural network', IEEE Trans. Neural Netw., 1997, 8, pp. 114–132

[14] ZHANG H.H., ZHANG B.L., HUANG W.M., TIAN Q.: 'Gabor wavelet associative memory for face recognition', IEEE Trans. Neural Netw., 2005, 16, pp. 275–278

[15] LU J.M., YUAN X., YAHAGI T.: 'A method of face recognition based on fuzzy c-means clustering and associated sub-NNs', IEEE Trans. Neural Netw., 2007, 18, pp. 150–160

[16] BRUNELLI R., FALAVIGNA D.: 'Person recognition using multiple cues', IEEE Trans. Pattern Anal. Mach. Intell., 1995, 17, pp. 955–966

[17] OHBA K., IKEUCHI K.: 'Detectability, uniqueness, and reliability of eigen windows for stable verification of partially occluded objects', IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, pp. 1043–1048

[18] HUANG C.Y., CAMPS O.I., KANUNGO T.: 'Object recognition using appearance-based parts and relations'. IEEE Conference on Computer Vision and Pattern Recognition (CVPR'97), Puerto Rico, USA, 1997, pp. 878–884

[19] MOGHADDAM B., PENTLAND A.: 'Probabilistic visual learning for object representation', IEEE Trans. Pattern Anal. Mach. Intell., 1997, 19, pp. 696–710

[20] HOTTA K.: 'View independent face recognition based on kernel principal component analysis of local parts'. IEEE International Conference on Image Processing (ICIP'05), Genova, Italy, 2005, pp. 760–763

[21] ZHANG W.C., SHAN S.G., CHEN X.L., GAO W.: 'Local Gabor binary patterns based on Kullback–Leibler divergence for partially occluded face recognition', IEEE Signal Process. Lett., 2007, 14, pp. 875–878

[22] MARTINEZ A.M.: 'Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class', IEEE Trans. Pattern Anal. Mach. Intell., 2002, 24, pp. 748–763

[23] ZHANG Y., MARTINEZ A.M.: 'A weighted probabilistic approach to face recognition from multiple images and video sequences', Image Vis. Comput., 2006, 24, pp. 626–638

[24] TAN X.Y., CHEN S.C., ZHOU Z.H., ZHANG F.Y.: 'Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft K-NN ensemble', IEEE Trans. Neural Netw., 2005, 16, pp. 875–886

[25] KIM J., CHOI J., YI J., TURK M.: 'Effective representation using ICA for face recognition robust to local distortion and partial occlusion', IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27, pp. 1977–1981

[26] SU Y., SHAN S.G., CHEN X., GAO W.: 'Patch-based Gabor fisher classifier for face recognition'. International Conference on Pattern Recognition (ICPR'2006), Hong Kong, China, 2006, pp. 528–531

[27] MITTAL G., SASI S.: 'Robust preprocessing algorithm for face recognition'. 3rd Canadian Conference on Computer and Robot Vision (CRV'2006), Quebec, Canada, 2006, p. 57

[28] HEISELE B., HO P., WU J., POGGIO T.: 'Face recognition: component-based versus global approaches', Comput. Vis. Image Underst., 2003, 91, pp. 6–12

[29] WEBER M., WELLING M., PERONA P.: 'Unsupervised learning of models for recognition'. European Conference on Computer Vision (ECCV'2000), Dublin, Ireland, 2000, p. 7

[30] LI F.F., FERGUS R., PERONA P.: 'Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories'. 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Washington, DC, USA, 2004, p. 178

[31] MING J., LIN J., SMITH F.J.: 'A posterior union model with applications to robust speech and speaker recognition', EURASIP J. Appl. Signal Process., 2006, Article ID 75390

[32] MING J., JANCOVIC P., SMITH F.J.: 'Robust speech recognition using probabilistic union models', IEEE Trans. Speech Audio Process., 2002, 10, pp. 403–414

[33] MING J., SMITH F.J.: 'Speech recognition with unknown partial feature corruption – a review of the union model', Comput. Speech Lang., 2003, 17, pp. 287–305

[34] MESSER K., MATAS J., KITTLER J., LUETTIN J., MAITRE G.: 'XM2VTSDB: the extended M2VTS database'. International Conference on Audio- and Video-Based Biometric Person Authentication, Washington, DC, USA, 1999, pp. 72–77

[35] AT&T Laboratories Cambridge Facial Database, http://www.uk.research.att.com/facedatabase.html, accessed October 2007

[36] MARTINEZ A.M., BENAVENTE R.: 'The AR face database'. CVC Technical Report 24, June 1998

[37] VALENTIN D., ABDI H., EDELMAN B.: 'What represents a face? A computational approach for the integration of physiological and psychological data', Perception, 1997, 26, pp. 1271–1288

[38] LIN J., MING J., CROOKES D.: 'A probabilistic union approach to robust face recognition with partial distortion and occlusion'. International Conference on Acoustics, Speech, and Signal Processing (ICASSP'2008), Las Vegas, USA, 2008, pp. 993–996

8 Appendix A: an example of the probability of union

Consider an example in which X is a three-component feature set X = (x1, x2, x3) and X_{I_2} is a subset containing two feature components (i.e. Q = 2). The probability of the union of all X_{I_2} is given by (we have omitted the class index for simplicity)

p(x1x2 ∪ x1x3 ∪ x2x3) = p(x1x2) + p(x1x3) + p(x2x3)
    − p(x1x2 ∩ x1x3) − p(x1x2 ∩ x2x3) − p(x1x3 ∩ x2x3)
    + p(x1x2 ∩ x1x3 ∩ x2x3)

which equals the sum of the marginal probabilities of the individual X_{I_2} after ignoring the probabilities of their intersections, that is, (10).

Suppose that in X there is one component (say x1) that is noisy, but the identity of the noisy component is not known. Since the above union probability includes all marginal probabilities of two components, it includes p(x2x3) of the two clean components, which should dominate the sum for the correct class, because the other terms in the sum will each be affected by the noisy x1 and thus produce a correspondingly low value. In other words,

p(x1x2 ∪ x1x3 ∪ x2x3) ∝ p(x2x3)

that is, (11). Therefore, given no information about the identity of the noisy component, we may use the union probability p(x1x2 ∪ x1x3 ∪ x2x3) as an approximation to the marginal probability of the true clean components p(x2x3), in the sense that both produce large values for the correct class.
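The example above can be checked numerically under an independence assumption (so that p(x_i x_j) = p(x_i)p(x_j) and the intersection terms all reduce to p(x1)p(x2)p(x3)); the likelihood values here are invented for the demonstration.

```python
# Numerical check of the Appendix A example under an independence
# assumption, with illustrative likelihoods: x1 noisy, x2 and x3 clean.
p1, p2, p3 = 1e-6, 0.7, 0.6

# Full inclusion-exclusion union probability of the three 2-feature subsets.
union = (p1*p2 + p1*p3 + p2*p3           # marginals
         - 3 * p1*p2*p3                  # the three pairwise intersections
         + p1*p2*p3)                     # the triple intersection

# Approximation (10): keep only the marginals, drop the intersections.
approx = p1*p2 + p1*p3 + p2*p3

# Both are dominated by the clean pair p(x2 x3) = p2*p3 = 0.42.
```

With one tiny likelihood, the intersection terms are negligible, so the exact union, the marginal-sum approximation (10), and the clean-pair probability all nearly coincide.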

Figure 9 Recursive algorithm for calculating the sum of the probabilities of all Q-element combinations, for Q = 1 to 4, from a set consisting of N = 4 elements


9 Appendix B: computing the probability (10)

For simplicity, let pn denote the probability p(xn|v). Fig. 9 shows an efficient recursive algorithm for calculating (10), that is, the sum of the probabilities of all Q-element combinations, for all Q from 1 to 4, from a set consisting of N = 4 elements. The last row of the figure shows the sought probabilities. The algorithm has a complexity of about O(N(N − 1)), where N is the number of local images in this paper.
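The quantity computed in Fig. 9, the sum of the products of all Q-element combinations of p1, ..., pN, is the elementary symmetric polynomial e_Q(p1, ..., pN), which admits the standard O(N^2) recurrence sketched below. This is one way to realise the computation, not necessarily the exact recursion of the figure.

```python
from itertools import combinations
from math import prod, isclose

def union_scores(p):
    """Return S[Q] = sum over all Q-element combinations of p of the
    product of their probabilities, for Q = 0..N, via the recurrence
    S_n[q] = S_{n-1}[q] + p_n * S_{n-1}[q-1] (cost ~ O(N^2))."""
    n = len(p)
    S = [1.0] + [0.0] * n          # S[0] = 1: the empty product
    for pn in p:
        for q in range(n, 0, -1):  # descend so each pn is used once
            S[q] += pn * S[q - 1]
    return S

def brute(p, q):
    """Brute-force check: explicit enumeration of all q-combinations."""
    return sum(prod(c) for c in combinations(p, q))
```

For N = 16 local images this replaces an exponential enumeration of combinations with a few hundred multiply-adds per class.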
