



www.elsevier.com/locate/imavis

Image and Vision Computing 25 (2007) 817–832

Outdoor recognition at a distance by fusing gait and face

Zongyi Liu *, Sudeep Sarkar

Computer Science and Engineering, University of South Florida, Tampa, FL 33647, USA

Received 19 June 2004; received in revised form 17 November 2005; accepted 19 May 2006

Abstract

We explore the possibility of using both face and gait to enhance human recognition-at-a-distance performance in outdoor conditions. Although the individual performances of gait and face based biometrics at a distance, under outdoor illumination conditions, walking surface changes, and time variations, are poor, we show that recognition performance is significantly enhanced by the combination of face and gait. For gait, we present a new recognition scheme that relies on computing distances based on selected, discriminatory gait stances. Given a gait sequence covering multiple gait cycles, it identifies the salient stances using a population hidden Markov model (HMM). An averaged representation of the detected silhouettes for these stances is then built using eigenstance shape models. Similarity between two gait sequences is based on the similarities of these averaged representations of the salient stances. This gait recognition strategy, which essentially emphasizes shape over dynamics, significantly outperforms the HumanID Gait Challenge baseline algorithm. For face, which is a mature biometric for which many recognition algorithms exist, we chose the elastic bunch graph matching based face recognition method. This method was found to be the best in the FERET 2000 studies. On a gallery database of 70 individuals and two probe sets, one with 39 individuals taken on the same day and the other with 21 individuals taken at least 3 months apart, results indicate that although the verification rates at a 1% false alarm rate of the individual biometrics are low, their combination performs better. Specifically, for data taken on the same day, the individual verification rates are 42% and 40% for face and gait, respectively, but 73% for their combination. Similarly, for the data taken at least 3 months apart, the verification rates are 48% and 25% for face and gait, respectively, but 60% for their combination. We also find that the combination of outdoor gait and one outdoor face probe per person is superior to using two outdoor face probes per person or two gait probes per person, which can be considered statistical controls for showing improvement by biometric fusion.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Gait recognition; Face recognition; Biometrics fusion

1. Introduction

Biometrics for outdoor conditions, especially one that can operate at a distance, is a challenging proposition. For some security situations, it is necessary to identify individuals, or even short-list possible candidates, as far away as possible from a sensitive site. Two biometric sources that are available in such situations are face, which is a physical biometric, and gait, which is a behavioral biometric. Face image based biometrics is now a mature technology. Unlike other traditional biometric sources, such as fingerprints, iris, or hand, face does not require direct contact and is easy to acquire. There are now several commercial systems that can be used for face recognition in somewhat controlled indoor situations. However, outdoor recognition from faces is still an open area of research. The 2002 face recognition vendor tests (FRVT 2002) [34], which is presently the most comprehensive and extensive evaluation, show that when comparing a gallery of indoor full frontal images with a probe set of outdoor images, the best verification performance is 54% at a 1% false alarm rate. This is in contrast to nearly 96% verification rate at 1% false alarm for indoor face recognition.

0262-8856/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2006.05.022

* Corresponding author. Tel.: +1 813 974 2113; fax: +1 813 974 5456.
E-mail addresses: [email protected] (Z. Liu), [email protected] (S. Sarkar).



Like face, the gait of a person, as captured in video of the walking person, is another biometric source that can be acquired in outdoor conditions and from a distance. Recognition of a person from gait has been a recent focus in computer vision [33,26,43,3,45,42,13,2,25,21,10,38,49]. To facilitate the objective, quantitative measurement of progress, and the characterization of the properties of gait recognition on a common data set, the HumanID Gait Challenge Problem was formulated [18,19]. The challenge problem consists of a baseline algorithm, a set of twelve experiments (A through L), and a large data set (1870 sequences, 122 subjects, 1.2 terabytes of data). Based on the reported performance on this data set and other data sets, three observations can be made regarding gait recognition.

1. Performance on indoor sequences [49,13,10] generally tends to be higher than on outdoor sequences [10,20,2].

2. When comparing sequences across surface-change conditions, such as grass vs. concrete, the performance is low. The identification rates on a gallery set of 71 subjects range from 21% to 36%, using a variety of recognition strategies, ranging from the use of HMMs to simple template based matching [44,47,24,50,18].

3. Gait recognition performance drops when comparing sequences taken at different times. When the difference in time between gallery (the pre-stored template) and probe (the input data) is on the order of minutes, the identification performance ranges from 91% to 95% [49,13,10], whereas the performance drops to 30–45% when the differences are on the order of days and months [25,11,10] for similarly sized data sets.

It is unlikely that gait recognition performance across these conditions can be improved significantly by simple modifications of existing strategies. From a computer vision perspective, the gait signal comprises both shape and dynamics information. Gait shape can be defined to be the configuration the person makes; it is determined by both body shape and stance. Gait dynamics determines the nature of the transition between stances. Traditional gait characterization emphasizes the latter, while some recent gait recognition studies point to the observation that shape is a better cue than dynamics. First is the CMU study [10], which explicitly ignores dynamics and just considers silhouette shapes, with improved performance. Second is the UMD study [48], which decouples shape and dynamics using rigorous shape modeling. One hypothesis for this observation could be that the variability of gait dynamics under different conditions, such as walking surface change, is high. Building on this recent observation in gait recognition research that shape is a better cue than dynamics, we present a new strategy that focuses on stance shapes. The strategy relies on computing distances based on selected, discriminatory gait stances. Given a gait sequence covering multiple gait cycles, it identifies the salient stances using a population hidden Markov model (HMM). An averaged representation of the detected silhouettes for these stances is then built using eigenstance shape models. Similarity between two gait sequences is based on the similarities of these averaged representations of the salient stances. It essentially uses gait stances that are more robust across the covariates for the similarity computation. This results in improved performance, but still does not achieve recognition rates comparable to indoor scenarios.

It has been demonstrated that combination, or fusion, of biometrics can offer a way to break the barrier of poor individual biometric performance. Lin et al. [14] demonstrate that multi-biometric integration does indeed result in a consistent performance improvement. Schiele [40] empirically showed that the more classifiers we combine, the better the results we can get. One can talk about inter-modal combination [39,51,5,41,42,23,16,15], e.g., combination of face with iris, and intra-modal combination [1,53,31,54,17,36,32,55], e.g., combination of the outputs of two classifiers on the same modality, or the combination of the outputs of two different sensors on the same modality, such as IR and visible [9,8] or visible and 3D [6–8]. In Table 1 we summarize the work in computer-vision-based multi-modal biometric combination. Fusion can be done at three levels [39]:

1. The feature extraction level, where data from each sensor are combined to form one feature vector [12,5].

2. The matching score level, where the similarity scores computed by the individual classifiers are fused [39,31,17,16,1,51,54,41,42,23]. The scores from different classifiers are usually first transformed into the same range using a linear, polynomial, or logarithmic transformation. The normalized scores are then combined using rules such as sum, product, maximum, and minimum.

3. The decision level, where each classifier makes its own classification and votes for the final decision [36,15,23,32]. Popular vote rules include rank sum and majority vote.
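As a concrete illustration of the matching score level (level 2 above), the following sketch normalizes two score vectors with a min-max transformation and combines them with the sum rule. The function names and the toy scores are our own illustration, not the paper's implementation.

```python
import numpy as np

def min_max_normalize(scores):
    """Map similarity scores to [0, 1]; a common first step in score-level fusion."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:  # degenerate case: all scores identical
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

def fuse_sum_rule(face_scores, gait_scores):
    """Sum-rule fusion of two normalized score vectors over the same gallery."""
    return min_max_normalize(face_scores) + min_max_normalize(gait_scores)

# Toy example: similarity of one probe against a gallery of four subjects.
face = [0.42, 0.40, 0.95, 0.10]
gait = [0.30, 0.80, 0.90, 0.20]
fused = fuse_sum_rule(face, gait)
best_match = int(np.argmax(fused))  # subject 2 has the highest combined score
```

The same structure accommodates the product, maximum, or minimum rules by replacing the addition.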

In this paper, we show that the combination of gait and face can effectively enhance the performance of outdoor biometrics at a distance. We demonstrate this for conditions that are known to be ``hard'' in face and gait recognition. Experiments also show that cross-modal combination of gait and face is superior to the fusion of multiple instances within each modality, which can be considered a sort of statistical control to show improvement by biometric fusion. Gait and face combination studies have been presented by others [42,41,22]. However, unlike previous studies that used either indoor data or outdoor data taken on the same day, resulting in high performance of the individual biometrics to begin with, our study involves outdoor data taken months apart. We show that we can significantly improve recognition at a distance in outdoor conditions and over time, both of which are hard conditions, using biometric fusion.


Table 1
Inter- and intra-modal biometric fusion. Modalities considered: face, fingerprint, hand geometry, iris, ear, gait, and speech.

Work                 Combination level    Modalities fused
MSU [39]             Score                3
MSU [31]             Score                2
MSU [17]             Score                2
MSU [16]             Score                3
MSU [15]             Decision             2
MSU [36]             Decision             2
U. Bern [1]          Score                1
CAS and MSU [51]     Score                2
UND and USF [5]      Score                2
HK Polytechnic [54]  Score                1
MIT [41,42]          Score                2
U. of Surrey [23]    Score and decision   2
Rutgers [32]         Score and decision   2
UMD [22]             Score and decision   2
UND [9,6–8]          Decision             2


2. Face recognition algorithm

The primary focus of this paper is to investigate the power of face and gait biometric fusion. So, the individual biometric algorithms we used are not necessarily the absolute best currently available, but their performances are close to those of the best available ones. They beat their corresponding established baseline algorithms by significant amounts. First, we consider the face recognition algorithm (Fig. 1).

Face recognition is a mature biometric for which many recognition approaches exist. We did not see the necessity of designing yet another approach, since the focus of this paper is biometric fusion. From among the many possible choices, we used the Gabor-feature-based Elastic Bunch Graph Matching (EBGM) [52] algorithm for face recognition. It is a feature based method for face recognition that has superior performance to template based methods such as PCA, LDA, or Bayesian. We used the CSU implementation of the algorithm that is available at http://www.cs.colostate.edu/evalfacerec/. The approach first locates landmarks on a face, related to salient points on the eyes, nose, and mouth, and then employs the frequency information of the local regions that surround the landmark locations as the landmark features (landmark jets). We did not re-train the algorithm; instead we used the CSU-trained version, which is based on 70 subjects. With regard to distance measurements, we chose the phase similarity, corrected by small displacements [4]:

Fig. 1. Samples of computed intermediate representations of the face biometric that are matched. (a–d) gallery, (e–h) probes.

S_D(J_i, J'_i, \vec{d}) = \frac{\sum_{j=0}^{N_i} a_{ij} a'_{ij} \cos\left(\phi_{ij} - (\phi'_{ij} + \vec{d} \cdot \vec{k}_{ij})\right)}{\sqrt{\sum_{j=0}^{N_i} a_{ij}^2 \, \sum_{j=0}^{N_i} a'^2_{ij}}} \qquad (1)

where J_i and J'_i are the landmark jets of the ith landmark point for graphs J and J', N_i is the number of wavelet coefficients in the jet, a and \phi are the magnitude and phase, \vec{d} is the estimated displacement vector, and \vec{k} is a vector pointing in the direction of the wave and having magnitude equal to the frequency of the wave. Obviously, the estimation of the displacement vector \vec{d} is very important for Eq. (1). In this paper, we use the Displacement Estimation Narrowing Local Search (DENarrowingLocalSearch), which uses a local search method to find an optimum and empirically gives the best performance.
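A minimal numpy rendering of Eq. (1) may make the computation concrete. The jets are passed as magnitude/phase arrays, and the displacement d and wave vectors k are supplied by the caller; this is an illustrative sketch, not the CSU implementation.

```python
import numpy as np

def phase_similarity(a, phi, a2, phi2, d, k):
    """Displacement-corrected phase similarity between two landmark jets, Eq. (1).

    a, phi   : magnitudes and phases of the wavelet coefficients of jet J_i
    a2, phi2 : magnitudes and phases of jet J'_i
    d        : estimated 2-D displacement vector (length 2)
    k        : (N, 2) array of wave vectors of the Gabor kernels
    """
    a, phi, a2, phi2 = map(np.asarray, (a, phi, a2, phi2))
    shift = np.asarray(k) @ np.asarray(d)          # d . k_j for each coefficient
    num = np.sum(a * a2 * np.cos(phi - (phi2 + shift)))
    den = np.sqrt(np.sum(a ** 2) * np.sum(a2 ** 2))
    return num / den

# Sanity check: a jet compared with itself at zero displacement gives 1.
a = np.array([1.0, 0.5])
phi = np.array([0.3, 1.2])
s = phase_similarity(a, phi, a, phi, d=(0.0, 0.0), k=[[1.0, 0.0], [0.0, 1.0]])
```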

According to the FERET evaluations [35], the EBGM approach provides the best recognition performance. Fig. 2 summarizes the reported top rank identification performance (with a gallery size of 1200) on three experiments involving matching (i) across indoor illumination variations, (ii) across time differences of up to 1 year, and (iii) across time differences of more than 1 year. EBGM had the top rank among five algorithms for all three experiments. It outperformed the next best algorithm by around 20%. Also, note the poor performance on data sets that involve comparison over time.



Fig. 2. Top rank identification performance (on a gallery set of 1200) of the EBGM and four other face recognition algorithms, as reported by FERET-2000 [35]. The experiment Fc matches across illumination variation, the Dup. I experiment involves a temporal difference within 1 year, and the Dup. II experiment involves a temporal difference of more than 1 year.


3. Gait recognition algorithm

Compared with face, gait biometrics is a relatively new research area. New approaches are beginning to emerge, but with varying performances. Fusion of face with gait will make sense only if the performances of these two biometrics are comparable, especially on the hard cases that we consider. For this reason, we designed a new gait recognition algorithm that is among the best currently available, as judged on commonly available benchmarks.

The recognition approach builds on the observation in recent gait recognition experiments [48,10] that silhouette shape, which includes body shape and gait stance shape, has equal, if not more, recognition power than gait dynamics. The approach is based on matching silhouettes from selected gait stances, specifically those stances that have the largest variance across a population. The schematic of the gait recognition approach is shown in Fig. 3. The first step is, of course, silhouette detection, for which we use standard statistical background subtraction, based on Mahalanobis distances in the pixel color space. Second is the definition and modeling of the stances to be used for recognition. For this, we learn one population EigenStance-HMM, defined for a subject population. The variation in shape for each stance helps characterize the importance of that stance. Third is the recognition approach. Any given gait sequence is decoded using the learnt population HMM. The frames that are matched to the selected stances are cleaned up and averaged to arrive at an averaged silhouette representation for each such stance. The averaged stance representations from the gallery and the probe are then compared using the sum of the Euclidean distances between the selected stance silhouettes. In the rest of this section, we present details regarding the above steps.

Fig. 3. The adopted gait recognition approach, based on comparing specific, salient gait stances.

3.1. Silhouette detection

We employ a standard RGB-based background differencing technique to segment silhouettes. We first compute the statistics of the individual background pixels in terms of the mean and covariance of their RGB values. Then we compute the Mahalanobis distance of a pixel from this background pixel value distribution. Next, we decide on an optimal threshold to segregate the two classes using Expectation Maximization (EM), with the distance values as the Gaussian-distributed observations. Finally, we assemble disconnected components with a proximity based grouping process, using the area and angle features of each component. Further details of this process are available in [28,30].
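The per-pixel Mahalanobis test can be sketched as follows. For brevity, the EM-selected threshold of the paper is replaced here by a fixed cutoff, and the proximity-based grouping step is omitted; all names are illustrative.

```python
import numpy as np

def mahalanobis_foreground(frame, bg_mean, bg_cov_inv, threshold):
    """Per-pixel Mahalanobis distance of RGB values from the background model.

    frame      : (H, W, 3) RGB image
    bg_mean    : (H, W, 3) per-pixel background mean
    bg_cov_inv : (H, W, 3, 3) per-pixel inverse covariance of background RGB
    threshold  : distance cutoff (the paper selects it with EM; fixed here)
    """
    diff = frame.astype(float) - bg_mean                       # (H, W, 3)
    # d^2 = diff^T  Sigma^{-1}  diff, evaluated independently at each pixel
    d2 = np.einsum('hwi,hwij,hwj->hw', diff, bg_cov_inv, diff)
    return d2 > threshold ** 2                                 # boolean foreground mask

# Toy 1x2 image: one pixel matches the background, one differs strongly.
bg_mean = np.zeros((1, 2, 3))
bg_cov_inv = np.tile(np.eye(3), (1, 2, 1, 1))                  # unit covariance
frame = np.array([[[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]])
mask = mahalanobis_foreground(frame, bg_mean, bg_cov_inv, threshold=3.0)
```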

3.2. Stance modeling and selection using population HMM

To identify the gait stances that have recognition power, we need to be able to define the different gait stances. Given that gait can be viewed as involving periodic state transitions, we use a hidden Markov model (HMM) to define and to identify the underlying stances. An HMM is specified by the possible states, q_t ∈ {1, ..., N_s}, and the triple λ = (A, B, π), representing the state transition matrix, observation model, and priors, respectively. The states of the HMM are the different gait stances. The state transition matrix and the priors capture the gait dynamics, and the observation model is based on the stance shapes. These parameters should capture the variation across a subject population. Note that we do not seek an HMM for each subject, but one HMM defined over the whole population, i.e., a population HMM. The trained or learnt parameters of this model will help identify the underlying stances. This model will also be used in the similarity computation step.

3.2.1. Training of the HMM

The gait stance states and state transition probabilities of this model are learnt from a set of manually specified silhouettes, which can be taken to be the best ``clean'' silhouettes that are available. This silhouette database includes 70 subjects over one walking cycle of approximately 30 to 40 image frames [27]. This cycle was chosen to begin at the right heel strike phase of the walking cycle through to the next right heel strike. The height of the silhouettes is normalized to occupy 128 pixels in order to reduce the effect of the varying distance of subjects from the camera. Fig. 4 shows examples of the raw images and the normalized silhouettes of one subject.

Fig. 4. Top row shows the corresponding part-level, manually specified silhouettes, and the bottom row shows the scaled silhouettes of the kind used by gait recognition algorithms.

Fig. 5. Stance exemplars for seven sample states over a gait cycle, used in the observation model of the population HMM.

We consider N_s distinct states, (q_1, q_2, ..., q_{N_s}), spanning one full cycle (two strides), so as to retain the asymmetry in gait, i.e., to differentiate stances with the left foot forward from those with the right foot forward. We select the total number of states, N_s, by minimizing Akaike's information criterion (AIC). There are three sets of parameters to be estimated. First, we pick equal state priors, i.e., π_i = 1/N_s, since, in practice, any given sequence can begin from any state.

Second is the observation model, which we base upon state exemplars selected from the training set sequences. These state exemplars are determined as follows. We start with a linear map of the silhouettes in a sequence into the states, which is then refined by K-means clustering. The clustering technique relies on a distance measure for two frames f_i and f_j, which we define as:

D(i, j) = 1 - \frac{f_i^T f_j}{f_i^T f_i + f_j^T f_j - f_i^T f_j} \qquad (2)

This is also commonly known as the Tanimoto distance measure. Since the gait cycles of the manual silhouettes are aligned by design, we simply group the frames within the jth partition of all subjects into an exemplar set for the jth gait stance, E_j. Fig. 5 shows the mean images of the exemplars for some stances, built from our training data. The observation model for each stance is chosen to be an exponential function of the Tanimoto distance, D, between any given silhouette, f_t, and the mean of the state exemplars, E_j, where the parameter μ_j is directly estimated from the corresponding exemplars, which have been computed from the given training sequences.

b_j(f_t) = \frac{1}{\mu_j} e^{-\frac{D(f_t, E_j)}{\mu_j}} \qquad (3)
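The Tanimoto distance of Eq. (2) and the observation model of Eq. (3) translate directly into code. This is an illustrative sketch with hypothetical function names; the silhouettes are passed as flattened binary vectors.

```python
import numpy as np

def tanimoto_distance(fi, fj):
    """Tanimoto distance between two vectorized silhouettes, Eq. (2)."""
    fi = np.asarray(fi, float).ravel()
    fj = np.asarray(fj, float).ravel()
    inner = fi @ fj
    return 1.0 - inner / (fi @ fi + fj @ fj - inner)

def observation_likelihood(ft, exemplar_mean, mu_j):
    """Exponential observation model b_j(f_t) of Eq. (3) for stance j."""
    return np.exp(-tanimoto_distance(ft, exemplar_mean) / mu_j) / mu_j

# Identical silhouettes have distance 0, hence likelihood 1/mu_j.
s = np.array([0, 1, 1, 0], float)
d = tanimoto_distance(s, s)
b = observation_likelihood(s, s, mu_j=0.5)
```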

The third set of parameters comprises the state transition matrix A. The transition matrix is constrained to be a left-to-right, cyclical Bakis state transition model over the states. We estimate this matrix from the multiple observation sequences using the iterative Baum–Welch algorithm [37]. However, unlike in the traditional use of this algorithm, the priors and the observation models are not iterated upon; just the entries of the transition matrix are iteratively updated to maximize the likelihood of the training set. The initialization of this iterative process is based on the transitions implied by the association of frames to each stance arrived at by the K-means clustering used to build the observation model. Note that since the training set consists of gait sequences from a number of subjects, the transition matrix represents the average gait dynamics over the whole population.

We determine the number of states based on Akaike's information criterion (AIC), which combines the likelihood of the data with model complexity in a probabilistically meaningful manner. Fig. 6 shows the AIC variation for different numbers of states. Small values of N_s would result in a compact model but would not capture all the variations in the training set, whereas very large values would result in overfitting. We see that N_s = 20 offers a good compromise. Given the context of gait recognition, we would like to preserve variations so as to be able to distinguish between individuals. We also found that this choice is quite stable with respect to some variation of the training set. Fig. 6 shows the AIC variation for two different training sets, one based on sequences on grass and the other on sequences on concrete.

Fig. 6. Variation of AIC with the number of states, for models constructed using two different training sets of 70 subjects: one for a grass walking surface and the other for a concrete walking surface.

Fig. 7. Samples of the first eigenstances over one gait cycle, representing the most discriminating directions among persons.
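The AIC-based choice of N_s can be sketched as follows. Here `toy_fit` is a stand-in for actually training the population HMM at each candidate state count and is purely illustrative; only the selection logic mirrors the paper.

```python
import numpy as np

def aic(log_likelihood, num_params):
    """Akaike's information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood

def select_num_states(candidates, fit_hmm):
    """Pick the number of HMM states minimizing AIC.

    candidates : iterable of state counts to try
    fit_hmm    : callable returning (log_likelihood, num_params) for a count
                 (a stand-in for training the population HMM)
    """
    scores = {n: aic(*fit_hmm(n)) for n in candidates}
    return min(scores, key=scores.get), scores

# Toy fit: log-likelihood grows slowly while parameter count grows linearly,
# so AIC has an interior minimum (illustrative numbers only).
def toy_fit(n):
    return 100 * np.log(n) - n, 10 * n

best, scores = select_num_states(range(5, 30, 5), toy_fit)
```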

3.2.2. Gait shape – eigenstances

With each stance of the trained population HMM, we associate an eigenstance model to capture the shape variations in the silhouettes for that stance across persons. We use this model to compute the similarity between two gait sequences. We model the shape as a multivariate Gaussian distribution, which is estimated from the clustered set of exemplar silhouettes associated with each HMM stance. Note that the exemplar means were used to construct the observation model in the HMM; here we use the variance for shape modeling. We use principal component analysis (PCA) to arrive at a compact representation of this distribution. For each stance, k, we have a reduced-dimensional (with N_e dimensions) shape space, φ(k), characterized by the mean, μ_k, and the eigenvectors {e_{k,1}, ..., e_{k,N_e}}. Sample eigenstances representing the most discriminating directions among persons are shown in Fig. 7. The number of eigenvectors, N_e, is chosen so that at least 80% of the variation is modeled. In another work [30], we had used the EigenState-HMM model to clean up shadow artifacts in silhouettes. In this work, we build our recognition strategy around it.
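Building one such per-stance model amounts to a PCA that retains 80% of the variance. The sketch below uses the SVD as one standard way to compute it; the paper does not specify the numerical method, and the random data is a stand-in for vectorized exemplar silhouettes.

```python
import numpy as np

def eigenstance_model(silhouettes, var_fraction=0.8):
    """PCA shape model for one stance: mean and the eigenvectors covering
    at least `var_fraction` of the variance (80% in the paper).

    silhouettes : (M, D) array, one vectorized exemplar silhouette per row
    """
    mu = silhouettes.mean(axis=0)
    centered = silhouettes - mu
    # SVD of the centered data yields the principal directions as rows of vt.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    cum = np.cumsum(var) / var.sum()
    n_e = int(np.searchsorted(cum, var_fraction) + 1)  # smallest N_e reaching 80%
    return mu, vt[:n_e]                                # mean and top N_e eigenvectors

rng = np.random.default_rng(0)
data = rng.normal(size=(40, 16))   # stand-in for flattened silhouettes
mu, eigvecs = eigenstance_model(data)
```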

3.2.3. Stance selection

For recognition, we are interested in the stances that offer the most discrimination between subjects. To select these discriminatory stances, we consider the variation in shape for each stance, as reflected in the first and second eigenvalues associated with the corresponding eigenstance model. These are plotted in Fig. 8. We see that the states at the ends (states 1–3 and 18–20) and in the middle (9–12) have the largest scatter, indicating that these gait stances carry the bulk of the discriminatory power. The mean stances for these states are shown in Fig. 9 for illustration. We notice that these stances correspond to near-full-stride stances.

3.3. Similarity computation

Fig. 8. The variation of the largest and second largest eigenvalues associated with each stance shape, as computed in the eigenstance model.

We base the similarity computation between any two gait sequences on the differences in the silhouette shapes of the discriminative stances, as identified during the construction of the eigenstance models. The first step in this process is the detection of the discriminative stances in any given gait sequence spanning multiple gait cycles. For this, we use the learnt population HMM. We use the dynamic-programming-based Viterbi algorithm [37], which returns the most likely state assignment for each frame in the given sequence. To reduce the errors of this decoding process, we partition the input sequence into subsequences of roughly one gait cycle length. If frames in one portion of the gait cycle have segmentation errors, this strategy minimizes the effect of the stance assignment errors on other parts of the sequence. We estimate the gait cycle length from the periodic variation in the number of foreground pixels in the bottom half of the silhouettes. Note that the starting state of these subsequences need not match the starting HMM state, since we use a cyclical Bakis model for the HMM state transitions.
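The gait-cycle length estimate from the periodic foreground-pixel count can be sketched with an autocorrelation peak search. The `min_period` guard and the synthetic signal are our assumptions, not from the paper.

```python
import numpy as np

def estimate_cycle_length(fg_counts, min_period=15):
    """Estimate the gait period from the per-frame count of foreground
    pixels in the bottom half of the silhouette, via autocorrelation.

    fg_counts  : 1-D array of foreground-pixel counts, one per frame
    min_period : shortest plausible period in frames (assumed)
    """
    x = np.asarray(fg_counts, float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0 .. len-1
    # The strongest autocorrelation peak past min_period is taken as the period.
    lag = min_period + int(np.argmax(ac[min_period:len(x) // 2]))
    return lag

# Synthetic count signal with a 30-frame period (a 30-40 frame cycle is typical).
t = np.arange(120)
counts = 100 + 10 * np.sin(2 * np.pi * t / 30)
period = estimate_cycle_length(counts)
```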

Given the stance labels of each frame, we average the frames mapped to each discriminative stance, identified earlier, to arrive at one averaged representation per stance. However, instead of averaging the raw silhouettes, which contain errors due to missed detections and shadows, we use the eigenstance shape model to clean up the silhouettes before averaging.

An input frame f_i, estimated to be at stance state k of the HMM, is projected into the corresponding eigenspace, \phi(k) = \{\mu_k, e_{k,1}, \ldots, e_{k,N_e}\}, and then reconstructed as f_i^r:

f_i^r = \mu_k + \sum_{j=1}^{N_e} \left( e_{k,j}^T (f_i - \mu_k) \right) e_{k,j}    (4)

Fig. 9. Exemplar means for the states exhibiting significant recognition power. The numbers below each frame denote the state.

The reconstructed silhouette, f_i^r, has continuous values between 0 and 1 that we threshold to arrive at binary silhouettes. Instead of simple thresholding, we employ a two-level thresholding scheme to minimize the side effects of the reconstruction process, which can make silhouettes more similar to each other.

F_i^r(j) = \begin{cases} \text{Foreground} & \text{if } f_i^r(j) > T_{high} \text{ or } \mu_k(j) = 0 \\ \text{Background} & \text{if } f_i^r(j) < T_{low} \\ f_i(j) & \text{otherwise} \end{cases}    (5)

For the experiments in this paper, T_{low} = 0.2 and T_{high} = 0.8. Fig. 10 shows some averaged stance shape representations constructed for a subject from a given input sequence.
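Eqs. (4) and (5) can be sketched as below. This is a minimal illustration assuming flattened silhouettes, a stance mean `mu`, and an orthonormal eigenvector matrix `E`; the variable names are ours, and the foreground condition follows Eq. (5) as printed.

```python
import numpy as np

T_LOW, T_HIGH = 0.2, 0.8  # threshold values used in the paper

def reconstruct_silhouette(f, mu, E):
    """Eq. (4): project a flattened frame f onto the eigenstance basis E
    (columns orthonormal) and reconstruct about the stance mean mu.

    f, mu: length-D vectors; E: (D, Ne) matrix of eigenvectors."""
    coeffs = E.T @ (f - mu)          # e_{k,j}^T (f_i - mu_k)
    return mu + E @ coeffs           # continuous-valued reconstruction

def two_level_threshold(fr, f, mu, t_low=T_LOW, t_high=T_HIGH):
    """Eq. (5), as printed: binarize confident reconstruction values and
    keep the original pixel f(j) where the reconstruction is ambiguous
    (t_low <= fr(j) <= t_high)."""
    out = f.astype(float).copy()
    out[fr < t_low] = 0.0                   # Background
    out[(fr > t_high) | (mu == 0)] = 1.0    # Foreground (first case wins)
    return out
```

The cleaned frames from `two_level_threshold` are what get averaged per stance.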

Given two averaged stance representations, the corresponding stances can simply be compared and the results summed to arrive at an overall similarity score. Let us denote the subset of salient discriminatory states by S_d. To arrive at one similarity score, we compute Euclidean distances between the averaged representations for these stances from the probe sequence I_{P_i} and the gallery sequence I_{G_j}.

S(I_{P_i}, I_{G_j}) = -\sum_{k \in S_d} \left( I_{P_i}(f_k) - I_{G_j}(f_k) \right)^2    (6)
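The stance averaging and Eq. (6) can be sketched as follows. The dictionary-keyed layout and the function names are our assumptions, not the paper's.

```python
import numpy as np

def average_stances(clean_frames, state_labels, salient_states):
    """Average the cleaned frames assigned to each salient stance.

    clean_frames: (T, D) array of flattened, cleaned silhouettes.
    state_labels: length-T array of Viterbi stance labels.
    Returns a dict mapping state index -> averaged silhouette."""
    frames = np.asarray(clean_frames, dtype=float)
    labels = np.asarray(state_labels)
    return {k: frames[labels == k].mean(axis=0) for k in salient_states}

def stance_similarity(probe, gallery, salient_states):
    """Eq. (6): negated sum of squared Euclidean distances between the
    averaged stance silhouettes, restricted to the salient states S_d."""
    return -sum(
        float(np.sum((probe[k] - gallery[k]) ** 2)) for k in salient_states
    )
```

Higher (less negative) values indicate more similar gait sequences, with identical averaged stances scoring 0.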

3.4. Performance

The selected stance based recognition scheme performs well in practice. Table 2 reports the top rank identification rate for two key experiments, comparing sequences across surface and across time, from the HumanID Gait Challenge problem [18,19]. These experiments, denoted by D and K in the original problem, are among the toughest experiments, on which most algorithms have poor performance. The gallery set consists of sequences from 122 subjects. The first column of Table 2 lists the performance of the baseline algorithm defined along with the Gait Challenge. We also report the performance of four other gait recognition algorithms based on hidden Markov models, shape clustering, part features, and template matching (since the individual groups are yet to publish these results officially, we cannot identify them at this time). In the last two columns, we list the performance of our gait recognition algorithm with and without stance selection. We see that stance selection greatly enhances performance. Note that the proposed gait recognition algorithm's performance is the second best among the six algorithms listed.


Page 8: Outdoor recognition at a distance by fusing gait and facesarkar/PDFs/IVC-GaitFace.pdf · 2014. 9. 12. · gait characterization, the latter is emphasized, while some recent recognition

Fig. 10. Averaged gait stances for one subject computed from a sequence spanning several gait cycles.

Table 2
The top rank identification rate for different gait recognition approaches on the gait challenge experiments involving the "hard" covariates of surface and time

Covariates        Baseline  Algo1  Algo2  Algo3  Algo4  Average gait stance  Average gait stance
                                                        (all states)         (partial states)
(Exp D) surface   32        33     45     19     23     25                   38
(Exp K) time      3         15     24     3      6      24                   24

The gallery size is 122 subjects.

Fig. 11. The 2D histogram of face and gait non-match scores.


4. Fusion schemes

Before combination, scores from each classifier are transformed to a common range. Here, we choose the Gaussian model based z-normalization, which was also used in FRVT-2002 [34]. For a given probe p, we compute its similarity values with all subjects in the gallery set (g_1, g_2, \ldots, g_{N_G}). Then we compute the mean (\mu_p) and standard deviation (\sigma_p) of these similarity values. The similarity value between p and each g_j is normalized as:

NormSim(p, g_j) = \frac{Sim(p, g_j) - \mu_p}{\sigma_p}    (7)

This normalization not only maps the scores onto a common scale, but also removes the dependence of the scores on the particular probe. It is common in biometrics to observe that the non-match similarity scores depend on the chosen probe, which impacts the optimality of the single-threshold decision rule used for verification in biometric systems.
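Eq. (7) amounts to the following one-liner; a sketch assuming a probe's score vector against the whole gallery:

```python
import numpy as np

def z_normalize(scores):
    """Eq. (7): per-probe z-normalization of the similarity scores of a
    probe against the whole gallery (Gaussian model based)."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()
```

After this step the score vector for every probe has zero mean and unit standard deviation, so a single verification threshold behaves consistently across probes.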

We experimented with score level and decision level integration.

1. The Score Sum combination strategy makes a decision simply based on the sum of the similarity scores from the gait and face classifiers:

CombSim(p, g_j) = NormSim_1(p, g_j) + NormSim_2(p, g_j)    (8)

2. The second score fusion scheme is based on the Bayesian decision rule. For a given pair of probe and gallery subjects, the similarity values from the individual modalities form the observation vector, v. The two classes correspond to the match (genuine, \omega_m) and non-match (imposter, \omega_{nm}) classes. The likelihoods of these two classes are modeled as multi-dimensional Gaussian distributions, which is usually a good choice empirically. Fig. 11 shows a 2D histogram representation of the gait and face non-match (imposter) scores.

Pr(v|\omega_m) = \frac{1}{2\pi |\Sigma_m|^{1/2}} e^{-\frac{1}{2}(v - \mu_m)^T \Sigma_m^{-1} (v - \mu_m)}    (9)

Pr(v|\omega_{nm}) = \frac{1}{2\pi |\Sigma_{nm}|^{1/2}} e^{-\frac{1}{2}(v - \mu_{nm})^T \Sigma_{nm}^{-1} (v - \mu_{nm})}    (10)

The difference in the posterior probabilities of these two classes forms the combined similarity score.

CombSim(p, g_j) = Prob(\omega_m|v) - Prob(\omega_{nm}|v)    (11)

3. The third scheme is the Confidence Weighted Score Sum, as suggested by the HumanID group at the University of Notre Dame. The main idea is that, for a given probe subject p, we weight its similarity scores from a classifier before combination. The weight is computed from the similarity values at the first few ranks:

W_c(p) = \frac{Sim_c(p)(1) - Sim_c(p)(2)}{Sim_c(p)(2) - Sim_c(p)(3)}    (12)



where Sim_c(p)(k) is the kth largest similarity value of p when compared to the entire gallery set. The score combination is then given by:

CombSim(p, g_j) = W_1(p) Sim_1(p, g_j) + W_2(p) Sim_2(p, g_j)    (13)

4. In addition to the score level combination schemes mentioned above, we also use a decision level combination: Rank Sum. It takes the negated sum of the ranks from the face classifier and the gait classifier as the similarity value:

CombSim(p, g_j) = -\left( Rank_1(p, g_j) + Rank_2(p, g_j) \right)    (14)

A problem with this scheme is that two or more gallery subjects might have the same similarity value for a probe. In this paper, such ties are broken by the sum of the original scores from the two classifiers.
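The four fusion rules can be sketched as below. This is a minimal illustration: the function names and the tie-break constant `eps` are ours, the Gaussian likelihoods of Eqs. (9) and (10) are fit from training match/non-match score pairs, and Eq. (11) is computed as a posterior difference.

```python
import numpy as np

def score_sum(s1, s2):
    """Eq. (8): sum of the z-normalized scores of the two classifiers."""
    return np.asarray(s1, float) + np.asarray(s2, float)

def confidence_weight(scores):
    """Eq. (12): ratio of the rank-1/rank-2 gap to the rank-2/rank-3 gap
    of a probe's similarity values against the whole gallery."""
    top = np.sort(np.asarray(scores, float))[::-1]
    return (top[0] - top[1]) / (top[1] - top[2])

def weighted_score_sum(s1, s2):
    """Eq. (13): confidence-weighted sum of the classifier scores."""
    return confidence_weight(s1) * np.asarray(s1, float) + \
           confidence_weight(s2) * np.asarray(s2, float)

def rank_sum(s1, s2, eps=1e-6):
    """Eq. (14): negated sum of ranks (rank 1 = highest score), with ties
    broken by adding a small fraction of the original score sum."""
    def ranks(s):
        order = np.argsort(-np.asarray(s, float))
        r = np.empty(len(order), dtype=float)
        r[order] = np.arange(1, len(order) + 1)
        return r
    return -(ranks(s1) + ranks(s2)) + eps * score_sum(s1, s2)

class GaussianBayesFusion:
    """Eqs. (9)-(11): model the match and non-match score vectors as
    Gaussians and score a probe-gallery pair by the posterior difference."""

    def fit(self, match, nonmatch, prior_match=0.5):
        m, nm = np.asarray(match, float), np.asarray(nonmatch, float)
        self.mu_m, self.cov_m = m.mean(axis=0), np.cov(m.T)
        self.mu_nm, self.cov_nm = nm.mean(axis=0), np.cov(nm.T)
        self.p_m = prior_match
        return self

    @staticmethod
    def _gauss(v, mu, cov):
        d = len(mu)
        diff = v - mu
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
        return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))) / norm

    def combined_similarity(self, v):
        """Eq. (11): Prob(w_m | v) - Prob(w_nm | v)."""
        lm = self.p_m * self._gauss(v, self.mu_m, self.cov_m)
        lnm = (1 - self.p_m) * self._gauss(v, self.mu_nm, self.cov_nm)
        return (lm - lnm) / (lm + lnm)
```

Each function takes a probe's score vectors against the whole gallery from the two classifiers; the Bayesian class additionally needs labeled training score pairs.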

5. Results

We conducted a series of studies geared towards answering the following questions in the context of outdoor biometrics:

1. What is the performance of the face + gait combination for the same-day data and the months-apart data? How does the combination of face and gait compare with a single modality?

2. Which combination scheme performs the best?

3. How does the combination of face and gait compare against using multiple samples of the same modality, i.e., face + face or gait + gait?

Answers to the above questions require careful specification of multiple gallery and probe sets. For faces, the main gallery set (F_{In,Mug,t1}) consists of 70 faces taken indoors with regular expression and mugshot lighting conditions. The alternate gallery set (F_{In,Ov,t1}) consists of the corresponding faces taken with overhead lighting. The outdoor images form the probes. Fig. 12 shows examples of faces for the various lighting conditions. There are four face probe sets, with two probe sets per imaging session. For each imaging session, the near images form one set and the far images form the other set. One pair (F_{Out,Near,t1}, F_{Out,Far,t1}) was taken on the same day as the indoor images; there are 39 such subjects. The other pair (F_{Out,Near,t2}, F_{Out,Far,t2}) was taken at least 3 months apart; there are 21 such subjects.

Fig. 12. The face samples under different conditions. The candidates for the gallery sets are (a) regular expression with mugshot lighting, and (b) regular expression, overhead lighting images. The probes are taken outdoors with (c) regular expression, far view and (d) regular expression, near view.

For gait, the probes and the gallery are constructed from the HumanID Gait Challenge dataset [18]. The main gallery (G_{Grass,R,t1}) consists of sequences from 70 individuals walking on grass, outdoors, viewed from the right camera. The alternate gallery set (G_{Grass,L,t1}) consists of the corresponding sequences taken from the left camera, with a verging angle of approximately 30° to the right view. Like the face, we consider four different probes. The left and right views of gait on a different surface, i.e., concrete, taken on the same day as the gallery, form two probes (G_{Concrete,R,t1}, G_{Concrete,L,t1}), respectively. Fig. 13 shows some sample views. Like face, we also consider the time covariate, with two more probe sets (G_{Grass,R,t2}, G_{Grass,L,t2}) taken 6 months apart. The sizes of the probe sets match those for the face to allow us to consider biometric combinations.

Based on these gallery and probe sets, ten experiments were designed, as shown in Table 3. The first five experiments deal with same-day data and the next five deal with comparing data taken more than 3 months apart. Each set of five experiments consists of experiments to study face and gait, individually and with inter-modal and intra-modal combinations. We report two kinds of recognition rates: one for the verification scenario (1-to-1 matching) and the other for the identification scenario (1-to-N matching). For the verification scenario, the performance is specified in terms of standard false alarm and detection rates, plotted as a receiver operating characteristic (ROC). In our experiments, we plot the ROCs based on Parzen windowed, non-parametric estimates of the match (genuine) and non-match (imposter) distributions from the given data. For the identification scenario, we report the correct identification rate at rank k, i.e., the fraction of times the correct match to a probe (the input data) is within the top k ranked matches among all the matches of that probe to the complete gallery set (the pre-stored templates or models). The plot of the identification rate against rank is called the Cumulative Match Characteristic (CMC).
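The rank-k identification rate (CMC) described above can be computed as follows; a sketch assuming the correct gallery match of probe i sits in column i of the similarity matrix:

```python
import numpy as np

def cmc(similarity, max_rank=5):
    """Identification rates at ranks 1..max_rank (the CMC curve).

    similarity: (num_probes, num_gallery) score matrix; the correct
    gallery match of probe i is assumed to be column i."""
    sim = np.asarray(similarity, dtype=float)
    idx = np.arange(sim.shape[0])
    correct = sim[idx, idx]
    # Rank of the correct match: 1 + number of strictly better scores.
    rank = 1 + (sim > correct[:, None]).sum(axis=1)
    return [float((rank <= k).mean()) for k in range(1, max_rank + 1)]
```

The value at rank 1 is the top-rank identification rate reported in Table 2.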



Fig. 13. Sample frames in the gait challenge dataset as viewed from (a) the left camera on concrete surface, (b) the right camera on concrete surface, (c) the left camera on grass surface, (d) the right camera on grass surface.

Table 3
Gallery and probe specification for the various experiments conducted

Exp      Modality   (Gallery, #)          (Probe, #)               Covariate
S_F      Face       F_{In,Mug,t1}, 70     F_{Out,Near,t1}, 39      In/outdoor, same day
S_G      Gait       G_{Grass,R,t1}, 70    G_{Concrete,R,t1}, 39    Surface, same day
S_F+G    Face +     F_{In,Mug,t1}, 70     F_{Out,Near,t1}, 39      In/outdoor, same day
         Gait       G_{Grass,R,t1}, 70    G_{Concrete,R,t1}, 39    Surface, same day
S_F+F    Face +     F_{In,Mug,t1}, 70     F_{Out,Near,t1}, 39      In/outdoor, same day
         Face       F_{In,Ov,t1}, 70      F_{Out,Far,t1}, 39       In/outdoor, same day
S_G+G    Gait +     G_{Grass,R,t1}, 70    G_{Concrete,R,t1}, 39    Surface, same day
         Gait       G_{Grass,L,t1}, 70    G_{Concrete,L,t1}, 39    Surface, same day
D_F      Face       F_{In,Mug,t1}, 70     F_{Out,Near,t2}, 21      In/outdoor, >=3 months apart
D_G      Gait       G_{Grass,R,t1}, 70    G_{Grass,R,t2}, 21       6 months apart
D_F+G    Face +     F_{In,Mug,t1}, 70     F_{Out,Near,t2}, 21      In/outdoor, >=3 months apart
         Gait       G_{Grass,R,t1}, 70    G_{Grass,R,t2}, 21       6 months apart
D_F+F    Face +     F_{In,Mug,t1}, 70     F_{Out,Near,t2}, 21      In/outdoor, >=3 months apart
         Face       F_{In,Ov,t1}, 70      F_{Out,Far,t2}, 21       In/outdoor, >=3 months apart
D_G+G    Gait +     G_{Grass,R,t1}, 70    G_{Grass,R,t2}, 21       6 months apart
         Gait       G_{Grass,L,t1}, 70    G_{Grass,L,t2}, 21       6 months apart


This is a standard performance metric used in biometrics for the identification scenario [35].

5.1. Inter-modal combination

Performance of outdoor face (Exp S_F), cross-surface gait (Exp S_G), and gait + face (Exp S_F+G) on same-day data with the various combination schemes is shown in Fig. 14, which plots the CMC curve up to rank 5 and the ROC curve up to a 5% false alarm rate. As expected, recognition from a single biometric is low: specifically, 40% for face and 39% for gait at rank 1. However, combining the two weak biometrics using the four schemes discussed above substantially boosts performance: 71% for score sum, 70% for the Bayesian rule, 58% for confidence weighted score sum, and 68% for rank sum. As Fig. 15 shows, a similar pattern is seen for the performance of outdoor face (Exp D_F), cross-surface gait (Exp D_G), and gait + face (Exp D_F+G) on data taken months apart.

5.2. Intra-modal combination

The performance of inter-modal combination has to be justified in the context of intra-modal combination.


Fig. 14. Performance of outdoor face (Exp S_F), cross-surface gait (Exp S_G), and gait + face (Exp S_F+G) on same-day data with various combination schemes for (a) identification and (b) verification scenarios.

Fig. 15. Performance of outdoor face (Exp D_F), cross-surface gait (Exp D_G), and gait + face (Exp D_F+G) on data taken months apart with various combination schemes for (a) identification and (b) verification scenarios.


Inter-modal combination involves the use of different types of sensors, resulting in added integration costs; the inter-modal performance gain has to be justified in this context. Inter-modal combination performance has to be greater than intra-modal combination [Kevin Bowyer, personal communication]. In the present context, the performance of gait and face combined should be greater than that of a combination of two faces or a combination of two gait signatures. For this, we consider the experiments S_F+F, S_G+G, D_F+F, and D_G+G in Table 3.

These intra-modal experiments involve the use of two samples per subject in the gallery and in the probe. Each probe is matched against the two gallery samples per person and the maximum similarity score is chosen as the similarity score for that probe. These similarity scores are then combined, as before, using the rules described in Section 4.

Fig. 16 plots the ROCs of the intra-modal combinations up to a false alarm rate of 5%. Each plot shows the performance with individual probes and their combinations. We see that intra-modal combination does not improve performance by a significant amount. Fig. 17 shows a summary comparison of the inter-modal and intra-modal combination schemes based on the verification rate at a false alarm rate of 5%. We see that the face + gait performance is better than the face + face or gait + gait combinations. This is explained by the strong correlation that exists between the scores for two probes from the same biometric: it is 0.7 for the intra-modal case and only 0.05 for


Fig. 16. Performance of intra-modal combination using different strategies. The ROC curves are shown for (a) face + face, same day, (b) gait + gait, same day, (c) face + face, months apart, and (d) gait + gait, months apart. Each plot shows the performance with individual probes and their combinations.

Fig. 17. Bar plot of verification rate at a false alarm rate of 5% for inter- and intra-modal combination of gait and face for (a) same-day data, and (b) data separated by months.


the inter-modal case. The stronger the correlation between the scores, the smaller the improvement from combination [46]. In fact, this improvement is not limited to the two hard covariates discussed in this paper. In Fig. 18 we compare the performance for all five covariates in our database: view, shoe-type, surface, briefcase, and time, in terms of the verification rate at a 5% false alarm rate. The results demonstrate that the combination substantially improves recognition.


Fig. 18. Bar plot of verification rate at a false alarm rate of 5% for the five covariates in the USF HumanID dataset: view, shoe-type, surface, briefcase, and time.


6. Discussion

6.1. Underlying reasons for the performance improvement from inter-modal combination

Table 4 lists the number of subjects that are affected by the combination. It lists the number of subjects who failed to be recognized by one modality or both, but were successfully recognized after combination. It also lists the number of subjects who were successfully recognized by one modality or both, but whose combination resulted in failure. We see that the performance gained by the combination comes mostly from subjects who failed on only one of the two biometrics. The combination helps little for subjects who were not correctly identified by either individual biometric. On the other hand, we found that few subjects were correctly identified by one classifier but failed after combination. This is especially true for the score sum combination.

To gain some insight into the nature of the face and gait combination, we plot the decision boundaries for the score sum and Gaussian Bayesian fusion rules at a 5% false alarm rate in Fig. 19. The axes are the normalized similarity scores from each modality. We see that (i) the optimal Bayesian decision boundary is roughly linear and is close to the score sum boundary, which

Table 4
Number of subjects correctly recognized or failed to be recognized by each individual modality or their combination for the same-day data

                          # failed before combination       # succeeded before combination
                          but succeeded after               but failed after
Combination scheme        Face only  Gait only  Both        Face only  Gait only  Both
Rank sum                  15         12         4           5          1          0
Confidence weighted sum   8          12         0           1          4          0
Score sum                 14         14         3           2          1          0
Bayesian rule             14         14         3           1          3          0

The total number of subjects is 39.

explains the high performance of the simple score sum scheme, and (ii) the non-match scores seem to be uncorrelated, forming a symmetric central cluster. This observation would be important for parametric modeling studies: Gaussian models seem to be a good fit for the non-match scores.

6.2. Investigation of gait shape changes with covariates

We studied the gait changes in terms of the silhouette shapes of each particular gait stance. For this, we employ linear discriminant analysis (LDA) of the silhouette shapes of each stance from a set of individuals, with the covariate under study being the "class" or "category" variable. For instance, to study the effect of surface, we place the HMM exemplars of a particular stance from grass in one class and the exemplars for the corresponding stance on concrete in the other class. The leading LDA dimension would be along the direction of maximum discriminability (the ratio of between-class to within-class scatter). It is interesting to see how different these directions are for the different covariates.
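For the two-class case used here, the leading LDA direction can be computed in closed form as S_w^{-1}(\mu_1 - \mu_2). The sketch below adds a small ridge term for the high-dimensional silhouette vectors; the regularization and the names are our additions.

```python
import numpy as np

def lda_direction(class_a, class_b, reg=1e-6):
    """Leading LDA direction for two classes of flattened stance
    silhouettes: for two classes, the direction maximizing the ratio of
    between-class to within-class scatter is S_w^{-1} (mu_a - mu_b)."""
    A = np.asarray(class_a, dtype=float)
    B = np.asarray(class_b, dtype=float)
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    Sw = np.cov(A.T, ddof=1) * (len(A) - 1) + np.cov(B.T, ddof=1) * (len(B) - 1)
    Sw += reg * np.eye(Sw.shape[0])   # ridge for near-singular scatter
    w = np.linalg.solve(Sw, mu_a - mu_b)
    return w / np.linalg.norm(w)      # unit vector; |w| imaged per pixel
```

The absolute value of each component of the returned vector, reshaped back to the silhouette grid, gives the per-pixel discriminability images of Fig. 20.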

Fig. 20 shows these dominant directions (leading eigenvectors of the LDA space) as images when comparing silhouettes on grass with shoe type A to (i) sequences on grass with shoe type B, (ii) sequences on concrete with shoe type A, (iii) sequences on grass with a briefcase, and (iv) sequences on grass taken 6 months apart. The intensity in the images indicates the discriminative body parts for the corresponding conditions. All the shown frames are normalized by the same scaling factor for display purposes. For the shoe-type change, no significant bright areas are seen, indicating that the silhouette shapes in the two classes are similar. For the surface change, we find bright spots in the leg portion, particularly in the leg swing phase (see the images in columns 2 and 3 of Fig. 20), suggesting changes in the minimum knee angles due to the surface change. In addition, the feet are highlighted because they are occluded by grass for many subjects. For briefcase carrying, the bright spots are concentrated mostly on the trunk, suggesting that people adjust their upper body stance when carrying weight, which is consistent with our expectations. Finally, for the temporal change, surprisingly, no significant pattern is seen despite the sharp drop in recognition.



Fig. 19. The decision boundaries of the score sum and Bayesian rule combination schemes at a false alarm rate of 5% for (a), (b) face + gait same and different days, (c), (d) face + face same and different days, and (e), (f) gait + gait same and different days, respectively.


7. Conclusion

Outdoor face recognition and gait recognition across surface conditions have been found to be hard problems. In addition, face and gait recognition over time (>3 months apart) is poor. We showed that stance selection can significantly improve gait recognition, but not to a level where gait alone suffices. We demonstrated that biometric combination


Fig. 20. Samples of the first eigenvectors in LDA space for the changes in shoe-type, surface, and time. The gait phase is column-wise aligned.


is an effective strategy for improving performance on these hard biometric problems, which involve comparing templates across indoor and outdoor conditions and across months. We find that the score sum rule of combination offers the best performance. We also find that the inter-modal combination, i.e., face + gait, is better not only than the individual modalities but also than combinations of the same modality, i.e., face + face and gait + gait, which can be considered statistical control experiments for showing the improvement from biometric fusion. The inter-modal combination has excellent potential for overcoming the "tough" covariates affecting individual biometrics.

Acknowledgements

This research was supported by funds from the DARPA Human ID program (F49620-00-1-00388). Ning Yang from Electrical Engineering at USF helped with groundtruthing the face images. The code developed by Ross Beveridge et al. at CSU was used to perform face recognition. We also thank Kevin Bowyer and P. Jonathon Phillips for their comments.

References

[1] B. Achermann, H. Bunke, Combination of face classifiers for person identification, in: International Conference on Pattern Recognition, 1996.

[2] C. BenAbdelkader, R. Cutler, L. Davis, Motion-based recognition of people in eigengait space, in: International Conference on Automatic Face and Gesture Recognition, 2002, pp. 267–272.

[3] A. Bobick, A. Johnson, Gait recognition using static, activity-specific parameters, in: Computer Vision and Pattern Recognition, 2001, pp. I:423–430.

[4] D.S. Bolme, Elastic bunch graph matching, Master's thesis, Colorado State University, 2003.

[5] K. Chang, K.W. Bowyer, S. Sarkar, B. Victor, Comparison and combination of ear and face images in appearance-based biometrics, IEEE Trans. Pattern Anal. Mach. Intel. (2003) 1160–1165.

[6] K.I. Chang, K.W. Bowyer, P.J. Flynn, Face recognition using 2D and 3D facial data, in: Workshop on Multimodal User Authentication, Santa Barbara, California, 2003, pp. 25–32.

[7] K.I. Chang, K.W. Bowyer, P.J. Flynn, Multi-modal 2D and 3D biometrics for face recognition, in: IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Nice, France, 2003.

[8] K.I. Chang, K.W. Bowyer, P.J. Flynn, X. Chen, Multi-biometrics using facial appearance, shape and temperature, in: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, May 17–19, 2004.

[9] X. Chen, P.J. Flynn, K.W. Bowyer, Visible-light and infrared face recognition, in: Proceedings of the Workshop on Multimodal User Authentication, Santa Barbara, CA, USA, 2003, pp. 48–55.

[10] R. Collins, R. Gross, J. Shi, Silhouette-based human identification from body shape and gait, in: International Conference on Automatic Face and Gesture Recognition, 2002, pp. 366–371.

[11] N. Cuntoor, A. Kale, R. Chellappa, Combining multiple evidences for gait recognition, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2003.

[12] U. Dieckmann, P. Plankensteiner, T. Wagner, A biometric person identification system using sensor fusion, Pattern Recogn. Lett. (1997) 827–833.

[13] J. Hayfron-Acquah, M. Nixon, J. Carter, Automatic gait recognition by symmetry analysis, in: 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, 2001, pp. 272–277.

[14] L. Hong, A. Jain, S. Pankanti, Can multibiometrics improve performance?, in: IEEE Workshop on Identification of Advanced Technologies, 1999, pp. 59–64.

[15] L. Hong, A.K. Jain, Integrating faces and fingerprints for personal identification, IEEE Trans. Pattern Anal. Mach. Intel. 20 (12) (1998) 1295–1307.


[16] A.K. Jain, L. Hong, Y. Kulkarni, A multimodal biometric systemusing fingerprints, face and speech, in: 2nd Int’l Conference on Audio-and Video-based Biometric Person Authentication, 1999, pp. 182–187.

[17] A.K. Jain, S. Prabhakar, S. Chen, Combining multiple matchers for ahigh security fingerprint verification system, Pattern Recogn. Lett. 20(11–13) (1999) 1371–1379.

[18] P. Jonathon Phillips, S. Sarkar, I. Robledo, P. Grother, K. Bowyer,Baseline results for the challenge problem of Human ID using gaitanalysis, in: International Conference on Automatic Face andGesture Recognition, 2002, pp. 137–142.

[19] P. Jonathon Phillips, S. Sarkar, I. Robledo, P. Grother, K. Bowyer,The gait identification challenge problem: data sets and baselinealgorithm, in: International Conference on Pattern Recognition,2002, pp. 385–388.

[20] A. Kale, N. Cuntoor, R. Chellappa, A framework for activity specifichuman identification, in: International Conference on Acoustics,Speech and Signal Processing, 2002.

[21] A. Kale, A. Rajagopalan, N. Cuntoor, V. Kruger, Gait-basedrecognition of humans using continuous HMMs, in: InternationalConference on Automatic Face and Gesture Recognition, 2002, pp.336–341.

[22] A. Kale, A. Roy Chowdhury, R. Chellappa, Fusion of gait and facefor human recognition, in: International Conference on Acoustics,Speech, and Signal Processing, 2004.

[23] J. Kittler, M. Hatef, R.P. Duin, J. Matas, On combining classifiers,IEEE Trans. Pattern Anal. Mach. Intel. 20 (3) (1998).

[24] L. Lee, G. Dalley, K. Tieu, Learning pedestrian models for silhouetterefinement, in: International Conference on Computer Vision, 2003.

[25] L. Lee, W. Grimson, Gait analysis for recognition and classification,in: International Conference on Automatic Face and GestureRecognition, 2002, pp. 155–162.

[26] J. Little, J. Boyd, Recognizing people by their gait: the shape ofmotion, Videre 1 (2) (1998) 1–33.

[27] Z. Liu, L. Malave, A. Osuntogun, P. Sudhakar, S. Sarkar, Towardunderstanding the limits of gait recognition, in: SPIE Processings ofDefense and Security Symposium: Biometric Technology for HumanIdentification, 2004.

[28] Z. Liu, L. Malave, S. Sarkar, Studies on silhouette quality and gaitrecognition, in: Computer Vision and Pattern Recognition, vol. II,2004, pp. 704–711.

[30] Z. Liu, S. Sarkar, Effect of silhouette quality on hard problems in gaitrecognition, IEEE Trans. Systems, Man, and Cybernetics (Part B) 35(2) (2005) 170–183.

[31] X. Lu, Y. Wang, A.K. Jain, Combining classifiers for face recogni-tion, in: IEEE International Conference on Multimedia And Expo,vol. 3, 2003, pp. 13–16.

[32] O. Melnik, Y. Vardi, C. Zhang, Mixed group ranks: Preference andconfidence in classifier combination, Rutgers Statistics DepartmentTech Report, 2003.

[33] S. Niyogi, E. Adelson, Analyzing gait with spatiotemporal surfaces,Computer Vision and Pattern Recognition (1994).

[34] P.J. Phillips, P. Grother, R.J. Micheals, D.M. Blackburn, E. Tabassi,M. Bone, Face recognition vendor test 2002, http://www.frvt.org,March 2002.

[35] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERETevaluation methodology for face-recognition algorithms, IEEE Trans.Pattern Anal. Mach. Intel. 22 (10) (2000).

[36] S. Prabhakar, A.K. Jain, Decision-level fusion in fingerprint verifi-cation, Pattern Recogn. 35 (4) (2002) 861–874.

[37] L. Rabiner, B.H. Juang, Fundamental of Speech Recognition,Prentice Hall, 1993.

[38] I. Robledo Vega, S. Sarkar, Representation of the evolution offeature relationship statistics: human gait-based recognition, IEEETrans. Pattern Anal Mach. Intel, to appear.

[39] A. Ross, A.K. Jain, Information fusion in biometrics, Pattern Recogn. Lett. 24 (2003) 2115–2125.

[40] B. Schiele, How many classifiers do I need?, in: International Conference on Pattern Recognition, 2002, pp. II: 176–179.

[41] G. Shakhnarovich, T. Darrell, On probabilistic combination of face and gait cues for identification, in: Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, 2002.

[42] G. Shakhnarovich, L. Lee, T. Darrell, Integrated face and gait recognition with multiple views, Computer Vision and Pattern Recognition (2001).

[43] J. Shutler, M. Nixon, C. Carter, Statistical gait description via temporal moments, in: 4th IEEE Southwest Symposium on Image Analysis and Interpretation, 2000, pp. 291–295.

[44] A. Sundaresan, A.R. Chowdhury, R. Chellappa, A hidden Markov model based framework for recognition of humans from gait sequences, in: IEEE International Conference on Image Processing, 2003.

[45] R. Tanawongsuwan, A. Bobick, Gait recognition from time-normalized joint-angle trajectories in the walking plane, in: Computer Vision and Pattern Recognition, 2001, pp. II: 726–731.

[46] D.M. Tax, M. van Breukelen, R.P. Duin, J. Kittler, Combining multiple classifiers by averaging or by multiplying? Pattern Recogn. 33 (9) (2000) 1475–1485.

[47] D. Tolliver, R. Collins, Gait shape estimation for identification, in: 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, 2003.

[48] A. Veeraraghavan, A.R. Chowdhury, R. Chellappa, Role of shape and kinematics in human movement analysis, in: Computer Vision and Pattern Recognition, Washington D.C., USA, 2004.

[49] L. Wang, W. Hu, T. Tan, A new attempt to gait-based human identification, in: International Conference on Pattern Recognition, vol. 1, 2002, pp. 115–118.

[50] L. Wang, T. Tan, H. Ning, W. Hu, Silhouette analysis-based gait recognition for human identification, IEEE Trans. Pattern Anal. Mach. Intel. 25 (2003) 1505–1518.

[51] Y. Wang, T. Tan, A.K. Jain, Combining face and iris biometrics for identity verification, in: International Conference on Audio- and Video-Based Biometric Person Authentication, 2003, pp. 805–813.

[52] L. Wiskott, J.-M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching, IEEE Trans. Pattern Anal. Mach. Intel. 19 (7) (1997) 775–779.

[53] K. Woods, W. Philip Kegelmeyer Jr., K. Bowyer, Combination of multiple classifiers using local accuracy estimates, IEEE Trans. Pattern Anal. Mach. Intel. 19 (4) (1997).

[54] J. Zhou, D. Zhang, Face recognition by combining several algorithms, in: International Conference on Pattern Recognition, 2002, pp. III: 497–500.

[55] Y. Zuev, S. Ivanov, The voting as a way to increase the decision reliability, Found. Inf./Decis. Fusion Appl. Eng. Probl. (1996) 206–210.