
1738 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 7, NO. 6, DECEMBER 2012

Monogenic Binary Coding: An Efficient Local Feature Extraction Approach to Face Recognition

Meng Yang, Student Member, IEEE, Lei Zhang, Member, IEEE, Simon Chi-Keung Shiu, Member, IEEE, and David Zhang, Fellow, IEEE

Abstract—Local-feature-based face recognition (FR) methods, such as Gabor features encoded by local binary pattern, could achieve state-of-the-art FR results in large-scale face databases such as FERET and FRGC. However, the time and space complexity of Gabor transformation are too high for many practical FR applications. In this paper, we propose a new and efficient local feature extraction scheme, namely monogenic binary coding (MBC), for face representation and recognition. Monogenic signal representation decomposes an original signal into three complementary components: amplitude, orientation, and phase. We encode the monogenic variation in each local region and monogenic feature in each pixel, and then calculate the statistical features (e.g., histogram) of the extracted local features. The local statistical features extracted from the complementary monogenic components (i.e., amplitude, orientation, and phase) are then fused for effective FR. It is shown that the proposed MBC scheme has significantly lower time and space complexity than the Gabor-transformation-based local feature methods. The extensive FR experiments on four large-scale databases demonstrated the effectiveness of MBC, whose performance is competitive with and even better than state-of-the-art local-feature-based FR methods.

Index Terms—Face recognition, Gabor filtering, LBP, monogenic binary coding, monogenic signal analysis.

I. INTRODUCTION

Automatic face recognition (FR) has been extensively studied in the past two decades, yet it is still an active research topic, partially because of its many challenges in practice such as uncontrolled environments, occlusions and variations in pose, illumination, expression and time. FR has a wide range of applications, including entertainment, smart cards, information security, law enforcement, access control and video surveillance [1], [2]. Various methods have been proposed for facial feature extraction and classification, among which the representatives include subspace learning (e.g., Eigenface [5], Fisherface [6], Laplacianfaces [37], 2DPCA-based discriminant analysis [17]), discriminative models for age invariant FR [3], [4], Gabor feature based classification (GFC) [7], local binary pattern (LBP) and its variants [8]–[11], [22], [23], local appearance and shape based methods for 3-D FR [24], [38], the recently developed

Manuscript received March 29, 2012; revised July 01, 2012; accepted August 18, 2012. Date of publication September 06, 2012; date of current version November 15, 2012. This work was supported by the HK RGC PPR Grant (PolyU5019-PPR-11). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jaihie Kim.

The authors are with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2012.2217332

Fig. 1. Pipeline of local statistical feature-based face recognition (LSF-FR).

sparse representation based methods [29], [30], [35], [39], [40], and the like.

Unlike many appearance-based FR methods, which are either holistic feature based (e.g., Eigenface and Fisherface) or local feature based (e.g., GFC), the adoption of LBP in FR [8] triggered the use of local statistical features (LSF) in the FR field. Fig. 1 shows the pipeline of LSF based FR. One can see that LSF-FR methods have two main phases: statistical histogram feature extraction and statistical feature combination. Histogram feature extraction can be further divided into three steps: feature map generation (e.g., original image, Gabor feature), pattern map coding (e.g., LBP) and subregion histogram computing. Almost all the LSF-FR methods [8]–[11], [22], [23] have similar procedures of subregion histogram computing (i.e., extracting the statistical information of the pattern feature in each subregion), which shows certain robustness to local deformations (e.g., pose, expression, and occlusion) of face images. However, different schemes of feature map generation and pattern map coding lead to different LSF-FR methods.

The well-known LBP operator [8] directly encodes the image intensity to extract image local pattern features. In order to overcome the sensitivity of intensity to image variations (e.g., illumination), Zhang et al. [9] proposed to extract directional Gabor magnitude features at multiple scales, and then apply

1556-6013/$31.00 © 2012 IEEE


LBP to the Gabor magnitude maps for robust LSF. The studies of Gabor phase based LSF-FR were conducted in [10], [11], [22]. Zhang et al. [10] adopted multiscale Gabor phase to take the place of Gabor magnitude in [9], and the global and local variations of the real part and imaginary part of the complex Gabor filtering coefficients were encoded in [11]. Recently, Xie et al. [22] utilized the XOR (exclusive or) operator to encode the local variation of Gabor phase, and then fused the Gabor-magnitude local pattern and the Gabor-phase local pattern. This scheme achieves very promising FR results.

For the second phase of LSF-FR, many feature combination methods have been proposed [8]–[11], [22], [26], [36]. The most commonly used strategy is weighting the histogram features extracted in different blocks [8]–[11]. In [26] and [36] kernel linear discriminant analysis was used to reduce the dimension of the histogram feature, and in [22] a block-based Fisher linear discriminant (BFLD) method was proposed to extract low-dimensional discriminative features. Compared with weighting based histogram feature combination, the discriminant subspace learning methods [22], [26], [36] are preferred because they not only improve the feature discrimination, but also greatly reduce the storage space.

Despite the great success of Gabor feature-based LSF-FR methods [9]–[11], [22], the expenses of Gabor transformations in these methods are rather high, including both the computational cost and the storage space, because the Gabor transformations of an image need to be implemented at multiple scales and orientations. For example, 5-scale and 8-orientation Gabor transformations are used in [7], [9]–[11], [22]. Therefore, for a single image 80 convolutions (40 real convolutions and 40 imaginary convolutions) are required to generate the Gabor features (e.g., 40 Gabor magnitude feature maps). The many convolutions and the Gabor feature maps make Gabor feature generation a process of high time and space complexity, preventing its wide acceptance in practical applications.

In our previous work [23], we investigated the use of

monogenic signal analysis [12] for LSF-FR, and our proposed monogenic signal decomposition based local feature extraction has much lower time and space complexity than the Gabor filtering based methods [9], [11] but leads to competitive performance in FR. As a two-dimensional generalization of the one-dimensional analytic signal representation, monogenic signal representation decomposes an image into amplitude, phase, and orientation components, which represent the signal energetic, structural, and geometric information, respectively [12]. Different from Gabor transformation, monogenic signal representation does not use steerable filters to extract multiple-orientation features, and thus it has much lower time (e.g., 3 convolutions on each scale) and space (e.g., 3 feature maps on each scale) complexity than Gabor transformations.

To further exploit the discrimination information embedded in the amplitude, phase and orientation components of monogenic signal representation, in this paper we propose an efficient and effective LSF-FR scheme, namely monogenic binary coding (MBC), which encodes the local pattern in different monogenic feature maps. The BFLD [22] is then adopted to extract the low-dimensional discriminative features from the generated MBC feature maps of amplitude, phase and orientation, respectively. Finally, the three types of LSFs are fused for face classification. One of the most important advantages of the proposed MBC scheme is its low time and space complexity while achieving very competitive performance with the Gabor-feature based LSF-FR methods [9]–[11], [22]. The proposed MBC method is validated on benchmark large scale face databases, including Multi-PIE [49], FERET [14], [41], FRGC 2.0 [13] and PolyU-NIR [28]. The experimental results verified the efficiency and effectiveness of the proposed MBC based LSF-FR method.

The rest of the paper is organized as follows. Section II briefly introduces the monogenic signal representation. Section III presents in detail the MBC algorithm. Section IV describes the whole scheme of MBC based LSF-FR. Section V presents the experimental results, and Section VI concludes the paper.

II. MONOGENIC SIGNAL REPRESENTATION

Monogenic signal is an important generalization of the analytic signal from one dimension (1-D) to two dimensions (2-D), and it preserves the desired properties of 1-D analytic signal. One prominent property of monogenic signal representation lies in that its feature extraction process (phase, amplitude and orientation estimation) is truly rotation-invariant [32]. Monogenic signal representation has been used in many applications such as face recognition [23], scale-space analysis [31], and texture classification [33]. In this section, we first briefly introduce the analytic signal, and then describe the monogenic signal representation.

A. Analytic Signal Representation

The analytic signal is a complex-valued representation in 1-D signal processing. Given a real valued 1-D signal $f(x)$, its complex analytic signal is defined as

$$f_A(x) = f(x) + i f_H(x) \tag{1}$$

where $f_H(x) = f(x) * h(x)$, and $h(x) = 1/(\pi x)$ refers to the Hilbert transform kernel in the spatial domain. In general the Hilbert transform is performed in the frequency domain, and the response of $h(x)$ in the frequency domain is $H(\omega) = -i\,\mathrm{sign}(\omega)$.

The analytic signal representation allows one to retrieve the local amplitude $A(x)$ and local phase $\varphi(x)$:

$$A(x) = \sqrt{f^2(x) + f_H^2(x)} \tag{2}$$

$$\varphi(x) = \operatorname{atan2}\big(f_H(x), f(x)\big) \tag{3}$$

where the local phase $\varphi(x)$ is invariant with respect to the local energy of the signal but changes if the local structure varies, and the local amplitude $A(x)$ is invariant with respect to the local structure but represents the local energy. Table I shows the relation between feature type and phase angle [18].
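The decomposition in (1)–(3) can be illustrated with a minimal NumPy sketch. It assumes a periodic, uniformly sampled signal and applies the frequency response $-i\,\mathrm{sign}(\omega)$ through the FFT; the function name `analytic_signal` and the test signal are hypothetical choices made for illustration only.

```python
import numpy as np

def analytic_signal(f):
    """Return f + i*f_H, where f_H is obtained with the response -i*sign(w)."""
    f = np.asarray(f, dtype=float)
    w = np.fft.fftfreq(f.size)          # signed frequencies, one per FFT bin
    H = -1j * np.sign(w)                # Hilbert transform in the frequency domain
    f_h = np.real(np.fft.ifft(np.fft.fft(f) * H))
    return f + 1j * f_h

# Hypothetical test signal: a cosine of amplitude 2 with 8 cycles in 256 samples.
x = np.arange(256)
f = 2.0 * np.cos(2 * np.pi * 8 * x / 256)
fa = analytic_signal(f)

amplitude = np.abs(fa)                  # local amplitude, as in (2)
phase = np.arctan2(fa.imag, fa.real)    # local phase, as in (3)
```

For this pure cosine the local amplitude is constant (the energy does not change) while the local phase sweeps through the cycle, which is the separation of energy and structure described above.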

B. Monogenic Signal Representation

Monogenic signal was introduced by Felsberg and Sommer in 2001 [12] to generalize the analytic signal from 1-D to 2-D. The monogenic signal is built around the Riesz transform, which is a natural multidimensional extension of the Hilbert transform [19], [32]. The Riesz transform is the scalar-to-vector signal transformation whose frequency response is $-i\omega/\|\omega\|$. In 2-D, the Riesz transform can be expressed as

$$f_R(z) = \begin{pmatrix} f_x(z) \\ f_y(z) \end{pmatrix} = \begin{pmatrix} h_x * f(z) \\ h_y * f(z) \end{pmatrix} \tag{4}$$

where $f(z)$ with $z = (x, y)$ is the input signal, and filters $h_x$ and $h_y$ are characterized by the 2-D frequency responses $H_x = -i\omega_x/\|\omega\|$ and $H_y = -i\omega_y/\|\omega\|$ with $\omega = (\omega_x, \omega_y)$. It is easy to obtain the spatial representation of the Riesz kernel, which is

$$\big(h_x(z), h_y(z)\big) = \left(\frac{x}{2\pi\|z\|^3}, \frac{y}{2\pi\|z\|^3}\right) \tag{5}$$

Fig. 2. Toy examples of monogenic phase. (a) Two image structures; (b) their shapes shown in 3-D; and (c) the monogenic phases of them.

TABLE I
SOME 1-D FEATURES AND THEIR CORRESPONDING PHASE ANGLES (DEGREE)

For an image $f(z)$, its monogenic signal is defined as the combination of $f(z)$ and its Riesz transform

$$f_M(z) = \big(f(z), f_x(z), f_y(z)\big) = \big(f(z), h_x * f(z), h_y * f(z)\big) \tag{6}$$

In (6), $f(z)$ is called the real part of the monogenic signal, and $f_x(z)$ and $f_y(z)$ the two imaginary parts. Based on the real and imaginary parts, the original image signal $f(z)$ could be orthogonally decomposed into three components, local amplitude, local phase and local orientation, which are defined as [12]

$$A = \sqrt{f^2 + f_x^2 + f_y^2}, \qquad \varphi = -\operatorname{sign}(f_x)\,\operatorname{atan2}\!\Big(\sqrt{f_x^2 + f_y^2},\, f\Big), \qquad \theta = \operatorname{atan}\!\big(f_y / f_x\big) \tag{7}$$

where $A$ describes the local energetic information, $\varphi$ describes the local structural information, and $\theta$ describes the geometric information.

The local phase and local orientation compose the monogenic phase of a 2-D signal [12]. In contrast to the 1-D case, the monogenic phase includes additional geometric information, which indicates the main orientation in the ideal case (i.e., the signal is constant in one direction). Two toy examples of image features (e.g., step and roof) are shown in Fig. 2. Fig. 2(a) shows the step edge and roof edge features in a 2-D image, with Fig. 2(b) the corresponding 3-D shapes of them. The geometric information (i.e., the main orientation) is illustrated by the local orientation shown in Fig. 2(a), with the structure information shown by the local phase in Fig. 2(c).

C. Multiresolution Monogenic Signal Representation

In practice, the signals are of finite length, and we need to perform band-pass filtering on an image before applying the Riesz transforms. On the other hand, as indicated by [12], band-pass filtering has the benefit of maintaining the invariance-equivariance property of signal decomposition. Here the invariance-equivariance property [21], [12] means that energy (local amplitude) and structure (local phase and local orientation) are independent information.

Classical Gabor filters are commonly used as band-pass filters because they offer the best simultaneous localization in space and frequency. However, these filters overlap more in low frequencies than in high frequencies, leading to a nonuniform coverage of the Fourier domain. Moreover, Gabor filters have nonzero means, and thus they are affected by direct current (DC) components [34]. An alternative to the Gabor filters is the log-Gabor filters proposed in [20]. The log-Gabor filters discard the DC components and can overcome the bandwidth limitation of traditional Gabor filters. Meanwhile, a log-Gabor filter has a Gaussian shaped response along the logarithmic frequency scale instead of a linear one. This allows more information to be captured in the high frequency band and endows desirable high pass characteristics.

Fig. 3. (a) Face image and its monogenic representation on one scale: (b) amplitude component, (c) orientation component, and (d) phase component.

The frequency response of log-Gabor filters can be described as

$$G(\omega) = \exp\left\{-\frac{\big[\log(\omega/\omega_0)\big]^2}{2\big[\log(\sigma/\omega_0)\big]^2}\right\} \tag{8}$$

where $\omega_0$ is the center frequency and $\sigma$ is the scaling factor of the bandwidth. To obtain filters with a constant shape ratio, we set $\sigma/\omega_0$ to a constant as suggested in [43]. It is worth noting that a log-Gabor filter with a 3-octave bandwidth has the same spatial width as that of a 1-octave Gabor filter, demonstrating its ability to capture broad spectral information with compact support [42].

The band-pass monogenic signal representation is defined as

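$$f_{lg\text{-}mo}(z) = \big(f_{lg}(z),\; h_x * f_{lg}(z),\; h_y * f_{lg}(z)\big) \tag{9}$$

where $f_{lg}(z) = f(z) * F^{-1}\big(G(\omega)\big)$, and $F^{-1}$ represents the 2-D inverse Fourier transform. Similarly, the local amplitude $A$, orientation $\theta$ and phase $\varphi$ of a 2-D signal can be computed by [12]

$$A = \sqrt{f_{lg}^2 + f_{lg,x}^2 + f_{lg,y}^2}, \qquad \theta = \operatorname{atan}\!\big(f_{lg,y} / f_{lg,x}\big), \qquad \varphi = -\operatorname{sign}(f_{lg,x})\,\operatorname{atan2}\!\Big(\sqrt{f_{lg,x}^2 + f_{lg,y}^2},\, f_{lg}\Big) \tag{10}$$

with $f_{lg,x} = h_x * f_{lg}$ and $f_{lg,y} = h_y * f_{lg}$.

Since log-Gabor filters are band-pass filters, usually a multiscale monogenic representation is required to fully describe a signal. In multiscale log-Gabor filters, the parameters $\omega_0$ and $\sigma$ can be rewritten as

$$\omega_0 = \big(\lambda_{\min}\,\mu^{\,s-1}\big)^{-1}, \qquad \sigma = \sigma_{ratio}\,\omega_0 \tag{11}$$

where $\lambda_{\min}$ is the minimal wavelength, $\mu$ is the multiply factor of wavelength, $s$ is the scale index, and $\sigma_{ratio}$ is the ratio $\sigma/\omega_0$.

As an example, Fig. 3 shows the monogenic representation of a face image on one scale. We can see that the amplitude components reflect the local variation of gray value, while the facial local structures are well captured in the local phase and orientation components.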
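A minimal NumPy sketch of the multiscale band-pass monogenic decomposition described in this section is given below. It is an illustration under stated assumptions, not the authors' Matlab implementation: the function name `monogenic`, the default parameter values, the handling of the DC bin, and the sign and range conventions for phase and orientation are all assumptions. The log-Gabor filter and the two Riesz filters are applied in the frequency domain, so each scale costs three inverse transforms.

```python
import numpy as np

def monogenic(img, n_scales=3, min_wavelength=4.0, mult=2.0, sigma_ratio=0.65):
    """Band-pass monogenic decomposition; all default values are hypothetical."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    wy = np.fft.fftfreq(rows)[:, None]      # vertical frequency grid
    wx = np.fft.fftfreq(cols)[None, :]      # horizontal frequency grid
    radius = np.sqrt(wx ** 2 + wy ** 2)
    radius[0, 0] = 1.0                      # avoid log(0) and 0/0 at the DC bin
    Hx = -1j * wx / radius                  # Riesz frequency responses
    Hy = -1j * wy / radius
    F = np.fft.fft2(img)
    scales = []
    for s in range(1, n_scales + 1):
        w0 = 1.0 / (min_wavelength * mult ** (s - 1))   # centre frequency
        G = np.exp(-np.log(radius / w0) ** 2 / (2 * np.log(sigma_ratio) ** 2))
        G[0, 0] = 0.0                       # log-Gabor filters discard the DC part
        f_lg = np.real(np.fft.ifft2(F * G))
        f_x = np.real(np.fft.ifft2(F * G * Hx))
        f_y = np.real(np.fft.ifft2(F * G * Hy))
        amplitude = np.sqrt(f_lg ** 2 + f_x ** 2 + f_y ** 2)
        # Assumed conventions: signed phase, orientation folded by atan.
        phase = -np.sign(f_x) * np.arctan2(np.hypot(f_x, f_y), f_lg)
        safe_fx = np.where(f_x == 0, 1e-12, f_x)        # guard against division by zero
        orientation = np.arctan(f_y / safe_fx)
        scales.append((amplitude, phase, orientation, f_x, f_y))
    return scales
```

Calling `monogenic(image)` on a 2-D array returns one tuple per scale; the two Riesz outputs are returned alongside the three components because the quadrant coding in Section III-B operates on them.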

D. Multiscale Monogenic Signal Representation versusMultiscale Gabor Wavelet Representation

Traditional 2-D Gabor filters are highly jointly localized in spatial location, orientation and frequency. Usually 5-scale and 8-orientation Gabor filters are used in face recognition [7], [9]–[11], [22], [36], where the Gabor features at each orientation are extracted by using steerable filters.

It is easy to see that both the multiscale 2-D Gabor wavelet (i.e., Gabor filters) representation and the multiscale monogenic signal representation are redundant. For multiscale Gabor wavelets, the redundancy comes from two aspects: one is the redundancy across multiple scales, and the other is the redundancy across multiple orientations on each scale. However, for the multiscale monogenic signal representation, the redundancy only comes from the multiscale representation, because on each scale the local amplitude, local phase, and local orientation are orthogonal. Although multiscale Gabor filtering is effective for FR, the high cost of Gabor representation (80 convolutions for complex Gabor filtering across 5 scales and along 8 orientations) limits its application in practice. On the other hand, although the multiscale monogenic signal representation has no redundancy along orientation, this does not mean that it is not effective for face representation. Our previous work in [23] has shown that the combination of orientation and amplitude components of the monogenic signal representation could achieve even better recognition rates than the multiscale Gabor wavelet representation but with significantly lower time and space complexity.
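The convolution counts quoted in the paper can be reproduced with a short arithmetic sketch. The helper names are hypothetical; the inputs (5 scales and 8 orientations for Gabor, 3 convolutions per scale and 3 scales for the monogenic representation) are the figures stated in the text.

```python
def gabor_convolutions(n_scales=5, n_orientations=8):
    # One real and one imaginary convolution per (scale, orientation) filter.
    return 2 * n_scales * n_orientations

def monogenic_convolutions(n_scales=3):
    # One log-Gabor filtering and two Riesz transforms per scale.
    return 3 * n_scales

ratio = monogenic_convolutions() / gabor_convolutions()   # 9 / 80
```

The ratio 9/80 is roughly one ninth, which matches the reduction in feature map generation cost reported in Section V-B.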

III. MONOGENIC BINARY CODING

In this section, we describe the proposed algorithm of coding the monogenic signal features, which contains two parts: monogenic local variation coding and monogenic local intensity coding. The first part encodes the variation between the central pixel and its surrounding pixels in a local patch, while the second part encodes the value of the central pixel itself. Obviously, these two parts are complementary.

A. Binary Coding of Monogenic Local Variation

Referring to Fig. 4, for each local patch (e.g., a 3 × 3 neighborhood) a typical local variation coding usually consists of the following three stages: binary quantization, binary sequence generation, and binary sequence to decimal value conversion. Among these, binary sequence generation and binary sequence to decimal value conversion are the common stages in most binary pattern coding methods, such as LBP [8], [16], LGBP [9], HGPP [11] and LGXP [22]. An example of these two steps is shown in the second row of Fig. 4. The decimal value of the binary code is defined as follows:

$$F(z_c) = \sum_{i=1}^{N} C_i(z_c)\, 2^{\,i-1} \tag{12}$$

where $z_c$ denotes the central pixel position in the local patch of the feature map, $C_i(z_c)$ denotes the binary code (0 or 1) of the $i$th neighbor of the central pixel and $N$ is the number of involved neighbors (we fix $N = 8$).

However, the first step of local variation coding, i.e., binary quantization, usually depends on the property of the feature map, and should be designed based on the physical meaning of the feature. For example, in the binary quantization of LBP [8], which regards the original image as the feature map, the surrounding pixel is coded as 1 if its value is not less than the value of the central pixel; otherwise, the surrounding pixel is coded as 0. In HGPP [11], the surrounding pixel is coded as 0 if it has the same sign as that of the central pixel; otherwise, it is quantized to 1. The monogenic signal representation has three components: local amplitude, local orientation and local phase. The local orientation and local phase compose the monogenic phase. According to the properties of local amplitude and monogenic phase, different binary quantization strategies are utilized to generate the binary code.

Fig. 4. Typical process of local variation coding.

1) Local Variation of Monogenic Amplitude: The local amplitude of monogenic signal representation is a measurement of local energetic information. For example, high amplitude usually indicates higher energetic local features (e.g., edges, lines, textures). Therefore, the local variation of monogenic amplitude could be coded by comparing the amplitude value of the central location with those of its neighbors, which is similar to LBP [8]. For a local patch, let us denote by $A(z_c)$ the amplitude value of the central pixel $z_c$, and by $A(z_i)$ the amplitude value of the $i$th neighbor of the central pixel. Usually, we normalize the range of the amplitude to [0, 255]. Then the amplitude binary code of the $i$th neighbor is defined as

$$C_i^A(z_c) = \begin{cases} 1, & A(z_i) \ge A(z_c) \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

Consequently, the amplitude binary code, denoted by $C^A(z_c)$, of the central pixel is formed as $C^A(z_c) = [C_1^A(z_c), C_2^A(z_c), \ldots, C_N^A(z_c)]$.

2) Local Variation of Monogenic Phase: For 2-D signals, monogenic phase is decomposed into local orientation and local phase. The local orientation describes the geometric information, more specifically the main orientation of the local structure, and the local phase describes the information of the local structure along the main orientation (i.e., structure type as shown in Table I). Both local orientation and local phase are represented by angles in degrees. In this paper, we unwrap both of them into the range of [0, 360). Because the angle of local phase or local orientation indicates the structural or geometric information, it is more meaningful to compare the feature type than to directly compare the angle values. Therefore, the binary coding of local phase and local orientation variations should adopt a different strategy from that of local amplitude. Inspired by the coding strategy of Gabor phase in HGPP [11] and LGXP [22], we consider two cases when comparing the monogenic features: one is that the local phases/orientations are the same or close enough to each other and the other is that the local phases/orientations are different.

In order to increase the robustness of binary coding, we first divide the range [0, 360) into $M$ intervals ($M$ is fixed in this paper). If two phases/orientations belong to the same interval, they are believed to be similar local features; otherwise, they are regarded as different. For a local patch, denote by $\varphi(z_c)$ the local phase of the central pixel $z_c$, and by $\varphi(z_i)$ the phase of its $i$th neighbor. The phase binary code of the $i$th neighbor is

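$$C_i^P(z_c) = \begin{cases} 0, & Q\big(\varphi(z_c)\big) = Q\big(\varphi(z_i)\big) \\ 1, & \text{otherwise} \end{cases} \tag{14}$$

where $Q(\cdot)$ is the quantization function, defined as

$$Q(\varphi) = p, \quad \text{if } \frac{360\,p}{M} \le \varphi < \frac{360\,(p+1)}{M}, \qquad p = 0, 1, \ldots, M-1 \tag{15}$$

with $M$ the number of intervals. Then the local phase binary code, denoted by $C^P(z_c)$, of the central pixel is formed as $C^P(z_c) = [C_1^P(z_c), C_2^P(z_c), \ldots, C_N^P(z_c)]$.

Similarly, let us denote by $\theta(z_c)$ the local orientation of the central pixel $z_c$, and by $\theta(z_i)$ the local orientation of its $i$th neighbor. The orientation binary code of the $i$th neighbor is

$$C_i^O(z_c) = \begin{cases} 0, & Q\big(\theta(z_c)\big) = Q\big(\theta(z_i)\big) \\ 1, & \text{otherwise} \end{cases} \tag{16}$$

And then the local orientation binary code, denoted by $C^O(z_c)$, of the central pixel is formed as $C^O(z_c) = [C_1^O(z_c), C_2^O(z_c), \ldots, C_N^O(z_c)]$.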
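The two local variation coding strategies of this subsection can be sketched in NumPy as follows. This is a sketch under assumptions: the neighbor ordering, the bit weights, the use of "not less than" for the amplitude comparison (following the LBP rule quoted above), and the function names are all illustrative choices, and the number of intervals is passed in as a parameter rather than fixed.

```python
import numpy as np

# The 8 neighbours of a 3x3 patch, clockwise from the top-left (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def shifted_neighbours(fmap):
    """Yield (bit index, neighbour values) aligned with the interior pixels."""
    h, w = fmap.shape
    for i, (dy, dx) in enumerate(OFFSETS):
        yield i, fmap[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]

def amplitude_variation_code(amp):
    """LBP-like code: a neighbour contributes 1 if it is not less than the centre."""
    amp = np.asarray(amp, dtype=float)
    centre = amp[1:-1, 1:-1]
    code = np.zeros(centre.shape, dtype=np.uint16)
    for i, nb in shifted_neighbours(amp):
        code += (nb >= centre).astype(np.uint16) * np.uint16(1 << i)
    return code

def angle_variation_code(angle_deg, n_intervals):
    """XOR-like code: a neighbour contributes 1 if it falls in a different interval."""
    angle = np.mod(np.asarray(angle_deg, dtype=float), 360.0)
    q = np.floor(angle / (360.0 / n_intervals)).astype(int)
    q = np.minimum(q, n_intervals - 1)      # guard against rounding at 360
    centre = q[1:-1, 1:-1]
    code = np.zeros(centre.shape, dtype=np.uint16)
    for i, nb in shifted_neighbours(q):
        code += (nb != centre).astype(np.uint16) * np.uint16(1 << i)
    return code
```

Both functions return a code map for the interior pixels only, so the output is two rows and two columns smaller than the input; border handling is left open here.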

B. Binary Coding of Monogenic Local Imaginary Intensity

Almost all the LSF-FR methods, including LBP [8], LGBP [9] and LGXP [22], code the variation information of a patch for pattern recognition but omit the feature information of the central pixel in the local patch. In fact, the central pixel itself can carry discriminative information which may not be captured by the local variation. For instance, two pixels having the same local variation pattern may have very different intensities. In [25], Guo et al. proposed to quantize the local intensity of the central pixel based on the average gray level of the whole image for texture classification; however, this binary quantization scheme may not be robust, especially for FR with illumination changes.

In the monogenic signal representation, the Riesz transform is asymmetric and it can suppress the DC component [12]. Such properties make the imaginary part of the Riesz transform able to enhance the feature intensity while being robust to disturbances such as illumination changes. Therefore, in this paper we propose to use the imaginary part of the monogenic signal representation to encode the local feature intensity information of the central pixel. Since there are two components (i.e., $f_{lg,x}$ and $f_{lg,y}$) in the imaginary part of the monogenic signal representation, the binary coding leads to a quadrant bit for the monogenic imaginary feature at a location, as illustrated in Fig. 5.

Fig. 5. Quadrant bit coding of monogenic imaginary feature at each location.

Referring to Fig. 5, the imaginary part of the monogenic signal representation at pixel $z_c$ is encoded into two bits, $\big(C_x^I(z_c), C_y^I(z_c)\big)$, by the following rule:

$$C_x^I(z_c) = \begin{cases} 0, & f_{lg,x}(z_c) > 0 \\ 1, & f_{lg,x}(z_c) \le 0 \end{cases} \qquad C_y^I(z_c) = \begin{cases} 0, & f_{lg,y}(z_c) > 0 \\ 1, & f_{lg,y}(z_c) \le 0 \end{cases} \tag{17}$$

where $f_{lg,x}$ and $f_{lg,y}$ (see (9)) are respectively the horizontal and vertical Riesz transform outputs of the monogenic signal representation.
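The quadrant bit coding amounts to taking one sign bit from each of the two Riesz outputs. The sketch below assumes the polarity "0 for a positive response, 1 otherwise"; the source only states that the sign pattern of the two imaginary components selects one of four quadrants, so the polarity and the function name `quadrant_bits` are assumptions.

```python
import numpy as np

def quadrant_bits(f_x, f_y):
    """Return one bit map per Riesz output (assumed: 0 if positive, 1 otherwise)."""
    c_x = (np.asarray(f_x) <= 0).astype(np.uint16)
    c_y = (np.asarray(f_y) <= 0).astype(np.uint16)
    return c_x, c_y

# Hypothetical Riesz outputs at two pixels.
c_x, c_y = quadrant_bits(np.array([1.0, -1.0]), np.array([-2.0, 3.0]))
```

Because only the signs are used, the two bits are unchanged when the responses are multiplied by a positive constant, which is one reason such a code can tolerate global contrast changes.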

C. Monogenic Binary Code (MBC)

With the coding procedures described in Sections III-A and III-B, for each of the amplitude, phase and orientation components of the monogenic signal representation, we can have a monogenic binary code (MBC) map. We denote by MBC-A, MBC-P and MBC-O the code maps for amplitude, phase and orientation, respectively. At each location, MBC-A, MBC-P and MBC-O are formed by combining the local imaginary intensity code and the local variation code as follows

$$\text{MBC-A}(z_c) = \big[C_x^I(z_c),\; C_y^I(z_c),\; C^A(z_c)\big] \tag{18}$$

$$\text{MBC-P}(z_c) = \big[C_x^I(z_c),\; C_y^I(z_c),\; C^P(z_c)\big] \tag{19}$$

$$\text{MBC-O}(z_c) = \big[C_x^I(z_c),\; C_y^I(z_c),\; C^O(z_c)\big] \tag{20}$$

It can be seen that when the 8 closest neighbors of a pixel are involved in local variation coding, each MBC pattern will have 10 bits. Then the number of possible patterns for each MBC is 1024, which is larger than that of previous binary coding methods such as LBP, LGBP, HGPP and LGXP (refer to Table II). A larger number of patterns can characterize the local signal structure more accurately; at the same time, it also makes the generated histogram in each subregion a little sparser. However, this sparser histogram can still have enough (even higher) discrimination capability and result in very good FR performance (please refer to the section of experimental results).

TABLE II
NUMBERS OF PATTERNS IN MBC-X, LBP, AND GABOR-RELATED CODING METHODS (E.G., LGBP, HGPP, AND LGXP) WHEN THE EIGHT CLOSEST NEIGHBORS OF A PIXEL ARE INVOLVED IN CODING

The above three MBC maps can be used individually as the feature for face classification, and they can also be used together to enhance the FR performance, as we will describe in the next section.
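Combining the two intensity bits with an 8-bit variation code gives a 10-bit pattern, as the following sketch shows. Placing the intensity bits in the two most significant positions is an assumption made for illustration (any fixed bit layout yields the same 1024 possible patterns), and `mbc_code` is a hypothetical helper name.

```python
import numpy as np

def mbc_code(c_x, c_y, variation_code):
    """Pack two intensity bits and an 8-bit variation code into a 10-bit value."""
    c_x = np.asarray(c_x, dtype=np.uint16)
    c_y = np.asarray(c_y, dtype=np.uint16)
    variation = np.asarray(variation_code, dtype=np.uint16)
    # Assumed layout: c_x -> bit 9, c_y -> bit 8, variation code -> bits 7..0.
    return c_x * np.uint16(512) + c_y * np.uint16(256) + variation

n_patterns = 2 ** 10   # number of distinct MBC patterns with 8 neighbours
```

The same packing applies to MBC-A, MBC-P and MBC-O; only the variation code passed in differs.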

IV. FACE RECOGNITION BY MONOGENIC BINARY CODE

A. Face Recognition by Individual MBC

The statistical information of image local areas can be described by local histograms, which are robust to image occlusion and variations of pose, expression, noise, etc. After computing the multiscale MBC-A, MBC-P and MBC-O feature maps of the face image, we can construct three histograms, $H^A$, $H^P$ and $H^O$, through the following procedures. Each kind of pattern map on each scale is partitioned into multiple nonoverlapping regions, and then the local histogram is built for each subregion. Finally, all the local histograms across different scales and different regions are concatenated into a single histogram vector to represent the face image. Formally, the $H^A$, $H^P$ and $H^O$ histograms are formulated as

$$H^X = \big[H^X_{1,1}, \ldots, H^X_{1,L}, \ldots, H^X_{S,1}, \ldots, H^X_{S,L}\big], \qquad X \in \{A, P, O\} \tag{21}$$

where $L$ is the number of subregions on each scale; $s = 1, \ldots, S$ denotes the scale index; and $H^X_{s,l}$ is the histogram of the MBC-X feature map in the $l$th subregion on scale $s$.

To be consistent with other LSF based FR methods [9]–[11], [22], the similarity between two histograms $H^1$ and $H^2$ is defined as their intersection:

$$S(H^1, H^2) = \sum_{j=1}^{B} \min\big(H^1_j, H^2_j\big) \tag{22}$$

where $B$ is the number of bins in the histogram, and $H^1_j$ and $H^2_j$ denote the frequency of the $j$th bin in $H^1$ and $H^2$, respectively. By measuring the histogram intersection, the similarity of the $H^A$, $H^P$ and $H^O$ histogram features from gallery and probe face images could be computed for classification.
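The subregion histogram construction and the histogram intersection can be sketched as below. The grid size, the bin count default and the function names are illustrative assumptions; for a multiscale representation the per-scale vectors would simply be concatenated in scale order.

```python
import numpy as np

def region_histograms(code_map, grid=(2, 2), n_bins=1024):
    """Concatenate the histograms of nonoverlapping subregions of one code map."""
    code_map = np.asarray(code_map)
    row_groups = np.array_split(np.arange(code_map.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(code_map.shape[1]), grid[1])
    hists = []
    for r in row_groups:
        for c in col_groups:
            block = code_map[np.ix_(r, c)].astype(np.intp)
            # Codes are assumed to lie in [0, n_bins).
            hists.append(np.bincount(block.ravel(), minlength=n_bins))
    return np.concatenate(hists)

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two histograms of equal length."""
    return int(np.minimum(np.asarray(h1), np.asarray(h2)).sum())
```

With these helpers, a probe is matched by computing `histogram_intersection` between its concatenated histogram and that of each gallery image and picking the largest value.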

B. Face Recognition by Fused MBC

It is obvious that different subregions in the face image will have different discriminative power for FR. Therefore, many methods [8]–[11], [23] assign different weights to the histograms of different subregions before matching. This is actually a feature combination process and could improve the FR rate. However, this weighting scheme cannot reduce the feature dimensionality. Considering that the kernel linear discriminant analysis adopted in [26], [36] has high time complexity, we adopt the BFLD [22] scheme to reduce the histogram feature dimension while enhancing its discrimination. The MBC feature map is first partitioned into multiple blocks, and then each block is further partitioned into multiple subregions. In each subregion, the histogram of the feature map on each scale is built as described in Section IV-A. The histogram feature of each block is the concatenation of all the histograms of its subregions. Then for the histogram feature of each block, LDA [6] is used to learn a projection matrix from the training set, and the dimensionality reduced histogram feature is obtained by using this projection matrix.

Denote by $y_k^{X,p}$ and $y_k^{X,g}$ the dimensionality reduced features of MBC-X in the $k$th block of a probe face image and a gallery image, respectively, where $X \in \{A, P, O\}$ and $k = 1, 2, \ldots, K$, with $K$ the number of blocks. Since $y_k^{X,p}$ and $y_k^{X,g}$ are not histogram features anymore, we compute their similarity using the cosine distance:

$$S\big(y_k^{X,p}, y_k^{X,g}\big) = \frac{\big\langle y_k^{X,p},\, y_k^{X,g} \big\rangle}{\big\|y_k^{X,p}\big\|\,\big\|y_k^{X,g}\big\|} \tag{23}$$

where $\langle \cdot, \cdot \rangle$ is the inner product operator. The cosine distance is widely used in face recognition [22], [26], [36], and in [45] it has been reported that the cosine distance performs better than $L_1$, $L_2$ and Mahalanobis distances in most face recognition experiments. In addition, using the cosine distance also makes the comparison with [22] fair. The similarity between the whole probe and gallery images is computed as

$$S^X = \sum_{k=1}^{K} S\big(y_k^{X,p}, y_k^{X,g}\big) \tag{24}$$

The monogenic amplitude, phase and orientation components describe the energetic, structural and geometric information of a 2-D signal, respectively, and they are complementary to each other. Therefore, it is useful to fuse these three types of features for a more robust FR result. In this paper, we use the simple weighted average to fuse the three similarities, i.e., $S^A$, $S^P$ and $S^O$, as follows

$$S^F = w\,S^A + \frac{1 - w}{2}\,\big(S^P + S^O\big) \tag{25}$$

where $w$ is the weight. Note that in (25) we let the weights assigned to $S^P$ and $S^O$ be equal because the local orientation and local phase have almost the same performance in various databases. Therefore, there is only one weight parameter, i.e., $w$, to be determined in our fusion scheme, and accordingly we call the FR scheme with similarity score $S^F$ the fused MBC, denoted by MBC-F.
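The block-wise cosine matching and the final fusion can be sketched as follows. The per-block features are assumed to be already projected by BFLD, the block similarities are summed over blocks, and the fusion formula assigns weight `w` to the amplitude similarity and splits the remainder equally between phase and orientation; this exact weighting form, like the function names, is an assumption consistent with the description of a single weight parameter.

```python
import numpy as np

def cosine_similarity(a, b):
    """Inner product of two vectors divided by the product of their norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def image_similarity(probe_blocks, gallery_blocks):
    """Accumulate the cosine similarity over corresponding blocks."""
    return sum(cosine_similarity(p, g) for p, g in zip(probe_blocks, gallery_blocks))

def fused_similarity(s_a, s_p, s_o, w):
    """Weighted average with equal weights for phase and orientation (assumed form)."""
    return w * s_a + (1.0 - w) / 2.0 * (s_p + s_o)
```

A probe image would then be assigned the identity of the gallery image with the largest fused similarity.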

V. EXPERIMENTAL RESULTS

In this section, we first discuss the parameter settings, and then give the time and space complexity comparison between MBC and some representative and state-of-the-art LSF-FR methods, including LGBP [9], HGPP [11] and LGXP [22]. Finally, we evaluate the proposed MBC algorithm on four large scale benchmark face databases: Multi-PIE [49], FERET [14], [41], FRGC 2.0 [13] and PolyU-NIR [28]. The Matlab source code of the proposed MBC algorithm can be downloaded at http://www4.comp.polyu.edu.hk/~cslzhang/code.htm.

A. Parameter Settings

There are several parameters in the different stages of MBC, e.g., multiscale log-Gabor filtering, subregion histogram computing, and feature combination by BFLD [22]. To be consistent with other methods and provide a fair comparison, no preprocessing is used in any of the following experiments.

As described in Section II-C, the parameters in multiscale log-Gabor filtering are the minimal wavelength, the ratio factor, the multiplying factor of wavelength, and the number of scales (i.e., the range of the scale index). In this paper, unless otherwise specified, we fix , and the number of scales as 3. Because a one-scale log-Gabor filter can capture more frequency information than a one-scale Gabor filter [42], [43], the number of scales in multiscale log-Gabor filtering is often set to 3 [23], [47], [48]. Since the number of scales strongly affects the time and space complexity of the proposed method, we perform two experiments on the Fb and DupI datasets of the FERET database to discuss the selection of the scale parameter.

The detailed experimental settings on Fb and DupI are presented in Section V-D. We change the number of scales from 1 to 4, and plot the FR rates of MBC versus the scale parameter in Fig. 6. It can be seen that in most cases (especially for the test on DupI), the recognition rate benefits from increasing the number of scales from 1 to 3 or 4. It is interesting to see that in the test on Fb, there are some fluctuations as the scale number increases. This is possibly because the test data in Fb have little variation compared to the gallery set Fa, so a multiscale representation is not necessary. Nevertheless, considering the balance between recognition rate and complexity, particularly in more challenging FR scenarios (e.g., DupI), we set the scale number to 3 in all the experiments.

In the stage of subregion histogram computing, the whole image is divided into blocks, each of which is further divided into subregions. In each subregion, the histogram is extracted; and in each block, the discriminative feature is extracted by BFLD. In the stage of fusing, the three similarities (those of MBC-A, MBC-P and MBC-O) are averaged by using a weight. Unless otherwise specified, we fix , and the dimensionality of the discriminative feature in each block as 200.
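The block and subregion histogram step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Matlab implementation: the function name `block_subregion_histograms`, the 2 × 2 grids, and the 16 histogram bins are hypothetical choices, and the BFLD projection applied to each block feature is omitted.

```python
import numpy as np

def block_subregion_histograms(pattern_map, blocks=(2, 2), subregions=(2, 2), n_bins=16):
    """Split a pattern map into blocks, split each block into subregions,
    and concatenate the subregion histograms of each block.

    Returns an array of shape (n_blocks, n_subregions_per_block * n_bins).
    Grid sizes and n_bins are illustrative assumptions.
    """
    pattern_map = np.asarray(pattern_map)
    block_features = []
    # np.array_split tolerates image sizes that are not multiples of the grid.
    for block_rows in np.array_split(pattern_map, blocks[0], axis=0):
        for block in np.array_split(block_rows, blocks[1], axis=1):
            hists = []
            for sub_rows in np.array_split(block, subregions[0], axis=0):
                for sub in np.array_split(sub_rows, subregions[1], axis=1):
                    # Pattern codes are assumed to be integers in [0, n_bins).
                    hists.append(np.bincount(sub.ravel(), minlength=n_bins)[:n_bins])
            block_features.append(np.concatenate(hists))
    return np.stack(block_features)

rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=(100, 82))  # stand-in for one pattern map
features = block_subregion_histograms(codes)
print(features.shape)
```

Each row of `features` is the concatenated histogram of one block, which is the quantity a per-block dimensionality reduction would then operate on.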

B. Time and Space Complexities

In the representative LSF-based FR methods [8]–[11], [22], [23], feature map generation, pattern map coding, histogram computing, and histogram similarity computing are the main time-consuming parts. The number of image convolutions directly determines the computational burden of feature map generation, while the number of pattern maps directly affects the running time of the other parts and the space complexity of the algorithms. Fig. 7 compares the time and space complexity of MBC (including MBC-A, MBC-O and MBC-P) and MBC-F with some state-of-the-art LSF-FR methods, including LGBP [9], HGPP [11], LGXP [22], and LGB&XP (fusing the Gabor magnitude and phase parts in [22]).

Fig. 6. Curves of FR rate versus the number of scales by the proposed MBC methods. (a) MBC on Fb; (b) MBC+BFLD and MBC-F on Fb; (c) MBC on DupI; (d) MBC+BFLD and MBC-F on DupI.

Fig. 7. Comparison of the time and space complexities among LGBP [9], HGPP [11], LGXP [22], LGB&XP [22] (fusing LGXP and LGBP), and the proposed MBC (MBC denotes any one of MBC-A, MBC-P, and MBC-O) and MBC-F.

LGBP, HGPP, LGXP, and LGB&XP all use Gabor transformations (with 5 scales and 8 orientations) to generate feature maps, which means that 80 convolutions (40 real and 40 imaginary) are needed to process a single image. In contrast, feature generation in the proposed MBC and MBC-F requires only 9 convolutions (1 log-Gabor filtering and 2 Riesz transformations on each of the 3 scales) [12], [23], [32], [33]. Fig. 7 clearly shows that the time complexity of MBC in feature map generation is only about 1/9 of that of [9]–[11], [22].

In pattern map coding, both LGBP and LGXP encode 40 feature maps into 40 pattern maps, HGPP produces 90 pattern maps (10 global phase pattern maps and 80 local pattern maps), and LGB&XP outputs 40 Gabor magnitude and 40 Gabor phase pattern maps. In comparison, each of the proposed MBC-A, MBC-P and MBC-O only needs to encode 3 monogenic pattern maps. Even the fused version of MBC, i.e., MBC-F, has only 9 pattern maps in total. Therefore, the proposed MBC has significantly lower time and space complexity than the previous Gabor-based methods [9]–[11], [22] in pattern map coding, as illustrated in Fig. 7.

Since the time and space complexity of histogram generation depends on the number of pattern maps, the proposed MBC methods also need much less computation and space in histogram feature generation than [9]–[11], [22].

In similarity computing, when the original histogram features are used, the time and space complexity of the proposed MBC scheme is much lower than that of LGBP, HGPP and LGXP, in line with the number of pattern maps. However, when BFLD [22] is used to reduce the dimensionality of the histogram features, the dimensionality of the projected discriminative features is nearly the same for all methods, including MBC, LGBP [9], HGPP [11], and LGXP [22]. Therefore, when coupled with BFLD [22], all the methods have similar time and space complexity in the phase of similarity computing. Note that BFLD increases the computational complexity in the offline learning stage, but it does not add computational burden in the online recognition stage. We compare the empirical running time of the competing methods in Section V-C.

Fig. 8. Subject in the Multi-PIE database. (a) Training samples with only illumination variations. (b) Testing samples with surprise expression and illuminations in Session 2. (c) Testing samples with squint expression and illuminations in Session 2. (d) and (e) Testing samples with smile expression and illumination variations in Sessions 1 and 3, respectively.
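To make the count of 9 filterings concrete, the following Python sketch computes a multiscale monogenic representation in the frequency domain: one log-Gabor band-pass and two Riesz transforms per scale. It is a minimal sketch, not the authors' code. The function name `monogenic_multiscale` and the parameter values (`min_wavelength`, `mult`, `sigma_ratio`) are assumptions chosen for illustration, and the sign and range conventions for orientation and phase are one common choice that may differ from the definitions in Section II.

```python
import numpy as np

def monogenic_multiscale(image, n_scales=3, min_wavelength=4.0, mult=2.0, sigma_ratio=0.65):
    """Return (components, n_filterings), where components holds one
    (amplitude, orientation, phase) tuple per scale.

    Filter parameters are illustrative defaults, not the paper's settings.
    """
    rows, cols = image.shape
    u = np.fft.fftfreq(cols)[np.newaxis, :]   # horizontal frequency
    v = np.fft.fftfreq(rows)[:, np.newaxis]   # vertical frequency
    radius = np.sqrt(u**2 + v**2)
    radius[0, 0] = 1.0                        # avoid log(0) and 0/0 at DC

    # Frequency responses of the two Riesz kernels.
    riesz_x = -1j * u / radius
    riesz_y = -1j * v / radius

    spectrum = np.fft.fft2(image)
    components = []
    n_filterings = 0
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult**s)  # centre frequency of scale s
        log_gabor = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
        log_gabor[0, 0] = 0.0                  # remove the DC component

        band = spectrum * log_gabor
        f = np.real(np.fft.ifft2(band))              # log-Gabor filtered image
        hx = np.real(np.fft.ifft2(band * riesz_x))   # Riesz transform, x part
        hy = np.real(np.fft.ifft2(band * riesz_y))   # Riesz transform, y part
        n_filterings += 3

        amplitude = np.sqrt(f**2 + hx**2 + hy**2)
        orientation = np.arctan2(hy, hx)
        phase = np.arctan2(np.sqrt(hx**2 + hy**2), f)
        components.append((amplitude, orientation, phase))
    return components, n_filterings

image = np.random.default_rng(0).random((100, 82))  # stand-in for a face image
components, n_filterings = monogenic_multiscale(image)
gabor_filterings = 5 * 8 * 2  # 5 scales x 8 orientations x (real, imaginary)
print(n_filterings, gabor_filterings)
```

With 3 scales the loop performs 3 × 3 = 9 filterings, against 5 × 8 × 2 = 80 for the Gabor bank described in the text, which is the source of the roughly 1/9 ratio.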

C. Experiments on the Multi-PIE Expression Database

In this section, we use the large-scale Multi-PIE database [49] to verify the effectiveness and efficiency of MBC without considering dimensionality reduction. All the 249 subjects in Session 1 are used as the training set in this experiment. To make the FR more challenging, four subsets with both illumination and expression variations in Sessions 1, 2 and 3 are used for testing. For the training set, we use the 7 frontal images with extreme illuminations and neutral expression (refer to Fig. 8(a) for examples). For the testing set, 4 typical frontal images with illuminations and different expressions (smile in Sessions 1 and 3, squint and surprise in Session 2) are used (refer to Fig. 8(b) for examples with surprise in Session 2, Fig. 8(c) for examples with squint in Session 2, Fig. 8(d) for examples with smile in Session 1, and Fig. 8(e) for examples with smile in Session 3). We crop and normalize the images to 100 × 82. The programming environment is Matlab version R2011a. The desktop used has a 1.86 GHz CPU and 2.99 GB of RAM.

TABLE III. FACE RECOGNITION RATES (%) AND AVERAGE RUNNING TIME (SECOND) ON THE MULTI-PIE EXPRESSION DATABASE

Here LBP [8] is used as the baseline method, and we compare the proposed MBC with state-of-the-art methods such as LGBP [9], HGPP [11] and LGXP [22]. Table III lists the face recognition rates and average running time of all the competing methods. LBP is clearly the fastest method, but it has the worst recognition rate. The proposed MBC achieves recognition rates competitive with those of LGBP, HGPP and LGXP. In particular, MBC-A has the highest recognition rates on the testing sets “smile-S1”, “smile-S3” and “squint-S2”. The running time of MBC is about 1 second per image, less than 1/10 of that of LGBP, HGPP and LGXP. HGPP has the longest running time. The running time results are consistent with our analysis in Section V-B.

In this experiment, it can be seen that MBC-A outperforms LGBP and LGXP in all cases. In particular, MBC-A outperforms LGBP, HGPP and LGXP by a large margin on Smile-S3. This is because, on the Smile-S3 dataset, the Gabor features from different subjects may not be discriminative enough. Fig. 9(a) shows some examples of Gabor feature and monogenic amplitude feature maps. One can see that the Gabor feature maps of the two images, which are from two different classes, are very similar, while their monogenic amplitude feature maps differ clearly. Fig. 9(b) shows the similarity scores between two test images and a template image by HGPP and MBC-A. We can see that MBC-A leads to the correct classification, whereas the Gabor feature based HGPP leads to a misclassification. Note that the MBC-A similarity scores are usually much lower than those of HGPP because the histogram feature of MBC-A is sparser than that of HGPP. However, this does not reduce the discrimination capability of MBC-A.

Fig. 9. (a) Examples of Gabor feature and monogenic amplitude feature maps. (b) Similarity scores by HGPP and MBC-A, where the left image is a template image, the middle image is a test image which is from the same class as the template image, and the right test image is from a different class.

Fig. 10. Some samples of normalized face images of FERET.

TABLE IV. RECOGNITION RATES (%) BY USING LOCAL VARIATION CODE ONLY AND BY USING THE COMBINATION OF LOCAL VARIATION CODE AND IMAGINARY INTENSITY CODE ON THE FB AND DUPI PROBE SETS

D. Experiments on the FERET Database

The FERET database [14], [41] is often used to validate an algorithm’s effectiveness because it contains many kinds of image variations. With the Fa subset as the gallery, the probe subsets Fb and Fc are captured with expression and illumination variations, respectively. DupI and DupII consist of images taken at different times. For some people, the time interval between the gallery set and the DupI or DupII set is more than two years. The facial portion of each image is cropped based on the locations of the eyes. The cropped face images are then normalized to 150 × 130 pixels. Fig. 10 shows some examples of normalized face images. Here the projection matrix of BFLD is learnt from the training set, which has 1 002 frontal images from 429 subjects, and we set the number of blocks as .

To illustrate that combining the local imaginary intensity code with the local variation code substantially improves on the local variation code alone, in Table IV we report the recognition rates obtained by using the local variation code only and by using the combination of the local variation code and the local intensity code on the probe sets Fb and DupI. In all the tests, the latter clearly outperforms the former. Without BFLD, the combination with the local intensity code improves the recognition rate on average by 4.6% for MBC-A, 1.8% for MBC-P, and 3% for MBC-O. With BFLD, the combination still improves it on average by 3.8% for MBC-A, 2.2% for MBC-P, and 1.5% for MBC-O. This shows that the local imaginary intensity code provides complementary information to the local variation code.

TABLE V. RECOGNITION RATES (%) OF THE PROPOSED MBC METHODS ON THE FERET PROBE SETS

TABLE VI. RECOGNITION RATES (%) OF DIFFERENT METHODS ON THE FERET FACE DATABASE

Table V lists the FR results of each individual MBC (i.e., MBC-A, MBC-P or MBC-O), their combinations with BFLD, and the fused MBC (MBC-F). We also give the results of LGBP and LGBP+BFLD reported in [22] as the baselines. From Table V, we can see that all the proposed monogenic codes, MBC-A, MBC-P and MBC-O, achieve higher recognition rates than LGBP, with the biggest improvement of about 11% on DupI and DupII for MBC-O. BFLD improves the performance of all methods. When combined with BFLD, MBC-A, MBC-P and MBC-O achieve higher recognition rates than LGBP+BFLD in most cases. In this experiment, the weight in MBC-F is set to 0.45 since the amplitude component performs a little worse than the monogenic phase part.

The FR results of MBC-F and other state-of-the-art LSF-FR methods, including the feature fusion version of LGXP [22], Tan’s method in [26], Zou’s method in [27], HGPP [11], LGBP [9] and LBP [8], are compared in Table VI. It can be seen that the proposed MBC-F is the best method on Fb and Fc. On DupI and DupII, the recognition rate of MBC-F is slightly lower than that of the method in [22]. However, it should be noted that the proposed MBC-F has much lower time and space complexity than the method in [22]. Therefore, taking accuracy and complexity together, the overall performance of MBC-F is better.

E. Experiments on the FRGC Database

FRGC 2.0 [13] is a very large scale face database, which consists of 50 000 recordings divided into training and validation partitions. The FRGC challenge problem consists of six experiments, which evaluate FR performance under various conditions, e.g., controlled or uncontrolled lighting and background, 3-D imagery, and multistill imagery [13]. There are three masks, which can be used to generate three receiver operating characteristic (ROC) curves. ROC1 evaluates FR on the images collected within semesters, while ROC2 and ROC3 evaluate FR on the images collected within a year and between semesters, respectively.

Fig. 11. Some samples of normalized face images of FRGC 2.0.

In our experiments, we learn the projection matrix of BFLD from the given training set (12 776 images of 222 persons) and run our tests on the 1st and 4th experiments, denoted by Exp.1 and Exp.4. Exp.4 is much more difficult than Exp.1 since the query images in Exp.4 are taken in an uncontrolled environment. All the face images are cropped based on the manually located eye centers provided by the original database and normalized to 168 × 128 pixels. Some samples are shown in Fig. 11. Here is set as 4. For stable and effective discriminative feature extraction by dimensionality reduction, BFLD usually requires that the input have more features. In the monogenic signal representation, the local amplitude in (10) is a compact composition of three partial features (also given in (10)). To capture more information in the amplitude, for MBC-A+BFLD in Exp.4, instead of directly encoding the amplitude, we first encode the local variations of these three partial features, and then in each block we concatenate the three kinds of histogram features. Finally, we apply BFLD to the concatenated higher dimensional histogram feature. Although a higher dimensional histogram feature of monogenic amplitude is generated, the time and space complexity is only slightly increased, and the dimension of the final discriminative feature generated by BFLD does not increase.

TABLE VII. VERIFICATION RATES BY THE PROPOSED MBC METHODS ON FRGC 2.0 WHEN FAR = 0.1%

Similar to Table V, we list in Table VII the verification rates of the different MBC-based methods when the FAR is 0.1%. It can be seen that MBC achieves much better performance than LGBP in most cases. However, for Exp.4, all MBC and LGBP methods have low verification rates, not larger than 20%. When BFLD is used to extract the discriminative features, all the methods improve greatly, e.g., by more than 60% for MBC-A and LGBP. We can also see that MBC+BFLD, especially MBC-A+BFLD, has performance very competitive with LGBP but with much lower complexity. The ROC3 curves of MBC+BFLD and MBC-F are plotted in Fig. 12, which clearly shows that MBC-F improves the final performance by fusing the complementary monogenic information.

Fig. 12. ROC3 curves of different methods on Exp.4 of the FRGC 2.0 database.

TABLE VIII. VERIFICATION RATES (%) BY DIFFERENT METHODS ON THE FRGC 2.0 FACE DATABASE

The recognition results of MBC-F and other state-of-the-art methods, including the fusion version of LGXP [22], Tan’s method in [26], Huang’s method in [44], Liu’s method in [45], Liu’s method in [46], and Su’s method in [15], are listed in Table VIII. In the table, all the results used for comparison are directly cited from the related papers. We can see that MBC-F achieves results very close to the best results reported in [15] and [22]. However, since both Xie’s method in [22] and Su’s method in [15] adopt multiscale and multiorientation Gabor transformations to generate features, MBC-F has much lower complexity than them, which further validates the effectiveness of the proposed techniques.
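For readers less familiar with the FRGC protocol, the verification rate at a fixed false accept rate (FAR) can be computed from two sets of similarity scores as sketched below. This is a generic illustration in Python, not the FRGC evaluation code: the function name `verification_rate_at_far` and the synthetic score distributions are hypothetical, and higher scores are assumed to mean greater similarity.

```python
import numpy as np

def verification_rate_at_far(genuine, impostor, far=0.001):
    """Return (verification_rate, threshold).

    The threshold is chosen so that the fraction of impostor scores
    strictly above it is at most `far`; the verification rate is the
    fraction of genuine scores strictly above that threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending
    n_allowed = int(np.floor(far * impostor.size))  # impostors allowed to pass
    # Threshold at the (n_allowed + 1)-th highest impostor score.
    threshold = impostor[min(n_allowed, impostor.size - 1)]
    rate = float(np.mean(genuine > threshold))
    return rate, float(threshold)

# Synthetic scores, for illustration only.
rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.7, 0.1, size=2000)
impostor_scores = rng.normal(0.3, 0.1, size=50000)
rate, threshold = verification_rate_at_far(genuine_scores, impostor_scores, far=0.001)
print(rate, threshold)
```

A FAR of 0.1% corresponds to `far=0.001`, so at most one impostor comparison in a thousand is accepted at the reported operating point.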

F. Experiments on the PolyU NIR Database

The PolyU-NIR face database [28] is a large-scale near-infrared face database consisting of 350 subjects, each providing about 100 samples. Variations in expression, pose, scale, focus, and time are involved in the capturing. In this paper, we do three tests following the experimental setting in [28]. The three subsets are Exp. 1 (419 training samples, 574 gallery samples and 2 762 probe samples), Exp. 2 (1 876 training samples, 1 159 gallery samples and 4 747 probe samples), and Exp. 3 (576 training samples, 951 gallery samples and 3 648 probe samples). Some samples of Exp. 3 are shown in Fig. 13. Because the face images are normalized to 64 × 64 pixels, which is much smaller than the image sizes used for Multi-PIE, FERET and FRGC, we set and the dimensionality of the discriminative feature as 100.

Fig. 13. Some samples of normalized face images of Exp.3 on the PolyU NIR database.

TABLE IX. RECOGNITION RATES BY THE PROPOSED MBC METHODS ON THE POLYU-NIR DATABASE

TABLE X. RECOGNITION RATES (%) BY DIFFERENT METHODS ON THE POLYU-NIR PROBE SETS

Table IX lists the recognition rates of the MBC methods and the baseline method, Gabor-DBC, proposed in [28]. From Table IX, we can see that all the MBC methods are better than the baseline method, and that BFLD further improves MBC’s performance. The recognition results of MBC-F and other state-of-the-art results reported in [28] are shown in Table X. It can be seen that the proposed MBC-F outperforms the second best method by 1.2% in Exp.1, 2.2% in Exp.2 and 2.8% in Exp.3. Meanwhile, compared to these Gabor feature based methods, the proposed MBC methods have much lower time and space complexity.

G. Discussions

From the above complexity analysis and experimental results, it can be seen that the proposed MBC has accuracy competitive with or higher than that of the Gabor feature based methods (e.g., LGBP [9], HGPP [11], and LGXP [22]), while it has much lower cost (e.g., less than 1/10 of the running time on the Multi-PIE database). This merit of MBC comes from two parts: the power of monogenic signal representation and the effectiveness of MBC’s pattern coding strategy.

Like Gabor wavelet representation, monogenic signal representation can effectively extract the discriminative information embedded in the original signal by decoupling the local energy (local amplitude) and structure (local phase and local orientation). Different from Gabor features, which are extracted along several orientations by using steerable filters, in monogenic signal representation the local orientation is the main orientation of the linear structure with large support, and the local phase describes the structure along the main orientation [12]. This strategy of extracting only the information along the dominant orientation is somewhat similar to the maximum-like pooling mechanism in visual processing in the cortex [50], and it leads to a compact but effective representation of the image. Our experimental results validate that monogenic signal analysis is very suitable for face image representation, probably because human facial local structures can be well characterized along one dominant orientation. In addition, the adoption of log-Gabor filters in multiscale monogenic signal representation makes it efficient in capturing multiscale information.

The promising performance of MBC also comes from the proposed binary coding scheme, which is well suited to describing the monogenic signal representation. In MBC, we not only encode the local variation of monogenic amplitude and monogenic phase, but also encode the monogenic local imaginary intensity to provide complementary information to the local variation. As shown in Table IV, this coding strategy exploits more useful information from the monogenic signal representation. In addition, the three parts of MBC, namely MBC-A, MBC-P and MBC-O, encode the energetic, structural and geometric information of a 2-D signal, respectively, and are complementary to each other. The experimental results show that MBC on monogenic amplitude, orientation and phase can individually achieve high recognition rates on different datasets. More importantly, using a weighted average to fuse the three similarity scores (i.e., those of MBC-A, MBC-P and MBC-O) further improves the recognition accuracy and achieves state-of-the-art FR performance.
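The weighted-average fusion of the three similarity scores can be illustrated with a short Python sketch. The exact fusion formula and weight definition used for MBC-F are not reproduced in this section, so the general normalized weighting below, the function name `fuse_similarities`, and the example scores and weights are all assumptions made for illustration.

```python
import numpy as np

def fuse_similarities(scores, weights):
    """Fuse per-component similarity scores by a weighted average.

    scores  : array of shape (3, n_gallery), rows for the amplitude,
              phase and orientation similarities of one probe image.
    weights : three non-negative weights; they are normalized to sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ scores

# Hypothetical similarities of one probe to two gallery images.
scores = [[0.2, 0.8],   # amplitude
          [0.6, 0.4],   # phase
          [0.5, 0.5]]   # orientation
fused = fuse_similarities(scores, weights=[1.0, 1.0, 1.0])
predicted = int(np.argmax(fused))  # index of the best-matching gallery image
print(fused, predicted)
```

Fusing at the score level keeps the three component pipelines independent, so any component can be dropped or reweighted without recomputing the others.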

VI. CONCLUSION

We proposed a novel face representation model, namely monogenic binary coding (MBC), based on the monogenic signal representation. One of the main merits of MBC is that it has much lower time and space complexity than the widely used Gabor transformation based local feature extraction methods. Through multiscale monogenic signal representation, three kinds of features (i.e., local amplitude, local orientation and local phase) are generated, and then encoded by the proposed monogenic local variation coding and monogenic imaginary intensity coding procedures. The resulting MBC pattern maps are used to compute statistical features (e.g., histograms), which are then used to measure the similarity for face recognition. The extensive experiments on the benchmark face databases, including FERET, FRGC 2.0, Multi-PIE and PolyU-NIR, clearly showed that the proposed MBC methods not only have significantly lower time and space complexity than the state-of-the-art Gabor feature based face recognition methods, but also have very competitive or even better recognition rates.


REFERENCES

[1] W. Y. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, “Face recognition: A literature survey,” ACM Comput. Surv., vol. 35, no. 4, pp. 399–458, 2003.

[2] N. Poh, C. H. Chan, J. Kittler, S. Marcel, C. McCool, E. A. Rúa, J. L. A. Castro, M. Villegas, R. Paredes, V. Struc, N. Pavesić, A. A. Salah, H. Fang, and N. Costen, “An evaluation of video-to-video face verification,” IEEE Trans. Inf. Forensics Security, vol. 5, no. 4, pp. 781–801, Dec. 2010.

[3] Z. F. Li, U. Park, and A. K. Jain, “A discriminative model for age invariant face recognition,” IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1028–1037, Sep. 2011.

[4] H. B. Ling, S. Soatto, N. Ramanathan, and D. W. Jacobs, “Face verification across age progression using discriminative methods,” IEEE Trans. Inf. Forensics Security, vol. 5, no. 1, pp. 82–91, Mar. 2010.

[5] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive Neurosci., vol. 3, no. 1, pp. 71–86, 1991.

[6] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.

[7] C. Liu and H. Wechsler, “Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition,” IEEE Trans. Image Process., vol. 11, no. 4, pp. 467–476, Apr. 2002.

[8] T. Ahonen, A. Hadid, and M. Pietikäinen, “Face recognition with local binary patterns,” in Proc. Eur. Conf. Computer Vision, Prague, Czech Republic, 2004.

[9] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, “Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition,” in Proc. IEEE Int. Conf. Computer Vision, Beijing, China, 2005.

[10] W. Zhang, S. Shan, L. Qing, X. Chen, and W. Gao, “Are Gabor phases really useless for face recognition?,” Pattern Anal. Appl., vol. 12, no. 3, pp. 301–307, 2009.

[11] B. Zhang, S. Shan, X. Chen, and W. Gao, “Histogram of Gabor phase patterns (HGPP): A novel object representation approach for face recognition,” IEEE Trans. Image Process., vol. 16, no. 1, pp. 57–68, Jan. 2007.

[12] M. Felsberg and G. Sommer, “The monogenic signal,” IEEE Trans. Signal Process., vol. 49, no. 12, pp. 3136–3144, Dec. 2001.

[13] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, “Overview of the face recognition grand challenge,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, San Diego, CA, 2005.

[14] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The FERET evaluation methodology for face recognition algorithms,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.

[15] Y. Su, S. Shan, X. Chen, and W. Gao, “Hierarchical ensemble of global and local classifiers for face recognition,” in Proc. Int. Conf. Computer Vision, Rio de Janeiro, Brazil, 2007.

[16] T. Ojala, M. Pietikäinen, and D. Harwood, “A comparative study of texture measures with classification based on feature distributions,” Pattern Recognit., vol. 29, no. 1, pp. 51–59, 1996.

[17] J. Yang and C. J. Liu, “Horizontal and vertical 2DPCA-based discriminant analysis for face verification on a large-scale database,” IEEE Trans. Inf. Forensics Security, vol. 2, no. 4, pp. 781–792, Dec. 2007.

[18] S. Venkatesh and R. Owens, “On the classification of image features,” Pattern Recognit. Lett., vol. 11, pp. 339–349, 1990.

[19] E. Stein and G. Weiss, Introduction to Fourier Analysis on Euclidean Spaces. Princeton, NJ: Princeton Univ. Press, 1971.

[20] D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Amer. A, vol. 4, no. 12, pp. 2379–2394, 1987.

[21] G. H. Granlund and H. Knutsson, Signal Processing for Computer Vision. Norwell, MA: Kluwer, 1995.

[22] S. F. Xie, S. G. Shan, X. L. Chen, and J. Chen, “Fusing local patterns of Gabor magnitude and phase for face recognition,” IEEE Trans. Image Process., vol. 19, no. 5, pp. 1349–1361, May 2010.

[23] M. Yang, L. Zhang, L. Zhang, and D. Zhang, “Monogenic binary pattern (MBP): A novel feature extraction and representation model for face recognition,” in Proc. Int. Conf. Pattern Recognition, Istanbul, Turkey, 2010.

[24] H. K. Ekenel, H. Gao, and R. Stiefelhagen, “3-D face recognition using local appearance-based models,” IEEE Trans. Inf. Forensics Security, vol. 2, no. 3, pp. 630–636, Sep. 2007.

[25] Z. H. Guo, L. Zhang, and D. Zhang, “A completed modeling of local binary pattern operator for texture classification,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1657–1663, Jun. 2010.

[26] X. Tan and B. Triggs, “Fusing Gabor and LBP feature sets for kernel-based face recognition,” in Proc. IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, Rio de Janeiro, Brazil, 2007.

[27] J. Zou, Q. Ji, and G. Nagy, “A comparative study of local matching approach for face recognition,” IEEE Trans. Image Process., vol. 16, no. 10, pp. 2617–2628, Oct. 2007.

[28] B. C. Zhang, L. Zhang, D. Zhang, and L. L. Shen, “Directional binary code with application to PolyU near-infrared face database,” Pattern Recognit. Lett., vol. 31, no. 14, pp. 2337–2344, 2010.

[29] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.

[30] M. Yang and L. Zhang, “Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary,” in Proc. Eur. Conf. Computer Vision, Crete, Greece, 2010.

[31] M. Felsberg and G. Sommer, “The monogenic scale-space: A unifying approach to phase-based image processing in scale-space,” J. Math. Imag. Vis., vol. 21, no. 1, pp. 5–26, 2004.

[32] M. Unser, D. Sage, and D. V. D. Ville, “Multiresolution monogenic signal analysis using the Riesz-Laplace wavelet transform,” IEEE Trans. Image Process., vol. 18, no. 11, pp. 2402–2418, Nov. 2009.

[33] L. Zhang, L. Zhang, Z. H. Guo, and D. Zhang, “Monogenic-LBP: A new approach for rotation invariant texture classification,” in Proc. Int. Conf. Pattern Recognition, Istanbul, Turkey, 2010.

[34] S. V. Fischer, F. Sroubek, L. Perrinet, R. Redondo, and G. Cristobal, “Self-invertible 2D log-Gabor wavelet,” Int. J. Comput. Vis., vol. 75, no. 2, pp. 231–246, 2007.

[35] M. Yang, L. Zhang, J. Yang, and D. Zhang, “Robust sparse coding for face recognition,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Colorado Springs, CO, 2011.

[36] X. Tan and B. Triggs, “Enhanced local texture feature sets for face recognition under difficult lighting conditions,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1635–1650, Jun. 2010.

[37] X. He, S. Yan, Y. Hu, P. Niyogi, and H. J. Zhang, “Face recognition using Laplacianfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 328–340, Mar. 2005.

[38] T. Inan and U. Halici, “3D face recognition with local shape descriptors,” IEEE Trans. Inf. Forensics Security, vol. 7, no. 2, pp. 577–587, Jun. 2012.

[39] Z. Zhou, A. Wagner, H. Mobahi, J. Wright, and Y. Ma, “Face recognition with contiguous occlusion using Markov random fields,” in Proc. IEEE Int. Conf. Computer Vision, Kyoto, Japan, 2009.

[40] E. Elhamifar and R. Vidal, “Robust classification using structured sparse representation,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Colorado Springs, CO, 2011.

[41] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss, “The FERET database and evaluation procedure for face recognition algorithms,” Image Vis. Comput., vol. 16, no. 5, pp. 295–306, 1998.

[42] J. Cook, V. Chandran, S. Sridharan, and C. Fookes, “Gabor filter bank representation for 3D face recognition,” in Proc. Digital Imaging Computing: Techniques and Applications, Cairns, Australia, 2005.

[43] P. Kovesi, “Invariant Measures of Image Features From Phase Information,” Ph.D. Thesis, Univ. of Western Australia, Perth, Australia, 1996.

[44] W. Huang, G. Park, and J. Lee, “Multiple face model of hybrid Fourier feature for large face image set,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, New York, 2006.

[45] C. Liu, “Capitalize on dimensionality increasing techniques for improving face recognition grand challenge performance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 725–737, May 2006.

[46] Z. Liu and C. Liu, “Fusion of the complementary discrete cosine features in the YIQ color space for face recognition,” Comput. Vis. Image Understand., vol. 111, no. 3, pp. 249–262, 2008.

[47] J. Cook, C. McCool, V. Chandran, and S. Sridharan, “Combined 2D/3D face recognition using log-Gabor templates,” in Proc. IEEE Int. Conf. Video and Signal Based Surveillance, Sydney, Australia, 2006.

[48] J. Cook, V. Chandran, and C. Fookes, “3D face recognition using log-Gabor templates,” in Proc. British Machine Vision Conf., Edinburgh, Scotland, U.K., 2006.

[49] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, “Multi-PIE,” Image Vis. Comput., vol. 28, pp. 807–813, 2010.

[50] M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neurosci., vol. 2, no. 11, pp. 1019–1025, 1999.


Meng Yang (S’11) received the B.Sc. degree in information engineering and the M.Sc. degree in control theory and applications from the Northwestern Polytechnical University, Xi’an, China, in 2006 and 2009, respectively. He received the Ph.D. degree in computer science at The Hong Kong Polytechnic University, Hong Kong, in 2012.

His research interests include computer vision, pattern recognition, sparse representation, and face recognition.

Lei Zhang (M’04) received the B.S. degree in 1995 from Shenyang Institute of Aeronautical Engineering, Shenyang, China, and the M.S. and Ph.D. degrees in automatic control theory and engineering from Northwestern Polytechnical University, Xi’an, China, in 1998 and 2001, respectively.

From 2001 to 2002, he was a research associate in the Department of Computing, The Hong Kong Polytechnic University. From January 2003 to January 2006, he worked as a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, McMaster University, Canada. In 2006, he joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor. Since September 2010, he has been an Associate Professor in the same department. His research interests include image and video processing, biometrics, computer vision, pattern recognition, multisensor data fusion and optimal estimation theory, etc.

Dr. Zhang is an Associate Editor of IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, and Image and Vision Computing Journal. Dr. Zhang was awarded the Faculty Merit Award in Research and Scholarly Activities in 2010 and 2012, and the Best Paper Award of SPIE VCIP2010.

Simon Chi-Keung Shiu (M’90) received the M.Sc. degree in computing science from University of Newcastle Upon Tyne, U.K., in 1985, the M.Sc. degree in business systems analysis and design from City University, London, in 1986, and the Ph.D. degree in computing in 1997 from The Hong Kong Polytechnic University.

He is an assistant professor in the Department of Computing, Hong Kong Polytechnic University, Hong Kong. He worked as a system analyst and project manager between 1985 and 1990 in several business organizations in Hong Kong. His current research interests include case-based reasoning, machine learning, and soft computing. He has coauthored the book Foundations of Soft Case-Based Reasoning, published by John Wiley and Sons, and co-guest edited a special issue on soft case-based reasoning of the Applied Intelligence Journal. He is a member of the British Computer Society.

David Zhang (F’09) graduated in computer science from Peking University in 1974 and received the M.Sc. and Ph.D. degrees in computer science and engineering from the Harbin Institute of Technology (HIT), Harbin, China, in 1983 and 1985, respectively. He received the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, Canada, in 1994.

From 1986 to 1988, he was a Postdoctoral Fellow at Tsinghua University, Beijing, China, and became an Associate Professor at Academia Sinica, Beijing, China. Currently, he is a Professor with the Hong Kong Polytechnic University, Hong Kong. He is Founder and Director of Biometrics Research Centers supported by the Government of the Hong Kong SAR (UGC/CRC). He is also Founder and Editor-in-Chief of the International Journal of Image and Graphics (IJIG), Book Editor, The Kluwer International Series on Biometrics, and an Associate Editor of several international journals. His research interests include automated biometrics-based authentication, pattern recognition, biometric technology and systems. As a principal investigator, he has finished many biometrics projects since 1980. So far, he has published over 200 papers and 10 books.
