2003 - A Novel Technique for Face Recognition Using Range Imaging



A NOVEL TECHNIQUE FOR FACE RECOGNITION USING RANGE IMAGING

Curt Hesher, Anuj Srivastava, Gordon Erlebacher
Florida State University, Tallahassee, FL 32306

ABSTRACT

We consider a novel technique for recognizing people from range images (RI) of their faces. Range images have the advantage of capturing the shape variation irrespective of illumination variabilities. We describe a procedure for generating RI of faces using data from a 3D scanner, and registering them in the image plane by aligning salient facial features. For statistical analysis of RI, we use standard projections such as PCA and ICA, and then impose probability models on the coefficients. An experiment describing recognition of faces using the FSU 3D face database is presented.

1. INTRODUCTION

The problem of recognizing people from their facial images has gained wide attention in recent times. This problem has been studied using several sensing modalities, such as visible spectrum and infrared, and using several pattern recognition techniques [2, 5]. Methods based on visible spectrum images have been limited in their applications for several reasons. The main difficulty comes from the variability in images due to variations in illumination. Research has shown that the variation in facial images due to illumination can be greater than the variation due to identity. A possible solution is to involve sensors that are relatively insensitive to the visible light. In this paper, we focus on range imaging, which captures the shape variation while being robust to illumination and texture variations. In range imaging, the pixel values denote the distance (range) of the nearest part of the object along a direction perpendicular to the image.

In general there is a great interest in the development of mathematical models that capture the variability among face images [4]. A long-term strategy is to model the physical factors that lead to differences among face images. These factors are: (i) shapes of facial surfaces, (ii) face textures, (iii) illumination models, and (iv) modeling of pose relative to the camera. In this paper we restrict ourselves to the development of mathematical models for the variations of facial surfaces in R^3 using facial meshes. A mesh is a discrete approximation of a 2D surface in R^3 and is obtained by sampling points on a surface and connecting them via polygons. A typical facial mesh containing 10,000 points is an element of a 30,000-dimensional space. Since it is difficult to analyze variability or impose probability models on this space, we use range imaging to form images from the meshes. Then, we use standard projections to reduce the observations to a low-dimensional Euclidean space. Having obtained low-dimensional representations of the observed surfaces, the next task will be to develop classification algorithms under a certain choice of metrics.

0-7803-7946-2/03/$17.00 ©2003 IEEE

Figure 1: Image (a) is an example data capture session. Image (b) shows facial meshes captured using the Minolta Vivid 700 3D camera.

In section 2 we describe data collection, generation of RI, and image registration using feature points. In section 3 we use principal component analysis (PCA) and independent component analysis (ICA) to reduce image dimensionality. Section 4 explains the identification process and section 5 displays some experimental results. Conclusions and future work can be found in section 6.

2. REPRESENTATION OF FACIAL SURFACES

We are interested in mathematically representing and analyzing shapes of facial surfaces. However, analysis of 3D shapes is difficult relative to analysis of 2D images, and we will use the latter in our approach. Since the 3D scanner used in collecting facial surfaces provides data in the form of meshes, one needs to pre-process this data into RI before image analysis techniques can be applied. We start by describing the data acquisition process.

2.1. Data Acquisition

The meshes displayed in fig. 1(b) represent typical facial meshes acquired using a Minolta Vivid 700 camera. The normal scan resolution for the Vivid 700 is around 10,000 points (15,000 triangles); for illustration these have been decimated to 1,000 triangles each. The exact number of triangles captured for a given face depends on the size, shape, and relative position of the subject's face with respect to the camera, but not lighting; the data capture mechanism used by the Vivid 700 does not depend on the illumination or texture in the imaged scene. To acquire each mesh, subjects are asked to stay in a predetermined position and orientation with respect to the camera (fig. 1(a)), resulting in rough global registration of meshes. Subjects are then imaged under six different facial expressions: neutral, smile, frown, angry, squint, and scared. Meshes are stored in Alias Wavefront's OBJ file format for use in RI generation.

Figure 2: An illustration of the Line Crossing Algorithm [1]. The pixel in question, P, is exterior.

2.2. Generation of Range Images

A RI (of a mesh) is an array of depth values projected from each triangle of the mesh onto a 2D image plane, with the image plane being perpendicular to the camera view. To determine which pixels in the image plane lie inside the projection of a given triangle we follow the Line Crossing Algorithm [1], illustrated in fig. 2. In this example, the pixel P is exterior (outside the triangle). As each interior pixel on the image plane is found, we compute the height (the z value) that pixel would have on the mesh. To do this, the vertices of the triangle (P1, P2, and P3) are used to compute the equation of a plane. Organize P1, P2, and P3 as a matrix, A = [x1 x2 x3; y1 y2 y3; 1 1 1], and check for collinearity using the determinant. For a non-zero determinant we solve for the equation of the plane, c = z A^(-1) with z = (z1, z2, z3). (If the determinant of A is zero, the triangle is skipped.) The height value of the pixel P is then found using z = c1 Px + c2 Py + c3. In this way, each triangle in the mesh is traversed and provides pixel values on the image plane. In case two triangles project to the same pixel location, the one closest to the camera is selected. Note that differently scaled meshes lead to different RI. Figure 3 presents six RI generated using this technique.
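The per-pixel depth computation can be sketched in a few lines. This is a minimal sketch, assuming the plane form z = c1*x + c2*y + c3 with a row of ones in A, as reconstructed here; the Line Crossing point-in-triangle test and the nearest-to-camera comparison are left out:

```python
import numpy as np

def triangle_depth(p1, p2, p3, px, py):
    """Depth of image-plane point (px, py) on the plane through three
    mesh vertices, each given as (x, y, z). Returns None when the
    projected triangle is degenerate (collinear vertices), which the
    paper skips."""
    A = np.array([[p1[0], p2[0], p3[0]],
                  [p1[1], p2[1], p3[1]],
                  [1.0,   1.0,   1.0]])
    if abs(np.linalg.det(A)) < 1e-12:   # collinear projection: skip
        return None
    z = np.array([p1[2], p2[2], p3[2]])
    c = z @ np.linalg.inv(A)            # plane z = c1*x + c2*y + c3
    return c[0] * px + c[1] * py + c[2]
```

In a full rasterizer this would run once per interior pixel of each projected triangle, keeping the smaller depth whenever two triangles cover the same pixel.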

Figure 3: Facial RI. The top row of images displays three subjects with the same expression. The bottom row of images displays one subject under three different expressions.

Figure 4: The line extending from the tip-pixel toward the forehead indicates the bridge of the nose. The image pairs illustrate RI before rotational correction (left) and after (right).

2.3. Registration of Range Images

There are two sources of undesired variability in RI: position and orientation of the subject's face relative to the camera during data capture. This is first dealt with by controlling the data capture environment as described in section 2.1. To correct the remaining variability, rotation, translation, and depth adjustments are made to the RI; each step improves the registration between RI through feature alignments. All registrations are performed on the 2D RI, avoiding the computational complexity associated with 3D registration. Here, we have chosen to align images using two features: the nose tip and the bridge of the nose.

Rotational registration reduces the error induced by small rotations of the subject's face in the image plane. We assume that the tip of the nose is closest to the camera. Therefore, the pixel corresponding to the tip of the nose (the tip-pixel) in the RI will have the smallest intensity. After finding the tip-pixel we inspect several successive rows above the row containing the tip-pixel. Along each row we choose the pixel with the smallest intensity relative to that row. This procedure leads to a number of 2D points appearing as a line along the bridge of the nose (left image in fig. 4). These points are fed to a line-fitting algorithm that returns the rotation necessary to make the line vertical. Figure 4 shows two image pairs which illustrate RI before (left) and after (right) rotational correction.

Next, each RI is translated in the image plane so that the tip-pixel corresponds to the center point location. This enhances the translational alignment of the subject's face in the image plane. Finally, depth adjustment corrects inaccuracies in the subject's distance from the camera during data collection. A constant is added to all pixels so that the tip-pixel has the same value for all RI.
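The rotational step can be sketched as follows. The number of rows inspected (`n_rows`) and the zero-means-no-data convention are assumptions, since the paper does not state them:

```python
import numpy as np

def bridge_rotation_angle(ri, n_rows=15):
    """Estimate the in-plane rotation of a face range image: the
    tip-pixel is the global minimum (nose closest to camera), the
    per-row minima above it trace the nose bridge, and a
    least-squares line fit gives the angle (degrees) needed to make
    that line vertical. Zero pixels mean "no data" and are ignored."""
    masked = np.where(ri > 0, ri, np.inf)
    tip_r, _ = np.unravel_index(np.argmin(masked), ri.shape)
    rows = np.arange(max(tip_r - n_rows, 0), tip_r + 1)
    cols = np.array([np.argmin(masked[r]) for r in rows])
    # Fit column as a linear function of row; a vertical bridge has
    # slope 0, so the correction angle is -atan(slope).
    slope = np.polyfit(rows, cols, 1)[0]
    return -np.degrees(np.arctan(slope))
```

The returned angle would then be applied as an in-plane image rotation, after which the translation step recenters the tip-pixel and the depth step adds a constant so the tip-pixel value matches across all RI.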

3. DIMENSION REDUCTION: COMPONENT ANALYSIS

Now that the shape information is captured in the form of an image, the remaining task is to utilize techniques from image analysis for face recognition. In this paper we have chosen to focus on PCA and ICA, as they have been found to perform well in previous face recognition situations whenever faces are well registered. Two preprocessing steps are implemented: masking and patching of images. Firstly, an elliptical mask is used to crop each RI identically, to focus on the central features and to avoid peripheral noise. Secondly, holes in each RI are patched. A hole is any pixel in the RI that lies within the boundary of the mask but does not have an intensity value (its value is zero, which represents no data). Holes are patched by linearly interpolating adjacent pixel values.
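A sketch of the hole-patching step, interpolating along each row inside the mask. Interpolating row-wise is one plausible reading of "adjacent pixel values", not an axis the paper specifies:

```python
import numpy as np

def patch_holes_rows(ri, mask):
    """Fill zero-valued pixels ("holes") inside the mask by linear
    interpolation between the nearest valid pixels along each row."""
    out = ri.astype(float).copy()
    for r in range(out.shape[0]):
        cols = np.where(mask[r])[0]          # pixels inside the mask
        if cols.size == 0:
            continue
        valid = cols[out[r, cols] > 0]       # pixels with data
        holes = cols[out[r, cols] == 0]      # pixels to patch
        if valid.size >= 2 and holes.size > 0:
            out[r, holes] = np.interp(holes, valid, out[r, valid])
    return out
```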

To perform PCA the data is first reorganized. We reshape each of the k RI, R1, ..., Rk, into vectors. Let Vi be the vector obtained by reshaping the image Ri. Then place the image vectors vertically into a matrix B, where each column of B is one RI: B = [V1, V2, ..., Vk] ∈ R^(mn x k). Using the singular value decomposition of B one obtains the PCA basis X, where X ∈ R^(mn x d) and X'X = I_d, for some d < k. To project a RI R, we reshape it into a column vector V and define P = X'V ∈ R^d. In this experiment, we use the eigenvectors (EV) associated with a number of the largest eigenvalues (EVA), d = 10 or d = 30, so that the reduced representation of a range image is d-dimensional.
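The PCA construction reduces to a few lines of NumPy. Subtracting the mean image before the SVD is a common convention that the paper does not state explicitly:

```python
import numpy as np

def pca_basis(B, d):
    """PCA basis from the data matrix B (mn x k, one reshaped range
    image per column): the returned X is mn x d with orthonormal
    columns (X.T @ X = I_d)."""
    # Center the data (an assumption; the paper does not say whether
    # the mean image is subtracted).
    Bc = B - B.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Bc, full_matrices=False)
    return U[:, :d]

def project(X, R):
    """Reduce a range image R to its d-dimensional representation P = X'V."""
    return X.T @ R.reshape(-1)
```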

Figure 5 is a plot of the 50 largest EVA. These values were generated from 222 facial RI (37 subjects under 6 facial expressions) of size 41 x 57. In fig. 6 we see, as images, the three EV associated with the three largest EVA in descending order from left to right.

To compute ICA we utilized the FastICA algorithm [6]. The resulting basis X is used for projection identically to PCA.

Figure 5: The first 50 EVA plotted along the horizontal axis in order of decreasing value.

    Figure 6: The EV associated with the three largest EVA indescending order from left to right.

4. NEAREST NEIGHBOR RULE FOR IDENTIFICATION

Identification begins by acquiring a test mesh. The new mesh must be captured in the same position and orientation as the data used to generate the linear basis. Once a single face mesh is captured, it is projected into a RI, R, preprocessed and aligned, and projected to a d-dimensional vector using P = X'V. P is then compared to the classified training images P1, P2, ..., Pc using the nearest neighbor criterion under the Euclidean metric. Di = ||P - Pi|| describes the distance of the target image from a training image Pi. The identity of the nearest training image(s) is then assigned to the test image as i* = argmin_{i in [1, 2, ..., c]} (Di).
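The nearest neighbor rule is a one-liner over the stored training projections (a sketch; the array and variable names are ours):

```python
import numpy as np

def identify(p, train_proj, train_labels):
    """Nearest-neighbor identification: compare the test projection p
    (length d) to the training projections (rows of train_proj) under
    the Euclidean metric and return the label of the closest one."""
    dists = np.linalg.norm(train_proj - p, axis=1)
    return train_labels[int(np.argmin(dists))]
```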

5. EXPERIMENTAL SETUP AND RESULTS

To perform identification experiments we first create two sets of images: training and testing. Meshes used in the generation of these sets are taken from the FSU 3D face database containing 222 scans of 37 unique subjects. Training images are used to generate an orthogonal basis as described in section 3, into which each RI in the training data set is projected. This results in a d-dimensional representation of each RI in the training data set. The orthogonal basis and d-dimensional representations for each RI in the training data set are then stored for comparison later. Test images are a set of RI of faces we wish to identify. Any subject we wish to identify must have at least one facial RI in the training data set. The test images need not have the same facial expressions as those in the training data set. Each test image is then reshaped as a column vector and projected into the orthogonal basis. Using the nearest neighbor algorithm from section 4, compare the test image's d-dimensional representation to all training image representations to find the identity of the subject in the test image.

Train Set | Test Set | ID 10 EV (57 x 41) | ID 30 EV (242 x 347) | ICA ID (41 x 57)
185 | 37 | 90 | 94 | 97
148 | … | … | … | …

Table 1: Each row in this table indicates the results of an experiment in identification using RI. The columns indicate (from left to right): the number of training images; the number of test images; the percentage of correct identifications using 10 eigenvectors with image size 57 x 41; the percentage of correct identifications using 30 EV with image size 242 x 347; the percentage of correct identifications using 10 independent components on images of size 41 x 57. There is no intersection between the training and test data sets.

To demonstrate this metric for comparing shapes, we present the following results. The first column of table 1 shows the number of training images used to find the orthogonal basis; the second column indicates how many faces were used in the matching test; the third column gives the percentage of correctly identified persons when using PCA and 10 EV for projection on RI of size 57 x 41; the fourth column shows results for PCA with 30 EV and RI of size 242 x 347; the fifth column provides results using ICA and 10 EV with RI of size 57 x 41. If a test used 185 training images and 37 test images, then five facial expressions from each person were used in the training data set, and one facial expression from each person was used in the test data set.

The clustering performance of this metric is also demonstrated in fig. 7. This dendrogram illustrates the clustering between RI of twelve faces from two different persons after projection into the PCA basis. Faces 1-6 are from one person and faces 7-12 are from a second person. The separation of these faces into two clusters shows the discrimination ability of range images.

Figure 7: This dendrogram illustrates the Euclidean distance between the PCA projections of the first twelve faces in the data set.

6. CONCLUSIONS AND FUTURE WORK

This paper demonstrates that RI can be an effective way to identify persons. However, a number of issues remain to be investigated. The amount of facial deformation captured by these meshes is unknown. Computational resources also limit the size of RI that can be effectively used, thereby reducing the accuracy of RI. Also, our current use of non-robust PCA and ICA does not work well with noise in the data, possibly induced by error in the mesh capture, reduction techniques, or background clutter. This work was supported by NSF 0101429.

7. REFERENCES

[1] Arvo, J. Graphics Gems II. Academic Press, Inc., 1991.

[2] Bledsoe, W. The Model Method in Facial Recognition. Panoramic Research Inc., Tech. Rep. PRI:15, Palo Alto, CA, 1964.

[3] Chellappa, R., C. Wilson, S. Sirohey. Human and Machine Recognition of Faces: A Survey. Proceedings of the IEEE, 83(5), May 1995.

[4] Hallinan, P., G. Gordon, A. Yuille, P. Giblin, D. Mumford. Two and Three Dimensional Patterns of the Face. A. K. Peters, 1999.

[5] Kaufman Jr., G., K. Breeding. The Automatic Recognition of Human Faces from Profile Silhouettes. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6, pp. 113-121, 1976.

[6] Helsinki University of Technology, Laboratory of Computer and Information Science. http://www.cis.hut.fi/projects/ica/fastica/index.shtml, April 24, 2003.

