Lecture Notes in Computer Science 2688 Edited by G. Goos, J.
Hartmanis, and J. van Leeuwen
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris
Tokyo
Josef Kittler Mark S. Nixon (Eds.)
Audio- and Video-Based Biometric Person Authentication
4th International Conference, AVBPA 2003 Guildford, UK, June 9-11,
2003 Proceedings
Volume Editors
Josef Kittler, University of Surrey, Centre for Vision, Speech and
Signal Processing, Guildford, Surrey GU2 7XH, UK
E-mail: [email protected]
Mark S. Nixon, University of Southampton, Department of Electronics
and Computer Science, Southampton, SO17 1BJ, UK
E-mail: [email protected]
Cataloging-in-Publication Data applied for
A catalog record for this book is available from the Library of
Congress
Bibliographic information published by Die Deutsche Bibliothek Die
Deutsche Bibliothek lists this publication in the Deutsche
Nationalbibliografie; detailed bibliographic data is available in
the Internet at <http://dnb.ddb.de>.
CR Subject Classification (1998): I.5, I.4, I.3, K.6.5, K.4.4,
C.2.0
ISSN 0302-9743 ISBN 3-540-40302-7 Springer-Verlag Berlin Heidelberg
New York
This work is subject to copyright. All rights are reserved, whether
the whole or part of the material is concerned, specifically the
rights of translation, reprinting, re-use of illustrations,
recitation, broadcasting, reproduction on microfilms or in any
other way, and storage in data banks. Duplication of this
publication or parts thereof is permitted only under the provisions
of the German Copyright Law of September 9, 1965, in its current
version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the
German Copyright Law.
Springer-Verlag Berlin Heidelberg New York a member of
BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
Typesetting: Camera-ready by author, data conversion by DA-TeX Gerd Blumenstein
Printed on acid-free paper SPIN 10927847 06/3142 5 4 3 2 1 0
Preface
This book collects the research work presented at the 4th International Conference on Audio- and Video-Based Biometric Person Authentication that took place at the University of Surrey, Guildford, UK, in June 2003. We were pleased to see a surge of interest in AVBPA. We received many more submissions than before, and this reflects not just the good work put in by previous organizers and participants, but also the increasing worldwide interest in biometrics. With grateful thanks to our program committee, we had a fine program indeed.
The papers concentrate on major established biometrics such as face and speech, and we continue to see the emergence of gait as a new research focus, together with other innovative approaches including writer and palmprint identification. The face-recognition papers show advances not only in recognition techniques, but also in application capabilities and covariate analysis (now with the inclusion of time as a recognition factor), and even in synthesis to evaluate wider recognition capability. Fingerprint analysis now includes study of the effects of compression, and new ways for compression, together with refined study of holistic vs. minutiae approaches and feature set selection, areas of interest to the biometrics community as a whole. The gait presentations focus on new approaches for temporal recognition together with analysis of performance capability and new approaches to improve generalization in performance.
The speech papers reflect the wide range of possible applications
together with new uses of visual information. Interest in data
fusion continues to increase. But it is not just the more
established areas that were of interest at AVBPA 2003. As ever in
this innovative technology, there are always new ways to recognize
people, as reflected in papers on on-line writer identification and
palm print analysis. Iris recognition is also represented, as are
face and person extraction in video.
The growing industry in biometrics was reflected in presentations
with a specific commercial interest: there are papers on smart
cards, wireless devices, architectures, and implementation factors,
all of considerable consequence in the deployment of biometric
systems. A competition for the best face-authentication
(verification) algorithms took place in conjunction with the
conference, and the results are reported here.
The papers are complemented by invited presentations by Takeo Kanade (Carnegie Mellon University), Jerry Friedman (Stanford University), and Frederic Bimbot (INRIA). All in all, AVBPA continues to offer a snapshot of research in this area from leading institutions around the world. If these papers and this conference inspire new research in this fascinating area, then this conference can be deemed a true success.
April 2003 Josef Kittler and Mark S. Nixon
Organization
AVBPA 2003 was organized by
– the Centre for Vision, Speech and Signal Processing, University
of Surrey, UK, and
– TC-14 of IAPR (International Association for Pattern
Recognition).
Executive Committee
Conference Co-chairs: Josef Kittler (University of Surrey, UK) and Mark S. Nixon (University of Southampton, UK)
Local Organization: Rachel Gartshore, University of Surrey
Program Committee
Samy Bengio (Switzerland)
Josef Bigun (Sweden)
Frederic Bimbot (France)
Mats Blomberg (Sweden)
Horst Bunke (Switzerland)
Hyeran Byun (South Korea)
Rama Chellappa (USA)
Gerard Chollet (France)
Timothy Cootes (UK)
Larry Davis (USA)
Farzin Deravi (UK)
Sadaoki Furui (Japan)
M. Dolores Garcia-Plaza (Spain)
Dominique Genoud (Switzerland)
Shaogang Gong (UK)
Steve Gunn (UK)
Bernd Heisele (USA)
Anil Jain (USA)
Kenneth Jonsson (Sweden)
Seong-Whan Lee (South Korea)
Stan Li (China)
John Mason (UK)
Jiri Matas (Czech Republic)
Bruce Millar (Australia)
Larry O’Gorman (USA)
Sharath Pankanti (USA)
P. Jonathon Phillips (USA)
Salil Prabhakar (USA)
Nalini Ratha (USA)
Marek Rejman-Greene (UK)
Gael Richard (France)
Massimo Tistarelli (Italy)
Patrick Verlinde (Belgium)
Juan Villanueva (Spain)
Harry Wechsler (USA)
Pong Yuen (Hong Kong)
Table of Contents
Face I
Robust Face Recognition in the Presence of Clutter . . . . . . . . . . 1
A.N. Rajagopalan, Rama Chellappa, and Nathan Koterba
An Image Preprocessing Algorithm for Illumination Invariant Face Recognition . . . . . . . . . . 10
Ralph Gross and Vladimir Brajovic
Quad Phase Minimum Average Correlation Energy Filters for Reduced Memory Illumination Tolerant Face Authentication . . . . . . . . . . 19
Marios Savvides and B.V.K. Vijaya Kumar
Component-Based Face Recognition with 3D Morphable Models . . . . . . . . . . 27
Jennifer Huang, Bernd Heisele, and Volker Blanz
Face II
A Comparative Study of Automatic Face Verification Algorithms on the BANCA Database . . . . . . . . . . 35
M. Sadeghi, J. Kittler, A. Kostin, and K. Messer
Assessment of Time Dependency in Face Recognition: An Initial Study . . . . . . . . . . 44
Patrick J. Flynn, Kevin W. Bowyer, and P. Jonathon Phillips
Constraint Shape Model Using Edge Constraint and Gabor Wavelet Based Search . . . . . . . . . . 52
Baochang Zhang, Wen Gao, Shiguang Shan, and Wei Wang
Expression-Invariant 3D Face Recognition . . . . . . . . . . 62
Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel
Speech
Automatic Estimation of a Priori Speaker Dependent Thresholds in Speaker Verification . . . . . . . . . . 70
Javier R. Saeta and Javier Hernando
A Bayesian Network Approach for Combining Pitch and Reliable Spectral Envelope Features for Robust Speaker Verification . . . . . . . . . . 78
Mijail Arcienega and Andrzej Drygajlo
Searching through a Speech Memory for Text-Independent Speaker Verification . . . . . . . . . . 95
Dijana Petrovska-Delacretaz, Asmaa El Hannani, and Gerard Chollet
Poster Session I
LUT-Based Adaboost for Gender Classification . . . . . . . . . . 104
Bo Wu, Haizhou Ai, and Chang Huang
Independent Component Analysis and Support Vector Machine for Face Feature Extraction . . . . . . . . . . 111
Gianluca Antonini, Vlad Popovici, and Jean-Philippe Thiran
Real-Time Emotion Recognition Using Biologically Inspired Models . . . . . . . . . . 119
Keith Anderson and Peter W. McOwan
A Dual-Factor Authentication System Featuring Speaker Verification and Token Technology . . . . . . . . . . 128
Purdy Ho and John Armington
Wavelet-Based 2-Parameter Regularized Discriminant Analysis for Face Recognition . . . . . . . . . . 137
Dao-Qing Dai and P.C. Yuen
Face Tracking and Recognition from Stereo Sequence . . . . . . . . . . 145
Jian-Gang Wang, Ronda Venkateswarlu, and Eng Thiam Lim
Face Recognition System Using Accurate and Rapid Estimation of Facial Position and Scale . . . . . . . . . . 154
Takatsugu Hirayama, Yoshio Iwai, and Masahiko Yachida
Fingerprint Enhancement Using Oriented Diffusion Filter . . . . . . . . . . 164
Jiangang Cheng, Jie Tian, Hong Chen, Qun Ren, and Xin Yang
Visual Analysis of the Use of Mixture Covariance Matrices in Face Recognition . . . . . . . . . . 172
Carlos E. Thomaz and Duncan F. Gillies
A Face Recognition System Based on Local Feature Analysis . . . . . . . . . . 182
Stefano Arca, Paola Campadelli, and Raffaella Lanzarotti
Face Detection Using an SVM Trained in Eigenfaces Space . . . . . . . . . . 190
Vlad Popovici and Jean-Philippe Thiran
Face Detection and Facial Component Extraction by Wavelet Decomposition and Support Vector Machines . . . . . . . . . . 199
Dihua Xi and Seong-Whan Lee
U-NORM Likelihood Normalization in PIN-Based Speaker Verification Systems . . . . . . . . . . 208
D. Garcia-Romero, J. Gonzalez-Rodriguez, J. Fierrez-Aguilar, and J. Ortega-Garcia
Facing Position Variability in Minutiae-Based Fingerprint Verification through Multiple References and Score Normalization Techniques . . . . . . . . . . 214
D. Simon-Zorita, J. Ortega-Garcia, M. Sanchez-Asenjo, and J. Gonzalez-Rodriguez
Iris-Based Personal Authentication Using a Normalized Directional Energy Feature . . . . . . . . . . 224
Chul-Hyun Park, Joon-Jae Lee, Mark J.T. Smith, and Kil-Houm Park
An HMM On-line Signature Verification Algorithm . . . . . . . . . . 233
Daigo Muramatsu and Takashi Matsumoto
Automatic Pedestrian Detection and Tracking for Real-Time Video Surveillance . . . . . . . . . . 242
Hee-Deok Yang, Bong-Kee Sin, and Seong-Whan Lee
Visual Features Extracting & Selecting for Lipreading . . . . . . . . . . 251
Hong-xun Yao, Wen Gao, Wei Shan, and Ming-hui Xu
An Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition . . . . . . . . . . 260
Simon Lucey
Feature Extraction Using a Chaincoded Contour Representation of Fingerprint Images . . . . . . . . . . 268
Venu Govindaraju, Zhixin Shi, and John Schneider
Hypotheses-Driven Affine Invariant Localization of Faces in Verification Systems . . . . . . . . . . 276
M. Hamouz, J. Kittler, J.K. Kamarainen, and H. Kalviainen
Shape Based People Detection for Visual Surveillance Systems . . . . . . . . . . 285
M. Leo, P. Spagnolo, G. Attolico, and A. Distante
Real-Time Implementation of Face Recognition Algorithms on DSP Chip . . . . . . . . . . 294
Seong-Whan Lee, Sang-Woong Lee, and Ho-Choul Jung
Robust Face-Tracking Using Skin Color and Facial Shape . . . . . . . . . . 302
Hyung-Soo Lee, Daijin Kim, and Sang-Youn Lee
Fingerprint
Fusion of Statistical and Structural Fingerprint Classifiers . . . . . . . . . . 310
Gian Luca Marcialis, Fabio Roli, and Alessandra Serrau
Learning Features for Fingerprint Classification . . . . . . . . . . 318
Xuejun Tan, Bir Bhanu, and Yingqiang Lin
Fingerprint Matching with Registration Pattern Inspection . . . . . . . . . . 327
Hong Chen, Jie Tian, and Xin Yang
Biometric Template Selection: A Case Study in Fingerprints . . . . . . . . . . 335
Anil Jain, Umut Uludag, and Arun Ross
Orientation Scanning to Improve Lossless Compression of Fingerprint Images . . . . . . . . . . 343
Johan Tharna, Kenneth Nilsson, and Josef Bigun
Image, Video Processing, Tracking
A Nonparametric Approach to Face Detection Using Ranklets . . . . . . . . . . 351
Fabrizio Smeraldi
Refining Face Tracking with Integral Projections . . . . . . . . . . 360
Gines Garcia Mateos
Glasses Removal from Facial Image Using Recursive PCA Reconstruction . . . . . . . . . . 369
Jeong-Seon Park, You Hwa Oh, Sang Chul Ahn, and Seong-Whan Lee
Synthesis of High-Resolution Facial Image Based on Top-Down Learning . . . . . . . . . . 377
Bon-Woo Hwang, Jeong-Seon Park, and Seong-Whan Lee
A Comparative Performance Analysis of JPEG 2000 vs. WSQ for Fingerprint Image Compression . . . . . . . . . . 385
Miguel A. Figueroa-Villanueva, Nalini K. Ratha, and Ruud M. Bolle
General
New Shielding Functions to Enhance Privacy and Prevent Misuse of Biometric Templates . . . . . . . . . . 393
Jean-Paul Linnartz and Pim Tuyls
The NIST HumanID Evaluation Framework . . . . . . . . . . 403
Ross J. Micheals, Patrick Grother, and P. Jonathon Phillips
Synthetic Eyes . . . . . . . . . . 412
Behrooz Kamgar-Parsi, Behzad Kamgar-Parsi, and Anil K. Jain
Dental Biometrics: Human Identification Using Dental Radiographs . . . . . . . . . . 429
Anil K. Jain, Hong Chen, and Silviu Minut
Effect of Window Size and Shift Period in Mel-Warped Cepstral Feature Extraction on GMM-Based Speaker Verification . . . . . . . . . . 438
C.C. Leung and Y.S. Moon
Discriminative Face Recognition . . . . . . . . . . 446
Florent Perronnin and Jean-Luc Dugelay
Cross-Channel Histogram Equalisation for Colour Face Recognition . . . . . . . . . . 454
Stephen King, Gui Yun Tian, David Taylor, and Steve Ward
Open World Face Recognition with Credibility and Confidence Measures . . . . . . . . . . 462
Fayin Li and Harry Wechsler
Enhanced VQ-Based Algorithms for Speech Independent Speaker Identification . . . . . . . . . . 470
Ningping Fan and Justinian Rosca
Fingerprint Fusion Based on Minutiae and Ridge for Enrollment . . . . . . . . . . 478
Dongjae Lee, Kyoungtaek Choi, Sanghoon Lee, and Jaihie Kim
Face Hallucination and Recognition . . . . . . . . . . 486
Xiaogang Wang and Xiaoou Tang
Robust Features for Frontal Face Authentication in Difficult Image Conditions . . . . . . . . . . 495
Conrad Sanderson and Samy Bengio
Facial Recognition in Video . . . . . . . . . . 505
Dmitry O. Gorodnichy
Face Authentication Based on Multiple Profiles Extracted from Range Data . . . . . . . . . . 515
Yijun Wu, Gang Pan, and Zhaohui Wu
Eliminating Variation of Face Images Using Face Symmetry . . . . . . . . . . 523
Yan Zhang and Jufu Feng
Combining SVM Classifiers for Multiclass Problem: Its Application to Face Recognition . . . . . . . . . . 531
Jaepil Ko and Hyeran Byun
A Bayesian MCMC On-line Signature Verification . . . . . . . . . . 540
Mitsuru Kondo, Daigo Muramatsu, Masahiro Sasaki, and Takashi Matsumoto
Illumination Normalization Using Logarithm Transforms for Face Authentication . . . . . . . . . . 549
Marios Savvides and B.V.K. Vijaya Kumar
Performance Evaluation of Face Recognition Algorithms on the Asian Face Database, KFDB . . . . . . . . . . 557
Bon-Woo Hwang, Hyeran Byun, Myoung-Cheol Roh, and Seong-Whan Lee
Automatic Gait Recognition via Fourier Descriptors of Deformable Objects . . . . . . . . . . 566
Stuart D. Mowbray and Mark S. Nixon
A Study on Performance Evaluation of Fingerprint Sensors . . . . . . . . . . 574
Hyosup Kang, Bongku Lee, Hakil Kim, Daecheol Shin, and Jaesung Kim
An Improved Fingerprint Indexing Algorithm Based on the Triplet Approach . . . . . . . . . . 584
Kyoungtaek Choi, Dongjae Lee, Sanghoon Lee, and Jaihie Kim
A Supervised Approach in Background Modelling for Visual Surveillance . . . . . . . . . . 592
P. Spagnolo, M. Leo, G. Attolico, and A. Distante
Human Recognition on Combining Kinematic and Stationary Features . . . . . . . . . . 600
Bir Bhanu and Ju Han
Architecture for Synchronous Multiparty Authentication Using Biometrics . . . . . . . . . . 609
Sunil J. Noronha, Chitra Dorai, Nalini K. Ratha, and Ruud M. Bolle
Boosting a Haar-Like Feature Set for Face Verification . . . . . . . . . . 617
Bernhard Froba, Sandra Stecher, and Christian Kublbeck
The BANCA Database and Evaluation Protocol . . . . . . . . . . 625
Enrique Bailly-Bailliere, Samy Bengio, Frederic Bimbot, Miroslav Hamouz, Josef Kittler, Johnny Mariethoz, Jiri Matas, Kieron Messer, Vlad Popovici, Fabienne Poree, Belen Ruiz, and Jean-Philippe Thiran
A Speaker Pruning Algorithm for Real-Time Speaker Identification . . . . . . . . . . 639
Tomi Kinnunen, Evgeny Karpov, and Pasi Franti
“Poor Man” Vote with M-ary Classifiers. Application to Iris Recognition . . . . . . . . . . 647
V. Vigneron, H. Maaref, and S. Lelandais
Handwriting, Signature, Palm
Personal Verification Using Palmprint and Hand Geometry Biometric . . . . . . . . . . 668
Ajay Kumar, David C.M. Wong, Helen C. Shen, and Anil K. Jain
A Set of Novel Features for Writer Identification . . . . . . . . . . 679
Caroline Hertel and Horst Bunke
Combining Fingerprint and Hand-Geometry Verification Decisions . . . . . . . . . . 688
Kar-Ann Toh, Wei Xiong, Wei-Yun Yau, and Xudong Jiang
Iris Verification Using Correlation Filters . . . . . . . . . . 697
B.V.K. Vijaya Kumar, Chunyan Xie, and Jason Thornton
Gait
Gait Analysis for Human Identification . . . . . . . . . . 706
A. Kale, N. Cuntoor, B. Yegnanarayana, A.N. Rajagopalan, and R. Chellappa
Performance Analysis of Time-Distance Gait Parameters under Different Speeds . . . . . . . . . . 715
Rawesak Tanawongsuwan and Aaron Bobick
Novel Temporal Views of Moving Objects for Gait Biometrics . . . . . . . . . . 725
Stuart P. Prismall, Mark S. Nixon, and John N. Carter
Gait Shape Estimation for Identification . . . . . . . . . . 734
David Tolliver and Robert T. Collins
Fusion
Audio-Visual Speaker Identification Based on the Use of Dynamic Audio and Visual Features . . . . . . . . . . 743
Niall Fox and Richard B. Reilly
Scalability Analysis of Audio-Visual Person Identity Verification . . . . . . . . . . 752
Jacek Czyz, Samy Bengio, Christine Marcel, and Luc Vandendorpe
A Bayesian Approach to Audio-Visual Speaker Identification . . . . . . . . . . 761
Ara V. Nefian, Lu Hong Liang, Tieyan Fu, and Xiao Xing Liu
Multimodal Authentication Using Asynchronous HMMs . . . . . . . . . . 770
Samy Bengio
Theoretic Evidence k-Nearest Neighbourhood Classifiers in a Bimodal Biometric Verification System . . . . . . . . . . 778
Andrew Teoh Beng Jin, Salina Abdul Samad, and Aini Hussain
Poster Session III
Combined Face Detection/Recognition System for Smart Rooms . . . . . . . . . . 787
Jia Kui and Liyanage C. De Silva
Capabilities of Biometrics for Authentication in Wireless Devices . . . . . . . . . . 796
Pauli Tikkanen, Seppo Puolitaival, and Ilkka Kansala
Combining Face and Iris Biometrics for Identity Verification . . . . . . . . . . 805
Yunhong Wang, Tieniu Tan, and Anil K. Jain
Experimental Results on Fusion of Multiple Fingerprint Matchers . . . . . . . . . . 814
Gian Luca Marcialis and Fabio Roli
Predicting Large Population Data Cumulative Match Characteristic Performance from Small Population Data . . . . . . . . . . 821
Amos Y. Johnson, Jie Sun, and Aaron F. Bobick
A Comparative Evaluation of Fusion Strategies for Multimodal Biometric Verification . . . . . . . . . . 830
J. Fierrez-Aguilar, J. Ortega-Garcia, D. Garcia-Romero, and J. Gonzalez-Rodriguez
Iris Feature Extraction Using Independent Component Analysis . . . . . . . . . . 838
Kwanghyuk Bae, Seungin Noh, and Jaihie Kim
BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities . . . . . . . . . . 845
Sonia Garcia-Salicetti, Charles Beumier, Gerard Chollet, Bernadette Dorizzi, Jean Leroux les Jardins, Jan Lunter, Yang Ni, and Dijana Petrovska-Delacretaz
Fingerprint Alignment Using Similarity Histogram . . . . . . . . . . 854
Tanghui Zhang, Jie Tian, Yuliang He, Jiangang Cheng, and Xin Yang
A Novel Method to Extract Features for Iris Recognition System . . . . . . . . . . 862
Seung-In Noh, Kwanghyuk Bae, Yeunggyu Park, and Jaihie Kim
Resampling for Face Recognition . . . . . . . . . . 869
Xiaoguang Lu and Anil K. Jain
Toward Person Authentication with Point Light Display Using Neural Network Ensembles . . . . . . . . . . 878
Sung-Bae Cho and Frank E. Pollick
Fingerprint Verification Using Correlation Filters . . . . . . . . . . 886
Krithika Venkataramani and B.V.K. Vijaya Kumar
On the Correlation of Image Size to System Accuracy in Automatic Fingerprint Identification Systems . . . . . . . . . . 895
J.K. Schneider, C.E. Richardson, F.W. Kiefer, and Venu Govindaraju
A JC-BioAPI Compliant Smart Card with Biometrics for Secure Access Control . . . . . . . . . . 903
Michael Osborne and Nalini K. Ratha
Comparison of MLP and GMM Classifiers for Face Verification on XM2VTS . . . . . . . . . . 911
Fabien Cardinaux, Conrad Sanderson, and Sebastien Marcel
Fast Frontal-View Face Detection Using a Multi-path Decision Tree . . . . . . . . . . 921
Bernhard Froba and Andreas Ernst
Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy . . . . . . . . . . 929
Simon Lucey and Tsuhan Chen
Face Recognition Vendor Test 2002 Performance Metrics . . . . . . . . . . 937
Patrick Grother, Ross J. Micheals, and P. Jonathon Phillips
Posed Face Image Synthesis Using Nonlinear Manifold Learning . . . . . . . . . . 946
Eunok Cho, Daijin Kim, and Sang-Youn Lee
Pose for Fusing Infrared and Visible-Spectrum Imagery . . . . . . . . . . 955
Jian-Gang Wang and Ronda Venkateswarlu
AVBPA2003 Face Authentication Contest
Face Verification Competition on the XM2VTS Database . . . . . . . . . . 964
Kieron Messer, Josef Kittler, Mohammad Sadeghi, Sebastien Marcel, Christine Marcel, Samy Bengio, Fabien Cardinaux, C. Sanderson, Jacek Czyz, Luc Vandendorpe, Sanun Srisuk, Maria Petrou, Werasak Kurutach, Alexander Kadyrov, Roberto Paredes, B. Kepenekci, F.B. Tek, G.B. Akar, Farzin Deravi, and Nick Mavity
Author Index . . . . . . . . . . 975
Robust Face Recognition in the Presence of Clutter
A.N. Rajagopalan1, Rama Chellappa2, and Nathan Koterba2
1 Indian Institute of Technology, Madras, India
[email protected]
2 Center for Automation Research, University of Maryland, College
Park, USA {rama,nathank}@cfar.umd.edu
Abstract. We propose a new method within the framework of principal component analysis to robustly recognize faces in the presence of clutter. The traditional eigenface recognition method performs poorly when confronted with the more general task of recognizing faces appearing against a background. It misses faces completely or throws up many false alarms. We argue in favor of learning the distribution of background patterns and show how this can be done for a given test image. An eigenbackground space is constructed, and this space in conjunction with the eigenface space is used to impart robustness in the presence of background. A suitable classifier is derived to distinguish non-face patterns from faces. When tested on real images, the performance of the proposed method is found to be quite good.
1 Introduction
Two of the very successful and popular approaches to face recognition are Principal Components Analysis (PCA) [1] and Fisher’s Linear Discriminant (FLD) [2]. Methods based on PCA and FLD work quite well provided the input test pattern is a face, i.e., the face image has already been cropped out of a scene. The problem of recognizing faces in still images with a cluttered background is more general and difficult, as one does not know where a face pattern might appear in a given image. A good face recognition system should (i) detect and recognize all the faces in a scene, and (ii) not misclassify background patterns as faces. Since faces are usually sparsely distributed in images, even a few false alarms will render the system ineffective. Also, the performance should not be too sensitive to threshold selection. Some attempts to address this situation are discussed in [1, 3], where the use of distance from eigenface space (DFFS) and distance in eigenface space (DIFS) is suggested to detect and eliminate non-faces for robust face recognition in clutter. In this study, we show that DFFS and DIFS by themselves (in the absence of any information about the background) are not sufficient to discriminate against arbitrary background patterns. If the threshold is set high, traditional eigenface recognition (EFR) invariably ends up missing faces. If the threshold is lowered to capture faces, the technique incurs many false alarms.
J. Kittler and M.S. Nixon (Eds.): AVBPA 2003, LNCS 2688, pp. 1–9, 2003.
© Springer-Verlag Berlin Heidelberg 2003
One possible approach to handle clutter in still images is to use a good face detection module to find face patterns and then feed only these patterns as inputs to the traditional EFR scheme. In this paper, we propose a new methodology within the PCA framework to robustly recognize frontal faces in a given test image with background clutter. Towards this end, we construct an ‘eigenbackground space’ which represents the distribution of the background images corresponding to the given test image. The background is learnt ‘on the fly’ and provides a sound basis for eliminating false alarms. An appropriate pattern classifier is derived, and the eigenbackground space together with the eigenface space is used to simultaneously detect and recognize faces. Results are given on several test images to validate the proposed method.
2 Eigenface Recognition in Clutter
In the EFR technique, when a face image is presented to the system,
its weight vector is determined with respect to the eigenface
space. In order to perform recognition, the difference error
between this weight vector and the a priori stored mean weight
vector corresponding to every person in the training set is
computed. This error is also called the distance in face space
(DIFS). That face class in the training set for which the DIFS is
minimum is declared as the recognized face provided the difference
error is less than an appropriately chosen threshold. The case of a
still image containing face against background is much more complex
and some attempts have been made to tackle it [1, 3]. In [1], the
authors advocate the use of distance from face space (DFFS) to
reject non-face patterns. The DFFS can be looked upon as the error
in the reconstruction of a pattern. It has been pointed out in [1]
that a threshold θDFFS could be chosen such that it defines the
maximum allowable distance from the face space. If DFFS is greater
than θDFFS , then the test pattern is classified as a non-face
image. In a more recent work [3], DFFS together with DIFS has been
suggested to improve performance. A test pattern is classified as a
face and recognized provided its DFFS as well as DIFS values are
less than suitably chosen thresholds θDFFS
and θDIFS , respectively. Although DFFS and DIFS have been
suggested as possible candidates for
discriminating against background patterns, it is difficult to
conceive that by learning just the face class we can segregate any
arbitrary background pattern against which face patterns may
appear. It may not always be possible to come up with threshold
values that will result in no false alarms and yet can catch all
the faces. To better illustrate this point, we show some examples
in Fig. 1(a) where faces appear against background. Our training
set contains faces of these individuals. The idea is to locate and
recognize these individuals in the test images when they appear
against clutter. The DFFS and DIFS values corresponding to every
subimage pattern in these images were calculated and an attempt was
made to recognize faces based on these values as suggested in [3].
Robust Face Recognition in the Presence of Clutter 3
It turns out that not only do we catch the face but we also end up
with many false alarms (see Fig. 1(b)), since information about the
background is completely ignored. It is interesting to note that some
of the background patterns are wrongly identified as one of the
individuals in the training set. If the threshold values are made
smaller to eliminate false alarms, we end up missing some of the
faces. Thus, the performance of the EFR technique is quite sensitive
to the threshold values chosen.
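As a concrete illustration of the EFR pipeline just discussed, the following sketch applies the DFFS and DIFS tests to a single window pattern. The names U (eigenface basis), mu (mean face), and class_means are illustrative assumptions, not the authors' code:

```python
import numpy as np

def eigenface_classify(x, U, mu, class_means, theta_dffs, theta_difs):
    """EFR with the DFFS/DIFS tests of [1, 3].
    U: (N, L') orthonormal eigenfaces, mu: (N,) mean face,
    class_means: (q, L') stored mean weight vectors per person."""
    y = U.T @ (x - mu)                       # weights in face space
    x_hat = mu + U @ y                       # reconstruction from L' projections
    dffs = np.sum((x - x_hat) ** 2)          # distance from face space
    if dffs > theta_dffs:
        return None                          # rejected as a non-face pattern
    difs = np.sum((class_means - y) ** 2, axis=1)   # distances in face space
    i = int(np.argmin(difs))                 # nearest face class
    return i if difs[i] < theta_difs else None
```

With permissive thresholds this sketch exhibits exactly the failure mode described above: any background window whose DFFS happens to fall under the threshold is matched to some training identity.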
3 Background Representation
If only the eigenface space is learnt, then background patterns with
relatively small DFFS and DIFS values will pass for faces and this
can result in an unacceptable number of false alarms. We
argue in favor of learning the distribution of background images
specific to a given scene. A locally learnt distribution can be
expected to be more effective (than a universal background class
learnt as in [4, 5], which is quite data intensive) for capturing
the background characteristics of the given test image. By
constructing the eigenbackground space for the given test image and
comparing the proximity of an image pattern to this subspace versus
the eigenface subspace, background patterns can be rejected.
3.1 The Eigenbackground Space
We now describe a simple but effective technique for constructing
the ‘eigenbackground space’. It is assumed that faces are
sparsely distributed in a given image, which is a reasonable
assumption. Given a test image, the background is learnt ‘on the
fly’ from the test image itself. Initially, the test image is
scanned for those image patterns that are very unlikely to belong
to the ‘face class’.
• A window pattern x in the test image is classified (positively) as
a background pattern if its distance from the eigenface space is
greater than a certain (high) threshold θb.
Note that we use DFFS to initially segregate only the most likely
background patterns. Since the background usually constitutes a
major portion of the test image, it is possible to obtain a
sufficient number of samples for learning the ‘background class’
even if the threshold θb is chosen to be large for higher
confidence. Since the number of background patterns is likely to be very
large, these patterns are distributed into K clusters using simple
K-means clustering so that K pattern centers are returned. The mean
and covariance estimated from these clusters allow us to
effectively extrapolate to other background patterns in the image
(not picked up due to high value of θb) as well.
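A minimal Lloyd's-algorithm K-means, standing in for the "simple K-means clustering" mentioned above (a library implementation would serve equally well):

```python
import numpy as np

def kmeans_centers(X, K, iters=20, seed=0):
    """Reduce the harvested background windows to K pattern centers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].copy()
    for _ in range(iters):
        # squared distances of every sample to every current center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            pts = X[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)   # update non-empty clusters
    return centers
```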
The pattern centers, which are much fewer in number than the
background patterns, are then used as training images for learning
the eigenbackground space. Although the pattern
centers belong to different clusters, they are not totally
uncorrelated with respect to one another and further dimensionality
reduction is possible.
4 A.N. Rajagopalan et al.
The procedure that we follow is similar to that used to create the
eigenface space. We first find the principal components of the
background pattern centers, i.e., the eigenvectors of the covariance
matrix Cb of the set of background pattern centers. These
eigenvectors can be
thought of as a set of features which together characterize the
variation among pattern centers of the background space. The
subspace spanned by the eigenvectors corresponding to the largest K′
eigenvalues of the covariance matrix Cb is called the
eigenbackground space. The significant eigenvectors of the matrix
Cb, which we call ‘eigenbackground images’, form a basis for
representing the background image patterns in the given test
image.
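The construction above amounts to PCA on the pattern centers. One possible sketch uses an SVD of the centered data matrix instead of forming Cb explicitly (a standard numerical choice, not prescribed by the paper):

```python
import numpy as np

def eigenbackground(centers, K_prime):
    """Mean and K' leading principal directions of the pattern centers;
    the returned basis columns are the 'eigenbackground images'."""
    mu_b = centers.mean(axis=0)
    A = centers - mu_b
    # right singular vectors of A are the eigenvectors of C_b
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return mu_b, Vt[:K_prime].T          # shapes (N,) and (N, K')
```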
4 The Classifier
Let the face class be denoted by ω1 and the background class be
denoted by ω2. Assuming the conditional density function for the two
classes to be Gaussian,

f(x|\omega_i) = \frac{1}{(2\pi)^{N/2}\,|C_i|^{1/2}} \exp\left[-\frac{1}{2}\,d_i(x)\right]   (1)
where d_i(x) = (x − µ_i)^t C_i^{−1} (x − µ_i). Here, µ_1 and µ_2 are
the means while C_1 and C_2 are the covariance matrices of the face
and the background class, respectively. If the image pattern is of
size M × M, then N = M². Diagonalization of C_i results in

d_i(x) = (x - \mu_i)^t\,(\phi_i \Lambda_i^{-1} \phi_i^t)\,(x - \mu_i) = y_i^t\,\Lambda_i^{-1}\,y_i

where φ_i is a matrix containing the eigenvectors of C_i and is of
the form [φ_{1i} φ_{2i} ... φ_{Ni}]. The weight vector
y_i = φ_i^t (x − µ_i) is obtained by projecting the mean-subtracted
vector x onto the subspace spanned by the eigenvectors in φ_i.
Written in scalar form, d_i(x) becomes

d_i(x) = \sum_{j=1}^{N} \frac{y_{ij}^2}{\lambda_{ij}}

Since d_1(x) is approximated using only L′ principal projections, we
seek to formulate an estimator for d_1(x) as follows:

\hat{d}_1(x) = \sum_{j=1}^{L'} \frac{y_{1j}^2}{\lambda_{1j}} + \frac{1}{\rho_1}\,\epsilon_1^2(x)   (2)
where ε_1²(x) is the reconstruction error of x with respect to the
eigenface space. This is because ε_1²(x) can be written as
ε_1²(x) = ‖x − x_f‖², where x_f is the estimate of x when projected
onto the eigenface space. Because x_f is computed using only L′
principal projections in the eigenface space, we have

\epsilon_1^2(x) = \Big\| x - \Big(\mu_1 + \sum_{j=1}^{L'} y_{1j}\,\phi_{1j}\Big) \Big\|^2 = \sum_{j=L'+1}^{N} y_{1j}^2

as the φ_{1j}s are orthonormal. In a similar vein, since d_2(x) is
approximated using only K′ principal projections,

\hat{d}_2(x) = \sum_{j=1}^{K'} \frac{y_{2j}^2}{\lambda_{2j}} + \frac{1}{\rho_2}\,\epsilon_2^2(x)   (3)

where ε_2²(x) is the reconstruction error of x with respect to the
eigenbackground space and

\epsilon_2^2(x) = \sum_{j=K'+1}^{N} y_{2j}^2 = \|x - x_b\|^2.

Here, x_b is the estimate of x when projected onto the
eigenbackground space.
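The two reconstruction errors used throughout this derivation reduce to projections onto the respective subspaces; a sketch with assumed names (orthonormal bases U_f, U_b and means mu_f, mu_b):

```python
import numpy as np

def residuals(x, U_f, mu_f, U_b, mu_b):
    """Return (eps1^2, eps2^2): squared reconstruction errors of x
    w.r.t. the eigenface and eigenbackground subspaces."""
    def eps2(U, mu):
        y = U.T @ (x - mu)                        # subspace coefficients
        return float(np.sum((x - (mu + U @ y)) ** 2))
    return eps2(U_f, mu_f), eps2(U_b, mu_b)
```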
From equations (1) and (2), the density estimate based on
\hat{d}_1(x) can be written as the product of two marginal and
independent Gaussian densities in the face space F and its
orthogonal complement F⊥, i.e.,

\hat{f}(x|\omega_1) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{j=1}^{L'} y_{1j}^2/\lambda_{1j}\right)}{(2\pi)^{L'/2}\prod_{j=1}^{L'}\lambda_{1j}^{1/2}}\right] \left[\frac{\exp\left(-\epsilon_1^2(x)/2\rho_1\right)}{(2\pi\rho_1)^{(N-L')/2}}\right] = f_F(x|\omega_1)\cdot \hat{f}_{F^\perp}(x|\omega_1)   (4)

Here, f_F(x|ω1) is the true marginal density in the face space while
\hat{f}_{F⊥}(x|ω1) is the estimated marginal density in F⊥.
Along similar lines, the density estimate for the background class
can be expressed as

\hat{f}(x|\omega_2) = \left[\frac{\exp\left(-\frac{1}{2}\sum_{j=1}^{K'} y_{2j}^2/\lambda_{2j}\right)}{(2\pi)^{K'/2}\prod_{j=1}^{K'}\lambda_{2j}^{1/2}}\right] \left[\frac{\exp\left(-\epsilon_2^2(x)/2\rho_2\right)}{(2\pi\rho_2)^{(N-K')/2}}\right] = f_B(x|\omega_2)\cdot \hat{f}_{B^\perp}(x|\omega_2)   (5)

Here, f_B(x|ω2) is the true marginal density in the background space
while \hat{f}_{B⊥}(x|ω2) is the estimated marginal density in B⊥.
The optimal values of ρ1 and ρ2 can be determined by minimizing the
Kullback–Leibler distance [6] between the true density and its
estimate. The resultant estimates can be shown to be

\rho_1 = \frac{1}{N - L'} \sum_{j=L'+1}^{N} \lambda_{1j}, \qquad \rho_2 = \frac{1}{N - K'} \sum_{j=K'+1}^{N} \lambda_{2j}   (6)
Thus, once we select the L′-dimensional principal subspace F, the
optimal density estimate f(x|ω1) has the form given by equation
(4) where ρ1 is as given above. A similar argument applies to the
background space also.
Assuming equal a priori probabilities, the classifier can be derived
as

\log \hat{f}_{F^\perp}(x|\omega_1) - \log \hat{f}_{B^\perp}(x|\omega_2) = \frac{\epsilon_2^2(x)}{2\rho_2} - \frac{\epsilon_1^2(x)}{2\rho_1} + \frac{N-K'}{2}\log(2\pi\rho_2) - \frac{N-L'}{2}\log(2\pi\rho_1)   (7)

When L′ = K′, i.e., when the number of eigenfaces and
eigenbackground patterns is the same, and when ρ1 = ρ2 = ρ, i.e.,
when the arithmetic mean of the eigenvalues in the orthogonal
subspaces is the same, the above classifier interestingly simplifies
to

\log \hat{f}_{F^\perp}(x|\omega_1) - \log \hat{f}_{B^\perp}(x|\omega_2) = \frac{1}{2\rho}\left(\epsilon_2^2(x) - \epsilon_1^2(x)\right)   (8)
which is simply a function of the reconstruction error. Clearly,
the face space would favour a better reconstruction of face
patterns while the background space would favour the background
patterns.
5 The Proposed Method
Once the eigenface space and the eigenbackground space are learnt,
the test image is examined again, but now for the presence of faces
at all points in the image. For each of the test window patterns,
the classifier proposed in Section 4 is used to determine whether a
pattern is a face or not. Ideally, one must use equation (7) but
for computational simplicity we use equation (8) which is the
difference in the reconstruction error. The classifier works quite
well despite this simplification.
To express the operations mathematically, let the subimage pattern
under consideration in the test image be denoted as x. The vector x
is projected onto the eigenface space as well as the
eigenbackground space to yield estimates of x as xf and xb,
respectively. If

\|x - x_f\|^2 < \|x - x_b\|^2 \quad \text{and} \quad \|x - x_f\|^2 < \theta_{DFFS}   (9)

where θDFFS is an appropriately chosen threshold, then recognition
is carried out based on its DIFS value. The weight vector W
corresponding to pattern x in the eigenface space is compared (in
the Euclidean sense) with the pre-stored mean weights of each of the
face classes. The pattern x is recognized as belonging to the ith
person if

i = \arg\min_{j} \|W - m_j\|^2, \quad j = 1, \ldots, q, \quad \text{and} \quad \|W - m_i\|^2 < \theta_{DIFS}   (10)

where q is the number of face classes or people in the database and
θDIFS is a suitably chosen threshold.
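Equations (9) and (10) combine into a single per-window decision rule; the following sketch assumes orthonormal bases U_f, U_b with means mu_f, mu_b (all names illustrative):

```python
import numpy as np

def detect_and_recognize(x, U_f, mu_f, U_b, mu_b, class_means,
                         theta_dffs, theta_difs):
    """Declare x a face (and identify it) only if the eigenface space
    reconstructs it better than the eigenbackground space and both the
    DFFS and DIFS tests pass."""
    W = U_f.T @ (x - mu_f)
    e_f = np.sum((x - (mu_f + U_f @ W)) ** 2)      # ||x - x_f||^2
    e_b = np.sum((x - (mu_b + U_b @ (U_b.T @ (x - mu_b)))) ** 2)
    if e_f < e_b and e_f < theta_dffs:             # equation (9)
        d = np.sum((class_means - W) ** 2, axis=1)
        i = int(np.argmin(d))
        if d[i] < theta_difs:                      # equation (10)
            return i
    return None
```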
In the above discussion, since a background pattern will be better
approximated by the eigenbackground images than by the eigenface
images, it is to be expected that ‖x − x_b‖² would be less than
‖x − x_f‖² for a background pattern x. On the other hand, if x is a
face pattern, then it will be better represented by the eigenface
space than by the eigenbackground space. Thus, learning the
eigenbackground space helps to reduce the false alarms considerably.
Moreover, the threshold value can now be raised comfortably without
generating false alarms because the reconstruction error of a
background pattern would continue to remain a minimum with respect
to the background space only. Knowledge of the background leads to
improved performance (fewer misses as well as fewer false alarms)
and reduces sensitivity to the choice of threshold values
(properties that are highly desirable in a recognition scenario).
6 Experimental Results
Because our experiment requires individuals in the test images
(with background clutter) to be the same as the ones in the
training set, we generated our own face database. The training set
consisted of images of size 27×27 pixels of 50 subjects with 10
images per subject. The number of significant eigenfaces was found
to be 50 for satisfactory recognition. For the purpose of testing,
we captured images in which subjects in the database appeared (with an
approximately frontal pose) against different types of background.
Some of the images were captured within the laboratory. For other
types of clutter, we used big posters with different types of
complex background. Pictures of the individuals in our database
were then captured with these posters in the backdrop. We captured
about 400 such test images each of size 120 × 120 pixels. If a face
pattern is recognized by the system, a box is drawn at the
corresponding location in the output image.
Thresholds θDFFS and θDIFS were chosen to be the maximum of all the
DFFS and DIFS values, respectively, among the faces in the training
set (which is a reasonable thing to do). The threshold values were
kept the same for all the test images and for both the schemes as
well. For the proposed scheme, the number of background pattern
centers was chosen to be 600 while the number of eigenbackground
images was chosen to be 100 and these were kept fixed for all the
test images. The number of eigenbackground images was arrived at
based on the accuracy of reconstruction of the background
patterns.
Due to space constraints, only a few representative results are
given here (see Fig. 1 and Fig. 2). The figures are quite
self-explanatory. We observe that traditional EFR (which does not
utilize background information) confuses too many background
patterns (Fig. 1(b)) with faces in the training set. If θDFFS is
decreased to reduce the false alarms, then it ends up missing many
of the faces. On the other hand, the proposed scheme works quite
well and recognizes faces with very few false alarms, if any. When
tested on all the 400 test images, the proposed method has a
detection capability of 80% with no false alarms, and the
recognition rate on these detected images is 78%. Most of the
frontal faces are caught correctly. Even if θDFFS is increased to
accommodate slightly difficult poses, we have observed that the
results are unchanged for the proposed method. This can be
attributed to the fact that a background pattern remains closest to
the background space despite changes in θDFFS.
7 Conclusions
In the literature, the eigenface technique has been demonstrated to
be very useful for face recognition. However, when the scheme is
directly extended to recognize faces in the presence of background
clutter, its performance degrades as it cannot satisfactorily
discriminate against non-face patterns. In this paper, we have
presented a robust scheme for recognizing faces in still images of
natural scenes. We argue in favor of constructing an
eigenbackground space from the
(a) (b) (c)
Fig. 1. (a) Sample test images. Results for (b) traditional EFR,
and (c) the proposed method. Note that traditional EFR has many
false alarms
Fig. 2. Some representative results for the proposed method
background images of a given scene. The background space which is
created ‘on the fly’ from the test image is shown to be very useful
in distinguishing non-face patterns. The scheme outperforms the
traditional EFR technique and gives very good results with almost
no false alarms, even on fairly complicated scenes.
References
[1] M. Turk and A. Pentland, “Eigenfaces for recognition”, J.
Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
[2] P. Belhumeur, J. Hespanha and D. Kriegman, “Eigenfaces vs.
Fisherfaces: Recognition using class specific linear projection”,
IEEE Trans. Pattern Anal. and Machine Intell., vol. 19, pp. 711-720,
1997.
[3] B. Moghaddam and A. Pentland, “Probabilistic visual learning for
object representation”, IEEE Trans. Pattern Anal. and Machine
Intell., vol. 19, pp. 696-710, 1997.
[4] K. Sung and T. Poggio, “Example-based learning for view-based
human face detection”, IEEE Trans. Pattern Anal. and Machine
Intell., vol. 20, pp. 39-51, 1998.
[5] H.A. Rowley, S. Baluja, and T. Kanade, “Neural network-based
face detection”, IEEE Trans. Pattern Anal. and Machine Intell., vol.
20, pp. 23-38, 1998.
[6] K. Fukunaga, Introduction to Statistical Pattern Recognition,
Academic Press, 1991.
An Image Preprocessing Algorithm for
Illumination Invariant Face Recognition
The Robotics Institute Carnegie Mellon University, 5000 Forbes
Avenue, Pittsburgh, PA 15213
{rgross,brajovic}@cs.cmu.edu
Abstract. Face recognition algorithms have to deal with significant
amounts of illumination variations between gallery and probe
images. State-of-the-art commercial face recognition algorithms
still struggle with this problem. We propose a new image
preprocessing algorithm that compensates for illumination
variations in images. From a single brightness image the algorithm
first estimates the illumination field and then compensates for it
to mostly recover the scene reflectance. Unlike previously proposed
approaches for illumination compensation, our algorithm does not
require any training steps, knowledge of 3D face models or
reflective surface models. We apply the algorithm to face images
prior to recognition. We demonstrate large performance improvements
with several standard face recognition algorithms across multiple,
publicly available face databases.
1 Introduction
Besides pose variation, illumination is the most significant factor
affecting the appearance of faces. Ambient lighting changes greatly
within and between days and among indoor and outdoor environments.
Due to the 3D shape of the face, a direct lighting source can cast
strong shadows that accentuate or diminish certain facial features.
Evaluations of face recognition algorithms consistently show that
state-of-the-art systems cannot deal with large differences in
illumination conditions between gallery and probe images [1, 2,
3]. In recent years many appearance-based algorithms have been
proposed to deal with the problem [4, 5, 6, 7]. Belhumeur [5]
showed that the set of images of an object in fixed pose but under
varying illumination forms a convex cone in the space of images.
The illumination cones of human faces can be approximated well by
low-dimensional linear subspaces [8]. The linear subspaces are
typically estimated from training data, requiring multiple images
of the object under different illumination conditions.
Alternatively, model-based approaches have been proposed to address
the problem. Blanz et al. [9] fit a previously constructed
morphable 3D model to single images. The algorithm works well
across pose and illumination, however, the computational expense is
very high.
In general, an image I(x, y) is regarded as the product
I(x, y) = R(x, y)L(x, y), where R(x, y) is the reflectance and
L(x, y) is the illuminance at each point
J. Kittler and M.S. Nixon (Eds.): AVBPA 2003, LNCS 2688, pp. 10–18,
2003. c© Springer-Verlag Berlin Heidelberg 2003
(x, y) [10]. Computing the reflectance and the illuminance fields
from real images is, in general, an ill-posed problem. Therefore,
various assumptions and simplifications about L, or R, or both are
proposed in order to attempt to solve the problem. A common
assumption is that L varies slowly while R can change abruptly. For
example, homomorphic filtering [11] uses this assumption to extract
R by high-pass filtering the logarithm of the image.
Closely related to homomorphic filtering is Land's "retinex" theory
[12]. The retinex algorithm estimates the reflectance R as the
ratio of the image I(x, y) and its low-pass version that serves as
an estimate for L(x, y). At large discontinuities in I(x, y),
"halo" effects are often visible. Jobson [13] extended the
algorithm by combining several low-pass copies of the logarithm of
I(x, y) using different cut-off frequencies for each low-pass
filter. This helps to reduce halos, but does not eliminate them
entirely.
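For reference, a naive single-scale retinex of the kind discussed above can be sketched in a few lines, with a box blur standing in for the low-pass filter (this is the baseline being criticized, not the algorithm proposed in this paper):

```python
import numpy as np

def retinex(I, radius=2, eps=1e-6):
    """Estimate reflectance as log I - log L, where L is a crude
    low-pass (box-blurred) version of the brightness image I."""
    pad = np.pad(I.astype(float), radius, mode='edge')
    k = 2 * radius + 1
    L = np.zeros(I.shape, dtype=float)
    # accumulate the k x k neighborhood sum, then normalize
    for dr in range(k):
        for dc in range(k):
            L += pad[dr:dr + I.shape[0], dc:dc + I.shape[1]]
    L /= k * k
    return np.log(I + eps) - np.log(L + eps)
```

On a uniform image the estimate L equals I, so the recovered reflectance is flat; at a sharp step, L lags I on both sides, which is exactly the halo effect described above.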
In order to eliminate the notorious halo effect, Tumblin and Turk
introduced the low curvature image simplifier (LCIS) hierarchical
decomposition of an image [14]. Each component in this hierarchy is
computed by solving a partial differential equation inspired by
anisotropic diffusion [15]. At each hierarchical level the method
segments the image into smooth (low-curvature) regions while
stopping at sharp discontinuities. The algorithm is computationally
intensive and requires manual selection of no less than 8 different
parameters.
2 The Reflectance Perception Model
Our algorithm is motivated by two widely accepted assumptions about
human vision: 1) human vision is mostly sensitive to scene
reflectance and mostly insensitive to the illumination conditions,
and 2) human vision responds to local changes in contrast rather
than to global brightness levels. These two assumptions are closely
related since local contrast is a function of reflectance.
With these assumptions in mind, our goal is to find an estimate of
L(x, y) such that when it divides I(x, y) it produces R(x, y) in
which the local contrast is appropriately enhanced. In this view
R(x, y) takes the place of the perceived sensation, while I(x, y)
takes the place of the input stimulus. L(x, y) is then called the
perception gain which maps the input sensation into the perceived
stimulus, that is:

I(x, y)\,\frac{1}{L(x, y)} = R(x, y)   (1)
With this biological analogy, R is mostly the reflectance of the
scene, and L is mostly the illumination field, but they may not be
"correctly" separated in a strict physical sense. After all, humans
perceive reflectance details in shadows as well as in bright
regions, but they are also cognizant of the presence of shadows.
From this point on, we may refer to R and L as reflectance and
illuminance, but they are to be understood as the perceived
sensation and the perception gain, respectively.
12 Ralph Gross and Vladimir Brajovic
Fig. 1. (a) Compressive logarithmic mapping emphasizes changes at
low stimulus levels and attenuates changes at high stimulus levels.
(b) Discretization lattice for the PDE in Equation (5)
To derive our model, we turn to evidence gathered in experimental
psychology. According to Weber's Law, the sensitivity threshold to
a small intensity change increases proportionally to the signal
level [16]. This law follows from experimentation on brightness
perception that consists of exposing an observer to a uniform field
of intensity I in which a disk is gradually increased in brightness
by a quantity ∆I. The value ∆I at which the observer perceives the
existence of the disk against the background is called the
brightness discrimination threshold. Weber noticed that ∆I/I is
constant for a wide range of intensity values. Weber's law gives a
theoretical justification for assuming a logarithmic mapping from
input stimulus to perceived sensation (see Figure 1(a)).
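Weber's law pins down the logarithmic form in one line: taking the just-noticeable relative increment as the unit of perceived change and integrating gives

\mathrm{d}S = k\,\frac{\mathrm{d}I}{I} \quad\Longrightarrow\quad S(I) = k \log I + c,

where S is the perceived sensation and k, c are constants; equal relative intensity changes therefore produce equal sensation changes, which is the compressive curve of Figure 1(a).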
Due to the logarithmic mapping, when the stimulus is weak, for
example in deep shadows, small changes in the input stimulus elicit
large changes in perceived sensation. When the stimulus is strong,
small changes in the input stimulus are mapped to even smaller
changes in perceived sensation. In fact, local variations in the
input stimulus are mapped to variations in the perceived sensation
with the gain 1/I, that is:

I(x, y)\,\frac{1}{I_\Psi(x, y)} = R(x, y), \quad (x, y) \in \Psi   (2)

where I_Ψ(x, y) is the stimulus level in a small neighborhood Ψ in
the input image. By comparing Equations (1) and (2) we arrive at
the model for the perception gain:

L(x, y) = I_\Psi(x, y) \doteq I(x, y)   (3)
where the neighborhood stimulus level is by definition taken to be
the stimulus at point (x, y). As seen in Equation (4), we
regularize the problem by imposing a smoothness constraint on the
solution for L(x, y). The smoothness constraint takes care of
producing I_Ψ; therefore, the replacement by definition of I_Ψ by I
in Equation (3) is justified. We do not need to specify any
particular region Ψ. The solution for L(x, y) is found by
minimizing:

J(L) = \iint_\Omega (L - I)^2\,dx\,dy \;+\; \lambda \iint_\Omega \frac{1}{\rho(x, y)}\left(L_x^2 + L_y^2\right)dx\,dy   (4)
where the first term drives the solution to follow the perception
gain model, while the second term imposes a smoothness constraint.
Here Ω refers to the image domain. The parameter λ controls the
relative importance of the two terms. The space varying
permeability weight ρ(x, y) controls the anisotropic nature of the
smoothing constraint.
The Euler–Lagrange equation for this calculus of variations problem
yields:

L - \lambda \left[ \frac{\partial}{\partial x}\!\left(\frac{1}{\rho}\,\frac{\partial L}{\partial x}\right) + \frac{\partial}{\partial y}\!\left(\frac{1}{\rho}\,\frac{\partial L}{\partial y}\right) \right] = I   (5)

Discretized on a rectangular lattice, this linear partial
differential equation becomes:

L_{i,j} + \lambda \Big[ \tfrac{1}{h\rho_{i-\frac12,j}}(L_{i,j} - L_{i-1,j}) + \tfrac{1}{h\rho_{i+\frac12,j}}(L_{i,j} - L_{i+1,j}) + \tfrac{1}{h\rho_{i,j-\frac12}}(L_{i,j} - L_{i,j-1}) + \tfrac{1}{h\rho_{i,j+\frac12}}(L_{i,j} - L_{i,j+1}) \Big] = I_{i,j}   (6)
where h is the pixel grid size and the value of each ρ is taken in
the middle of the edge between the center pixel and each of the
corresponding neighbors (see Figure 1(b)). In this formulation, ρ
controls the anisotropic nature of the smoothing by modulating the
permeability between pixel neighbors. Equation (6) can be solved
numerically using multigrid methods for boundary value problems
[17]. Multigrid algorithms are fairly efficient, having complexity
O(N), where N is the number of pixels [17]. Running our
non-optimized code on a 2.4 GHz Pentium 4 produced execution times
of 0.17 seconds for a 320x240-pixel image, and 0.76 seconds for a
640x480-pixel image.
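Assuming the discretization of Equation (6) with h = 1, a plain Gauss–Seidel relaxation (much slower than the multigrid solver used in the paper, but compact) can be sketched as:

```python
import numpy as np

def solve_L(I, rho_h, rho_v, lam=1.0, sweeps=100):
    """Gauss-Seidel sweeps for the discretized equation.
    rho_h[r, c]: weight on the edge between pixels (r, c) and (r, c+1);
    rho_v[r, c]: weight on the edge between pixels (r, c) and (r+1, c)."""
    L = I.astype(float).copy()
    H, W = I.shape
    for _ in range(sweeps):
        for r in range(H):
            for c in range(W):
                # solve the (i, j) equation for L[r, c], neighbors fixed
                num, den = float(I[r, c]), 1.0
                if r > 0:
                    w = lam / rho_v[r - 1, c]; num += w * L[r - 1, c]; den += w
                if r < H - 1:
                    w = lam / rho_v[r, c];     num += w * L[r + 1, c]; den += w
                if c > 0:
                    w = lam / rho_h[r, c - 1]; num += w * L[r, c - 1]; den += w
                if c < W - 1:
                    w = lam / rho_h[r, c];     num += w * L[r, c + 1]; den += w
                L[r, c] = num / den
    return L
```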
The smoothness is penalized at every edge of the lattice by the
weights ρ (see Figure 1(b)). As stated earlier, the weight should
change proportionally with the strength of the discontinuities. We
need a relative measure of local contrast that will equally
"respect" boundaries in shadows and in bright regions. We call
again upon Weber's law and modulate the weight ρ_{(a+b)/2} between
two neighboring pixels, whose intensities are I_a and I_b, by the
Weber contrast between them.¹

¹ In our experiments equally good performance can be obtained by
using Michelson's contrast (I_a − I_b)/(I_a + I_b).
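One way to realize the contrast-modulated weights, under the illustrative assumption that ρ is taken proportional to the Weber contrast of each lattice edge (so that the 1/ρ coupling in Equation (6) shuts off smoothing across strong discontinuities):

```python
import numpy as np

def edge_weights(I, eps=1e-6):
    """Per-edge weights rho for the horizontal and vertical lattice
    edges; large rho at strong discontinuities, near-zero elsewhere."""
    I = I.astype(float)
    def weber(a, b):
        return np.abs(a - b) / (np.minimum(a, b) + eps)  # Weber contrast
    rho_h = weber(I[:, :-1], I[:, 1:]) + eps   # edges (r,c)-(r,c+1)
    rho_v = weber(I[:-1, :], I[1:, :]) + eps   # edges (r,c)-(r+1,c)
    return rho_h, rho_v
```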
Original PIE images
Processed PIE images
Fig. 2. Result of removing illumination variations with our
algorithm for a set of images from the PIE database
3 Face Recognition across Illumination
3.1 Databases and Algorithms
We use images from two publicly available databases in our
evaluation: CMU PIE database and Yale database. The CMU PIE
database contains a total of 41,368 images taken from 68
individuals [18]. The subjects were imaged in the CMU 3D Room using
a set of 13 synchronized high-quality color cameras and 21 flashes.
For our experiments we use images from the more challenging
illumination set which was captured without room lights (see Figure
2).
The Yale Face Database B [6] contains 5760 single light source
images of 10 subjects, each seen under 576 viewing conditions: 9
different poses and 64 illumination conditions. Figure 3 shows
examples of original and processed images. The database is divided
into different subsets according to the angle the light source
direction forms with the camera's axis (12°, 25°, 50°, and 77°).
We report recognition accuracies for two algorithms: Eigenfaces
(Principal Component Analysis (PCA)) and FaceIt, a commercial face
recognition system from Identix. Eigenfaces [19] is a standard
benchmark for face recognition algorithms [1]. FaceIt was the top
performer in the Facial Recognition Vendor Test 2000 [2]. For
comparison we also include results for Eigenfaces on histogram
equalized and gamma corrected images.
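The gallery/probe protocol used in the experiments below boils down to rank-1 nearest-neighbor matching; a minimal evaluation helper (all names assumed, features being, e.g., PCA coefficients):

```python
import numpy as np

def rank1_accuracy(gallery, g_labels, probe, p_labels):
    """Fraction of probe vectors whose nearest gallery vector
    (Euclidean distance) carries the correct identity label."""
    d = ((probe[:, None, :] - gallery[None, :, :]) ** 2).sum(axis=2)
    predicted = np.asarray(g_labels)[d.argmin(axis=1)]
    return float(np.mean(predicted == np.asarray(p_labels)))
```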
3.2 Experiments
The application of our algorithm to the images of the CMU PIE and
Yale databases results in accuracy improvements across all
conditions and all algorithms. Figure 4 shows the accuracies of
both PCA and FaceIt for all 13 poses of the PIE database. In each
pose separately the algorithms use one illumination condition as
gallery and all other illumination conditions as probe. The
reported
Original Yale images
Processed Yale images
Fig. 3. Example images from the Yale Face Database B before and
after processing with our algorithm
Original SI Histeq Gamma
(a) PCA (b) FaceIt
Fig. 4. Recognition accuracies on the PIE database. In each pose
separately the algorithms use one illumination condition as gallery
and all other illumination conditions as probe. Both PCA and FaceIt
achieve better recognition accuracies on the images processed with
our algorithm (SI) than on the original. The gallery poses are
sorted from right profile (22) to frontal (27) and left profile
(34)
results are averages over the probe illumination conditions in each
pose. The performance of PCA improves from 17.9% to 48.6% on
average across all poses. The performance of FaceIt improves from
41.2% to 55%. On histogram equalized and gamma corrected images
PCA achieves accuracies of 35.7% and 19.3%, respectively.
Figure 5 visualizes the recognition matrix for PCA on PIE for
frontal pose. Each cell of the matrix shows the recognition rate
for one specific gallery/probe illumination condition. It is
evident that PCA performs better in wide regions of
(a) Original (b) Histogram equalized (c) Our algorithm
Fig. 5. Visualization of PCA recognition rates on PIE for frontal
pose. Gallery illumination conditions are shown on the y-axis,
probe illumination conditions on the x-axis, both spanning
illumination conditions from the leftmost illumination source to
the rightmost illumination source
Subset 2 Subset 3 Subset 4
(a) PCA (b) FaceIt
Fig. 6. Recognition accuracies on the Yale database. Both
algorithms used images from Subset 1 as gallery and images from
Subset 2, 3 and 4 as probe. Using images processed by our algorithm
(SI) greatly improves accuracies for both PCA and FaceIt
the matrix for images processed with our algorithm. For comparison
the recognition matrix for histogram equalized images is shown as
well.
We see similar improvements in recognition accuracies on the Yale
database. In each case the algorithms used Subset 1 as gallery and
Subsets 2, 3 and 4 as probe. Figure 6 shows the accuracies for PCA
and FaceIt for Subsets 2, 3 and 4. For PCA the average accuracy
improves from 59.3% to 93.7%. The accuracy of FaceIt improves from
75.3% to 85.7%. On histogram equalized and gamma corrected images
PCA achieves accuracies of 71.7% and 59.7%, respectively.
4 Conclusion
We introduced a simple and automatic image-processing algorithm for
compensation of illumination-induced variations in images. The
algorithm computes an estimate of the illumination field and then
compensates for it. At a high level, the algorithm mimics some
aspects of human visual perception. If desired, the user may adjust
a single parameter whose meaning is intuitive and simple to
understand. The algorithm delivers large performance improvements
for standard face recognition algorithms across multiple face
databases.
Acknowledgements
The research described in this paper was supported in part by
National Science Foundation grants IIS-0082364 and IIS-0102272 and
by U.S. Office of Naval Research contract N00014-00-1-0915.
References
[1] Phillips, P., Moon, H., Rizvi, S., Rauss, P.: The FERET
evaluation methodology for face-recognition algorithms. IEEE PAMI
22 (2000) 1090–1104 10, 14
[2] Blackburn, D., Bone, M., Philips, P.: Facial recognition vendor
test 2000: evalu- ation report (2000) 10, 14
[3] Gross, R., Shi, J., Cohn, J.: Quo vadis face recognition? In:
Third Workshop on Empirical Evaluation Methods in Computer Vision.
(2001) 10
[4] Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs.
Fisherfaces: Recognition using class specific linear projection.
IEEE PAMI 19 (1997) 711–720 10
[5] Belhumeur, P., Kriegman, D.: What is the set of images of an
object under all possible lighting conditions. Int. J. of Computer
Vision 28 (1998) 245–260 10
[6] Georghiades, A., Kriegman, D., Belhumeur, P.: From few to many:
Generative models for recognition under variable pose and
illumination. IEEE PAMI (2001) 10, 14
[7] Riklin-Raviv, T., Shashua, A.: The Quotient image: class-based
re-rendering and recognition with varying illumination conditions.
In: IEEE PAMI. (2001) 10
[8] Georghiades, A., Kriegman, D., Belhumeur, P.: Illumination
cones for recognition under variable lighting: Faces. In: Proc.
IEEE Conf. on CVPR. (1998) 10
[9] Blanz, V., Romdhani, S., Vetter, T.: Face identification across
different poses and illumination with a 3D morphable model. In:
IEEE Conf. on Automatic Face and Gesture Recognition. (2002)
10
[10] Horn, B.: Robot Vision. MIT Press (1986) 11 [11] Stockam, T.:
Image processing in the context of a visual model. Proceedings
of
the IEEE 60 (1972) 828–842 11 [12] Land, E., McCann, J.: Lightness
and retinex theory. Journal of the Optical
Society of America 61 (1971) 11 [13] Jobson, D., Rahman, Z.,
Woodell, G.: A multiscale retinex for bridging the gap
between color imges and the human observation of scenes. IEEE
Trans. on Image Processing 6 (1997) 11
18 Ralph Gross and Vladimir Brajovic
[14] Tumblin, J., Turk, G.: LCIS: A boundary hierarchy for
detail-preserving contrast reduction. In: ACM SIGGRAPH. (1999)
11
[15] Perona, P., Malik, J.: Scale-space and edge detection using
anisotropic diffusion. IEEE PAMI 12 (1990) 629–639 11
[16] Wandel, B.: Foundations of Vision. Sunderland MA: Sinauer
(1995) 12 [17] Press, W., Teukolsky, S., Vetterling, W., Flannery,
B.: Numerical Recipes in C.
Cambridge University Press (1992) 13 [18] Sim, T., Baker, S., Bsat,
M.: The CMU Pose, Illumination, and Expression (PIE)
database. In: IEEE Int. Conf. on Automatic Face and Gesture
Recognition. (2002) 14
[19] Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of
Cognitive Neuro- science 3 (1991) 71–86 14
J. Kittler and M.S. Nixon (Eds.): AVBPA 2003, LNCS 2688, pp. 19-26,
2003. Springer-Verlag Berlin Heidelberg 2003
Quad Phase Minimum Average Correlation Energy Filters for Reduced Memory Illumination Tolerant Face Authentication

Marios Savvides and B.V.K. Vijaya Kumar

Electrical and Computer Engineering Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh PA 15217, USA
[email protected] [email protected]
Abstract. In this paper we propose reduced memory biometric filters for performing distortion tolerant face authentication. The focus of this research is on implementing authentication algorithms on small form factor devices with limited memory and computational resources. We compare the full complexity minimum average correlation energy filters for performing illumination tolerant face authentication with our proposed quad phase minimum average correlation energy filters [1] utilizing a Four-Level correlator. The proposed scheme requires only 2 bits/frequency in the frequency domain, achieving a compression ratio of up to 32:1 for each biometric filter while still attaining very good verification performance (100% in some cases). The results we show are based on the illumination subsets of the CMU PIE database [2] on 65 people with 21 facial images per person.
1 Introduction
Biometric authentication systems are actively being researched for access control, and a growing interest is emerging where these systems need to be integrated into small form factor devices such as credit cards, PDAs, cell phones and other devices with limited memory and computational resources, with memory being the most costly resource in such systems.
Traditional correlation filter based methods have not been favored in many areas of pattern recognition, mainly because the filters employed were matched filters [3], which meant that as many filters as training images were used. This led to a large amount of memory being needed to store these filters and, more importantly, one would have to perform cross-correlation with each of the training images (or matched filters) for each test image. Clearly this is very expensive computationally and requires huge memory resources.
20 Marios Savvides and B.V.K. Vijaya Kumar
Fig. 1. Correlation schematic block diagram. A single correlation filter is synthesized from many training images and stored directly in the frequency domain. FFTs are used to perform cross-correlation fast, and the correlation output is examined for sharp peaks
Recent work using advanced correlation filter designs has shown them to be successful for performing face authentication in the presence of facial expressions [4][5]. Advanced correlation filters [6] such as the minimum average correlation energy (MACE) filters [1] synthesize a single filter template from a set of training images and produce sharp distinct correlation peaks for the authentic class and no discernible peaks for impostor classes. MACE filters are well suited to applications where high discrimination is required. In authentication applications we are typically given only a small number of training images. These are used to synthesize a single MACE filter. This MACE filter will typically produce sharp distinct peaks only for the class it has been trained on, and will automatically reject any other classes without any a priori information about the impostor classes. Previous work applying these types of filters for eye detection can be found in [7].
1.1 Minimum Average Correlation Energy Filters
Minimum Average Correlation Energy (MACE) [1] filters are synthesized in closed form by optimizing a criterion function that seeks to minimize the average correlation energy resulting from cross-correlations with the given training images, while satisfying linear constraints that provide a specific peak value at the origin of the correlation plane for each training image. In doing so, the resulting correlation outputs from the training images resemble 2D delta-type outputs, i.e. sharp peaks at the origin with values close to zero elsewhere. The position of the detected peak also provides the location of the recognized object.
The MACE filter is given by the following closed form equation:
h = D^{-1} X (X^{+} D^{-1} X)^{-1} u    (1)
Assuming that we have N training images, X in Eq. (1) is an LxN matrix, where L is the total number of pixels of a single training image (L = d1 x d2). X contains the Fourier transforms of each of the N training images, lexicographically re-ordered and placed along each column. D is a diagonal matrix of dimension LxL containing the average power spectrum of the training images, lexicographically re-ordered and placed along its diagonal. u is a column vector with N elements, containing the corresponding desired peak values at the origin of the correlation plane of the training images. The MACE filter is formulated directly in the frequency domain for efficiency. Note that + denotes complex conjugate transpose. Also, h is a column vector that needs to be lexicographically re-ordered to form the 2-D MACE filter. In terms of memory requirements, each element of h is typically stored as two 32-bit floating point values (real and imaginary parts). For example, a 64x64 resolution image would need 32 x 2 x 64 x 64 bits ~ 32 KB for a single MACE filter array stored in the frequency domain, as shown in Fig. 1.
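As a concrete illustration, Eq. (1) can be evaluated without ever forming the LxL matrix D explicitly, since D is diagonal. The following NumPy sketch (our own illustration, not code from the paper; the training images are random placeholders) synthesizes a MACE filter and checks that the linear constraints X⁺h = u hold:

```python
import numpy as np

def mace_filter(images, u=None):
    """Synthesize a MACE filter (Eq. 1) from equal-size 2D training images.

    Returns the filter as an L-element complex vector in the frequency domain.
    """
    N = len(images)
    # X: L x N matrix of lexicographically reordered 2D Fourier transforms
    X = np.stack([np.fft.fft2(im).ravel() for im in images], axis=1)
    # D: average power spectrum of the training images (diagonal, kept as a vector)
    D = np.mean(np.abs(X) ** 2, axis=1)
    if u is None:
        u = np.ones(N, dtype=complex)   # desired correlation peaks at the origin
    Dinv_X = X / D[:, None]             # D^{-1} X without forming the L x L matrix
    # h = D^{-1} X (X^+ D^{-1} X)^{-1} u
    return Dinv_X @ np.linalg.solve(X.conj().T @ Dinv_X, u)

rng = np.random.default_rng(0)
train = [rng.random((16, 16)) for _ in range(3)]
h = mace_filter(train)
# The linear constraints X^+ h = u are satisfied: each training image's
# correlation value at the origin equals its entry in u.
X = np.stack([np.fft.fft2(im).ravel() for im in train], axis=1)
print(np.allclose(X.conj().T @ h, np.ones(3)))  # True
```

Because `D` is diagonal, `D⁻¹X` is an elementwise division, so only an N x N system (here 3 x 3) is ever solved.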
1.2 Peak-to-Sidelobe Ratio (PSR) Measure
The Peak-to-Sidelobe Ratio (PSR) is a metric used to test whether a test image belongs to the authentic class. First, the test image is cross-correlated with the synthesized MACE filter, then the resulting correlation output is searched for the peak correlation value. A rectangular region (we use 20x20 pixels) centered at the peak is extracted and used to compute the PSR as follows. A 5x5 rectangular region centered at the peak is masked out, and the remaining annular region, defined as the sidelobe region, is used to compute the mean and standard deviation of the sidelobes. The peak-to-sidelobe ratio is given as follows:
PSR = (peak - mean) / σ    (2)
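A sketch of this computation (our own illustration; it assumes the peak lies far enough from the border that the 20x20 window fits):

```python
import numpy as np

def psr(corr_out, window=20, mask=5):
    """Peak-to-Sidelobe Ratio (Eq. 2) of a 2D correlation output."""
    # Locate the correlation peak
    pr, pc = np.unravel_index(np.argmax(corr_out), corr_out.shape)
    peak = corr_out[pr, pc]
    h, m = window // 2, mask // 2
    # window x window region centered at the peak
    region = corr_out[pr - h:pr + h, pc - h:pc + h].astype(float)
    # Mask out the central mask x mask region; the rest is the sidelobe
    region[h - m:h + m + 1, h - m:h + m + 1] = np.nan
    sidelobe = region[~np.isnan(region)]
    return (peak - sidelobe.mean()) / sidelobe.std()

# A sharp, isolated peak over low-level noise yields a large PSR
rng = np.random.default_rng(0)
out = 0.01 * rng.standard_normal((64, 64))
out[32, 32] = 1.0
print(psr(out))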
The peak-to-sidelobe ratio measures the peak sharpness in a correlation output, which is exactly what the MACE filter tries to optimize; hence the larger the PSR, the more likely the test image belongs to the authentic class. It is also important to realize that the authentication decision is not based on a single projection but on many projections, which should produce a specific response in order to belong to the authentic class: the peak value should be large, and the neighboring correlation values, which correspond to projections of the MACE point spread function with shifted versions of the test image, should yield values close to zero. Another important property of the PSR metric is that it is invariant to any uniform scale change in illumination. This can easily be verified from Eq. (2), as multiplying the test image by any constant scale factor can be factored out of the peak, mean and standard deviation terms and cancels out.
Fig. 2. Peak-to-sidelobe ratio computation uses a 20x20 region of the correlation output centered at the peak
2 Quad Phase MACE Filters – Reduced Memory Representation
It is well known that in the Fourier domain, phase information is more important than magnitude for image reconstruction [8][9]. Since phase contains most of the intelligibility of an image, and can be used to retrieve the magnitude information, we propose to reduce the memory storage requirement of MACE filters by preserving and quantizing the phase of the filter to 4 levels. The resulting filter will be named the Quad-Phase MACE filter, where each element in the filter array takes on ±1 for the real component and ±j for the imaginary component in the following manner.
Re{H_QP(u,v)} = +1 if Re{H(u,v)} >= 0, -1 if Re{H(u,v)} < 0
Im{H_QP(u,v)} = +1 if Im{H(u,v)} >= 0, -1 if Im{H(u,v)} < 0    (3)
Essentially 2 bits per frequency are needed to encode the 4 phase levels, namely π/4, 3π/4, 5π/4, 7π/4. Details on partial information filters can be found in [10].
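A sketch of Eq. (3) and the resulting 2-bit encoding (our own illustration; `np.signbit` extracts exactly the sign bits mentioned in Sec. 2.1):

```python
import numpy as np

def quad_phase(H):
    """Quantize a complex frequency-domain array to +/-1 +/-j (Eq. 3)."""
    re = np.where(np.real(H) >= 0, 1.0, -1.0)
    im = np.where(np.imag(H) >= 0, 1.0, -1.0)
    return re + 1j * im

def pack_2bit(H):
    """Store only the two sign bits per frequency (the 2 bits/frequency code)."""
    bits = np.stack([np.signbit(np.real(H)), np.signbit(np.imag(H))])
    return np.packbits(bits.reshape(-1))

H = np.array([[1 + 2j, -3 + 0.5j],
              [0.25 - 1j, -2 - 2j]])
print(quad_phase(H))          # each element becomes one of +/-1 +/-j
print(pack_2bit(H).nbytes)    # 2x2 filter -> 8 sign bits -> 1 byte
```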
2.1 Four-Level Correlator – Reduced Complexity Correlation
The QP-MACE filter described in Eq. (3) has unit magnitude at all frequencies, encoding 4 phase levels. In order to produce sharp correlation outputs (that resemble delta-type outputs), the phases should cancel out when the QP-MACE filter is multiplied with the conjugate of the Fourier transform of the test image, providing a large peak. The only way the phases will cancel out is if the Fourier transform of the test image is phase quantized in the same way, producing a large peak at the origin. Therefore, in the described architecture we also propose to quantize the Fourier transforms of the test images as in Eq. (3).
Fig. 3. Correlation Outputs: (left) Full Phase MACE filter (Peak = 1.00, PSR = 66); (right) Quad Phase MACE Filter using Four-Level Correlator (Peak = 0.97, PSR = 48)
Fig. 4. Sample images of Person 2 from the Illumination subset of
PIE database captured with no background lighting
This effectively results in using a Four-Level correlator in the frequency domain, where multiplication involves only performing sign changes, thus partly reducing the computational complexity of the correlation block in the authentication process. Obtaining the quad-phase MACE filters and quad-phase Fourier transform arrays is achieved very simply: we do not need to implement the if…then branches shown in Eq. (3); we need only extract the sign bit from each element in the array.
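The verification path can then be sketched end to end (our own illustration, with function names of our choosing; for an authentic image, the sign-quantized phases cancel and the correlation plane shows a sharp peak at the origin):

```python
import numpy as np

def sign_quantize(H):
    # Eq. (3): keep only the signs of the real and imaginary parts
    return (np.where(np.real(H) >= 0, 1.0, -1.0)
            + 1j * np.where(np.imag(H) >= 0, 1.0, -1.0))

def four_level_correlate(qp_filter, test_image):
    """Four-Level correlator: both factors are sign-quantized, so the
    frequency-domain product involves only sign changes."""
    T = sign_quantize(np.fft.fft2(test_image))
    return np.real(np.fft.ifft2(qp_filter * np.conj(T)))

rng = np.random.default_rng(1)
img = rng.random((32, 32))
qp = sign_quantize(np.fft.fft2(img))     # stand-in for a stored QP-MACE filter
out = four_level_correlate(qp, img)
# Matching phases cancel: the correlation output peaks at the origin
print(np.unravel_index(np.argmax(out), out.shape))  # (0, 0)
```

When the filter and test phases match exactly, each frequency contributes |±1 ± j|² = 2, so the output is a delta of height 2 at the origin.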
3 Experiments Using CMU PIE Database
For applications such as face authentication, we can assume that the user will be cooperative and that he/she will be willing to provide a suitable face pose in order to be verified. However, illumination conditions cannot be controlled, especially for outdoor authentication. Therefore our focus in this paper is to provide robust face authentication in the presence of illumination changes. To test our proposed method, we used the illumination subset of the CMU PIE database, containing 65 people each with 21 images captured under varying illumination conditions. There are two sessions of this dataset: one captured with background lights on (the easier dataset), and another captured with no background lights (the harder dataset). The face images were extracted and normalized for scale using selected ground truth feature points provided with the database. The resulting face images used in our experiments were of size 100x100 pixels.
We selected 3 training images from each person to build their filter. The images selected were those of extreme lighting variation, namely images 3, 7 and 16 shown in Fig. 4. The same image numbers were selected for every one of the 65 people, and a single MACE filter was synthesized for each person from those images using Eq. (1); similarly, a reduced memory Quad Phase MACE filter was also synthesized using Eq. (3). For each person's filter, we performed cross-correlation with the whole dataset (65*21 = 1365 images) to examine the resulting PSRs for images from that person and all the other impostor faces. This was repeated for all people (a total of 88,725 cross-correlations), for each of the two illumination datasets (with and without background lighting). We have observed in these results that there is
a clear margin of separation between the authentic class and all other impostors (shown as the bottom line plot depicting the maximum impostor PSR among all impostors) for all 65 people, yielding 100% verification performance for both the full-complexity MACE filters and the reduced-memory Quad-Phase MACE filters. Figure 5 shows a PSR comparison plot for both types of filters for Person 2 on the dataset that was captured with background lights on (note that this plot is representative of the comparison plots of the other people in the database). Since this is the easier illumination dataset, it is reasonable that the authentic PSRs are very high in comparison to Figure 6, which shows the comparison plot for the harder dataset captured with no lights on. The 3 distinct peaks shown are those that belong to the 3 training images (3, 7, 16) used to synthesize Person 2's filter. We observe that while there is a degradation in PSR performance using QP-MACE filters, this degradation is non-linear: the PSR degrades more for the very large PSR values of the full-complexity MACE filters (while still providing a large margin of separation from the impostor PSRs), but much less for the low PSR values of the full-complexity MACE filters. We see that for impostor PSRs in the 10 PSR range and below, QP-MACE achieves very similar performance to the full-phase MACE filters.
Another very important observation, consistent throughout all 65 people, is that the impostor PSRs are consistently below some threshold (e.g. 12 PSR). This observed upper bound holds irrespective of illumination or facial expression change, as reported in [4]. This property makes MACE type correlation filters ideal for verification, as we can select a fixed global threshold above which the user gets authorized, irrespective of what type of distortion occurs, and even irrespective of the person to be authorized. In contrast, this property does not hold in other approaches such as traditional Eigenface or IPCA methods, whose residue or distance to face space is highly dependent on any illumination changes.
Fig. 5. PSR plot for Person 2 comparing the performance of
full-complexity MACE filters and the reduced-complexity Quad Phase
MACE filter using the Four-Level Correlator on the easier
illumination dataset that was captured with background lights
on
Fig. 6. PSR plot for Person 2 comparing the performance of
full-complexity MACE filters and the reduced-complexity Quad Phase
MACE filter using the Four-Level Correlator on the harder
illumination dataset that was captured with background lights
off
Fig. 7. (left) PSF of Full-Phase MACE filter (right) PSF of QP-MACE
filter
Examining the point spread functions (PSF) of the full-phase MACE filter and the QP-MACE filter shows that they are very similar, as shown in Fig. 7 for Person 2. Since the magnitude response of the QP-MACE filter is unity at all frequencies, it effectively acts as an all-pass filter. This explains why we are able to see more salient features (lower spatial frequency features) of the face, while in contrast the full-complexity MACE filter emphasizes higher spatial frequencies; hence we are able to see only edge outlines of the mouth, nose, eyes and eyebrows. MACE filters work as well as they do in the presence of illumination variations because they emphasize higher spatial frequency features such as the outlines of the nose, eyes and mouth, their size, and the relative geometrical structure between these features on the face. The majority of illumination variations affect the lower spatial frequency content of images, and these frequencies are attenuated by the MACE filters, hence the output is unaffected. Shadows, for example, will introduce new features that have higher spatial frequency content; however, MACE filters look at the whole image and do not focus on any single feature, thus these types of filters provide a graceful degradation in performance as more distortions occur.
4 Conclusions
We have shown that our proposed Quad Phase MACE (QP-MACE) filters perform comparably to the full-complexity MACE filters, achieving 100% verification rates on both illumination datasets of the CMU PIE database using only 3 training images. These Quad-Phase MACE filters occupy only 2 bits per frequency (essentially 1 bit each for the real and imaginary components). Assuming that the full-phase MACE filter uses a 32-bit data type for each component, occupying 64 bits per frequency for complex data, the proposed Quad-Phase MACE filters require only 2 bits per frequency, achieving a compression ratio of up to 32:1. A 64x64 pixel biometric filter will require only 1 kilobyte of memory for storage, making this scheme ideal for implementation on limited memory devices.
This research is supported in part by SONY Corporation.
References
[1] A. Mahalanobis, B.V.K. Vijaya Kumar, and D. Casasent: Minimum average correlation energy filters. Appl. Opt. 26, pp. 3633-3640, 1987.
[2] T. Sim, S. Baker, and M. Bsat: The CMU Pose, Illumination, and Expression (PIE) Database of Human Faces. Tech. Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, January 2001.
[3] A. VanderLugt: Signal detection by complex spatial filtering. IEEE Trans. Inf. Theory 10, pp. 139-145, 1964.
[4] M. Savvides, B.V.K. Vijaya Kumar, and P. Khosla: Face verification using correlation filters. Proc. of Third IEEE Automatic Identification Advanced Technologies, Tarrytown, NY, pp. 56-61, 2002.
[5] B.V.K. Vijaya Kumar, M. Savvides, K. Venkataramani, and C. Xie: Spatial frequency domain image processing for biometric recognition. Proc. of Intl. Conf. on Image Processing (ICIP), Rochester, NY, 2002.
[6] B.V.K. Vijaya Kumar: Tutorial survey of composite filter designs for optical correlators. Applied Optics 31, 1992.
[7] R. Brunelli and T. Poggio: Template Matching: Matched Spatial Filters and Beyond. Pattern Recognition, Vol. 30, No. 5, pp. 751-768, 1997.
[8] S. Unnikrishna Pillai and Brig Elliott: Image Reconstruction from One Bit of Phase Information. Journal of Visual Communication and Image Representation, Vol. 1, No. 2, pp. 153-157, 1990.
[9] A. V. Oppenheim and J. S. Lim: The importance of phase in signals. Proc. IEEE 69, pp. 529-541, 1981.
[10] B.V.K. Vijaya Kumar: A Tutorial Review of Partial-Information Filter Designs for Optical Correlators. Asia-Pacific Engineering Journal (A), Vol. 2, No. 2, pp. 203-215, 1992.
Component-Based Face Recognition
Jennifer Huang1, Bernd Heisele1,2, and Volker Blanz3
1 Center for Biological and Computational Learning, M.I.T.,
Cambridge, MA, USA
[email protected]
2 Honda Research Institute US, Boston, MA, USA
[email protected]
3 Computer Graphics Group, Max-Planck-Institut, Saarbrücken, Germany
[email protected]
Abstract. We present a novel approach to pose and illumination invariant face recognition that combines two recent advances in the computer vision field: component-based recognition and 3D morphable models. First, a 3D morphable model is used to generate 3D face models from three input images of each person in the training database. The 3D models are rendered under varying pose and illumination conditions to build a large set of synthetic images. These images are then used to train a component-based face recognition system. The resulting system achieved 90% accuracy on a database of 1200 real images of six people and significantly outperformed a comparable global face recognition system. The results show the potential of combining morphable models and component-based recognition for pose and illumination invariant face recognition based on only three training images of each subject.
1 Introduction
The need for a robust, accurate, and easily trainable face recognition system becomes more pressing as real world applications such as biometrics, law enforcement, and surveillance continue to develop. However, extrinsic imaging parameters such as pose, illumination and facial expression still cause much difficulty in accurate recognition. Recently, component-based approaches have shown promising results in various object detection and recognition tasks such as face detection [7, 4], person detection [5], and face recognition [2, 8, 6, 3].
In [3], we proposed a Support Vector Machine (SVM) based recognition system which decomposes the face into a set of components that are interconnected by a flexible geometrical model. Changes in head pose mainly lead to changes in the positions of the facial components, which can be accounted for by the flexibility of the geometrical model. In our experiments, the component-based system consistently outperformed global face recognition systems in which classification was based on the whole face pattern. A major drawback of the system was the need for a large number of training images taken from different
viewpoints and under different lighting conditions. These images are often unavailable in real-world applications.

J. Kittler and M.S. Nixon (Eds.): AVBPA 2003, LNCS 2688, pp. 27–34, 2003. © Springer-Verlag Berlin Heidelberg 2003
In this paper, the system is further developed through the addition of a 3D morphable face model to the training stage of the classifier. Based on only three images of a person's face, the morphable model allows the computation of a 3D face model using an analysis by synthesis method [1]. Once the 3D face models of all the subjects in the training database are computed, we generate arbitrary synthetic face images under varying pose and illumination to train the component-based recognition system.
The outline of the paper is as follows: Section 2 briefly explains the generation of 3D head models. Section 3 describes the component-based face detector trained from the synthetic images. Section 4 describes the component-based face recognizer, which was trained from the output of the component-based face detection unit. Section 5 presents the experiments on component-based and global face recognition. Finally, Section 6 summarizes results and ou