

  • Lehrstuhl für Mensch-Maschine-Kommunikation, Technische Universität München

    A System for Automatic Face Analysis Based on Statistical Shape and Texture Models

    Ronald Müller

    Complete reprint of the dissertation approved by the Faculty of Electrical Engineering and Information Technology of the Technische Universität München for the award of the academic degree of Doktor-Ingenieur.

    Chair: Prof. Dr. rer. nat. Bernhard Wolf

    Examiners of the dissertation:

    1. Prof. Dr.-Ing. habil. Gerhard Rigoll

    2. Prof. Dr.-Ing. habil. Alexander W. Koch

    The dissertation was submitted to the Technische Universität München on 28 February 2008 and accepted by the Faculty of Electrical Engineering and Information Technology on 18 September 2008.

  • A System for Automatic Face Analysis Based on Statistical Shape and Texture Models

    Dissertation

    Ronald Müller

    Technische Universität München

    [email protected]

    January 28th 2008

  • Abstract

    This dissertation gives an overview of, and insight into, the structure and the scientific algorithms of a system designed for the automatic analysis of human faces. Here, face analysis means extracting as much abstract information as possible from a face. The applied statistical shape and texture models are based on the idea of Active Appearance Models (AAMs). An Appearance Model for face analysis describes the variations in shape and texture of human faces. It is derived from a careful selection of photographs showing different persons with different facial expressions and head poses in various lighting conditions, depending on the specific focus of the analysis. During the analysis of a human face within a video or a picture, the Appearance Model is used to re-synthesize this face as closely as possible. Apart from an introduction to AAMs with a unified mathematical notation, this document describes various optimizations and modifications of several steps of the basic algorithm.

    While the recognition and interpretation of faces is comparatively effortless for the human visual cortex, the same task requires computer vision approaches of very high computational complexity. Thus, this thesis addresses not only the challenge of highly accurate face analysis, but also the difficulties of building an integrated, fully automatic software system that offers high computational efficiency together with techniques for the extensive exploitation of modern standard hardware.

    The evaluations compare the developed algorithms with respect to the quality of the re-synthesized face, computational complexity, and pattern recognition tasks such as determining the gender, age, head pose, and facial expression of a person.

    Acknowledgments Apart from the official bodies of the Technische Universität München, especially Professor Dr.-Ing. habil. Gerhard Rigoll, my special thanks go to all the students who conducted their Master, Diploma, and Bachelor Theses, their Interdisciplinary Projects, and Seminar Presentations with me. I very much enjoyed the collaboration with them, and I am happy to have built a fruitful network of young students and professionals. This work would not have been so successful without the indescribable diligence, efforts, and skills of Ralf Nikolaus and Michael Geisinger. They deserve the greatest thanks and respect for their contribution. Finally, I thank Karin Hammerschmid in devotion for her consideration, relief, and support.

    Trademarks Trademarks appear throughout this document without any trademark symbol; they are the property of their respective trademark owners. There is no intention of infringement; the usage is to the benefit of the trademark owner.

  • Contents

    Abstract

    Contents

    1 Introduction
        1.1 The Challenge
        1.2 The Background
        1.3 FEASy - a FacE Analysis System
        1.4 The Thesis in a Nutshell

    2 A Multi-Threading Framework for Signal Processing Systems
        2.1 Conditions for the development of Signal Processing Systems
        2.2 Other works
        2.3 Requirements of a Software Framework for High-Performance Signal Processing Systems
        2.4 Concepts of MMER Lab
            2.4.1 Software Architecture
            2.4.2 Design Decisions
        2.5 Application Examples and Evaluation
        2.6 Conclusion

    3 Object Localization with AdaBoost Variants on Haar- and Gabor-Wavelet Features
        3.1 Haar-like and Gabor-Wavelet features
            3.1.1 Haar-like features
            3.1.2 Gabor-Wavelets
        3.2 Feature Selection and Classification with AdaBoost
            3.2.1 Notation
            3.2.2 The Standard AdaBoost Algorithm
            3.2.3 Gentle AdaBoost
            3.2.4 Weak classifiers
            3.2.5 Cascaded AdaBoost Classification
        3.3 Evaluation of Localization Performance
            3.3.1 Databases
            3.3.2 Head and Eye Localization Results
            3.3.3 Feature selection
            3.3.4 Localization Performance
        3.4 Conclusion

    4 The Theory of Active Appearance Models
        4.1 Preparation of Training Data
            4.1.1 Alignment and Normalization of Landmarks
            4.1.2 Warping
            4.1.3 Normalization of Textures
        4.2 Generation of an Appearance Model
            4.2.1 Shape Model
            4.2.2 Texture Model
            4.2.3 Combined Model
        4.3 Coefficient Optimization
            4.3.1 Objective Function
            4.3.2 Offline Prediction
            4.3.3 Numerical Estimation of the Jacobian Matrix
            4.3.4 Iterative Optimization

    5 Derivatives and Advancements of Active Appearance Models
        5.1 A Survey on Active Appearance Models and Variants
        5.2 Appearance Models based on NMF
            5.2.1 Data Modeling with Non-Negative Matrix Factorization
            5.2.2 Generation of Appearance Models with NMF
            5.2.3 Conclusion
        5.3 Online Optimization of AAM Coefficients
            5.3.1 Gradient Descent
            5.3.2 Grid Sampling
            5.3.3 Nelder-Mead or Simplex Optimization
            5.3.4 Conclusion
        5.4 GPU-Accelerated Active Appearance Models
            5.4.1 Warping
            5.4.2 Coefficient Optimization
            5.4.3 Conclusion
        5.5 Evaluation Measures for the Quality of AAM Re-synthesis
            5.5.1 Dataset Annotation
            5.5.2 Quality Measures
            5.5.3 Evaluation of Quality Measures
        5.6 Summary

    6 Application of Active Appearance Models to Face Analysis
        6.1 Classification Based on Results of the AAM Optimization
            6.1.1 Classification based on class specific AAMs
            6.1.2 Statistical classification based on AAM coefficients
            6.1.3 Support Vector Machines
            6.1.4 N-fold Cross-Validation
        6.2 Image Databases
            6.2.1 The AR Database
            6.2.2 The NIFace1 Database
            6.2.3 The FG-NET Aging Database
            6.2.4 The MMI Face Database
        6.3 Gender Recognition
            6.3.1 Dataset
            6.3.2 Results
            6.3.3 State-of-the-Art
        6.4 Facial Expression Recognition
            6.4.1 Dataset
            6.4.2 Results
            6.4.3 State-of-the-Art