Problem Statement
The goal is to use a model-based approach for facial emotion recognition of a driver in a real-time environment.
The system should work on an embedded platform.
The program is developed to exploit a pipelined architecture and parallel processing.
Why a Model-Based Approach?
Illumination and pose variations are major concerns in facial emotion recognition; a model-based approach can overcome them.
State of the Art
Robert Niese, Ayoub Al-Hamadi, Axel Panning and Bernd Michaelis, "Emotion Recognition based on 2D-3D Facial Feature Extraction from Color Image Sequences"
Narendra Patel and Mukesh Zaveri, "3D Facial Model Construction and Expression Synthesis using a Single Frontal Face Image"
Aitor Azcarate, Felix Hageloh, Koen van de Sande and Robert Valenti, "Automatic facial emotion recognition"
Tie Yun and Ling Guan, "Human Emotion Recognition Using Real 3D Visual Features from Gabor Library"
Challenges
Faces are non-rigid and vary widely in location, colour and pose; these variations make expression-based emotion recognition complex.
Occlusion, lighting distortions and changing illumination conditions can alter the overall appearance of the face, complicating emotion classification.
Spontaneous (non-posed) emotion recognition.
Background complexity: when there is more than one face in the image, the system should be able to distinguish which one is being tracked.
Emotion Recognition based on 2D-3D Facial Feature Extraction from Color Image Sequences
Facial Feature Points in 2D
1. Detect the face
2. Define fiducial points
3. Detect eyes and mouth
The complete set of feature point areas is shown in the figure.
[Figure: Viola-Jones classifier cascade framework. Training phase: training set (sub-windows) → integral representation → feature computation → AdaBoost feature selection → cascade trainer. Testing phase: strong classifiers 1…N (cascade stages 1…N); a window that passes every stage is identified as a face.]
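The integral representation used in this pipeline can be sketched compactly: once the integral image is built, the sum over any rectangular region (and hence any Haar-like feature response) costs only four lookups. The function names below are illustrative, not from the original system.

```python
import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x] (one-pixel zero border)."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] using four integral-image lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # equals img[1:3, 1:3].sum()
```

A Haar-like feature is then just the difference of two or three such rectangle sums, which is why feature computation stays cheap at every scale.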
Slide courtesy: Kostantina Palla, University of Edinburgh
Camera Model
A pinhole camera model is used.
Given the camera parameters, the transformation from 3D world points to image points is well defined.
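As a concrete sketch of this pinhole projection (the intrinsic values below are made up for illustration, not the system's calibration):

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.,   0., 320.],
              [  0., 800., 240.],
              [  0.,   0.,   1.]])

def project(points_3d, K, R=np.eye(3), t=np.zeros(3)):
    """Pinhole model: move world points into the camera frame,
    apply the intrinsics, then divide by depth."""
    cam = points_3d @ R.T + t        # world -> camera coordinates
    uvw = cam @ K.T                  # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division

pts = np.array([[0., 0., 2.], [0.1, -0.1, 2.]])  # points 2 m in front
print(project(pts, K))  # the on-axis point maps to the principal point
```

The on-axis point lands exactly on the principal point (320, 240), which is a quick sanity check for any calibration.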
Geometric 3D Model
In an initial registration step, the subject is captured once in frontal pose with a neutral expression.
The face is localized in the stereo point cloud using the observation that "surfaces are represented by more or less connected point clusters."
A similarity criterion h for clustering combines colour and the Euclidean distance of points.
Surface reconstruction is then performed on the face cluster.
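A minimal sketch of such a combined similarity criterion. The weights and scale constants below are illustrative assumptions, not the authors' values:

```python
import numpy as np

def similarity_h(p_xyz, p_rgb, q_xyz, q_rgb,
                 sigma_pos=0.05, sigma_col=30.0, w_pos=0.5, w_col=0.5):
    """Hypothetical clustering criterion h: a weighted sum of a spatial
    term (Euclidean distance, metres) and a colour term (RGB distance).
    Returns a value in (0, 1]; nearby, similarly coloured points score high,
    so clusters grow across points whose h exceeds some threshold."""
    d_pos = np.linalg.norm(np.asarray(p_xyz, float) - np.asarray(q_xyz, float))
    d_col = np.linalg.norm(np.asarray(p_rgb, float) - np.asarray(q_rgb, float))
    return w_pos * np.exp(-d_pos / sigma_pos) + w_col * np.exp(-d_col / sigma_col)

same = similarity_h([0, 0, 1], [200, 150, 120], [0, 0, 1], [200, 150, 120])
far = similarity_h([0, 0, 1], [200, 150, 120], [1, 0, 1], [10, 10, 10])
```

Identical points score exactly 1; a point a metre away with a very different colour scores near 0, which is the behaviour region growing needs.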
Estimation of Face Pose
Correspondence between the model and the real world is established using fiducial points.
According to the camera model, the image projection of each anchor point is determined.
The goal of pose estimation is to minimize the error between the projected 3D anchor points and the 2D fiducial points.
After the pose is determined, the image feature points are projected onto the surface model at its current pose.
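The pose objective can be sketched as a reprojection error; in practice (R, t) would be found by nonlinear least squares (e.g. Levenberg-Marquardt) or a PnP solver, but the quantity being minimized is simply (intrinsics below are made-up values):

```python
import numpy as np

def reprojection_error(anchors_3d, fiducials_2d, K, R, t):
    """Mean image-plane distance between the projected 3D anchor points
    and the detected 2D fiducial points; pose estimation searches over
    (R, t) to drive this toward zero."""
    cam = anchors_3d @ R.T + t
    uvw = cam @ K.T
    proj = uvw[:, :2] / uvw[:, 2:3]
    return np.linalg.norm(proj - fiducials_2d, axis=1).mean()

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
anchors = np.array([[0., 0., 0.], [0.1, 0., 0.]])
fiducials = np.array([[320., 240.], [360., 240.]])
e = reprojection_error(anchors, fiducials, K, np.eye(3), np.array([0., 0., 2.]))
```

At the true pose the error vanishes; perturbing the translation immediately raises it, which is what the optimizer exploits.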
Feature Vector
The feature vector consists of angles and distances between a series of facial feature points in 3D.
Feature vectors are normalized to increase classification robustness.
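A sketch of such a feature vector; the specific point pairs and triples below are placeholders, since the paper's actual point set is not listed here:

```python
import numpy as np

def feature_vector(pts, pairs, triples):
    """Distances for the listed point pairs plus angles (at the middle
    index) for the listed point triples, z-score normalized for
    classification robustness."""
    dists = [np.linalg.norm(pts[i] - pts[j]) for i, j in pairs]
    angles = []
    for i, j, k in triples:
        u, v = pts[i] - pts[j], pts[k] - pts[j]
        c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.arccos(np.clip(c, -1.0, 1.0)))
    f = np.array(dists + angles)
    return (f - f.mean()) / (f.std() + 1e-9)

pts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])  # toy 3D points
fv = feature_vector(pts, pairs=[(0, 1), (0, 2), (1, 2)], triples=[(1, 0, 2)])
```

Normalizing to zero mean and unit variance keeps the mix of distance units and radians from letting one measurement dominate the classifier.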
Classification
Classification is done using an artificial neural network (ANN).
Cons
Misclassification at transitions between facial expressions due to indistinct features.
Performance is not optimised.
The stereo-based initialization step can be inconvenient and requires calibrated cameras.
Thank You
http://en.wikipedia.org/wiki/Perspective_%28graphical%29
Camera calibration and the use of subject-specific surface model data reduce perspective foreshortening, as in the case of out-of-plane rotations.
Photogrammetry is the science, technology and art of obtaining reliable information about the Earth from non-contact imaging and other sensor systems.
Different Face Detection Techniques
Two groups: holistic, where the face is treated as a whole unit, and analytic, where the co-occurrence of characteristic facial elements is studied.
Holistic face models: • Huang and Huang [7] used a Point Distribution Model (PDM), which represents the mean geometry of the human face. First, a Canny edge detector is applied to find two symmetrical vertical edges, which estimate the face position, and then the PDM is fitted. • Pantic and Rothkrantz [8] proposed a system which processes images of the frontal and profile face views. Vertical and horizontal histogram analysis is used to find face boundaries. Then, the face contour is obtained by thresholding the image with HSV color space values.
Analytic face models: • Kobayashi and Hara [9] used images captured in monochrome mode to find the face brightness distribution. The position of the face is estimated by iris localization. • Kimura and Yachida's [10] technique processes the input image with an integral projection algorithm to find the positions of the eye and mouth corners using color and edge information. The face is represented with a Potential Net model, which is fitted to the positions of the eyes and mouth.
All of the above-mentioned systems were designed to process facial images; however, they are not able to detect whether a face is present in the image at all. Systems which handle arbitrary images are listed below:
• Essa and Pentland [11] created a "face space" by performing Principal Component Analysis on 128 face images to obtain eigenfaces. A face is detected in the image if its distance from the face space is acceptable.
• Rowley et al. [12] proposed neural-network-based face detection. The input image is scanned with a window, and a neural network decides whether a particular window contains a face or not.
• Viola and Jones [13] introduced a very efficient algorithm for object detection that uses Haar-like features as the object representation and AdaBoost as the machine learning method. This algorithm is widely used in face detection.
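As a toy illustration of the AdaBoost component, boosting one-feature threshold stumps over feature responses. This is generic AdaBoost, not Viola-Jones' exact attentional-cascade training:

```python
import numpy as np

def train_adaboost(X, y, rounds=5):
    """Toy AdaBoost with threshold stumps (the weak learners Viola-Jones
    boosts over Haar-like feature responses).
    X: (n, d) feature responses; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # example weights, updated each round
    stumps = []
    for _ in range(rounds):
        best = None
        for f in range(d):               # pick the lowest weighted-error stump
            for thr in np.unique(X[:, f]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, f] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign)
        err, f, thr, sign = best
        err = np.clip(err, 1e-9, 1 - 1e-9)
        alpha = 0.5 * np.log((1 - err) / err)   # stump weight
        pred = sign * np.where(X[:, f] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # upweight mistakes
        w /= w.sum()
        stumps.append((alpha, f, thr, sign))
    return stumps

def adaboost_predict(stumps, X):
    score = sum(a * s * np.where(X[:, f] > t, 1, -1) for a, f, t, s in stumps)
    return np.where(score >= 0, 1, -1)
```

Viola-Jones additionally arranges the boosted classifiers into a cascade so that most non-face windows are rejected by the first few cheap stages.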
Three-Dimensional Techniques Three-dimensional models inherently provide more information than 2D models due to the presence of depth information, and are more robust than 2D models. However, many 3D model extraction solutions suffer from expensive computational complexity, or use over-simplified models that do not accurately represent the object.
The acquisition of 3D data can also produce image artifacts that may affect the rendered model [25]. The camera can receive light at intensities that saturate the detector, or light levels too low to produce high-quality images; this can occur in areas of specular reflection in stereo systems. Stereo-based systems also have trouble obtaining truly dense sampling of the face surface, with sparse sampling points in regions that lack natural texture (too smooth), leading to the exclusion of certain features. Multimodal analysis with 3D and 2D data may provide better data for classification (of face recognition) than single modalities, but compared to multiple 2D images (without 3D rendering) it does not show significant improvement, leading to a possible optimization problem in determining the best ways to use the acquired data [25].
A process by Chaumont et al. [26] breaks this problem into two steps: first an estimation of the 3D model, followed by model refinement. In the estimation step, a CANDIDE wireframe model (a 3D wireframe of an average face) is projected from the 3D space onto the 2D space under the assumption that all feature points are coplanar. This approximation is realistic because the differences in depth between features are very small compared to the distance to the camera. Making this assumption reduces the task to projecting a 2D image onto a 2D plane, which is a much easier problem to solve. Also, since few 2D-3D correspondence points are available, the matrix is very sparse and can be solved very quickly. After this approximation is determined, the wireframe is refined by perturbing the 3D points separately to match the 2D points. This is a fast method for face tracking and 3D face model extraction; because 3D information about the object is known, it can predict feature positions under rotations and translations and recover the model in the presence of occlusion.
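Under the coplanarity assumption, the 2D-3D correspondence reduces to fitting a plane-to-image homography. A standard Direct Linear Transform sketch of that sub-problem (not Chaumont et al.'s actual solver):

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: find H (up to scale) with dst ~ H @ src in
    homogeneous coordinates, given >= 4 point correspondences. The stacked
    linear system A h = 0 is sparse and is solved via SVD (null vector)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A))
    return Vt[-1].reshape(3, 3)

# Unit square mapped by a scale of 2 plus a translation of (1, 2).
src = [(0., 0.), (1., 0.), (1., 1.), (0., 1.)]
dst = [(1., 2.), (3., 2.), (3., 4.), (1., 4.)]
H = fit_homography(src, dst)
```

With exact correspondences the null vector of A recovers H exactly; with noisy points the same SVD gives the least-squares estimate, which matches the "solved very quickly" observation above.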
Soyel et al. [27] used 3D distance vectors to obtain 3D FAPs (facial animation parameters) between feature points, measuring quantities like the openness of the eyes, the height of the eyebrows, the openness of the mouth, etc., to obtain distance vectors for test and training data across different expressions. They use only 23 facial features associated with the selected measurements and classify with a neural network. Tang et al. [28] utilize the same approach, but run an algorithm on the set of distances between the 83 points to determine the measurements that contain the most variation and are the most discriminatory, allowing for better recognition than empirically determined measurements.
Shape information is located in geometric features like ridges, ravines, peaks, pits, saddles, etc. Local surface fitting is done by centering the coordinate system at the vertex of interest (for ease of computation). The patch can be expressed in local coordinates, and a cubic approximation (x^3, x^2y, xy^2, etc.) can be used to fit the surface locally, yielding two principal vectors that describe the maximum and minimum curvature at that point, and two corresponding eigenvalues. Along with the normal direction at that point, the surface properties can be classified into labels (flat, peak, ridge, etc.), and a Primitive Surface Feature Distribution (PSFD) [29] can be generated as a feature.
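A least-squares sketch of this local fit, simplified to quadratic rather than full cubic terms and assuming the local frame is already aligned with the tangent plane (so the Hessian eigenvalues approximate the principal curvatures):

```python
import numpy as np

def principal_curvatures(xs, ys, zs):
    """Fit z ~ a x^2 + b x y + c y^2 + d x + e y around a vertex placed at
    the origin; under the tangent-plane assumption the eigenvalues of the
    Hessian [[2a, b], [b, 2c]] approximate the two principal curvatures."""
    A = np.column_stack([xs**2, xs * ys, ys**2, xs, ys])
    a, b, c, d, e = np.linalg.lstsq(A, zs, rcond=None)[0]
    H = np.array([[2 * a, b], [b, 2 * c]])
    return np.sort(np.linalg.eigvalsh(H))

# Sample a paraboloid z = x^2 + y^2, whose principal curvatures at the
# origin are both 2.
gx, gy = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
xs, ys = gx.ravel(), gy.ravel()
zs = xs**2 + ys**2
k1, k2 = principal_curvatures(xs, ys, zs)
```

Signs and magnitudes of (k1, k2) then drive the labeling: both near zero is flat, both positive is a pit/peak, mixed signs a saddle, one near zero a ridge or ravine.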
Other methods attempt to fit surface models onto point clouds of 3D sensor data. Mpiperis et al. [30,31] used a neutral face with an average identity and deformed it to the appropriate expression/identity. A triangular 3D mesh is placed on the face and subdivided into sub-triangles to increase the density. First, a set of landmarks is associated with vertices on the mesh, which remain unchanged during the fitting process. Fitting is posed as an energy minimization problem with terms describing opposing forces between the landmarks and mesh points, the distance between the surface and the mesh, and a smoothness constraint; it is solved by setting partial derivatives to 0 and using SVD. Asymmetric bilinear models are used for facial expression recognition, modeling identity in one dimension and expression in another. 3D facial shapes obtained by finding the difference between neutral and expressive faces in 3D can also be used to classify facial expressions [32].
Venkatesh et al. employed principal component analysis on 3D mesh datasets to classify facial expressions [10]. PCA is a popular mathematical technique that allows the dimensionality of the problem to be reduced, making it easier to solve. For the training set, 68 feature points, which are known to effectively represent facial expressions, were manually selected around the eyes, mouth and eyebrows. PCA is done on the x, y and z locations of these feature points to determine eigenvectors that can be used to find projections of a given matrix A. This method automatically extracts features after they are divided into bounding boxes using anthropomorphic properties. It achieves automatic selection of points; however, it is very computationally expensive.
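The PCA step can be sketched with an SVD, assuming each face is flattened into one row of concatenated x, y, z coordinates (the toy data and choice of k below are illustrative, not the paper's setup):

```python
import numpy as np

def pca_project(X, k):
    """Center the rows of X and project onto the top-k principal
    directions obtained from the SVD of the centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k], mean

# Toy data: 4 "faces" whose flattened coordinates lie on one line, so a
# single principal component captures all of the variance.
X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
proj, comps, mean = pca_project(X, k=1)
```

Because the toy samples are perfectly collinear, reconstructing from the single retained component recovers the data exactly; on real feature-point matrices, k is chosen to keep most of the variance at a fraction of the dimensionality.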