Gowri Somanath, Rohith MV, Chandra Kambhamettu Video/Image Modeling and Synthesis (VIMS) Lab,
Department of Computer and Information Sciences, University of Delaware. USA.
http://vims.cis.udel.edu
First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies (BeFIT) 2011
Context and motivation
VADANA overview: comparison to existing datasets; VADANA subject and image metadata and annotations
Experiments with face verification: standard partitions for benchmarking; comparison of various baseline and state-of-the-art algorithms
Discussion and future directions
Facial image analysis tasks: face recognition / person identification; face verification; age-related studies (face verification across age progression, age estimation, etc.); blood-relation/kinship verification; expression analysis; gender estimation; pose determination; and more.
A large-scale testbed is needed to test the scalability of algorithms. For recognition, scale may be determined by the number of subjects and images; for verification, it may be measured by the number of image pairs available for training and testing.
‘Real-world’, ‘uncontrolled’, ‘in the wild’ data to check practical application.
Have preset standardized training-testing partitions and uniform performance measures to allow fair and direct comparison of algorithms.
Key aspects
Images: 2298 images from 43 subjects (26 male; 17 female). Mostly high-quality 24-bit color digital images (30 scanned).
Annotations. Subject: gender, blood-relations (if any). Image: age, pose, facial hair, occlusions, spectacles.
Distribution: a large number of images per subject (depth vs. breadth); the largest number of intra-personal pairs (essential for verification); a good distribution of images across age groups.
‘Real-world’ with natural pose, expression and illumination variations
[Figure: sample images arranged in a grid by subject (one row per identity) and age (columns, e.g. ages 3, 4, 19, 24; dashes mark missing ages). Such a grid supports verification, age-progression and morphing studies, as well as age-estimation and aging-model studies.]
Dataset | Natural variations | High-quality images | Age annotation | Large # of intra-pairs | Other annotations
FGNET   | X                  |                     | X              |                        | *
MORPH   |                    | X                   | X              | X                      | X
LFW     | X                  | X                   |                |                        |
PubFig  | X                  | X                   |                |                        |
VADANA  | X                  | X                   | X              | X                      | *
* FGNET and VADANA include other image annotations (such as pose); in addition, VADANA includes other subject-related annotations.
                                      | VADANA            | FGNET          | MORPH (Album 1) | MORPH (Album 2)
# subjects (# adults)                 | 43 (35)           | 82 (64)        | 631 (623)       | 13673 (13060)
Age range                             | 0-78              | 0-69           | 16-69           | 16-99
Total # images (adults)               | 2298 (1913)       | 1002 (363)     | 1690 (1520)     | 55608 (52271)
# images per subject                  | 3-300 (avg: 53)   | 6-18 (avg: 12) | 1-6 (avg: 2)    | 1-53 (avg: 4)
Total # intra-personal pairs (adults) | 168,833 (146,528) | 5,808 (1,164)  | 1,597 (1,324)   | 138,368 (130,397)
Numbers in parentheses consider only images with age >= 18 (adults).
Depth of dataset vs. breadth: for face verification, a large number of intra-personal image pairs is desired, which requires many images per subject taken across different ages and imaging conditions, rather than simply more subjects.
VADANA has the largest number of intra-personal pairs among age-annotated datasets. This allows large-scale testing of algorithms, with more training and testing data available.
                                        | VADANA            | FGNET         | MORPH (Album 1) | MORPH (Album 2)
Total # intra-personal pairs (adults)   | 168,833 (146,528) | 5,808 (1,164) | 1,597 (1,324)   | 138,368 (130,397)
Age gap range (adults)                  | 0-37 (0-37)       | 0-54 (0-45)   | 0-29 (0-28)     | 0-69 (0-69)
# intra-pairs, age gap [0,2] (adults)   | 158,262 (137,683) | 855 (146)     | 402 (351)       | 128,742 (120,903)
# intra-pairs, age gap [3,5] (adults)   | 8,517 (7,967)     | 1,162 (229)   | 458 (389)       | 7,896 (7,805)
# intra-pairs, age gap [6,8] (adults)   | 696 (685)         | 932 (169)     | 292 (246)       | 664 (657)
# intra-pairs, age gap [9,inf) (adults) | 1,358 (193)       | 2,859 (620)   | 441 (338)       | 1,066 (1,032)
The [0,2] age gap contains the least effect from appearance change due to aging, and is hence generally used to estimate an upper bound on algorithm performance.
An age gap of ~10 years is encountered in applications such as passport image verification.
[Figure: areas proportional to the number of intra-personal pairs in each age-gap group for VADANA, FGNET, MORPH (Album 1) and MORPH (Album 2). [0,2]: max. VADANA 158K; [3,5]: max. VADANA 8.5K; [6,8]: max. FGNET 0.9K; [9,inf): max. FGNET 0.2K.]
VADANA's pairs at larger age gaps come from depth (many images per subject) rather than breadth.
               | VADANA | FGNET | MORPH (Album 1) | MORPH (Album 2)
Variations     | Wide range of natural variations in pose, expression and illumination; facial hair, spectacles; partial occlusions | Mostly frontal pose (some profile pose) | Neutral expression, controlled illumination | Neutral expression, controlled illumination
Other metadata | Gender, relation between subjects (if any), facial hair, spectacles, pose | Gender, facial hair, spectacles, 68 fiducial points | Gender, race | Gender, race
Image quality  | 24-bit color digital images (30 scanned) | Mostly scanned images | Digitally scanned at 300 dpi; grayscale pgm | 8-bit color images
Aligned faces  | Yes (additional parallel version with all faces aligned) | No | No | No
# images in age group (# subjects):
Age group | VADANA    | FGNET    | MORPH (Album 1)
[0,18]    | 387 (15)  | 687 (80) | 247 (209)
[19,30]   | 1084 (13) | 186 (62) | 921 (513)
[31,60]   | 761 (21)  | 122 (35) | 516 (304)
[61,inf)  | 66 (4)    | 8 (4)    | 6 (5)
[Bar chart: number of images per age group for VADANA, FGNET and MORPH (Album 1), as tabulated above.]
Raw image set: images of faces for each subject (varying image sizes)
Aligned image set: images resized to 250 x 250 pixels and aligned using funneling
Subject data: subject ID, gender, year of birth
Image data: age; binary indicators for facial hair, spectacles and occlusions; horizontal and vertical pose (left, right, center; up, down, center)
Relationship data: sibling pairs and parent-child pairs (e.g., [011,020] indicates subject 011 is the parent of subject 020)
Experiment sets: multiple partitions of the dataset for uniform comparison
We include preset standard partitions (experiment sets) of VADANA for benchmarking purposes.
For each set:
- Fix the age-gap bin
- Fix a maximum limit on intra-pairs per subject
- Sample the dataset to obtain the intra-pairs
- Divide the intra-pairs into folds such that subjects across folds are non-overlapping (hence training and testing subjects are non-overlapping) and the number of pairs in each fold is nearly equal
Each fold also has a corresponding set of extra-pairs; the number of extra-pairs equals the number of intra-pairs in the fold.
The different sets for each age gap have less than 60% overlap (achieved by random sampling of images and pairs per subject).
Age gap | # of intra-pairs per fold x # folds | # of sets
[0,2]   | ~7000 x 5 folds                     | 4
[3,5]   | ~240 x 3 folds                      | 2
[6,8]   | ~145 x 2 folds                      | 3
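The partitioning steps above can be sketched as follows. This is a minimal illustration, not the released partitioning code: the function and variable names are hypothetical, and the extra-pair sampling step is omitted.

```python
import random
from collections import defaultdict

def make_folds(intra_pairs, n_folds, max_pairs_per_subject, seed=0):
    """Sketch of the partitioning scheme: cap intra-pairs per subject,
    then assign whole subjects to folds so that subjects never overlap
    across folds and fold sizes stay nearly equal."""
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for subj, img_a, img_b in intra_pairs:
        by_subject[subj].append((subj, img_a, img_b))
    # Cap the number of intra-pairs contributed by any one subject.
    for subj, pairs in by_subject.items():
        if len(pairs) > max_pairs_per_subject:
            by_subject[subj] = rng.sample(pairs, max_pairs_per_subject)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: largest subjects first, each into the smallest fold.
    for subj in sorted(by_subject, key=lambda s: -len(by_subject[s])):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_subject[subj])
    return folds
```

Because a subject's pairs are assigned to exactly one fold, training and testing subjects are guaranteed non-overlapping for any train/test split of the folds.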
For each age gap, average the performance over the sets
For each set, measure CAR and CRR for different parameters of the algorithm:
▪ Correct Acceptance Rate (CAR)= # of correctly classified intra-pairs/ total # of intra-pairs
▪ Correct Rejection Rate (CRR) = # of correctly classified extra-pairs/ total # of extra-pairs
The ROC curve is obtained by plotting the CARs and CRRs (averaged over the folds). Accuracy is determined as the performance at the Equal Error Rate (EER), i.e., the CAR at the point where CAR = CRR on the ROC curve.
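The CAR/CRR sweep and the accuracy at EER can be sketched as below. This is an illustrative snippet, not the benchmark code; the function names and the threshold grid are assumptions.

```python
def roc_car_crr(intra_scores, extra_scores, thresholds):
    """Sweep a decision threshold over pair-similarity scores.
    A pair is accepted (classified intra-personal) when score >= threshold.
    CAR = fraction of intra-pairs accepted; CRR = fraction of extra-pairs rejected."""
    cars, crrs = [], []
    for t in thresholds:
        cars.append(sum(s >= t for s in intra_scores) / len(intra_scores))
        crrs.append(sum(s < t for s in extra_scores) / len(extra_scores))
    return cars, crrs

def accuracy_at_eer(cars, crrs):
    """Accuracy at the Equal Error Rate: the CAR at the operating point
    where CAR and CRR are closest (CAR == CRR on a continuous ROC)."""
    i = min(range(len(cars)), key=lambda k: abs(cars[k] - crrs[k]))
    return cars[i]
```

For well-separated scores (all intra above all extra) the sweep finds a threshold with CAR = CRR = 1, so the accuracy at EER is 1.0.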
Baseline: a variant of eigenfaces. Form the eigenface basis using 10 randomly sampled images from the subjects, selecting enough eigenvectors to explain at least 75% of the variation. Represent each face image by its projection onto the eigenface basis. Use the difference of these representations as the feature for a given image pair. Use a Random Forest classifier for binary classification of each pair as intra-pair or extra-pair.
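The baseline's feature pipeline could be sketched roughly as follows. This is an illustrative re-implementation assuming flattened grayscale images, not the authors' code; the Random Forest step is left to any off-the-shelf binary classifier.

```python
import numpy as np

def eigenface_basis(train_images, var_frac=0.75):
    """train_images: (n, d) array of flattened face images. Returns the
    mean face and enough principal components (eigenfaces) to explain at
    least var_frac of the variance."""
    X = np.asarray(train_images, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data yields the eigenfaces as right singular vectors.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, var_frac)) + 1
    return mean, vt[:k]

def pair_feature(img_a, img_b, mean, basis):
    """Project both faces onto the eigenface basis and use the difference
    of the projections as the pair feature (fed to the classifier)."""
    pa = basis @ (np.asarray(img_a, dtype=float) - mean)
    pb = basis @ (np.asarray(img_b, dtype=float) - mean)
    return pa - pb
```

By construction, a pair of identical images yields the zero feature vector, so the classifier learns to separate small intra-personal differences from larger extra-personal ones.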
Pair-matching algorithm of Nowak et al. 2007: works by training classifiers on features extracted from image patches. We used the implementation provided by the authors, with the same parameters as used for the LFW dataset; performance was found to be fairly stable under slight parameter changes.
State-of-the-art face verification algorithm under age progression by Ling et al. 2010: Gradient Orientation Pyramids (GOP) are used as features to train SVM classifiers (Gaussian kernel). The method is specifically designed to handle age separation in image pairs and is reported to be state-of-the-art on FGNET and a passport dataset. No public implementation is currently available; our implementation was verified by closely matching the authors' results on FGNET.
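A rough sketch of the gradient-orientation-pyramid idea is given below. This is an illustrative approximation, not Ling et al.'s implementation: the 2x2 block-average downsampling is a simplification, and the pyramid depth is an assumed parameter.

```python
import numpy as np

def gradient_orientation_pyramid(img, levels=3, eps=1e-6):
    """At each pyramid level, compute per-pixel image gradients and keep
    only their orientation (unit vectors), discarding magnitude; then
    downsample by 2 and repeat. Returns the concatenated features."""
    img = np.asarray(img, dtype=float)
    feats = []
    for _ in range(levels):
        gy, gx = np.gradient(img)  # derivatives along rows, then columns
        mag = np.sqrt(gx ** 2 + gy ** 2) + eps
        # Unit gradient vectors: orientation only, robust to lighting scale.
        feats.append(np.stack([gx / mag, gy / mag], axis=-1).ravel())
        # Simple 2x2 block averaging as a downsampling stand-in.
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
        img = img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.concatenate(feats)
```

Pair features for the SVM would then be formed from the two images' pyramids (e.g., their cosine similarity per level), following the paper.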
All experiments were performed on the [0,2] age gap, using the aligned version of VADANA. The eigenfaces baseline performs very well on the current benchmark dataset (FGNET) but only just above chance on VADANA, showing the contrasting nature of the two.
Accuracy (%)                    | VADANA | FGNET   | LFW | Jain
Eigenfaces (PCA+RF)             | 52.33  | 99.33   |     |
Nowak pair matching (SIFT+ERFC) | 61.52  | 67+-2.2 | 73* | 84.2*
Ling et al. (SVM+GOP)           | 57.43  | 73      |     |
* Obtained from the original papers.
LFW and Jain are large-scale real-world datasets. The Nowak algorithm performs better on VADANA, consistent with its good performance on other "wild" imagery.
The state-of-the-art algorithm specifically designed for verification across age progression performs better than eigenfaces, yet its performance does not reach the accuracy achieved on FGNET.
Algorithms based on image differences (eigenfaces, Ling et al.) perform well on FGNET but not on VADANA, while Nowak pair matching, which is based on image patches, performs well on VADANA compared to the other two. The number of training pairs may be a contributing factor to Nowak's success, since FGNET offers only 148 intra-pairs in [0,2] while VADANA offers ~35,000. Image quality may also be a minor contributing factor (SIFT does better).
VADANA provides:
- Real-world data: a large-scale, real-world, age-annotated dataset.
- Depth vs. breadth: the largest number of intra-personal pairs in total and in the age gaps [0,2] and [3,5].
- Various cross-sections: different cross-sections and variations within and across subjects (pose, expression, illumination and subject appearance).
- Distribution: a good distribution of subjects and images across ages and age gaps.
- Annotations: subject and image annotations for various other facial analysis algorithms.
- Benchmark: multiple standard partitions for uniform comparison.
Future directions
A detailed comparison of the different types of algorithms (patch-based vs. image-difference) to determine which scales better to large-scale, age-annotated, real-world data.
A quantitative study of the sensitivity of the algorithms to different factors (age difference, gender, image quality, etc.).
Requests can be sent through the dataset webpage http://vims.cis.udel.edu/vadana.html