A study of quantitative comparisons of photographs and video images based on landmark derived feature vectors
Krista F. Kleinberg a,*, J. Paul Siebert b
a Forensic Medicine and Science, Joseph Black Building, University of Glasgow, Glasgow G12 8QQ, UK
b Department of Computing Science, Sir Alwyn Williams Building, University of Glasgow, Glasgow G12 8QQ, UK
1. Introduction
As a result of the wide deployment of surveillance cameras,
there is both opportunity and motivation, given the amount of
visual material being collected digitally, to identify suspects from
CCTV. Although rapidly improving in terms of spatial resolution,
the majority of video surveillance equipment does not produce
images of sufficient quality to provide identifications when
other, more conclusive evidence, such as DNA or fingerprints, is not
available. It is in these kinds of cases that anthropometry may have
the potential to provide a useful identification technique.
Surveillance video can be important supportive evidence because
it may show a crime being committed, although it is not always
easy to recognise, and therefore convict, a criminal caught on CCTV.
Video surveillance can be more reliable than eyewitness testimony
because the story told is always consistent and also corroborates
what the eyewitness reported [1]. However, a more comprehensive
analysis is necessary because even when facial video images
are of sufficient quality, it is possible that two people may look
similar to each other in this medium.
The roles of anthropometry and forensic science have been
intertwined since Bertillon in the 1800s [2,3], and anthropometry
was one of the identification methods used in [4–6].
Although more sophisticated vision-based methods of image
comparison are being developed [7,8], it remains to be seen what
can be achieved by utilizing ratios between key facial landmarks on
single 2D images. Even if reliable automatic methods for face
image comparison can be developed, manual intervention in terms
of landmark placement is likely to be required where low-quality
images have to be analysed, such as those generated by many
currently installed CCTV systems. In contrast to comparing two
images, anthropometric proportions from the face and body of live
suspects were compared against 2D images, and this was one of the
identification methods resulting in convictions in two out of
three cases in Halberstein's 2001 paper [9]. One of the
fundamental problems with comparing 2D images is facial pose.
Forensic Science International 219 (2012) 248–258
ARTICLE INFO
Article history:
Received 6 July 2011
Received in revised form 22 November 2011
Accepted 4 January 2012
Available online 24 January 2012
Keywords:
Facial identification
Anthropometry
Image comparison
Face database
ABSTRACT
An abundance of surveillance cameras highlights the necessity of identifying the individuals recorded. Images captured are often unintelligible and are unable to provide irrefutable identifications by sight, and therefore a more systematic method for identification is required to address this problem. An existing database of video and photographic images was examined, which had previously been used in a psychological research project; material consisted of 80 video (Sample 1) and 119 photographic (Sample 2) images, though taken with different cameras. A set of 38 anthropometric landmarks was placed by hand, capturing 59 ratios of inter-landmark distances, to conduct within sample and between sample comparisons using normalised correlation calculations: mean absolute value between ratios, Euclidean distance and Cosine θ distance between ratios. The statistics of the two samples were examined to determine which calculation best ascertained if there were any detectable correlation differences between faces that fall under the same conditions. Each face in Sample 1 was then compared against the database of faces in Sample 2. We present pilot results showing that the Cosine θ distance equation using Z-normalised values achieved the largest separation between True Positive and True Negative faces. Having applied the Cosine θ distance equation we were then able to determine that if a match value returned is greater than 0.7, it is likely that the best match will be a True Positive, allowing a decrease of database images to be verified by a human. However, a much larger sample of images needs to be tested to verify these outcomes.
© 2012 Elsevier Ireland Ltd. All rights reserved.
* Corresponding author. Present address: PEACH Unit, University of Glasgow,
Queen Mother's Hospital, 8th Floor Tower Block, Dalnair Street, Glasgow G3 8SJ,
UK. Tel.: +44 141 201 1988; fax: +44 141 201 6943.
E-mail addresses: Krista.Kleinberg@glasgow.ac.uk, kristakleinberg@yahoo.com
(K.F. Kleinberg), Paul.Siebert@glasgow.ac.uk (J.P. Siebert).
0379-0738/$ – see front matter © 2012 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.forsciint.2012.01.014
Attempts to rectify this in facial recognition pose invariant
systems described in [10,11] reported greater recognition rates
than when used without the pose transformations. Using soft
biometric traits was shown to be beneficial in improving
recognition accuracy when combined with a commercial based
face matching program [12].
Three questions should be asked of a comparison method: is it
possible to carry out the comparison objectively, is it possible to
avoid manual input, and is it applicable to checking large
databases? An identification made based on 2D images will be
more decisive if there is a way to quantify the comparison, rather
than if the identification is based solely on a subjective analysis, as
the result is a comparison that is objective with minimal bias. Once
quantification of a comparison is achieved, the process should be
automated. An automated process would decrease the error from
involving many different operators in the comparison process and
would allow large databases to be checked quickly. As a face search
could potentially be extended to full populations by reviewing
internationalised databases, e.g. Interpol [13], the need for
automation is high. According to the Ministry of Justice Statistics
(UK) bulletin, the reoffending rate for criminals in England and
Wales in 2006 was 146.1 offences per 100 offenders [14]. Although
this is a decrease of 22.9% from 2000, the numbers indicate there is
justification for a database of convicted criminal images that could
be quickly automated and checked.
We document an investigation into the comparison of
anthropometric ratios of facial landmark pairs manually located
on 2D images. The constraints in this study are that we consider
best-case scenario situations as a benchmark, given that scenarios
in the field, by definition, cannot be as benign. The subject matter is
based on the analysis of comparing high quality full-face frontal
video and photographic images of individuals of a similar ethnic
background with neutral expressions.
This investigation expanded previous research carried out by Kleinberg, Vanezis and Burton [15] and was conducted to test the hypothesis: Using a comparison of anthropometric facial ratios, it is possible to discriminate between individuals of two samples. The objective of this study was to derive measurements between specific landmarks on the face in both print and video media and incorporate them into a feature vector to use in statistical analysis to determine if identifications of an individual can be made based on these measurements. Knowledge of the type of information gathered in this study may help in future to rank potential suspects for human identification verification. However, in order to establish that two faces were the same and use this identification method to identify positively rather than eliminate suspects, it would be necessary to show that the probability of a false match in the rest of the population at random was of an acceptably low probability [16]. To investigate the hypothesis in this study, we seek to address the following questions:
• Of the proposed images, can similar faces be separated from dissimilar faces within a single sample using vector comparisons?
• How distinguishable are individual faces in the samples? Is it possible to distinguish true positive faces from true negative faces using vector comparisons where the statistics from two samples are known?
• Using a small sample of re-landmarked images, how significant is the error contribution in re-landmarked images and what is the operator induced measurement spread under ideal conditions?
• Given a specific example and set of comparisons with the database, what constitutes a manageable subsample, worthy of further manual verification?
2. Materials
A total of 199 images of Caucasian male police volunteers were available which had been used previously in research conducted by Bruce et al. [17]. The 199 images comprised 80 different video still faces (Sample 1) and 119 different photographic faces (Sample 2). According to Bruce et al. [17], the image quality on the videos was high, equivalent to what would be produced by a good amateur photographer trying to reveal a good likeness of someone on a home videotape. The photographic images in Sample 2 included the same 80 faces depicted in the video cohort, and an additional 39 new faces not included as video stills. The photographs were of policemen, both retired and presently working, and for this reason, except for photographs which have already been published elsewhere, they cannot be exhibited in this paper. However, an example of each type of image is provided in Fig. 1. Both sets of images, taken on the same day, were displayed from the frontal viewpoint, showing features from the neck up, in what appeared to be the format of police identification photographs. In this study the identity of the subjects in the video images was known and could be cross referenced with the corresponding photographic images. This means that identifications made on the basis of facial anthropometry could be designated as true or false. One positive feature of these video images was that because they were recorded on the same day as the photographs, the study images did not have any of the possible facial changes which can occur due to time factors such as weight loss/gain, increase in age or presence of facial hair.
3. Methodology
Given a set of landmarks there is a need to be able to quantify the landmarks numerically such that they can be used to compare faces. Ideally, the measure should be invariant to in-plane translations and rotations and be tolerant to a degree of out-of-plane rotation in order to accommodate the variability inherent when posing a subject for full frontal image capture. Thirty-eight landmarks (Table 1), ten unilateral and 14 bilateral, were chosen for inclusion in the anthropometric study and are shown in Fig. 2. Careful consideration was given to the selection of landmarks that were used in this study. Anthropometric research by Farkas [18], Purkait [19], Fieller [20], Evison [21], and facial recognition research by Craw et al. [22] and Okada et al. [23] were consulted when choosing the landmarks that were included in the present study. When choosing a landmark it was important that it was one that could be placed consistently. It had to be a point that an operator performing the comparison would be able to locate in the same place within an acceptable error. According to Fieller [20], the criteria used to determine a successful/reliable landmark are: observer knowledge, consistency of landmark placement, discriminatory power, and landmark visible in majority of cases. Excluded landmarks were eliminated on the basis of their inability to be located on photographs.
Fig. 1. High resolution video image (a) and selection of ten database photographs (b).
Although the number of possible linear measurements increases combinatorially with the number of landmarks, not all are reliable or pertinent to the research undertaken for this study. A total of 73 linear measurements (21 unilateral, 26 bilateral) were chosen for this study. The majority of these were chosen by consulting the literature [18,22,24]. Two of these measurements used in a previous study [25], ex-n and ex-sto, were chosen because they utilise landmarks that were considered to be less affected by facial expression than others and also because they would be visible even if the subject was wearing a hat. Three bilateral measurements were unique to the present study.
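The counts quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each bilateral landmark or measurement is counted once per side of the face, which is the reading under which the stated totals of 38 and 73 are reproduced; the variable names are illustrative only.

```python
from math import comb

# Landmark counts stated in the text; bilateral landmarks are placed on both sides.
unilateral_landmarks, bilateral_landmarks = 10, 14
total_landmarks = unilateral_landmarks + 2 * bilateral_landmarks

# Every unordered pair of landmarks defines one possible linear measurement.
possible_measurements = comb(total_landmarks, 2)

# Measurements actually used: 21 unilateral plus 26 bilateral (left and right).
used_measurements = 21 + 2 * 26

print(total_landmarks, possible_measurements, used_measurements)
```

Under this reading, the 73 measurements used are roughly a tenth of the landmark pairs that could in principle be measured.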
From these landmarks and linear measurements, a total of 59 ratios (also unilateral and bilateral) were selected for comparison of images (Table 2). The linear measurements that make up the ratios are shown in Fig. 3. A ratio was derived by dividing the smaller linear measurement (numerator) by the larger linear measurement (denominator). The ratios were chosen to achieve a balance of the horizontal and vertical regions of the face. Intuitively, it is expected that longer lines between landmarks located on different sections of the face would make a more reliable proportion than two short lines in the same section of the face. This is because small variations in the placement of landmarks making up short lines would result in large changes in proportions, which may not accurately portray true variations between individuals. The ratios utilised in this research were deliberately chosen to include linear measurements between landmarks in different sections of the face and others that covered a small section of the face, such as the length vs. the width of the eye. As it is more common to use absolute measurements in anthropometric comparisons [18,19,26,27] rather than ratios, there was less guidance with respect to which ratios would be more reliable or more relevant than others in the present study. Halberstein used a combination of up to twelve face and body ratios when comparing a photograph to a live subject, and three of these ratios were used in the present study [9]. These ratios were ear length/facial height (sa-sba/n-gn), nasal height/ear length (n-sn/sa-sba) and nasal width/nasal height (al-al/n-sn). The remainder of the ratios that were used by Halberstein were not incorporated into this research because they either included facial landmarks that were not chosen for the present study or
Table 1
Landmarks and their definitions used in this study [18,24].
1. Glabella (g): the most prominent midline point between the eyebrows.
2. Nasion (n): the point in the midline of both the nasal root and the
nasofrontal suture. This point is always above the line that connects the
two inner canthi. A canthus is the angle at either end of the fissure
between the eyelids.
3. Exocanthion (ex): the point at the outer commissure of the eye fissure. A
commissure is the site of union of corresponding parts and a fissure is any
cleft or groove, in this case of the eye [bilateral].
4. Endocanthion (en): the point at the inner commissure of the eye fissure [bilateral].
5. Palpebrale superius (ps): highest point in the midportion of the free
margin of each upper eyelid. The free margin portion of the eyelid is the
unattached edge [bilateral].
6. Palpebrale inferius (pi): the lowest point in the midportion of the free
margin of each lower eyelid [bilateral].
7. Orbitale (or): the lowest point on the margin of the orbit. The orbit is the
bony cavity that contains the eyeball [bilateral].
8. Superaurale (sa): the highest point of the free margin of the auricle. The
auricle is the portion of the external ear that is not contained within the
head [bilateral].
9. Subaurale (sba): the lowest point on the free margin of the ear lobe
[bilateral].
10. Postaurale (pa): the most posterior point on the free margin of the ear
helix. The helix refers to the coiled structure of the ear. [bilateral].
11. Otobasion inferius (obi): the lowest point of attachment of the external
ear to the head [bilateral].
12. Alare (al): the most lateral point on each nostril contour [bilateral].
13. Subnasale (sn): the midpoint of the angle at the columella (fleshy, lower
margin) base where the lower border of the nasal septum and the surface
of the upper lip meet.
14. Pronasale (prn): the most protruded point of the nasal tip.
15. Subalare (sbal): the point on the lower margin of the base of the nasal
ala where the ala disappears into the upper lip skin [bilateral].
16. Stomion (sto): the imaginary point at the crossing of the vertical facial
midline and the horizontal labial (lip) fissure between gently closed lips,
with teeth shut in the natural position.
17. Crista philtri landmark (cph): the point on the elevated margin of the
philtrum just above the vermilion line. The philtrum is the vertical
groove in the median portion of the upper lip and vermilion refers to the
exposed red portion of the upper or lower lip [bilateral].
18. Cheilion (ch): the point located at each labial commissure [bilateral].
19. Labiale inferius (li): the midpoint of the vermilion border of the lower lip.
20. Labiale superius (ls): the midpoint of the vermilion border of the upper lip.
21. Gonion (go): the most lateral point at the angle of the mandible. The
mandible is the bone of the lower jaw [bilateral].
22. Sublabiale (sl): determines the lower border of the lower lip or the upper
border of the chin.
23. Pogonion (pg): the most anterior midpoint of the chin.
24. Gnathion (gn): the lowest point in the midline on the lower border of the
chin.
Fig. 2. Facial landmarks and their location.
Table 2
Ratios used in this study.
go-go/n-gn sn-sto/sto-sl sn-gn/n-sto li-sl/sn-ls
n-prn/g-pg sbal-sn/sn-prn [bilateral] gn-go/n-gn [bilateral] sl-gn/sto-gn
al-al/ex-ex ex-go/go-go [bilateral] al-al/n-sn n-sn/n-sto
sa-sba/n-gn [bilateral] n-gn/n-sto n-sn/sa-sba [bilateral] en-al/ex-ch [bilateral]
ex-ex/go-go obi-ch/g-sa [bilateral] ex-n/ex-sto [bilateral] sbal-ls/n-al [bilateral]
ex-n/n-sto [bilateral] pi-al/sa-ex [bilateral] ex-sto/n-sto [bilateral] ex-obi/ex-ch [bilateral]
en-ex/ps-pi [bilateral] ex-al/ch-gn [bilateral] en-en/ex-ex ch-ls/n-prn [bilateral]
pi-or/en-ex [bilateral] al-ls/ch-gn [bilateral] sa-sba/pa-obi [bilateral] ch-li/ex-ch [bilateral]
cph-cph/sn-ls ex-sto/rt ex-lt ch [bilateral] ls-sto/ch-ch sn-gn/ex-gn [bilateral]
sto-li/ch-ch
because they were body ratios, such as shoulder width, leg or shoe lengths. Two ratios (n-sn/n-sto, n-gn/n-sto) were used by Catterick for his research [28]. The remainder of the ratios chosen were unique to this study. In order to continue with a best case scenario situation, one volunteer, with previous experience in placing landmarks on 2D images, placed the 38 landmarks on all 199 images using the measurement programme produced in-house, Facial Identification Centre Version 0.32 (Forensic Medicine and Science, Glasgow University).
The group of 59 ratios is treated as a 59 dimensional vector and this has been evaluated as a means of comparing all faces. In this study, the feature vector is the series of 59 ratios derived from chosen linear measurements between facial landmarks. The alternative to comparing ratios between landmarks is to compare the raw distances between the landmarks. Comparing raw distances can be accomplished using the Procrustes [29] alignment techniques, and although outside of the scope of the current project, may be used in future studies. The advantage of using ratios is that they are both scale and rotation invariant and also to a slight degree auto-corrective (in terms of errors added during landmarking). In addition, ratios exhibit a degree of invariance to the effects of out-of-plane rotations for small angles (when the effects of such rotations are sufficiently small to approximate a 2D affine transformation on the imaging plane).
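The scale and rotation invariance of a ratio can be illustrated with a minimal sketch. The landmark coordinates, the helper names and the transform parameters below are all invented for illustration; they are not taken from the study's in-house software. The ratio follows the rule described above, with the smaller measurement divided by the larger.

```python
import math

def distance(p, q):
    """Euclidean distance between two 2D landmark positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ratio(landmarks, pair_a, pair_b):
    """Smaller measurement divided by the larger one, so the value lies in (0, 1]."""
    a = distance(landmarks[pair_a[0]], landmarks[pair_a[1]])
    b = distance(landmarks[pair_b[0]], landmarks[pair_b[1]])
    return min(a, b) / max(a, b)

def scale_rotate_shift(landmarks, scale, angle_deg, shift):
    """Apply an in-plane similarity transform to every landmark."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return {
        name: (scale * (c * x - s * y) + shift[0],
               scale * (s * x + c * y) + shift[1])
        for name, (x, y) in landmarks.items()
    }

# Hypothetical pixel coordinates, invented for illustration only.
face = {"n": (100.0, 80.0), "sn": (100.0, 130.0),
        "al_r": (85.0, 125.0), "al_l": (115.0, 125.0)}

# Nasal width / nasal height (al-al / n-sn), one of the ratios in Table 2.
nasal_ratio_before = ratio(face, ("al_r", "al_l"), ("n", "sn"))

# The same face imaged larger, tilted in-plane and shifted in the frame.
moved = scale_rotate_shift(face, scale=2.5, angle_deg=12.0, shift=(40.0, -15.0))
nasal_ratio_after = ratio(moved, ("al_r", "al_l"), ("n", "sn"))

print(nasal_ratio_before, nasal_ratio_after)
```

Both values agree up to floating-point rounding, whereas the raw distances change by the scale factor. Out-of-plane rotation is not modelled here.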
Three equations were used to test the comparison of a feature vector from one sample against another: mean absolute difference, Euclidean distance and Cosine θ distance. The first two equations compare the length of the difference vector and the third equation compares the angle between the vectors. The three equations are as follows:
3.1. The mean absolute difference between ratio vectors
Eq. (1) determines the distance that separates one face from another by taking the absolute value of the difference between a ratio of one face and the same ratio of a second face. This is carried out for each ratio element in the feature vector. The sum of these absolute differences is then divided by the total number of elements (59 ratios in this case). A difference of 0 between two faces establishes that those two faces have identical facial ratio vectors. A smaller difference in facial ratios is indicative of a smaller difference between faces. A disadvantage of using this equation is that the maximum difference between faces is not bounded:
\[ \text{Mean abs diff} = \frac{\sum_{n=1}^{N} \lvert F_1(n) - F_2(n) \rvert}{N} \tag{1} \]
3.2. The Euclidean distance between ratios
The Euclidean distance (Eq. (2)) also measures the distance between two multi-dimensional vectors. This is the square root of the sum of the squares of the element-wise differences, in this case differences between ratios. A difference of 0 between two faces establishes that those two faces have identical facial ratio vectors. A smaller difference in facial ratios is indicative of a smaller difference between faces. A disadvantage of using this equation is that the maximum difference between faces is not bounded:
\[ \text{Euclidean distance} = \sqrt{\sum_{n=1}^{N} \bigl( F_1(n) - F_2(n) \bigr)^2} \tag{2} \]
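As a minimal sketch of Eqs. (1) and (2), the two distances can be computed as follows. The three-element vectors are toy values chosen so that the arithmetic is easy to follow; the study's feature vectors have 59 elements, and the function names are illustrative.

```python
import math

def mean_abs_diff(f1, f2):
    """Eq. (1): mean of the element-wise absolute differences."""
    return sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)

def euclidean_distance(f1, f2):
    """Eq. (2): square root of the summed squared element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

# Toy ratio vectors (illustrative values only).
face_a = [0.50, 0.80, 0.30]
face_b = [0.50, 0.40, 0.60]

mad = mean_abs_diff(face_a, face_b)        # (0 + 0.4 + 0.3) / 3
euc = euclidean_distance(face_a, face_b)   # sqrt(0.16 + 0.09)
print(mad, euc)
```

Both measures return 0 for identical vectors and grow without an upper bound as the vectors diverge, which is the disadvantage noted in the text.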
3.3. The Cosine θ distance
The Cosine θ distance equation (Eq. (3)) is a similarity measurement and is used to measure the angle between two vectors. A cosine value of 1.0 between two faces establishes that those two faces have identical vectors of facial ratios. An advantage of using this equation is that the range of values is bounded from −1.0 to +1.0 and useful comparisons range from zero to one. A value of zero is indicative of a face that shows no correlation whereas a result of 0.5 is achieved by random chance. Any negative result shows the face comparison produces an inverse correlation:
\[ \cos\theta = \frac{\sum_{n=1}^{N} F_1(n)\, F_2(n)}{\lVert F_1 \rVert \, \lVert F_2 \rVert} \tag{3} \]
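Eq. (3) can be sketched in the same way. The function name and the two-element test vectors are illustrative; they simply show the three boundary cases of the bounded range described above.

```python
import math

def cosine_similarity(f1, f2):
    """Eq. (3): dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(f1, f2))
    norm_1 = math.sqrt(sum(a * a for a in f1))
    norm_2 = math.sqrt(sum(b * b for b in f2))
    return dot / (norm_1 * norm_2)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
print(same_direction, orthogonal, opposite)
```

Note that the measure depends only on direction, so a vector and a scaled copy of it score 1.0. Because raw ratios are all positive, unnormalised ratio vectors can never score below zero, which is one reason normalising the ratios before applying Eq. (3) changes the picture considerably.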
A comparison between two faces was deemed a true positive match (TP) if the match was a correct match between the video image and photograph of the same subject. A true negative match (TN) was one that excluded the faces and which was a correct exclusion because it involved a video image and a photograph of two different subjects. A false positive match (FP) was one which was an incorrect match between a video image and a photograph of two different subjects and a false negative match (FN) was one which was excluded but which was an incorrect exclusion because it involved the video image and photograph of the same subject.
To answer the questions laid forth in Section 1, the three equations were applied in the following four scenarios to test the comparison of Sample 1 faces to Sample 2 faces: within sample comparisons, between sample comparisons, error in landmark placement, and the potential sample of photographs subject to manual verification.
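The four outcome labels defined above depend on two facts only: whether the two images show the same subject, and whether the comparison declared a match. A minimal sketch, with a hypothetical function name:

```python
def classify(same_subject: bool, declared_match: bool) -> str:
    """Label a video/photograph comparison as TP, FP, FN or TN."""
    if declared_match:
        return "TP" if same_subject else "FP"
    return "FN" if same_subject else "TN"

outcomes = [
    classify(same_subject=True, declared_match=True),
    classify(same_subject=False, declared_match=True),
    classify(same_subject=True, declared_match=False),
    classify(same_subject=False, declared_match=False),
]
print(outcomes)
```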
4. Results
4.1. Within sample comparisons
To test if similar faces were separable from dissimilar faces within a single sample the equations were applied so that every face in a single sample was compared to itself and every other face within this sample. Each sample contained only one image of each face and for this reason all that could be determined was the true negativity of this collection of different faces. Therefore, no estimate of the degree to which two same faces (true positives) would match when captured at different times could be made from this data. The same tests were carried out on Sample 1 (video) and then separately on Sample 2 (photographs). Testing all combinations of pairs of faces within each sample was important because it compared faces acquired under the same capture conditions, allowing the tests to ascertain if it were possible to discriminate between different (true negative) faces. Therefore, in this experiment the primary source of variability between faces should be attributable to differences in the measured facial landmark ratios, i.e. generated by genuine face shape differences, whilst the statistics of the remaining sources of variability remain constant: same media, same operator placing landmarks and same facial pose.
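The number of within-sample comparisons can be sketched as below. This assumes each unordered pair of different faces is counted once and self-comparisons are left out, which is the reading under which the values of N reported in Table 3a (3160 and 7021) are reproduced; the function name is illustrative.

```python
from itertools import combinations

def within_sample_pairs(n_faces):
    """All unordered pairs of distinct faces, identified by index."""
    return list(combinations(range(n_faces), 2))

pairs_sample_1 = len(within_sample_pairs(80))    # video stills
pairs_sample_2 = len(within_sample_pairs(119))   # photographs
print(pairs_sample_1, pairs_sample_2)
```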
Fig. 3. Linear measurements that created the ratios utilised in this study.
The similarity or dissimilarity between the faces in a single sample is cross-checked by comparing the distributions of the similarity statistics of Sample 1 to those of Sample 2. A Sample 1 to Sample 2 cross-comparison of the statistics produced by matching faces within their own samples can be used in future to predict how discriminable faces are when making comparisons between these two samples. If both samples exhibit similar statistics, this would be indicative that it is possible to distinguish faces between samples because any difference between faces would be a result of the true difference in faces rather than a result of the different media recording each of the two samples of images. Results are summarised in Fig. 4a–c and are illustrated by superimposing the normal distribution curves and similarity density histograms of the two samples. In Table 3a the standard deviation scaled difference between the Sample 1 and Sample 2 means indicates the difference in statistics between media, the cosine distance by far exhibiting the greatest difference.
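The standard deviation scaled difference reported in Table 3a can be recomputed from the tabulated means and standard deviations. The sketch below assumes the statistic is the absolute difference between the two samples' mean/SD quotients, which is the reading that matches the tabulated values; the dictionary layout and function name are illustrative.

```python
# (mean, SD) pairs for Sample 1 and Sample 2, copied from Table 3a.
table_3a = {
    "MAD": ((0.08354, 0.02316), (0.08507, 0.02041)),
    "Euclidean distance": ((1.015, 0.4121), (1.149, 0.3278)),
    "Cos(theta)": ((0.9859, 0.01314), (0.9887, 0.006592)),
}

def scaled_mean_difference(sample_1, sample_2):
    """Absolute difference between the mean/SD quotients of two samples."""
    (mu_1, sd_1), (mu_2, sd_2) = sample_1, sample_2
    return abs(mu_1 / sd_1 - mu_2 / sd_2)

differences = {
    name: scaled_mean_difference(s1, s2) for name, (s1, s2) in table_3a.items()
}
for name, value in differences.items():
    print(f"{name}: {value:.3f}")
```

The cosine measure stands out because its standard deviations are very small relative to its means, so a modest change in spread between media produces a large change in the quotient.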
In order to equalise the absolute ranges of the feature vector values in each sample and address the observed difference in the statistics of the comparisons of Sample 1 and Sample 2, the equations were completed using Z-normalised ratio values, illustrated in Fig. 4a–c. Each element F(n) in the measurement vector F is represented by a population of measurement ratios within a sample. Z-normalisation potentially enhances the range of variation (and accentuates any differences) about the ratio mean of this sub-population of ratios, allowing small differences in the data to become more apparent. This was accomplished by dividing the mean subtracted element by the sample standard deviation for the particular ratio (Eq. (4)). Z-normalisation can be applied to all three of the equations:
\[ F_Z(n) = \frac{F(n) - \mu_{F(n)}}{\sigma_{F(n)}} \tag{4} \]
Therefore, by applying Z-normalisation it becomes possible to
force the distributions into a standardised range. Taking the Z-
normalised cosine distance as an example, the result of this
process, driving the means and standard deviations together and
reducing difference between the variance-scaled means is
illustrated in Table 3b.
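A minimal sketch of Eq. (4) follows. It assumes that the mean and the sample standard deviation are computed per ratio across all faces of one sample; the function name and the toy values (three faces, two ratios) are illustrative only.

```python
from statistics import mean, stdev

def z_normalise(sample):
    """Z-normalise each ratio (column) across the faces (rows) of one sample."""
    columns = list(zip(*sample))
    mus = [mean(col) for col in columns]
    sigmas = [stdev(col) for col in columns]   # sample standard deviation
    return [
        [(x - mu) / sigma for x, mu, sigma in zip(vector, mus, sigmas)]
        for vector in sample
    ]

# Toy sample: three faces, two ratios each (illustrative values only).
sample = [[0.60, 0.42], [0.66, 0.40], [0.57, 0.47]]
normalised = z_normalise(sample)

for vector in normalised:
    print([round(v, 3) for v in vector])
```

After normalisation every ratio has zero mean and unit standard deviation within the sample, so the vectors contain both positive and negative elements and the cosine measure is no longer confined to values close to 1.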
4.2. Between sample comparisons
Results from applying the equations were illustrated using
distribution histograms, separating TP faces from TN faces (Fig. 5a–c).
These normal histogram distribution curves of TP faces and TN
faces were superimposed to determine if it was possible to
distinguish between faces in the two groups. The amount of
overlap shows the possibility of achieving either a FP or FN face
match, also known as the rate of misclassification. The smaller the
area of overlap, the smaller the chances of obtaining a FP or FN face match.
In the graphs, TP face matches are represented by the dotted lines and the solid lines represent TN face matches. In order to ensure equal numbers of faces in the two samples, the 39 faces in Sample 2 that were not in Sample 1 were not included in this analysis and every face in Sample 1 was compared to every face in Sample 2. The mean absolute difference, the Euclidean distance and the Cosine θ distance equations (all distance measures Z-normalised) were applied in the between sample comparisons. Superimposed normal histogram distribution curves of TP and TN face matches were used to illustrate the discrimination between the two groups. In general, a slightly narrower distribution was seen for the TP faces. This was most likely because the distribution contained only TP matches and therefore the data should be centred on a smaller range of values. The amount of overlap between the TP and TN face matches correlated to the possibility of achieving either a FP or FN face match.

Table 3a
Mean, standard deviation, and standard deviation scaled difference between the Sample 1 and Sample 2 means of unnormalised: mean absolute distance (MAD), Euclidean distance and cosine distance for Sample 1 and Sample 2.
| Comparison method | Sample | Sample mean μ | Sample standard deviation (SD) | μ(Sample)/SD(Sample) | Difference between μ/SD of Sample 1 and Sample 2 | N, number of samples |
| MAD | 1 | 0.08354 | 0.02316 | 3.607 | 0.561 | 3160 |
| MAD | 2 | 0.08507 | 0.02041 | 4.168 | | 7021 |
| Euclidean distance | 1 | 1.015 | 0.4121 | 2.463 | 1.042 | 3160 |
| Euclidean distance | 2 | 1.149 | 0.3278 | 3.505 | | 7021 |
| Cos(θ) | 1 | 0.9859 | 0.01314 | 75.03 | 74.955 | 3160 |
| Cos(θ) | 2 | 0.9887 | 0.006592 | 149.985 | | 7021 |

Table 3b
Mean, standard deviation, and standard deviation scaled difference between the Sample 1 and Sample 2 means of Z-normalised: cosine distance for Sample 1 and Sample 2.
| Comparison method | Sample | Sample mean μ | Sample standard deviation (SD) | μ(Sample)/SD(Sample) | Difference between μ/SD of Sample 1 and Sample 2 | N, number of samples |
| Z-normalised Cos(θ) | 1 | 0.0113 | 0.2460 | 0.04594 | 0.01474 | 3160 |
| Z-normalised Cos(θ) | 2 | 0.0077 | 0.2468 | 0.0312 | | 7021 |

Fig. 4. (a–c) Summary of the conditions imposed and results achieved in the within sample comparisons of faces. Histograms and superimposed mean and standard deviation of unnormalised: mean absolute distance (a), Euclidean distance (b) and cosine distance (c) within sample comparisons for Sample 1 (dotted lower line) and Sample 2 (solid upper line).
Superimposing the normal curves to demonstrate the separation
between TP and TN face matches, the Cosine θ distance
(Z-normalised) equation produced the smallest amount of overlap
and, of the three equations applied, was determined to be the best
equation to test the discrimination between faces of two samples.
Examination of the superimposed curves showed approximately a
30% chance of the best match between compared faces corresponding
to a correct identification. The TP distribution is very
small and difficult to see on the graphs. However, it is still possible
to see the 0.7 threshold emerging for the Cosine θ distance with
careful observation of Fig. 5c.
Table 4 illustrates that following Z-normalisation, the cosine
distance provides the greatest separation of TP comparisons from
the TN comparisons, based on the difference between the TP and
TN standard deviation scaled means, respectively. The conclusion
made from this investigation was that the cosine distance equation
was the best predictor of face discrimination tested thus far and
was the sole equation used to test the error in landmark placement
and to determine the sample of images from the database that
could be narrowed down for further verification by an operator.
In pattern matching based on the cosine distance between two
unit vectors, the returned measure can be interpreted as a match
probability. Whilst a cosine distance of 1 indicates a 100%
probability of the compared vectors being the same, and 0
indicates zero probability, a distance of 0.5 indicates the 50%
chance level of correlation between compared vectors. The mean
of the TP distribution barely reaches this 50% level, although this of
course indicates that approximately half of the TP comparisons will
at least exceed a chance match value. A standard deviation of
2.37 about the TP distribution mean of 0.48 indicates that over
17.5% of the TP matches will exceed a 70% chance of producing a
best (closest) match for the database tested.
4.3. Error in landmark placement
A small inter-operator study was carried out to assess the
influence of landmark placement conducted by multiple operators.
It has been reported that landmark placement, tested on 3D images
in a clinical setting, reveals that average operator error can vary
widely [30]. Therefore the effect of landmark placement error is
important to test because, although landmark placement on images
in the two samples used in this study was conducted by a single
operator, this would not likely occur in practice.
Facial landmarks were placed on a total of six video images,
chosen at random, six times each by five different operators.
One operator had previous experience in using the equipment and
Table 4
Summary of the conditions imposed and results achieved in the between sample TP and TN face comparisons. Mean, standard deviation, and standard deviation scaled
difference between TP and TN comparisons for Z-normalised vectors: mean absolute distance, Euclidean distance and cosine distance TP and TN data sets.
Comparison method    Sample   Sample mean μ   Sample standard    μSample/SDSample   Difference between      N, number
(all Z-normalised)                            deviation (SD)                        μTP/SDTP and μTN/SDTN   of samples
MAD                  TP       0.7771          0.2234             3.479              0.815                   80
MAD                  TN       1.1053          0.2574             4.294                                      6320
Euclidean distance   TP       7.594           2.258              3.3632             0.9660                  80
Euclidean distance   TN       10.572          2.442              4.3292                                     6320
Cos(θ)               TP       0.4822          0.2035             2.370              2.3950                  80
Cos(θ)               TN       -0.0061         0.2389             -0.02553                                   6320
Fig. 5. (a–c) Summary of the conditions imposed and results achieved in the
between sample TP and TN face comparisons. Results are illustrated by the
superimposed normal histogram curves showing the amount of overlap in TP
(dotted lower line) and TN faces (solid upper line). Mean absolute distance (a),
Euclidean distance (b) and cosine distance (c).
knowledge of the landmarks; landmark locations were studied
using the definitions provided in [18] and [24]. The remaining
operators had no experience in using the equipment and no
previous knowledge of anthropometric landmarks. The inexperienced
operators were given a list of landmark definitions (Table 1)
adapted from the literature [18,24] as well as a single photocopy of
an enlarged male face (A4 sized), front facing, with previously
placed landmarks to use as a guide. The same equipment was used
by all operators and each operator conducted their landmark
placement of images in a single day. Using the Cosine θ distance
equation, comparisons of re-landmarked images were analysed
first from the single experienced operator and second, from all
operators (Fig. 6a and b).
The Cosine θ distance (Z-normalised) equation was used to
compare the re-landmarked images because, when applied in the
comparison of faces between samples, it was found to be the
equation in which the statistics of the TP and TN populations were
the most separated. Each face in the subset sample was compared
to every other face in the subset sample and the resulting data were
illustrated as superimposed normal histogram curves of TP and TN
face matches. It was hypothesised that conducting an inter-operator
test, using high resolution research material, but completed by
inexperienced operators, would produce a greater amount of variation
than from an experienced operator; this hypothesis was tested and
found to hold.
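The number of comparisons implied by this design can be checked by enumeration. The sketch below assumes that every landmarking (one face, one operator, one repetition) is paired once with every other landmarking in the same pool, and that a pair counts as TP when both landmarkings come from the same face. The function and parameter names are illustrative and not taken from the study. Under these assumptions the counts agree with the N values listed in Table 5.

```python
from itertools import combinations


def count_pairs(n_faces, n_repeats, n_operators):
    """Count same-face (TP) and different-face (TN) pairs of landmarkings."""
    # Each landmarking is labelled by (face, operator, repetition).
    landmarkings = [
        (face, operator, repeat)
        for face in range(n_faces)
        for operator in range(n_operators)
        for repeat in range(n_repeats)
    ]
    tp = tn = 0
    for first, second in combinations(landmarkings, 2):
        if first[0] == second[0]:
            tp += 1  # both landmarkings show the same face
        else:
            tn += 1  # landmarkings show different faces
    return tp, tn


print(count_pairs(n_faces=6, n_repeats=6, n_operators=1))  # (90, 540)
print(count_pairs(n_faces=6, n_repeats=6, n_operators=5))  # (2610, 13500)
```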
The effect that inexperienced operators had on the separation
rate of TP and TN faces was compared to that of an experienced
operator and is summarised in Fig. 6a and b and Table 5. Compared
to that of the experienced operator, the effect of landmark
placement by inexperienced operators can clearly be seen in the
separation rates of TP and TN face matches: the mean for TP
comparisons collapses from 0.8 for the experienced operator to
0.44 for the mix of experienced and inexperienced operators.
The TP vs. TN standard deviation scaled means separation is more
than double for the experienced operator compared to that of the
mix of operators. A similar observation can be made by inspecting
the superimposed histograms for the TP and TN comparisons for
the experienced operator vs. the mix of experienced and
inexperienced operators. Although a bimodal result is generated
by the experienced operator for both TP and TN, a much greater
separation of the TP–TN distributions is observed. However, for all
operators a typical averaged picture emerges and the effect of the
expert can be seen as a small additional bump at the top of the TP
distribution.
The most important point witnessed in Fig. 6a and b was the strong
effect that the experienced operator had in creating a larger
separation of TP and TN face matches. As multiple experienced
operators were not tested, it cannot be stated whether this
difference in separation rates between the experienced operator and
all operators was due to the experience of the operators or,
instead, to the effect that will naturally occur with multiple
operators. This could be tested by conducting a study using a pool
of experienced operators. A further study analysing the
distribution achieved from the re-landmarked images of each
operator after applying the Cosine θ distance (Z-normalised)
equation could determine if any of the inexperienced operators
also achieved the same strong separation rate as the experienced
operator.
An inexperienced operator producing a similar degree of
separation to the experienced operator would signify that the large
separation rate produced from all operators was caused by the
inclusion of multiple operators rather than their experience.
However, from the literature in [31], it can be predicted that the
spread from a single inexperienced operator would be larger than
that from an experienced operator.
4.4. Potential sample of photographs subject to manual verification
The amount of overlap between the distributions of TP and TN
faces illustrates an approximation of the misclassification rate (see
Fig. 5a–c). However, given the task of comparing a suspect's image
to a large database of identity photographs, the ability to decrease
the number of possible face matches could potentially save
significant numbers of investigation hours. This smaller sample of
suspect photographs could then be more closely scrutinised by an
expert. For this analysis, each face in Sample 1 was compared to
Fig. 6. (a and b) Superimposed normal curve histograms illustrating TP (dotted
lower line) and TN (solid upper line) face comparisons of the Cosine θ
(Z-normalised) distance equation in six re-landmarked images from Sample 1 using
one experienced operator (a) and multiple operators (b).
Table 5
Mean, standard deviation, and standard deviation scaled difference between TP and TN comparisons in landmark placement error study: mean absolute distance, Euclidean
distance and cosine distance for TP and TN data sets illustrating TP and TN face comparisons in six re-landmarked images from Sample 1 using one experienced operator and
multiple operators.
Comparison method: cosine distance                   Sample   Sample mean μ   Sample standard    μSample/SDSample   Difference between      N, number
(all Z-normalised)                                                            deviation (SD)                        μTP/SDTP and μTN/SDTN   of samples
One experienced operator                             TP       0.8029          0.1489             5.392              5.8224                  90
One experienced operator                             TN       -0.1666         0.3871             -0.4304                                    540
Multiple operators: experienced and inexperienced    TP       0.4409          0.2655             1.6606             2.01351                 2610
Multiple operators: experienced and inexperienced    TN       -0.08657        0.2453             -0.3529                                    13,500
each face in Sample 2 for a total of 80 comparisons. The Cosine θ
(Z-normalised) distance equation was used and the resulting values
were placed in descending order, noting the rank of the TP. The best
match was defined as the match value that returned a Cosine θ
value that was highest or closest to 1.0. This was used to determine,
within a confidence range and given a best match value, how many
additional faces in the database would need to be verified before
the true positive match was found.
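The ranking step described above can be sketched as follows. The identity labels and match values are hypothetical and serve only as an illustration, and the function names are assumptions rather than part of the study's software.

```python
def best_match(scores):
    """Return the identity with the highest Cosine θ value and that value."""
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]


def rank_of_true_positive(scores, true_id):
    """Rank (1 = best) of the true identity when values are sorted descending."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_id) + 1


# Hypothetical Cosine θ values of one probe face against four database faces.
scores = {"A": 0.41, "B": 0.66, "C": 0.58, "D": 0.12}

print(best_match(scores))                  # ('B', 0.66)
print(rank_of_true_positive(scores, "C"))  # 2
```

In this illustration the true identity "C" is ranked second, so one additional database face would have to be checked beyond the best match before the true positive is reached.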
Best match values were placed in intervals of 0.1. The mean
rank of the TP, SD, and 2SD confidence interval for each match
interval was found and results shown in Table 6. In this instance
the confidence interval indicates, within a given confidence
range, how many database images should be looked at in total.
Results in Table 6 indicate that given match values of 0.7 or above
the best match from the database is also likely to be the TP face.
This result is consistent with the observed degree of overlap
between the TP and TN distributions shown in Fig. 5. A match
threshold of 0.7 is not an unreasonably high value to set, given that
a distance of 0.5 indicates the 50% chance level of correlation
between compared vectors. A larger sample of images should be tested
to determine if results consistent with those presented here are
produced.
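The last column of Table 6 (mean + 2SD) can be reproduced from the tabulated mean and SD of the TP rank. The sketch below assumes that the result is rounded up to a whole number of images and that the SD is the sample standard deviation; both are assumptions, although they are consistent with the tabulated values. The function names are illustrative.

```python
import math
import statistics


def images_from_summary(mean_rank, sd_rank):
    """Number of images to check: mean TP rank plus two SDs, rounded up."""
    return math.ceil(mean_rank + 2 * sd_rank)


def images_from_ranks(tp_ranks):
    """Same quantity computed directly from a list of TP ranks."""
    mean_rank = statistics.mean(tp_ranks)
    # A single rank has no spread; use the sample SD otherwise (assumption).
    sd_rank = statistics.stdev(tp_ranks) if len(tp_ranks) > 1 else 0.0
    return images_from_summary(mean_rank, sd_rank)


print(images_from_summary(3.6, 7.6))    # 19, interval 0.60-0.69
print(images_from_summary(8.9, 13.5))   # 36, interval 0.50-0.59
print(images_from_summary(16.7, 28.5))  # 74, interval 0.40-0.49
print(images_from_ranks([1, 3]))        # 5, interval 0.30-0.39 (n = 2, min 1, max 3)
```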
5. Discussion
Using high resolution photographic research material, the
object of the study was to assess if a facial anthropometric feature
vector could be utilised to distinguish between individuals of a
similar age group, ancestry and sex. Given a database of subjects,
knowledge of the type of information gathered in this study may
help in future to narrow down the number of possible suspects in
an investigation. The technique presented here entailed analysing
vector comparisons to differentiate between images of two samples.
The feature vector was utilised in three types of equations
testing the differences between faces in the samples. Normalisation
was applied to the ratio values as a way to equalise the
feature vector values in each sample and account (to some degree)
for the statistics that different camera parameters would produce.
Z-normalisation enhances any differences between means and
makes the interpretation of the data more straightforward, allowing
small differences in the data to be more simply seen.
We found that the face matching technology investigated in
this study can assist in a database search; however, it does not
provide an unequivocal means of confirming facial identifications
suitable to use in court. Therefore, future work should concentrate
on the potential for this approach to extract facial
information, improving the search of databases and leaving
humans as the ultimate authenticator.
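The Z-normalisation of the ratio values mentioned above can be sketched as follows. It is assumed here that each ratio is normalised using the mean and sample standard deviation of that ratio across all faces in the same sample; the study's exact procedure is not restated in this section, and the function name and example values are hypothetical.

```python
import statistics


def z_normalise(sample):
    """Z-normalise each feature (column) across the faces (rows) of a sample."""
    columns = list(zip(*sample))
    means = [statistics.mean(column) for column in columns]
    # Sample SD per feature; a constant feature (SD of 0) would need special handling.
    sds = [statistics.stdev(column) for column in columns]
    return [
        [(value - mean) / sd for value, mean, sd in zip(face, means, sds)]
        for face in sample
    ]


# Hypothetical sample of three faces, each described by two facial ratios.
sample = [[0.50, 1.20], [0.60, 1.00], [0.70, 1.10]]
normalised = z_normalise(sample)
for face in normalised:
    print([round(value, 3) for value in face])
```

After normalisation each ratio has a mean of 0 and a standard deviation of 1 within the sample, so ratios measured on different scales contribute comparably to the distance measures.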
The first step to answering the objectives laid forth in the
introduction was to evaluate each sample of images to determine if,
once the equations were applied, any differences could be seen
between the two samples. Testing faces against those found in the
same sample is important because it allows the equations to
ascertain if there are any differences between faces which fall
under the same conditions. This means that, other than the
possibility of slight changes in facial expression, the facial ratios
will be the only changeable variable between faces as all other
variables remain constant: same media, same operator placing
landmarks and same facial pose.
Once samples were looked at individually, a between sample
comparison was conducted to determine how distinguishable the
faces were in the two samples. Once the respective equations were
applied, superimposed normal histogram distribution curves of
true positive faces and true negative faces were used to illustrate
the discrimination of the two groups. In general, a narrower
distribution was seen for the true positive faces. This was because,
as the distribution contained only true positive matches, the data
should be centred on a smaller range of values. The amount of
overlap correlated to the possibility of achieving either a false
positive or false negative face match.
Although other research used the squared Euclidean distance to
measure the likeness between pairs of faces [32], we found, by
superimposing the normal curves to demonstrate the separation
between true positive and true negative faces, that the Cosine θ
distance (Z-normalised) equation produced the least amount of
overlap between true positive faces and true negative faces when
statistics of the two samples were known. The match values of true
negative faces in the superimposed histogram normal curves begin
to trail off at 0.7, indicating that although it is still possible to
achieve a true negative identification above this value, it is likely
that a returned match score of below 0.7 will result in a true
negative face after closer examination. Although this result occurred
in this study, it may not be replicated with a larger test
database. The investigations undertaken in this study to determine
if it is possible to discriminate between individuals of two samples
using a multi-dimensional facial feature vector found that the
Cosine θ distance was the best discriminator, but this could be
further improved upon by administering a more comprehensive
statistical analysis.
A small inter-operator study was carried out to assess the
influence of landmark placement conducted by multiple operators.
This is important to test because, although landmark placement on
all images used in the comparative process of this study was
conducted by a single operator, this would not likely be the case
in the real world. Landmark placement has been tested by other
researchers on 3D images in a clinical setting and it was suggested
that average operator error varies widely [30]. Using a digital
sliding calliper to measure photographs, researchers carried out an
intra observer study to test reliability of measurements and results
showed a low reliability in measurements of ls-sto and n-sn [33].
The current analysis was conducted with one experienced operator
but the remaining operators were inexperienced. It would be
beneficial to analyse this data further in an inter-operator study
using experienced operators located in different geographical regions
because this scenario would be more likely as a police procedure.
Experience was shown to be a benefiting factor when the
inter-operator variation in taking standard skeletal measurements was
Table 6
Interval showing two standard deviations of how many images in the database should be manually investigated.
Interval of best   n (number of best          Mean of   SD of     Min of    Max of    Number of images to manually
match values       matches in the interval)   TP rank   TP rank   TP rank   TP rank   investigate in database (mean + 2SD)
0.90–0.99          0                          N/A       N/A       N/A       N/A       N/A
0.80–0.89          2                          1         0         1         1         1
0.70–0.79          7                          1         0         1         1         1
0.60–0.69          23                         3.6       7.6       1         37        19
0.50–0.59          30                         8.9       13.5      1         65        36
0.40–0.49          16                         16.7      28.5      1         110       74
0.30–0.39          2                          2         1.4       1         3         5
[29] I.L. Dryden, K.V. Mardia, Statistical Shape Analysis, Wiley-Blackwell, West Sussex, 1998.
[30] A. Ayoub, et al., Validation of a vision-based, three-dimensional facial imaging system, Cleft Palate: Cran. J. 40 (2003) 523–529.
[31] B.J. Adams, J.E. Byrd, Interobserver variation of selected postcranial skeletal measurements, J. Forensic Sci. 47 (2002) 1193–1202.
[32] J.P. Davis, T. Valentine, R.E. Davis, Computer assisted photo-anthropometric analyses of full-face and profile facial images, Forensic Sci. Int. 200 (2010) 165–176.
[33] M. Roelofse, et al., Photo identification: facial metrical and morphological features in South African males, Forensic Sci. Int. 177 (2008) 168–175.
[34] B. Murphy, R.D. Morrison, Introduction to Environmental Forensics, Academic Press, 2007.
[35] D. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall/CRC, 2007.
[36] T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (2006) 861–874.
[37] D. DeCarlo, et al., An anthropometric face model using variational techniques, in: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, 1998, pp. 67–74.
[38] C. Zhang, S.F. Cohen, 3-D face structure extraction and recognition from images using 3-D morphing and distance mapping, IEEE Trans. Image Proc. 11 (2002) 1249–1259.
[39] M.I.M. Goos, et al., 2D/3D image (facial) comparison using camera matching, Forensic Sci. Int. 163 (2006) 10–17.
[40] J. Lee, et al., Efficient height measurement method of surveillance camera image, Forensic Sci. Int. 177 (2008) 17–23.
[41] H.C. Longuet-Higgins, A computer algorithm for reconstructing a scene from two projections, Nature 293 (1981) 133–135.