2012_A study of quantitative comparisons of photographs and video images.pdf

Embed Size (px)

Citation preview

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    1/11

    A study of quantitative comparisons of photographs and video images based on

    landmark

    derived

    feature

    vectors

    Krista F. Kleinberg a,*, J. Paul Siebert b

    a Forensic Medicine and Science, Joseph Black Building, University of Glasgow, Glasgow G12 8QQ, UKbDepartment of Computing Science, Sir Alwyn Williams Building, University of Glasgow, Glasgow G12 8QQ, UK

    1. Introduction

    As a result of the wide deployment of surveillance cameras,

    there is both opportunity and motivation, given the amount of

    visual material being collected digitally, to identify suspects from

    CCTV. Although rapidly improving in terms of spatial resolution,

    the majority of video surveillance equipment does not produce

    images of sufficient quality needed to provide identificationswhen

    othermore conclusive evidence, such asDNA or fingerprints, is not

    available. It is in these kinds of cases that anthropometrymay havethe potential to provide a useful identification technique.

    Surveillance video can be important supportive evidence because

    it may show a crime being committed, although, it is not always

    easy to recognise,and therefore convict,a criminal caughtonCCTV.

    Video surveillance can bemore reliable than eyewitness testimony

    because the story told is always consistent and also corroborates

    what the eyewitness reported [1]. However, a more comprehen-

    sive analysis is necessary because even when facial video images

    are of sufficient quality, it is possible that two people may look

    similar to each other in this medium.

    The roles of anthropometry and forensic science have inter-

    twined beginning with Bertillon in the 1800s [2,3] and anthro-

    pometry was one of the identification methods used in [46].

    Although more sophisticated vision based methods of image

    comparison are being developed [7,8], it remains to be seen whatcanbe achievedbyutilizing ratiosbetweenkey facial landmarks on

    single 2D images. Even if reliable automatic methods for face

    image comparison can be developed, the need for manual

    intervention in terms of landmark placement are likely to be

    required where low-quality images have to be analysed, such as

    generatedbymany currently installedCCTV systems. In contrast to

    comparing two images, anthropometric proportions from the face

    and body of live suspects were compared against 2D images and

    was one of the identification methods resulting in convictions in

    two out of three cases in Halbersteins 2001 paper [9]. One of the

    fundamental problems with comparing 2D images is facial pose.

    Forensic Science International 219 (2012) 248258

    A R T I C L E I N F O

    Article history:

    Received 6 July 2011Received in revised form 22 November 2011

    Accepted 4 January 2012

    Available online 24 January 2012

    Keywords:

    Facial identification

    Anthropometry

    Image comparison

    Face database

    A B S T R A C T

    An abundunce of surveillance cameras highlights the necessity of identifying individuals recorded.

    Images captured are often unintelligible and are unable to provide irrefutable identifications by sight,

    and therefore a more systematic method for identification is required to address this problem. An

    existing database of video and photograhic imageswas examined, which hadpreviously been used in a

    psychological research project; material consisted of 80 video (Sample 1) and 119 photograhic (Sample

    2) images, though taken with different cameras. A set of 38 anthropometric landmarks were placed by

    hand capturing 59 ratios of inter-landmark distances to conduct within sample and between sample

    comparisons using normalised correlation calculations; mean absolute value between ratios, Euclidean

    distance and Cosine u distance between ratios. The statistics of the two samples were examined to

    determine which calculation best ascertained if there were any detectable correlation differences

    between faces that fall under the same conditions. A comparison of each face in Sample 1 was then

    compared against thedatabase of faces in Sample 2. Wepresent pilot results showing that theCosineu

    distance equation usingZ-normalisedvaluesachieved the largest separation between True Positive and

    True Negative faces.Having applied theCosineu distance equationwewere then able to determine that

    if a match value returned is greater than 0.7, it is likely that the best match will be a True Positive

    allowing a decrease of database images to be verified by a human. However, a much larger sample of

    images requires to be tested to verify these outcomes. 2012 Elsevier Ireland Ltd. All rights reserved.

    * Corresponding author. Present address: PEACH Unit, University of Glasgow,

    Queen Mothers Hospital, 8th Floor Tower Block, Dalnair Street, Glasgow G3 8SJ,

    UK. Tel.: +44 141 201 1988; fax: +44 141 201 6943.

    E-mail addresses: [email protected] , [email protected]

    (K.F. Kleinberg), [email protected] (J.P. Siebert).

    Contents

    lists

    available

    at

    SciVerse

    ScienceDirect

    Forensic Science International

    journal homepage : www.elsev ier .co m/locate / fo rsc i in t

    0379-0738/$ see front matter 2012 Elsevier Ireland Ltd. All rights reserved.

    doi:10.1016/j.forsciint.2012.01.014

    http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014mailto:[email protected]:[email protected]:[email protected]:[email protected]://www.sciencedirect.com/science/journal/03790738http://www.sciencedirect.com/science/journal/03790738http://www.sciencedirect.com/science/journal/03790738http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://dx.doi.org/10.1016/j.forsciint.2012.01.014http://www.sciencedirect.com/science/journal/03790738mailto:[email protected]:[email protected]:[email protected]://dx.doi.org/10.1016/j.forsciint.2012.01.014
  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    2/11

    Attempts to rectify this in facial recognition pose invariant

    systems described in [10,11] reported greater recognition rates

    than when used without the pose transformations. Using soft

    biometric traits was shown to be beneficial in improving

    recognition accuracy when combined with a commercial based

    face matching program [12].

    Three questions should be asked of a comparison method; is it

    possible to carry out the comparison objectively, is it possible to

    avoid manual input, and is it applicable to checking large

    databases? An identification made based on 2D images will be

    more decisive if there is a way to quantify the comparison, rather

    than if the identification is based solely on a subjective analysis, as

    the result is a comparison that is objectivewithminimalbias.Once

    quantification of a comparison is achieved, the process should be

    automated. An automated process would decrease the error from

    involving many different operators in the comparison process and

    would allow largedatabases to be checkedquickly.As a face search

    could potentially be extended to full populations by reviewing

    internationalised databases, i.e. Interpol [13], the need for

    automation is high. According to The Ministry of Justice Statistics

    (UK) bulletin, the reoffending rate for criminals in England and

    Wales in 2006was 146.1 offences per 100 offenders [14].Although

    this is a decrease of 22.9% from 2000, the numbers indicate there is

    justification for a database of convicted criminal images that couldbe quickly automated and checked.

    We document an investigation into the comparison of

    anthropometric ratios of facial landmark pairs manually located

    on 2D images. The constraints in this study are that we consider

    best-case scenario situations as a bench mark given that scenarios

    in thefield,bydefinition, cannotbe asbenign. The subjectmatter is

    based on the analysis of comparing high quality full-face frontal

    video and photographic images of individuals of a similar ethnic

    background with neutral expressions.

    This investigation expanded previous research carried out by

    Kleinberg, Vanezis and Burton [15] and was conducted to test the

    hypothesis: Using a comparison of anthropometric facial ratios, it

    is

    possible

    to

    discriminate

    between

    individuals

    of

    two

    samples.

    The objective of this study was to derive measurements betweenspecific landmarks on the face in both print and video media and

    incorporate them into a feature vector to use in statistical analysis

    to

    determine

    if

    identifications

    of

    an

    individual

    can

    be

    made

    based

    on

    these

    measurements.

    Knowledge

    of

    the

    type

    of

    information

    gathered in this studymay help in future to rankpotential suspects

    for human identification verification. However, in order to

    establish

    that

    two

    faces

    were

    the

    same

    and

    use

    this

    identification

    method

    to

    identify

    positively

    rather

    than

    eliminate

    suspects,

    it

    would be necessary to show that the probability of a false match in

    the rest of the population at random was of an acceptably low

    probability

    [16].

    To

    investigate

    the

    hypothesis

    in

    this

    study,

    we

    seek

    to

    address

    the following questions:

    Of the proposed images, can similar faces be separated from

    dissimilar faces within a single sample using vector compar-

    isons?

    How distinguishable are individual faces in the samples? Is it

    possible to distinguish true positive faces from true negative

    faces using vector comparisons where the statistics from two

    samples are known?

    Using

    a

    small sample

    of

    re-landmarked images,

    how signifi-

    cant is the error contribution in re-landmarked images and

    what is the operator induced measurement spread under ideal

    conditions?

    Given

    a

    specific

    example

    and

    set

    of

    comparisons

    with

    the

    database, what constitutes a manageable subsample, worthy of

    further

    manual

    verification?

    2. Materials

    A total of 199 images of Caucasian male police volunteers were available which

    hadbeenused previously in research conducted byBruce et al. [17]. The199 images

    comprised 80 different video still faces (Sample 1) and 119 different photographic

    faces (Sample 2). According to Bruce et al. [17], The image quality on the videos

    was high-equivalent to what would be produced by a good amateur photographer

    trying to reveal a good likeness of someone onahome videotape. Thephotographic

    images in Sample 2 included the same 80 faces depicted in the video cohort, and an

    additional 39 new faces not included as video stills. The photographs were of

    policemen, both retired and presently working and except for photographs, which

    have already been published elsewhere are, for this reason, unable to be exhibited

    in this paper. However, an example of each type of image is provided in Fig. 1. Both

    sets of images, taken on the same day, were displayed from the frontal viewpoint,

    showing features from the neck up, in what appeared to be the format of police

    identification photographs. In this study the identity of the subjects in the video

    images was known and could be cross referenced with the corresponding

    photographic images. This means that identifications made on the basis of facial

    anthropometry could be designated as true or false. One positive feature of these

    video images was that because they were recorded on the same day as the

    photographs, the study images didnothave anyof thepossible facial changeswhich

    can occurdue to time factors such asweight loss/gain, increase in age orpresence of

    facial hair.

    3. Methodology

    Given a set of landmarks there is a need to be able to quantify the landmarks

    numerically such that they can be used to compare faces. Ideally, the measure

    should

    be

    invariant

    to

    in-plane

    translations

    and

    rotations

    and

    be

    tolerant

    to

    adegree of out-of-plane rotation in order to accommodate the variability inherent

    when posing a subject for full frontal image capture. Thirty-eight landmarks (Table

    1), ten unilateral and 14 bilateral, were chosen for inclusion in the anthropometric

    study and are shown in Fig. 2. Careful consideration was given to the selection of

    landmarks that were used is this study. Anthropometric research by Farkas [18],

    Purkait [19], Fieller [20], Evison [21], and facial recognition research by Craw et al.

    [22] and Okada et al. [23] were consulted when choosing the landmarks that were

    included in the present study. When choosing a landmark it was important that it

    was one that could be placed consistently. It had to be a point where an operator

    performing the comparison would be able to locate it in the same place within an

    acceptable error. According to Fieller [20], the criteria used to determine a

    successful/reliable landmark are: observer knowledge, consistency of landmark

    Fig.

    1.

    High

    resolution

    video

    image

    (a)

    and

    selection

    of

    ten

    database

    photographs

    (b).

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258 249

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    3/11

    placement, discriminatory power, and landmark visible in majority of cases.

    Excluded landmarkswere eliminated on the basis of their inability to be located on

    photographs.

    Although thenumber ofpossible linearmeasurements increases combinatorially

    with the number of landmarks, not all are reliable or pertinent to the research

    undertaken

    for

    this

    study.

    A

    total

    of

    73

    linear

    measurements

    (21

    unilateral,

    26bilateral) were chosen for this study. The majority of these were chosen by

    consulting the literature [18,22,24]. Two of these measurements used in a previous

    study [25], ex-n and ex-sto, were chosen because they utilise landmarks that were

    considered tobe less affected by facial expression than others and also because they

    would be visible even if the subject was wearing a hat. Three bilateral

    measurements were unique to the present study.

    From these landmarks and linear measurements, a total of 59 ratios (also

    unilateral and bilateral) were selected for comparison of images (Table 2). The

    linear measurements that make up the ratios are shown in Fig. 3. A ratio wasderived by dividing the smaller linear measurement (numerator) by the larger

    linear measurement (denominator). The ratios were chosen to achieve a balance of

    the horizontal and vertical regions of the face. Intuitively, it is expected that longer

    lines between landmarks located on different sections of the face would make a

    more reliable proportion than two short lines in the same section of the face. This is

    because small variations in landmarkplacement makingup short lineswould result

    in large changes in proportions, which may not accurately portray true variations

    between individuals. The ratios utilised in this research were deliberately chosen to

    include linear measurements between landmarks in different sections of the face

    and others that covered a small section of the face, such as the length vs. the width

    of the eye. As it is more common to use absolute measurements in anthropometric

    comparisons [18,19,26,27] rather than ratios, there was less guidance with respect

    to which ratios would be more reliable or more relevant than others in the present

    study. Halberstein used a combination of up to twelve face and body ratios when

    comparing a photograph to a live subject, and three of these ratios were used [9].

    These ratioswere ear length/facial height (sa-sba/n-gn), nasal height/ear length (n-

    sn/sa-sba)

    and

    nasal

    width/nasal

    height

    (al-al/n-sn).

    The

    remainder

    of

    the

    ratiosthat were used by Halberstein were not incorporated into this research because

    they either included facial landmarks thatwere not chosen for the present study or

    Table 1

    Landmarks and their definitions used in this study [18,24].

    1. Glabella (g): the most prominent midline point between the eyebrows.

    2. Nasion (n): the point in the midline of both the nasal root and the

    nasofrontal suture. This point is always above the line that connects the

    two inner canthi. A canthus is the angle at either end of the fissure

    between the eyelids.

    3. Exocanthion (ex): the point at the outer commissure of the eye fissure. A

    commissure is the site of union of corresponding parts and a fissure is any

    cleft or groove, in this case of the eye [bilateral].

    4. Endocanthion

    (en):

    the

    point

    at

    the

    inner

    commissure

    of

    the

    eye

    fissure[bilateral].

    5. Palpebrale superius (ps): highest point in the midportion of the free

    margin of each upper eyelid. The free margin portion of the eyelid is the

    unattached edge [bilateral].

    6. Palpebrale inferius (pi): the lowest point in the midportion of the free

    margin of each lower eyelid [bilateral].

    7. Orbitale (or): the lowest point on the margin of the orbit. The orbit is the

    bony cavity that contains the eyeball [bilateral].

    8. Superaurle (sa): the highest point of the free margin of the auricle. The

    auricle is the portion of the external ear that is not contained within the

    head [bilateral].

    9. Subaurale (sba): the lowest point on the free margin of the ear lobe

    [bilateral].

    10. Postaurale (pa): the most posterior point on the free margin of the ear

    helix. The helix refers to the coiled structure of the ear. [bilateral].

    11. Otobasion inferius (obi): the lowest point of attachment of the external

    ear to the head [bilateral].

    12. Alare (al): the most lateral point on each nostril contour [bilateral].

    13. Subnasale (sn): the midpoint of the angle at the columella (fleshy, lower

    margin) base where the lower border of the nasal septum and the surface

    of the upper lip meet.

    14. Pronasale (prn): the most protruded point of the nasal tip.

    15. Subalare (sbal): the point on the lower margin of the base of the nasal

    ala where the ala disappears into the upper lip skin [bilateral].

    16. Stomion (sto): the imaginary point at the crossing of the vertical facial

    midline and the horizontal labial (lip) fissure between gently closed lips,

    with teeth shut in the natural position.

    17. Crista philtri landmark (cph): the point on the elevated margin of the

    philtrum just above the vermilion line. The philtrum is the vertical

    groove in the median portion of the upper lip and vermilion refers to the

    exposed

    red portion of the upper or lower lip [bilateral].

    18. Cheilion (ch): the point located at each labial commissure [bilateral].

    19. Labiale inferius (li): the midpoint of the vermilion border of the lower lip.

    20. Labiale superius (ls): the midpoint of the vermilion border of the upperlip.

    21. Gonion (go): the most lateral point at the angle of the mandible. The

    mandible is the bone of the lower jaw [bilateral].

    22. Sublabiale (sl): determines the lower border of the lower lip or the upper

    border of the chin.

    23. Pogonion (pg): the most anterior midpoint of the chin.

    24. Gnathion (gn): the lowest point in the midline on the lower border of the

    chin.

    Fig. 2. Facial landmarks and their location.

    Table 2

    Ratios used in this study.

    go-go/n-gn sn-sto/sto-sl sn-gn/n-sto li-sl/sn-ls

    n-prn/g-pg sbal-sn/sn-prn [bilateral] gn-go/n-gn [bilateral] sl-gn/sto-gn

    al-al/ex-ex ex-go/go-go [bilateral] al-al/n-sn n-sn/n-sto

    sa-sba/n-gn [bilateral] n-gn/n-sto n-sn/sa-sba [bilateral] en-al/ex-ch [bilateral]

    ex-ex/go-go obi-ch/g-sa [bilateral] ex-n/ex-sto [bilateral] sbal-ls/n-al [bilateral]

    ex-n/n-sto [bilateral] pi-al/sa-ex [bilateral] ex-sto/n-sto [bilateral] ex-obi/ex-ch [bilateral]

    en-ex/ps-pi [bilateral] ex-al/ch-gn [bilateral] en-en/ex-ex ch-ls/n-prn [bilateral]

    pi-or/en-ex [bilateral] al-ls/ch-gn [bilateral] sa-sba/pa-obi [bilateral] ch-li/ex-ch [bilateral]

    cph-cph/sn-ls ex-sto/rt ex-lt ch [bilateral] ls-sto/ch-ch sn-gn/ex-gn [bilateral]

    sto-li/ch-ch

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258250

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    4/11

    because they were body ratios, such as shoulder width, leg or shoe lengths. Two

    ratios (n-sn/n-sto, n-gn/n-sto) were used by Catterick for his research [28]. The

    remainder of the ratios chosenwereunique to this study. Inorder to continuewith a

    best case scenario situation, one volunteer, with previous experience in placing

    landmarks on 2D images, placed the 38 landmarks on all 199 images using the

    measurement programme produced in-house, Facial Identification Centre Version0.32Forensic Medicine and Science Glasgow University.

    The group of 59 ratios is treated as a 59 dimensional vector and this has been

    evaluated as a means of comparing all faces. In this study, the feature vector is the

    series of 59 ratios derived from chosen linear measurements between facial

    landmarks. The alternative for comparing ratios between landmarks is to compare

    the raw distances between the landmarks. Comparing raw distances can be

    accomplished using the Procrustes [29] alignment techniques, and although

    outside of the scope of the current project, may be used in future studies. The

    advantage of using ratios is that they are both scale and rotation invariant and also

    to a slight degree auto-corrective (in terms of errors added during landmarking). In

    addition, ratiosexhibit adegree of invariance to the effects of out-of-plane rotations

    for small angles (when the effects of such rotations are sufficiently small to

    approximate a 2D affine transformation on the imaging plane).

    Three equations were used to test the comparison of a feature vector from one

    sample against another; mean absolute difference, Euclidean distance and Cosine u

    distance. The first two equations compare the length of the difference vector and

    the

    third

    equation

    compares

    the

    angle

    between

    the

    vectors.

    The

    three

    equations

    areas follows:

    3.1. The mean absolute difference between ratio vectors

    Eq. (1) determines the distance that separates one face from another by taking

    the absolute value of one face ratio vector subtracted from the same ratio of a

    second face. This is carried out for each ratio element in the feature vector. The

    summation of this feature vector is then divided by the total number of elements

    (59 ratios in this case). A difference of 0 between two faces establishes that those

    two faces have identical facial ratio vectors. The smaller the difference in facial

    ratios is indicative of a smaller difference between faces. A disadvantage of using

    this equation is that the maximum difference between faces is not bounded:

    Meanabsdiff

    XnNn1

    F1n F2nj j

    N (1)

    3.2. The Euclidean distance between ratios

    The Euclidean distance (Eq. (2)) also measures the distance between twomulti-

    dimensional vectors. This is the square root of the sum of the squares of the

    elements, in this case ratios. A difference of 0 between two faces establishes that

    those two faces have identical facial ratio vectors. The smaller the difference in

    facial ratios is indicative of a smaller difference between faces. A disadvantage of

    using this equation is that the maximum difference between faces is not bounded:

    Euclideandistance ffiffiffiffiffiffiffiffiffiffiffiffi ffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffiffiffiffiffiffiffiffiffiffiffiffiffi ffiffiffiXnNn1

    F1n F2n 2vuut

    (2)

    3.3. The Cosine u distance

    The Cosine u distance equation (Eq. (3)) is a similarity measurement and is used

    to measure the angle between two vectors. A cosine difference of 1.0 between two

    faces establishes that those two faces have identical vectors of facial ratios. An

    advantage of using this equation is that the range of values is bound from 1.0 to

    +1.0 and useful comparisons are ranged from zero to one. A difference of zero is

    indicative of a face that shows no correlation whereas a result of 0.5 is achieved by

    random chance. Anynegative result shows the face comparison produces an inverse

    correlation:

    Cosu

    XnNn1

    F1n F2n

    F1kk F2k

    k

    (3)

    A comparison between two faces was deemed a true positive match (TP) if the

    match was a correct match between the video image and photograph of the same

    subject. A true negative match (TN)wasone that excludes the faces andwhichwasa

    correct exclusion because it involved a video image and a photograph of two

    different subjects.A falsepositivematch (FP)wasonewhich wasan incorrect match

    between a video image and a photograph of two different subjects and a false

    negative match (FN) was one which is excluded but which was an incorrect

    exclusion because it involved the video image and photograph of the same subject.

    To answer the questions laid forth in Section 1, the three equations were applied

    in the following four scenarios to test the comparison of Sample 1 faces to Sample 2

    faces; within sample comparisons, between sample comparisons, error in landmark

    placement, and the potential sample of photographs subject tomanual verification.

    4. Results

    4.1. Within sample comparisons

    To test if similar faces were separable from dissimilar faces

    within

    a

    single

    sample

    the

    equations

    were

    applied

    so

    that

    every face

    in

    a

    single

    sample

    was

    compared

    to

    itselfand

    everyother

    face

    within

    this sample. Each sample contained only one image of each face and

    for this reason allthatcould bedetermined was the true negativity of

    this

    collection

    of

    different faces.

    Therefore,

    no

    estimate

    of

    the

    degree

    to

    which

    two

    same

    faces

    (true

    positives)

    would

    match

    when

    captured at different times could be made from this data. The same

    tests were carried out on Sample 1 (video) and then separately on

    Sample

    2

    (photographs).

    Testing all combinations

    of

    pairs

    of

    faces

    within

    each

    sample

    was

    important

    because

    it

    compared

    faces

    acquired under the same capture conditions, allowing the tests to

    ascertain

    if

    it

    were

    possible

    to

    discriminate

    between

    different

    (truenegative)

    faces.

    Therefore,

    in

    this

    experiment

    the

    primary

    source

    of

    variability

    between

    faces

    should

    be

    attributable

    to

    differences

    in

    the

    measured facial landmark ratios, i.e. generated by genuine face

    shape differences, whilst the statistics of the remaining sources of

    variability

    remain

    constant;

    same

    media, same

    operator

    placing

    landmarks

    and

    same

    facial

    pose.

    The similarity or dissimilarity between the faces in a single

    sample is cross-checked by comparing the distributions of the

    similarity

    statistics

    of

    Sample

    1

    to

    those

    of

    Sample

    2.

    A

    Sample

    1

    to

    Sample

    2

    cross-comparison

    of

    the

    statistics

    produced

    by

    matching

    faces within their own samples can be used to in future to predict

    how discriminable faces are when making comparisons between

    these

    two

    samples.

    If

    both

    samples

    exhibit

    similar

    statistics,

    this

    would

    be

    indicative

    that

    it

    is

    possible

    to

    distinguish

    faces

    between

    Fig. 3. Linear measurements that created the ratios utilised in this study.

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258 251

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    5/11

    samples because any difference between faces would be a result of

    the true difference in faces rather than a result of the different

    media recording each of the two samples of images. Results are

    summarised in Fig. 4ac and are illustrated by superimposing the

    normal distribution curves and similarity density histograms of

    the two samples. In Table 3a the standard deviation scaled

    differencebetween the Sample1 and Sample2means indicates the

    difference in statistics between media, the cosine distance by far

    exhibiting the greatest difference.

    In order to equalise the absolute ranges of the feature vector

    values in each sample and address the observed difference in the

    statistics of the comparisons of Sample 1 and Sample 2, the

    equations were completed using the application of Z-normalised

    ratio values, illustrated in Fig. 4ac. Each element, F(n) in the

    measurement vector F is expressed by a population of measure-

    ment ratioswithin a sample, Z-normalisation potentially enhances

    range of variation (and accentuates anydifferences) about the ratio

    mean of this sub-population of ratios, allowing small differences in

    the data to become more apparent. This was accomplished by

    dividing the mean subtracted element by the sample standard

    deviation for the particular ratio (Eq. (4)). Z-normalisation can be

    applied to all three of the equations:

    Z-normalizedelement FZn Fn mFn

    sFn(4)

    Therefore, by applying Z-normalisation it becomes possible to

    force the distributions into a standardised range. Taking the Z-

    normalised cosine distance as an example, the result of this

    process, driving the means and standard deviations together and

    reducing difference between the variance-scaled means is

    illustrated in Table 3b.

    4.2. Between sample comparisons

    Results from conducting the equations were illustrated using

    distribution histograms, separating TP faces from TN faces, Fig. 5a

    c. These normal histogram distribution curves of TP faces and TNfaces were superimposed to determine if it was possible to

    distinguish between faces in the two groups. The amount of

    overlap shows the possibility of achieving either a FP or FN face

    match, also known as the rate of misclassification. The smaller the

    area, the smaller the chances of obtaining a FP or FN face match.

    In the graphs, TP face matches are represented by the dotted

    lines and the solid lines represent TN face matches. In order to

    ensure equal numbers of faces in the two samples, the 39 faces in

    Sample 2 that were not in Sample 1 were not included in this

    Table 3b

    Mean, standard deviation, and standard deviation scaled difference between the Sample 1 and Sample 2 means of Z-normalised: cosine distance for Sample 1 and Sample 2.

    Comparison method Sample Sample mean m Sample standard deviation (SD) mSample

    SDSample

    mSample 1SDSample 1

    mSample 2SDSample 2

    N, number of samples

    Z-normalised Cos(u) 1 0.0113 0.2460 0.04594 0.01474 3160

    Z-normalised Cos(u) 2 0.0077 0.2468 0.0312 7021

    Table 3a

    Mean, standard deviation, and standard deviation scaled difference between the Sample 1 and Sample 2 means of unnormalised: mean absolute distance (MAD), Euclidean

    distance and cosine distance for Sample 1 and Sample 2.

    Comparison

    method

    Sample

    Sample

    mean

    m

    Sample

    standard

    deviation

    (SD)

    mSample

    SDSample

    mSample 1

    SDSample 1

    m Sample 2

    SDSample 2

    N,

    number

    of

    samples

    MAD 1 0.08354 0.02316 3.607 0.561 3160

    MAD 2 0.08507 0.02041 4.168 7021

    Euclidean distance 1 1.015 0.4121 2.463 1.042 3160

    Euclidean distance 2 1.149 0.3278 3.505 7021

    Cos(u) 1 0.9859 0.01314 75.03 74.955 3160

    Cos(u) 2 0.9887 0.006592 149.985 7021

    Fig. 4. (ac) Summary of the conditions imposed and results achieved in the within

    sample comparisons of faces. Histograms and superimposed mean and standard

    deviation of unnormalised: mean absolute distance (a), Euclidean distance (b) and

    cosine distance (c)within sample comparisons for Sample 1 (dotted lower line) and

    Sample 2 (solid upper line).

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258252

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    6/11

    analysis and every face in Sample 1 was compared to every face in

    Sample

    2.

    The

    mean

    absolute

    difference,

    the

    Euclidean

    distance

    and

    the

    Cosine udistance equations (all distance measures Z-normalised)

    were

    applied

    in

    the

    between

    sample

    comparisons.

    Superimposednormal

    histogram

    distribution

    curves

    of

    TP

    and

    TN

    face

    matches

    were used to illustrate the discrimination between the two groups.

    In general, a slightly narrower distribution was seen for the TP

    faces. This was most likely because the distribution contained only

    TP matches and therefore the data should be centred on a smaller

    range of values. The amount of overlap between the TP and TN face

    matches correlated to the possibility of achieving either a FP or FN

    face match.

    Superimposing the normal curves to demonstrate the separa-

    tion between TP and TN face matches, the Cosine u distance (Z-

    normalised) equation produced the smallest amount of overlap

    and of the three equations conducted was determined to be best

    equation to test the discrimination between faces of two samples.

    Examination of the superimposed curves showed approximately a

    30% chance of the best match between compared faces corre-

    sponding to a correct identification. The TP distribution is very

    small and difficult to see on the graphs. However, it is still possible

    to see the 0.7 threshold emerging for the Cosine udistance with

    careful observation of Fig. 5c.

    Table 4 illustrates that following Z-normalisation, the cosine

    distance provides the greatest separation of TP comparisons from

    the TN comparisons, based on the difference between the TP and

    TN standard deviation scaled means, respectively. The conclusion

    made from this investigationwas that the cosinedistance equation

    was the best predictor of face discrimination tested thus far andwas the sole equation used to test the error in landmark placement

    and to determine the sample of images from the database that

    could be narrowed down for further verification by an operator.

    In pattern matching based on the cosine distance between two

    unit vectors, the returned measure can be interpreted as a match

    probability. Whilst a cosine distance of 1 indicates a 100%

    probability of the compared vectors being the same, and 0

    indicates zero probability, a distance of 0.5 indicates the 50%

    chance level of correlation between compared vectors. Themean

    of the TP distribution barely reaches this 50% level, although this of

    course indicates thatapproximatelyhalf of theTP comparisonswill

    at least exceed a chance match value. A standard deviation of

    2.37

    about

    the

    TP

    distribution

    mean

    of

    0.48

    indicates

    that

    over

    17.5% of the TP matches will exceed a 70% chance of producing a bestclosest match for the database tested.

    4.3.

    Error

    in

    landmark

    placement

    A small inter-operator study was carried out, to assess the

    influence of landmark placement conducted bymultiple operators.

    It

    has

    been

    reported

    that

    landmark

    placement,

    tested

    on

    3D

    images

    in

    a

    clinical

    setting,

    reveals

    that

    average

    operator

    error

    can

    vary

    widely [30]. Therefore the effect of landmark placement error is

    important to testbecause although landmarkplacement on images

    in

    the

    two

    samples

    used

    in

    this

    study

    was

    conducted

    by

    a

    single

    operator,

    this

    would

    not

    likely

    occur

    in

    practice.

    Facial landmarks were placed on a total of six video images,

    chosen

    at

    random,

    six

    times

    each

    by

    five

    different

    operators.

    Oneoperator

    had

    previous

    experience

    in

    using

    the

    equipment

    and

    Table 4

    Summary of the conditions imposed and results achieved in the between sample TP and TN face comparisons. Mean, standard deviation, and standard deviation scaled

    difference between TP and TN comparisons for Z-normalised vectors: mean absolute distance, Euclidean distance and cosine distance TP and TN data sets.

    Comparison method

    (all Z-normalised)

    Sample Sample mean m Sample standard

    deviation (SD)

    mSampleSDSample

    mTPSDTP

    mTNSDTN

    N, number

    of samples

    MAD TP 0.7771 0.2234 3.479 0.815 80

    MAD TN 1.1053 0.2574 4.294 6320

    Euclidean distance TP 7.594 2.258 3.3632 0.9660 80

    Euclidean distance TN 10.572 2.442 4.3292 6320

    Cos(u) TP 0.4822 0.2035 2.370 2.3950 80

    Cos(u) TN 0.0061 0.2389 0.02553 6320

    Fig. 5. (ac) Summary of the conditions imposed and results achieved in the

    between sample TP and TN face comparisons. Results are illustrated by the

    superimposed normal histogram curves showing the amount of overlap in TP

    (dotted lower line) and TN faces (solid upper line). Mean absolute distance (a),

    Euclidean distance (b) and cosine distance (c).

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258 253

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    7/11

    knowledge of the landmarks; landmark locations were studied

    using the definitions provided in [18] and [24]. The remaining

    operators had no experience in using the equipment and no

    previous knowledge of anthropometric landmarks. The inexperi-

    enced operators were given a list of landmark definitions (Table 1)

    adapted from the literature [18,24] aswell as a single photocopy of

    an enlarged male face (A4 sized), front facing, with previously

    placed landmarks to use as a guide. The same equipment was used

    by all operators and each operator conducted their landmark

    placement of images in a single day. Using the Cosine udistance

    equation, comparisons of re-landmarked images were analysed

    first from the single experienced operator and second, from all

    operators (Fig. 6a and b).

    The Cosine u distance (Z-normalised) equation was used to

    compare the re-landmarked images because, when applied in the

    comparison of faces between samples, it was found to be the

    equation in which the statistics of the TP and TN populations were

    the most separated. Each face in the subset sample was compared

    to every other face in the subset sample and resulting data was

    illustrated as superimposed normal histogram curves of TP and TN

    face matches. It was hypothesised that conducting an inter-

    operator test, using high resolution research material, but

    completed by inexperienced operators, would produce a greater

    amount of variation than from an experienced operator and thishypothesis was tested and found to hold.

    The effect that inexperienced operators had on the separation

    rate of TP and TN faces was compared to that of an experienced

    operator and is summarised in Fig. 6a and b and Table 5. Compared

    to that of the experienced operator, the effect of landmark

    placement by inexperienced operators can clearly be seen in the

    separation rates of TP and TN face matches: the mean for TP

    comparisons collapses from 0.8 for the experienced operator to

    0.44 for the mix of experienced and in-experienced operators.

    The TP vs. TN standard deviation scaled means separation is more

    than double for the experienced operator compared to that of the

    mix of operators. A similar observation can be made by inspecting

    the

    superimposed

    histograms

    for

    the

    TP

    and

    TN

    comparisons

    for

    the experienced operator vs. the mix of experienced andinexperienced operators. Although a bimodal result is generated

    by the experienced operator for both TP and TN, a much greater

    separation

    of

    the

    TPTN

    distributions

    is

    observed.

    However,

    for

    all

    operators

    a

    typical

    averaged

    picture

    emerges

    and

    the

    effect

    of

    the

    expert canbe seen as a small additional bump at the top of the TP

    distribution.

    The

    most

    important

    point

    witnessed

    in

    Fig.

    6a and

    b

    was

    to

    observe

    the

    strong

    effect

    that

    the

    experienced

    operator

    had

    in

    creating a larger separation of TP and TN face matches. As multiple

    experienced operators were not tested, it cannotbe stated that this

    difference

    in

    separation

    rates

    between

    the

    experienced

    operator

    and

    all

    operators

    was

    due

    to

    the

    experience

    of

    the

    operators

    or

    instead, the effect that will naturally occur with multiple

    operators.

    This

    could

    be

    tested

    by

    conducting

    a

    study

    using

    apool

    of

    experienced

    operators.

    A

    further

    study

    analysing

    the

    distribution

    achieved

    from

    the

    re-landmarked

    images

    of

    each

    operator after applying the Cosine u distance (Z-normalised)

    equation could determine if any of the inexperienced operators

    also achieved the same strong separation rate as the experienced

    operator.

    An

    inexperienced

    operator

    producing

    a

    similar

    degree

    of

    separation to the experienced operator would signify that the largeseparation rate produced from all operators was caused by the

    inclusion of multiple operators rather than their experience.

    However,

    from

    the

    literature

    in

    [31], it

    can

    be

    predicted

    that

    the

    spread

    from

    a

    single

    inexperienced

    operator

    would

    be

    larger

    than

    an experienced operator.

    4.4.

    Potential

    sample

    of

    photographs

    subject

    to

    manual

    verification

    The amount of overlap between the distributions of TP and TN

    faces illustrates an approximation of the misclassification rate (see

    Fig.

    5ac).

    However,

    given

    the

    task

    of

    comparing

    a

    suspects

    image

    to

    a

    large

    database

    of

    identity

    photographs,

    the

    ability

    to

    decrease

    the number of possible face matches could potentially save

    significant

    numbers

    of

    investigation

    hours.

    This

    smaller

    sample

    ofsuspect

    photographs

    could

    then

    be

    more

    closely

    scrutinised

    by

    an

    expert.

    For

    this

    analysis,

    each

    face

    in

    Sample

    1

    was

    compared

    to

    Fig. 6. (a and b) Superimposed normal curve histograms illustrating TP (dotted

    lower line) and TN (solid upper line) face comparisons of the Cosine u (Z-

    normalised) distance equations in six re-landmarked images from Sample 1 using

    one experienced operator (a) and multiple operators (b).

    Table 5

    Mean, standard deviation, and standard deviation scaled difference between TP and TN comparisons in landmark placement error study: mean absolute distance, Euclidean

    distance and cosine distance for TP and TN data sets illustrating TP and TN face comparisons in six re-landmarked images from Sample 1 using one experienced operator and

    multiple operators.

    Comparison method: cosine distance (all Z-normalised) Sample Sample mean m Sample standard

    deviation (SD)

    mSampleSDSample

    mTPSDTP

    mTNSDTN

    N, number

    of samples

    One experienced operator TP 0.8029 0.1489 5.392 5.8224 90

    One experienced operator TN 0.1666 0.3871 0.4304 540

    Multiple operators: experienced and inexperienced TP 0.4409 0.2655 1.6606 2.01351 2610

    Multiple operators: experienced and inexperienced TN 0.08657 0.2453 0.3529 13,500

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258254

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    8/11

    each face in Sample2 for a total of80 comparisons. The Cosine u(Z-

    normalised) distance equation was used and the resulting values

    were placed indescendingorder, noting the rank of theTP.Thebest

    match was defined as the match value that returned a Cosine u

    value thatwas highest or closest to1.0. Thiswasused to determine

    within a confidence range given a best match value how many

    additional faces in the database would need to be verified before

    the true positive match was found.

    Best match values were placed in intervals of 0.1. The mean

    rank of the TP, SD, and 2SD confidence interval for each match

    interval was found and results shown in Table 6. In this instancethe confidence interval says that for within a given confidence

    range, how many database images should be looked at in total.

    Results in Table 6 indicate that given match values of 0.7 the best

    match from the database is also likely to be the TP face. This result

    is consistent with the observed degree of overlap between the TP

    and TN distributions shown in Fig 5.Amatch threshold of 0.7 is not

    an unreasonably high value to set, given that a distance of 0.5

    indicates the 50% chance level of correlation between compared

    vectors. A larger sample of images should be tested to determine if

    results consistent with those presented here are produced.

    5. Discussion

    Using high resolution photographic research material, theobject of the study was to assess if a facial anthropometric feature

    vector could be utilised to distinguish between individuals of a

    similar

    age

    group,

    ancestry

    and

    sex.

    Given

    a

    database

    of

    subjects,

    knowledge

    of

    the

    type

    of

    information

    gathered

    in

    this

    study

    may

    help in future to narrow down the number of possible suspects in

    an investigation. The technique presented here entailed analysing

    vector

    comparisons

    to

    differentiate

    between

    images

    of

    two

    samples.

    The

    feature

    vector

    was

    utilised

    in

    three

    types

    of

    equations

    testing the differences between faces in the samples. Normal-

    isation was applied to the ratio values as a way to equalise the

    feature

    vector

    values

    in

    each

    sample

    and

    account

    (to

    some

    degree)

    for

    the

    statistics

    that

    different

    camera

    parameters

    would

    produce.

    Z-normalisation enhances any differences between means and

    makes

    the

    interpretation

    of

    the

    data

    more

    straightforwardallowing

    small

    differences

    in

    the

    data

    can

    to

    be

    more

    simply

    seen.

    We found

    that

    the

    face

    matching

    technology

    investigated

    in

    this study can assist in a database search; however, it does not

    provide an unequivocal means of confirming facial identifications

    suitable

    to

    use

    in

    court.

    Therefore,

    the

    focus

    of

    future

    work

    should

    concentrate

    on

    the

    potential

    for

    this

    approach

    to

    extract

    facial

    information improving the search of databases and leaving

    humans as the ultimate authenticator.

    The

    first

    step

    to

    answering

    the

    objectives

    laid

    forth

    in

    the

    introduction

    was

    to

    evaluate

    each

    sample

    of

    images

    to

    determine

    if

    once the equations were applied, any differences could be seen

    between the two samples. Testing faces against those found in the

    same

    sample

    is

    important

    because

    it

    allows

    the

    equations

    to

    ascertain

    if

    there

    are

    any

    differences

    between

    faces

    which

    fall

    under the same conditions. This means that other than the

    possibility of slight changes in facial expression the facial ratios

    will be the only changeable variable between faces as all other

    variables remain constant; same media, same operator placing

    landmarks and same facial pose.

    Once samples were looked at individually, a between sample

    comparison was conducted to determine how distinguishable the

    faces were in the two samples.Once the respective equationswere

    conducted, superimposed normal histogram distribution curves of

    true positive faces and true negative faces were used to illustrate

    the discrimination of the two groups. In general, a narrowerdistribution was seen for the true positive faces. This was because

    as the distribution contained only true positive matches, the data

    should be centred on a smaller range of values. The amount of

    overlap correlated to the possibility of achieving either a false

    positive or false negative face match.

    Although other researchused the squared Euclidean distance to

    measure the likeness between pairs of faces [32], we found by

    superimposing the normal curves to demonstrate the separation

    between true positive and true negative faces, the Cosine u

    distance (Z-normalised) equation produced the least amount of

    overlap between true positive faces and true negative faces when

    statistics of the two sampleswere known. Thematch values of true

    negative

    faces

    in

    the

    superimposed

    histogram

    normal

    curves

    begin

    to trail off at 0.7, indicating that although it is still possible toachieve a true negative identification above this value, it is likely

    that a returned match score of below 0.7 will result in a true

    negative

    face

    after

    closer

    examination.

    Although

    this

    result

    occurred

    in

    this

    study,

    it

    may

    not

    be

    replicated

    with

    a

    larger

    test

    database. The investigations undertaken in this study to determine

    if it is possible to discriminate between individuals of two samples

    using

    a

    multi

    dimensional

    facial

    feature

    vector

    found

    that

    the

    Cosine udistance was the best discriminator this but could further

    be improved upon by administering a more comprehensive

    statistical analysis.

    A small

    inter-operator

    study

    was

    carried

    out,

    to

    assess

    the

    influence

    of

    landmark

    placement

    conducted

    by

    multiple

    operators.

    This is important to test because although landmark placement on

    all

    images

    used

    in

    the

    comparative

    process

    of

    this

    study

    wasconducted

    by

    a

    single

    operator,

    this

    would

    not

    likely

    be

    the

    case

    in

    the

    real

    world.

    Landmark

    placement

    has

    been

    tested

    by

    other

    researchers on 3D images in a clinical setting and it was suggested

    that average operator error varies widely [30]. Using a digital

    sliding

    calliper

    to

    measure

    photographs,

    researchers

    carried

    out

    an

    intra

    observer

    study

    to

    test

    reliability

    of

    measurements

    and

    results

    showed a low reliability in measurements of ls-sto and n-sn [33].

    The currentanalysiswas conductedwith one experienced operator

    but

    the

    remaining

    operators

    were

    inexperienced

    It

    would

    be

    beneficial

    to

    analyse

    this

    data

    further

    in

    an

    inter-operator

    study

    using experienced operators located in different graphical regions

    because this scenario would be more likely as a police procedure.

    Experience

    was

    shown

    to

    be

    a

    benefiting

    factor

    when

    the

    inter-

    operator

    variation

    in

    taking

    standard

    skeletal

    measurements

    was

    Table 6

    Interval showing two standard deviations of how many images in the database should be manually investigated.

    Interval of best

    match values

    n (number of best

    matches in the interval)

    Mean of TP rank SD of TP rank Min of TP rank Max of TP rank Number of images to

    manually investigate

    in database (mean+2SD)

    0.900.99 0 N/A N/A N/A N/A N/A

    0.800.89 2 1 0 1 1 1

    0.700.79 7 1 0 1 1 1

    0.600.69 23 3.6 7.6 1 37 19

    0.500.59

    30

    8.9

    13.5

    1

    65

    360.400.49 16 16.7 28.5 1 110 74

    0.300.39 2 2 1.4 1 3 5

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258 255

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    9/11

  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    10/11

    http://nickfieller.staff.shef.ac.uk/seminars/faces04-10-06.pdfhttp://www.interpol.int/Public/ICPO/FactSheets/GI04.pdfhttp://www.interpol.int/Public/ICPO/FactSheets/GI04.pdf
  • 7/27/2019 2012_A study of quantitative comparisons of photographs and video images.pdf

    11/11

    [29] I.L.Dryden,K.V.Mardia, StatisticalShapeAnalysis, Wiley-Blackwell,WestSussex,1998.

    [30] A. Ayoub, et al., Validation of a vision-based, three-dimensional facial imagingsystem, Cleft Palate: Cran. J. 40 (2003) 523529.

    [31] B.J Adams, J.E. Byrd, Interobserver variation of selected postcranial skeletalmeasurements, J. Forensic Sci. 47 (2002) 11931202.

    [32] J.P. Davis, T. Valentine, R.E. Davis, Computer assisted photo-anthropometricanalyses of full-face and profile facial images, Forensic Sci. Int. 200 (2010)165176.

    [33] M.Roelofse,et al.,Photo identification:facialmetrical andmorphologicalfeaturesin South African males, Forensic Sci. Int. 177 (2008) 168175.

    [34]

    B. Murphy, R.D. Morrison, Introduction to Environmental Forensics, AcademicPress, 2007.[35] D. Sheskin,Handbookof ParametricNonparametric Statistical Procedures, Chap-

    man Hall/CRC, 2007.

    [36] T. Fawcett, An introduction to ROC analysis, Pattern Recogn. Lett. 27 (2006)861874.

    [37] D. DeCarlo, et al., An anthropometric face model using variational techniques, in:Proceedings of the25th Annual Conference onComputerGraphicsand InteractiveTechniques, 1998, pp. 6774.

    [38] C. Zhang, S.F. Cohen, 3-D face structure extraction and recognition from imagesusing 3-D morphing and distance mapping, IEEE Trans. Image Proc. 11 (2002)12491259.

    [39] M.I.M. Goos, et al., 2D/3D image (facial) comparison using camera matching,Forensic Sci. Int. 163 (2006) 1017.

    [40] J. Lee, et al., Efficient height measurement method of surveillance camera image,

    Forensic Sci. Int. 177 (2008) 1723.[41] H.C. Longuet-Higgins,A computer algorithmfor reconstructing a scene from twoprojections, Nature 293 (1981) 133135.

    K.F. Kleinberg, J.P. Siebert/Forensic Science International 219 (2012) 248258258