
Corrigendum to “Are facial image analysis experts any better than the general public at identifying individuals from CCTV images?” [Science and Justice 49 (2009) 191–196]


Science and Justice 51 (2011) 218–221

Contents lists available at ScienceDirect

Science and Justice

journal homepage: www.elsevier.com/locate/scijus


Caroline Wilkinson a,⁎, Raymond Evans b

a The Centre for Anatomy & Human Identification, MSI Building, Dow Street, University of Dundee, Scotland, DD1 5EH, UK
b Unit of Art in Medicine, Stopford Building, Oxford Road, University of Manchester, M13 9PT, UK

Errors in the published data analysis have been noted and corrected along with paragraphs in the discussion for clarity. SigmaPlot 11.0 software was used to analyse the data. The corrections are as follows:

3. Results

3.1. Identification from the full head

In total the general public falsely accepted 10% of the faces, falsely matched 43% of the cases and made 39% of the possible false rejections and 87% of the true acceptances/rejections. The general public correctly identified 61% of the target present cases and 42% of the target absent cases, giving an overall total of 53% correct identifications.

In total the experts falsely accepted 2% of the faces, falsely matched 15% of the cases and made 8% of the possible false rejections and 96% of the true acceptances/rejections. The experts correctly identified 92% of the target present cases and 75% of the target absent cases, giving an overall total of 85% correct identifications.
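The correct identification percentages quoted above can be recomputed from the per-clip counts in Table 1 (Part A). The following is a minimal sketch; the names `PART_A` and `correct_rate` are illustrative only, and pooling the counts over clips of one condition is an assumption that happens to reproduce the quoted figures.

```python
# Correct identifications per clip from Table 1, Part A (full head).
# clip number -> (condition, public correct out of 31, experts correct out of 2)
PART_A = {
    1: ("TA", 18, 2), 2: ("TP", 14, 2), 3: ("TP", 24, 2), 4: ("TP", 25, 2),
    5: ("TA", 13, 2), 6: ("TA", 19, 2), 7: ("TA", 2, 0), 8: ("TP", 19, 2),
    9: ("TP", 13, 2), 10: ("TP", 18, 1),
}
N_PUBLIC, N_EXPERTS = 31, 2


def correct_rate(condition=None):
    """Return (public %, experts %) correct, pooled over one condition or all clips."""
    rows = [r for r in PART_A.values() if condition in (None, r[0])]
    public = 100 * sum(r[1] for r in rows) / (N_PUBLIC * len(rows))
    experts = 100 * sum(r[2] for r in rows) / (N_EXPERTS * len(rows))
    return round(public), round(experts)


for label, cond in [("target present", "TP"), ("target absent", "TA"), ("overall", None)]:
    print(label, correct_rate(cond))
# target present (61, 92)
# target absent (42, 75)
# overall (53, 85)
```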

Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that the experts had significantly higher true acceptances/rejections (p = 0.008), higher correct identifications (p = 0.008), lower false rejections (p = 0.026) and lower false acceptances (p = 0.012) and false matches (p = 0.010) than the public at a significance level of p < 0.05 (see Table 2).
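As an illustration of the test, the sketch below computes the tie-corrected Kruskal–Wallis H statistic for the full-head correct identifications, treating the per-clip percentages of each group (Table 1, Part A) as the two samples. That grouping is an assumption, inferred because it reproduces the means in Table 2; the analysis itself was run in SigmaPlot, whose exact procedure is not described here. The p-value uses the chi-square approximation with one degree of freedom, also an assumption, and it may not match the p-values reported for the smaller subsets. The function name `kruskal_wallis` is illustrative.

```python
import math


def kruskal_wallis(group_a, group_b):
    """Tie-corrected Kruskal-Wallis H for two groups, with a chi-square (df = 1) p-value."""
    pooled = sorted(group_a + group_b)
    n_total = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of, tie_term = {}, 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j

    rank_sums = sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in (group_a, group_b)
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sums - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total**3 - n_total)  # correction for ties

    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Correct identifications per clip (clips 1-10), from Table 1, Part A.
public_pct = [100 * c / 31 for c in [18, 14, 24, 25, 13, 19, 2, 19, 13, 18]]
expert_pct = [100 * c / 2 for c in [2, 2, 2, 2, 2, 2, 0, 2, 2, 1]]

h, p = kruskal_wallis(public_pct, expert_pct)
print(f"H = {h:.3f}, p = {p:.3f}")  # H = 7.068, p = 0.008
```

Under these assumptions the result agrees with the "All" row for correct identifications in Part A of Table 2.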

In the target present cases the public were more likely to make a false match (32%) than reject the whole pool (7%; ‘not present’). The experts made only one false match (8%) and did not reject the whole pool in any case (see Fig. 4 and Table 1). Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that the experts had significantly lower false rejections (p = 0.026), lower false acceptances (p = 0.041), lower false matches (p = 0.041), higher true acceptances/rejections (p = 0.026) and higher correct identifications (p = 0.026) than the public at a significance level of p < 0.05 (see Table 2).

In the target absent cases the public were more likely to make a false match (58%) than a true rejection (42%), whereas the experts were more likely to make a true rejection (75%) than a false match (25%) (see Fig. 4 and Table 1). Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that these differences between the experts and the public were not statistically significant at the p < 0.05 level (see Table 2).

Both experts falsely accepted the same face in one of the target absent cases (clip 7), and this was also the most frequently chosen face from the pool by the public (55%). One expert falsely accepted a face in one of the target present cases (clip 10), but this face was not falsely accepted by any of the public, who chose another face almost as often (39%) as the target (58%).

The experts recorded a higher mean level of support (4 = support) for true acceptances than for false acceptances (3 = moderate) and a lower mean level of support for true rejections (1 = no) than for false rejections (3 = moderate).

Fig. 4. Full-head identification rates (%) of experts and the public for target present (TP), target absent (TA) and combined (ALL) cases from CCTV footage. NP = not present.

DOI of original article: 10.1016/j.scijus.2008.10.011.
⁎ Corresponding author. Tel.: +44 1382 386324; fax: +44 1382 386817.
E-mail addresses: [email protected] (C. Wilkinson), [email protected] (R. Evans).
1355-0306/$ – see front matter © 2011 Forensic Science Society. Published by Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.scijus.2011.07.001

3.2. Identification from the face alone

In total the general public falsely accepted 10% of the faces, falsely matched 53% of the cases and made 58% of the possible false rejections and 84% of the true acceptances/rejections. The general public correctly identified 42% of the target present cases and 27% of the target absent cases, giving an overall total of 38% correct identifications (see Fig. 5 and Table 1). In total the experts falsely accepted 3% of the faces, falsely matched 13% of the cases and made 25% of the possible false rejections and 95% of the true acceptances/rejections. The experts correctly identified 75% of the target present cases and 100% of the target absent cases, giving an overall total of 81% correct identifications (see Fig. 5 and Table 1).

Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that the experts had significantly higher true acceptances/rejections (p = 0.010), higher correct cases (p = 0.015) and lower false acceptances (p = 0.010) and lower false matches (p = 0.010) than the public at a significance level of p < 0.05 (see Table 2). There was no significant difference between the false rejection rates of the public and the experts (see Table 2).

In the target present cases the public was more likely to make a false match (46%) than a true match (42%) or reject the whole pool (12%; ‘not present’) and the experts were more likely to make a false match (17%) than reject the whole pool (8%; ‘not present’). Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that the differences between the experts and the public were not statistically significant at the p < 0.05 level (see Table 2).

In the target absent cases the public was more likely to make a false match (74%) than a true rejection (27%), whereas the experts made no false matches at all. Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that the differences between the experts and the public were not statistically significant at the p < 0.05 level (see Table 2).

Both experts falsely accepted a face in one of the target present cases (clip 1) and this face was falsely accepted by 26% of the public, who chose this face almost as often as the target (37%).

The experts recorded a slightly higher mean level of support (4 = support) for true acceptances than for false acceptances (3 = moderate) and a lower mean level of support for true rejections (1 = no) than for false rejections (3 = moderate).

When the results from part A and part B were combined, the general public correctly identified 52% of the target present cases and 37% of the target absent cases, giving an overall total of 47% correct identifications. The experts were more consistent and correctly identified 83% of the target present, target absent and overall cases (see Fig. 6 and Table 1). Kruskal–Wallis One Way Analysis of Variance (ANOVA) tests suggested that the experts had significantly higher true acceptances/rejections (p < 0.001), higher correct cases (p < 0.001), lower false rejections (p = 0.003), lower false matches (p < 0.001) and lower false acceptances (p < 0.001) than the public at a significance level of p < 0.05 (see Table 2).

Table 1
Facial identification results for public and expert groups using CCTV footage and face pool matching.

Part A = full-head identification

Clip   Public (n=31)                         Experts (n=2)             Public           Experts
(FP=5) FM       FA        FR      True       FM    FA    FR    True    Correct ID  %    Correct ID  %
1 TA   13       13        -       142        0     0     -     10      18          58   2           100
2 TP   15       15        17      123        0     0     0     10      14          45   2           100
3 TP   6        6         7       142        0     0     0     10      24          78   2           100
4 TP   3        3         6       146        0     0     0     10      25          81   2           100
5 TA   18       18        -       137        0     0     -     10      13          42   2           100
6 TA   12       12        -       143        0     0     -     10      19          61   2           100
7 TA   29       29        -       126        2     2     -     8       2           7    0           0
8 TP   6        6         12      137        0     0     0     10      19          61   2           100
9 TP   18       18        18      119        0     0     0     10      13          42   2           100
10 TP  12       12        13      130        1     1     1     8       18          58   1           50
Total  132/310  132/1364  73/186  1345/1550  3/20  3/88  1/12  96/100  165/310     53   17/20       85

Part B = face-only identification

Clip   Public (n=30)                         Experts (n=2)             Public           Experts
(FP=6) FM       FA        FR      True       FM    FA    FR    True    Correct ID  %    Correct ID  %
1 TP   16       16        19      145        2     2     2     8       11          37   0           0
2 TP   15       15        19      146        0     0     0     12      11          37   2           100
3 TP   17       17        19      144        0     0     0     12      11          37   2           100
4 TA   23       23        -       157        0     0     0     12      7           23   2           100
5 TA   21       21        -       159        0     0     0     12      9           30   2           100
6 TP   16       16        20      144        0     0     1     11      10          33   1           50
7 TP   13       13        18      149        0     0     0     12      12          40   2           100
8 TP   6        6         9       165        0     0     0     12      21          70   2           100
Total  127/240  127/1260  104/180 1209/1440  2/16  2/84  3/12  91/96   92/240      38   13/16       81

TA = target absent; TP = target present; FP = face pool; FA = false acceptance (by non-target); FM = false match (by case); TRUE = true acceptances/rejections; FR = false rejection; - = no entry.

Table 2
Comparison of the true acceptance/rejection, false acceptance, false match, false rejection and correct identification rates between the public and the experts.

                                      Public            Experts
Measure                               Mean (%)  SD      Mean (%)  SD      H value  p value

Part A, target absent
True accepts/rejects (% of faces)     88.3      5.18    95.0      10.00   1.418    0.343
False rejections (% of targets)       -         -       -         -       -        -
False acceptances (% of non-targets)  11.8      5.19    5.0       10.00   1.418    0.343
False matches (% of cases)            58.2      25.25   25.0      50.00   1.400    0.343
Correct IDs (% of cases)              42.0      24.78   75.0      50.00   1.400    0.343

Part A, target present
True accepts/rejects (% of faces)     85.7      6.89    96.7      8.16    5.404    0.026
False rejections (% of targets)       39.2      15.74   8.3       20.41   5.404    0.026
False acceptances (% of non-targets)  8.2       4.96    2.2       5.31    4.677    0.041
False matches (% of cases)            32.2      18.99   8.3       20.41   4.677    0.041
Correct IDs (% of cases)              60.8      16.22   91.7      20.41   5.404    0.026

Part A, all
True accepts/rejects (% of faces)     86.7      6.09    96.0      8.43    7.085    0.008
False rejections (% of targets)       39.2      15.74   8.3       20.41   5.404    0.026
False acceptances (% of non-targets)  9.6       5.10    3.3       7.14    6.261    0.012
False matches (% of cases)            42.6      24.38   15.0      33.75   6.658    0.010
Correct IDs (% of cases)              53.3      21.1    85.0      33.75   7.068    0.008

Part B, target absent
True accepts/rejects (% of faces)     87.5      0.71    100.0     0.00    2.667    0.333
False rejections (% of targets)       -         -       -         -       -        -
False acceptances (% of non-targets)  12.5      0.71    0.0       0.00    2.667    0.333
False matches (% of cases)            73.5      4.95    0.0       0.00    2.667    0.333
Correct IDs (% of cases)              26.5      4.95    100.0     0.00    2.667    0.333

Part B, target present
True accepts/rejects (% of faces)     82.8      4.62    93.2      13.21   3.553    0.065
False rejections (% of targets)       57.7      13.67   25.0      41.83   3.227    0.093
False acceptances (% of non-targets)  9.3       2.73    3.3       8.16    4.031    0.065
False matches (% of cases)            46.0      13.57   16.7      40.83   3.985    0.065
Correct IDs (% of cases)              42.3      13.74   75.0      41.83   3.262    0.093

Part B, all
True accepts/rejects (% of faces)     84.0      4.47    94.9      11.61   6.452    0.010
False rejections (% of targets)       57.7      13.69   25.0      41.83   3.227    0.093
False acceptances (% of non-targets)  10.1      2.75    2.5       7.07    6.968    0.010
False matches (% of cases)            52.9      17.23   12.5      35.35   6.934    0.010
Correct IDs (% of cases)              38.4      13.86   81.3      37.20   6.190    0.015

Combined, all
True accepts/rejects (% of faces)     85.5      5.46    95.5      9.67    13.169   <0.001
False rejections (% of targets)       48.5      17.06   16.7      32.57   8.800    0.003
False acceptances (% of non-targets)  9.8       4.12    2.9       6.92    13.196   <0.001
False matches (% of cases)            47.2      21.55   13.9      33.46   14.653   <0.001
Correct IDs (% of cases)              46.7      19.32   83.3      34.30   13.354   <0.001

Fig. 5. Face-only identification rates (%) of experts and the public for target present (TP), target absent (TA) and combined (ALL) cases from CCTV footage. NP = not present.

Fig. 6. The overall correct identifications (%) from both part A and B recorded by experts and the public for target present (TP), target absent (TA) and combined (ALL) cases from CCTV footage.
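The counts and denominators in Table 1 can be cross-checked with a short script. The sketch below covers the public rows of Part A and rests on two assumptions that are not stated in the text but are consistent with every row: each participant makes one decision per face in the pool, so that True = n × FP − FA − FR for each clip, and the false acceptance denominator counts only non-target faces. The names `PART_A_PUBLIC`, `column_totals` and `denominators` are illustrative.

```python
# Public rows of Table 1, Part A: (clip, condition, FM, FA, FR, True).
# FR is None for target-absent (TA) clips, which have no entry in the table.
N_PUBLIC, FACE_POOL = 31, 5
PART_A_PUBLIC = [
    (1, "TA", 13, 13, None, 142), (2, "TP", 15, 15, 17, 123),
    (3, "TP", 6, 6, 7, 142), (4, "TP", 3, 3, 6, 146),
    (5, "TA", 18, 18, None, 137), (6, "TA", 12, 12, None, 143),
    (7, "TA", 29, 29, None, 126), (8, "TP", 6, 6, 12, 137),
    (9, "TP", 18, 18, 18, 119), (10, "TP", 12, 12, 13, 130),
]


def column_totals(rows, n, face_pool):
    """Sum each column after checking the assumed identity for 'True' on every row."""
    decisions = n * face_pool  # assumed: one decision per face per participant
    totals = {"FM": 0, "FA": 0, "FR": 0, "True": 0}
    for clip, _cond, fm, fa, fr, true in rows:
        assert true == decisions - fa - (fr or 0), f"clip {clip} is inconsistent"
        totals["FM"] += fm
        totals["FA"] += fa
        totals["FR"] += fr or 0
        totals["True"] += true
    return totals


def denominators(rows, n, face_pool):
    """Number of opportunities for each outcome, under the assumptions above."""
    n_tp = sum(1 for r in rows if r[1] == "TP")
    n_ta = len(rows) - n_tp
    return {
        "FM": len(rows) * n,                                   # one per case
        "FA": (n_ta * face_pool + n_tp * (face_pool - 1)) * n,  # non-target faces
        "FR": n_tp * n,                                        # one target per TP clip
        "True": len(rows) * face_pool * n,                     # every face shown
    }


print(column_totals(PART_A_PUBLIC, N_PUBLIC, FACE_POOL))
# {'FM': 132, 'FA': 132, 'FR': 73, 'True': 1345}
print(denominators(PART_A_PUBLIC, N_PUBLIC, FACE_POOL))
# {'FM': 310, 'FA': 1364, 'FR': 186, 'True': 1550}
```

Both outputs agree with the "Total" row for the public in Part A.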

4. Discussion

The public were more likely to make a false match than record ‘not present’ for both full head and face only identification. The error rates increased when the targets wore hats, so that the false matches became the most frequent result regardless of whether the target was present or not. This result suggests that a jury is more likely to accept an innocent person based on CCTV identification evidence.

This research shows that the experts were not infallible and recorded a false rejection rate (FRR) of 17% and false acceptance rates (FAR) of 3% for faces and 14% for cases. These results follow guidelines for a biometric identifier, in that the FRR is greater than the FAR and that as the amount of evidence decreases (e.g. through headwear) the FAR is unaffected whilst the FRR increases.
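The expert error rates quoted above are consistent with pooling the expert totals of Table 1 across part A and part B. The sketch below shows that arithmetic; pooling the raw counts in this way is an assumption (it reproduces the quoted percentages), and the names `EXPERT_TOTALS` and `pooled_rate` are illustrative.

```python
# Expert totals from Table 1 as (events, opportunities) for each part.
EXPERT_TOTALS = {
    "A": {"FR": (1, 12), "FA": (3, 88), "FM": (3, 20)},
    "B": {"FR": (3, 12), "FA": (2, 84), "FM": (2, 16)},
}


def pooled_rate(measure):
    """Percentage of events over all opportunities, pooled across both parts."""
    events = sum(part[measure][0] for part in EXPERT_TOTALS.values())
    opportunities = sum(part[measure][1] for part in EXPERT_TOTALS.values())
    return 100 * events / opportunities


print(f"FRR (targets):       {pooled_rate('FR'):.1f}%")  # 16.7%
print(f"FAR (faces):         {pooled_rate('FA'):.1f}%")  # 2.9%
print(f"False match (cases): {pooled_rate('FM'):.1f}%")  # 13.9%
```

These values round to the 17%, 3% and 14% given in the text and match the expert means in the combined rows of Table 2.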