

Face Recognition at a Chokepoint

Scenario Evaluation Results

14 November 2002

Mike Bone Duane Blackburn NAVSEA Crane Division300 Hwy 361Crane, IN 47522

NAVSEA Dahlgren Division17320 Dahlgren RoadDahlgren, VA 22448

Sponsored by: DoD Counterdrug Technology Development Program Office


EXECUTIVE OVERVIEW

This report describes an evaluation performed on three face recognition systems manufactured by Identix Incorporated (formerly Visionics Corporation). The evaluation follows the methodology proposed in “An Introduction to Evaluating Biometric Systems,” by P. J. Phillips, A. Martin, C. L. Wilson and M. Przybocki in IEEE Computer, February 2000, pp. 56–63. This methodology proposes a three-step evaluation protocol: a top-level technology evaluation, followed by a scenario evaluation and an operational evaluation. A technology evaluation was performed via the Facial Recognition Vendor Test 2000 (FRVT 2000) in May/June 2000. The FRVT 2000 evaluation report is available online at http://www.frvt.org/FRVT2000/documents.htm. Technologies identified as best candidates for this scenario from FRVT 2000 results were chosen for this scenario evaluation.

The goal for this evaluation was to assess the overall capabilities of entire systems for two chokepoint scenarios: verification and watchlist. Verification is a one-to-one process where a person presents an identity claim and the system determines if the live face image matches the face image stored under that identity within a certain threshold. An example of a verification application is doorway access control. The watchlist scenario is a one-to-many process where a person’s live face image is compared to each face image in a watchlist and an alarm is triggered if a match’s confidence score exceeds a certain threshold. An example of a watchlist application is searching for known terrorists passing through an airport metal detector.
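
The two decision rules can be sketched as follows; the function names and the score values in the example are illustrative, not taken from the evaluated systems.

```python
def verify(live_score, threshold):
    """One-to-one: accept the identity claim when the live-vs-enrolled
    match score meets the threshold; otherwise the attempt is rejected."""
    return live_score >= threshold

def watchlist_alarm(scores, threshold):
    """One-to-many: compare the live image's score against every watchlist
    entry and alarm on the best match only if it exceeds the threshold."""
    best_id = max(scores, key=scores.get)
    if scores[best_id] > threshold:
        return (best_id, scores[best_id])
    return None

# Hypothetical scores on an arbitrary scale:
print(verify(9.5, 9.0))                            # identity claim accepted
print(watchlist_alarm({"A": 8.0, "B": 9.9}, 9.0))  # alarm on the best match, "B"
```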

For the verification scenario, the effects of users wearing glasses for enrollment but not recognition, or vice versa, were studied. For the watchlist scenario, a study was performed on the effects of using high quality enrollment images taken under controlled lighting conditions vs. badge images taken under less favorable conditions.

Two general conclusions can be drawn from the reported data. First, marginal performance improvements are gained by ensuring that users either wear glasses for both enrollment and verification attempts or that they do not wear glasses for either part of the process. Second, the accuracy of the watchlist systems improves significantly when using high quality images taken under controlled lighting conditions close in time to the recognition attempts.


ACKNOWLEDGEMENTS

The authors would like to thank the following people for their assistance with the evaluation.

• Bob Butler, Sabre Systems, for coordinating personnel, soliciting volunteers, collecting consent forms, and organizing the database of badge images.

• Dabney “DC” Ayres, Steve Brannan, Bill Hecht, and Jose Roman of NAVSEA Dahlgren Division for setting up the test equipment in the building 1470 lobby, performing the verification enrollment, and recording video of the volunteers.

• Holly Smith, SAIC for playing the recorded video into the face recognition systems and recording performance data.

• Frank Shields, SAIC for reviewing raw and calculated data.

• The 144 volunteers from NAVSEA Dahlgren Division.

• The following individuals who reviewed this report:

– Dr. Jonathon Phillips, DARPA

– Richard Vorder Bruegge, FBI

– Anonymous Reviewers

• Identix Incorporated for their willingness to participate in this difficult evaluation and for their suggestions during our development of the test plan.


CONTENTS

1 Introduction 7

2 Evaluation Methodology 7

3 Evaluation Overview 8
  3.1 Data Collection 8
  3.2 Data Analysis 10
  3.3 Performance Measures 10

4 Evaluation Details 11
  4.1 Timetable 11
  4.2 Preparations 12

    4.2.1 Volunteer Solicitation 12
    4.2.2 Badge Image Database 12
    4.2.3 Evaluation Area 12
    4.2.4 Equipment Setup 16
    4.2.5 Personnel 18

  4.3 Verification Scenario Process 18
    4.3.1 Overview 18
    4.3.2 Video Recording Phase 18
    4.3.3 Enrollment Phase 19
    4.3.4 Video Playback Phase 19
    4.3.5 Data Analysis Phase 20

  4.4 Watchlist Scenario Process 20
    4.4.1 Overview 20
    4.4.2 Video Recording Phase 20
    4.4.3 Enrollment Phase 21
    4.4.4 Video Playback Phase 22
    4.4.5 Data Analysis Phase 22

5 Evaluation Results 23
  5.1 Verification Scenario Results 23
  5.2 Watchlist Scenario Results 26

6 Summary 34

References 35

Appendix A Volunteer Solicitation Poster 36

Appendix B Volunteer Consent Form 37

Appendix C Operator Instructions 38
  C.1 Morning Duties 38
  C.2 Verification Video Recording 39
  C.3 Verification Enrollment 40
  C.4 Watchlist Video Recording 41


  C.5 Evening Duties 42

Appendix D Statistics Overview 43
  D.1 Introduction 43

    D.1.1 Generic Biometric System Operation 43
    D.1.2 Evaluation Types 43

  D.2 Identification 44
  D.3 Verification 45
  D.4 Watchlist 47

Appendix E Subject Information 49
  E.1 Introduction 49
  E.2 Subject Attempt Information 49

Appendix F Identix Position Paper 58

FIGURES

1 Histogram showing the number of verification attempts made by each volunteer. 9
2 Histogram showing the number of watchlist attempts made by each volunteer. 9
3 Verification enrollment images. 10
4 Badge images. 12
5 Building 1470 lobby layout. 13
6 Verification equipment setup. 14
7 Photo of verification station. 14
8 Watchlist equipment setup. 15
9 Photo of watchlist station. 15
10 Building 1470 lobby lighting layout. 16
11 Watchlist camera configuration. 17
12 Verification scores for Identix custom verification system. 24
13 Verification scores for Identix custom verification system. 24
14 Histogram showing time to accept valid users at threshold 9.3769. 25
15 Histogram showing time to accept valid users at threshold 9.972. 25
16 Histogram showing time to accept valid users at threshold 9.998. 26
17 Watchlist scores for Identix FaceIt Surveillance with watchlist size 100. 27
18 Watchlist scores for Identix FaceIt Surveillance with watchlist size 400. 27
19 Watchlist scores for Identix FaceIt Surveillance with watchlist size 1575. 28
20 Total alarms for Identix FaceIt Surveillance. 28
21 Watchlist scores for Identix Argus with watchlist size 100. 29
22 Watchlist scores for Identix Argus with watchlist size 400. 29
23 Watchlist scores for Identix Argus with watchlist size 1575. 30
24 Watchlist scores for Identix Argus with watchlist size 100 (new images). 31
25 Total alarms for Identix Argus. 31
26 Watchlist identification characteristic (WIC) for Identix FaceIt Surveillance. 32
27 Watchlist alarm characteristic (WAC) for Identix FaceIt Surveillance. 33
28 Watchlist identification characteristic (WIC) for Identix Argus. 33
29 Watchlist alarm characteristic (WAC) for Identix Argus. 34

D-1 Sample cumulative match score (CMS). . . . . . . . . . . . . . . . . . . . . . . . 45


D-2 Sample receiver operating characteristic (ROC). 46
D-3 Sample watchlist identification characteristic (WIC). 47
D-4 Sample watchlist alarm characteristic (WAC). 48
E-1 Histogram showing the number of verification attempts made by each volunteer. 49
E-2 Histogram showing the number of watchlist attempts made by each volunteer. 50

TABLES

1 Timetable of Evaluation Activities 11
2 Verification video log. 19
3 Verification enrollment log. 19
4 Verification attempt log. 20
5 Verification tally table. 20
6 Watchlist video log. 21
7 Watchlist scoring log. 22
8 Watchlist tally table. 22
E-1 User data for Identix FaceIt Surveillance with watchlist size 100 51
E-2 User data for Identix FaceIt Surveillance with watchlist size 400 52
E-3 User data for Identix FaceIt Surveillance with watchlist size 1575 53
E-4 User data for Identix Argus with watchlist size 100 54
E-5 User data for Identix Argus with watchlist size 400 55
E-6 User data for Identix Argus with watchlist size 1575 56
E-7 User data for Identix Argus with watchlist size 100 (new images) 57


1 Introduction

Interest in face recognition has been on the rise since the 11 September 2001 terrorist attacks. Many government agencies have been considering the use of face recognition systems in airports and other important locations to search for known terrorists or to control access to secure areas. But how should these agencies determine if face recognition will meet their requirements and choose a suitable system for their application? The application of any technology should be based on the careful consideration of sound scientific test results, and face recognition is no exception. The aim of this report is to provide data that can be used to support that process as well as a detailed explanation of the evaluation that others can use to perform tests of their own.

The DoD Counterdrug Technology Development Program Office sponsored the FERET program [4, 5] from 1993 through 1997. It published the results of the Facial Recognition Vendor Test 2000 (FRVT 2000) [2] in February 2001, establishing a solid foundation for evaluating face recognition systems. FRVT 2000 consisted of a technology evaluation and a limited example of a scenario evaluation. The top performing vendors from FRVT 2000, Identix Incorporated (formerly Visionics Corporation) and Viisage Technology Incorporated, were invited to take part in the more thorough scenario evaluation described in this report. During the planning of the evaluation, Viisage withdrew from participation, stating that they did not have the resources available to make the necessary changes to their user interface in order to provide the required data.

The concept of operations used for this scenario evaluation is that of a supervised chokepoint leading to a secure area. Individuals walking through the chokepoint look toward an overt face recognition system operating in either verification or watchlist mode. In verification mode, an individual approaches the chokepoint and presents an identity using a smart card, proximity card, magnetic stripe card, or PIN. The face recognition system compares captured images of the individual’s face with face images stored for that identity. If the live and stored images don’t match within certain threshold criteria, an operator is notified. In watchlist mode, captured images of the individual’s face are compared with a watchlist of face images. If a match has a score greater than a certain threshold, an operator is notified.

For the verification scenario, a custom system manufactured by Identix was tested. For the watchlist scenario, two off-the-shelf systems from Identix were tested: FaceIt Surveillance and Argus. FaceIt Surveillance has been on the market for a number of years while Argus was first introduced in late 2001.

Volunteers for this evaluation walked or stood in front of system cameras while the video outputs of the cameras were recorded. The volunteers were cooperative and fully aware of the camera locations. At the completion of the data collection phase, the recorded video was played back as input to the face recognition systems using different threshold parameters and database sizes. Experiments were performed to study the impact of glasses and enrollment image quality. Accuracy and timing statistics generated for each system are presented in this report in the form of bar charts, histograms, watchlist identification characteristic (WIC) plots, and watchlist alarm characteristic (WAC) plots.

2 Evaluation Methodology

This evaluation is based on the philosophy for evaluating biometrics proposed by Phillips et al. [3] and expounded upon by Blackburn [1]. It consists of a three-step evaluation protocol: a top-level technology evaluation, followed by a scenario evaluation and an operational evaluation.

A technology evaluation is the most general and tests algorithms using a “universal” sensor. Technology evaluations were performed in the Face Recognition Technology (FERET) series of evaluations [4, 5] and the Recognition Performance Test of FRVT 2000 [2].

A scenario evaluation tests the performance of complete systems for a class of applications. A limited example of a scenario evaluation was performed in the Product Usability Test of FRVT 2000. The three-step evaluation protocol is intended to be followed from general to specific. The scenario evaluation described in this report maintains that sequence by evaluating technology that was selected for its performance in the technology evaluation of FRVT 2000. Complete systems were evaluated using two chokepoint scenarios: verification and watchlist. Both scenarios assume that individuals are cooperative and aware that the systems are in use and that an operator is monitoring each system.

The next step in this sequence would be an operational evaluation where a specific application is chosen. A system with scenario evaluation results that meet the application requirements would be installed at the location of intended use and the actual user population would take part in the evaluation. Results of the operational evaluation would be used to study the workflow impact caused by the addition of the new system.

3 Evaluation Overview

The following subsections give a brief overview of the evaluation process. For a more detailed description of the methods and procedures that were followed, see Section 4 and the operator instructions in Appendix C.

3.1 Data Collection

Video from the camera(s) of each system was recorded for later playback as volunteers stood in place or walked as instructed for each scenario. See Sections 4.2.3 and 4.2.4 for details of the equipment used. Figures 1 and 2 show the number of times each volunteer showed up for verification and watchlist video recording. Each vertical line indicates the number of attempts made by an individual.


Figure 1: Histogram showing the number of verification attempts made by each volunteer. (Volunteer Attempt Histogram: 1624 attempts by 142 volunteers.)

Figure 2: Histogram showing the number of watchlist attempts made by each volunteer. (Volunteer Attempt Histogram: 1518 attempts by 144 volunteers.)

The use of recorded video has several advantages over using strictly live sensor input. First, it allows the same input to be used multiple times for each system with different threshold or timeout settings for each trial. Second, it allows the evaluated systems to undergo further testing in the future using the same input data with different system settings. Third, it allows additional systems to be tested in the future and compared against the results published in this report. Finally, it provides data for future research.

For enrollment in the verification system, volunteers stood in front of a camera connected directly to the system. Enrollment was performed for each person according to vendor instructions. This was performed only once for each individual, while the video recording was performed multiple times for each person. The database of enrollment images was used during data analysis for comparison with the recorded video. There was a time difference of 0–38 days between enrollment image collection and recording of video for recognition attempts. Sample enrollment images are shown in Figure 3. Due to the low resolution (100 × 125 pixels) of the enrollment images, they may look blocky as shown in this document. However, this is the resolution of the images produced by the verification enrollment process. The important thing to notice about these images is the even illumination provided by the controlled lighting.

For enrollment in the watchlist systems, a database of badge images was used. Details are given in Section 4.2.2.

Figure 3: Verification enrollment images. See Figure 4 for badge images of the same individuals.

3.2 Data Analysis

Once the data collection effort was complete, the recorded video was played back as input to each of the face recognition systems. Each system was operated as if it were using live camera input, but the output of a video recorder was connected to the system input in place of a camera. Each video segment was cued on the recorder and manually started when a verification or watchlist attempt was required.

The enrollment database used for verification was collected using live volunteers as described in the previous subsection. The verification system was configured to report the first match with a score meeting the configured threshold criteria along with the time it took to obtain that match.

For the watchlist scenario, the enrollment database was created using existing badge images as described in Section 4.2.2. The watchlist systems were configured to report the highest match with a score exceeding the threshold setting. The watchlist scenario was performed multiple times, each with a different size watchlist.

3.3 Performance Measures

Performance measures for this evaluation are reported in terms of accuracy and duration of attempts. The rest of this subsection describes how performance was measured and reported for this evaluation. See Appendix D for a more general overview of statistics used in the evaluation of face recognition systems.

Accuracy is measured as the number of user attempts that fall into certain categories as a function of the threshold setting (the minimum score required for a comparison to be considered a match). Categories used for the verification scenario are valid users accepted, valid users rejected, imposters rejected, and imposters accepted. Categories used for the watchlist scenario are person on watchlist (POWL) correctly identified, POWL incorrectly identified, POWL not alarmed, non-POWL not alarmed, and non-POWL alarmed. These categories should be useful to system planners who need to balance their required security level with exception handling resources.
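
As a sketch of the tallying step, assuming each attempt is recorded as (true identity, or None for a person not on the watchlist; best-matching watchlist entry; best score) — this record format is an illustration, not the report’s actual log layout:

```python
def tally_watchlist(attempts, threshold):
    """Classify watchlist attempts into the five accuracy categories used
    in the report, at a given threshold setting. An alarm is raised when
    the best score exceeds the threshold."""
    tally = {
        "POWL correctly identified": 0,
        "POWL incorrectly identified": 0,
        "POWL not alarmed": 0,
        "non-POWL not alarmed": 0,
        "non-POWL alarmed": 0,
    }
    for true_id, match_id, score in attempts:
        alarmed = score > threshold
        if true_id is None:  # person not on watchlist
            tally["non-POWL alarmed" if alarmed else "non-POWL not alarmed"] += 1
        elif not alarmed:
            tally["POWL not alarmed"] += 1
        elif match_id == true_id:
            tally["POWL correctly identified"] += 1
        else:
            tally["POWL incorrectly identified"] += 1
    return tally
```

Rerunning the same recorded attempts through this tally at different thresholds yields the accuracy-versus-threshold curves reported for each system.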

Duration is measured as the length of time required for the system to determine the results of a match attempt. For the verification scenario, a histogram was plotted of the duration of successful attempts by valid users. This duration was a measure of the time, as reported by the system, starting with the entry of the ID number and ending with the system decision of successful verification. The duration of unsuccessful verification attempts was always 7 seconds, since this was the configured timeout value. The duration of watchlist attempts was not measured, but each volunteer stood in front of the system for at least three seconds after walking about five feet toward the camera. These duration results should give some indication of the throughput requirements associated with the accuracy results obtained during this evaluation.
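
A minimal sketch of how such a duration histogram could be built, assuming per-attempt durations in seconds and treating any attempt at the 7-second timeout value as a rejection rather than an acceptance; the bin width is an illustrative choice, not a parameter from the report:

```python
from collections import Counter

def accept_time_histogram(durations, bin_width=0.5, timeout=7.0):
    """Bin the durations (seconds) of verification attempts, excluding
    attempts that hit the configured timeout, since those mark rejections
    rather than successful verifications. Keys are bin lower edges."""
    bins = Counter()
    for d in durations:
        if d < timeout:
            bins[round(int(d // bin_width) * bin_width, 1)] += 1
    return dict(bins)
```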

4 Evaluation Details

4.1 Timetable

The following timetable shows the events that transpired during the planning, preparation, and execution of the evaluation.

Table 1: Timetable of Evaluation Activities

24 Sep 2001  Requested price quotes from Identix and Viisage
25 Sep 2001  Received quote from Viisage
27 Sep 2001  Received quote from Identix
17 Oct 2001  Identix responded to prime contractor used for equipment purchase
26 Oct 2001  E-mail sent to NAVSEA Dahlgren employees to solicit volunteers
26 Oct 2001  Posters displayed to solicit volunteers
09 Nov 2001  Preliminary test plan sent to both vendors
15 Nov 2001  Identix delivered and installed custom verification and FaceIt Surveillance systems
19 Nov 2001  Visited Identix per their request to discuss testing details
29 Nov 2001  Visited Viisage per their request to discuss testing details
03 Dec 2001  Final test plan sent to both vendors
04 Dec 2001  Completed equipment setup
05 Dec 2001  Started recording video for verification scenario
06 Dec 2001  Started recording video for watchlist scenario
11 Dec 2001  Started live enrollment for verification system
14 Dec 2001  Viisage withdrew their participation in the evaluation due to lack of resources to make necessary user interface changes
18 Jan 2002  Completed video recording
18 Jan 2002  Created watchlist enrollment databases
28 Jan 2002  Identix delivered and installed Argus system
03 Mar 2002  Started data analysis
28 Jun 2002  Completed data analysis
19 Aug 2002  Started analysis of Argus with new watchlist
27 Aug 2002  Completed Argus analysis


4.2 Preparations

The following subsections describe the preparations that were made before the evaluation began.

4.2.1 Volunteer Solicitation

Volunteers for this evaluation were found by displaying the poster shown in Appendix A at several locations at NAVSEA Dahlgren and sending an e-mail to employees and contractors. Those interested in volunteering were required to sign the consent form shown in Appendix B before their first day of participation. Volunteers were notified of the evaluation schedule via e-mail. They were encouraged to stop in the testing area for video recording several times each day if their schedules allowed.

4.2.2 Badge Image Database

A large database of face images was required for the watchlist scenario. To fill this requirement, the security office at NAVSEA Dahlgren was contacted. A request was made to obtain the badge images of all NAVSEA Dahlgren employees and contractors for use in this evaluation. Assurances were made that the images would not be used for any other purpose, that images would not be associated with names or any personal identifier other than the assigned badge ID number and that the images would remain in government control at all times. Security personnel agreed to this request and provided the database containing 14,612 images. During the course of the data collection effort, the images of volunteers were identified and additional images were selected at random to create databases of 100, 400, and 1575 images to be used for the data analysis. There was a time difference of 505–1580 days between badge image collection and recording of video for recognition attempts. Sample badge images are shown in Figure 4.
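
The construction of the watchlist databases could be sketched as follows, assuming each database keeps the badge images of the enrolled volunteers (POWL) and is padded with randomly selected non-volunteer images up to the target size; the exact selection procedure and the seeding shown are assumptions for illustration:

```python
import random

def build_watchlist(powl_badge_ids, all_badge_ids, size, rng):
    """Combine the badge IDs of enrolled volunteers with randomly chosen
    non-volunteer badge IDs until the watchlist reaches the target size
    (100, 400, or 1575 in the report)."""
    powl = list(powl_badge_ids)
    extras = [b for b in all_badge_ids if b not in set(powl)]
    return powl + rng.sample(extras, size - len(powl))

# Toy example with hypothetical badge IDs:
wl = build_watchlist(["v1", "v2"], ["v1", "v2", "b1", "b2", "b3", "b4"], 5,
                     random.Random(0))
```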

Figure 4: Badge images. See Figure 3 for verification enrollment images of the same individuals.

4.2.3 Evaluation Area

The data collection effort was performed in the lobby of building 1470 at NAVSEA Dahlgren. Two stations were set up: one for verification video recording and enrollment, the other for watchlist video recording. The two stations were set up at a right angle to each other as shown in Figure 5. This allowed the operators to sit in a central location while monitoring both stations.

The verification station was placed behind a large building support column, as detailed in Figures 6 and 7, to block any sunlight coming in through the glass doors of the main entrance. A white sheet was used as a background and two photographic flood lamps with reflector umbrellas were used to illuminate users.

The watchlist station was situated with the cameras facing a wall as detailed in Figures 8 and 9. No supplemental lighting was used at this station and the background contained a dark colored, stained, wooden door surrounded by a light colored, painted, metal frame and a wall covered with medium colored, low contrast, patterned wallpaper. Some of the overhead fluorescent lights were turned off to prevent glare on the faces of individuals at both stations. The locations of the remaining light fixtures are shown in Figure 10.

Figure 5: Building 1470 lobby layout. The verification and watchlist stations are detailed in Figures 6 and 8, respectively.


Figure 6: Verification equipment setup. (Diagram labels: volunteer standing position; two flood lamps with umbrella reflectors, 68” high; camera, 62” high at lens center; support column, 55” diameter, covered with white sheet.)

Figure 7: Photo of verification station.


Figure 8: Watchlist equipment setup. (Diagram labels: wall; point where volunteers stop for 3 seconds; cam 1; cams 2 and 3; dimensions 8’-9” and 5’-4”.)

Figure 9: Photo of watchlist station.



Figure 10: Building 1470 lobby lighting layout. Each square symbol represents one 2 feet by 2 feet fluorescent light fixture. Lights that were turned off for the evaluation are not shown. Ceiling height is 10 feet.

4.2.4 Equipment Setup

The equipment for the verification station consisted of a face recognition system with associated camera, a digital video recorder, video monitor, flood lamps with reflective umbrellas, and a white sheet used as a background.

Watchlist equipment consisted of two face recognition systems, three cameras, three digital video recorders, and three video monitors. Two of the cameras were used to record video for the dual camera Argus system while the other was used for the single camera FaceIt Surveillance system. The cameras were arranged as shown in Figure 11. The recognition systems were not used during the video recording phase since live enrollment was not required.


Figure 11: Watchlist camera configuration. (Diagram labels: cam 1; cam 2; cam 3; 57.5” from lens center to floor; 6” spacings.)

The following gives more detailed information about the equipment used for the evaluation.

Face Recognition Systems:

All the face recognition systems used for this evaluation were manufactured by Identix Incorporated. A custom system was used for the verification scenario. Two off-the-shelf systems were used for the watchlist scenario: a single camera system called FaceIt Surveillance and a multiple camera system called Argus. Identix personnel installed the verification system and FaceIt Surveillance at the evaluation site, then later installed Argus at the location where data analysis took place.

Cameras:

A Canon VC-C3 camera was used for the verification scenario. Three Sony EVI-D30 cameras were used for the watchlist scenario. Two of the Sony cameras were mounted side by side with the third mounted on top of one of the others. The vertically stacked cameras were used for the recognition system that takes multiple camera inputs (Argus) while the other was used for the single camera system (FaceIt Surveillance).

Lighting Equipment:

Two photographic flood lamps with reflective umbrellas were used to illuminate users for the verification scenario. Philips 300 watt reflective flood lamps were used with Smith-Victor 100UL photographic light fixtures.

Digital Video Recorders:

Four Panasonic AG-DV2000 digital video tape recorders were used for recording the output of the cameras.

Data Recording Equipment:

Two laptop computers running spreadsheet applications were used for recording ground truth data such as user ID numbers, enrollment status, and the timecode of recorded video segments.


4.2.5 Personnel

Two classes of personnel were required for the evaluation: operators and users. Operators controlled the face recognition systems and recorded data. Users stood or walked in front of the system cameras for the recording of video segments that were used later as input to the recognition systems. Some users were designated as imposters and were assigned the ID number of another valid user for the verification scenario. Some users were enrolled in the watchlist scenario database and were classified as persons on watchlist (POWL) while others, classified as non-POWL, were not enrolled. The imposters, the people they attempted to impersonate, and the POWL were chosen at random. Users and operators were not aware of these classifications.

Two test stations were set up in the Building 1470 lobby: one for the verification scenario and one for the watchlist scenario. Two operators were required at each station, for a total of four operators. One operator at each station controlled and monitored the face recognition system (used for verification enrollment only), video recorders, and traffic flow while the other recorded data.

4.3 Verification Scenario Process

4.3.1 Overview

The scenario for verification is that of attempting to verify valid users at a chokepoint (e.g., employee access). In this scenario, users passing through the chokepoint stand in front of the system camera, present their assigned identity, and wait for a matching decision based on a one-to-one comparison. If the system returns a matching score that meets the threshold criteria, the user is accepted by the system and allowed access. Otherwise the user is rejected by the system and denied access.
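The accept/reject rule just described amounts to a simple threshold comparison, which can be sketched as follows (the function name and scores are illustrative, not part of the evaluated system):

```python
def verify(match_score, threshold):
    """One-to-one verification: accept the identity claim only if the
    match score between the live image and the enrolled template for
    the claimed identity meets the threshold."""
    return match_score >= threshold

# Hypothetical scores on a 0-10 scale similar to the threshold
# settings reported later in this document:
print(verify(9.98, 9.972))  # strong genuine match -> True (access granted)
print(verify(9.50, 9.972))  # weak match -> False (access denied)
```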

The verification scenario evaluation was performed in four phases: video recording, enrollment, video playback, and data analysis. The video recording phase began first, and the enrollment phase began soon after. Once enrollment began, both phases were performed at the same time and ended on the same date. Each user may have gone through the video recording process multiple times throughout the evaluation, but was enrolled only once. After the first two phases were completed, the video playback phase was performed and the results were used for the data analysis phase.

4.3.2 Video Recording Phase

During the video recording phase, users stood in front of the system camera while the output of the camera was recorded using a digital video recorder. Operators manually verified each user’s identity using a database of badge images then logged the user’s ID number and the timecode from the recorder. The timecode was also stored in a substream of the recorded video to allow the identification of users during playback. Operators instructed the users to stand at a marked location 4 feet 2 inches in front of the camera, remove hats and tinted glasses, and slowly tilt their heads slightly forward and backward about one to two inches while looking at the camera with a neutral expression. The camera tilt was adjusted for the height of the user, as recommended by the vendor. Control of the camera tilt was handled through the software interface of the recognition system via serial communications with the camera. On subsequent visits, the camera tilt was automatically adjusted to the stored position for the user when the identification number was entered. Once the camera adjustments were made, the video recorder was placed in record mode for ten seconds then stopped. If the user had not been enrolled, and the enrollment phase was underway, the user continued standing at the same location for enrollment. Otherwise, the user moved to the location of the watchlist scenario evaluation.

The output produced during this phase was recorded video segments, possibly with multiple segments of the same person, and a completed copy of the video log shown in Table 2.

Table 2: Verification video log.

ID Number | Tape Number | Timecode

4.3.3 Enrollment Phase

During the enrollment phase, users stood in front of the system camera in the same marked location used for video recording. Operators manually verified the user’s identity and ensured that the user had not been previously enrolled. The camera was connected to the recognition system via the video recorder pass-through output. Operators again instructed the users to slowly tilt their heads slightly forward and backward about one to two inches while looking at the camera with a neutral expression. Users were enrolled into the system using instructions provided by the vendor. The system captured five images of each user, each at a different angle, to allow a better chance of capturing an image at one of the same angles during recognition attempts. Users also tilted their heads during the recognition attempts to further increase the chance of matching one of the five enrollment angles. The badge ID number of each user was entered into the system as their assigned identity. Users were required to be enrolled for their results to be analyzed. If a user was not enrolled at the end of this phase, recorded video for that person was not used for video playback and data analysis.

The output produced during this phase was a vendor-specific database of enrollment images with associated ID numbers and a completed copy of the enrollment log shown in Table 3.

Table 3: Verification enrollment log.

ID Number Enrollment Date and Time

4.3.4 Video Playback Phase

Once users were enrolled and had posed for recorded video segments, the rest of the evaluation for this scenario was performed without user participation. An imposter list was created by randomly selecting 20% of the enrolled users. Each imposter was randomly assigned an imposter ID number corresponding to another enrolled user, without regard to age, sex, or race. The recorded video was then played back as input to the verification system. The recorder was first cued to the beginning of a video segment, then the ID number for the corresponding user, or imposter ID number for imposters, was entered into the verification system. The verification process and video playback were then initiated simultaneously. Once the verification process was completed, the video playback was stopped and the process was repeated using another video segment. After all video segments had been played back, the process was repeated using different threshold settings.
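The imposter assignment described above can be sketched as follows (a minimal illustration; the helper name and seed are hypothetical, and the selection procedure beyond "random" is not specified in the report):

```python
import random

def assign_imposters(enrolled_ids, fraction=0.2, seed=None):
    """Randomly designate a fraction of enrolled users as imposters and
    assign each the ID number of another enrolled user, without regard
    to age, sex, or race."""
    rng = random.Random(seed)
    imposters = rng.sample(enrolled_ids, round(len(enrolled_ids) * fraction))
    return {imp: rng.choice([u for u in enrolled_ids if u != imp])
            for imp in imposters}

claims = assign_imposters(list(range(100)), seed=1)
print(len(claims))  # 20 imposters out of 100 enrolled users
```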


The verification system was configured to perform matching until it first reached a score above the threshold setting, then report the time required to reach that score. It was configured to time out after 7 seconds if it did not reach a score above the threshold.

The output produced during this phase was a completed copy of the verification attempt log shown in Table 4.

Table 4: Verification attempt log.

ID Number | Imposter ID Number | Threshold | User Accepted? | Matching Time

4.3.5 Data Analysis Phase

During this phase, data collected during the video playback phase was analyzed to determine accuracy and timing results. Results from verification attempts were tallied in a table similar to that shown in Table 5. In order to generate histogram plots, the number of valid user attempts requiring each possible matching time to the nearest tenth of a second was also tallied.

Table 5: Verification tally table.

Threshold | Number of valid user attempts accepted | Number of valid user attempts rejected | Number of imposter attempts accepted | Number of imposter attempts rejected
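The tallying described above can be sketched as follows (a minimal illustration using a hypothetical attempt-record format; accept times are binned to the nearest tenth of a second, as in the histograms):

```python
from collections import Counter

def tally_attempts(attempts):
    """attempts: iterable of (is_imposter, accepted, time_s) records.
    Returns category counts plus a histogram of accept times for valid
    users, binned to the nearest tenth of a second."""
    counts, histogram = Counter(), Counter()
    for is_imposter, accepted, time_s in attempts:
        kind = "imposter" if is_imposter else "valid"
        outcome = "accepted" if accepted else "rejected"
        counts[kind, outcome] += 1
        if kind == "valid" and accepted:
            histogram[round(time_s, 1)] += 1
    return counts, histogram

counts, hist = tally_attempts([(False, True, 0.42), (False, True, 0.44),
                               (False, False, 7.0), (True, False, 7.0)])
print(counts["valid", "accepted"], hist[0.4])  # 2 2
```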

4.4 Watchlist Scenario Process

4.4.1 Overview

The watchlist evaluation uses the scenario of attempting to find individuals at a chokepoint (e.g., a metal detector). In this scenario, users walking through the chokepoint stop and look at the system cameras for approximately three seconds, then continue walking. The system continuously compares found faces to images in a watchlist and displays the highest match that exceeds a certain threshold, along with a candidate list of other potential matches in descending order of similarity. The operator manually compares the top match with the live user and responds if an actual match is determined.
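The one-to-many matching step can be sketched as follows (illustrative only; the real systems process video frames continuously, and the names below are hypothetical):

```python
def watchlist_match(scores, threshold):
    """scores: {watchlist_id: similarity} for one detected face.
    Returns (top_id, ranked_candidates) if the best score meets the
    threshold, otherwise None (no alarm)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_id, top_score = ranked[0]
    return (top_id, ranked) if top_score >= threshold else None

alarm = watchlist_match({"A": 8.1, "B": 7.4, "C": 6.0}, threshold=7.8)
print(alarm[0])  # A -- alarm raised; candidates listed in descending order
```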

The watchlist scenario evaluation was performed in four phases: video recording, enrollment, video playback, and data analysis. The video recording phase began first and was the only phase requiring the presence of live users. Each user may have gone through the video recording process multiple times throughout the evaluation. The enrollment phase took place after the video recording phase. After the first two phases were completed, the video playback phase was performed and the results were used for the data analysis phase.

4.4.2 Video Recording Phase

During the video recording phase, users walked in front of three system cameras. Two were mounted side by side and the third sat on top of one of the others. The video output of each camera was recorded using digital video recorders. The output from the two vertically stacked cameras was used as input for the two camera Argus system. The output from the other camera was used as input for the single camera FaceIt Surveillance system. The vertically stacked cameras had overlapping fields of view to account for a range of different user heights. Operators logged the ID number of each user and the timecode from the recorders. The timecode was also stored in a substream of the recorded video to allow for the identification of users during playback. Operators instructed each user to remove hats and tinted glasses then stand with their back against a wall 14 feet 5 inches from the camera. The three video recorders were placed in record mode simultaneously and the operator instructed the user to look at the camera with a neutral expression, walk forward until reaching a marked location 9 feet from the camera, stand in place looking at a blinking light mounted just below the camera until it flashed three times, continue walking forward, then step to the left before reaching the camera. The video recorder was then stopped and the user left the testing area. It is estimated that each volunteer was visible to the camera for a minimum of about 7 seconds per attempt.

The output produced during this phase was recorded video segments from three cameras, possibly with multiple segments of the same person, and a completed copy of the video log shown in Table 6.

Table 6: Watchlist video log.

ID Number | Tape Number | Timecode

4.4.3 Enrollment Phase

Once users had posed for recorded video segments, the rest of the evaluation for this scenario was performed without user participation. Enrollment was performed using badge images for participating users and other NAVSEA Dahlgren employees. The images were stored in JPEG format with pixel resolutions ranging from 188x222 to 300x400. They were enrolled into each watchlist system according to vendor instructions using the badge number as an identifier. The eye coordinates of each enrollment image were determined automatically by each system. It was then necessary to manually adjust eye coordinates for some images using tools provided with the systems. Three databases of different sizes were created using half of the participating users, called Persons on Watchlist (POWL), selected at random. The other half of the participating users (non-POWL) were left out of the database. Additional images from non-participants were added to increase the size of each database, resulting in final databases of 100, 400, and 1575 images. These same databases were used for each watchlist system; however, another database of size 100 was created for use with the Argus system only. This additional database was created by taking the original 100-person database and replacing the participant badge images with enrollment images from the verification system. Since the verification system stored five images of each person, one image of each person was selected that had minimal head tilt. The resolution of the new images was 100x125. The authors later learned that these images did not meet the minimum parameters recommended by Identix: resolution of 125x160 with eye separation of 50 pixels. Nonetheless, performance measures from the FRVT 2000 resolution experiments showed that the Identix system would not be significantly impacted.
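The vendor-recommended minimums cited above can be checked with a simple helper (a sketch; the eye-separation values in the examples are assumed for illustration, since the report does not state them for these images):

```python
def meets_identix_minimums(width, height, eye_separation_px):
    """Check an enrollment image against the vendor-recommended minimums
    cited in the text: 125x160 pixels with 50-pixel eye separation."""
    return width >= 125 and height >= 160 and eye_separation_px >= 50

# The 100x125 images taken from the verification system fall short
# (the 40-pixel eye separation here is an assumed figure):
print(meets_identix_minimums(100, 125, 40))  # False
# The smallest badge images (188x222) would pass, assuming a 55-pixel
# eye separation:
print(meets_identix_minimums(188, 222, 55))  # True
```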

The output produced during this phase was vendor-specific databases of ID numbers and enrollment images.


4.4.4 Video Playback Phase

Once the video segments had been recorded and the enrollment databases had been created, the recorded video was played back as input to each watchlist system, once for each database. The recorders were first cued to the beginning of a video segment. The recognition process and video playback were then initiated simultaneously. Once the video segment ended, the recognition process was stopped.

At the end of each video segment, the ID number and matching score for the top match corresponding to the user in that segment were recorded. The process was then repeated using the rest of the video segments. Once all video segments had been processed, they were replayed using a database of a different size.

The output produced during this phase was a completed copy of the scoring log shown in Table 7 for each watchlist system.

Table 7: Watchlist scoring log.

Database Size | ID Number | Tape Number | Timecode | Top Match ID Number | Score

4.4.5 Data Analysis Phase

During this phase, data collected during the video playback phase was analyzed to determine accuracy results. Several threshold settings, chosen by the vendor and the evaluators, were used for the analysis.

For POWL attempts, the top score was evaluated to determine if it met the threshold criteria. If it didn’t meet the criteria, this attempt was classified as a POWL not alarmed. If the score met the criteria, the ID number for that score was compared with the actual ID number of the POWL to determine if they matched. If a match was found, this attempt was classified as a POWL alarmed and correctly identified. If a match was not found, this attempt was classified as a POWL alarmed but incorrectly identified.

For non-POWL attempts, the top score for each attempt was evaluated to determine if it met the threshold criteria. If it didn’t meet the criteria, the attempt was classified as a non-POWL not alarmed. If the score did meet the criteria, the attempt was classified as a non-POWL alarmed.
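The classification rules above amount to the following decision logic (a sketch; the function and argument names are illustrative):

```python
def classify_attempt(on_watchlist, top_id, top_score, true_id, threshold):
    """Classify a single watchlist attempt into the categories defined
    in the text."""
    alarmed = top_score >= threshold
    if on_watchlist:
        if not alarmed:
            return "POWL not alarmed"
        if top_id == true_id:
            return "POWL alarmed and correctly identified"
        return "POWL alarmed but incorrectly identified"
    return "non-POWL alarmed" if alarmed else "non-POWL not alarmed"

print(classify_attempt(True, "u42", 8.3, "u42", 8.0))   # POWL alarmed and correctly identified
print(classify_attempt(False, "u17", 7.1, None, 8.0))   # non-POWL not alarmed
```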

Results for POWL attempts and non-POWL attempts were tallied and recorded in a table similar to that shown in Table 8 for each watchlist system.

Table 8: Watchlist tally table.

Threshold | POWL Alarmed, Correctly ID’d | POWL Alarmed, Incorrectly ID’d | POWL Not Alarmed | non-POWL Not Alarmed | non-POWL Alarmed


5 Evaluation Results

5.1 Verification Scenario Results

The results of the verification scenario evaluation are presented in the form of bar graphs that classify the verification attempts into several result categories, and histogram plots that show the recognition time for each successful attempt by valid users.

The bar graph shown in Figure 12 shows the number of verification attempts that fall into the following categories.

Valid users accepted are attempts made by legitimate users, each using their assigned identifica­tion number, where the system has correctly granted access. This is shown in green where higher numbers are better. Results for this category are shown as both the raw number of attempts and as a percentage of total attempts by valid users.

Valid users rejected are attempts made by legitimate users, each using their assigned identification number, where the system has incorrectly denied access. This is shown in red where lower numbers are better. Results for this category are shown as both the raw number of attempts and as a percentage of total attempts by valid users.

Imposters rejected are attempts made by imposters, each using the identification number assigned to another user, where the system has correctly denied access. This is shown in green where higher numbers are better. Results for this category are shown as both the raw number of attempts and as a percentage of total attempts by imposters.

Imposters accepted are attempts made by imposters, each using the identification number assigned to another user, where the system has incorrectly granted access. This is shown in red where lower numbers are better. Results for this category are shown as both the raw number of attempts and as a percentage of total attempts by imposters.

The number of attempts falling into each of these categories is shown for each of the three different threshold settings used for the evaluation. Looking across the graph in Figure 12, one can see the trade-off associated with changing the threshold setting. As the threshold is increased, fewer imposters are accepted by the system, but at the expense of rejecting more valid users.

The bar graph in Figure 13 shows the number of verification attempts that fall into the same categories described above. However, this graph shows results for the subset of attempts where volunteers were wearing glasses for both enrollment and recognition attempts or were not wearing glasses for both enrollment and recognition attempts. In the case where glasses were present, users were asked to remove tinted glasses, but a different pair of glasses may have been worn for enrollment and recognition attempts. Only the presence or absence of glasses was recorded. In contrast, the results shown in Figure 12 include several attempts where volunteers were wearing glasses for one part of the process, but not both.

The effect of keeping the presence of glasses the same or allowing it to be different can be determined by comparing Figures 12 and 13. The total number of attempts is different in each case, so percentages must be compared rather than the raw numbers of attempts. Comparing the bar graphs, it can be seen that the number of valid users accepted increased by 1.7–2.7% when the presence of glasses was kept the same. The number of imposters rejected fell by 1.2% for the lowest threshold, but rose by 0.1% for the other thresholds.


Threshold | Valid users accepted | Valid users rejected | Imposters rejected | Imposters accepted
9.3769 | 1215 (97.8%) | 27 (2.2%) | 373 (97.6%) | 9 (2.4%)
9.972 | 1177 (94.8%) | 65 (5.2%) | 380 (99.5%) | 2 (0.5%)
9.998 | 1111 (89.5%) | 131 (10.5%) | 380 (99.5%) | 2 (0.5%)

Figure 12: Verification Scores for Identix custom verification system (1624 attempts by 142 volunteers). Some volunteers were wearing glasses for enrollment but not certain verification attempts, and vice versa. 114 valid users made 1242 attempts. 28 imposters made 382 attempts.

Threshold | Valid users accepted | Valid users rejected | Imposters rejected | Imposters accepted
9.3769 | 1177 (99.5%) | 6 (0.5%) | 242 (96.4%) | 9 (3.6%)
9.972 | 1149 (97.1%) | 34 (2.9%) | 250 (99.6%) | 1 (0.4%)
9.998 | 1091 (92.2%) | 92 (7.8%) | 250 (99.6%) | 1 (0.4%)

Figure 13: Verification Scores for Identix custom verification system (1434 attempts by 136 volunteers). Attempts where volunteers were wearing glasses for enrollment but not the verification attempt, or vice versa, were not counted. 114 valid users made 1183 attempts. 22 imposters made 251 attempts.

The histogram plots in Figures 14–16 show the time required for the system to accept valid users. For each time duration shown on the x-axis, a line extends to a height marked on the y-axis that indicates the number of attempts requiring that duration. As the figures indicate, most valid users were accepted by the system in less than one second. This duration, reported by the recognition system, is measured starting from the time the system was instructed to start the recognition process after entering the user’s ID, and ending when the valid user was verified. In a real-world scenario, the duration of the attempt would be slightly longer due to the time required to present an identity using a card or PIN. The histogram plots do not include attempts where valid users were rejected due to timeouts or attempts made by imposters.

[Histogram: x-axis time 0–7 s; y-axis attempts 0–400.]

Figure 14: Histogram showing time to accept valid users (1215 attempts, threshold 9.3769), including attempts where volunteers were wearing glasses for enrollment or verification attempts, but not both.

[Histogram: x-axis time 0–7 s; y-axis attempts 0–400.]

Figure 15: Histogram showing time to accept valid users (1177 attempts, threshold 9.972), including attempts where volunteers were wearing glasses for enrollment or verification attempts, but not both.


[Histogram: x-axis time 0–7 s; y-axis attempts 0–400.]

Figure 16: Histogram showing time to accept valid users (1111 attempts, threshold 9.998), including attempts where volunteers were wearing glasses for enrollment or verification attempts, but not both.

5.2 Watchlist Scenario Results

The results of the watchlist scenario evaluation are presented in the form of bar graphs that classify the watchlist attempts into several result categories, as well as watchlist identification characteristic (WIC) plots and watchlist alarm characteristic (WAC) plots that show error rates as a function of threshold setting.
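The error rates behind WIC/WAC-style plots can be computed from raw attempt records roughly as follows (a sketch under an assumed record format, not the evaluators' actual tooling; sweeping the threshold traces out the curves):

```python
def watchlist_rates(powl_attempts, nonpowl_attempts, threshold):
    """powl_attempts: (top_id, top_score, true_id) per POWL attempt;
    nonpowl_attempts: (top_id, top_score) per non-POWL attempt.
    Returns (correct-identification rate, false-alarm rate) at the
    given threshold."""
    hits = sum(1 for top_id, score, true_id in powl_attempts
               if score >= threshold and top_id == true_id)
    false_alarms = sum(1 for _top_id, score in nonpowl_attempts
                       if score >= threshold)
    return hits / len(powl_attempts), false_alarms / len(nonpowl_attempts)

rates = watchlist_rates([("a", 8.5, "a"), ("b", 7.0, "b")],
                        [("a", 8.2), ("c", 6.9)], threshold=8.0)
print(rates)  # (0.5, 0.5)
```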

The bar graphs in Figures 17–19 show the number of watchlist attempts made using the FaceIt Surveillance system that fall into the following categories.

Persons on Watchlist (POWL) correctly ID’d are attempts made by enrolled users where the system has generated an alarm condition and reported the correct identity of the user. This is sometimes referred to as a hit. This is shown in green where higher numbers are better.

POWL incorrectly ID’d are attempts made by enrolled users where the system has generated an alarm condition, but has reported the identity of an enrolled person other than the user. Since we’re only looking at the top match, this category could also be referred to as a miss. This is shown in yellow since this alarm could be considered good or bad, depending on the exact application. It’s good that the POWL was alarmed, but it’s bad that the wrong identity was reported. If further scrutiny doesn’t uncover the true identity, the POWL may be let go by mistake.

POWL not alarmed are attempts made by enrolled users where the system has not generated an alarm condition. This is sometimes referred to as a miss. This is shown in red where lower numbers are better.

Non-POWL not alarmed are attempts made by users who are not enrolled where the system has not generated an alarm condition. This is shown in green where higher numbers are better.


Non-POWL alarmed are attempts made by users who are not enrolled where the system has generated an alarm condition. This is sometimes referred to as a false alarm. This is shown in red where lower numbers are better.

The number of attempts falling into each of these categories is shown for each of the five different threshold settings used. Each graph shows the results for a different watchlist size. Looking across the graphs in Figures 17–19, it can be seen that the number of POWL correctly identified goes down as the threshold is raised, but fewer non-POWL are alarmed. Also, the number of non-POWL alarmed goes up as the size of the watchlist is increased. In most cases, the number of POWL correctly identified goes down as watchlist size increases, especially at the lower threshold settings.

Threshold | POWL correctly ID’d | POWL incorrectly ID’d | POWL not alarmed | Non-POWL not alarmed | Non-POWL alarmed
7.5 | 145 | 232 | 429 | 347 | 365
7.8 | 85 | 32 | 689 | 619 | 93
8 | 43 | 4 | 759 | 690 | 22
8.2 | 15 | 0 | 791 | 711 | 1
8.4 | 1 | 0 | 805 | 712 | 0

Figure 17: Watchlist Scores for Identix FaceIt Surveillance with watchlist size 100 (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

Threshold | POWL correctly ID’d | POWL incorrectly ID’d | POWL not alarmed | Non-POWL not alarmed | Non-POWL alarmed
7.5 | 105 | 401 | 300 | 209 | 503
7.8 | 71 | 113 | 622 | 517 | 195
8 | 50 | 37 | 719 | 634 | 78
8.2 | 17 | 17 | 772 | 700 | 12
8.4 | 0 | 3 | 803 | 709 | 3

Figure 18: Watchlist Scores for Identix FaceIt Surveillance with watchlist size 400 (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.


Threshold | POWL correctly ID’d | POWL incorrectly ID’d | POWL not alarmed | Non-POWL not alarmed | Non-POWL alarmed
7.5 | 70 | 540 | 196 | 127 | 585
7.8 | 58 | 226 | 522 | 400 | 312
8 | 38 | 77 | 691 | 592 | 120
8.2 | 13 | 16 | 777 | 683 | 29
8.4 | 2 | 3 | 801 | 708 | 4

Figure 19: Watchlist Scores for Identix FaceIt Surveillance with watchlist size 1575 (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

The bar graph in Figure 20 shows the total number of alarms generated by FaceIt Surveillance as a function of both threshold and watchlist size. It can be seen that the number of alarms increases as the watchlist size is increased, and decreases as the threshold is raised.

Threshold | Watchlist size 100 | Watchlist size 400 | Watchlist size 1575
7.5 | 742 | 1009 | 1195
7.8 | 210 | 379 | 596
8 | 69 | 165 | 236
8.2 | 16 | 46 | 59
8.4 | 1 | 6 | 10

Figure 20: Total alarms for Identix FaceIt Surveillance (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

Figures 21–23 show bar graphs for the Argus system with the same watchlists used for the FaceIt Surveillance system. Note that different threshold settings are used here than for the FaceIt Surveillance system since each system uses a different threshold scale. In general, systems from different vendors will use different scales as well. Bar charts such as these cannot be directly compared among systems using different scales. However, some other plots, such as the watchlist identification characteristic (WIC) and watchlist alarm characteristic (WAC) curves, discussed below and in Appendix D, can be directly compared even when different scales are used.

The same observations made about the FaceIt Surveillance bar graphs apply to the Argus bar graphs, except for the number of POWL correctly identified as a function of watchlist size. In the Argus case, the number of POWL correctly identified actually goes up slightly as the watchlist size is increased, with the exception of the lowest threshold setting.

Threshold | POWL correctly ID’d | POWL incorrectly ID’d | POWL not alarmed | Non-POWL not alarmed | Non-POWL alarmed
5.7 | 145 | 155 | 506 | 503 | 209
6.5 | 86 | 29 | 691 | 683 | 29
7 | 60 | 14 | 732 | 709 | 3
7.5 | 38 | 4 | 764 | 712 | 0
7.8 | 27 | 0 | 779 | 712 | 0

Figure 21: Watchlist Scores for Identix Argus with watchlist size 100 (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

Threshold | POWL correctly ID’d | POWL incorrectly ID’d | POWL not alarmed | Non-POWL not alarmed | Non-POWL alarmed
5.7 | 137 | 555 | 114 | 118 | 594
6.5 | 107 | 157 | 542 | 535 | 177
7 | 65 | 32 | 709 | 678 | 34
7.5 | 37 | 4 | 765 | 709 | 3
7.8 | 32 | 0 | 774 | 710 | 2

Figure 22: Watchlist Scores for Identix Argus with watchlist size 400 (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.


Threshold | POWL correctly ID’d | POWL incorrectly ID’d | POWL not alarmed | Non-POWL not alarmed | Non-POWL alarmed
5.7 | 116 | 630 | 60 | 72 | 640
6.5 | 116 | 592 | 98 | 119 | 593
7 | 97 | 308 | 401 | 414 | 298
7.5 | 48 | 65 | 693 | 658 | 54
7.8 | 33 | 19 | 754 | 694 | 18

Figure 23: Watchlist Scores for Identix Argus with watchlist size 1575 (1518 attempts by 144 volunteers). 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

The bar graph in Figure 24 shows the results for the Argus system with a watchlist size of 100 using new enrollment images. The badge images for each enrolled user were replaced with an image of the same person captured during enrollment on the verification system. These verification enrollment images were taken under more controlled lighting conditions than the badge images and therefore eliminated the harsh glare on the side of the face. This can be seen by comparing the sample badge images in Figure 4 with the sample verification enrollment images of the same individuals in Figure 3. The new images were also taken closer in time to the recognition attempts than the badge images: 0–38 days as opposed to 505–1580 days. The replacement of these images created a remarkable change in the measured results. This can be readily seen by comparing Figure 24 with Figure 21. The number of POWL correctly identified went up significantly with the new images. The number of non-POWL alarmed went up as well, but not nearly as much, and was more pronounced at the lower threshold settings. Since the lighting and temporal parameters were not controlled separately, it is not explicitly known how much of the performance degradation to attribute to either parameter. The FRVT 2000 technology evaluation results did study these two parameters. Experiment I1 shows mild degradation due to lighting, but all temporal experiments show significant degradation. The underlying conclusion is that watchlist performance decreases significantly when using older images with less controlled lighting.

Page 30 of 58


[Bar graph: Watchlist Results (1518 attempts by 144 volunteers). Attempt counts at thresholds 5.7, 6.5, 7, 7.5, and 7.8 in five categories: POWL correctly ID’d, POWL incorrectly ID’d, POWL not alarmed, non-POWL not alarmed, and non-POWL alarmed.]

Figure 24: Watchlist Scores for Identix Argus with watchlist size 100. Images captured for verification enrollment were used to replace the badge images used for the watchlist in previous trials. 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

The bar graph in Figure 25 shows the total number of alarms generated by Argus as a function of both threshold and watchlist size. Looking at just the first three bars for each threshold, it can be seen that the number of alarms tends to increase with a larger watchlist and decrease as the threshold increases, just as with FaceIt Surveillance. The fourth bar for each threshold shows results for the new watchlist created from verification enrollment images. The number of alarms for this watchlist also decreases as the threshold is raised. Of special note is the fact that the number of alarms generated for the new watchlist of size 100 was significantly greater than the number generated for the original watchlist of the same size.

[Bar graph: Watchlist Results (1518 attempts by 144 volunteers). Total alarms at thresholds 5.7, 6.5, 7, 7.5, and 7.8 for watchlist sizes 100, 400, 1575, and 100 (new images).]

Figure 25: Total alarms for Identix Argus. 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.


Figures 26 and 28 show watchlist identification characteristic (WIC) curves for both the FaceIt Surveillance and Argus systems. These plots show the probability of alarming and correctly identifying a POWL vs. the probability of incorrectly alarming on a non-POWL. Data points along these plots indicate results for different threshold settings. Figures 27 and 29 show watchlist alarm characteristic (WAC) curves for each system. These plots show the probability of correct detection (generating an alarm for a POWL) vs. the probability of incorrectly alarming on a non-POWL. Typically, the higher the curve on both WIC and WAC plots, the better.

Looking at the WIC curves in Figures 26 and 28, it can be seen that performance improves as the watchlist size decreases. In other words, for the same number of false alarms (the inconvenience factor for non-POWL), the chance of alarming and correctly identifying a POWL increases with a smaller watchlist. The top curve in Figure 28 shows results obtained after replacing the badge images in the watchlist with verification enrollment images as described above. This curve shows a great improvement over the next lower curve, which used the same watchlist size but lower quality images. The top curve represents a “best case” watchlist since it involved cooperative subjects, controlled lighting, and a small time difference between enrollment and recognition.

Looking at the WAC curves in Figures 27 and 29, it can be seen that the probability of generating an alarm for a POWL changes very little as a function of watchlist size when badge images are used and is almost directly proportional to the probability of false alarm for a non-POWL. If a line were drawn from coordinates (0,0) to (100,100), it would show the expected results of random guessing. Most of the curves on both plots show performance only slightly better than random guessing. The notable exception is the top curve in Figure 29, which shows results for the watchlist created by replacing badge images with verification enrollment images. This curve shows many more alarms being generated and performance significantly better than random guessing. This shows that when using quality enrollment images, face recognition, while not 100% accurate, gives better results than randomly picking individuals. Comparing the top curves in Figures 28 and 29, it can be seen that the WAC curve is only slightly higher than the WIC curve. That means that for a large proportion of POWL alarms generated with the new image database, the POWL was correctly identified.
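As a concrete illustration, a single (threshold, WIC, WAC) operating point of the kind plotted in these figures could be computed from per-attempt match scores roughly as follows. This is a minimal sketch: the attempt record format (true identity, POWL flag, per-watchlist-entry similarity scores) is an assumption made for illustration, not the data format actually used in the evaluation.

```python
# Sketch: one operating point on the WIC and WAC curves at a given threshold.
# Each attempt is (true_id, is_powl, scores), where scores maps watchlist
# identities to similarity scores for that attempt (hypothetical format).

def watchlist_point(attempts, threshold):
    """Return (% false alarm for non-POWL, % correct ID for POWL,
    % correct detection for POWL) at the given threshold."""
    powl = [a for a in attempts if a[1]]
    non_powl = [a for a in attempts if not a[1]]

    # WAC numerator: POWL attempts that raise any alarm at this threshold.
    detected = sum(1 for _, _, s in powl
                   if any(v >= threshold for v in s.values()))
    # WIC numerator: POWL attempts whose top-scoring alarm is the correct identity.
    correct_id = sum(1 for tid, _, s in powl
                     if s and max(s.values()) >= threshold
                     and max(s, key=s.get) == tid)
    # False alarms: non-POWL attempts that raise any alarm.
    false_alarms = sum(1 for _, _, s in non_powl
                       if any(v >= threshold for v in s.values()))

    p_false_alarm = 100.0 * false_alarms / len(non_powl)
    p_correct_id = 100.0 * correct_id / len(powl)
    p_detect = 100.0 * detected / len(powl)
    return p_false_alarm, p_correct_id, p_detect
```

Sweeping the threshold and plotting p_correct_id (WIC) or p_detect (WAC) against p_false_alarm traces out curves of the kind shown in Figures 26 through 29.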

[Plot: Watchlist Results (1518 attempts by 144 volunteers). Probability of correct ID for POWL (%) vs. probability of false alarm for non-POWL (%), for watchlist sizes 100, 400, and 1575.]

Figure 26: Watchlist Identification Characteristic (WIC) for Identix FaceIt Surveillance. 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.


[Plot: Watchlist Results (1518 attempts by 144 volunteers). Probability of correct detection (%) vs. probability of false alarm for non-POWL (%), for watchlist sizes 100, 400, and 1575.]

Figure 27: Watchlist Alarm Characteristic (WAC) for Identix FaceIt Surveillance. 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

[Plot: Watchlist Results (1518 attempts by 144 volunteers). Probability of correct ID for POWL (%) vs. probability of false alarm for non-POWL (%), for watchlist sizes 100, 400, 1575, and 100 (new images).]

Figure 28: Watchlist Identification Characteristic (WIC) for Identix Argus. 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.


[Plot: Watchlist Results (1518 attempts by 144 volunteers). Probability of correct detection (%) vs. probability of false alarm for non-POWL (%), for watchlist sizes 100, 400, 1575, and 100 (new images).]

Figure 29: Watchlist Alarm Characteristic (WAC) for Identix Argus. 69 POWL made 806 attempts. 75 non-POWL made 712 attempts.

6 Summary

Three face recognition systems from Identix Incorporated were evaluated using recorded video of volunteers standing or walking toward cameras set up for two chokepoint scenarios: verification and watchlist.

A custom system was evaluated for verification using three different threshold settings. Accuracy results were reported for two situations. In the first situation, some users either wore glasses for enrollment but not for recognition attempts, or vice versa. In the second situation, only attempts where the presence or absence of glasses was the same for enrollment and recognition were scored. Keeping the status of glasses the same resulted in slightly better performance. The number of valid users accepted rose 1.7–2.7%. The number of imposters rejected rose 0.1% at the higher threshold settings but fell 1.2% for the lowest threshold. When the presence of glasses was the same, valid users were accepted 92.2–99.5% of the time while imposters were rejected 96.4–99.6% of the time. The time required to accept valid users was under 1 second for most attempts.

Two watchlist systems were evaluated: FaceIt Surveillance and Argus. Both were tested using five different threshold settings, but each uses a different threshold scale, so each was tested at different settings. Both systems were tested using the same three watchlists having sizes of 100, 400, and 1575. As the threshold was increased on both systems, fewer POWL were correctly identified and fewer non-POWL were alarmed. As the watchlist size was increased on the FaceIt Surveillance system, the number of POWL correctly identified tended to go down. However, as the watchlist size was increased on the Argus system, the number of POWL correctly identified went up slightly for most threshold settings. The number of non-POWL alarmed went up on both systems as the watchlist size was increased.

A new watchlist of size 100 was created for the Argus system using enrollment images from the verification system. The new images were captured using two flood lamps and showed visible improvement in illumination quality over the badge images used for the rest of the evaluation. The


WIC and WAC plots showed a great improvement in accuracy for the Argus system using the new images as compared to the badge image watchlist of the same size.

Two general conclusions can be drawn from the reported data. First, marginal performance improvements are gained by ensuring that users either wear glasses for both enrollment and verification attempts or that they do not wear glasses for either part of the process. Second, the accuracy of the watchlist systems improves significantly when using high quality images taken under controlled lighting conditions close in time to the recognition attempts.

References

[1] D. M. Blackburn. Evaluating Technology Properly – Three Easy Steps to Success. Corrections Today, 63(1):56–60, July 2001. http://www.frvt.org/DLs/FRVT1.pdf.

[2] D. M. Blackburn, J. M. Bone, and P. J. Phillips. Facial Recognition Vendor Test 2000 Evaluation Report. Available online at http://www.frvt.org/FRVT2000/documents.htm, February 2001.

[3] P. J. Phillips, A. Martin, C. L. Wilson, and M. Przybocki. An Introduction to Evaluating Biometric Systems. IEEE Computer, pages 56–63, February 2000. http://www.frvt.org/DLs/FERET7.pdf.

[4] P. J. Phillips, H. M. Moon, S. A. Rizvi, and P. J. Rauss. The FERET Evaluation Methodology for Face Recognition Algorithms. NISTIR 6264, National Institute of Standards and Technology, 1998. http://www.frvt.org/DLs/FERET1.pdf.

[5] S. A. Rizvi, P. J. Phillips, and H. M. Moon. The FERET Verification Testing Protocol for Face Recognition Algorithms. NISTIR 6281, National Institute of Standards and Technology, 1998. http://www.frvt.org/DLs/FERET2.pdf.


Appendix A Volunteer Solicitation Poster


Appendix B Volunteer Consent Form

Informed Consent Form: Face Recognition Queuing Evaluation

In response to the terrorist attacks of 11 September 2001, many federal agencies have been asking the DoD Counterdrug Technology Development Program Office about face recognition technology. The Program Office has been able to answer many of these questions through results obtained from previous efforts. One area where the Program Office has not been able to provide assistance is queuing, which has never been studied. To help answer these questions, the Program Office, with concurrence from the BoD and Base Security, will be conducting a set of experiments in the coming weeks at NAVSEA Dahlgren Division.

Five different face recognition systems will soon be set up at different times at the two main entrances to building 1470. The Program Office needs volunteers to make this evaluation successful. As they enter/exit the buildings, the volunteers would take a few seconds to see if the face recognition systems would correctly identify or verify their identity from a pre-established database that contains the images from NAVSEA Dahlgren ID badges. The accuracy of the results depends on how many volunteers we have and how many times they stop to work with the face recognition system.

The results from this evaluation are vitally important to the counterterrorism efforts of multiple federal agencies. If you are willing to volunteer to be a trial subject, please fill out the form below and return it to individuals in the visitor control area of building 1470. For more information about face recognition technology or previous Program Office efforts, please see http://www.dodcounterdrug.com/facialrecognition. For information about this evaluation, please contact Bob Butler (Contractor) or Duane Blackburn (Deputy Program Manager).

“With my signature below, I hereby volunteer to be a subject in the queuing evaluation of face recognition systems being undertaken by the DoD Counterdrug Technology Development Program Office. I understand that pictures of my face will be taken and compared to images from NAVSEA Dahlgren’s Photo ID badge database using face recognition technology. No relationship to newly acquired images or times of building entrance/exit will be made to my name or other personally identifiable markers. These newly acquired images will be controlled by the Program Office or its appointed government representative and may be released, without identifying markers, to researchers in the face recognition community and published in reports that document the results of this evaluation. I understand that I will not be affected by my decision to participate or withdraw from the research, and may withdraw at any time without penalty. There will be no benefits to the volunteers or any risks in participating other than those normally associated with posing for a picture.”

Name (Printed)

Organization & Phone Number

Prox Badge ID# (found at bottom of NAVSEA Dahlgren ID Badge)


Appendix C Operator Instructions

The following procedures are to be followed by operators during the video recording phase of the evaluation.

C.1 Morning Duties

Synchronize clocks on data logging computers and video recorders

1. Set clocks on all four video recorders to the same time according to manufacturer instructions.

2. Open date/time properties dialog on each data logging computer and synchronize with video recorders.

Ensure video tapes are labeled, in the proper recorder, and cued

1. Label each tape with station type (verification or watchlist), camera number, tape sequence number, and the word “original”.

2. Just before inserting new tape into recorder, label with current date.

Create daily verification video log spreadsheet

1. Open the file FRV_Template.xls.

2. Save file with new name based on current date using the format FRVmmddyy.xls.

3. Fill out general section at the top with date and operator initials.

4. Enter tape sequence number in ’Tape’ column for first entry.

5. Save file.

Create daily watchlist video log spreadsheet

1. Open the file FRW_Template.xls.

2. Save file with new name based on current date using the format FRWmmddyy.xls.

3. Fill out general section at the top with date and operator initials.

4. Enter tape sequence number in ’Tape’ column for first entry.

5. Save file.

Setup Identix enrollment program if enrollment phase is underway

1. Open ’My Computer’ and move to folder C:\ChokepointEvaluation in the Explorer.

2. Click ’Shortcut to VerifyEnrollment.exe’ icon on desktop.

3. Ensure C:\ChokepointEvaluation folder is selected in ’Folder’ field.

4. Click ’New’ button.


Ensure watchlist cameras are properly setup

1. With person at line, all cameras should be adjusted to fit 3–5 average size heads across.

2. Camera 2 should be aimed up to get tall people, camera 3 aimed down to get short people, with fields of view overlapping. Together, cameras 2 and 3 should cover a range of people 5'0" to 6'5" tall while standing at the line. Make sure the range of heights can still be covered with people standing against the wall and a few feet in front of the line.

3. Ensure LED is blinking.

C.2 Verification Video Recording

Record user ID and timecode

1. Activate verification video log spreadsheet.

2. Move to ’Badge Number’ column and type user’s badge number.

3. Move to ’Time Code’ column and type Ctrl-t to enter the current time.

4. Save spreadsheet file.

Adjust camera

1. Adjust camera tilt so face is centered in field of view.

Instruct user

1. Remove hat and tinted glasses.

2. Look at the camera and keep neutral expression throughout process (i.e. no smiling, smirking, or frowning).

3. Slowly move head up and down about 1 inch until recording is finished.

4. Keep eyes on the camera (i.e. move your head, not your eyes).

Record video segment

1. Ensure recorder is in stop mode.

2. Press record.

3. One operator watch time on recorder and press stop after 15 seconds.

4. Other operator watch user to make sure he/she follows instructions for expression and head movement.

5. If user didn’t follow instructions, make a note in ’Comments’ column of spreadsheet. Create new entry for user in ’Badge Number’ and ’Time Code’ columns, and record another video segment.


6. If end of tape is reached during recording, make a note in ’Comments’ column of spreadsheet. Insert next tape in sequence. Create new entry for user in ’Badge Number’ and ’Time Code’ columns. Enter new tape sequence number in ’Tape’ column. Save spreadsheet file. Record another video segment.

Direct user to next process

1. If enrollment phase is underway, continue with enrollment process while user continues standing in place.

2. If enrollment phase is not underway, send user to watchlist station when clear.

C.3 Verification Enrollment

Check for previous enrollment

1. Search for current user’s badge number folder on verification system.

2. If it exists, send user to watchlist station when clear.

Instruct user

1. Look at the camera and keep neutral expression throughout process (i.e. no smiling, smirking, or frowning).

2. Slowly move head up and down about 1 inch until enrollment is finished.

3. Keep eyes on the camera (i.e. move your head, not your eyes).

Enroll using Identix system

1. Adjust camera tilt so face is centered.

2. Type badge number in ’Name’ field.

3. Click ’Enroll’ button.

4. Voice commands will guide user through process.

5. If system indicates enrollment was successful, click ’Save’ button.

6. If system indicates that enrollment failed, start enrollment again.

Direct user to next process

1. Send user to watchlist station if clear.


C.4 Watchlist Video Recording

Adjust camera 1 if necessary

1. Have user stand at line and adjust tilt of camera 1 to get user’s head in center, if necessary.

Record user ID and timecode

1. Activate watchlist video log spreadsheet.

2. Move to ’Badge Number’ column and type user’s badge number.

3. Move to ’Time Code’ column and type Ctrl-t to enter the current time.

4. Save spreadsheet file.

Instruct user

1. Look at LED mounted at camera location and keep neutral expression throughout process (i.e. no smiling, smirking, or frowning).

2. Stand with back against projector room door and wait.

3. When instructed, walk forward to mark.

4. Stop and continue looking at LED until it blinks 3 times.

5. Continue walking toward camera.

6. Step to the left before reaching camera.

Record video segment

1. Ensure all recorders are in stop mode.

2. Press record sequence on remote to start all recorders simultaneously.

3. Instruct user to begin walking.

4. One operator watch user to make sure he/she follows instructions, other operator watch mon­itors to make sure user stays in field of view of cameras.

5. Press stop button on remote to stop all recorders simultaneously when user exits.

6. If user didn’t follow instructions or didn’t stay in field of view, make a note in ’Comments’ column of spreadsheet. Create new entry on spreadsheet for same user. Save spreadsheet file. Record another video segment.

7. If end of tape is reached during recording, make a note in ’Comments’ column of spreadsheet. Insert next tape in sequence for each recorder. Create new entry on spreadsheet for same user. Enter new tape sequence number in ’Tape’ column. Save spreadsheet file. Record another video segment.

Direct user out of evaluation area

1. User is finished and may continue on his/her way.


C.5 Evening Duties

Create backup copies of logging spreadsheet files

1. Insert blank floppy disk into verification logging computer.

2. Copy daily verification video log spreadsheet FRVmmddyy.xls from hard disk to floppy.

3. Insert floppy disk into watchlist logging computer.

4. Copy daily watchlist video log spreadsheet FRWmmddyy.xls from hard disk to floppy.

5. Insert floppy disk into verification computer.

6. Copy all files from floppy to hard disk folder C:\ChokepointLogBackup.

7. Remove tapes from recorders.

8. Secure laptops and tapes in projector room.

9. Shut down vendor systems.

10. Turn off lights, monitors, and recorders.


Appendix D Statistics Overview

D.1 Introduction

When discussing biometric system technical performance, it is first necessary to determine how it will ultimately be used. There are three basic methods currently being discussed in the literature:

• Identification: Closed universe test: Ranks the gallery by similarity to a probe. A probe is correctly identified if the identity of the top rank is the same as the probe.

• Verification: Open universe test: Determines if the claimed identity of a face is correct.

• Watch List: Open universe test: The gallery is the watchlist. Consists of a two stage process. First, determine if the person in a probe is a person in the gallery (on the watchlist). Next, if the probe is someone in the gallery, determine which person it is.

Each method has different statistics to characterize performance. Using statistics from one method to anticipate performance for a different method is technically incorrect, and can lead to significant estimation errors. Sections D.2–D.4 discuss each method and the statistics that are used to characterize performance. The remainder of Section D.1 provides a top-level overview of biometric system operation and performance testing.

D.1.1 Generic Biometric System Operation

In its simplest form, a biometric system operates using a three-step process.

1. A sensor takes an observation. The type of sensor and its observation depend on the type of biometric device used. For face recognition, the sensor is a camera and the observation is a picture, or series of pictures. This observation gives us a “Biometric Signature” of the individual.

2. A computer algorithm “normalizes” the biometric signature so that it is in the same format (size, resolution, view, etc.) as the signatures on the system’s database. The normalization of the biometric signature gives us a “Normalized Signature” of the individual.

3. A matcher compares the normalized signature with the set (or sub-set) of normalized signatures on the system’s database. A measure of similarity (similarity score) or difference (distance measure) is computed for each comparison of normalized signatures. Note: The descriptions used in the remainder of this appendix will use similarity scores rather than distance measures.
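The three steps above can be illustrated with a toy sketch. The vector "signatures", unit-length normalization, and dot-product similarity here are stand-in assumptions chosen for illustration only; a real face recognition system's representations and matching algorithms are proprietary and far more involved.

```python
# Toy illustration of the generic three-step biometric process.
# Signatures are plain feature vectors; this is NOT any vendor's algorithm.
import math

def normalize(signature):
    """Step 2: put a raw biometric signature into a common format --
    here, scale the feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in signature))
    return [x / norm for x in signature]

def similarity(a, b):
    """Similarity score between two normalized signatures
    (dot product; higher means more alike)."""
    return sum(x * y for x, y in zip(a, b))

def match(probe, database):
    """Step 3: compare a normalized probe signature against every
    enrolled signature and return {identity: similarity score}."""
    return {name: similarity(probe, sig) for name, sig in database.items()}
```

For example, enrolling two toy signatures and matching a probe close to the first yields the higher similarity score for that identity; the later sections then differ only in how these scores are ranked and thresholded.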

D.1.2 Evaluation Types

Evaluations of biometric technology are divided into three categories: technology, scenario, and operational. Each category of evaluation takes a different approach and studies different aspects of the technology as it will be used for a specific application. A thorough evaluation of a technology for a specific purpose starts with a technology evaluation, followed by a scenario evaluation and finally an operational evaluation.


The goal of a technology evaluation is to determine the underlying technical capabilities of the systems for a particular technology, in this case facial recognition. The testing is performed in laboratories using a standard set of data that was collected by a universal sensor. Technology evaluations are always completely repeatable. Technology evaluations typically take a short time to complete, depending on the type of technology being evaluated. Results from a technology evaluation typically show specific areas that require future research and development, as well as provide performance data that can be used to select algorithm(s) for scenario evaluations.

Scenario evaluations aim to evaluate the overall capabilities of the entire system for a specific application. In face recognition, a technology evaluation would study only the face recognition algorithms, but a scenario evaluation studies the entire system, including the camera and camera-algorithm interface, in a given application. Each tested system would normally have its own acquisition sensor and would thus receive slightly different data. Scenario evaluations are not always completely repeatable for this reason, but the approach used can always be completely repeatable. Scenario evaluations typically take a few weeks to complete because multiple trials, and for some scenario evaluations, multiple trials of multiple subjects/areas, must be completed. Results from a scenario evaluation typically show areas that require future system integration work, as well as provide performance data on systems as they are used for a specific application.

At first glance, an operational evaluation appears very similar to a scenario evaluation, except that it is performed at the actual site and uses actual subjects. Rather than testing for performance, however, operational evaluations aim to study the workflow impact of specific systems installed for a specific purpose. Operational evaluations are not very repeatable unless the actual operational environment naturally creates repeatable data. Operational evaluations typically last from several weeks to several months, as workflow performance must be measured prior to technology insertion, measured again once the technology is in routine use, and then compared.

In an ideal three-step evaluation process, technology evaluations are performed on all applicable technologies that could conceivably meet requirements. Results from the technology evaluation can be provided to the technical community for future R&D work, while also providing data that end-users can use to select promising systems for application-specific scenario evaluations. Results from the scenario evaluation will enable end-users to find the best system for their specific application and have a good understanding of how it will operate at the proposed location. This performance data, combined with workflow impact data from subsequent operational evaluations, will enable decision makers to develop a solid business case for a large-scale installation.

Although tempting, it is not advisable to attempt to measure technical performance solely from an operational evaluation. First, technology and scenario evaluation results determine which system would be best for a specific application. Without these evaluations, the selection of a system for an operational evaluation would be based on marketing hype and intuition. Second, results from technology and scenario evaluations help end-users determine how to set up a system for optimal performance, both physically and algorithmically. Third, it is very difficult to properly ground-truth all subjects in an operational evaluation, so technical performance data will not be as accurate as in technology and scenario evaluations.

D.2 Identification

The identification method of using biometric technology follows the three-step process outlined in section D.1. It is a closed-universe test. That is, the sensor takes an observation of an individual that


is known to be in the database. That person’s normalized signature is compared to the other normalized signatures in the system’s database and a similarity score is developed for each comparison.

These similarity scores are then numerically ranked so that the highest similarity score is first. In an ideal operation, the highest similarity score is the comparison of that person’s recently acquired normalized signature with that person’s normalized signature in the database. The percentage of times that the highest similarity score is the correct match for all individuals is referred to as the “top match score.”

It is also possible to look at the top five numerically ranked similarity scores to see if any of those five similarity scores is the comparison of that person’s recently acquired normalized signature with that person’s normalized signature in the database. The percentage of times that one of those five similarity scores is the correct match for all individuals is referred to as the “Rank n score”, where n=5 for this example. A plot of rank n scores versus probability of correct identification is called a Cumulative Match Score. An example is given in Figure D-1.
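The rank-n and cumulative match score computation described above can be sketched as follows. The trial record format (true identity plus a map of gallery similarity scores) is a hypothetical choice made for illustration.

```python
# Sketch of a cumulative match score (CMS) computation like Figure D-1.
# Each trial is (true_id, scores): the probe's true identity and its
# similarity score against every gallery identity (illustrative format).

def cumulative_match_scores(trials, max_rank):
    """Return CMS values (percent) for ranks 1..max_rank."""
    ranks = []
    for true_id, scores in trials:
        # Rank gallery identities by similarity, highest first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        ranks.append(ranked.index(true_id) + 1)
    # CMS at rank n: percent of probes whose correct match is at rank <= n.
    return [100.0 * sum(1 for r in ranks if r <= n) / len(trials)
            for n in range(1, max_rank + 1)]
```

The rank-1 value of this list is the "top match score"; plotting the list against rank produces a curve of the shape shown in Figure D-1.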

[Plot: Cumulative Match Score. Probability of identification (%) vs. rank (0–140).]

Figure D-1: Sample cumulative match score (CMS).

It must be noted that there are few real-life applications where a biometric system would be utilized in identification mode. Even so, cumulative match scores are very helpful in determining the impact of various conditions (e.g. lighting, pose, etc.) on face recognition system performance. Knowledge gained from analysis of cumulative match scores will help end-users better select ideal operating conditions and provide stimulus for issues-targeted research. Identification was not tested as part of this evaluation.

D.3 Verification

The verification method of using biometric technology follows the three-step process outlined in section D.1. It is an open-universe test. That is, the sensor takes an observation of an individual that may or may not be in the system database. Through some method, the individual makes a claim as to which normalized signature in the database is theirs. That person’s recently acquired normalized signature is compared to their claimed normalized signature in the system’s database and a similarity score is developed for that comparison. If the numerical value of that similarity score is higher than a preset threshold, the system makes the decision that the individual is who they


claimed to be. If the numerical value of that similarity score is lower than the preset threshold, the system makes the decision that the individual is not who they claimed to be.


It is possible to have two types of errors when operating under the verification method. The first occurs when the individual makes an errant claim as to their identity, but the returned similarity score is higher than the preset threshold. In this case, the system thinks that the individual is who they say they are, even though they are not. This is called a “false alarm” or a “false accept.” The percentage of times that a false alarm/accept occurs across all individuals is called the “false alarm rate” or “false accept rate.” Note - the remainder of this appendix will use the term “false alarm rate.”

The second type of error when operating under the verification method occurs when the individual makes a proper claim as to their identity, but the returned similarity score is lower than the preset threshold. In this case, the system thinks that the individual is not who they say they are, even though they really are. This is called a “false reject.” The percentage of times that a false reject occurs across all individuals is called the “false reject rate”. Subtracting this rate from 100% (100% - false reject rate) gives us the “Probability of Verification”.

The false alarm rate and the probability of verification are not independent. Instead, there is a give-and-take relationship between the two. The system parameters can be changed to achieve a lower false alarm rate, but this also lowers the probability of verification. A plot that shows this relationship is called a “receiver operating characteristic” or “ROC”. An example ROC curve is shown in Figure D-2.

[Figure: sample receiver operating characteristic; x-axis: false alarm rate (%), 0 to 100; y-axis: probability of verification (%)]

Figure D-2: Sample receiver operating characteristic (ROC).

Some resources use a term called the “equal error rate” to show the performance of a biometric system operating under the verification method. The equal error rate is the rate at which the false alarm rate is exactly equal to the false reject rate. If a straight line were drawn on the example ROC curve above from the upper left corner (0% false alarm rate, 100% probability of verification) to the lower right corner (100% false alarm rate, 0% probability of verification), the equal error rate would be the point at which the curve crosses this line. This one point on the curve is not adequate to fully explain the performance of biometric systems being used for verification. This is especially true for real-life applications, as operators prefer to set system parameters to achieve either a low false alarm rate or a high probability of verification.
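
The trade-off underlying the ROC can be sketched numerically. This is an illustrative fragment only (not the evaluation software; the score values and names are invented): sweeping a threshold over genuine-claim and impostor-claim similarity scores produces one (false alarm rate, probability of verification) operating point per threshold.

```python
def verification_rates(genuine, impostor, threshold):
    """Return (probability of verification, false alarm rate) in percent."""
    # A genuine claim is accepted when its score meets the threshold
    p_verify = 100.0 * sum(s >= threshold for s in genuine) / len(genuine)
    # An impostor claim raises a false alarm when its score meets the threshold
    far = 100.0 * sum(s >= threshold for s in impostor) / len(impostor)
    return p_verify, far

genuine = [8.1, 7.9, 7.5, 6.8, 9.0]    # scores from correct identity claims
impostor = [5.2, 6.1, 4.8, 7.0, 5.5]   # scores from errant identity claims
for t in (5.0, 6.5, 7.2):
    pv, far = verification_rates(genuine, impostor, t)
    print(f"threshold {t}: P(verify) {pv:.0f}%, FAR {far:.0f}%")
```

Raising the threshold lowers both the false alarm rate and the probability of verification, which is the give-and-take relationship the ROC curve displays.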


D.4 Watchlist

The watchlist method of using biometric technology follows the three-step process outlined in section D.1. It is an open-universe test. That is, the sensor takes an observation of an individual that may or may not be in the system database. That person’s normalized signature is compared to the other normalized signatures in the system’s database and a similarity score is developed for each comparison. These similarity scores are then numerically ranked so that the highest similarity score is first. Again, the highest similarity score is referred to as “top match.” If a similarity score is higher than a preset threshold, an alarm is provided. If an alarm is provided, the system thinks that the individual is located in the system’s database.

There are three items of interest for watchlist applications. The first is the percentage of times the system alarms and correctly identifies (top match) a person on the watchlist. This is called the “probability of correct ID.” The second is the percentage of times the system alarms when the individual is on the watchlist, but may or may not be the top match. This is called “probability of correct detection.” The third item of interest is the percentage of times the system alarms for an individual that is not on the watchlist. This is called the “probability of false alarm.”
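
The three quantities above can be made concrete with a small sketch. This is illustrative only (not the evaluation software; the data layout and names are invented, and “probability of correct detection” is interpreted here as an above-threshold score for the correct identity at any rank):

```python
def watchlist_rates(attempts, threshold):
    """attempts: list of (true_id_or_None, scores), one per live capture,
    where scores maps watchlist identity -> similarity score and true_id
    is None when the person is not on the watchlist. Assumes the list
    contains both on-watchlist and off-watchlist attempts."""
    on_list = [a for a in attempts if a[0] is not None]
    off_list = [a for a in attempts if a[0] is None]
    correct_id = correct_detect = false_alarm = 0
    for true_id, scores in on_list:
        alarms = {who: s for who, s in scores.items() if s >= threshold}
        if alarms:
            if max(alarms, key=alarms.get) == true_id:
                correct_id += 1        # alarmed AND top match is correct
            if true_id in alarms:
                correct_detect += 1    # alarmed on the right person, any rank
    for _, scores in off_list:
        if any(s >= threshold for s in scores.values()):
            false_alarm += 1           # alarm for someone not on the watchlist
    return (100.0 * correct_id / len(on_list),
            100.0 * correct_detect / len(on_list),
            100.0 * false_alarm / len(off_list))

attempts = [
    ("alice", {"alice": 7.8, "bob": 6.2}),   # correct top match
    ("bob",   {"alice": 7.1, "bob": 6.9}),   # alarms, but wrong top match
    (None,    {"alice": 5.0, "bob": 4.2}),   # below threshold: no alarm
]
print(watchlist_rates(attempts, 6.0))
```

With this toy data and a threshold of 6.0, the probability of correct ID is lower than the probability of correct detection, since one on-watchlist attempt alarms without the correct identity in the top position.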

Two graphs are required to show performance for watchlist applications. The first, called the “watchlist identification characteristic (WIC)”, is a plot of the probability of correct identification versus the probability of false alarm. The second, called the “watchlist alarm characteristic (WAC)”, is a plot of the probability of correct detection versus the probability of false alarm. Example WIC and WAC curves are given in Figures D-3 and D-4.

[Figure: sample watchlist identification characteristic; x-axis: probability of false alarm (%), 0 to 100; y-axis: probability of correct identification (%)]

Figure D-3: Sample watchlist identification characteristic (WIC).


[Figure: sample watchlist alarm characteristic; x-axis: probability of false alarm (%), 0 to 100; y-axis: probability of correct detection (%)]

Figure D-4: Sample watchlist alarm characteristic (WAC).


Appendix E Subject Information

E.1 Introduction

One question that arises when performing this type of evaluation, and showing results, is whether you show results with respect to events or with respect to users. The results shown in the body of this report show results with respect to events. There is an argument in the community that showing results in this manner is not statistically valid because some subjects frequented the system more than others.

One alternative is to calculate performance for each subject, and then to show performance averaged over subjects. Others have argued that this approach is also statistically invalid because you do not obtain a realistic assessment of the number of anticipated alarms. This approach also has another unanswered question: How many trials by each subject must be made to obtain statistical relevance for averaging purposes?

In an ideal evaluation, data would be available for a few thousand individuals, each making the same large number of attempts. Unfortunately, this is extremely difficult and costly to achieve.

Seeing no clear answer to this dilemma, and having a limited budget for acquiring subject trials, the authors chose to report results with respect to events while also providing information about subject trials in this appendix. The authors’ hope is that the information provided in this report will serve as a basis for further discussion and research in this topic area that can be used for future evaluations.
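
The difference between the two reporting approaches can be illustrated with a toy example (invented numbers, not data from this evaluation): a subject who attempts often and succeeds often pulls the event-weighted rate away from the subject-averaged rate.

```python
def event_rate(trials):
    """trials: list of (subject_id, success_bool). Rate over all events."""
    return 100.0 * sum(ok for _, ok in trials) / len(trials)

def subject_averaged_rate(trials):
    """Compute each subject's own rate first, then average over subjects."""
    per_subject = {}
    for who, ok in trials:
        per_subject.setdefault(who, []).append(ok)
    rates = [100.0 * sum(r) / len(r) for r in per_subject.values()]
    return sum(rates) / len(rates)

# Subject "a": 10 attempts, 90% success; subject "b": 2 attempts, 50% success
trials = [("a", True)] * 9 + [("a", False)] + [("b", False), ("b", True)]
print(event_rate(trials), subject_averaged_rate(trials))
```

Here the event-weighted rate is roughly 83% while the subject-averaged rate is 70%, which is the discrepancy at the heart of the debate described above.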

E.2 Subject Attempt Information

The first item of interest is the number of times each volunteer was recorded for verification and watchlist attempts. This is best described via histograms in Figures E-1 and E-2.
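
Tallying attempts per volunteer, as plotted in Figures E-1 and E-2, amounts to a simple count per subject. This is an illustrative sketch with invented volunteer IDs, not the evaluation software:

```python
from collections import Counter

def attempt_histogram(event_log):
    """event_log: list of volunteer IDs, one entry per recorded attempt.
    Returns a mapping volunteer -> attempt count, the quantity binned
    in the Figure E-1/E-2 histograms."""
    return Counter(event_log)

log = ["v1", "v2", "v1", "v3", "v1", "v2"]
print(sorted(attempt_histogram(log).items()))
```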

[Histogram: 1624 verification attempts by 142 volunteers; x-axis: volunteers, y-axis: attempts (0 to 50)]

Figure E-1: Histogram showing the number of verification attempts made by each volunteer.


[Histogram: 1518 watchlist attempts by 144 volunteers; x-axis: volunteers, y-axis: attempts (0 to 50)]

Figure E-2: Histogram showing the number of watchlist attempts made by each volunteer.

The second item of interest is the mean and standard deviation of the results returned by the system for each volunteer. These are shown in Tables E-1 through E-7 for POWL and non-POWL watchlist attempts. A score may not have been returned by the systems if no scores above the threshold were generated. For testing purposes, the threshold was set to 6.0 for FaceIt Surveillance and 5.0 for Argus. The reported mean and standard deviation include only attempts where a score was returned.
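
The per-user columns in Tables E-1 through E-7 can be reproduced from raw attempt scores along the following lines. This is an illustrative sketch, not the evaluation software; in particular, the report does not state whether the population or sample standard deviation was used, so the population form is assumed here:

```python
import math

def returned_score_stats(scores, threshold):
    """Mirror the table columns: total attempts, attempts returning a score
    (i.e., meeting the threshold), and the mean/std of returned scores only."""
    returned = [s for s in scores if s >= threshold]
    n = len(returned)
    if n == 0:
        return len(scores), 0, 0.0, 0.0   # tables show 0.0 when nothing returned
    mean = sum(returned) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in returned) / n)
    return len(scores), n, round(mean, 1), round(std, 1)

# Threshold 6.0 was the setting used for FaceIt Surveillance in this evaluation
print(returned_score_stats([7.2, 6.8, 5.9, 7.5], 6.0))
```

For the invented scores above, four attempts were made, three returned a score, and the mean and standard deviation are computed over those three only, exactly as in the tables.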


Table E-1: User data for Identix FaceIt Surveillance with watchlist size 100

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 7.0 0.0 1 7 7 7.3 0.3 2 11 11 6.9 0.2 2 14 14 7.5 0.2 3 16 16 7.5 0.1 3 29 29 7.5 0.1 4 28 28 6.9 0.2 4 6 6 7.2 0.1 5 28 28 7.5 0.2 5 28 28 7.6 0.2 6 4 4 7.3 0.2 6 9 9 7.5 0.2 7 3 3 7.1 0.3 7 32 32 7.4 0.2 8 7 7 7.9 0.1 8 23 23 7.6 0.1 9 2 2 7.3 0.3 9 22 22 7.7 0.1

10 6 6 7.6 0.1 10 7 7 7.2 0.1 11 8 8 6.5 0.2 11 1 1 7.0 0.0 12 17 17 7.3 0.2 12 10 6 7.5 0.2 13 31 31 7.6 0.2 13 7 7 7.5 0.1 14 19 19 7.1 0.1 14 2 2 7.2 0.1 15 10 10 7.3 0.1 15 19 19 7.4 0.2 16 19 19 8.0 0.2 16 23 23 7.4 0.1 17 21 21 7.3 0.2 17 9 9 6.8 0.3 18 8 8 7.3 0.2 18 2 2 7.3 0.1 19 7 7 7.0 0.3 19 9 9 7.4 0.2 20 12 12 7.3 0.1 20 5 5 7.6 0.1 21 14 14 8.0 0.2 21 2 2 6.9 0.2 22 14 14 7.5 0.1 22 5 2 7.8 0.0 23 4 4 7.9 0.1 23 6 6 7.2 0.2 24 3 3 7.6 0.1 24 1 1 7.1 0.0 25 6 5 7.5 0.2 25 11 11 7.5 0.1 26 12 12 7.6 0.1 26 3 3 7.0 0.2 27 15 15 7.6 0.1 27 9 9 7.0 0.2 28 6 6 7.5 0.1 28 7 7 7.4 0.2 29 20 20 7.4 0.2 29 1 1 7.6 0.0 30 7 7 7.8 0.1 30 3 3 7.5 0.1 31 3 3 7.7 0.2 31 21 21 7.3 0.2 32 12 12 7.6 0.1 32 20 20 7.7 0.1 33 3 3 7.1 0.2 33 16 16 7.4 0.1 34 19 15 7.2 0.3 34 2 2 7.4 0.1 35 1 1 7.2 0.0 35 19 19 7.6 0.2 36 5 5 6.9 0.3 36 6 6 7.3 0.1 37 15 15 7.2 0.1 37 1 1 7.6 0.0 38 12 12 7.4 0.1 38 4 4 7.7 0.1 39 18 18 7.1 0.2 39 5 5 7.2 0.4 40 21 21 7.2 0.2 40 2 2 7.7 0.2 41 18 18 7.4 0.2 41 9 9 7.1 0.2 42 10 10 6.8 0.3 42 13 13 7.4 0.1 43 8 8 7.9 0.2 43 4 4 7.2 0.3 44 2 2 7.5 0.1 44 10 10 7.3 0.1 45 17 17 7.3 0.2 45 12 11 6.6 0.2 46 20 19 6.8 0.4 46 16 16 7.5 0.1 47 20 20 7.3 0.1 47 4 4 7.6 0.1 48 9 9 7.7 0.2 48 30 29 7.4 0.2 49 6 6 7.8 0.2 49 1 1 6.9 0.0 50 18 17 7.7 0.2 50 4 4 7.4 0.1 51 14 14 7.8 0.2 51 1 1 7.7 0.0 52 14 14 8.2 0.2 52 1 1 7.6 0.0 53 11 11 7.8 0.2 53 7 7 7.9 0.1 54 23 23 7.4 0.1 54 3 3 7.4 0.1 55 6 6 7.3 0.2 55 3 3 6.8 0.2 56 5 5 7.3 0.0 56 15 15 7.7 0.2 57 4 4 7.6 0.1 57 2 2 7.5 0.2 58 7 7 7.0 0.2 58 4 4 7.5 0.1 59 4 4 8.0 0.2 59 1 1 7.5 0.0 60 42 42 7.5 0.1 60 8 8 7.4 0.2 61 4 4 7.7 0.1 61 17 17 7.3 0.1 62 10 10 7.5 0.1 62 3 3 7.0 0.1 63 6 6 7.6 0.1 63 5 5 7.0 0.3 64 19 19 7.3 0.1 64 13 13 7.3 0.2 65 7 7 7.1 0.3 65 11 11 7.8 0.1 66 10 10 7.4 0.2 66 8 8 7.3 0.2 67 2 2 7.9 0.1 67 39 38 7.9 0.1 68 21 21 7.4 0.1 68 26 26 7.6 0.2 69 3 3 7.6 0.1 69 8 8 7.6 0.2

70 8 8 7.3 0.2 71 2 2 6.7 0.4 72 19 19 7.1 0.2 73 1 1 6.9 0.0 74 2 2 7.7 0.1 75 2 2 7.7 0.1



Table E-2: User data for Identix FaceIt Surveillance with watchlist size 400

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 7.1 0.0 1 7 7 7.2 0.2 2 11 11 7.0 0.2 2 14 14 7.6 0.2 3 16 16 7.6 0.1 3 29 29 7.5 0.1 4 28 28 7.0 0.3 4 6 6 7.2 0.1 5 28 28 7.6 0.2 5 28 28 7.7 0.2 6 4 4 7.4 0.2 6 9 9 7.6 0.1 7 3 3 7.3 0.1 7 32 32 7.5 0.2 8 7 7 8.2 0.1 8 23 23 7.8 0.1 9 2 2 7.4 0.2 9 22 22 7.9 0.1

10 6 6 7.8 0.1 10 7 7 7.3 0.1 11 8 8 6.8 0.2 11 1 1 7.1 0.0 12 17 17 7.4 0.2 12 10 7 7.5 0.2 13 31 31 7.7 0.1 13 7 7 7.6 0.2 14 19 19 7.2 0.1 14 2 2 7.1 0.1 15 10 10 7.4 0.1 15 19 19 7.5 0.1 16 19 19 8.0 0.2 16 23 23 7.7 0.1 17 21 21 7.6 0.2 17 9 9 6.8 0.3 18 8 8 7.4 0.2 18 2 2 7.6 0.2 19 7 7 7.3 0.3 19 9 9 7.4 0.2 20 12 12 7.3 0.1 20 5 5 7.8 0.1 21 14 14 8.0 0.2 21 2 2 7.1 0.1 22 14 14 7.7 0.1 22 5 2 7.9 0.2 23 4 4 8.2 0.1 23 6 6 7.2 0.2 24 3 3 7.6 0.1 24 1 1 7.6 0.0 25 6 6 7.7 0.1 25 11 11 7.5 0.2 26 12 12 7.8 0.1 26 3 3 7.1 0.2 27 15 15 7.6 0.1 27 9 9 7.1 0.1 28 6 6 7.7 0.1 28 7 7 7.7 0.2 29 20 20 7.7 0.2 29 1 1 8.0 0.0 30 7 7 7.9 0.2 30 3 3 7.7 0.1 31 3 3 7.7 0.1 31 21 21 7.5 0.2 32 12 12 7.6 0.1 32 20 20 7.9 0.1 33 3 3 7.2 0.2 33 16 16 7.5 0.1 34 19 16 7.3 0.1 34 2 2 7.7 0.0 35 1 1 7.6 0.0 35 19 19 7.7 0.2 36 5 5 6.9 0.2 36 6 6 7.4 0.1 37 15 15 7.4 0.2 37 1 1 7.9 0.0 38 12 12 7.5 0.1 38 4 4 7.8 0.1 39 18 18 7.2 0.2 39 5 5 7.3 0.1 40 21 21 7.5 0.2 40 2 2 8.0 0.1 41 18 18 7.5 0.1 41 9 9 7.4 0.2 42 10 10 6.9 0.2 42 13 13 7.6 0.1 43 8 8 8.2 0.2 43 4 4 7.5 0.4 44 2 2 7.9 0.1 44 10 10 7.4 0.1 45 17 17 7.3 0.2 45 12 11 6.8 0.3 46 20 19 6.9 0.2 46 16 16 7.6 0.1 47 20 20 7.4 0.1 47 4 4 7.6 0.1 48 9 9 7.8 0.1 48 30 29 7.6 0.2 49 6 6 7.9 0.2 49 1 1 7.1 0.0 50 18 18 7.8 0.4 50 4 4 7.5 0.1 51 14 14 7.9 0.2 51 1 1 7.7 0.0 52 14 14 8.1 0.2 52 1 1 7.6 0.0 53 11 11 7.8 0.1 53 7 7 7.9 0.1 54 23 23 7.7 0.1 54 3 3 7.5 0.1 55 6 6 7.3 0.3 55 3 3 7.0 0.3 56 5 5 7.3 0.1 56 15 15 7.8 0.2 57 4 4 7.7 0.1 57 2 2 7.8 0.5 58 7 7 7.2 0.2 58 4 4 7.5 0.1 59 4 4 8.0 0.1 59 1 1 7.7 0.0 60 42 42 7.6 0.1 60 8 8 7.6 0.2 61 4 4 7.7 0.0 61 17 17 7.4 0.1 62 10 10 7.7 0.2 62 3 3 7.2 0.1 63 6 6 7.7 0.1 63 5 5 7.1 0.1 64 19 19 7.6 0.1 64 13 13 7.5 0.2 65 7 7 7.3 0.4 65 11 11 7.9 0.2 66 10 10 7.5 0.2 66 8 8 7.5 0.2 67 2 2 8.1 0.1 67 39 39 8.1 0.1 68 21 21 7.7 0.2 68 26 26 7.7 0.2 69 3 3 7.7 0.2 69 8 8 7.7 0.1

70 8 8 7.5 0.1 71 2 2 6.7 0.3 72 19 19 7.4 0.2 73 1 1 6.9 0.0 74 2 2 7.8 0.3 75 2 2 7.8 0.1



Table E-3: User data for Identix FaceIt Surveillance with watchlist size 1575

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 7.4 0.0 1 7 7 7.3 0.1 2 11 11 7.1 0.1 2 14 14 7.7 0.2 3 16 16 7.7 0.1 3 29 29 7.6 0.2 4 28 28 7.1 0.2 4 6 6 7.4 0.1 5 28 28 7.7 0.2 5 28 28 7.9 0.2 6 4 4 7.6 0.2 6 9 9 7.8 0.1 7 3 3 7.5 0.2 7 32 32 7.7 0.1 8 7 7 8.2 0.1 8 23 23 7.8 0.2 9 2 2 7.5 0.4 9 22 22 8.0 0.1

10 6 6 7.9 0.1 10 7 7 7.4 0.1 11 8 8 7.0 0.2 11 1 1 7.2 0.0 12 17 17 7.6 0.2 12 10 7 7.6 0.2 13 31 31 7.8 0.1 13 7 7 7.7 0.1 14 19 19 7.5 0.2 14 2 2 7.3 0.2 15 10 10 7.6 0.1 15 19 19 7.7 0.2 16 19 19 8.0 0.1 16 23 23 7.9 0.1 17 21 21 7.6 0.1 17 9 9 7.2 0.1 18 8 8 7.6 0.1 18 2 2 7.6 0.1 19 7 7 7.4 0.1 19 9 9 7.6 0.2 20 12 12 7.5 0.2 20 5 5 7.9 0.1 21 14 14 8.1 0.1 21 2 2 7.4 0.1 22 14 14 7.7 0.2 22 5 1 8.2 0.0 23 4 4 8.1 0.2 23 6 6 7.3 0.2 24 3 3 7.8 0.2 24 1 1 7.3 0.0 25 6 6 7.7 0.2 25 11 11 7.7 0.1 26 12 12 8.0 0.1 26 3 3 7.4 0.1 27 15 15 7.8 0.1 27 9 9 7.3 0.2 28 6 6 7.8 0.1 28 7 7 7.9 0.1 29 20 20 7.7 0.2 29 1 1 8.0 0.0 30 7 7 7.9 0.2 30 3 3 7.7 0.2 31 3 3 7.8 0.1 31 21 21 7.5 0.2 32 12 12 7.7 0.2 32 20 20 7.9 0.1 33 3 3 7.1 0.1 33 16 16 7.5 0.1 34 19 15 7.5 0.2 34 2 2 7.8 0.1 35 1 1 7.2 0.0 35 19 19 7.8 0.2 36 5 5 7.2 0.2 36 6 6 7.6 0.2 37 15 15 7.5 0.2 37 1 1 8.1 0.0 38 12 12 7.6 0.1 38 4 4 8.1 0.1 39 18 18 7.3 0.1 39 5 5 7.4 0.1 40 21 21 7.5 0.1 40 2 2 8.0 0.1 41 18 18 7.7 0.1 41 9 9 7.5 0.1 42 10 10 7.2 0.2 42 13 13 7.6 0.1 43 8 8 8.2 0.2 43 4 4 7.6 0.4 44 2 2 7.9 0.1 44 10 10 7.4 0.1 45 17 17 7.4 0.1 45 12 11 7.0 0.2 46 20 20 7.0 0.2 46 16 16 7.7 0.1 47 20 20 7.5 0.1 47 4 4 7.9 0.4 48 9 9 7.9 0.2 48 30 29 7.6 0.1 49 6 6 7.8 0.2 49 1 1 7.4 0.0 50 18 18 7.8 0.1 50 4 4 7.6 0.2 51 14 14 7.9 0.2 51 1 1 7.8 0.0 52 14 14 8.1 0.2 52 1 1 7.9 0.0 53 11 11 7.9 0.2 53 7 7 8.1 0.2 54 23 23 7.9 0.1 54 3 3 7.5 0.2 55 6 6 7.4 0.1 55 3 3 7.1 0.3 56 5 5 7.4 0.2 56 15 15 7.8 0.3 57 4 4 7.7 0.1 57 2 2 8.0 0.1 58 7 7 7.4 0.2 58 4 4 7.7 0.0 59 4 4 8.1 0.1 59 1 1 7.6 0.0 60 42 42 7.7 0.2 60 8 8 7.7 0.1 61 4 4 7.8 0.0 61 17 17 7.8 0.1 62 10 10 7.7 0.1 62 3 3 7.3 0.2 63 6 6 7.7 0.1 63 5 5 7.2 0.2 64 19 19 7.7 0.1 64 13 13 7.6 0.2 65 7 7 7.4 0.2 65 11 11 7.9 0.1 66 10 10 7.5 0.2 66 8 8 7.6 0.2 67 2 2 8.1 0.1 67 39 39 8.2 0.1 68 21 21 7.8 0.2 68 26 26 7.8 0.1 69 3 3 7.7 0.2 69 8 8 7.8 0.1

70 8 8 7.5 0.2 71 2 2 7.0 0.1 72 19 19 7.5 0.2 73 1 1 7.2 0.0 74 2 2 7.8 0.3 75 2 2 8.1 0.1



Table E-4: User data for Identix Argus with watchlist size 100

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 6.1 0.0 1 7 2 5.0 0.0 2 11 10 5.9 0.4 2 14 2 5.2 0.1 3 16 15 5.7 0.6 3 29 29 5.9 0.5 4 28 26 5.8 0.5 4 6 6 5.6 0.5 5 28 21 5.5 0.3 5 28 14 5.3 0.2 6 4 3 5.5 0.4 6 9 1 5.5 0.0 7 3 2 5.3 0.3 7 32 19 5.2 0.2 8 7 7 5.5 0.3 8 23 11 5.3 0.3 9 2 2 5.3 0.2 9 22 22 6.0 0.5

10 6 6 5.6 0.5 10 7 2 5.5 0.5 11 8 5 6.2 0.8 11 1 0 0.0 0.0 12 17 15 5.5 0.3 12 10 7 5.9 0.3 13 31 28 5.5 0.3 13 7 3 5.4 0.4 14 19 10 5.4 0.2 14 2 2 6.1 0.4 15 10 1 5.7 0.0 15 19 7 5.6 0.4 16 19 19 6.0 0.4 16 23 15 5.6 0.3 17 21 10 5.3 0.1 17 9 9 5.6 0.2 18 8 8 6.8 0.6 18 2 2 6.1 0.4 19 7 4 5.2 0.2 19 9 9 5.7 0.4 20 12 3 5.2 0.1 20 5 3 5.8 0.3 21 14 14 7.5 0.6 21 2 2 5.8 0.1 22 14 9 5.3 0.2 22 5 0 0.0 0.0 23 4 3 5.7 0.7 23 6 4 5.4 0.3 24 3 3 5.5 0.3 24 1 1 5.8 0.0 25 6 6 6.2 0.7 25 11 11 5.7 0.3 26 12 12 5.6 0.4 26 3 1 5.4 0.0 27 15 10 5.7 0.7 27 9 2 5.4 0.4 28 6 4 5.4 0.1 28 7 5 5.8 0.4 29 20 18 5.7 0.5 29 1 1 6.0 0.0 30 7 7 6.4 0.6 30 3 3 5.8 0.6 31 3 3 6.1 0.7 31 21 20 6.0 0.4 32 12 11 5.6 1.2 32 20 18 5.6 0.5 33 3 3 5.7 0.2 33 16 8 5.4 0.3 34 19 13 5.3 0.2 34 2 1 5.4 0.0 35 1 0 0.0 0.0 35 19 7 5.1 0.1 36 5 0 0.0 0.0 36 6 5 5.7 0.3 37 15 5 5.2 0.2 37 1 1 5.5 0.0 38 12 9 5.8 0.3 38 4 4 6.4 0.2 39 18 18 7.0 0.9 39 5 1 5.6 0.0 40 21 21 6.8 0.6 40 2 1 5.1 0.0 41 18 8 5.6 0.5 41 9 8 5.3 0.3 42 10 3 5.2 0.1 42 13 13 5.6 0.3 43 8 8 6.5 0.5 43 4 3 5.7 0.3 44 2 2 6.1 0.2 44 10 7 5.2 0.2 45 17 3 6.2 0.6 45 12 0 0.0 0.0 46 20 6 5.4 0.5 46 16 13 5.5 0.3 47 20 4 5.4 0.1 47 4 4 5.7 0.6 48 9 9 5.8 0.7 48 30 3 5.1 0.2 49 6 6 6.2 0.9 49 1 0 0.0 0.0 50 18 11 5.5 0.4 50 4 3 5.7 0.4 51 14 13 5.6 0.5 51 1 1 6.2 0.0 52 14 14 8.3 0.3 52 1 1 5.5 0.0 53 11 11 6.5 0.5 53 7 7 6.0 0.4 54 23 6 5.3 0.2 54 3 1 5.4 0.0 55 6 1 5.6 0.0 55 3 1 5.1 0.0 56 5 3 5.4 0.2 56 15 12 5.8 0.6 57 4 3 5.6 0.4 57 2 2 5.1 0.0 58 7 2 5.2 0.2 58 4 3 5.8 0.4 59 4 4 7.3 0.2 59 1 1 5.5 0.0 60 42 41 6.1 0.7 60 8 4 5.6 0.3 61 4 4 7.5 0.9 61 17 17 5.9 0.5 62 10 8 5.7 0.3 62 3 0 0.0 0.0 63 6 6 5.8 0.5 63 5 0 0.0 0.0 64 19 1 5.2 0.0 64 13 4 5.2 0.2 65 7 4 5.3 0.2 65 11 9 5.3 0.3 66 10 9 6.1 0.7 66 8 4 5.4 0.4 67 2 2 6.8 0.6 67 39 24 5.4 0.3 68 21 8 5.3 0.2 68 26 25 5.9 0.5 69 3 3 6.0 0.4 69 8 8 5.4 0.3

70 8 6 5.4 0.4 71 2 1 5.4 0.0 72 19 18 6.2 0.5 73 1 1 5.3 0.0 74 2 1 5.4 0.0 75 2 1 5.3 0.0



Table E-5: User data for Identix Argus with watchlist size 400

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 6.9 0.0 1 7 7 5.8 0.3 2 11 11 6.4 0.3 2 14 13 5.8 0.3 3 16 16 6.3 0.4 3 29 29 6.7 0.5 4 28 28 6.6 0.4 4 6 6 6.2 0.1 5 28 28 6.1 0.5 5 28 28 5.8 0.4 6 4 4 6.7 0.5 6 9 8 6.0 0.4 7 3 3 6.0 0.2 7 32 32 5.9 0.3 8 7 7 6.4 0.5 8 23 21 6.1 0.4 9 2 1 6.0 0.0 9 22 22 6.1 0.4

10 6 6 6.0 0.4 10 7 4 6.2 0.4 11 8 7 6.2 0.5 11 1 0 0.0 0.0 12 17 17 6.2 0.3 12 10 7 6.4 0.2 13 31 28 6.1 0.4 13 7 4 5.7 0.2 14 19 18 6.1 0.5 14 2 2 6.3 0.6 15 10 2 6.4 0.8 15 19 19 6.0 0.4 16 19 19 6.6 0.4 16 23 23 6.1 0.4 17 21 20 5.9 0.3 17 9 9 6.3 0.2 18 8 7 6.6 0.3 18 2 2 6.5 0.6 19 7 7 6.5 0.4 19 9 9 6.2 0.4 20 12 12 6.1 0.4 20 5 4 6.1 0.2 21 14 14 7.7 0.5 21 2 2 6.6 0.6 22 14 13 6.2 0.2 22 5 1 6.2 0.0 23 4 4 6.2 0.2 23 6 6 6.2 0.3 24 3 3 6.0 0.1 24 1 1 6.2 0.0 25 6 6 6.7 0.4 25 11 11 6.6 0.7 26 12 12 6.4 0.5 26 3 3 6.2 0.8 27 15 10 5.9 0.4 27 9 8 6.0 0.3 28 6 6 6.1 0.3 28 7 6 6.2 0.3 29 20 20 6.1 0.4 29 1 1 6.6 0.0 30 7 7 6.6 0.1 30 3 3 6.3 0.2 31 3 3 6.4 0.6 31 21 21 6.6 0.3 32 12 12 6.5 0.3 32 20 20 6.3 0.3 33 3 3 6.6 0.6 33 16 15 6.3 0.3 34 19 18 6.0 0.2 34 2 1 6.4 0.0 35 1 1 6.1 0.0 35 19 17 5.9 0.4 36 5 2 6.3 0.4 36 6 6 6.2 0.2 37 15 13 6.1 0.4 37 1 1 7.0 0.0 38 12 11 6.1 0.3 38 4 4 6.4 0.3 39 18 18 7.0 0.8 39 5 3 5.7 0.6 40 21 21 7.1 0.5 40 2 2 6.6 0.4 41 18 15 6.3 0.5 41 9 9 6.1 0.3 42 10 2 5.9 0.1 42 13 13 6.5 0.4 43 8 8 6.5 0.4 43 4 3 6.2 0.1 44 2 2 6.3 0.1 44 10 9 5.9 0.3 45 17 12 5.9 0.4 45 12 0 0.0 0.0 46 20 13 6.0 0.4 46 16 16 6.4 0.3 47 20 18 5.8 0.3 47 4 4 6.1 0.2 48 9 9 6.2 0.2 48 30 21 5.8 0.4 49 6 6 6.7 0.5 49 1 1 6.2 0.0 50 18 15 5.9 0.3 50 4 2 6.5 0.1 51 14 13 6.2 0.4 51 1 1 6.6 0.0 52 14 14 8.5 0.3 52 1 1 6.2 0.0 53 11 11 6.7 0.2 53 7 7 6.2 0.4 54 23 23 6.1 0.3 54 3 3 5.8 0.3 55 6 3 6.5 0.6 55 3 3 6.4 0.8 56 5 5 6.3 0.2 56 15 15 6.1 0.3 57 4 4 6.1 0.2 57 2 2 6.5 0.6 58 7 3 6.0 0.0 58 4 4 6.1 0.3 59 4 4 7.4 0.1 59 1 1 6.3 0.0 60 42 42 6.4 0.6 60 8 8 6.2 0.2 61 4 4 7.5 0.4 61 17 17 6.6 0.3 62 10 10 6.3 0.3 62 3 3 6.3 0.4 63 6 6 6.7 0.3 63 5 5 6.7 0.3 64 19 18 6.0 0.4 64 13 13 6.1 0.4 65 7 7 6.5 0.3 65 11 11 6.1 0.5 66 10 10 6.6 0.5 66 8 8 6.2 0.4 67 2 2 6.4 0.6 67 39 38 6.1 0.3 68 21 21 5.9 0.3 68 26 26 6.2 0.3 69 3 3 6.4 0.7 69 8 8 7.1 0.3

70 8 7 6.4 0.4 71 2 2 5.8 0.6 72 19 19 6.5 0.4 73 1 1 6.1 0.0 74 2 2 6.1 0.1 75 2 2 6.8 0.6



Table E-6: User data for Identix Argus with watchlist size 1575

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 7.4 0.0 1 7 7 6.6 0.2 2 11 10 7.1 0.3 2 14 10 6.8 0.3 3 16 16 6.7 0.2 3 29 29 7.4 0.4 4 28 28 7.1 0.3 4 6 6 7.0 0.2 5 28 28 6.9 0.4 5 28 26 6.7 0.3 6 4 4 7.0 0.3 6 9 8 7.0 0.3 7 3 3 6.7 0.3 7 32 28 6.8 0.3 8 7 5 6.9 0.5 8 23 19 6.9 0.2 9 2 2 6.4 0.0 9 22 22 6.7 0.3

10 6 6 6.7 0.2 10 7 4 6.7 0.4 11 8 6 7.1 0.2 11 1 1 6.8 0.0 12 17 17 6.9 0.2 12 10 6 7.1 0.3 13 31 23 6.7 0.3 13 7 5 6.6 0.3 14 19 19 6.8 0.3 14 2 2 7.3 0.8 15 10 5 6.8 0.3 15 19 19 6.9 0.3 16 19 18 7.4 0.4 16 23 23 6.9 0.5 17 21 21 6.7 0.2 17 9 7 7.0 0.3 18 8 8 6.9 0.3 18 2 2 7.3 0.6 19 7 5 7.3 0.3 19 9 9 7.3 0.5 20 12 12 6.8 0.4 20 5 5 7.1 0.2 21 14 14 7.6 0.4 21 2 2 7.3 0.3 22 14 14 7.0 0.4 22 5 1 6.5 0.0 23 4 4 6.9 0.4 23 6 5 6.6 0.2 24 3 3 6.9 0.2 24 1 0 0.0 0.0 25 6 6 7.4 0.4 25 11 11 6.9 0.3 26 12 12 6.9 0.4 26 3 3 6.7 0.4 27 15 15 7.3 0.3 27 9 9 6.9 0.4 28 6 6 7.1 0.3 28 7 7 6.9 0.2 29 20 20 7.0 0.3 29 1 1 7.1 0.0 30 7 7 6.9 0.1 30 3 3 6.7 0.2 31 3 3 7.2 0.1 31 21 21 7.0 0.4 32 12 12 7.0 0.3 32 20 20 6.9 0.3 33 3 3 7.0 0.4 33 16 16 6.9 0.3 34 19 19 7.5 0.3 34 2 2 7.3 0.5 35 1 1 6.8 0.0 35 19 16 6.7 0.3 36 5 1 7.3 0.0 36 6 6 7.0 0.2 37 15 14 6.9 0.3 37 1 0 0.0 0.0 38 12 11 6.7 0.3 38 4 4 6.8 0.2 39 18 18 7.2 0.3 39 5 4 7.0 0.1 40 21 21 7.5 0.5 40 2 2 7.1 0.2 41 18 13 7.1 0.4 41 9 9 7.1 0.4 42 10 4 6.7 0.2 42 13 13 7.0 0.3 43 8 6 7.0 0.2 43 4 3 7.4 0.3 44 2 2 6.7 0.3 44 10 8 6.7 0.2 45 17 13 6.7 0.4 45 12 0 0.0 0.0 46 20 19 7.0 0.3 46 16 16 7.0 0.3 47 20 20 6.8 0.3 47 4 4 6.9 0.3 48 9 9 7.1 0.4 48 30 21 6.7 0.4 49 6 6 7.0 0.3 49 1 1 6.6 0.0 50 18 15 6.8 0.3 50 4 4 6.7 0.3 51 14 14 7.1 0.3 51 1 1 7.2 0.0 52 14 14 8.7 0.2 52 1 1 7.3 0.0 53 11 11 7.3 0.3 53 7 7 7.0 0.2 54 23 20 6.9 0.4 54 3 3 7.0 0.6 55 6 2 6.6 0.2 55 3 2 6.9 0.1 56 5 5 6.7 0.2 56 15 15 6.9 0.3 57 4 4 7.0 0.3 57 2 2 6.8 0.4 58 7 3 7.0 0.2 58 4 3 7.1 0.6 59 4 4 7.3 0.4 59 1 1 6.6 0.0 60 42 42 7.2 0.3 60 8 8 6.9 0.3 61 4 4 7.9 0.3 61 17 17 7.1 0.3 62 10 9 6.9 0.3 62 3 3 7.1 0.3 63 6 6 7.1 0.4 63 5 5 7.3 0.5 64 19 18 7.1 0.3 64 13 12 7.1 0.2 65 7 7 7.2 0.3 65 11 11 7.3 0.4 66 10 10 7.0 0.3 66 8 8 7.0 0.4 67 2 2 7.4 0.1 67 39 24 6.8 0.3 68 21 21 6.8 0.2 68 26 26 6.8 0.3 69 3 3 7.2 0.7 69 8 8 7.2 0.2

70 8 7 7.2 0.2 71 2 2 7.2 0.4 72 19 19 7.1 0.3 73 1 1 6.8 0.0 74 2 2 7.3 0.4 75 2 2 7.2 0.3



Table E-7: User data for Identix Argus with watchlist size 100 (new images)

Columns, repeated for the POWL (left) and non-POWL (right) half of each row: User | Total Attempts | Attempts Returning Score | Mean of Returned Scores | Std. Dev. of Returned Scores

1 1 1 9.4 0.0 1 7 6 5.4 0.3 2 11 10 6.8 1.3 2 14 6 5.3 0.3 3 16 13 6.2 0.4 3 29 20 5.4 0.4 4 28 28 7.0 1.0 4 6 5 6.2 0.4 5 28 28 9.0 0.9 5 28 24 5.8 0.5 6 4 4 6.3 1.1 6 9 8 5.6 0.3 7 3 3 6.2 0.8 7 32 27 5.4 0.4 8 7 7 6.2 0.9 8 23 21 5.7 0.4 9 2 2 6.3 0.2 9 22 21 5.7 0.4

10 6 6 9.0 0.1 10 7 6 5.8 0.4 11 8 8 6.8 0.4 11 1 1 5.5 0.0 12 17 17 6.3 0.6 12 10 8 6.9 0.9 13 31 31 7.9 0.9 13 7 4 5.5 0.5 14 19 17 6.0 0.4 14 2 2 5.9 0.4 15 10 6 5.3 0.1 15 19 13 5.3 0.2 16 19 17 6.4 0.6 16 23 21 5.6 0.4 17 21 21 7.5 0.5 17 9 9 6.3 0.5 18 8 8 9.5 0.3 18 2 2 6.1 0.4 19 7 4 6.0 0.5 19 9 3 5.5 0.4 20 12 6 5.5 0.3 20 5 5 6.8 0.4 21 14 14 9.1 0.6 21 2 2 6.0 0.8 22 14 14 6.5 0.5 22 5 3 6.8 0.8 23 4 4 7.9 0.7 23 6 6 5.9 0.6 24 3 3 5.7 0.4 24 1 1 5.3 0.0 25 6 6 6.3 0.6 25 11 10 6.0 0.6 26 12 12 7.0 0.7 26 3 2 5.5 0.1 27 15 15 8.3 0.5 27 9 6 5.4 0.3 28 6 6 8.4 0.9 28 7 7 5.8 0.4 29 20 20 8.1 0.5 29 1 1 5.8 0.0 30 7 7 7.9 0.6 30 3 3 5.4 0.2 31 3 3 6.3 0.5 31 21 19 5.7 0.3 32 12 12 7.7 0.4 32 20 17 5.7 0.4 33 3 3 7.3 1.2 33 16 16 5.9 0.5 34 19 18 6.1 0.4 34 2 2 6.8 0.2 35 1 1 9.0 0.0 35 19 4 5.4 0.3 36 5 3 5.5 0.3 36 6 6 6.8 0.4 37 15 15 8.5 0.5 37 1 1 7.1 0.0 38 12 11 5.7 0.3 38 4 4 5.7 0.4 39 18 18 8.0 0.7 39 5 3 5.5 0.1 40 21 20 7.7 0.8 40 2 2 7.2 1.3 41 18 17 5.5 0.3 41 9 9 5.4 0.3 42 10 10 6.3 0.7 42 13 11 5.5 0.4 43 8 8 7.1 0.5 43 4 4 5.8 0.4 44 2 2 7.7 0.5 44 10 8 5.7 0.3 45 17 10 5.7 0.4 45 12 0 0.0 0.0 46 20 20 6.3 0.5 46 16 14 5.4 0.3 47 20 20 7.8 0.5 47 4 4 5.9 0.6 48 9 8 6.3 0.8 48 30 7 5.4 0.1 49 6 6 8.0 0.5 49 1 0 0.0 0.0 50 18 15 7.4 1.1 50 4 2 5.8 0.6 51 14 14 6.6 1.0 51 1 1 6.1 0.0 52 14 14 7.6 0.6 52 1 1 5.9 0.0 53 11 11 7.9 0.5 53 7 6 6.0 0.6 54 23 23 7.0 0.7 54 3 0 0.0 0.0 55 6 1 5.2 0.0 55 3 2 6.4 1.6 56 5 3 5.7 0.3 56 15 13 5.5 0.3 57 4 4 8.4 0.6 57 2 2 5.6 0.3 58 7 3 5.5 0.7 58 4 3 5.2 0.1 59 4 4 7.1 0.2 59 1 1 5.8 0.0 60 42 42 7.8 0.7 60 8 8 5.9 0.5 61 4 4 6.5 0.2 61 17 13 5.6 0.4 62 10 10 7.2 0.6 62 3 1 5.1 0.0 63 6 6 6.8 0.7 63 5 5 6.0 0.8 64 19 18 6.9 0.9 64 13 12 6.1 0.7 65 7 7 6.7 0.4 65 11 11 6.4 0.6 66 10 10 7.4 0.9 66 8 6 5.8 0.3 67 2 2 6.8 0.1 67 39 36 5.9 0.4 68 21 21 8.5 0.6 68 26 16 5.5 0.2 69 3 3 8.4 1.1 69 8 7 5.5 0.4

70 8 7 6.2 0.4 71 2 2 6.0 0.6 72 19 15 6.3 0.6 73 1 1 5.7 0.0 74 2 2 5.3 0.2 75 2 2 6.4 0.6



Appendix F Identix Position Paper

The authors sent a copy of this report to Identix on November 14, 2002. Identix was given the option to provide a position paper to be included in this appendix. The submission deadline was 12:00 P.M. on November 19, 2002. The submitted position paper is presented here without modification.


ALL RIGHTS RESERVED. IDENTIX, INC.

Identix Corporate Research Center
One Exchange Place, Suite 800
Jersey City, New Jersey 07302

Comments on “Face Recognition at a Chokepoint, Scenario Evaluation Results”

November 19, 2002

EXECUTIVE SUMMARY

VERIFICATION TEST COMMENTS

WATCHLIST TEST COMMENTS

Watchlist Images Taken From Existing Employee Badges

Good Quality Watchlist Images Taken From Recent Enrollment Session

FaceIt® Surveillance versus FaceIt® Argus and Algorithmic Improvements

ADDITIONAL COMMENTS


Executive Summary

Identix is pleased to have been invited to participate in the DoD Counterdrug Technology Development Program Office’s scenario evaluation, Face Recognition at a Chokepoint.

Overall, the data from both the verification and watchlist tests paint a strong technology picture for Identix (formerly Visionics Corporation) facial recognition technology. The results demonstrate that facial recognition is effective not only as a mechanism to prevent unauthorized access, but also as a tool to detect the presence of a criminal or terrorist attempting to access a restricted area.

The Watchlist test results emphasize the importance of good image quality for optimal performance. In addition, the results of the Watchlist tests validate the improved performance of our later COTS surveillance system, FaceIt® ARGUS, over our older, software-only technology, FaceIt Surveillance. FaceIt ARGUS is a complete software and hardware solution for multi-camera facial surveillance. It uses a new generation of FaceIt algorithms.

Our record shows that we have released significant enhancements of the technology at the rate of once every one or two quarters and this forward progress continues today.

In addition, it is important to note that another facial recognition technology vendor initially agreed to participate in the present evaluation but later withdrew. Identix has long promoted independent evaluations of its technology and honest communication, and for this reason embraced this test and its objectives. We have also encouraged many partners to test and validate FaceIt technology in their specific applications and environments. Identix’ active involvement in this process helps to ensure continued technological innovation and improved product design.


Verification Test Comments

Verification test results are usually characterized by False Accept Rate (FAR)/False Reject Rate (FRR) curves, or at least an equal error rate (EER). No EER is calculated in this report. With only three threshold values reported for Verification, it is not possible to plot FAR/FRR curves. Therefore, it is worth summarizing here EERs estimated from the data provided in Figures 12 and 13.

Verification Test Condition | Estimated Equal Error Rate
Glasses On/Off | 2.3%
Glasses On/On and Off/Off | 1.8%

Referring to Figure 12, threshold 9.3769: Valid Users Rejected = 2.2% and Imposters Accepted = 2.4%. Hence, one can estimate with reasonable certainty that the EER for the glasses on/off condition is about 2.3%.

Referring to Figure 13, threshold 9.3769: Valid Users Rejected = 0.5% and Imposters Accepted = 3.6%. We estimate that these results represent an EER around 2%. Similarly, for threshold 9.792: Valid Users Rejected = 2.9% and Imposters Accepted = 0.4%. We estimate that these results represent an EER around 1.6%. Splitting the difference between these two estimates, we have an overall EER of 1.8% for the glasses on/on and off/off condition.

Let us look in more detail at the “glasses on for enrollment/glasses on for verification” or “glasses off for enrollment/glasses off for verification” experiments. For each of the two higher thresholds, only 0.4% of imposters were accepted (1 attempt). At the same time, at the middle threshold, 97% of authorized users were accepted. In other words, when the FAR was 0.4%, the Correct Accept Rate was 97%. Less than 3% of authorized users were rejected. For high security access control systems, the ability to prevent access by unauthorized users is the most important factor. A 3% FRR should be an acceptable trade-off for a 0.4% FAR.

Identix is pleased to note that the present results show some improvement over performance in a similar test conducted by the National Physical Laboratory under the CESG/BWG Biometric Test Programme in the U.K., reported in March 2001. Specifically, the FRR reported here for the glasses on/on and off/off experiment is approximately 2% lower than that reported by the NPL group at the same FAR. This improvement is likely due to the use of later-generation algorithms in the present Verification study than were available at the time of the NPL study.


Watchlist Test Comments

We will first discuss the results of experiments in which the watchlist was comprised of images taken from existing employee badges. Next, we will discuss the results of experiments in which the watchlist was comprised of recent enrollment images taken under good lighting conditions.

Identix has consistently stated that the performance of facial recognition systems in surveillance depends on three variables: quality of enrollment images, thresholds and participation of the subject. The Watchlist results are in line with these parameters.

Watchlist Images Taken From Existing Employee Badges

The data plotted in Figures 26 and 28 (bottom set of curves, with open symbols only) indicate that the probability of correctly identifying a person on the watchlist (POWL) at Rank 1 (the top position of the candidate list) is less than 30%. For indoor, controlled experiments this is an exceptionally poor result and indicates the existence of significant data problems.

The watchlist (gallery) images were 1.4 to 4.3 years old at the time of these tests. In addition, they did not conform to the mugshot image quality guidelines established by NIST (www.itl.nist.gov/iad/894.03/face/bpr_mug3.html) or to Identix' own, similar recommended best practices for database image acquisition. In particular, there was bright, saturating lighting on one side of the face, as can be seen in the samples in Figure 4. In three of these four sample images, directional outdoor lighting effects are visible on the side of the face at the left of the image; in the fourth sample image (furthest right), directional indoor lighting falls on the side of the face at the right of the image.

In addition, some proportion of images may not be directly frontal, as exemplified by the first sample image in Figure 4. The version of Argus tested does not contain pose compensation software. It is well known that image quality parameters such as lighting, pose (degree of rotation of head relative to frontal) and contrast affect performance of all facial recognition systems. Therefore, to the greatest extent possible, image quality guidelines such as those published by NIST and Identix should be followed.

Having said this, Identix is aware that it is not always possible to obtain high-quality, recent images for use in watchlist databases; for this reason, the results of this experiment are useful. We wish to point out that if human resources are available to process a higher rate of false alarms, it is still possible to achieve a very high correct detection rate. It is also worth noting that state-of-the-art surveillance systems are not intended for unmanned operation.


Good Quality Watchlist Images Taken From Recent Enrollment Session

In this experiment, the watchlist was comprised of more recently enrolled images. In general, image quality was better in this data set (lighting was even across the face, with fewer shadows).1

Figure 28 shows the level of performance we would more likely expect in surveillance applications: approximately 80% probability of correctly identifying (in the Rank 1 position) a person on the watchlist. Figure 29 shows the probability that FaceIt ARGUS correctly detected a person on the watchlist at any rank (Rank 1, 2, 3, etc. of the candidate list). Naturally, performance is improved relative to that shown in Figure 28. If the operator of a FaceIt ARGUS system has time to look at the top few images in the candidate list, rather than only the Rank 1 match, the probability of the system correctly identifying a member of the watchlist is increased.
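The relationship between Rank 1 identification and detection within the top few ranks can be made concrete with a small helper. This is an illustrative sketch with invented rank data, not the evaluation's actual scoring code.

```python
def detect_rate_within_rank(ranks, k):
    """Fraction of watchlist probes whose true match appears at rank <= k.

    `ranks` holds, for each pass by a watchlist member, the position of
    the correct identity in the candidate list (None if the person did
    not appear on the list at all). Rank 1 identification is the k=1
    case; an operator scanning the top few candidates corresponds to
    a larger k, which can only raise the detection rate.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

# Hypothetical ranks for eight passes by watchlist members
ranks = [1, 1, 3, None, 2, 1, None, 4]
rank1_rate = detect_rate_within_rank(ranks, 1)   # operator sees only the top match
top3_rate = detect_rate_within_rank(ranks, 3)    # operator scans the top three
```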

From the data presented in Figure 24, we calculate the False Alarm Rate (FAR) and Correct Alarm Rate (CAR) for ARGUS using this higher quality, more recent watchlist. These rates summarize performance in a different way than did the graphs in the main report, and may help put the results into perspective more readily for some readers.

[Chart: ARGUS Correct Alarm Rate (CAR) and False Alarm Rate (FAR) as a function of threshold. Thresholds shown: 5.7, 6.5, 7, 7.5, 7.8; alarm rates plotted from 0% to 100%.]
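CAR and FAR at a given threshold can be computed directly from per-pass match scores. The sketch below uses hypothetical score values purely to show the calculation; the report's raw data are not reproduced here.

```python
def alarm_rates(watchlist_scores, nonwatchlist_scores, threshold):
    """Correct Alarm Rate (CAR) and False Alarm Rate (FAR) at one threshold.

    `watchlist_scores`: best match score for each pass by a watchlist
    member; `nonwatchlist_scores`: best match score for each pass by
    everyone else. A score at or above the threshold raises an alarm.
    """
    car = sum(s >= threshold for s in watchlist_scores) / len(watchlist_scores)
    far = sum(s >= threshold for s in nonwatchlist_scores) / len(nonwatchlist_scores)
    return car, far

# Sweep the thresholds shown in the chart (scores are invented examples)
watchlist = [8.1, 7.6, 6.9, 9.0]
others = [5.9, 6.2, 7.1, 5.1]
curve = [(t, *alarm_rates(watchlist, others, t)) for t in (5.7, 6.5, 7.0, 7.5, 7.8)]
```

Raising the threshold trades a lower FAR for a lower CAR, which is the trade-off the chart above illustrates.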

By comparison, in the Dallas Fort Worth airport trial, Identix reported a CAR of 92% with a FAR of 1.2% and in the West Palm Beach airport trial, Identix reported a CAR of 50% with a FAR of less than 0.4%.

There are many site-specific variables that require the settings of each surveillance system to be adjusted. Proper subject lighting is a crucial factor, and Identix personnel were on site prior to testing to assist with appropriate lighting adjustments. However, no Identix personnel were on site during testing to optimize ARGUS settings (e.g. face finding threshold, face size threshold, rank, etc.).

The better quality watchlist was only used with the FaceIt ARGUS system. Thus, it is not known exactly how much the Surveillance software performance would have improved with the same quality database. However, it is expected that the improvement would be similarly dramatic.

1 However, not all of these “new” images would be considered high quality. For example, the samples shown in Figure 3 include one image (2nd from left) with glare on glasses that would probably require manual alignment.


FaceIt Surveillance versus FaceIt ARGUS and Algorithmic Improvements

The False Alarm Rate (FAR) of Surveillance was always significantly higher than that of Argus, at the same Correct Alarm Rate (CAR). This difference in performance is to be expected because the two products use different generations of FaceIt technology. Surveillance (a software only product) relies on our 3rd Generation (G3) algorithms, while FaceIt ARGUS (a complete software/hardware solution) uses our G4 algorithms.

Identix G5 technology is just now becoming available and was used in the FRVT2002 tests. However, it was not available at the time of this Chokepoint Scenario Evaluation. Our G5 algorithms are expected to yield better results than G4. This technology will soon replace the G4 algorithms currently used in the COTS FaceIt ARGUS product.

Additional Comments

Both the Verification and Watchlist tests measured a combination of two different technologies: face finding and face recognition. This is the correct approach because, for almost all uses of face recognition, automatic face finding is either a necessity (large-scale database searching) or extremely useful (access control). In many cases, failure to acquire can be the limiting factor in the performance of the product. This point was not made in the main report.

The data analyses presented here did not take into account the fact that some subjects passed through the system more times than others. Differences in the number of trials per subject can sometimes lead to systematic errors associated with how test subjects interact with test administrators. This statistical problem can be eliminated by weighting each metric, such as alarm rate, equally for each person (normalizing).
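The per-person normalization described above can be sketched as follows: compute each subject's own alarm rate first, then average across subjects so that frequent passers do not dominate. The trial data here are invented for illustration.

```python
from collections import defaultdict

def per_person_alarm_rate(trials):
    """Alarm rate weighted equally per person (normalized).

    `trials` is a list of (person_id, alarmed) pairs, where `alarmed`
    is 1 if that pass raised an alarm and 0 otherwise. Averaging each
    person's own rate before averaging across people prevents subjects
    who passed through many times from skewing the overall metric.
    """
    by_person = defaultdict(list)
    for pid, alarmed in trials:
        by_person[pid].append(alarmed)
    rates = [sum(outcomes) / len(outcomes) for outcomes in by_person.values()]
    return sum(rates) / len(rates)

# Subject "a" passed through four times, "b" and "c" once each
trials = [("a", 1), ("a", 1), ("a", 1), ("a", 0), ("b", 0), ("c", 1)]
normalized = per_person_alarm_rate(trials)
```

An unweighted rate over these trials would be 4/6; the normalized rate averages 0.75, 0.0, and 1.0 instead, removing subject "a"'s extra weight.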

With the exception of the watchlist results based upon existing employee badges, Identix is pleased with the results presented in the DoD Counterdrug Technology Development Program Office’s Face Recognition at a Chokepoint report.

Identix looks forward to continued advancement of our technologies and additional opportunities to participate in important independent government evaluations such as this one.
