Statistical Analysis Framework for Biometric System Interoperability Testing

Shimon K. Modi, Stephen J. Elliott, Ph.D., H. Kim, Ph.D., and Eric P. Kukula

5th International Conference on Information Technology and Applications (ICITA 2008)

Abstract—Biometric systems are increasingly deployed in networked environments, and issues related to interoperability are bound to arise as single-vendor, monolithic architectures become less desirable. Interoperability issues affect every subsystem of a biometric system, and a statistical framework for evaluating interoperability is proposed. The framework was applied to the acquisition subsystem of a fingerprint recognition system, and the results were evaluated using the framework. Fingerprints were collected from 100 subjects on 6 fingerprint sensors. The results show that the performance of interoperable fingerprint datasets is not easily predictable, and that the proposed framework can help reduce this unpredictability.

Index Terms—fingerprint recognition, biometrics, statistics, fingerprint sensor interoperability, analysis framework.

I. INTRODUCTION

Establishing and maintaining the identity of individuals is becoming ever more important in today's networked world. The complexity of these tasks has also increased proportionally with advances in automated recognition. There are three main methods of authenticating an individual: 1) using something that only the authorized individual has knowledge of, e.g. passwords; 2) using something that only the authorized individual has possession of, e.g. physical tokens; 3) using physiological or behavioral characteristics that only the authorized individual can reproduce, i.e. biometrics. The increasing use of information technology systems has created the concept of digital identities, which can be used in any of these authentication mechanisms. Digital identities and electronic credentialing have changed the way authentication architectures are designed. Instead of the stand-alone, monolithic authentication architectures of the past, today's networked world offers the advantage of distributed and federated authentication architectures. The development of distributed authentication architectures can be seen as an evolutionary step, but it raises an issue that always accompanies an attempt to mix disparate systems: interoperability. What is the effect on the performance of the authentication process if an individual establishes his or her credential on one system, and then authenticates on a different system of the same modality? This issue is relevant to all kinds of authentication mechanisms, and of particular relevance to biometric recognition systems. The last decade has witnessed a huge increase in the deployment of biometric systems, and while most of these systems have been single-vendor systems, the issue of interoperability is bound to arise as distributed architectures become the norm, and not the exception.

Fingerprint recognition systems are the most widely deployed biometric systems, and most commercially deployed fingerprint systems are single-vendor systems [1]. The single point of interaction of a user with the fingerprint system is during the acquisition stage, and this stage has the highest probability of introducing inconsistencies into the biometric data. Fingerprint sensors are based on a variety of technologies: electrical, optical, thermal, etc. The physics behind these technologies introduces distortions and variations in the feature set of the captured image that are not consistent across all of them. This makes the goal of interoperability even more challenging.

Performance analysis of biometric systems can be achieved using several different techniques; one such technique involves analyzing DET curves. This methodology can also be applied to testing the performance rates of native systems and interoperable systems. Although this method allows a researcher to visually compare the different error rates, creating a statistical methodology for testing the interoperability of biometric systems is also of great importance. A formalized process for testing the interoperability of biometric systems will be required as issues related to interoperability become more prominent. This research proposes and tests a basic statistical framework for analyzing matching scores and error rates for fingerprints collected from 6 different fingerprint sensors.

S. K. Modi is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: [email protected]).
S. J. Elliott is Director of the Biometrics Standards, Performance, and Assurance Laboratory and Associate Professor in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: [email protected]).
H. Kim is Professor of the School of Information & Communication Engineering at Inha University, and a member of the Biometrics Engineering Research Center (BERC) at Yonsei University, Seoul, Korea (e-mail: [email protected]).
E. P. Kukula is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: [email protected]).
ICITA2008 ISBN: 978-0-9803267-2-7

II. REVIEW OF LITERATURE

The acquisition of fingerprint images is heavily affected by interaction and contact issues. Fingerprint images are affected by issues such as inconsistent contact, non-uniform contact, and irreproducible contact [2]. The interaction of a finger placed on a sensor maps a 3D shape onto a 2D surface. This mapping is not controlled, and the mapping for the same finger can differ from one sensor to another, thereby adding sensor-specific deformations. The area of contact between a finger surface and the sensor is not the same for different sensors. This non-uniform contact can influence the amount of detail captured from the finger surface and the consistency of the detail captured. Jain and Ross examined the issue of matching feature sets originating from an optical sensor and a capacitive sensor [3]. Their results showed that the minutiae count for the dataset collected from the optical sensor was higher than the minutiae count for the dataset collected from the capacitive sensor. Their results also showed that the Equal Error Rates (EER) for the two native databases were 6.14% and 10.39%, while the EER for the interoperable database was 23.13%. Ko and Krishnan [4] present a methodology for measuring and monitoring the quality of a fingerprint database and fingerprint match performance for the Department of Homeland Security's Biometric Identification System. One of the findings of their research was the importance of understanding the impact on performance if fingerprints captured by a new fingerprint sensor were integrated into an identification application with images captured by an existing fingerprint sensor.

Han et al. [5] performed a study examining the influence of image resolution and of distortion due to differences in fingerprint sensor technologies on matching performance. Their approach proposed a compensation algorithm which worked on raw images and templates so that all fingerprint images and templates processed by the system are normalized against a pre-specified baseline. Their research performed statistical analysis on the basic fingerprint features of the original images and the transformed images to test for differences between the two.

The International Labour Organization (ILO) commissioned a biometric testing campaign in an attempt to understand the causes of the lack of interoperability [6]. 10 products were tested, where each product consisted of a sensor paired with an algorithm capable of enrollment and verification. Native and interoperable False Accept Rates (FAR) and False Reject Rates (FRR) were computed for all datasets. A mean FRR for genuine matching scores of 0.92% was observed at a FAR of 1%. The objectives of this test were twofold: to test the conformance of the products in producing biometric information records complying with ILO SID-0002, and to test which products could interoperate at levels of less than 1% FRR at a fixed 1% FAR. The results showed that out of the 10 products, only 2 were able to interoperate at the mandated levels.

NIST conducted the minutiae template interoperability test, MINEX 2004, to assess the interoperability of fingerprint templates generated by different extractors and then matched with different matchers. Four different datasets were used, referred to as DOS, DHS, POE, and POEBVA. The performance evaluation framework calculated the FNMR at a fixed FMR of 0.01% for all the matchers. Performance matrices were created which represented all FNMR of native and interoperable datasets and provided a means for quick comparison. Their results showed that the FNMR for native datasets were far lower than the FNMR for interoperable datasets. In [7] the authors conduct an image quality and minutiae count analysis on fingerprints collected from three different sensors and assess the performance rates of the native and interoperable fingerprint datasets using different enrollment methodologies. They also describe an ANOVA test for testing differences in mean genuine matching scores between the three datasets. Their preliminary statistical analysis showed that the genuine matching scores differed significantly between the native and interoperable datasets.

This body of literature points to the growing importance of interoperability for biometric systems. Previous research has concentrated on interoperability error rate matrices and on comparing the EER of native and interoperable datasets. Although analysis of error rates serves as a good indicator of performance, an alternate technique which utilizes statistical methods would be beneficial. A formalized statistical analysis framework for testing interoperability is lacking, and this problem needs to be addressed. The researchers in this experiment build on previous work done in this area and propose a statistical analysis framework for testing interoperability.

III. STATISTICAL ANALYSIS FRAMEWORK

The interoperability of biometric systems is going to become an important issue, and the need for an analysis framework will become imperative. Frameworks for testing biometric systems and biometric algorithms can be found in the literature, and the researchers have proposed a framework for testing the interoperability of biometric systems in this paper. For the purposes of this research the framework was adapted for testing the interoperability of fingerprint sensors. The framework is based on the concept that if two fingerprint sensors are interoperable, then the resulting fingerprint datasets should have error rates similar to those of fingerprint datasets collected from either one of the fingerprint sensors alone. The framework for testing the interoperability of fingerprint sensors is based on three steps:

1. Statistical analysis of basic fingerprint features.
2. Error rate analysis of native and interoperable fingerprint datasets.
3. Statistical analysis of matching scores of native and interoperable fingerprint datasets.

This framework was evaluated using a dataset of fingerprints collected from 6 fingerprint sensors; the methodology and results are discussed in the following sections.

Fig. 1. Statistical Analysis Framework for Interoperability Testing.

IV. DATA COLLECTION

The dataset used in this research is part of KFRIA-DB (Korea Fingerprint Recognition Interoperability Alliance Database). Fingerprints were collected from 100 subjects using 6 different fingerprint sensors. Each subject provided 6 fingerprint images from each of the right index finger, right middle finger, left index finger, and left middle finger. 2,400 fingerprint images were thus collected using each sensor. Table I gives a specifications overview of all the fingerprint sensors used in the study.

Table I. Sensor Specifications

Sensor     Technology Type   Resolution (DPI)   Interaction Type   Image Size (px)
Sensor 1   Thermal           500                Swipe              360 x 500
Sensor 2   Optical           500                Touch              280 x 320
Sensor 3   Optical           500                Touch              248 x 292
Sensor 4   Polymer           620                Touch              480 x 640
Sensor 5   Capacitive        508                Touch              256 x 360
Sensor 6   Optical           500                Touch              224 x 256

All analysis for this study was performed on the raw fingerprint images collected from the 6 sensors. Sensor characteristics such as capture area, capture technology, and aspect ratio have an influence on the resulting image. Fig. 2 shows sample images collected from the different sensors.

Fig. 2. Example Fingerprint Images (one sample each from Sensor 1 through Sensor 6).

V. FINGERPRINT FEATURE ANALYSIS

An important factor to consider when examining interoperability is the ability of different fingerprint sensors to capture similar fingerprint features from the same fingerprint. Human interaction with the sensor, levels of habituation, finger skin characteristics, and sensor characteristics each introduce their own source of variability. All of these factors affect the consistency of the fingerprint features of images acquired from different sensors. It is therefore important to analyze the amount of variance in the image quality and minutiae count of fingerprints captured from different sensors. This analysis was performed by computing image quality scores and minutiae counts for all fingerprint images using commercially available software. Table II shows the average image quality scores and minutiae counts for all the datasets.

Table II. Average Image Quality Scores & Minutiae Count

Fingerprint dataset   Quality Score (0-100)   Minutiae Count
Sensor 1              15.24                   94.13
Sensor 2              74.97                   45.89
Sensor 3              71.15                   38.77
Sensor 4               6.62                   52.44
Sensor 5              68.92                   39.21
Sensor 6              62.58                   31.25

The datasets for Sensor 1 and Sensor 4 showed very low quality scores. It should be noted that the images captured from Sensor 4 had a very dark background, which could have contributed to the very low quality score. Also, Sensor 1 and Sensor 4 used different technologies compared to the other, more commonly available sensors. An analysis of variance (ANOVA) was performed on all the datasets to test for differences in mean image quality score and mean minutiae count between the datasets. The hypothesis stated in (1) was tested using the ANOVA test.

H1_0: µ1 = µ2 = ... = µ6
H1_A: at least one µi differs    (1)

The p-value of this hypothesis test was computed to be less than 0.05, indicating that the mean image quality scores differed significantly between the datasets. The same hypothesis test was conducted on the minutiae counts for all the datasets. The p-value of this test was also less than 0.05, indicating that the mean minutiae counts likewise differed significantly. Table II shows that image quality scores for fingerprints collected from different sensors were significantly different. Previous research has shown that image quality has an impact on the performance of fingerprint matching systems [8]. The next step of the research involved evaluating the impact of these basic fingerprint feature inconsistencies on the performance of fingerprint datasets collected from different sensors.
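As a concrete sketch, the hypothesis in (1) can be tested with a one-way ANOVA in SciPy. The per-sensor quality scores below are synthetic stand-ins whose means loosely echo Table II (the spread is an assumption), not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic quality scores for six sensors, 100 images each
# (means loosely echo Table II; the standard deviation is assumed).
means = [15.2, 75.0, 71.2, 6.6, 68.9, 62.6]
quality = [rng.normal(m, 8.0, size=100) for m in means]

# H0: mu1 = mu2 = ... = mu6 (equal mean quality across sensors).
f_stat, p_value = stats.f_oneway(*quality)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
if p_value < 0.05:
    print("Reject H0: mean image quality differs between sensors.")
```

The same call applies unchanged to the per-sensor minutiae counts.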

VI. PERFORMANCE RATES ANALYSIS

A. Performance Rates: ROC and Error Rates Matrix

In order to evaluate the performance of the fingerprint datasets, False Non-Match Rates (FNMR) were computed for all datasets. A commercially available fingerprint feature extractor and matcher was used to generate the FNMR. FNMR were generated for native datasets and interoperable datasets, where a native dataset refers to the comparison of fingerprint images collected from the same fingerprint sensor, and an interoperable dataset refers to the comparison of fingerprint images collected from different sensors. The first three fingerprint images provided by each subject were used to create the enrollment database, and the last three images were used to create the verification database. Enrollment and verification databases were created for each of the 6 sensors. Matching scores were generated by comparing every enrollment template from each enrollment database against every fingerprint image from each verification database, which resulted in a set of scores S for every combination of enrollment and verification databases, where

S = {E_ix, V_jy, score_ijxy}
i = 1, ..., number of enrolled templates
j = 1, ..., number of verification images
x = 1, ..., number of enrollment datasets
y = 1, ..., number of verification datasets
score_ijxy = match score between enrollment template E_ix and verification image V_jy
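From one such score set, the FNMR at a fixed FMR can be computed by choosing the decision threshold from the impostor scores. A minimal sketch with synthetic score arrays (not the study's matcher output), under the assumption that higher scores indicate a better match:

```python
import numpy as np

def fnmr_at_fixed_fmr(genuine, impostor, target_fmr):
    """FNMR at the decision threshold yielding the target FMR,
    assuming higher scores indicate a better match."""
    # Threshold at the (1 - target_fmr) quantile of impostor scores,
    # so roughly target_fmr of impostor comparisons are accepted.
    threshold = np.quantile(impostor, 1.0 - target_fmr)
    return float(np.mean(genuine < threshold))

rng = np.random.default_rng(0)
genuine = rng.normal(600.0, 120.0, size=3000)   # same-finger comparisons
impostor = rng.normal(50.0, 40.0, size=30000)   # different-finger comparisons

fnmr_01 = fnmr_at_fixed_fmr(genuine, impostor, 0.001)  # at 0.1% FMR
fnmr_1 = fnmr_at_fixed_fmr(genuine, impostor, 0.01)    # at 1% FMR
print(f"FNMR @ 0.1% FMR: {fnmr_01:.4f}, FNMR @ 1% FMR: {fnmr_1:.4f}")
```

Filling every (enrollment dataset, verification dataset) cell of an FNMR matrix amounts to running this once per combination of score sets.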

Using this set of scores, an FNMR matrix was generated. FNMR was calculated for each set of scores at fixed False Match Rates (FMR) of 0.1% and 1%. The results are shown in Tables III and IV. The diagonal of the FNMR matrix holds the rates for the native datasets, and the off-diagonal cells hold the rates for the interoperable datasets. Since matching of fingerprints is a symmetric process, the matrix can be viewed as a symmetric matrix as well.

The FNMR matrix for the fixed 0.1% FMR showed a wide range of FNMR for native and interoperable datasets. All the native datasets had a significantly lower FNMR compared to the interoperable datasets. For example, S4 showed a native FNMR of 0.1% and a lowest interoperable FNMR of 35%, which indicates a very low level of interoperability between the datasets. When the FNMR are analyzed in the context of image quality scores, it can be seen that Sensor 4 had the lowest image quality scores, which indicates that image quality was an important factor in the high FNMR of its interoperable datasets. An interesting observation is the relatively low FNMR for the native dataset of fingerprints captured with S4. It was also observed that the FNMR of interoperable datasets created from fingerprint sensors of the same acquisition technology and interaction type, for example S2 and S3, was comparable to the FNMR of their native datasets.

Table III. FNMR (%) at fixed 0.1% FMR

      S1     S2     S3     S4     S5     S6
S1    0.8    5.0    8.0   35.0   18.0    7.0
S2           0      0.6   38.0    2.0    0.12
S3                  0.1   38.0    1.9    0.12
S4                         0.1   58.0   32.0
S5                                0.1    6.0
S6                                       0.1

Table IV. FNMR (%) at fixed 1% FMR

      S1   S2   S3   S4   S5   S6
S1    0    0    0    0    0    0
S2         0    0    0    0    0
S3              0    0    0    0
S4                   0    0    0
S5                        0    0
S6                             0

Detection Error Tradeoff (DET) curves were also generated for all the datasets. DET curves are a modification of Receiver Operating Characteristic (ROC) curves. ROC curves are a means of representing the performance of diagnostic, detection, and pattern matching systems [9]. A DET curve plots FMR on the x-axis and FNMR on the y-axis as a function of the decision threshold. DET curves for the different combinations of enrollment/verification databases allow comparison of error rates at different thresholds:

DET(T) = (FMR(T), FNMR(T)), where T is the threshold    (2)
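A sketch of generating the DET points of (2) by sweeping the threshold over synthetic score arrays (plotting is omitted; higher score = better match is assumed):

```python
import numpy as np

def det_points(genuine, impostor):
    """Return arrays (FMR(T), FNMR(T)) swept over candidate thresholds T."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    fmr = np.array([np.mean(impostor >= t) for t in thresholds])
    fnmr = np.array([np.mean(genuine < t) for t in thresholds])
    return fmr, fnmr

rng = np.random.default_rng(1)
fmr, fnmr = det_points(rng.normal(600.0, 120.0, size=500),
                       rng.normal(50.0, 40.0, size=500))
# As the threshold rises, FMR can only fall and FNMR can only rise;
# superimposing curves from several dataset pairs gives plots like Figs. 3-4.
```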

Fig. 3 shows three superimposed DET curves. It can be observed that the DET curve for the interoperable dataset for S2 and S3 performs worse than the two native datasets at every possible threshold. Fig. 4 also shows three superimposed DET curves. The DET curve for the interoperable dataset for S2 and S4 shows performance that is much worse than that of the native datasets. Looking at the DET curves for the native datasets in Fig. 3 and Fig. 4, the difference in performance between the native datasets is comparable, but the difference in performance between the interoperable datasets is significant. This indicates that the performance of interoperable datasets cannot be predicted from the performance of the native datasets alone.

Fig. 3. DET Curve for S2 and S3 datasets.

Fig. 4. DET Curve for S2 and S4 datasets.

B. Statistical Analysis of Matching Scores

The DET curves and the FNMR matrix provide insight into existing differences in FNMR between native and interoperable databases, but they do not provide a statistical basis for testing those differences. A statistical analysis of the results could help uncover underlying patterns which contribute to the unpredictability observed in the comparison of the DET curves. To assess interoperability at the matching score level, the matching scores from the genuine comparisons of each native dataset were compared to the matching scores from the genuine comparisons of the interoperable datasets. For true interoperability, the performance of the interoperable datasets should be statistically similar to the performance of the native dataset. An ANOVA test was used to test for differences in the mean genuine matching scores between the native dataset and the interoperable datasets at a significance level of 0.05. This test was performed for each of the six native datasets, which resulted in 6 sets of hypotheses as stated in (3).

H2_0: µ_native = µ_interoperable1 = ... = µ_interoperable5
H2_A: at least one µ_interoperable differs from µ_native    (3)

The ANOVA tests for all six hypotheses had p-values of less than 0.05, which resulted in rejecting the null hypotheses and concluding that native genuine matching scores were significantly different from interoperable genuine matching scores.

In experiments such as this, one of the treatments is a control and the other treatments are comparison treatments. A statistical test which compares different treatments to a control can be performed using Dunnett's test, a modified form of the t-test [10]. For this particular experiment, the native dataset genuine match scores were used as the control and the interoperable dataset genuine match scores as the comparison treatments. For each native dataset, there were 5 comparison treatments, which corresponded to the interoperable datasets. The mean genuine match score for each interoperable database was tested against the control (i.e. the native database score). According to Dunnett's test, the null hypothesis H0: µnative = µinteroperable is rejected at α = 0.05 if

|ȳ_i. - ȳ_a.| > d_α(a-1, f) * sqrt( MSE * (1/n_i + 1/n_a) )    (4)

where
d_α(a-1, f) = Dunnett's constant
a-1 = number of interoperable datasets
f = number of error degrees of freedom
MSE = mean square of the error terms
n_i = number of samples in the control
n_a = number of samples in interoperable set a

Dunnett's test was performed on all possible combinations of native and interoperable datasets. The test showed that all of the genuine matching scores of the interoperable datasets were significantly different from the genuine matching scores of the control dataset. Table V shows the average genuine matching score of each control dataset and the average genuine matching score of the interoperable dataset which had the least absolute difference from the control.

An evaluation of the results in Table V shows that S2 and S3 had the best interoperable genuine matching scores. When the interoperable matching rates are analyzed in the context of image quality scores and minutiae counts, S2 and S3 had the least absolute difference between their image quality scores and minutiae counts. Combining these results provides a positive indicator for improving the predictability of FNMR for interoperable datasets.

Table V. Matching Scores for Control Dataset and Interoperable Dataset

Control dataset (avg. score)   Interoperable dataset with least difference (avg. score)
Sensor 1 (319.3)               Sensor 3 (294.2)
Sensor 2 (749.2)               Sensor 3 (609.2)
Sensor 3 (789.0)               Sensor 2 (609.2)
Sensor 4 (575.5)               Sensor 3 (281.6)
Sensor 5 (631.9)               Sensor 3 (390.5)
Sensor 6 (652.7)               Sensor 2 (521.9)

VII. CONCLUSIONS AND FUTURE WORK

Previous research has shown that image quality has a significant impact on the performance of native fingerprint datasets, and this research showed that image quality and minutiae count also have an impact on the performance of interoperable fingerprint datasets. The type of capture technology did not have a consistent effect on the FNMR of interoperable fingerprint datasets, as seen in the difference in FNMR between datasets S2 and S3 versus S2 and S4. It is important to understand the effect of these factors, since they can be used to reduce the unpredictability of the performance of interoperable datasets. Interoperability depends on several factors, and this research uncovered important factors and illustrated their significance using statistical tests and analysis methodologies. The results of these findings can be used in designing fingerprint matching algorithms which specifically take advantage of this new knowledge.

The results discussed in this paper indicate several avenues of research which could be followed to improve the statistical analysis framework. Along with the comparison of genuine matching scores using Dunnett's test, a comparison of proportions could also be applied to statistically test the FNMR of native against interoperable datasets. This would add one more test to the collection of interoperability tests. Application of the framework to a different modality would also be an interesting study. In this research the framework was applied exclusively to interoperability tests for fingerprint recognition, and it helped in synthesizing the results in a novel way. Other biometric modalities will face the same interoperability problems as fingerprint recognition, and it will become imperative to understand these issues and try to solve them. Application of this framework to other modalities could provide ideas for solving the problems of interoperability in a larger context. An investigative multivariate analysis which uses basic fingerprint features as predictor variables and matching scores as response variables is another avenue of future work. Understanding the effect of these predictor variables on interoperable matching scores could be used to create a model capable of describing the interactions and effects.
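The suggested comparison of proportions could, for example, take the form of a two-proportion z-test on false non-match counts. The counts below are invented for illustration, under the assumption that the numbers of genuine comparisons behind each FNMR cell are known:

```python
from math import sqrt

from scipy.stats import norm

def two_proportion_z_test(errors1, n1, errors2, n2):
    """Two-sided z-test for equality of two error proportions,
    e.g. a native FNMR vs. an interoperable FNMR."""
    p1, p2 = errors1 / n1, errors2 / n2
    pooled = (errors1 + errors2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    return z, 2.0 * norm.sf(abs(z))  # z statistic and two-sided p-value

# Invented counts: native FNMR ~0.8% vs. interoperable FNMR ~5.0%,
# each over 1200 genuine comparisons.
z, p = two_proportion_z_test(10, 1200, 60, 1200)
print(f"z = {z:.2f}, p = {p:.3g}")
```

A significant result here would complement the score-level Dunnett comparison with an error-rate-level test.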

ACKNOWLEDGMENT

The authors would like to thank KFRIA (Korea Fingerprint Recognition Interoperability Alliance) for supporting this research and providing the fingerprint database for analysis.

REFERENCES

[1] IBG, Biometrics Market and Industry Report. New York: IBG, 2007, p. 224.
[2] N. Haas, S. Pankanti, and M. Yao, "Fingerprint Quality Assessment," in Automatic Fingerprint Recognition Systems. New York: Springer-Verlag, 2004, pp. 55-66.
[3] A. Jain and A. Ross, "Biometric Sensor Interoperability," in BioAW 2004, A. Jain and D. Maltoni, Eds., vol. 3067. Berlin: Springer-Verlag, 2004, pp. 134-145.
[4] T. Ko and R. Krishnan, "Monitoring and Reporting of Fingerprint Image Quality and Match Accuracy for a Large User Application," in Applied Imagery Pattern Recognition Workshop. Washington, D.C.: IEEE Computer Society, 2004.
[5] Y. Han et al., "Resolution and Distortion Compensation based on Sensor Evaluation for Interoperable Fingerprint Recognition," in 2006 International Joint Conference on Neural Networks, Vancouver, Canada, 2006.
[6] J. Campbell and M. Madden, ILO Seafarers' Identity Documents Biometric Interoperability Test Report. Geneva: International Labour Organization, 2006, p. 170.
[7] S. Modi, S. Elliott, and H. Kim, "Performance Analysis for Multi Sensor Fingerprint Recognition System," in International Conference on Information Systems Security. Delhi, India: Springer-Verlag, 2007.
[8] S. J. Elliott and S. K. Modi, "Impact of Image Quality on Performance: Comparison of Young and Elderly Fingerprints," in 6th International Conference on Recent Advances in Soft Computing (RASC), Canterbury, UK, 2006.
[9] A. Mansfield and J. Wayman, Best Practices. Middlesex: National Physical Laboratory, 2002, p. 32.
[10] D. C. Montgomery, Design and Analysis of Experiments, 4th ed. New York: John Wiley & Sons, 1997, p. 704.