Statistical Analysis Framework for Biometric System Interoperability Testing

Shimon K. Modi, Stephen J. Elliott, Ph.D., H. Kim, Ph.D., and Eric P. Kukula

5th International Conference on Information Technology and Applications (ICITA 2008)
Abstract—Biometric systems are increasingly deployed in networked environments, and issues related to interoperability are bound to arise as single-vendor, monolithic architectures become less desirable. Interoperability issues affect every subsystem of a biometric system, and a statistical framework to evaluate interoperability is proposed. The framework was applied to the acquisition subsystem of a fingerprint recognition system, and the results were evaluated using the framework. Fingerprints were collected from 100 subjects on 6 fingerprint sensors. The results show that the performance of interoperable fingerprint datasets is not easily predictable, and the proposed framework can aid in reducing that unpredictability to some degree.
Index Terms—fingerprint recognition, biometrics, statistics, fingerprint sensor interoperability, analysis framework.
I. INTRODUCTION
Establishing and maintaining the identity of individuals is becoming ever more important in today's networked world. The complexity of these tasks has also increased proportionally with advances in automated recognition. There are three main methods of authenticating an individual: 1) using something that only the authorized individual knows, e.g. passwords; 2) using something that only the authorized individual possesses, e.g. physical tokens; and 3) using physiological or behavioral characteristics that only the authorized individual can reproduce, i.e. biometrics. The increasing use of information technology systems has created the concept of digital identities, which can be used in any of these authentication mechanisms. Digital identities and
electronic credentialing have changed the way authentication architectures are designed.

[Author notes: S. K. Modi is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory, Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA. S. J. Elliott is Director of the Biometrics Standards, Performance, and Assurance Laboratory and Associate Professor in the Department of Industrial Technology, Purdue University. H. Kim is Professor in the School of Information & Communication Engineering at Inha University and a member of the Biometrics Engineering Research Center (BERC) at Yonsei University, Seoul, Korea. E. P. Kukula is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory, Department of Industrial Technology, Purdue University (e-mail: [email protected]). ICITA 2008 ISBN: 978-0-9803267-2-7.]

Instead of stand-alone, monolithic
authentication architectures of the past, today’s networked
world offers the advantage of distributed and federated
authentication architectures. The development of distributed authentication architectures can be seen as an evolutionary step, but it raises an issue that always accompanies an attempt to mix disparate systems: interoperability. What is the effect on the performance of the authentication process if an individual establishes his or her credential on one system and then authenticates on a different system of the same modality? This issue is relevant to all kinds of authentication mechanisms, and of particular relevance to
biometric recognition systems. The last decade has witnessed a large increase in the deployment of biometric systems, and while most of these have been single-vendor systems, the issue of interoperability is bound to arise as distributed architectures become the norm rather than the exception. Fingerprint recognition systems are the most widely deployed biometric systems, and most commercially deployed fingerprint systems are single-vendor systems [1]. The single point of interaction between a user and the fingerprint system is the acquisition stage, and this stage has the highest probability of introducing inconsistencies into the biometric data. Fingerprint sensors are based on a variety of technologies, such as electrical, optical, and thermal sensing. The physics behind these technologies introduces distortions and variations in the feature set of the captured image that are not consistent across sensors, which makes the goal of interoperability even more challenging.
Performance analysis of biometric systems can be achieved using several different techniques; one such technique involves analyzing DET curves. This methodology can also be applied to testing the performance rates of native systems and interoperable systems. Although this method allows a researcher to visually compare the different error rates, creating a statistical methodology for testing interoperability of biometric systems is also of great importance. A formalized process for testing interoperability of biometric systems will be required as the issues related to interoperability become more prominent. This research proposes and tests a basic statistical framework for analyzing matching scores and error rates for fingerprints collected from six different fingerprint sensors.
II. REVIEW OF LITERATURE
The acquisition of fingerprint images is heavily affected by
interaction and contact issues. Fingerprint images are affected
by issues like inconsistent contact, non-uniform contact and
irreproducible contact [2]. The interaction of a finger when
placed on a sensor maps a 3D shape on a 2D surface. This
mapping is not controlled, and the mapping for the same finger can differ from one sensor to another, thereby adding sensor-specific deformations. The area of contact between the finger surface and the sensor also differs across sensors. This non-uniform contact can influence the amount of detail captured from the finger surface and the consistency of that detail. Jain
and Ross examined the issue of matching feature sets
originating from an optical sensor and a capacitive sensor
[3]. Their results showed that minutiae count for the dataset
collected from the optical sensor was higher than the minutiae
count for the dataset collected from the capacitive sensor. Their
results showed that Equal Error Rate (EER) for the two native
databases were 6.14% and 10.39%, while the EER for the
interoperable database was 23.13%. Ko and Krishnan [4] present a methodology for measuring and monitoring the quality of a fingerprint database and the fingerprint match performance of the Department of Homeland Security's Biometric Identification System. One of the findings of their research was the importance of understanding the impact on performance if fingerprints captured by a new fingerprint sensor were integrated into an identification application alongside images captured from an existing fingerprint sensor.
Han et al. [5] performed a study examining the influence of image resolution and of distortion due to differences in fingerprint sensor technologies on matching performance. Their approach proposed a compensation algorithm that worked on raw images and templates so that all fingerprint images and templates processed by the system were normalized against a pre-specified baseline. They performed statistical analysis on basic fingerprint features of the original images and the transformed images to test for differences between the two.
The International Labour Organization (ILO) commissioned a biometric testing campaign in an attempt to understand the causes of the lack of interoperability [6]. Ten products were tested, where each product consisted of a sensor paired with an algorithm capable of enrollment and verification. Native and interoperable False Accept Rate (FAR) and False Reject Rate (FRR) were computed for all datasets. A mean FRR of 0.92% for genuine matching scores was observed at a FAR of 1%. The objectives of this test were twofold: to test conformance of the products in producing biometric information records complying with ILO SID-0002, and to test which products could interoperate at less than 1% FRR at a fixed 1% FAR. The results showed that only 2 of the 10 products were able to interoperate at the mandated levels.
NIST conducted a minutiae template interoperability test, MINEX 2004, to assess the interoperability of fingerprint templates generated by different extractors and then matched with different matchers. Four different datasets were used, referred to as DOS, DHS, POE, and POEBVA. The performance evaluation framework calculated the False Non-Match Rate (FNMR) at a fixed False Match Rate (FMR) of 0.01% for all the matchers. Performance matrices representing the FNMR of all native and interoperable datasets were created, providing a means for quick comparison. Their results showed that FNMR for native datasets were far lower than FNMR for interoperable datasets.
In [7] the authors conduct an image quality and minutiae count analysis on fingerprints collected from three different sensors and assess the performance rates of the native and interoperable fingerprint datasets using different enrollment methodologies. They also describe an ANOVA test for differences in mean genuine matching scores between the three datasets. Their preliminary statistical analysis showed that the genuine matching scores differed significantly between the native and interoperable datasets.

The previous body of literature points to the growing importance of interoperability for biometric systems. Previous research has concentrated on interoperability error rate matrices and comparison of EER between native and interoperable datasets. Although analysis of error rates serves as a good indicator of performance, an alternate approach that utilizes statistical techniques would be beneficial. A formalized statistical analysis framework for testing interoperability is lacking, and this problem needs to be addressed. The researchers in this experiment build on previous work done in this area and propose a statistical analysis framework for testing interoperability.
III. STATISTICAL ANALYSIS FRAMEWORK
Interoperability of biometric systems is going to become an important issue, and the need for an analysis framework will become imperative. Frameworks for testing biometric systems and biometric algorithms can be found in the literature, and in this paper the researchers propose a framework for testing interoperability of biometric systems. For the purposes of this research the framework was adapted for testing interoperability of fingerprint sensors. The framework is based on the concept that if two fingerprint sensors are interoperable, then the resulting interoperable fingerprint datasets should have error rates similar to those of fingerprint datasets collected from either single fingerprint sensor. The framework for testing interoperability of fingerprint sensors is based on three steps:
1. Statistical analysis of basic fingerprint features.
2. Error rates analysis of native and interoperable fingerprint
datasets.
3. Statistical analysis of matching scores of native and
interoperable fingerprint datasets.
This framework was evaluated using a dataset of fingerprints
collected from 6 fingerprint sensors and the methodology and
results are discussed in the following sections.
Fig. 1. Statistical Analysis Framework for Interoperability
Testing.
IV. DATA COLLECTION
The dataset used in this research is part of KFRIA-DB (Korea Fingerprint Recognition Interoperability Alliance Database). Fingerprints were collected from 100 subjects using 6 different fingerprint sensors. Each subject provided 6 fingerprint images from each of the right index finger, right middle finger, left index finger, and left middle finger, so 2,400 fingerprint images were collected using each sensor. Table I gives an overview of the specifications of all the fingerprint sensors used in the study.
Table I. Sensor Specifications

Sensor    Technology Type  Resolution (DPI)  Interaction Type  Image Size
Sensor 1  Thermal          500               Swipe             360 x 500
Sensor 2  Optical          500               Touch             280 x 320
Sensor 3  Optical          500               Touch             248 x 292
Sensor 4  Polymer          620               Touch             480 x 640
Sensor 5  Capacitive       508               Touch             256 x 360
Sensor 6  Optical          500               Touch             224 x 256
All analysis for this study was performed on raw fingerprint
images collected from the 6 sensors. Sensor characteristics such as capture area, capture technology, and aspect ratio have an influence on the resulting image. Fig. 2 shows sample images
collected from the different sensors.
Fig. 2. Example Fingerprint Images (Sensors 1-6).
V. FINGERPRINT FEATURE ANALYSIS
An important factor to consider when examining interoperability is the ability of different fingerprint sensors to capture similar fingerprint features from the same fingerprint. Human interaction with the sensor, levels of habituation, finger skin characteristics, and sensor characteristics each introduce their own source of variability. All of these factors affect the consistency of fingerprint features in images acquired from different sensors. It is therefore important to analyze the amount of variance in image quality and minutiae count of fingerprints captured from different sensors. This analysis was performed by computing image quality scores and minutiae counts for all fingerprint images using commercially available software. Table II shows the average image quality scores and minutiae counts for all the datasets.
Table II. Average Image Quality Scores & Minutiae Count

Fingerprint dataset  Quality Score (0-100)  Minutiae Count
Sensor 1             15.24                  94.13
Sensor 2             74.97                  45.89
Sensor 3             71.15                  38.77
Sensor 4              6.62                  52.44
Sensor 5             68.92                  39.21
Sensor 6             62.58                  31.25
The datasets for Sensor 1 and Sensor 4 showed very low quality scores. It should be noted that the images captured from Sensor 4 had a very dark background, which could have contributed to the very low quality score. Also, Sensor 1 and Sensor 4 used different capture technologies than the other, more commonly available sensors. An analysis of variance (ANOVA) was performed on all the datasets to test for differences in mean image quality score and mean minutiae count between the datasets. The hypothesis stated in (1) was tested using the ANOVA test.
H10: µ1 = µ2 = … = µ6
H1A: at least one µi differs
(1)
The p-value of this hypothesis test was computed to be less than 0.05, which indicated that the mean image quality scores differed significantly across the datasets. The same hypothesis test was conducted on minutiae count for all the datasets; its p-value was also less than 0.05, indicating that the mean minutiae counts differed significantly as well. Table II shows that image quality scores for fingerprints collected from different sensors were substantially different. Previous research has shown that image quality has an impact on the performance of fingerprint matching systems [8]. The next step of the research involved evaluating the impact of these basic fingerprint feature inconsistencies on the performance of fingerprint datasets collected from different sensors.
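The one-way ANOVA described above can be sketched in a few lines. The snippet below is a minimal illustration only, not the study's actual analysis: the per-sensor quality scores are synthetic draws whose means loosely follow Table II, and the spread and sample sizes are assumptions.

```python
# Sketch of the one-way ANOVA in (1) on image quality scores.
# The data are synthetic stand-ins; the real study used scores from a
# commercial quality tool, which are not reproduced here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical quality scores for six sensors; the means loosely follow
# Table II, while the spread (sd = 10) and sample size (400) are assumptions.
quality = [
    rng.normal(loc=m, scale=10.0, size=400)
    for m in (15.24, 74.97, 71.15, 6.62, 68.92, 62.58)
]

# H0: all six sensor means are equal; H1: at least one differs.
f_stat, p_value = stats.f_oneway(*quality)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```

A p-value below 0.05 leads to the same conclusion as in the text: mean image quality differs across sensors.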
VI. PERFORMANCE RATES ANALYSIS
A. Performance Rates: ROC and Error Rates Matrix
In order to evaluate the performance of the fingerprint datasets, False Non-Match Rates (FNMR) were computed for all datasets. A commercially available fingerprint feature extractor and matcher was used to generate the FNMR. FNMR were generated for native datasets and interoperable datasets, where a native dataset refers to the comparison of fingerprint images collected from the same fingerprint sensor, and an interoperable dataset refers to the comparison of fingerprint images collected from different sensors. The first three fingerprint images provided by each subject were used to create the enrollment database, and the last three images were used to create the verification database.
Enrollment databases were created for each of the 6 sensors, and verification databases were also created for each of the 6 sensors. Matching scores were generated by comparing every enrollment template from each enrollment database against every fingerprint image from each verification database, which resulted in a set of scores S for every combination of enrollment and verification databases, where

S = { (E_i^x, V_j^y, score_ij^xy) }

i = 1, ..., number of enrolled templates
j = 1, ..., number of verification images
x = 1, ..., number of enrollment datasets
y = 1, ..., number of verification datasets
score_ij^xy = match score between enrollment template E_i^x and verification image V_j^y
Using this set of scores, a FNMR matrix was generated. FNMR was calculated for each set of scores at a fixed False Match Rate (FMR) of 0.1% and 1%. The results are shown in Tables III and IV. The diagonal of the FNMR matrix contains the rates for the native datasets, and the off-diagonal cells contain the rates for the interoperable datasets. Since matching of fingerprints is a symmetric process, the matrix can be viewed as a symmetric matrix as well.
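The FNMR-at-fixed-FMR computation behind Tables III and IV can be sketched as follows. This is a hypothetical illustration: the genuine and impostor score distributions are invented, and the actual study used scores produced by a commercial matcher.

```python
# Sketch: FNMR at a fixed FMR for one enrollment/verification dataset pair.
import numpy as np

def fnmr_at_fmr(genuine, impostor, target_fmr):
    """FNMR at the lowest threshold whose FMR does not exceed target_fmr."""
    impostor = np.sort(impostor)
    # Pick the threshold so that at most the top `target_fmr` fraction of
    # impostor scores would be falsely matched.
    idx = min(int(np.ceil(len(impostor) * (1.0 - target_fmr))), len(impostor) - 1)
    threshold = impostor[idx]
    return float(np.mean(genuine < threshold))

rng = np.random.default_rng(0)
genuine = rng.normal(600, 80, 5_000)    # hypothetical genuine scores
impostor = rng.normal(100, 60, 50_000)  # hypothetical impostor scores

for fmr in (0.001, 0.01):  # the fixed 0.1% and 1% FMR used in the paper
    print(f"FNMR at {fmr:.1%} FMR: {fnmr_at_fmr(genuine, impostor, fmr):.4f}")
```

Repeating this for every combination of enrollment and verification datasets fills in the FNMR matrix; the diagonal entries are the native rates.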
The FNMR matrix for 0.1% fixed FMR showed a wide range of FNMR for native and interoperable datasets. All the native datasets had significantly lower FNMR than the interoperable datasets. For example, S4 showed a native FNMR of 0.1% and a lowest interoperable FNMR of 35%, which indicates a very low level of interoperability between the datasets. When the FNMR are analyzed in the context of image quality scores, it can be seen that Sensor 4 had the lowest image quality scores, which indicates this was an important factor in the high FNMR of its interoperable datasets. An interesting observation is the relatively low FNMR for the native dataset of fingerprints captured with S4. It was also observed that the FNMR of interoperable datasets created from fingerprint sensors of the same acquisition technology and interaction type, for example S2 and S3, was comparable to the FNMR of their native datasets.
Table III. FNMR (%) at fixed 0.1% FMR

      S1    S2    S3    S4    S5    S6
S1    0.8   5.0   8.0  35.0  18.0   7.0
S2          0     0.6  38.0   2.0   0.12
S3                0.1  38.0   1.9   0.12
S4                      0.1  58.0  32.0
S5                            0.1   6.0
S6                                  0.1
Table IV. FNMR (%) at fixed 1% FMR

      S1    S2    S3    S4    S5    S6
S1    0     0     0     0     0     0
S2          0     0     0     0     0
S3                0     0     0     0
S4                      0     0     0
S5                            0     0
S6                                  0
Detection Error Tradeoff (DET) curves were also generated for all the datasets. DET curves are a modification of Receiver Operating Characteristic (ROC) curves, which are a means of representing the performance of diagnostic, detection, and pattern matching systems [9]. A DET curve plots FMR on the x-axis and FNMR on the y-axis as a function of the decision threshold. DET curves for different combinations of enrollment/verification databases allow comparison of error rates at different thresholds.
DET(T) = (FMR(T), FNMR(T)), where T is the decision threshold
(2)
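Equation (2) can be traced numerically by sweeping the threshold over the score range. The snippet below is a sketch under assumed score distributions; plotting the returned (FMR, FNMR) pairs yields a DET curve.

```python
# Sketch of equation (2): the DET curve as (FMR(T), FNMR(T)) over thresholds T.
import numpy as np

def det_curve(genuine, impostor, num_points=200):
    thresholds = np.linspace(
        min(genuine.min(), impostor.min()),
        max(genuine.max(), impostor.max()),
        num_points,
    )
    fmr = np.array([np.mean(impostor >= t) for t in thresholds])  # FMR(T)
    fnmr = np.array([np.mean(genuine < t) for t in thresholds])   # FNMR(T)
    return thresholds, fmr, fnmr

rng = np.random.default_rng(1)
genuine = rng.normal(600, 80, 2_000)   # hypothetical genuine scores
impostor = rng.normal(100, 60, 2_000)  # hypothetical impostor scores
thresholds, fmr, fnmr = det_curve(genuine, impostor)
# As T increases, FMR falls while FNMR rises; this trade-off is the DET curve.
```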
Fig. 3 shows three superimposed DET curves. It can be observed that the DET curve for the interoperable dataset of S2 and S3 performs worse than the two native datasets at every possible threshold. Fig. 4 also shows three superimposed DET curves. The DET curve for the interoperable dataset of S2 and S4 shows performance much worse than that of the native datasets. Comparing the DET curves for the native datasets in Fig. 3 and Fig. 4, the native datasets perform comparably, but the performance of the two interoperable datasets differs substantially. This indicates that the performance of interoperable datasets cannot be predicted from the performance of the native datasets alone.
Fig. 3. DET Curve for S2 and S3 datasets.
Fig. 4. DET Curve for S2 and S4 datasets.
B. Statistical Analysis of Matching Scores
The DET curves and FNMR matrix provide insight into differences in FNMR between native and interoperable databases, but they do not provide a statistical basis for testing the differences. A statistical analysis of the results could help uncover underlying patterns which contribute to the unpredictability observed in the comparison of the DET curves. To assess interoperability at the matching score level, the matching scores from genuine comparisons of a native dataset were compared to the matching scores from genuine comparisons of the corresponding interoperable datasets. For true interoperability, the performance of the interoperable datasets should be statistically similar to the performance of the native dataset. An ANOVA test was used to test for differences in mean genuine matching scores between the native dataset and the interoperable datasets at a significance level of 0.05. This test was performed for each of the six native datasets, which resulted in six sets of hypotheses as stated in (3).
H20: µnative = µinteroperable1 = … = µinteroperable5
H2A: at least one mean differs
(3)
The ANOVA tests for all six hypotheses had p-values of less than 0.05, which resulted in rejecting the null hypotheses and concluding that native genuine matching scores were significantly different from interoperable matching scores.
In experiments such as this, one of the treatments is a control and the other treatments are comparison treatments. A statistical test that compares different treatments to a control can be performed using Dunnett's test, which is a modified form of the t-test [10]. For this particular experiment, the native dataset genuine match scores were used as the control and the interoperable dataset genuine match scores as the comparison treatments, so for each native dataset there were 5 comparison treatments corresponding to the interoperable datasets. The mean genuine match score for each interoperable database was tested against the control (i.e. the native database score). According to Dunnett's test, the null hypothesis H0: µnative = µinteroperable is rejected at α = 0.05 if
| ȳa. − ȳi. | > dα(a−1, f) · sqrt( MSE · (1/ni + 1/na) )
(4)

where
dα(a−1, f) = Dunnett's constant
a − 1 = number of interoperable datasets
f = number of error degrees of freedom
MSE = mean square of the error terms
ni = number of samples in the control
na = number of samples in interoperable set a
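The decision rule in (4) can be sketched directly. In the snippet below the score samples are synthetic, and the Dunnett constant is supplied by hand (in practice it comes from a Dunnett table, or from scipy.stats.dunnett in SciPy 1.11+); the value 2.21 is an assumed constant for two comparisons and large f.

```python
# Sketch of the Dunnett criterion in (4): compare each interoperable
# dataset's mean genuine score against the native (control) mean.
import numpy as np

def dunnett_reject(control, treatments, d_alpha):
    """Per treatment: does |ybar_a - ybar_i| exceed the margin in (4)?"""
    groups = [control] + list(treatments)
    n_total = sum(len(g) for g in groups)
    # Pooled within-group variance: MSE with f = N - a error degrees of freedom.
    sse = sum(np.sum((g - g.mean()) ** 2) for g in groups)
    mse = sse / (n_total - len(groups))
    n_i, ybar_i = len(control), control.mean()
    rejects = []
    for g in treatments:
        margin = d_alpha * np.sqrt(mse * (1.0 / n_i + 1.0 / len(g)))
        rejects.append(bool(abs(g.mean() - ybar_i) > margin))
    return rejects

rng = np.random.default_rng(2)
native = rng.normal(750, 120, 300)     # control: hypothetical native scores
interop = [rng.normal(610, 120, 300),  # mean clearly below the control
           rng.normal(745, 120, 300)]  # mean close to the control
rejects = dunnett_reject(native, interop, d_alpha=2.21)
```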
Dunnett's test was performed on all possible combinations of native and interoperable datasets. It showed that the genuine matching scores of all the interoperable datasets were significantly different from the genuine matching scores of the control dataset. Table V shows the average genuine matching score of each control dataset and the average genuine matching score of the interoperable dataset which had the least absolute difference from the control.
An evaluation of the results in Table V shows that S2 and S3 had the closest interoperable genuine matching scores. When the interoperable matching scores are analyzed in the context of image quality scores and minutiae count, S2 and S3 had the least absolute difference between their image quality scores and minutiae counts. Combining these results provides a positive indicator for improving the predictability of FNMR for interoperable datasets.
Table V. Average Genuine Matching Scores for Control Dataset and Interoperable Dataset

Control Dataset          Interoperable Dataset with Least Difference
Sensor 1 dataset: 319.3  Sensor 3: 294.2
Sensor 2 dataset: 749.2  Sensor 3: 609.2
Sensor 3 dataset: 789.0  Sensor 2: 609.2
Sensor 4 dataset: 575.5  Sensor 3: 281.6
Sensor 5 dataset: 631.9  Sensor 3: 390.5
Sensor 6 dataset: 652.7  Sensor 2: 521.9
VII. CONCLUSIONS AND FUTURE WORK
Previous research has shown that image quality has a significant impact on the performance of native fingerprint datasets, and this research showed that image quality and minutiae count also have an impact on the performance of interoperable fingerprint datasets. The type of capture technology did not have a consistent effect on the FNMR of interoperable fingerprint datasets, which was seen in the difference in FNMR between datasets S2 and S3 versus S2 and S4. It is important to understand the effect of these factors since they can be used to reduce the unpredictability of the performance of interoperable datasets. Interoperability is dependent on several factors, and this research uncovered important factors and illustrated their significance using statistical tests and analysis methodologies. The results of these findings can be used in designing fingerprint matching algorithms which specifically take advantage of this new knowledge.
The results discussed in this paper indicate several avenues of research which could be followed to improve the statistical analysis framework. Along with the comparison of genuine matching scores using Dunnett's test, a comparison of proportions can also be applied to statistically test the FNMR between native and interoperable datasets. This would add one more test to the collection of interoperability tests. Application of this framework to a different modality would also be an interesting study. In this research the framework was applied exclusively to interoperability tests for fingerprint recognition, and it helped in synthesizing the results in a novel way. Other biometric modalities will face the same interoperability problems as fingerprint recognition, and it will become imperative to understand these issues and try to solve them. Application of this framework to other modalities could provide insights into solving the problems of interoperability in a larger context. An investigative multivariate analysis which uses basic fingerprint features as predictor variables and matching scores as response variables is another avenue of future work. Understanding the effect of these predictor variables on interoperable matching scores could be used to create a model capable of describing the interactions and effects.
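As a concrete sketch of the proportion comparison suggested above, a two-proportion z-test could compare a native FNMR with an interoperable FNMR. The error counts below are invented for illustration; the function implements only the standard pooled-proportion z statistic.

```python
# Hypothetical sketch: two-proportion z-test for H0: FNMR_native == FNMR_interop.
import math

def two_proportion_z(errors1, n1, errors2, n2):
    """Two-sided z-test for equality of two proportions (pooled standard error)."""
    p1, p2 = errors1 / n1, errors2 / n2
    pooled = (errors1 + errors2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value

# Invented counts: 12 false non-matches in 1,200 native trials (1.0% FNMR)
# versus 420 in 1,200 interoperable trials (35% FNMR).
z, p = two_proportion_z(12, 1200, 420, 1200)
```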
ACKNOWLEDGMENT
The authors would like to thank KFRIA (Korea Fingerprint
Recognition Interoperability Alliance) for supporting this
research and providing the fingerprint database for analysis.
REFERENCES
[1] IBG, Biometrics Market and Industry Report. 2007, IBG: NY. p.
224.
[2] Haas, N., S. Pankanti, and M. Yao, Fingerprint Quality
Assessment. In Automatic Fingerprint Recognition Systems. 2004,
NY: Springer-Verlag. 55-66.
[3] Jain, A. and A. Ross, eds. Biometric Sensor Interoperability.
BioAW 2004, ed. A. Jain and D. Maltoni. Vol. 3067. 2004,
Springer-Verlag: Berlin. 134-145.
[4] Ko, T. and R. Krishnan. Monitoring and Reporting of Fingerprint
Image Quality and Match Accuracy for a Large User Application.
in Applied Imagery Pattern Recognition Workshop. 2004.
Washington, D.C.: IEEE Computer Society.
[5] Han, Y., et al. Resolution and Distortion Compensation based on
Sensor Evaluation for Interoperable Fingerprint Recognition. in
2006 International Joint Conference on Neural Networks. 2006.
Vancouver, Canada.
[6] Campbell, J. and M. Madden, ILO Seafarers' Identity Documents
Biometric Interoperability Test Report 2006, International Labour
Organization: Geneva. p. 170.
[7] Modi, S., S. Elliott, and H. Kim. Performance Analysis for Multi
Sensor Fingerprint Recognition System. in International
Conference on Information Systems Security. 2007. Delhi, India:
Springer Verlag.
[8] Elliott, S.J. and S.K. Modi. Impact of Image Quality on
Performance: Comparison of Young and Elderly Fingerprints. in
6th International Conference on Recent Advances in Soft
Computing (RASC). 2006. Canterbury, UK.
[9] Mansfield, A. and J. Wayman, Best Practices in Testing and Reporting Performance of Biometric Devices. 2002, National Physical Laboratory: Middlesex. p. 32.
[10] Montgomery, D.C., Design and Analysis of Experiments. 4th ed.
1997, New York: John Wiley & Sons. 704.