Keystroke Biometrics Studies on a Variety of Short and Long Text and Numeric Input Ned Bakelman, DPS Candidate Charles C. Tappert, PhD, Advisor Seidenberg School of Computer Science and Information Systems Pace University White Plains, NY 10606, USA DPS Defense April 11, 2014


Keystroke Biometrics Studies on a Variety of Short and Long Text and Numeric Input

Ned Bakelman, DPS Candidate
Charles C. Tappert, PhD, Advisor

Seidenberg School of Computer Science and Information Systems Pace University

White Plains, NY 10606, USA

DPS Defense
April 11, 2014


Research Questions

This study focuses on biometric authentication using long bursts of arbitrary input and short bursts of fixed input with an improved classification system.

• Long Input: 100 – 1500 characters (paragraph, couple of sentences, etc.)
• Short Input: 10 – 15 characters (password, pass code, etc.)
• Arbitrary Input: Open unrestricted text (of the user's choosing)


Research Questions (continued)

Purpose of the Study

Long Input - Unauthorized User Detection
1) Can we accurately detect intruder use of a computer system in an office environment?
2) How does the use of standard applications such as word processing, spreadsheet, and browser impact intruder detection?
3) Is an intruder still detectable when using a web browser (a low-text environment)?

Short Keypad Input – Detection Accuracy
1) What is the detection accuracy of short fixed numeric keypad input?
2) Does the use of specific keypad features improve detection accuracy?

Classifier Comparison – Multi Match vs. Single Match
1) What is the difference in accuracy between the two?
2) Which performs better on long input?
3) Which performs better on short input?


Background
(Source of Features or Attributes)

• Derived from raw timing data
• Based on key press duration and transition times
• Also known as dwell and flight time

• Statistical in nature, mainly means and standard deviations
• Pre-processing to remove outliers and standardize between 0 – 1
• Fallback procedure

T. Olzak, Keystroke Dynamics: Low Impact Biometric Verification, Sep. 2006
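The feature pipeline described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Pace system's implementation: the event format `(key, press_ms, release_ms)`, the two-sigma outlier rule, and the function names are all assumptions made for the example, and the flight time is taken here as release-to-press. The fallback procedure is not shown.

```python
from statistics import mean, pstdev


def dwell_and_flight(events):
    """Split raw timing events into dwell and flight times.

    events: list of (key, press_ms, release_ms) in typing order
    (hypothetical format chosen for this sketch).
    """
    # Dwell (duration): how long each key is held down.
    dwell = [(key, release - press) for key, press, release in events]
    # Flight (transition): release of one key to press of the next.
    flight = [
        (a[0], b[0], b[1] - a[2])
        for a, b in zip(events, events[1:])
    ]
    return dwell, flight


def remove_outliers(values, n_sigma=2.0):
    """Drop values more than n_sigma standard deviations from the mean.

    The 2-sigma cutoff is an assumption; the slides do not state one.
    """
    if len(values) < 2:
        return list(values)
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sd]


def summarize(values):
    """Mean and standard deviation, the two statistics named on the slide."""
    cleaned = remove_outliers(values)
    return mean(cleaned), pstdev(cleaned)


def min_max(value, lo, hi):
    """Standardize a feature into the range 0 to 1, clamping at the ends."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))
```

In use, `summarize` would be applied per key or per key group, and `min_max` would take `lo` and `hi` from the training population.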


Background (continued)
(Target of Features or Attributes)

[Figure: keyboard showing the QWERTY and Numeric Keypad sections. Source: Wikipedia.org http://en.wikipedia.org/wiki/Computer_keyboard, last updated: March 6, 2012]

Separate features for QWERTY and Keypad
• Durations and transitions for individual keys, groups of keys, etc.
• QWERTY: each letter, each number, vowels, consonants, all letters, etc.
• Keypad: each digit, each operator (+ - * /), all digits, all operators, etc.


Background (continued)
(Pace Classifier: Single Match)

• Dichotomy Model
• Uses vector differences
• Transforms a multi-class problem to a two-class problem
• K-Nearest Neighbor (k-NN) is used for classification

[Figure, left panel: Feature Vector Space (3 subjects S1, S2, S3; 4 samples each)]
[Figure, right panel: Feature Difference Space (18 within distances, 48 between distances)]
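The counts in the figure follow directly from pairing up samples: 3 subjects with 4 samples each give 12 vectors, hence 66 pairs, of which 18 join samples of the same subject and 48 join different subjects. The sketch below reproduces those counts under stated assumptions: the toy feature values are invented, and the element-wise absolute difference is one plausible choice of difference vector, not necessarily the one used in the Pace classifier.

```python
from itertools import combinations


def difference_space(samples):
    """Map labelled feature vectors into the two-class difference space.

    samples: list of (subject_id, feature_vector).
    Returns (within, between): difference vectors from same-subject
    pairs and from different-subject pairs respectively.
    """
    within, between = [], []
    for (subj_a, vec_a), (subj_b, vec_b) in combinations(samples, 2):
        diff = [abs(x - y) for x, y in zip(vec_a, vec_b)]
        if subj_a == subj_b:
            within.append(diff)
        else:
            between.append(diff)
    return within, between


# Toy data (invented): 3 subjects, 4 samples each, 2 features per sample.
samples = [
    (subject, [0.2 * subject + 0.01 * i, 0.05 * subject + 0.005 * i])
    for subject in (1, 2, 3)
    for i in range(4)
]
within, between = difference_space(samples)
print(len(within), len(between))
```

Whatever the number of subjects, the classifier then only has to separate two classes, "within" and "between".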


Background (continued)
(Pace Classifier: Multi Match)

Authentication Process
• User Focused Reduction Method (reduces the training space)
• System performance obtained using the Leave-One-Out method
• “Left out” test sample is used to create difference vectors

• Each test difference is classified (k-NN)
• Results are grouped together
• Authentication decision based on all

[Figure: Feature Vector Space (3 subjects S1, S2, S3; 4 samples each)]
[Figure: Feature Difference Space (18 within distances, 48 between distances)]
[Figure: Feature Reduction Space for S1 (6 S1 within distances, 32 S1 between distances)]
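One reading of the user-focused reduction that matches the figure's counts: for a claimed user with 4 samples, keep only the 6 pairs among that user's own samples as "within" and the 4 × 8 = 32 pairs between that user's samples and everyone else's as "between". The sketch below follows that reading. The majority vote used to combine the per-difference k-NN results is an assumption; the slides say only that the decision is "based on all". Function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def diff(a, b):
    """Element-wise absolute difference of two feature vectors."""
    return [abs(x - y) for x, y in zip(a, b)]


def user_focused_space(samples, user):
    """Difference space restricted to pairs that involve the claimed user."""
    mine = [vec for subj, vec in samples if subj == user]
    others = [vec for subj, vec in samples if subj != user]
    within = [diff(a, b) for a, b in combinations(mine, 2)]
    between = [diff(a, b) for a in mine for b in others]
    return within, between


def knn_is_within(d, within, between, k=3):
    """Classify one difference vector by majority of its k nearest neighbours."""
    labelled = [(math.dist(d, w), True) for w in within]
    labelled += [(math.dist(d, b), False) for b in between]
    labelled.sort(key=lambda pair: pair[0])
    return sum(1 for _, is_within in labelled[:k] if is_within) > k / 2


def multi_match_accepts(test_vec, user_vectors, within, between, k=3):
    """Classify every test difference, then combine (majority vote assumed)."""
    votes = [
        knn_is_within(diff(test_vec, v), within, between, k)
        for v in user_vectors
    ]
    return sum(votes) > len(votes) / 2


# Toy data (invented): 3 subjects, 4 samples each.
samples = [
    (s, [0.2 * s + 0.01 * i, 0.05 * s + 0.005 * i])
    for s in (1, 2, 3)
    for i in range(4)
]
s1_vectors = [vec for subj, vec in samples if subj == 1]
within, between = user_focused_space(samples, user=1)
print(len(within), len(between))
```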


Background (continued)
(Performance: ROC Curves, Equal Error Rate)

Receiver Operating Characteristic (ROC) Curves
• Historically used in signal detection, such as RADAR, to distinguish an actual signal from noise
• Used in biometrics to plot the FAR and FRR at various operating points (thresholds)

Equal Error Rate (EER)
• The point on the ROC curve where the FAR and FRR are equal
• The operating point on the ROC curve where the FAR and FRR intersect

[Figures: ROC Curve; FAR / FRR Intersection]
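To make the FAR, FRR, and EER definitions concrete, here is a small sketch that sweeps a decision threshold over match scores. The score convention (higher means more likely genuine), the example scores, and the choice to report the midpoint of FAR and FRR at the closest operating point are all assumptions for illustration.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a sample is accepted if score >= threshold."""
    # False accepts: impostor attempts that clear the threshold.
    far = sum(score >= threshold for score in impostor) / len(impostor)
    # False rejects: genuine attempts that fall below it.
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER as the operating point where FAR and FRR are closest."""
    thresholds = sorted(set(genuine) | set(impostor))
    points = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(points, key=lambda p: abs(p[0] - p[1]))
    return (far + frr) / 2


# Invented scores for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With a finite set of thresholds the two rates may never be exactly equal, which is why the sketch takes the nearest point rather than solving for an exact crossing.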


Data Collection

• Only “perfect” samples were used (no mistakes)
• Rest period of at least one day between sessions
• Data entered into a spreadsheet using the right hand

30 Subjects

914 193 7761 4

Number Sessions

20 Per Subject

(Numeric Keypad)


Features
(Feature Attribute Summary)

Features                              Attributes   Mean (µ)   Standard Deviation (σ)   Total
QWERTY (Non-Numeric)
  Durations                                   53         53                       53     106
  Transitions (per Type I and II)             35         70                       70     140
QWERTY (Numeric)
  Durations                                   27         27                       27      54
  Transitions (per Type I and II)             26         52                       52     104
Keypad
  Durations                                   29         29                       29      58
  Transitions (per Type I and II)            128        256                      256     512
Totals                                       298        487                      487     974


Features
(Keypad Durations)

[Figure: key groups used for keypad duration features]
• Numeric Keypad
• Digits with Decimal: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, .
• Arithmetic Operators with Num Lock and Enter: NumLock, Enter, /, *, -, +
• All Keys
• Print Screen, Sys Rq, Scroll Lock, Pause, Break
• Centerpad: Home, Page Up, Page Dn, End, Del, Ins
• Four Arrows


Features (continued)
(Keypad Transitions)

[Figure: key-to-key and group-to-group transitions used for keypad transition features]
• Any Key -> Any Key
• keypad -> keypad
• any digit -> any digit
• d -> 1,2,3…0, for each digit d (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
• d -> digits, for each digit d
• Any Digit -> Arithmetic Operators
• d -> Arithmetic Operators, for each digit d
• Arithmetic Operator -> any digit
• div -> digits, mult -> digits, sub -> digits, add -> digits


Results – Short Input Experiments
(Equal Error Rate for each keypad experiment per Classifier)

[Figure: Equal Error Rate of the Multi Match and Single Match classifiers for the 10 Subject, 20 Subject, and 30 Subject experiments]


Results – Short Input Experiments (continued)
(ROC Curve for each keypad experiment per Classifier)

[Figures: ROC curves for the Multi Match Classifier and the Single Match Classifier]

• 10 - 20: 10 Subjects, 20 samples each
• 20 - 20: 20 Subjects, 20 samples each
• 30 - 20: 30 Subjects, 20 samples each


Results – Short Input Experiments (continued)
(Independent Variables for the short input experiments)

Numeric Keypad
Subjects                              10        20        30
Samples per Subject                   20        20        20
Total Samples (All Subjects)         200       400       600
EER % (Multi Match)                5.50%     5.65%     6.14%
EER % (Single Match)              15.56%    15.72%    14.95%
EER Improvement %                 64.65%    64.06%    58.93%

• Independent Variable 1: Number of Subjects
• Independent Variable 2: Classifier

• Conclusion 1: EER increases (but not by much) as Number of Subjects increases *
• Conclusion 2: New Classifier much better than Old Classifier

* Except for old Classifier
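The "EER Improvement %" row is consistent with a relative reduction of the Single Match EER, that is, (Single Match EER − Multi Match EER) / Single Match EER. The formula is inferred from the reported numbers rather than stated on the slide; the short check below recomputes the row from the two EER rows.

```python
def eer_improvement(old_eer, new_eer):
    """Relative EER reduction, in percent, of the new classifier over the old."""
    return (old_eer - new_eer) / old_eer * 100.0


# EER values (in percent) copied from the table above, keyed by subject count.
multi_match = {10: 5.50, 20: 5.65, 30: 6.14}
single_match = {10: 15.56, 20: 15.72, 30: 14.95}

improvement = {
    n: round(eer_improvement(single_match[n], multi_match[n]), 2)
    for n in multi_match
}
print(improvement)
```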


CMU Experiment - Keypad
(Features Set Comparison – CMU vs. PaceU)

914 193 7761 + Enter Key = 11 Characters

Carnegie Mellon Features (from their numeric keypad study *)
• 10 key-down ---> key-down
• 10 key-up ---> key-down
• 11 dwell times
• 31 Features

Pace University Features (from our numeric keypad study)
• (10 key-down ---> key-down) per µ, per σ = 20
• (10 key-up ---> key-down) per µ, per σ = 20
• (7 dwell) per µ, per σ = 14
• 54 Timing Features

* R. Maxion and K. Killourhy, "Keystroke Biometrics with Number-Pad Input," 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Chicago, IL, 2010, pp. 201-210.
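The two feature counts can be reproduced from the input string itself. The sketch below assumes, as an interpretation of the slide rather than a documented fact, that the CMU set keeps one timing per keystroke position while the Pace set aggregates timings per distinct key or key pair and stores a mean and a standard deviation for each. Under that reading the 7 dwell features correspond to the 7 distinct keys in the string (9, 1, 4, 3, 7, 6, Enter).

```python
# The 10-digit string from the slide plus the Enter key: 11 keystrokes.
keys = list("9141937761") + ["Enter"]

# Consecutive key pairs: 10 transitions for 11 keystrokes.
transitions = list(zip(keys, keys[1:]))

# CMU-style count (assumed: one raw timing per position).
cmu_features = (
    len(transitions)    # key-down -> key-down
    + len(transitions)  # key-up -> key-down
    + len(keys)         # dwell times
)

# Pace-style count (assumed: mean and standard deviation per distinct item).
STATS = 2  # mean and standard deviation
pace_features = (
    len(set(transitions)) * STATS    # key-down -> key-down
    + len(set(transitions)) * STATS  # key-up -> key-down
    + len(set(keys)) * STATS         # dwell per distinct key
)

print(cmu_features, pace_features)
```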


CMU Experiment – Keypad (continued)
(Equal Error Rate and ROC Curves only using Multi Match)

PU Data with CMU Features: PU Features vs. CMU Features

[Figure: Equal Error Rate for Keypad 30 - 20 with CMU Features and with PU Features]
[Figure: ROC Curves, FAR (%) against FRR (%), for Keypad 30 - 20 CMU Features and Keypad 30 - 20 PU Features]


CMU Experiment – Keypad (continued)
(Independent Variable for the CMU Keypad experiment)

• Independent Variable: Feature Set
• Conclusion: PU Feature Set outperformed CMU Feature Set

Numeric Keypad (30 – 20)
Features Set                         CMU        PU
Subjects                              30        30
Samples per Subject                   20        20
Total Samples (All Subjects)         600       600
EER % (Multi Match)               10.47%     6.14%
EER Improvement %                 41.36%


Conclusions

Long Input Experiments – Intruder Detection Accuracy
• Keystroke Biometrics can be effective at detecting the unauthorized use of a computer system in a closed environment (government office, school, business office, etc.)
• Performance varied with input type:
   • Spreadsheet: Good Performance (EER: 8.1%)
   • Text: Very Good Performance (EER: 5.8%)
   • Browser: Fair Performance (EER: 15.7%)

Short Input Experiments – Detection Accuracy
• Numeric Keypad yields very good performance (EER Range: 5.5% - 6.2%)
• PaceU Features Set is effective: CMU features performed much worse (10.5% vs. 6.2%)

Classifier Comparison – Multi Match vs. Single Match
1) Multi Match outperformed Single Match significantly (EER improvement of 50% - 64%)
2) Multi Match outperformed the detector study from CMU using their data and features (EER: 7.6%)


Conclusions (continued)

Long Input Performance: Weaker Performance compared to previous studies at PU… Why?
• Less optimal samples
• No designated entry window for sample collection (less control over quality of entry)
• Large fluctuations in the number of keystrokes
• Input types most likely had substantial mouse activity that “interrupts” keystroke entry
• Possible sparseness of keystrokes (meaning less concentrated and more spread out, especially with browser entry)

Future Considerations: Do keystroke counts tell the whole story?
• Propose that correlating performance simply to Number of Keystrokes is not sufficient
• Need to factor in the density of the keystrokes as well
• Simply stated: it may take a lot more keystrokes to maintain an effective level of performance if the sparseness is high


Suggestions for Future Work

• Further studies on numeric entry from QWERTY
• Compare performance to numeric entry from keypad
• Study free text entry from keypad

• Feature Analysis
   • Which features contributed to performance from the keypad?
   • How do equivalent numeric features from QWERTY perform compared to keypad?

• Perform mixed mode experiments
   • Collect input that combines spreadsheet, browser, and text
   • Collect spreadsheet input which includes all numeric entry from keypad

• Incorporate Multi Biometric
   • Keystroke + Mouse Movement + Stylometry


Backup Slides


Generate ROC Curves from kNN Data
(vary m from 0 to k [m is the controlling or threshold parameter])

The m-kNN procedure with k = 9 and m = 5

For each Q (questioned) test sample:
• Examine the top k nearest neighbors
• Count the number of within-class matches
• If the number of within-class matches >= a threshold of matches (m), the user is authenticated; otherwise the user is rejected

Generate the ROC curve as follows:
• Vary m from 0 to k
• Calculate FAR / FRR in each of the following cases:
   • m = 0: authenticate if 0 or more of the k choices are within
   • m = 1: authenticate if 1 or more of the k choices are within
   • and so on until m = 9 in this case

Linear Rank Weighting Method:
• 1st choice weight = k, 2nd choice weight = k-1… weight = 1
• Authenticate a user if the sum of the weighted within-class choices >= the m threshold
• Threshold varies from 0 to k(k+1)/2 (maximum score)

R. Zack, C. Tappert, and S. Cha, "Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method," IEEE 4th Int Conf Biometrics (BTAS 2010), Washington D.C., 2010.
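The m-kNN rule and the linear rank weighting can be sketched as below. Each trial is represented as a list of k booleans, ordered from nearest to farthest neighbor, where `True` marks a within-class neighbor. That representation, the function names, and the toy trials are assumptions for the example; the decision rules themselves follow the bullets above.

```python
def m_knn_accept(neighbor_is_within, m):
    """Authenticate if at least m of the k nearest neighbors are within-class."""
    return sum(neighbor_is_within) >= m


def rank_weighted_score(neighbor_is_within):
    """Linear rank weighting: 1st choice weighs k, 2nd weighs k-1, last weighs 1."""
    k = len(neighbor_is_within)
    return sum(
        k - rank
        for rank, is_within in enumerate(neighbor_is_within)
        if is_within
    )


def roc_points(genuine_trials, impostor_trials, k):
    """One (m, FAR, FRR) operating point for each threshold m = 0..k."""
    points = []
    for m in range(k + 1):
        # Genuine users who fail the m-of-k test are false rejects.
        frr = sum(not m_knn_accept(t, m) for t in genuine_trials) / len(genuine_trials)
        # Impostors who pass the m-of-k test are false accepts.
        far = sum(m_knn_accept(t, m) for t in impostor_trials) / len(impostor_trials)
        points.append((m, far, frr))
    return points


# Toy trials (invented) with k = 9.
K = 9
genuine = [[True] * 7 + [False] * 2, [True] * 5 + [False] * 4]
impostor = [[False] * 8 + [True], [True] * 4 + [False] * 5]
curve = roc_points(genuine, impostor, K)
```

At m = 0 every sample is accepted (FAR = 1, FRR = 0); raising m trades false accepts for false rejects. The rank-weighted variant replaces the match count with `rank_weighted_score` and sweeps its threshold up to k(k+1)/2.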


Equal Error Rates
(From the Literature)

Long Input:
• Ferreira and Santos: 1.4%
• Monaco using data from Villani: 1.7%



Future Work (continued)

Multi Biometrics for Intrusion Detection
• Motor Control Level: keystroke + mouse movement
• Linguistic Level: stylometry (char, word, syntax)
• Semantic Level: target likely intruder commands

[Figure: intruder detection across the Motor Control Level (Keystroke + Mouse), the Linguistic Level (Stylometry), and the Semantic Level]


Intruder Experiment Design (continued)

• Authenticate user on various window sizes, beginning with 300-keystroke windows
• Window Type 1: use overlapping windows to:
   • Minimize the “wait” period for the next authentication
   • Maximize fast intruder detection

[Figure: a row of 300-keystroke (300KS) windows along a keystroke axis marked 1, 300, 600, 900, 1200, 1500, 1800, and a second, overlapping row of 300KS windows marked 150, 450, 750, 1050, 1350, 1650]

Figure 1.5-1 Overlapping Window Burst Authentication
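The overlapping layout in Figure 1.5-1 amounts to a sliding window of 300 keystrokes that advances 150 keystrokes at a time, so a new authentication decision becomes available every 150 keystrokes instead of every 300. A minimal sketch, assuming zero-based half-open index ranges and a hypothetical function name:

```python
def overlapping_windows(n_keystrokes, size=300, step=150):
    """Return (start, end) index pairs for sliding keystroke windows.

    Indices are zero-based and end-exclusive. Only full windows are
    returned, so any trailing keystrokes short of `size` are left out.
    """
    return [
        (start, start + size)
        for start in range(0, n_keystrokes - size + 1, step)
    ]


windows = overlapping_windows(1800)
print(len(windows), windows[0], windows[-1])
```

For an 1800-keystroke stream this yields 11 windows, matching the 6 + 5 windows drawn in the figure; setting `step=size` gives the non-overlapping case.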


EISIC 2012

Continuous vs Continual Authentication with Data Capture Windows

• Continuous (ongoing) burst authentication
[Figure: timeline marked 0, 5 min, 10 min with three 1 min capture windows (Burst 1, Burst 2, Burst 3)]

• Continual burst authentication with pauses
[Figure: timeline marked 0, 8 min, 30 min with three 1 min capture windows (Burst 1, Burst 2, Burst 3) separated by Pause Threshold intervals]


Background (continued)

• DARPA (Defense Advanced Research Projects Agency), through its Cyber Genome Program, is funding research for the development of new software-based authentication biometric modalities

• These include keystrokes and target a desktop environment running Microsoft Office applications as the standard computer system platform

DARPA. Active Authentication Program. https://www.fbo.gov/index?s=opportunity&mode=form&id=c7968647352f0276fc1b28817c581d86&tab=core&_cview=0, accessed 2014.

• The 2008 United States Higher Education Opportunity Act requires institutions of higher learning to make greater online access control efforts by adopting ubiquitous identification technologies

HEOA. Higher Education Opportunity Act (HEOA) of 2008. http://www2.ed.gov/policy/highered/leg/hea08/index.html, accessed 2014.


Spreadsheet Template

                                        2011     2010     2009
Assets
  Cash
  Investments:
    Cash
    Equity Securities
    Corporate debt securities
    US government securities
    Private equity
    Real estate
    Total Investments                      0        0        0
  Other Assets
  Total Assets                            $0       $0       $0

Liabilities and Net Assets
  Liabilities:
    Penalties
    Accounts Payable
    Advance from Lender
    Federal excise tax
    Total Liabilities                      0        0        0
  Net Assets:
    Tangible
    Non Tangible
    Total Net Assets                       0        0        0
  Total Net Assets and Liabilities        $0       $0       $0

Special Journal Entries
  Enter Journal Entry name here
  Enter Journal Entry name here
  Enter Journal Entry name here
  Total Journal Entries                $0.00    $0.00    $0.00