Hidden Markov Models in Keystroke Dynamics
Md Liakat Ali, John V. Monaco, and Charles C. Tappert
Seidenberg School of CSIS, Pace University, White Plains, New York
May 1, 2015


Page 2:

Contents

• Objective
  o Generative vs. Discriminative Classifier
• Markov Model
  o Markov chain
  o Hidden Markov Model
    - Elements of HMM
    - Three problems for HMMs and their solutions
• Research in Keystroke Dynamics Using HMM
  o Findings

Page 3:

Machine Learning

• Unsupervised learning, also known as clustering, involves grouping data into categories based on some measure of inherent similarity or distance.

• Supervised learning is learning where a training set of correctly identified observations is available: the computer is presented with example inputs and their desired outputs.

• Classification is an instance of supervised learning.

Page 4:

Generative vs. Discriminative Classifier

The main difference:

o A generative model is a full probabilistic model of all variables.

  • With fewer training samples, a generative model often performs better.

o A discriminative model models only the target variable(s) conditioned on the observed variables.

  • With large samples, discriminative classifiers generally outperform generative classifiers.

  • Classification of new data is faster than with a generative model.

Page 5:

Generative vs. Discriminative Classifier

Popular generative models:
• Gaussians, mixtures of Gaussians
• Naïve Bayes, Bayesian networks
• Hidden Markov models
• Sigmoidal belief networks
• Markov random fields

Popular discriminative models:
• Logistic regression
• Support vector machines
• Neural networks
• Nearest neighbor
• Conditional random fields
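To make the contrast concrete, here is a minimal sketch (not from the slides) that fits one popular model from each family on synthetic data. It assumes scikit-learn is available; the dataset parameters are arbitrary illustration choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Generative: models P(x | y) and P(y), then applies Bayes' rule.
gen = GaussianNB().fit(X_train, y_train)

# Discriminative: models P(y | x) directly.
disc = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Naive Bayes accuracy:        ", gen.score(X_test, y_test))
print("Logistic regression accuracy:", disc.score(X_test, y_test))
```

With ample training data the discriminative model typically matches or beats the generative one, in line with the previous slide.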

Page 6:

Andrey Andreyevich Markov

Andrey Andreyevich Markov was a Russian mathematician. He is best known for his work on stochastic processes. A primary subject of his research later became known as Markov chains and Markov processes. (Wikipedia)

Page 7:

Markov Model

A stochastic model used to model a system that changes randomly, where the future state, given the past and the present, depends only on the present state and not on the past.

Types of Markov model:

|                                       | System is autonomous | System is controlled                         |
| System state is fully observable      | Markov chain         | Markov decision process                      |
| System state is partially observable  | Hidden Markov model  | Partially observable Markov decision process |

Page 8:

Markov Chain

• The simplest Markov model, which models a system that changes randomly through time.

• A Markov chain is the discrete-time process; a Markov process is the continuous-time version of a Markov chain.

Three-state Markov process of weather:
o Assume that on any given day t, the weather is observed as being in one of the following states:
  • State 1: rain
  • State 2: cloudy
  • State 3: sunny
o The transition probabilities between states are described by the transition matrix A.

Fig. A Markov model with 3 states and state transition probabilities.

Page 9:

Three-state Markov Process of Weather

For example, suppose on day t = 1 the weather is sunny, i.e., in state 3, and we want to find the probability that the weather for the next 7 days will be sunny, sunny, rain, rain, sunny, cloudy, sunny. That is, including the known first day, we want the probability of the observation sequence O = {S3, S3, S3, S1, S1, S3, S2, S3} given the model.
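The slide's transition matrix appears only in the figure, so the values below are an assumption: the illustrative three-state weather matrix from Rabiner's classic HMM tutorial. The sketch computes the probability of the sequence by multiplying the one-step transition probabilities along the chain.

```python
import numpy as np

# Assumed 3-state weather transition matrix (rows/cols: 1=rain, 2=cloudy, 3=sunny);
# the slide's actual figure values are not reproduced in the text.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# Observation sequence O = S3,S3,S3,S1,S1,S3,S2,S3 (0-indexed states: 2,2,2,0,0,2,1,2).
O = [2, 2, 2, 0, 0, 2, 1, 2]

# Day 1 is known to be sunny, so P(q1 = S3) = 1; the remaining factors are
# the one-step transition probabilities along the chain.
p = 1.0
for prev, curr in zip(O[:-1], O[1:]):
    p *= A[prev, curr]

print(f"P(O | model) = {p:.4e}")   # about 1.54e-04 with the assumed matrix
```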

Page 10:

Hidden Markov Model

• An extension of the Markov chain in which the observation is a probabilistic function of the state.

• There is an underlying stochastic process that is not observable (hidden); it can only be observed through another set of stochastic processes that produce the sequence of observations.

Fig. A Markov chain is a sequence of observable states q1, q2, …, qn linked by transition probabilities A; in a hidden Markov model the state sequence q1, …, qn is hidden and emits an observable sequence s1, …, sn through emission probabilities B, with initial distribution π.

Page 11:

Hidden Markov Model

• Assume that we were kidnapped and kept in a locked room for several days. We cannot observe the weather directly; the only evidence we have is whether the kidnapper who brings our food is carrying an umbrella.

• Assume the probability of seeing an umbrella is 0.8 when it rains, 0.3 when it is cloudy, and 0.1 when it is sunny.

• Before we were kidnapped, we could observe the weather, and the weather followed the Markov process shown earlier.

Page 12:

Hidden Markov Model

• Now the actual weather is hidden, and each weather state has some probability of our seeing an umbrella (u = True or False).

o For example, on the day we were kidnapped, it was sunny.

o The next day, when the kidnapper brought food, he also carried an umbrella into the room.

o Assuming the prior probability of carrying an umbrella on any day is 0.5, we can find the probability of rain on the second day by the following steps.

Page 13:

Hidden Markov Model

State 1: rain, State 2: cloudy, State 3: sunny. A sketch of the day-2 posterior calculation follows.
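A minimal sketch of the calculation outlined above, following Bayes' rule with the umbrella likelihoods from the slides (0.8, 0.3, 0.1), the stated 0.5 prior for carrying an umbrella, and the same assumed transition matrix as before, so the numeric result is illustrative only.

```python
import numpy as np

# Same assumed transition matrix as in the earlier sketch.
A = np.array([[0.4, 0.3, 0.3],   # rain   -> rain, cloudy, sunny
              [0.2, 0.6, 0.2],   # cloudy -> ...
              [0.1, 0.1, 0.8]])  # sunny  -> ...

# Emission probabilities from the slides: P(umbrella | state) for rain, cloudy, sunny.
p_umbrella_given_state = np.array([0.8, 0.3, 0.1])

# Day 1 was sunny, so the distribution over day-2 weather is the sunny row of A.
p_day2 = A[2]

# Prior probability of seeing an umbrella on any day (stated on the slide).
p_umbrella = 0.5

# Bayes' rule, as outlined on the slide:
# P(rain_2 | umbrella_2) = P(umbrella | rain) * P(rain_2) / P(umbrella)
p_rain_given_umbrella = p_umbrella_given_state[0] * p_day2[0] / p_umbrella
print(f"P(rain on day 2 | umbrella) = {p_rain_given_umbrella:.3f}")  # 0.160 with the assumed A
```

Normalizing over all three weather states instead of using the fixed 0.5 umbrella prior would give a somewhat different posterior; the slide's stated prior is used here to match its outline.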

Page 14:

Hidden Markov Model

• The previous example depicts how to construct a weather model as an HMM.

• Some difficulties arise in the modeling procedure, such as:
o finding the number of states (the model size),
o how to choose the model parameters (such as the transition probabilities), and
o the size of the observation sequence.

Page 15:

Three basic problems for HMMs

• Problem 1 (Evaluation problem): Given an observation sequence O = {O1, O2, …, OT} and a model λ = (A, B, π), how do we efficiently compute P(O | λ), the likelihood of the observation sequence given the model?

o The solution is given by the Forward and Backward procedures.

• Problem 2 (Decoding problem): Given an observation sequence O = {O1, O2, …, OT} and a model λ = (A, B, π), how do we choose a corresponding state sequence Q = {q1, q2, …, qT} that is optimal, i.e., that best explains the observations?

o The solution is provided by the Viterbi algorithm.

• Problem 3 (Learning problem): How do we adjust the model parameters λ = (A, B, π) to maximize the likelihood P(O | λ)?

o The solution is given by the Baum-Welch re-estimation procedure.
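A compact sketch of Problems 1 and 2 on toy numbers: the forward procedure for the evaluation problem and the Viterbi algorithm for the decoding problem (Baum-Welch is omitted for brevity). It reuses the illustrative weather/umbrella setup; the initial distribution and the no-umbrella column of B are assumptions added to make the example self-contained.

```python
import numpy as np

A  = np.array([[0.4, 0.3, 0.3],
               [0.2, 0.6, 0.2],
               [0.1, 0.1, 0.8]])          # transition probabilities (assumed)
B  = np.array([[0.8, 0.2],
               [0.3, 0.7],
               [0.1, 0.9]])               # P(obs | state); columns: umbrella, no umbrella
pi = np.array([1/3, 1/3, 1/3])            # assumed uniform initial distribution

O = [0, 0, 1, 0]                          # observed: umbrella, umbrella, none, umbrella

def forward(O, A, B, pi):
    """Problem 1: compute P(O | lambda) with the forward procedure."""
    alpha = pi * B[:, O[0]]
    for o in O[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def viterbi(O, A, B, pi):
    """Problem 2: most likely hidden state sequence (Viterbi algorithm)."""
    T, N = len(O), len(pi)
    delta = np.log(pi) + np.log(B[:, O[0]])
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)      # scores[i, j]: best path ending in j via i
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, O[t]])
    # Backtrack the best path from the final time step.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

print("P(O | lambda) =", forward(O, A, B, pi))
print("Most likely states:", viterbi(O, A, B, pi))  # 0=rain, 1=cloudy, 2=sunny
```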

Page 16:

Keystroke dynamics studies using HMM

| Study | Participants | Samples per user (training / testing) | Features | Input method | Classifier | EER (%) |
| Chen and Chang, 2004 [11]; Chang, 2005 [12] | 20 | 20 / 200 | DT, FT | User-fixed text, one word | Discrete HMM | NA |
| Rodrigues et al., 2005 [13] | 20 | 40 / 30 | DT, FT, ASCII key code | User-fixed 8-digit number | Continuous HMM | 3.6 |
| Vuyyuru et al., 2006 [14] | 43 | 9 / 20 | DT | Fixed text: "master of science in computer science" | HMM | 3.04 |
| Jiang et al., 2007 [15] | 315 (training: 58, testing: 257) | 15 / 13 | DT, FT, n-graph | User-fixed text, minimum 9 characters | HMM, Gaussian | 2.54 |
| Zhang et al., 2010 [16] | 12 | 20 / 40 | DT, FT | User-fixed text, minimum 10 characters | HMM | 2.00 |
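In the keystroke-dynamics literature, the DT and FT features listed above usually denote dwell time (how long a key is held down) and flight time (the gap between consecutive keys); assuming that reading, the sketch below shows one way such features could be extracted from raw press/release timestamps before being modeled with an HMM. The event format and numbers are made up for illustration and are not taken from the cited studies.

```python
# Hypothetical keystroke events: (key, press_time_ms, release_time_ms).
events = [
    ("m", 0,   95),
    ("a", 140, 230),
    ("s", 290, 370),
    ("t", 430, 520),
]

# Dwell time (DT): how long each key is held down.
dwell_times = [release - press for _, press, release in events]

# Flight time (FT): here, previous key release to next key press (one common definition).
flight_times = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]

print("DT (ms):", dwell_times)    # [95, 90, 80, 90]
print("FT (ms):", flight_times)   # [45, 60, 60]
```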

Page 17:

Thank You

?