Developments:
•Estimating the Order (Number of Hidden States) of a Hidden Markov Model
•Application of Decision Tree to HMM
A Hidden Markov Model consists of
1. A sequence of states {Xt | t ∈ T} = {X1, X2, ..., XT}, and
2. A sequence of observations {Yt | t ∈ T} = {Y1, Y2, ..., YT}
Some basic problems, given the observations {Y1, Y2, ..., YT}:
1. Determine the sequence of states {X1, X2, ..., XT} (a Viterbi sketch follows this list).
2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.
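Problem 1 is typically solved with the Viterbi algorithm (the Viterbi-labeled states reappear later in this presentation). A minimal Python sketch for a discrete-emission HMM; the parameter names pi, A, B and the example numbers are illustrative, not from the slides:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state sequence for a discrete-emission HMM.

    obs : sequence of observation indices (length T)
    pi  : initial state probabilities, shape (N,)
    A   : transition probabilities, A[i, j] = P(X_{t+1}=j | X_t=i), shape (N, N)
    B   : emission probabilities, B[i, k] = P(Y_t=k | X_t=i), shape (N, K)
    """
    N, T = len(pi), len(obs)
    log_delta = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob of paths ending in each state
    back = np.zeros((T, N), dtype=int)              # backpointers

    for t in range(1, T):
        scores = log_delta[:, None] + np.log(A)     # scores[i, j]: come from state i, move to state j
        back[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(B[:, obs[t]])

    # backtrack from the best final state
    states = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        states.append(int(back[t][states[-1]]))
    return states[::-1]

# illustrative two-state example (numbers are placeholders, not the MS model)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2, 0], pi, A, B))
```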
Estimating the Order (Number of Hidden States) of a Hidden Markov Model
Finite mixture models
A finite mixture model takes the form
F(y) = Σ_{j=1}^{m} αj f(y, θj)
Example: Poisson mixture model with m=3 components
The density function of the Poisson mixture model:
F(y) = α1 f(y, λ1) + α2 f(y, λ2) + α3 f(y, λ3)
     = α1 λ1^y e^(−λ1)/y! + α2 λ2^y e^(−λ2)/y! + α3 λ3^y e^(−λ3)/y!
i.e. a mixture of Poi(λ1), Poi(λ2) and Poi(λ3) components with weights α1, α2 and α3.
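As a quick illustration of this formula, a short Python sketch that evaluates the three-component Poisson mixture density; the weights and means below are placeholders, not estimates from the slides:

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """Poisson probability mass function: lam^y * e^{-lam} / y!"""
    return lam**y * exp(-lam) / factorial(y)

def mixture_pmf(y, alphas, lams):
    """F(y) = sum_j alpha_j * Poisson(y; lambda_j)"""
    return sum(a * poisson_pmf(y, l) for a, l in zip(alphas, lams))

# illustrative parameters, not from the slides
alphas = [0.2, 0.5, 0.3]        # mixing weights, must sum to 1
lams = [1.0, 4.0, 9.0]          # component means
print([round(mixture_pmf(y, alphas, lams), 4) for y in range(5)])
```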
Estimation of the number of components of a finite mixture model
•AIC - Akaike Information Criterion: choose the m maximizing lm − dm
•BIC - Bayesian Information Criterion: choose the m maximizing lm − (dm/2) log n
Most commonly used, but not justified theoretically.
lm - the log-likelihood with m components
dm - the number of free parameters in the model
m - the number of components
n - the sample size
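Written this way, both criteria reduce to picking the m with the largest penalized log-likelihood. A minimal sketch, assuming the log-likelihoods lm and parameter counts dm have already been obtained for each candidate m (the numbers below are placeholders):

```python
import math

def select_order(loglik, n_params, n, criterion="BIC"):
    """Return the m maximizing the penalized log-likelihood.

    loglik   : dict m -> l_m (maximized log-likelihood with m components)
    n_params : dict m -> d_m (number of free parameters)
    n        : sample size
    """
    def score(m):
        if criterion == "AIC":
            return loglik[m] - n_params[m]
        return loglik[m] - 0.5 * n_params[m] * math.log(n)   # BIC
    return max(loglik, key=score)

# placeholder values, for illustration only (d_m = 2m - 1 for a Poisson mixture)
loglik = {1: -210.4, 2: -188.7, 3: -186.9}
n_params = {1: 1, 2: 3, 3: 5}
print(select_order(loglik, n_params, n=90, criterion="BIC"))
```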
Solution
Penalized likelihood methods - only for a finite number of states
•Penalized minimum distance method (Chen & Kalbfleisch, 1996)
•Gives a consistent estimate of the number of components in a finite mixture model
Chen & Kalbfleisch Idea
The stationary HMMs form a class of finite mixture models with a Markovian property.
This leads to the penalized minimum distance method to estimate the number of hidden states in an HMM (MacKay, 2002).
Penalized Distance
Let {F(x, θ), θ ∈ Θ} be a family of density functions and let G(θ) be a finite distribution function on Θ. Then the density function of the finite mixture model is
F(x, G) = Σ_{j=1}^{k} pj F(x, θj)
The mixing distribution is
G(θ) = Σ_{j=1}^{k} pj I(θj ≤ θ)
The penalized distance is calculated in the following way:
D(Fn, F(x, G)) = d(Fn, F(x, G)) + Cn Σ_{j=1}^{k} log(1/pj)
where the first term is the distance measure and the second term is the penalty.
Cn - a sequence of positive constants; Chen & Kalbfleisch used Cn = 0.01 n^(−1/2) log n, where n is the number of observations.
The penalty penalizes the overfitting of subpopulations that have an estimated probability close to zero or that differ only very slightly.
The empirical distribution function is
Fn(x) = (1/n) Σ_{i=1}^{n} I(Xi ≤ x)
Different distance measures d(F1, F2) can be used:
•The Kolmogorov-Smirnov distance
•The Cramér-von Mises distance
•The Kullback-Leibler distance
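To make the recipe concrete, a rough Python sketch of the penalized distance for a candidate Poisson mixture using the Kolmogorov-Smirnov distance; the candidate weights and means are illustrative, and in practice D would be minimized over (pj, λj) for each candidate order before comparing orders:

```python
import numpy as np
from math import exp, factorial, log

def poisson_cdf(x, lam):
    """CDF of a Poisson(lam) distribution at integer x."""
    return sum(lam**i * exp(-lam) / factorial(i) for i in range(int(x) + 1))

def penalized_distance(data, probs, lams):
    """d_KS(F_n, F(., G)) + C_n * sum_j log(1/p_j) for a Poisson mixture candidate."""
    n = len(data)
    C_n = 0.01 * n**-0.5 * log(n)                       # constant used by Chen & Kalbfleisch
    xs = np.arange(0, max(data) + 1)                    # evaluate on the observed support
    F_n = np.array([np.mean(np.asarray(data) <= x) for x in xs])   # empirical CDF
    F_G = np.array([sum(p * poisson_cdf(x, l) for p, l in zip(probs, lams)) for x in xs])
    ks = np.max(np.abs(F_n - F_G))                      # Kolmogorov-Smirnov distance
    penalty = C_n * sum(log(1.0 / p) for p in probs)
    return ks + penalty

# illustrative two-component candidate (not a fitted model); counts taken from
# the simulated lesion data shown later in the talk
data = [4, 3, 4, 7, 1, 1, 0, 1, 3, 2, 1, 4, 2, 0, 2, 1, 2, 3, 1, 4]
print(penalized_distance(data, probs=[0.5, 0.5], lams=[2.5, 6.3]))
```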
Application to Multiple Sclerosis Lesion Count Data
Patients afflicted with relapsing-remitting multiple sclerosis (MS) experience lesions on the brain stem, with symptoms typically worsening and improving in a somewhat cyclic fashion.
•It is reasonable to assume that the distribution of the lesion counts depends on the patient's underlying disease activity.
•The sequence of disease states is hidden.
•Three patients, each of whom has monthly MRI scans for a period of 30 months.
Proposed model:
Yit|Zit ~ Poisson (μ0Zit)
Yit – the number of lesions observed on patient i at time t
Zit – the associated disease state (unobserved)
μ0Zit - the distinct Poisson means, one for each disease state
Results: Penalized minimum –distances for different numbers of hidden states
Number of states   Estimated Poisson means        Minimum distance
1                  4.03                           0.1306
2                  2.48, 6.25                     0.0608
3                  2.77, 2.62, 7.10               0.0639
4                  2.05, 2.96, 3.53, 7.75         0.0774
5                  1.83, 3.21, 3.40, 3.58, 8.35   0.0959
The penalized distance is minimized with two hidden states.
Estimates of the parameters of the hidden process
Initial probabilities:
π̂0 = [0.594, 0.406]
Transition probability matrix:
P̂0 = [0.619  0.381
      0.558  0.442]
The performance of the penalized minimum distance method
•Number of components
•Sample size
•Separation of components
•Proportion of time in each state
1. Application of Decision Tree to HMM
[Diagram: the observed data sequence ..., Ot-1, Ot, Ot+1, ... is Viterbi-labeled with states, and a decision tree is trained on the labeled data to produce the output probabilities Pr(Lj, qt = si) for each leaf Lj.]
The Simulated Hidden Markov model for the Multiple Sclerosis Lesion Count Data (Laverty et al., 2002)
Initial probability matrix:
          State 1   State 2
          0.594     0.406
Transition probability matrix:
          State 1   State 2
State 1   0.619     0.381
State 2   0.558     0.442
Mean vector:
          State 1   State 2
          2.48      6.25
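Laverty et al. (2002) simulated this model in Excel; the same simulation is easy to sketch in Python (the function and variable names below are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

pi = np.array([0.594, 0.406])                 # initial probabilities
P = np.array([[0.619, 0.381],
              [0.558, 0.442]])                # transition probabilities
mu = np.array([2.48, 6.25])                   # Poisson mean for each state

def simulate_hmm(T=30):
    """Simulate T months of (lesion count, hidden state) from the two-state model."""
    states, counts = [], []
    s = rng.choice(2, p=pi)                   # draw the initial state
    for _ in range(T):
        states.append(int(s) + 1)             # label the states 1 and 2
        counts.append(int(rng.poisson(mu[s])))   # Poisson emission given the state
        s = rng.choice(2, p=P[s])             # move to the next state
    return counts, states

counts, states = simulate_hmm()
print(list(zip(counts, states)))
```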
Simulated data (each pair of columns gives the number of lesions and the hidden state):
Lesions  State    Lesions  State    Lesions  State
4        2        1        1        3        2
3        2        4        2        4        2
4        2        2        2        7        2
7        2        0        1        0        1
1        1        2        2        5        2
1        1        1        1        3        2
0        1        2        1        4        2
1        1        3        2        6        2
3        1        1        1        4        2
2        1        4        2        1        2
How this works: tree construction
Greedy Tree Construction Algorithm
Step 0: Start with all labeled data.
Step 1: While the stopping condition is unmet, do:
Step 2: Find the best split threshold over all thresholds and dimensions.
Step 3: Send the data to the left or right child depending on the threshold test.
Step 4: Recursively repeat steps 1-4 for the left and right children.
The three rules characterize a tree-growing strategy (a sketch combining them follows this list):
A splitting rule: that determines where the decision threshold is placed, given the data in a node.
A stopping rule: that determines when recursion ends. This is the rule that determines whether a node is a leaf node.
A labeling rule: that assigns some values or class label to every leaf node. For the tree considered here, leaves will be associated (labeled) with the state-conditional output probabilities used in the HMM.
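A compact Python sketch of the greedy algorithm for one-dimensional observations (e.g. lesion counts) labeled with Viterbi states. The stopping, splitting, and labeling rules here (pure or small node, best information-gain threshold, class frequencies at the leaf) are simplified placeholders, not the exact rules of Foote (1993):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a multiset of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(data, min_size=2):
    """Greedy tree construction. data: list of (observation, Viterbi state) pairs."""
    labels = [s for _, s in data]
    # Stopping rule: the node is pure or too small -> make it a leaf.
    if len(set(labels)) == 1 or len(data) <= min_size:
        # Labeling rule: store the class frequencies in the leaf.
        return {"leaf": {s: c / len(labels) for s, c in Counter(labels).items()}}
    # Splitting rule: pick the threshold with the highest information gain.
    best_gain, best_thr = -1.0, None
    for thr in sorted({x for x, _ in data}):
        left = [s for x, s in data if x <= thr]
        right = [s for x, s in data if x > thr]
        if not left or not right:
            continue
        gain = entropy(labels) \
            - (len(left) / len(labels)) * entropy(left) \
            - (len(right) / len(labels)) * entropy(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    if best_thr is None:                      # no usable split: fall back to a leaf
        return {"leaf": {s: c / len(labels) for s, c in Counter(labels).items()}}
    return {"threshold": best_thr,
            "left": build_tree([(x, s) for x, s in data if x <= best_thr], min_size),
            "right": build_tree([(x, s) for x, s in data if x > best_thr], min_size)}

# Example: lesion counts labeled with states, as in the simulated data above.
pairs = [(4, 2), (3, 2), (7, 2), (1, 1), (0, 1), (2, 1), (6, 2), (1, 1)]
print(build_tree(pairs))
```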
Splitting Rules
Entropy criterion: the attribute with the highest information gain is selected for the split.
The entropy of the set T (in bits) is
Info(T) = − Σ_{i=1}^{m} (freq(Ci, T)/|T|) log2(freq(Ci, T)/|T|)
where freq(Ci, T) is the number of cases in T belonging to class Ci and |T| is the size of T.
After T is partitioned by a test on attribute X into subsets T1, ..., Tk:
Info_X(T) = Σ_{i=1}^{k} (|Ti|/|T|) Info(Ti)
Gain(X) = Info(T) − Info_X(T)
GINI criterion: the split with the smallest value of the GINI index is selected.
The GINI criterion for a split is calculated by the following formula:
G(L) = Σ_{l=1}^{L} (Nl/N) (1 − Σ_{w=1}^{K} (Nwl/Nl)^2)
where N is the number of observations in the initial node, Nwl is the number of observations of the wth class in the lth new node, and Nl is the number of observations in the lth new node.
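A small Python sketch of this quantity; the node contents below are illustrative:

```python
def gini_split(nodes):
    """G(L) for a candidate split.

    nodes: list of child nodes, each a list of class labels,
           so N_l = len(node) and N_wl = count of class w in node l.
    """
    N = sum(len(node) for node in nodes)
    g = 0.0
    for node in nodes:
        N_l = len(node)
        purity = sum((node.count(w) / N_l) ** 2 for w in set(node))
        g += (N_l / N) * (1.0 - purity)
    return g

# illustrative split of state labels into two child nodes
left, right = [1, 1, 1, 2], [2, 2, 2, 2, 1]
print(round(gini_split([left, right]), 4))
```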
Decision Tree classification of States
(each triple of columns gives the number of lesions, the simulated state, and the state according to the decision tree classification)
Lesions  State  Tree    Lesions  State  Tree    Lesions  State  Tree
4        2      2       1        1      1       3        2      2
3        2      2       4        2      2       4        2      2
4        2      2       2        2      1       7        2      2
7        2      2       0        1      1       0        1      1
1        1      1       2        2      1       5        2      2
1        1      1       1        1      1       3        2      2
0        1      1       2        1      1       4        2      2
1        1      1       3        2      2       6        2      2
3        1      1       1        1      1       4        2      2
2        1      1       4        2      2       1        2      1
Given the state, the state-conditional output probability at time t for state Si is Pr(Ot | qt = Si).
From the tree we can estimate the probability that a given state emitted a certain observation.
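One simple way to obtain such estimates is from the relative frequencies of (leaf, state) pairs in the Viterbi-labeled training data; a rough sketch, with hypothetical leaf identifiers:

```python
from collections import defaultdict

def leaf_state_probs(leaf_assignments):
    """Estimate Pr(leaf L_j | state S_i) from Viterbi-labeled training data.

    leaf_assignments: list of (leaf_id, state) pairs, one per training observation.
    """
    joint = defaultdict(int)        # counts of (leaf, state)
    state_total = defaultdict(int)  # counts of each state
    for leaf, state in leaf_assignments:
        joint[(leaf, state)] += 1
        state_total[state] += 1
    return {(leaf, state): joint[(leaf, state)] / state_total[state]
            for (leaf, state) in joint}

# illustrative data: 3 leaves, 2 states
pairs = [("L1", 1), ("L1", 1), ("L2", 1), ("L2", 2), ("L3", 2), ("L3", 2)]
probs = leaf_state_probs(pairs)
print(probs[("L3", 2)])   # Pr(leaf L3 | state 2) = 2/3
```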
2. Application of Decision Tree to HMM
[Diagram: the observed data sequence ..., Ot-1, Ot, Ot+1, ... is fed to a decision tree, which selects the simplest possible model for the given data.]
Decision Tree
The splitting criterion can depend on several things:
•Type of observed data (independent / autoregressive)
•Type of transition probabilities (balanced / unbalanced among the states)
•Separation of components (well separated or close together)
[Diagram: a decision tree over these factors. The first split is on the observed data, independent vs. autoregressive, assessed with a Durbin-Watson test; later splits distinguish balanced vs. unbalanced transition probabilities and well separated (S) vs. close together (C) components.]
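The independent-vs-autoregressive branch in the diagram relies on the Durbin-Watson statistic. A minimal sketch, applied here directly to the mean-centred observation sequence rather than to regression residuals:

```python
def durbin_watson(series):
    """Durbin-Watson statistic d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.

    Values near 2 suggest no first-order autocorrelation.
    """
    mean = sum(series) / len(series)
    e = [x - mean for x in series]   # deviations from the mean stand in for residuals here
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den

counts = [4, 3, 4, 7, 1, 1, 0, 1, 3, 2]   # first ten simulated lesion counts above
print(round(durbin_watson(counts), 3))
```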
Advantages of Decision Tree
•Trees can handle high-dimensional spaces gracefully.
•Because of the hierarchical nature, finding a tree-based output probability given the output is extremely fast.
•Trees can cope with categorical as well as continuous data.
Disadvantages of Decision Tree
•The set of class boundaries is relatively inelegant (rough).
•A decision tree model is non-parametric and has many more free parameters than a parametric model of similar power. It therefore requires more storage, and a large amount of training data is needed to obtain good estimates.
References:
Foote, J.T., Decision-Tree Probability Modeling for HMM Speech Recognition, Ph.D. Thesis, Division of Engineering, Brown University, RI, USA, 1993.
Kantardzic, M., Data Mining: Concepts, Models, Methods and Algorithms, Wiley, New York; Chichester, 2003.
Laverty, W.H., Miket, M.J. and Kelly, I.W., Simulation of Hidden Markov Models with Excel, The Statistician, Vol. 51, Part 1, pp. 31-40, 2002.
MacKay, R.J., Estimating the Order of a Hidden Markov Model, The Canadian Journal of Statistics, Vol. 30, pp. 573-589, 2002.