Maximum Likelihood Estimation of Mixture Densities for Binned and Truncated Multivariate Data
Igor V. Cadez, Padhraic Smyth, Geoff J. McLachlan, Christine E. McLaren,
Machine Learning 2001 (to appear)
O, Jangmin
2001/06/01
Introduction (1)
Fitting mixture models to binned and truncated data by maximum likelihood (ML) via the EM algorithm.
Binning: measurements with finite resolution quantize real-valued variables.
Truncation
Motivation: diagnostic evaluation of anemia; red blood cell (RBC) volume and amount of hemoglobin, measured by a cytometric blood cell counter (Bayer Corp.)
Figure 1
Introduction (2)
Data in the form of histograms: computer vision, massive data sets, …
Binning: limited measurement precision
Truncation: limitation of the range of measurement, intentional or otherwise, …
EM framework: the missing data are the original (unbinned) data points.
Binned and Truncated Data
The sample space H is partitioned into v mutually exclusive regions $H_r$ (r = 1, …, v).
Observation: only the number $n_r$ of the $Y_j$ that fall in $H_r$ (r = 1, …, $v_0$) is recorded ($v_0 \le v$).
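Observed data vector:
$$ a = (n_1, \dots, n_{v_0})^T, \qquad n = \sum_{r=1}^{v_0} n_r $$
a has a multinomial distribution with cell probabilities $P_r(\Psi)/P(\Psi)$, where
$$ P_r(\Psi) = \Pr\{Y_j \in H_r\} = \int_{H_r} f(y; \Psi)\,dy, \qquad P(\Psi) = \sum_{r=1}^{v_0} P_r(\Psi) $$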
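As a minimal sketch, assuming a univariate normal mixture (in one dimension $P_r(\Psi)$ reduces to differences of the normal CDF; the multivariate bins of the slides need numerical integration instead), the bin probabilities and the observed-region mass $P(\Psi)$ can be computed as follows. The function names are hypothetical.

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probabilities(edges, weights, means, sigmas):
    """P_r(Psi) for each interval [edges[r], edges[r+1]) under a 1-D normal mixture."""
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        probs.append(sum(w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
                         for w, m, s in zip(weights, means, sigmas)))
    return probs

# Two observed bins H_1 = [-1, 0) and H_2 = [0, 1); everything else is truncated.
P_r = bin_probabilities([-1.0, 0.0, 1.0], [1.0], [0.0], [1.0])
P = sum(P_r)  # P(Psi): probability of landing in an observed bin
```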
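Observed log likelihood:
$$ L(\Psi) = p(a; \Psi) = \frac{n!}{\prod_{r=1}^{v_0} n_r!} \prod_{r=1}^{v_0} \left\{ \frac{P_r(\Psi)}{P(\Psi)} \right\}^{n_r} $$
$$ \log L(\Psi) = \sum_{r=1}^{v_0} n_r \log\{P_r(\Psi)/P(\Psi)\} + C_1, \qquad C_1 = \log\Big\{ n! \Big/ \prod_{r=1}^{v_0} n_r! \Big\} $$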
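A short sketch of the observed log likelihood, assuming the bin probabilities $P_r(\Psi)$ of the observed bins are already available (the function name is hypothetical). The constant $C_1$ is computed with `math.lgamma`, since $\log k! = \mathrm{lgamma}(k+1)$.

```python
import math

def observed_log_likelihood(counts, probs):
    """log L(Psi) = sum_r n_r log{P_r/P} + C_1 over the v0 observed bins.

    counts: observed frequencies n_1..n_{v0}; probs: P_1(Psi)..P_{v0}(Psi),
    which need not sum to 1 when part of the sample space is truncated.
    """
    n = sum(counts)
    P = sum(probs)
    # C_1 = log{n! / prod_r n_r!}, using lgamma(k + 1) = log k!
    c1 = math.lgamma(n + 1) - sum(math.lgamma(nr + 1) for nr in counts)
    return c1 + sum(nr * math.log(pr / P) for nr, pr in zip(counts, probs) if nr > 0)
```

Rescaling all $P_r$ by a common factor leaves $\log L$ unchanged; this is how the truncated probability mass drops out of the observed likelihood.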
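Application of EM Algorithm: Missing Data
Unobservable frequencies in the case of truncation: $u = (n_{v_0+1}, \dots, n_v)^T$.
Unobservable individual data: the $n_r$ points $y_{r1}, \dots, y_{rn_r}$ falling in the r-th region $H_r$.
Complete data vector:
$$ y_r = (y_{r1}^T, \dots, y_{rn_r}^T)^T \quad (r = 1, \dots, v), \qquad x = (a^T, u^T, y_1^T, \dots, y_v^T)^T $$
$$ p(x; \Psi) = p(a; \Psi)\, p(u \mid a; \Psi)\, p(y_1, \dots, y_v \mid u, a; \Psi) $$
Up to an additive constant,
$$ \log L_c(\Psi) = \sum_{r=1}^{v} \sum_{s=1}^{n_r} \log f(y_{rs}; \Psi) $$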
p(a; Ψ) is specified (the multinomial distribution of a).
p(u | a; Ψ) can be specified… (negative binomial?)
p(y_1, …, y_v | u, a; Ψ) is specified: conditional on u and a, each y_r consists of n_r independent draws from the density f(y; Ψ)/P_r(Ψ) restricted to H_r.
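$$ p(u \mid a; \Psi) = C_2 \{P(\Psi)\}^{n} \prod_{r=v_0+1}^{v} \{P_r(\Psi)\}^{n_r}, \qquad C_2 = \frac{\big(n + \sum_{r=v_0+1}^{v} n_r - 1\big)!}{(n-1)! \prod_{r=v_0+1}^{v} n_r!} $$
$$ p(a; \Psi) = \frac{n!}{\prod_{r=1}^{v_0} n_r!} \prod_{r=1}^{v_0} \left\{ \frac{P_r(\Psi)}{P(\Psi)} \right\}^{n_r} $$
Density of $Y_j$ given $Y_j \in H_r$: $f(y_j; \Psi)/P_r(\Psi)$, so
$$ p(y_1, \dots, y_v \mid u, a; \Psi) = \prod_{r=1}^{v} \prod_{s=1}^{n_r} f(y_{rs}; \Psi)/P_r(\Psi) $$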
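To see that $p(u \mid a; \Psi)$ is a proper (negative binomial) distribution, here is a minimal sketch that evaluates its logarithm with `math.lgamma` and checks, for a single hypothetical truncated region, that the probabilities sum to 1 and that the mean of $u$ is $n P_{v_0+1}(\Psi)/P(\Psi)$. The function name and the numbers are illustrative assumptions.

```python
import math

def log_p_u_given_a(n, trunc_counts, P, trunc_probs):
    """log p(u | a; Psi) = log C_2 + n log P + sum_r n_r log P_r, r = v0+1..v.

    n: number of observed points; trunc_counts: u = (n_{v0+1}, ..., n_v);
    P: P(Psi); trunc_probs: P_{v0+1}(Psi), ..., P_v(Psi).
    """
    m = sum(trunc_counts)
    # C_2 = (n + m - 1)! / {(n - 1)! prod_r n_r!}, using lgamma(k) = log (k - 1)!
    log_c2 = (math.lgamma(n + m) - math.lgamma(n)
              - sum(math.lgamma(k + 1) for k in trunc_counts))
    return (log_c2 + n * math.log(P)
            + sum(k * math.log(p) for k, p in zip(trunc_counts, trunc_probs) if k > 0))

# One truncated region: P = 0.7, P_{v0+1} = 0.3, n = 5 observed points.
pmf = [math.exp(log_p_u_given_a(5, [m], 0.7, [0.3])) for m in range(400)]
total_prob = sum(pmf)                           # close to 1
mean_u = sum(m * p for m, p in enumerate(pmf))  # close to 5 * 0.3 / 0.7
```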
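Application of EM Algorithm: Missing Data
Then, the complete-data log-likelihood is
$$ \begin{aligned}
\log L_c(\Psi) &= \log p(a; \Psi) + \log p(u \mid a; \Psi) + \log p(y_1, \dots, y_v \mid u, a; \Psi) \\
&= \sum_{r=1}^{v_0} n_r \log\{P_r(\Psi)/P(\Psi)\} + C_1 + \log p(u \mid a; \Psi) + \sum_{r=1}^{v} \sum_{s=1}^{n_r} \log\{f(y_{rs}; \Psi)/P_r(\Psi)\} \\
&= \sum_{r=1}^{v} \sum_{s=1}^{n_r} \log f(y_{rs}; \Psi) + \log p(u \mid a; \Psi) - \log\Big[\{P(\Psi)\}^{n} \prod_{r=v_0+1}^{v} \{P_r(\Psi)\}^{n_r}\Big] + C_1 \\
&= \sum_{r=1}^{v} \sum_{s=1}^{n_r} \log f(y_{rs}; \Psi) + C_1 + \log C_2
\end{aligned} $$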
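Application of EM Algorithm: Mixture Model
Extension to a mixture model with g components, $f(y; \Psi) = \sum_{i=1}^{g} \pi_i f_i(y; \theta_i)$.
Zero-one indicator variables: $z_{rs} = (z_{1rs}, \dots, z_{grs})^T$ (r = 1, …, v; s = 1, …, $n_r$), where $z_{irs} = 1$ if $y_{rs}$ comes from the i-th component.
Conditional probability that $Y_{rs}$ belongs to the i-th component given $y_{rs}$:
$$ \tau_i(y_{rs}; \Psi) = \Pr\{Z_{irs} = 1 \mid y_{rs}\} = \pi_i f_i(y_{rs}; \theta_i) / f(y_{rs}; \Psi) $$
Final complete-data log-likelihood:
$$ \log L_c(\Psi) = \sum_{i=1}^{g} \sum_{r=1}^{v} \sum_{s=1}^{n_r} z_{irs} \{\log \pi_i + \log f_i(y_{rs}; \theta_i)\} $$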
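The conditional probabilities $\tau_i(y; \Psi)$ are the usual mixture responsibilities. A minimal sketch for univariate normal components (an assumption for illustration; the function names are hypothetical):

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def responsibilities(y, weights, means, sigmas):
    """tau_i(y; Psi) = pi_i f_i(y; theta_i) / f(y; Psi) for a 1-D normal mixture."""
    terms = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sigmas)]
    total = sum(terms)  # f(y; Psi)
    return [t / total for t in terms]
```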
E-Step
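Calculation of $Q(\Psi; \Psi^{(k)})$: expectation over $y_1, \dots, y_v$ and expectation over u.
Expectation of u given a …
$$ Q(\Psi; \Psi^{(k)}) = \sum_{i=1}^{g} \sum_{r=1}^{v} n_r^{(k)} E_{\Psi^{(k)}}\big[ \tau_i(Y_j; \Psi^{(k)}) \{\log \pi_i + \log f_i(Y_j; \theta_i)\} \mid Y_j \in H_r \big] $$
$$ n_r^{(k)} = n_r \ (r = 1, \dots, v_0), \qquad n_r^{(k)} = E_{\Psi^{(k)}}(n_r \mid a) \ (r = v_0+1, \dots, v) $$
$$ E_{\Psi^{(k)}}(n_r \mid a) = n P_r(\Psi^{(k)}) / P(\Psi^{(k)}) \quad (r = v_0+1, \dots, v) $$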
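The counts $n_r^{(k)}$ used in the E-step can be assembled as below: a minimal sketch in which the bin probabilities are assumed to be evaluated at the current estimate $\Psi^{(k)}$. The function name and numbers are hypothetical.

```python
def expected_counts(observed, observed_probs, trunc_probs):
    """n_r^(k) for r = 1..v: n_r if observed, E(n_r | a) = n P_r / P if truncated.

    observed_probs and trunc_probs are P_r(Psi^(k)) at the current estimate.
    """
    n = sum(observed)
    P = sum(observed_probs)  # P(Psi^(k)): mass of the observed region
    return list(observed) + [n * p / P for p in trunc_probs]

# Hypothetical numbers: two observed bins and one truncated region.
n_k = expected_counts([10, 20], [0.2, 0.4], [0.4])
```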
M-Step
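$\pi_i^{(k+1)}$ update:
$$ \pi_i^{(k+1)} = \sum_{r=1}^{v} n_r^{(k)} E_{\Psi^{(k)}}\{\tau_i(Y_j; \Psi^{(k)}) \mid Y_j \in H_r\} \Big/ \sum_{r=1}^{v} n_r^{(k)} $$
$\xi = (\theta_1, \dots, \theta_g)$: the other parameters are updated as a solution of
$$ \partial Q(\Psi; \Psi^{(k)}) / \partial \xi = 0, \quad \text{i.e.} \quad \sum_{i=1}^{g} \sum_{r=1}^{v} n_r^{(k)} E_{\Psi^{(k)}}\big[ \tau_i(Y_j; \Psi^{(k)})\, \partial \log f_i(Y_j; \theta_i)/\partial \xi \mid Y_j \in H_r \big] = 0 $$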
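For univariate normal components the conditional expectation in the $\pi_i$ update has a closed form, because $\tau_i(y) f(y) = \pi_i f_i(y)$ gives $E\{\tau_i(Y) \mid Y \in H_r\} = \pi_i P_{ir}/P_r$, with $P_{ir}$ the mass of component i in $H_r$. A minimal sketch under that assumption (hypothetical names; infinite bin edges stand in for open-ended bins):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); x = +/-inf is handled by math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def update_weights(edges, counts, weights, means, sigmas):
    """pi_i^(k+1) = sum_r n_r^(k) E[tau_i | H_r] / sum_r n_r^(k), 1-D normal components."""
    g = len(weights)
    totals = [0.0] * g
    for lo, hi, n_r in zip(edges[:-1], edges[1:], counts):
        # pi_i * P_ir: component i's weighted mass inside H_r
        p_ir = [weights[i] * (normal_cdf(hi, means[i], sigmas[i])
                              - normal_cdf(lo, means[i], sigmas[i])) for i in range(g)]
        p_r = sum(p_ir)
        for i in range(g):
            totals[i] += n_r * p_ir[i] / p_r
    return [t / sum(counts) for t in totals]

new_weights = update_weights([-math.inf, 0.0, math.inf], [30, 10],
                             [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])
```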
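M-Step for Normal Components
Parameter update equations:
Practical implementation is more complex because of the multidimensional integrals over the bins.
$$ C_i^{(k)} = \sum_{r=1}^{v} n_r^{(k)} E_{\Psi^{(k)}}\{\tau_i(Y_j; \Psi^{(k)}) \mid Y_j \in H_r\} $$
$$ \mu_i^{(k+1)} = \sum_{r=1}^{v} n_r^{(k)} E_{\Psi^{(k)}}\{Y_j\, \tau_i(Y_j; \Psi^{(k)}) \mid Y_j \in H_r\} \big/ C_i^{(k)} $$
$$ \Sigma_i^{(k+1)} = \sum_{r=1}^{v} n_r^{(k)} E_{\Psi^{(k)}}\{\tau_i(Y_j; \Psi^{(k)})\, (Y_j - \mu_i^{(k+1)})(Y_j - \mu_i^{(k+1)})^T \mid Y_j \in H_r\} \big/ C_i^{(k)} $$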
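As a minimal sketch of one full EM iteration with these updates, assuming univariate normal components (so $\Sigma_i$ is $\sigma_i^2$), no truncation ($v_0 = v$, with the bins covering essentially all of the mass), and a simple midpoint rule for the bin integrals in place of whatever integration scheme the authors used. All names are hypothetical; the example fits exact expected counts from a known mixture.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2); used only to build the example counts."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def em_step(edges, counts, weights, means, sigmas, sub=10):
    """One EM iteration for a 1-D normal mixture on binned data without truncation.

    Integrals over each bin H_r use a `sub`-point midpoint rule.
    """
    g = len(weights)
    c = [0.0] * g   # C_i^(k) = sum_r n_r E[tau_i(Y) | Y in H_r]
    s1 = [0.0] * g  # sum_r n_r E[tau_i(Y) Y | Y in H_r]
    s2 = [0.0] * g  # sum_r n_r E[tau_i(Y) Y^2 | Y in H_r]
    for lo, hi, n_r in zip(edges[:-1], edges[1:], counts):
        h = (hi - lo) / sub
        nodes = [lo + (j + 0.5) * h for j in range(sub)]
        # tau_i(y) f(y) = pi_i f_i(y), so E[tau_i(Y) Y^p | H_r] is the integral of
        # pi_i f_i(y) y^p over H_r divided by P_r, the integral of f over H_r.
        m0, m1, m2 = [0.0] * g, [0.0] * g, [0.0] * g
        for i in range(g):
            for y in nodes:
                w = weights[i] * normal_pdf(y, means[i], sigmas[i]) * h
                m0[i] += w
                m1[i] += w * y
                m2[i] += w * y * y
        p_r = sum(m0)
        if n_r == 0 or p_r == 0.0:
            continue
        for i in range(g):
            c[i] += n_r * m0[i] / p_r
            s1[i] += n_r * m1[i] / p_r
            s2[i] += n_r * m2[i] / p_r
    new_w = [ci / sum(counts) for ci in c]
    new_mu = [s1[i] / c[i] for i in range(g)]
    # E[tau_i (Y - mu_new)^2] / C_i = E[tau_i Y^2] / C_i - mu_new^2
    new_sigma = [math.sqrt(s2[i] / c[i] - new_mu[i] ** 2) for i in range(g)]
    return new_w, new_mu, new_sigma

# Hypothetical check: exact expected counts for 0.4 N(-2, 1) + 0.6 N(2, 1), N = 1000,
# in 32 bins of width 0.5 on [-8, 8].
edges = [-8.0 + 0.5 * k for k in range(33)]
true_params = ([0.4, 0.6], [-2.0, 2.0], [1.0, 1.0])
counts = [1000 * sum(w * (normal_cdf(b, m, s) - normal_cdf(a, m, s))
                     for w, m, s in zip(*true_params))
          for a, b in zip(edges[:-1], edges[1:])]
params = ([0.5, 0.5], [-1.0, 1.0], [1.5, 1.5])
for _ in range(200):
    params = em_step(edges, counts, *params)
```

Because the variance update integrates $(y - \mu)^2$ over each whole bin rather than using bin midpoints, the within-bin spread is accounted for and no grouping correction is needed.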
Computational and Numerical Issues
The integrals cannot be evaluated analytically.
m bins in the univariate case, O(m^d) bins in d dimensions.
O(i) function evaluations for a univariate integral, O(i^d) in d dimensions.
Complex bin geometry.
For a fixed sample size, the multivariate histogram becomes sparser.
Integration methods: numerical, Monte Carlo.
Romberg integration: the idea is repeated one-dimensional integration.
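A minimal sketch of Romberg integration over a rectangular 2-D bin by repeated one-dimensional integration: the inner integral over y is itself a Romberg integral for each x node. The function names and the fixed number of levels are assumptions for illustration.

```python
import math

def romberg(f, a, b, levels=6):
    """Romberg integration of f on [a, b]: trapezoid refinements + Richardson extrapolation."""
    r = [[0.0] * levels for _ in range(levels)]
    h = b - a
    r[0][0] = 0.5 * h * (f(a) + f(b))
    for k in range(1, levels):
        h /= 2.0
        # trapezoid rule with step h reuses the previous points and adds the new midpoints
        new_points = sum(f(a + (2 * j - 1) * h) for j in range(1, 2 ** (k - 1) + 1))
        r[k][0] = 0.5 * r[k - 1][0] + h * new_points
        for m in range(1, k + 1):
            r[k][m] = r[k][m - 1] + (r[k][m - 1] - r[k - 1][m - 1]) / (4 ** m - 1)
    return r[levels - 1][levels - 1]

def romberg_2d(f, ax, bx, ay, by, levels=6):
    """Integral of f(x, y) over [ax, bx] x [ay, by] by repeated 1-D Romberg integration."""
    return romberg(lambda x: romberg(lambda y: f(x, y), ay, by, levels), ax, bx, levels)

# Mass of a standard bivariate normal in the bin [0, 1] x [0, 1].
bin_mass = romberg_2d(lambda x, y: math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi),
                      0.0, 1.0, 0.0, 1.0)
```

With 6 levels each 1-D pass uses 33 points, so the 2-D bin costs 33 × 33 evaluations of f; this is the O(i^d) growth noted above.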
Handling Truncated Regions
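Treat all truncated regions as a single bin:
$$ H_{v_0+1,v} = H \setminus \bigcup_{r=1}^{v_0} H_r $$
No extra integration is needed: the integrals over the whole space H are known in closed form, and the integrals over the observed bins are already computed, so
$$ \int_{H_{v_0+1,v}} f(y; \Psi)\,dy = 1 - \sum_{r=1}^{v_0} P_r(\Psi) $$
$$ \int_{H_{v_0+1,v}} y\, f_i(y; \theta_i)\,dy = \int_{H} y\, f_i(y; \theta_i)\,dy - \sum_{r=1}^{v_0} \int_{H_r} y\, f_i(y; \theta_i)\,dy $$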
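A minimal sketch of the complement trick for one univariate normal component (an assumption for illustration): the truncated region's mass and first moment are the whole-space values (1 and μ) minus the sums over the observed bins, each of which has a closed form in one dimension. Function names are hypothetical.

```python
import math

def std_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bin_moments(a, b, mu, sigma):
    """(integral of f, integral of y f) over [a, b] for f = N(mu, sigma^2)."""
    al, be = (a - mu) / sigma, (b - mu) / sigma
    z = std_cdf(be) - std_cdf(al)
    return z, mu * z + sigma * (std_pdf(al) - std_pdf(be))

def truncated_moments(observed_bins, mu, sigma):
    """Mass and first moment of H minus the union of the observed bins, by complement."""
    mass, first = 1.0, mu  # whole-space values
    for a, b in observed_bins:
        z, m1 = bin_moments(a, b, mu, sigma)
        mass -= z
        first -= m1
    return mass, first

# Observed bins [-1, 0) and [0, 2); the truncated region is (-inf, -1) and [2, inf).
mass, first = truncated_moments([(-1.0, 0.0), (0.0, 2.0)], 0.0, 1.0)
```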
3.3 The Complete EM Algorithm
Treat the histogram as a PDF and draw a small number of data points from it.
Fit the mixture model to these points with the standard (nonbinned, nontruncated) EM algorithm.
Starting from these parameter estimates, refine them with the full EM algorithm applied to the binned and truncated data.
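The first two steps can be sketched as follows, assuming a univariate histogram, uniform sampling within each bin, and normal components; all names and numbers are hypothetical. The resulting estimates would then serve as the starting point for the binned and truncated EM.

```python
import math
import random

def sample_from_histogram(edges, counts, size, rng):
    """Treat the histogram as a piecewise-uniform PDF and draw `size` points."""
    bins = rng.choices(range(len(counts)), weights=counts, k=size)
    return [rng.uniform(edges[b], edges[b + 1]) for b in bins]

def standard_em(data, weights, means, sigmas, iters=100):
    """Ordinary (nonbinned, nontruncated) EM for a 1-D normal mixture."""
    weights, means, sigmas = list(weights), list(means), list(sigmas)
    g = len(weights)
    for _ in range(iters):
        # E-step: tau_i(y); the common factor 1/sqrt(2 pi) cancels, so it is omitted
        resp = []
        for y in data:
            t = [weights[i] * math.exp(-0.5 * ((y - means[i]) / sigmas[i]) ** 2) / sigmas[i]
                 for i in range(g)]
            s = sum(t)
            resp.append([ti / s for ti in t])
        # M-step: weighted proportions, means and standard deviations
        for i in range(g):
            c = sum(r[i] for r in resp)
            weights[i] = c / len(data)
            means[i] = sum(r[i] * y for r, y in zip(resp, data)) / c
            sigmas[i] = math.sqrt(sum(r[i] * (y - means[i]) ** 2
                                      for r, y in zip(resp, data)) / c)
    return weights, means, sigmas

# Hypothetical bimodal histogram on [0, 10] with unit-width bins.
edges = [float(k) for k in range(11)]
counts = [0, 5, 20, 5, 0, 0, 5, 20, 5, 0]
rng = random.Random(0)
sample = sample_from_histogram(edges, counts, 200, rng)
init = standard_em(sample, [0.5, 0.5], [2.0, 8.0], [1.0, 1.0])
```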
4. Experimental Results with Simulated Data
Three experiments
Generate data from a known (bivariate) PDF and then bin them.
Number of bins per dimension: 5 to 100 (step 5).
10 different samples to smooth the results.
Standard EM on the unbinned samples vs. full EM on the binned samples.
Evaluation: KL distance between the true density and each of the two EM estimates.
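The slides do not show how the KL distance was computed. As a minimal sketch, assuming KL(true || estimate) evaluated by a midpoint rule on a finite grid (shown in one dimension for brevity, whereas the experiments are bivariate), with hypothetical names:

```python
import math

def kl_divergence(p, q, lo, hi, n=20000):
    """KL(p || q) = integral of p(y) log(p(y)/q(y)), by the midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for j in range(n):
        y = lo + (j + 0.5) * h
        py = p(y)
        if py > 0.0:
            total += py * math.log(py / q(y)) * h
    return total

def normal_pdf(mu, sigma):
    """Return the density function of N(mu, sigma^2)."""
    return lambda y: math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# KL(N(0, 1) || N(1, 1)) has the closed form (0 - 1)^2 / 2 = 0.5.
kl = kl_divergence(normal_pdf(0.0, 1.0), normal_pdf(1.0, 1.0), -12.0, 12.0)
```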
Experiment Setup
To test the quality of the solution for different numbers of data points (density of Figure 4): N = 100 to 1000 (step 10) (20 bin, 100 data, 10 samples)
To test the performance of the algorithm when the component densities are not so well separated: 3 apart components (20 bin, 20 separation, 10 samples)
To test the performance of the algorithm when significant truncation occurs (20 bin, 100 positions, 10 samples)
4.2 Estimation from Random Samples Generated from the Binned Data
Baseline approach
Uniform-sampling estimation method: estimate the PDF from a random sample drawn uniformly within the bins of the binned data.
Figure 6: comparison. The uniform-sampling approach overestimates the variance (variance inflation).
Figure 6: Estimated PDFs obtained from the original data and PDFs fitted by the binned and the uniform random-sample algorithms for (a) 5 bins per dimension and (b) 10 bins per dimension. 3-covariance ellipses.
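The variance inflation can be seen without any sampling: a large uniform-within-bin sample follows the piecewise-uniform density built from the bin masses, and that density's variance exceeds the true one. As a minimal sketch for a standard normal and bins of width h (our own back-of-envelope estimate for narrow bins, not a figure from the slides: about h²/12 from grouping at the bin midpoints plus h²/12 from the within-bin spread, i.e. roughly h²/6 in total), with hypothetical names:

```python
import math

def std_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def piecewise_uniform_variance(edges, probs):
    """Variance of the density that spreads each bin's mass P_r uniformly over the bin."""
    mass = sum(probs)
    bins = list(zip(probs, edges[:-1], edges[1:]))
    mean = sum(p * 0.5 * (a + b) for p, a, b in bins) / mass
    # E[Y^2] for a uniform bin [a, b] is m^2 + (b - a)^2 / 12, m the midpoint
    second = sum(p * ((0.5 * (a + b)) ** 2 + (b - a) ** 2 / 12.0) for p, a, b in bins) / mass
    return second - mean ** 2

h = 1.0
edges = [-10.0 + h * k for k in range(21)]
probs = [std_cdf(b) - std_cdf(a) for a, b in zip(edges[:-1], edges[1:])]
inflated = piecewise_uniform_variance(edges, probs)  # the true variance is 1.0
```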
4.3 Experiments with Different Sample Sizes
Figure 7
KL distance as a function of the number of bins and the number of data points: with more than 20 bins and more than 500 data points, the KL distance is small.
Figure 8: as a function of the number of bins. 5 to 20 bins: rapid decay; more than 20 bins: flat.
Figure 9: as a function of the number of data points: exponential decay.
Figure 7: (a) average KL distance between the estimated density and the true density, (b) standard deviation of the KL distance over 10 repeated samples.
4.4 Experiments with Different Separations of Mixture Components
Figure 10
As a function of the number of bins and the separation of the means: insensitive to the separation of the components.
Figure 11: ratio of the KL distances of the standard and binned algorithms as a function of the separation of the means. Small number of bins: standard EM is better. Small separation: binned EM is better.
Figure 12
4.5 Experiments with Truncation
Figure 13
As a function of the fraction of truncated points. Standard EM ignores the truncation information; binned EM is relatively insensitive to truncation.
Figure 14
Real Example: Red Blood Cell Data
Medical diagnosis based on two-dimensional histograms characterizing RBC volume and hemoglobin measurements
Mixture densities were fitted to histograms from 90 control subjects and 82 subjects with iron deficiency anemia
B = 100^2 bins, N = 40,000. Features of the fitted densities are used in a discriminant rule.
Baseline features: 4-dim feature vector (mean, variance along RBC and hemoglobin)
11-dim features: two-component lognormal mixture models (mean, cov, mixing weight)
9-dim features: (mean, log-odds of eigenvalues of cov, mixing weight)
Figure 15. Contour plots of the estimated densities for three control patients and three patients with iron deficiency anemia.
Conclusion
Fitting mixture densities to multivariate binned and truncated data
Computational and numerical implementation issues
In the 2-dimensional simulations, if the number of bins per dimension exceeds 10, the loss of information from quantization is minimal.