Fast Bayesian Inference in Dirichlet Process
Mixture Models
L. Wang and D. Dunson, Journal of Computational and Graphical Statistics, 2011
Presented by Esther Salazar, Duke University
November 18, 2011
E. Salazar (Reading group) November 18, 2011 1 / 19
Summary
The authors propose a fast approach for inference in Dirichlet process mixture (DPM) models

They focus on extremely fast alternatives to MCMC which allow accurate approximate Bayes inferences and produce marginal likelihood estimates to be used in model comparison

The proposed algorithm is called the sequential updating and greedy search (SUGS) algorithm
Dirichlet process mixture (DPM) models
Consider a DPM of normals (Lo 1984):
Sequential application of the DP prediction rule for subjects 1, . . . , n creates a random partition of the integers {1, . . . , n}.
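The DP prediction rule is the Chinese restaurant process: subject i joins an existing cluster with probability proportional to its current size, or opens a new cluster with probability proportional to the DP precision α. A minimal sketch (the parameter values in the example are illustrative, not from the paper):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sequentially apply the DP prediction rule (Chinese restaurant
    process) to partition subjects 1..n into clusters."""
    rng = random.Random(seed)
    counts = []   # counts[h] = current size of cluster h
    labels = []   # labels[i] = cluster index of subject i
    for i in range(n):
        # P(existing cluster h) ∝ counts[h];  P(new cluster) ∝ alpha
        weights = counts + [alpha]
        u = rng.random() * sum(weights)
        acc = 0.0
        for h, w in enumerate(weights):
            acc += w
            if u <= acc:
                break
        if h == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[h] += 1
        labels.append(h)
    return labels
```

Each call yields one random partition; cluster labels are assigned in order of appearance, so the first subject always lands in cluster 0.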
Approaches for posterior inference
For DPMs, there is a rich literature on MCMC algorithms:
- marginal Gibbs sampling (MacEachern 1994)
- conditional Gibbs sampling (Ishwaran and James 2001)
- split-merge (Jain and Neal 2004)

These approaches are useful for small to moderately sized datasets (computation can take several hours or days)

Alternatives to MCMC for DPMs:
- predictive recursion (PR) (Newton and Zhang 1999; ...)
- weighted Chinese restaurant (WCR) sampling (Lo, Brunner, and Chan 1996; ...)
- sequential importance sampling (SIS) (MacEachern, Clyde, and Liu 1999; ...)
- variational Bayes (VB) (Blei and Jordan 2006; ...)

Disadvantages of these alternatives:
- WCR and SIS are computationally intensive (they require a large number of particles)
- PR involves approximating a normalizing constant
- VB tends to underestimate uncertainty in mixture models and is sensitive to starting values
Proposal: general idea
They propose an alternative: the sequential updating and greedy search (SUGS) algorithm

The idea is to factorize the DP prior as a product of:
(i) a prior on the partition of subjects into clusters, and
(ii) independent priors on the parameters within each cluster
Product partition models (PPMs)
PPMs assume that items in different partition components are independent. The likelihood for a partition π = {S_1, . . . , S_q} with observations y = (y_1, . . . , y_n) is a product over components:

p(y | π) = ∏_{j=1}^{q} f(y_{S_j})

Here π is the only parameter under consideration (the other parameters are integrated out). The prior distribution of the partition π is a product over the partition components:

p(π) ∝ ∏_{j=1}^{q} h(S_j)
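For the DP, the induced cohesion function is h(S_j) = α (n_j − 1)!, where n_j = |S_j| is the size of component S_j. A small sketch of the resulting unnormalized partition prior:

```python
from math import factorial

def dp_partition_prior(sizes, alpha):
    """Unnormalized DP product-partition prior: each component S_j of
    size n_j contributes the cohesion h(S_j) = alpha * (n_j - 1)!."""
    p = 1.0
    for n_j in sizes:
        p *= alpha * factorial(n_j - 1)
    return p
```

Comparing, e.g., `dp_partition_prior([3], alpha)` against `dp_partition_prior([1, 1, 1], alpha)` shows how small α favors few large clusters and large α favors many singletons.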
Dirichlet process mixtures and partition models
Assume there is an infinite sequence of clusters, with θ_h the parameter for cluster h, h = 1, . . . , ∞. Let γ_i be the cluster index for subject i, so that γ_i = h means subject i belongs to cluster h.
Priors for the parameters within each of the clusters:
Sequential updating and greedy search: proposed algorithm
Conditional posterior probability of allocating subject i to cluster h
The joint posterior distribution for the cluster-specific coefficients θ = {θ_h}_{h=1}^{∞}, given the data and the cluster allocations for all n subjects
SUGS algorithm
The algorithm cycles through subjects i = 1, . . . , n, sequentially allocating each to the cluster that maximizes the conditional posterior allocation probability

The algorithm requires only a single cycle of deterministic calculations and can be implemented within a few seconds. It is also online, so additional subjects can be added as they arrive
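A hedged sketch of the greedy allocation step, simplified to a DP mixture of normals with known within-cluster variance `sigma2` and a conjugate normal prior N(mu0, tau02) on each cluster mean. The paper's actual formulation uses normal inverse-gamma priors with unknown precisions; the function names and default hyperparameters here are illustrative:

```python
from math import log, pi

def normal_logpdf(y, mean, var):
    return -0.5 * (log(2 * pi * var) + (y - mean) ** 2 / var)

def sugs(y, alpha=1.0, sigma2=1.0, mu0=0.0, tau02=1.0):
    """Greedy sequential allocation in the spirit of SUGS: each subject
    is assigned to the cluster maximizing the conditional posterior
    allocation probability (DP prior weight times posterior predictive),
    then that cluster's posterior over its mean is updated in closed form."""
    counts, post_mean, post_var, labels = [], [], [], []
    for i, yi in enumerate(y):
        scores = []
        # existing clusters: prior weight n_h/(alpha+i), predictive N(m_h, v_h + sigma2)
        for h in range(len(counts)):
            lw = log(counts[h] / (alpha + i))
            scores.append(lw + normal_logpdf(yi, post_mean[h], post_var[h] + sigma2))
        # new cluster: prior weight alpha/(alpha+i), predictive N(mu0, tau02 + sigma2)
        scores.append(log(alpha / (alpha + i)) + normal_logpdf(yi, mu0, tau02 + sigma2))
        h_star = max(range(len(scores)), key=scores.__getitem__)
        if h_star == len(counts):
            counts.append(0)
            post_mean.append(mu0)
            post_var.append(tau02)
        # conjugate update of the chosen cluster's mean
        v = 1.0 / (1.0 / post_var[h_star] + 1.0 / sigma2)
        m = v * (post_mean[h_star] / post_var[h_star] + yi / sigma2)
        post_mean[h_star], post_var[h_star] = m, v
        counts[h_star] += 1
        labels.append(h_star)
    return labels
```

One deterministic pass over the data produces the full set of allocations, which is why the procedure runs in seconds rather than the hours typical of MCMC.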
Estimation of the DP precision parameter α

To allow unknown α, we choose the prior
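The slide's prior for α was a figure and is not reproduced here. As an illustration of the general idea, a discrete (grid) posterior over α can be updated sequentially from each allocation decision, since the DP prediction rule gives each decision's probability as a function of α; the grid values and function name below are assumptions, not the paper's specification:

```python
def update_alpha_grid(weights, alphas, chose_new, n_prev, n_h=None):
    """One sequential update of a discrete posterior over alpha: each
    grid point's weight is multiplied by the probability the observed
    allocation decision had under that alpha, then renormalized.
    chose_new: whether subject i opened a new cluster;
    n_prev: number of previously allocated subjects;
    n_h: size of the chosen existing cluster (if chose_new is False)."""
    new_w = []
    for w, a in zip(weights, alphas):
        if chose_new:
            like = a / (a + n_prev)          # P(new cluster | alpha)
        else:
            like = n_h / (a + n_prev)        # P(cluster h | alpha)
        new_w.append(w * like)
    total = sum(new_w)
    return [w / total for w in new_w]
```

Each new-cluster decision shifts posterior mass toward larger α values, and each join-existing decision shifts it toward smaller ones.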
DP mixtures of normals and SUGS details

The authors focus on normal mixture models, letting θ_h = (µ_h, τ_h)′ represent the mean and precision parameters for cluster h, h = 1, . . . , ∞. To specify p_0 they choose conjugate normal inverse-gamma priors
Updating
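The slide's update equations were figures. The standard one-observation conjugate update for a normal-gamma prior (τ_h ∼ Gamma(a, b), µ_h | τ_h ∼ N(m, 1/(k τ_h))), together with the Student-t marginal predictive it induces, can be sketched as follows; this is textbook conjugacy, not the paper's exact notation:

```python
from math import lgamma, log, pi

def ng_update(y, m, k, a, b):
    """One-observation conjugate update of normal-gamma hyperparameters
    (m, k, a, b) for a cluster with unknown mean mu and precision tau."""
    k_new = k + 1.0
    m_new = (k * m + y) / k_new
    a_new = a + 0.5
    b_new = b + k * (y - m) ** 2 / (2.0 * k_new)
    return m_new, k_new, a_new, b_new

def student_t_logpdf(y, df, loc, scale2):
    """Log density of a Student-t with df degrees of freedom, location
    loc and squared scale scale2.  Under the normal-gamma prior, the
    marginal predictive is Student-t with df = 2a, loc = m and
    scale2 = b * (k + 1) / (a * k)."""
    z2 = (y - loc) ** 2 / scale2
    return (lgamma((df + 1) / 2) - lgamma(df / 2)
            - 0.5 * log(df * pi * scale2)
            - (df + 1) / 2 * log(1 + z2 / df))
```

These closed-form updates are what make a single deterministic pass feasible: allocating a subject only requires evaluating a Student-t density per cluster and updating four scalars for the chosen cluster.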
Simulation study
Two models were considered:
(1) a mixture of three normals
(2) a single normal with mean 0 and variance 0.4

In each case, they considered 100 simulated datasets with sample size n = 500
Simulation study: results
Comparison with four other fast nonparametric DPM algorithms
Application
Data: gestational age at delivery (GAD) from the Collaborative Perinatal Project (an epidemiologic study conducted in the 1960s and 1970s)

The study focuses on 34,178 pregnancies, which provides a large-sample-size example

Aim: to study the relationship between GAD and the covariates race, sex, maternal smoking status during pregnancy, and maternal age (denoted by X_1, X_2, X_3, and X_4)
Model:
Application: results
20 permutations were run to eliminate the ordering effect
Computational speed: approximately 2 minutes for every single permutation
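One way to exploit multiple random orderings (here, selecting the run with the highest pseudo marginal likelihood) can be sketched as below; the `fit` callable is a hypothetical stand-in for a full SUGS pass, and whether one selects the best ordering or averages across them, the scaffolding is the same:

```python
import random

def best_permutation_fit(y, fit, n_perm=20, seed=0):
    """Run a sequential fit over n_perm random orderings of the data and
    keep the one with the highest (pseudo) marginal likelihood, mitigating
    the sensitivity of greedy sequential allocation to processing order.
    `fit` is assumed to return (labels, log_marginal) for an ordered list."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_perm):
        order = list(range(len(y)))
        rng.shuffle(order)
        labels, logml = fit([y[i] for i in order])
        # map labels back to the original subject order
        unshuffled = [None] * len(y)
        for pos, i in enumerate(order):
            unshuffled[i] = labels[pos]
        if best is None or logml > best[1]:
            best = (unshuffled, logml)
    return best
```

Because each pass is deterministic and fast, running a couple dozen permutations multiplies the runtime only linearly, as the roughly 2 minutes per permutation reported above suggests.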