25
Empirical and Fully Bayesian Hierarchical Models : Two Applications in the Stellar Evolution Shijing Si * , David van Dyk * , Ted von Hippel Introduction Example Fitting the Distributions of Halo WDs Fitting the Variability in IFMRs Among Stellar Clusters Empirical and Fully Bayesian Hierarchical Models : Two Applications in the Stellar Evolution Shijing Si * , David van Dyk * , Ted von Hippel * Imperial College London Embry-Riddle Aeronautical University February 10, 2015

Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Empirical and Fully Bayesian HierarchicalModels : Two Applications in the Stellar

Evolution

Shijing Si∗, David van Dyk∗, Ted von Hippel†

∗Imperial College London

†Embry-Riddle Aeronautical University

February 10, 2015

Page 2: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Overview

1 Introduction

2 Example

3 Fitting the Distributions of Halo WDs

4 Fitting the Variability in IFMRs Among Stellar Clusters

Page 3: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Hierarchical Modelling

• Many data are of hierarchical structure, whichmotivates hierarchical models.

• Hierarchical modelling usually produces ashrinkage estimate, which is widely considered tobe more reasonable.

• Also, it can reduce the mean squared errors.

• However, fitting a hierarchical model may becomputationally intensive.

Page 4: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Hierarchical Models

• One commonly used and simple hierarchicalmodel is:

Yi = µi + εi ; εi ∼ N(0, σ2i ), (1)

µi ∼ N(γ, τ2), i = 1, · · · , I . (2)

in which:

• εi s are independent of µj s and σ2i s are known.

• µi s are group effects, γ pooling effects, and τ2

between-group variation.

• What we want to learn from this model is µi s andγ and τ2.

Page 5: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Statistical Inference to Hierarchical Models

• Commonly used ways are Maximum likelihood,fully Bayesian and empirical Bayes.

• MLE: Find the MLE of hyper-parameters bymaximising their likelihood function;

• Fully Bayesian: Find the fully joint posterior forall parameters and make inferences from thispostrior, usually via MCMC;

• Empirical Bayesian: Find the MLE or MAP ofhyper-parameters by maximising their likelihoodfunction and then fit group-level parameters in ageneral Bayesian way.

Page 6: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Pros and Cons

• Fully Bayesian will yield a shrinkage estimator forall µi s. However, it often brings computationalchallenges, eg. high-dimensional sampling.

• Empirical Bayes is the combination of theprevious two. It will produce a shrinkageestimator but not much as fully Bayesian, alsoless intensive than it.

Page 7: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Fully Bayesian Approach

We fit a dataset with these two approaches and compare theresult later.Fully Bayesian: The joint posterior density

p(γ, τ2, µ1, · · · , µI |Y) ∝ p(γ, τ2)I∏

i=1

N(Yi |µi , σi )N(µi |γ, τ2).

Page 8: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Shrinkage Estimator

• For brevity, we take the hyper-prior p(γ, τ2) ∝ 1.Given γ and τ2, the conditional densities for µi sare

p(µi |γ, τ2,Y) ∼ N(µ̂i , σ̂2i ), (3)

with µ̂i =τ2Yi+σ2

i γ

σ2i +τ2

and σ̂2i =σ2

i τ2

σ2i +τ2

.

p(γ|τ2, µ1, · · · , µI ,Y) ∼ N(1

I

I∑i=1

µi ,τ2

n),

p(τ2|γ, µ1, · · · , µI ,Y) ∼ inv − Γ(n − 2

2,

1

2

I∑i=1

(µi − γ)2).

Page 9: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Empirical Bayes Estimation

• It proceeds by obtaining an estimation for hyperparameters γ and τ2 first. Usually, a MLE orMAP estimator. Then, µi s are fitted in a generalBayesian framework.

• EM-type algorithms can be employed to find theMLE of hyper-parameters. µi s are treated asmissing data.

The log complete data likelihood function:

L(γ, τ2|Y, µ1, · · · , µI ) =I∑

i=1

[− 1

2log(2πσ2i )− 1

2σ2i(Yi − µi )

2

−1

2log(2πτ2)− 1

2τ2(µi − γ)2

].

Page 10: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

EM algorithm

• Given the t-th iteration γ(t) and τ2(t), conditionaldistribution of µi s are normal with mean

E (µi |Y, γ(t), τ2(t)) =τ2(t)Yi+σ2

i γ(t)

σ2i +τ2(t)

and variance

Var(µi |Y, γ(t), τ2(t)) =τ2(t)σ2

i

σ2i +τ2(t)

.

• The t + 1-st update:

γ(t+1) =1

I

I∑i=1

E (µi |Y, γ(t), τ2(t));

τ2(t+1) =1

I

I∑i=1

E [(µi − γ(t+1))2|Y, γ(t), τ2(t)].

Page 11: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

One Example

• We take one dataset in Rstan (with modificationson σ). It is a survey conducted by ETS to checkthe effect of coaching on students’ performancein eight states in US. The dataset is presented asfollows:Y 28 8 -3 7 -1 1 18 12

σ 5 1 5 2 3 1 2 3

• The hierarchical model:

Yi = µi + εi ; εi ∼ N(0, σ2i ),

µi ∼ N(γ, τ2), i = 1, · · · , I .

Page 12: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Results

We use Rstan to fit this hierarchical model in a fully Bayesianway and code in R to find the MLE of hyper-parameters by EMalgorithm.

histogram of gamma

gamma

Den

sity

−10 0 10 20

0.00

0.04

0.08 Empirical Bayes Estimate

histogram of tau

gamma

Den

sity

5 10 15 20 25 30

0.00

0.04

0.08

0.12

Empirical Bayes Estimate

Figure 1 : Inferences on γ and τ 2 from fully Bayesian (Histograms)and empirical Bayes (Red line)

Page 13: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Cont’d

mean score of 1st school

mu1

Den

sity

10 15 20 25 30 35 40

0.00

0.04

0.08

Empirical

Bayes

mean score of 2nd school

mu2

Den

sity

5 6 7 8 9 10 11

0.0

0.1

0.2

0.3

0.4

mean score of 3rd school

mu3

Den

sity

−15 −5 0 5 10 15

0.00

0.04

0.08

mean score of 4th school

mu4

Den

sity

2 4 6 8 10 12 14

0.00

0.10

0.20

mean score of 5th school

mu5

Den

sity

−10 −5 0 5 10

0.00

0.05

0.10

0.15

mean score of 6th school

mu6

Den

sity

−2 −1 0 1 2 3 4

0.0

0.2

0.4

mean score of 7th school

mu7

Den

sity

12 14 16 18 20 22 24

0.00

0.10

0.20

mean score of 8th school

mu8

Den

sity

5 10 15 20

0.00

0.05

0.10

0.15

Figure 2 : Comparisons on indivual means from fully Bayesian(Histograms) and empirical Bayes (Red line)

Page 14: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Diagnostics about EM algorithm

8.55 8.60 8.65 8.70 8.75

7090

110

Trace plot of EM updates

gamma

tauS

quar

e

0 100 200 300 400 500

8.55

8.65

8.75

iterations

gam

ma

8.535824

0 100 200 300 400 500

7090

110

iterations

tauS

quar

e

69.78020

0 50 100 150 2000.0e

+00

1.5e

−13

profile likelihood function of tauSquare

tauSquare

prof

ile li

kelih

ood

69.76

Figure 3 : Diagnostic pictures about EM algorithm

Page 15: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Contour plot of Marginal likelihood

0 5 10 15

050

100

150

200

Contour of gamma and tau−square

gamma

tau−

squa

re

Figure 4 : Contour plot of marginal likelihood of γ and τ 2

Page 16: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Fitting the Distributions of Age of Halo WDs

• The goal here is the distribution of the populationage of galactic halo WDs.

• µ the mean of the population logAge.

• Here is the simple hierarchical structure:

populationage

WDnWD1 WD2

Page 17: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Mathematical Formula

• For the i-th star, denote its logAge, distancemodulus, metallicity, mass as Ai ,Di ,Ti ,Mi .

• The data available is its photometry, Yi , thebrightness on a range of wavelengths.

• Assume that the logAge of the Galactic halo isnormal with mean γ and variance τ2.

(Yi |Ai ,Di ,Ti ,Mi ) ∼ N(G (Ai ,Di ,Ti ,Mi ),Σi );

Ai ∼ N(γ, τ2), i = 1, · · · , I ,

with G (·) is a computer model and Σi s are known fromastronomers. Also, astronomers have very good priors on otherparameters, Di ,Ti ,Mi , i = 1, · · · , I .

Page 18: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Photometry Data

Photometry is the intensity of an astronomical object’selectromagnetic radiation, over a braod band of wavelengths.

Figure 5 : The photometry over several filters.

Page 19: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Fitting one Star at a Time

Page 20: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Empirical Bayes Fitting

BASE9 can fit one star at a time, namely, it can draw a verygood sample from the i-th star’s posterior density for given γand τ2:

p(Ai ,Di ,Ti ,Mi |Yi ) ∼ p(Yi |Ai ,Di ,Ti ,Mi )×p(Ai |γ, τ2)p(Mi )p(Di )p(Ti ),

where p(Ai |γ, τ2), p(Mi ), p(Di ), p(Ti ) are prior densities.We incorporate this fit-one-at-a-time algorithm in our fitting toa more complicated hierarchical model.

Page 21: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Monte Carlo EM algorithm

Monte Carlo EM (MCEM) algorithm is used to find the MLEof γ and τ2.

MC-step: Given the t-th update γ(t) and τ2(t), draw asample of size N,

(A[j]i ,D

[j]i ,M

[j]i ,T

[j]i ), j = 1, 2, · · · ,N, i = 1, 2, · · · , I from

its posterior;

E-step: Replace expectations with sample means;

M-step:

γ(t+1) =1

I · N

I∑i=1

N∑j=1

A[j]i ;

τ2(t+1) =1

I · N

I∑i=1

N∑j=1

(A[j]i − γ

(t+1))2.

Page 22: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Primary Results

9.8 9.9 10.0 10.1 10.2

010

2030

40

Distribution of logAge for DB white dwarfs

Den

sity

Empirical Hierarchical

Individual

J0346

J1102

J2137

J2145N

J2145S

Figure 6 : Fit 5 stars in a hierarchical model.

Page 23: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

IFMRs

• In stellar evolution, IFMR, initial-final massrelationship connects the mass of a white dwarfwith the mass of its progenitor in themain-sequence.

• Commonly used IFMRs: William, Weidemann,Salaris I and Salaris II.

• Assume Mfinal = f (Minitial ,β), usually f (·) istaken as linear or piecewise linear.

Page 24: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Contour plot

Initial Mass

WD

Mas

s

0 2 4 6 8

0.6

0.8

1.0

1.2

HyadesNGC 2477WeidemannWilliamSalaris ISalaris II

Figure 7 : Contour of the joint distribution of final mass and initialmass, which contains 95% of the distribution.

Page 25: Empirical and Fully Bayesian Hierarchical Models : Two ......2015/02/10  · Empirical Bayes Estimation It proceeds by obtaining an estimation for hyper parameters and ˝2 rst. Usually,

Empirical andFully BayesianHierarchical

Models : TwoApplicationsin the StellarEvolution

Shijing Si∗,David vanDyk∗, Tedvon Hippel†

Introduction

Example

Fitting theDistributionsof Halo WDs

Fitting theVariability inIFMRs AmongStellarClusters

Hierarchical Modelling of IFMRs

(Xi |βi ) ∼ N(G (Ai , θi ,βi ),Σi ), i = 1, 2, · · · , I ;(βi |ξ,Ψ) ∼ N(ξ,Ψ),

• Σi are known;

• βi s are coefficients in IFMRs;

• We want to learn from some dataset about ξ andΨ.