
BS and MI

Xiao-Li Meng

Outline: Bootstrap, Replications, Multiple Imputation, Bayes Justification, Non-Bayesian Justification, Self-efficiency, Characterizations of Self-efficiency, Three Scenarios, History

30 Years of Bootstrap and Multiple Imputation: Joint Replications vs Conditional Replications

Xiao-Li Meng

Department of Statistics, Harvard University

Thanks to Zhan Li and Xianchao Xie

Supplementing Efron (1994, JASA) & Discussion by Rubin

“Missing Data, Imputation, and the Bootstrap.”

Continuing Meng (1994, Statistical Science)

“Multiple-Imputation Inferences with Uncongenial Sources of Input”


Celebrating 30+ Years of Bootstrap (Efron, 1977, 1979)

IEEE Signal Processing Magazine, July 2007

[A tutorial for the signal processing practitioner]

This year marks the pearl anniversary of the bootstrap. It has been 30 years since Bradley Efron’s 1977 Rietz lecture, published two years later in [1]. Today, bootstrap techniques are available as standard tools in several statistical software packages and are used to solve problems in a wide range of applications. There have also been several monographs written on the topic, such as [2], and several tutorial papers written for a nonstatistical readership, including two for signal processing practitioners published in this magazine [4], [5].

Given the wealth of literature on the topic supported by solutions to practical problems, we would expect the bootstrap to be an off-the-shelf tool for signal processing problems as are maximum likelihood and least-squares methods. This is not the case, and we wonder why a signal processing practitioner would not resort to the bootstrap for inferential problems.

We may attribute the situation to some confusion when the engineer attempts to discover the bootstrap paradigm in an overwhelming body of statistical literature. To give an example, and ignoring the two basic approaches of the bootstrap, i.e., the parametric and the nonparametric bootstrap [2], there is not only one bootstrap. Many variants of it exist, such as the small bootstrap [6], the wild bootstrap [7], the naïve bootstrap (a name often given to the standard bootstrap resampling technique), the block (or moving block) bootstrap (see the chapter by Liu and Singh in [8]) and its extended circular block bootstrap version (see the chapter by Politis and Romano in [8]), and the iterated bootstrap [9]. Then there are derivatives such as the weighted bootstrap or the threshold bootstrap, and some more recently introduced methods such as bootstrap bagging and bumping. Clearly, this wide spectrum of bootstrap variants may be a hurdle for newcomers to this area.

[Abdelhak M. Zoubir and D. Robert Iskander]

Bootstrap Methods and Applications



The Greatest Statistical Magic

Estimate the errors in our estimate without knowing the truth!

Graph from Statistical Sleuth, by Ramsey and Schafer


Seeking Approximated i.i.d. Replications (for Specific Purposes)

Bootstrap: seeking approximated i.i.d. replications in the data

Observations are approximately i.i.d.

Residuals can be regarded as i.i.d.

“Data blocks” can be viewed as i.i.d.

Once bootstrap replications are created, inference is easy; e.g.,

\widehat{V}[\hat{\theta}(Y)] = \frac{1}{B-1} \sum_{b=1}^{B} \bigl[\hat{\theta}(Y^{*b}) - \bar{\theta}^{*}\bigr]^2

Multiple Imputation (MI): seeking i.i.d. replications of the missing data conditioning on the observed data

MIs are created from posterior predictive sampling:

Y_{mis}^{(1)}, \ldots, Y_{mis}^{(m)} \overset{\text{i.i.d.}}{\sim} P_M(Y_{mis} \mid Y_{obs})
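The bootstrap variance formula above can be sketched in a few lines; this is a minimal nonparametric-bootstrap illustration, with the sample mean as the statistic and the sample size, B, and seed chosen arbitrarily for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_variance(y, stat, B=2000, rng=rng):
    """Nonparametric bootstrap estimate of V[stat(Y)]:
    resample the data with replacement B times, then take the
    sample variance of the B replicated statistics."""
    n = len(y)
    reps = np.array([stat(y[rng.integers(0, n, n)]) for _ in range(B)])
    return reps.var(ddof=1)  # the 1/(B-1) sum of squared deviations

y = rng.normal(loc=0.0, scale=1.0, size=100)
v_boot = bootstrap_variance(y, np.mean)
# For the sample mean, this should be close to s^2 / n
```

The same skeleton works for any statistic, which is the point of the slide: once replications exist, the inference step is generic.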


Multiple Imputation (Rubin, 1978, 1987)

MI creates m “complete-data” sets, via posterior prediction:

Y_{com}^{(\ell)} = \{Y_{obs}, Y_{mis}^{(\ell)}\}, \quad \ell = 1, \ldots, m

Step I: Perform m complete-data analyses:

\hat{\theta}_\ell \equiv \hat{\theta}(Y_{com}^{(\ell)}), \quad U_\ell \equiv U(Y_{com}^{(\ell)}), \quad \ell = 1, \ldots, m

Step II: Use Rubin’s combining rules:

Point estimator: \bar{\theta}_m = \frac{1}{m} \sum_{\ell=1}^{m} \hat{\theta}_\ell

Variance estimator: T_m = \bar{U}_m + \left(1 + \frac{1}{m}\right) B_m

Within-imputation variance: \bar{U}_m = \frac{1}{m} \sum_{\ell=1}^{m} U_\ell

Between-imputation variance: B_m = \frac{1}{m-1} \sum_{\ell=1}^{m} (\hat{\theta}_\ell - \bar{\theta}_m)(\hat{\theta}_\ell - \bar{\theta}_m)^\top
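Once the m complete-data analyses are done, the combining step is purely mechanical. A minimal scalar sketch (the input numbers below are made up for illustration):

```python
import numpy as np

def rubin_combine(theta_hats, u_hats):
    """Rubin's combining rules for m scalar complete-data analyses.

    theta_hats: the m point estimates from the imputed data sets
    u_hats:     their complete-data variance estimates
    Returns (theta_bar_m, T_m).
    """
    theta_hats = np.asarray(theta_hats, dtype=float)
    u_hats = np.asarray(u_hats, dtype=float)
    m = len(theta_hats)
    theta_bar = theta_hats.mean()        # point estimator
    u_bar = u_hats.mean()                # within-imputation variance
    b = theta_hats.var(ddof=1)           # between-imputation variance
    t = u_bar + (1 + 1 / m) * b          # total variance T_m
    return theta_bar, t

theta_bar, T = rubin_combine([1.0, 1.2, 0.9], [0.04, 0.05, 0.045])
```

The (1 + 1/m) factor is the finite-m correction; it disappears as m grows.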


A Debate on the MI Variance Combining Rule

Obvious: when the imputation model is (really) wrong, nothing will/should work.

Less obvious: even when the imputation model is correct, T_m may not provide a consistent estimator of V(\bar{\theta}_m) (Fay, 1991), because of

Uncongeniality (Meng, 1994): the analysis procedure a user adopts may not be compatible with the imputation model.

In the context of public-use data files:

The analysis procedure is often frequentist based, focusing on a few variables of interest.

The imputation model is typically constructed Bayesianly, using as many variables as possible; some are confidential.


The Most Rigorous Justification under Congeniality

Two key assumptions:

The complete-data analysis procedure is a Bayesian one:

\hat{\theta}(Y_{com}) = E_A(\theta \mid Y_{com}); \quad U(Y_{com}) = V_A(\theta \mid Y_{com})

The imputer’s model and analysis model are congenial:

P_M(Y_{mis} \mid Y_{obs}) = P_A(Y_{mis} \mid Y_{obs}), \quad \forall\, Y_{mis}

Then for \bar{\theta}_m as m \to \infty:

\bar{\theta}_\infty = E_M[\hat{\theta}(Y_{com}) \mid Y_{obs}] = E_M[E_A(\theta \mid Y_{com}) \mid Y_{obs}] = E_A[E_A(\theta \mid Y_{com}) \mid Y_{obs}] = E_A(\theta \mid Y_{obs})


Bayesian Justification of Rubin’s Variance Rule

For T_m \to \bar{U}_\infty + B_\infty as m \to \infty, by “Eve’s law” (the law of total variance):

\bar{U}_\infty + B_\infty = E_M[U(Y_{com}) \mid Y_{obs}] + V_M[\hat{\theta}(Y_{com}) \mid Y_{obs}]

= E_M[V_A(\theta \mid Y_{com}) \mid Y_{obs}] + V_M[E_A(\theta \mid Y_{com}) \mid Y_{obs}]

= E_A[V_A(\theta \mid Y_{com}) \mid Y_{obs}] + V_A[E_A(\theta \mid Y_{com}) \mid Y_{obs}]

= V_A(\theta \mid Y_{obs})

Therefore, as m \to \infty, \{\bar{\theta}_m, T_m\} reproduces the correct Bayesian analysis under the analyst’s model given Y_{obs}:

\bar{\theta}_\infty = E_A(\theta \mid Y_{obs}), \quad T_\infty = V_A(\theta \mid Y_{obs})

So far so good, but what if

(I) the analyst is not a Bayesian?

(II) the analyst’s procedure and the imputer’s model are uncongenial?
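The claim that Rubin’s rule recovers the full posterior variance under congeniality can be checked by simulation. A sketch assuming a toy congenial model (Y_i ~ N(theta, 1) with a flat prior, the first n_obs values observed, complete-data analysis theta_hat = mean(Y_com) with U = 1/n); all sizes and seeds here are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Congenial toy model: theta | Y_obs ~ N(mean(Y_obs), 1/n_obs),
# so Rubin's total variance should recover 1/n_obs as m grows.
n, n_obs, m = 100, 40, 5000
y_obs = rng.normal(0.0, 1.0, n_obs)

theta_hats = np.empty(m)
for l in range(m):
    theta_draw = rng.normal(y_obs.mean(), 1 / np.sqrt(n_obs))  # posterior draw of theta
    y_mis = rng.normal(theta_draw, 1.0, n - n_obs)             # posterior predictive imputation
    theta_hats[l] = np.concatenate([y_obs, y_mis]).mean()      # complete-data estimate

u_bar = 1 / n                               # within-imputation variance (known here)
b = theta_hats.var(ddof=1)                  # between-imputation variance
t = u_bar + (1 + 1 / m) * b                 # Rubin's total variance
# t should be close to the full posterior variance 1/n_obs = 0.025
```

With these numbers the exact target is 1/n_obs = 0.025, split as 1/n = 0.01 within plus 0.015 between, so the between term carries the information lost to missingness.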


ANOVA: The Criticality of Orthogonality

Rubin’s variance rule T_\infty = \bar{U}_\infty + B_\infty relies on

E[\hat{\theta}_{obs} - \theta]^2 = E[\hat{\theta}_{com} - \theta]^2 + E[\hat{\theta}_{obs} - \hat{\theta}_{com}]^2 \quad (1)

where E is either with respect to P(\theta, Y_{com} \mid Y_{obs}) or P(Y_{com} \mid \theta).

Equation (1) holds if and only if (\hat{\theta}_{com} - \theta) \perp (\hat{\theta}_{obs} - \hat{\theta}_{com}):

Cov[\hat{\theta}_{com} - \theta,\ \hat{\theta}_{obs} - \hat{\theta}_{com}] = 0 \quad (2)

For Bayesian inference under congeniality, this is automatic:

Cov[E(\theta \mid Y_{com}) - \theta,\ E(\theta \mid Y_{obs}) - E(\theta \mid Y_{com}) \mid Y_{obs}] = 0


It Holds More Generally ...

Asymptotically (in n, not m) for the MLE, due to its full efficiency:

V[\hat{\theta}^{MLE}_{obs} \mid \theta] = V[\hat{\theta}^{MLE}_{com} \mid \theta] + V[\hat{\theta}^{MLE}_{obs} - \hat{\theta}^{MLE}_{com} \mid \theta] + o(n^{-1}).

For design-based inference, it is a consequence of unbiasedness:

Let I = \{I_i, i = 1, \ldots, N\}, where I_i = 1 if the ith unit is sampled and 0 otherwise; n = \sum_{i=1}^{N} I_i (N is the population size).

Let R_i = 1 if Y_i is observed and 0 otherwise; R = \{R_i, i = 1, \ldots, N\} and Y = \{Y_1, \ldots, Y_N\}.

Then \hat{\theta}_{com} = \hat{\theta}(I; Y), e.g., \hat{\theta}_{com} = \sum_{i=1}^{N} I_i Y_i / n;

\hat{\theta}_{obs} = \hat{\theta}(I, R; Y), e.g., \hat{\theta}_{obs} = \sum_{i=1}^{N} R_i I_i Y_i / \sum_{i=1}^{N} R_i I_i.

If E[\hat{\theta}(I, R; Y) \mid I] = \hat{\theta}(I; Y), then

Cov[\hat{\theta}(I; Y) - \theta(Y),\ \hat{\theta}(I, R; Y) - \hat{\theta}(I; Y)] = 0.
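The design-based orthogonality can be verified by Monte Carlo. A sketch assuming simple random sampling of a fixed finite population and missingness completely at random (population size, sample size, response rate, and seed are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Under SRS with MCAR response, E[theta_obs | I] = theta_com, so the
# covariance between (theta_com - theta(Y)) and (theta_obs - theta_com)
# should be zero up to Monte Carlo error.
N, n, reps = 1000, 100, 20000
y = rng.gamma(2.0, 1.0, N)        # a fixed finite population
theta_pop = y.mean()              # the population quantity theta(Y)

d_com = np.empty(reps)
d_gap = np.empty(reps)
for k in range(reps):
    idx = rng.choice(N, n, replace=False)   # sampled units (the I indicators)
    sample = y[idx]
    r = rng.random(n) < 0.7                 # respondents (the R indicators), MCAR
    theta_com = sample.mean()
    theta_obs = sample[r].mean()
    d_com[k] = theta_com - theta_pop        # sampling error
    d_gap[k] = theta_obs - theta_com        # nonresponse error
cov = np.mean(d_com * d_gap) - d_com.mean() * d_gap.mean()
# cov should be approximately 0
```

Making the response probability depend on y breaks E[theta_obs | I] = theta_com, and the same covariance drifts away from zero, which is exactly the failure mode the slide’s condition rules out.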

BS and MI

Xiao-Li Meng

Bootstrap

Replications

MultipleImputation

BayesJustification

Non-BayesianJustification

Self-efficiency

Characterizationsof Self-efficiency

ThreeScenarios

History


But It Does Not Hold Arbitrarily ...

We have to assume the user's complete-data analysis procedure is valid (e.g., consistent; unbiased).

However, it would NOT be realistic to assume that it is fully efficient (e.g., likelihood inference; Bayesian inference).

But it cannot be arbitrary either, for Rubin's Variance Combining Rule to work, because:

For Total V = Within-Imputation V + Between-Imputation V to hold, i.e.,

V[θ̂(Y_obs)] = V[θ̂(Y_com)] + Increased Variance Due to Missing Data,

we minimally need to require θ̂(·) to satisfy

V[θ̂(Y_obs)] ≥ V[θ̂(Y_com)].

Wait! Is this obvious???


Surprise: Even the Least-squares Estimator ...

A Heteroscedastic Regression Model

Y_i = βX_i + ε_i,  ε_i ~ N(0, σ²X_i^η),  i = 1, ..., n

Least-squares estimator:

β̂_com = Σ_{i=1}^n X_i Y_i / Σ_{i=1}^n X_i²

"Sandwich" estimator of variance:

U_com = Σ_{i=1}^n X_i²(Y_i − X_i β̂_com)² / [Σ_{i=1}^n X_i²]²

But β̂_com does not have the "obvious property" when η ≠ 0:

V(β̂_com | X, θ) = σ² Σ_{i=1}^n X_i^{2+η} / [Σ_{i=1}^n X_i²]²

Compare, when η = 0:

V(β̂_com | X, θ) = σ² / Σ_{i=1}^n X_i²
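The heteroscedastic variance formula for the least-squares slope can be verified by simulation; a minimal sketch, with η = 1, σ = 1, and uniformly drawn X as illustrative choices not taken from the slides:

```python
import random

# Monte Carlo sketch checking the displayed variance of the least-squares
# estimator under the heteroscedastic model (hypothetical choices: eta = 1,
# sigma = 1, X ~ Uniform(0.5, 2)):
#   V(beta_com | X, theta) = sigma^2 * sum(X_i^(2+eta)) / (sum(X_i^2))^2.
random.seed(7)
n, sigma, eta, reps = 50, 1.0, 1.0, 40000
beta = 2.0
X = [random.uniform(0.5, 2.0) for _ in range(n)]
sxx = sum(x * x for x in X)
formula = sigma**2 * sum(x**(2 + eta) for x in X) / sxx**2

est = []
for _ in range(reps):
    Y = [beta * x + random.gauss(0.0, sigma * x**(eta / 2)) for x in X]
    est.append(sum(x * y for x, y in zip(X, Y)) / sxx)

mean_est = sum(est) / reps
emp = sum((e - mean_est)**2 for e in est) / reps
print(round(emp / formula, 2))  # ratio close to 1
```

The empirical variance of β̂_com over repeated error draws matches the closed-form expression to within Monte Carlo error.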


From Robins and Wang (2000, Biometrika)


Please Throw Away Some Data ...

Weighting the Heteroscedastic Regression Model: W_i = X_i^{−η/2}

W_i Y_i = β(W_i X_i) + ε_i,  ε_i ~ N(0, σ²),  i = 1, ..., n

β̂_MLE = Σ_{i=1}^n X_i^{1−η} Y_i / Σ_{i=1}^n X_i^{2−η}

V(β̂_MLE | X, θ) = σ² / Σ_{i=1}^n X_i^{2−η}

So it is justifiable to throw away some data points if you don't know how to use them most effectively, because:

When the optimal W_i's have large variation, setting small W_i's to zero approximates the optimal weighting scheme better than "blindly" using equal weights.
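The point can be made numerically with the two conditional variance formulas above; a minimal sketch, assuming hypothetical data (η = 4, σ² = 1, one high-variance design point) chosen for illustration:

```python
# Numerical sketch (hypothetical data, sigma^2 = 1): under
# eps_i ~ N(0, sigma^2 * X_i^eta), the conditional variances are
#   V(beta_com) = sum(X^(2+eta)) / (sum X^2)^2   (equal weights, all data)
#   V(beta_mle) = 1 / sum(X^(2-eta))             (optimal weights)
# and simply dropping the noisiest point gets close to the optimum.
eta = 4.0
X = [1.0, 1.0, 1.0, 1.0, 5.0]

def v_equal(xs):
    return sum(x**(2 + eta) for x in xs) / sum(x * x for x in xs)**2

v_com = v_equal(X)                          # all data, equal weights
v_drop = v_equal([x for x in X if x < 5])   # throw away the noisiest point
v_mle = 1.0 / sum(x**(2 - eta) for x in X)

print(v_mle <= v_drop < v_com)  # True: dropping data beats "blind" equal weights
```

Here v_com ≈ 18.6 while v_drop = 0.25, barely above the optimal v_mle ≈ 0.2475: zeroing out the smallest weight is a far better approximation to the optimal scheme than equal weighting.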


Don’t sue me because I’m not the first offender ...


Need a Bit More: Self-efficiency

Definition of Self-efficiency (Meng, 1994):

Let W_c be a data set, and W_o be a subset of W_c created by a selection mechanism. A statistical estimation procedure θ̂(·) for θ is said to be self-efficient (with respect to the selection mechanism) if there is no λ ∈ (−∞, ∞) such that the mean-squared error of λθ̂(W_o) + (1 − λ)θ̂(W_c) is less than that of θ̂(W_c).

Self-efficiency is a weaker requirement than full efficiency (enjoyed, e.g., by MLE and Bayesian procedures), but it can be violated in common practice. (Had I known that back in 1994 ...)

Under the assumption of congeniality, self-efficiency is a necessary and sufficient condition for Rubin's variance rule to hold (Meng, 1994).
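The definition can be explored by Monte Carlo; a minimal sketch, assuming an illustrative setup not from the slides: W_c an i.i.d. N(0, 1) sample, W_o a random half of it, and θ̂(·) the sample mean (a self-efficient procedure), so the MSE-minimizing λ should be 0.

```python
import random

# Monte Carlo sketch of the definition (hypothetical setup): W_c is an
# i.i.d. N(0,1) sample of size n, W_o a random half of it, theta(.) the
# sample mean, so the true theta is 0.  The sample mean is self-efficient
# here: no combination lambda*theta(W_o) + (1-lambda)*theta(W_c) improves
# on theta(W_c), i.e., the MSE-minimizing lambda is 0.
random.seed(3)
n, reps = 40, 20000
draws = []
for _ in range(reps):
    Wc = [random.gauss(0.0, 1.0) for _ in range(n)]
    Wo = random.sample(Wc, n // 2)
    draws.append((sum(Wo) / (n // 2), sum(Wc) / n))

def mse(lam):
    return sum((lam * o + (1.0 - lam) * c) ** 2 for o, c in draws) / reps

best = min((mse(l / 10.0), l / 10.0) for l in range(-10, 11))
print(best[1])  # the MSE-minimizing lambda on the grid is 0
```

For a procedure that is not self-efficient (such as the heteroscedastic least-squares slope of the earlier slide), the same grid search would find a nonzero λ that strictly lowers the MSE.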


A Geometrical Characterization of Self-efficiency

A Picture of Orthogonality

Pythagorean Identity

E[θ̂(Y_obs) − θ]² = E[θ̂(Y_com) − θ]² + E[θ̂(Y_obs) − θ̂(Y_com)]²

Or, equivalently:

Cov[θ̂(Y_com) − θ, θ̂(Y_obs) − θ̂(Y_com)] = 0
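The orthogonality can be checked in closed form for the least-squares slope of the earlier heteroscedastic slide; a minimal sketch, assuming hypothetical data and σ² = 1, with Y_obs taken to be the least-squares fit on a subset S of the units (an illustrative selection mechanism):

```python
# Closed-form sketch (hypothetical data, sigma^2 = 1) showing that the
# orthogonality Cov[theta(Ycom) - theta, theta(Yobs) - theta(Ycom)] = 0
# holds for the least-squares slope when eta = 0 but fails when eta != 0.
# With b_obs the least-squares slope on subset S, the cross term is
# Cov(b_com, b_obs) - Var(b_com), where
#   Cov(b_com, b_obs) = sum_S X^(2+eta) / (Sxx * Sxx_S),
#   Var(b_com)        = sum   X^(2+eta) / Sxx^2.
X = [1.0, 1.0, 2.0, 3.0, 5.0]
S = X[:3]  # the observed units (an illustrative subset)

def cross_term(eta):
    sxx = sum(x * x for x in X)
    s_sub = sum(x * x for x in S)
    cov = sum(x ** (2 + eta) for x in S) / (sxx * s_sub)
    var = sum(x ** (2 + eta) for x in X) / sxx ** 2
    return cov - var

print(abs(cross_term(0.0)) < 1e-12, abs(cross_term(2.0)) > 1e-3)  # True True
```

With homoscedastic errors (η = 0) the cross term vanishes identically, giving the Pythagorean decomposition; with η = 2 it is far from zero, so the picture of orthogonality breaks down.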


Self-efficient Estimating Equations (Xie and Meng, 2010; for all the rest of the results)

Standardized Estimating Equations

Let the estimators θ̂(Y_com) and θ̂(Y_obs) be derived from the standardized estimating equations

S_com(Y_com; θ) = 0 and S_obs(Y_obs; θ) = 0,

which satisfy

E(∂S_com/∂θ) = E(∂S_obs/∂θ) = I

and certain regularity conditions.

A Characterization

The estimating procedure θ̂(·) is (asymptotically) self-efficient if and only if (asymptotically)

Cov(S_com, S_obs) = Var(S_com)

or, equivalently, "the regression coefficient"

B = Var(S_com)^{-1} Cov(S_com, S_obs) = I.
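The characterization can be illustrated in a self-efficient case; a minimal sketch, assuming a hypothetical normal-mean setup: S_com = θ − mean(Y_com) and S_obs = θ − mean(Y_obs) are standardized (E[∂S/∂θ] = 1), and the condition Cov(S_com, S_obs) = Var(S_com) reduces to Cov(ȳ_n, ȳ_m) = Var(ȳ_n) = 1/n.

```python
import random

# Monte Carlo sketch of the characterization for a self-efficient case
# (hypothetical setup): normal mean with unit variance, standardized
# equations S_com = theta - mean(Y_com), S_obs = theta - mean(Y_obs),
# so E[dS/dtheta] = 1.  Cov(S_com, S_obs) = Var(S_com) reduces to
# Cov(mean_n, mean_m) = Var(mean_n) = 1/n.
random.seed(5)
n, m, reps = 50, 20, 40000
pairs = []
for _ in range(reps):
    Y = [random.gauss(0.0, 1.0) for _ in range(n)]
    pairs.append((sum(Y) / n, sum(Y[:m]) / m))

mn = sum(a for a, _ in pairs) / reps
mm = sum(b for _, b in pairs) / reps
var_n = sum((a - mn) ** 2 for a, _ in pairs) / reps
cov_nm = sum((a - mn) * (b - mm) for a, b in pairs) / reps
print(round(var_n, 3), round(cov_nm, 3))  # both close to 1/n = 0.02
```

The simulated covariance and variance agree, i.e., the "regression coefficient" B is (numerically) the identity, as the characterization requires.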


Examples of Self-efficient Procedures

Holds for an Arbitrary Pattern of the Observed Data

Maximum Likelihood Estimators

Bayes Estimators

Holds for a "Regular Pattern" of the Observed Data

Let the complete data Y_com be an i.i.d. sequence and Y_obs be a regular subset of it, i.e.,

Y_com = (Y_1, ..., Y_n),  Y_obs = (Y_{i_1}, ..., Y_{i_m}).

Then any estimating equation of the form

S(Y_com; θ) = Σ_{i=1}^n g(Y_i; θ)

is self-efficient.


When the Analyst Assumes More

The Relationship Among the Three Modelers

[diagram of the nested models omitted]

Main Result

Under the above scenario:

Rubin's variance estimator is consistent (or confidence proper), i.e., T_∞ = V(θ̂_∞) asymptotically, if the analyst's estimating procedure is self-efficient.

The MI estimator θ̂_∞ is less efficient than the analyst's observed-data estimator θ̂_obs^A: V(θ̂_∞) ≥ V(θ̂_obs^A).


When the Imputer Assumes More

The Relationship Among the Three Modelers

[diagram of the nested models omitted]

Under the above scenario:

Rubin's variance estimator is confidence valid and the analyst's observed-data interval estimator is inadmissible:

V(θ̂_obs^A) ≥ T_∞ ≥ V(θ̂_∞),

assuming self-efficiency and that the parameter θ is a scalar.

θ̂_∞ is more efficient than θ̂_obs^A: V(θ̂_obs^A) ≥ V(θ̂_∞).


A Grand Challenge: Non-nested Cases

The Relationship Among the Three Parties

[diagram of the non-nested models omitted]

There is no general theoretical guarantee!

But we can use conservative approximations:

V(θ̂_∞) = V((I − F)θ̂_obs^A + F_K θ̂_obs^M)

≤ [Sd((I − F)θ̂_obs^A) + Sd(F_K θ̂_obs^M)]² ≤ (U_∞^{1/2} + B_∞^{1/2})²,  scalar;

≤ 2(Var((I − F)θ̂_obs^A) + Var(F_K θ̂_obs^M)) ≤ 2(U_∞ + B_∞),  vector.
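The two bounds are conservative for any pair of correlated terms, which is what makes them usable when Cov between the analyst's and imputer's components is unavailable; a minimal numerical sketch with a hypothetical correlated pair A, B standing in for the two components:

```python
import random

# Numerical sketch (hypothetical correlated pair A, B) of why the displayed
# approximations are conservative: for any A and B,
#   Var(A + B) <= (Sd(A) + Sd(B))^2  and  Var(A + B) <= 2(Var(A) + Var(B)),
# by Cauchy-Schwarz; this substitutes for the unavailable Cov(A, B).
random.seed(11)
reps = 20000
A, B = [], []
for _ in range(reps):
    z = random.gauss(0.0, 1.0)
    A.append(z + 0.5 * random.gauss(0.0, 1.0))
    B.append(0.8 * z + random.gauss(0.0, 1.0))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

v_sum = var([a + b for a, b in zip(A, B)])
print(v_sum <= (var(A) ** 0.5 + var(B) ** 0.5) ** 2,
      v_sum <= 2.0 * (var(A) + var(B)))  # True True
```

Both inequalities hold deterministically (not just up to simulation noise), since Cauchy-Schwarz applies to sample moments as well.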


Return to Comparisons: An Incomplete Citation Study (by Zhan Li) (Noticing the Reporting Delay and Truncation)

[Figure: annual citation counts for Efron (1979), publication years 1980–2008; record counts on a 0–250 scale.]


The Price for Quick Publication: No Citation Data on Rubin (1978, ASA Proceedings)! Hence we use:

[Figure: annual citation counts for the topic "Multiple Imputation", publication years 1986–2009; record counts on a 0–180 scale.]


A Curious Phenomenon?

[Figure: annual citation counts for the topic "Bootstrap" ("bootstrap statistics", "bootstrap estimating", "bootstrap estimation", "bootstrap interval", "bootstrap bias", "bootstrap jackknife", excluding "bootstrap equation"), publication years 1979–2007; record counts on a 0–500 scale.]


A Concluding Wisdom for Anyone Younger Than Me

Be patient

Because there is a 12-year waiting period before fame ...
