
Statistical Methods for Analysis with Missing Data

Lecture 16: pattern-mixture models (continued), sensitivity analysis

Mauricio Sadinle

Department of Biostatistics

1 / 33


Previous Lecture

Introduction to:

• The fundamental issue of non-identifiability

• General strategy for identification

• Pattern-mixture models

2 / 33


Today’s Lecture

• Common identifying assumptions for pattern-mixture models

  • Reading: Chapter 6 of the lecture notes of Davidian and Tsiatis

• Itemwise conditionally independent nonresponse

• Properties of classes of full-data distributions

• Sensitivity analysis

  • Reading: Chapter 7 of the lecture notes of Davidian and Tsiatis

3 / 33


Outline

Recap from Previous Lecture

Common Identifying Assumptions for Pattern-Mixture Models

Itemwise Conditionally Independent Nonresponse

Properties of Classes of Full-Data Distributions

Sensitivity Analysis

4 / 33


Moving Away from MAR?

• In which direction do we go?

• Remember: there is a universe of missing-data assumptions

5 / 33


Identification Strategies

• Inference with missing data is impossible without identification assumptions

• Identification strategies generally follow this structure:

  p(z(r), r)  --[identifying assumption A]-->  p_A(z, r)  --[sum over r]-->  p_A(z)

6 / 33


Pattern-Mixture Models

• Pattern-mixture models (Little, JASA 1993) provide a transparent way of specifying missing-data assumptions

• The pattern-mixture model factorization explicitly reveals:

  p(z) = Σ_{r ∈ {0,1}^K} p(z(r̄) | z(r), r) · p(z(r) | r) · p(r),

  where the factor p(z(r̄) | z(r), r) needs an identifying assumption, while p(z(r) | r) and p(r) can be estimated from the data

• The factorization explicitly shows what needs identifying assumptions and what can be obtained from the data alone

• Identifying assumptions explicitly or implicitly amount to constructing {p(z(r̄) | z(r), r)}_r from {p(z(r), r)}_r

7 / 33
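The factorization above can be made concrete with a minimal numerical sketch for a single binary Z (K = 1). All numbers here are hypothetical, and the choice of extrapolation distribution is labeled as the (arbitrary) identifying assumption it is:

```python
# Minimal sketch of the pattern-mixture factorization for one binary Z.
# Hypothetical numbers; the extrapolation distribution p(z | R = 0) is
# NOT identified by the data and must be supplied by an assumption.

p_r = {1: 0.7, 0: 0.3}              # p(r): estimable pattern probabilities
p_z_given_r1 = {0: 0.4, 1: 0.6}     # p(z | R = 1): estimable from respondents

# Identifying assumption (one arbitrary choice among many that are equally
# compatible with the observed data): reuse the respondents' distribution.
p_z_given_r0 = dict(p_z_given_r1)   # p(z | R = 0): NOT estimable from data

# Mix over patterns to recover the full-data marginal p(z):
p_z = {z: p_z_given_r1[z] * p_r[1] + p_z_given_r0[z] * p_r[0] for z in (0, 1)}
print(p_z)
```

Any other choice of `p_z_given_r0` would fit the observed data equally well and yield a different p(z), which is exactly the non-identifiability the slide describes.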


Dropout in a Longitudinal Study

If missingness only comes from subjects dropping out:

• Missingness patterns are uniquely summarized by the dropout time

  D = 1 + Σ_{j=1}^T R_j

• The observed data are obtained as realizations of (Z(D), D)

• If D = d, then Z(d) = (Z_1, ..., Z_{d−1}) and Z(d̄) = (Z_d, ..., Z_T)

• The pattern-mixture model requires modeling the observed-data distribution:

  • p(D = d): simply take the empirical frequency

  • p(z(d) | D = d): depends on the variables' types

8 / 33
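The dropout-time formula can be sketched directly; the response patterns below are hypothetical, and the monotonicity check encodes the slide's premise that missingness only comes from dropout:

```python
# Sketch: computing the dropout time D = 1 + sum_j R_j from monotone
# response indicators R = (R_1, ..., R_T). Hypothetical patterns.

def dropout_time(r):
    """D = 1 + number of observed visits; assumes monotone dropout,
    i.e., r is a run of 1s followed by 0s."""
    assert all(a >= b for a, b in zip(r, r[1:])), "nonmonotone pattern"
    return 1 + sum(r)

print(dropout_time([1, 1, 0]))  # D = 3: Z1, Z2 observed, Z3 missing
print(dropout_time([1, 1, 1]))  # D = 4 = T + 1: a complete case
```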


Identifying Assumptions for PMMs Under Dropout

• In general, how do we obtain {p(z(d̄) | z(d), d)}_d from {p(z(d), d)}_d?

• Note that

  p(z(d̄) | z(d), d) = Π_{ℓ=d}^T p(z_ℓ | z(ℓ), d)

• Example: for T = 3, we need to identify:

  • If D = 3,

    p(z(3̄) | z(3), D = 3) = p(z_3 | z_1, z_2, D = 3)

  • If D = 2,

    p(z(2̄) | z(2), D = 2) = p(z_2 | z_1, D = 2) · p(z_3 | z_1, z_2, D = 2)

9 / 33
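The product decomposition above can be enumerated mechanically. A small sketch (string output only, no statistical content) listing which conditionals need identification for T = 3:

```python
# Sketch: enumerate the conditionals p(z_l | z_1, ..., z_{l-1}, D = d)
# that make up the extrapolation distribution for each dropout time d.

T = 3
needed = {}
for d in range(2, T + 1):
    terms = []
    for l in range(d, T + 1):
        hist = ", ".join(f"z{j}" for j in range(1, l))
        terms.append(f"p(z{l} | {hist}, D={d})")
    needed[d] = " * ".join(terms)
    print(f"D = {d}: {needed[d]}")
```

The printed factors match the T = 3 example on the slide: one unidentified conditional for D = 3, two for D = 2.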

Outline

Recap from Previous Lecture

Common Identifying Assumptions for Pattern-Mixture Models

Itemwise Conditionally Independent Nonresponse

Properties of Classes of Full-Data Distributions

Sensitivity Analysis

10 / 33


The Complete-Case Identifying Assumption

Little (JASA 1993) proposed to assume:

  p_CC(z_ℓ | z(ℓ), D = d) ≡ p(z_ℓ | z(ℓ), D = T + 1),

for all ℓ ≥ d, d = 1, ..., T.

• Distributions for D = T + 1 are identifiable from the complete cases

• Example: for T = 3, we have:

  • If D = 3,

    p_CC(z_3 | z_1, z_2, D = 3) ≡ p(z_3 | z_1, z_2, D = 4)

  • If D = 2,

    p_CC(z_2 | z_1, D = 2) ≡ p(z_2 | z_1, D = 4)

    p_CC(z_3 | z_1, z_2, D = 2) ≡ p(z_3 | z_1, z_2, D = 4)

11 / 33
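A small empirical sketch of the CC restriction, using hypothetical (z1, z2, z3, D) records with T = 3 binary outcomes: the conditional for a dropout pattern is simply the empirical conditional among the complete cases.

```python
# Sketch of the CC restriction: conditionals for dropout patterns are
# borrowed from the complete cases (D = T + 1 = 4). Hypothetical records
# (z1, z2, z3, D); entries after dropout are None.

data = [(0, 0, 0, 4), (0, 0, 1, 4), (0, 1, 1, 4), (1, 1, 1, 4),
        (0, 0, None, 3), (1, None, None, 2)]

def p_cc(z3, z1, z2):
    """Empirical p_CC(z3 | z1, z2, D = d) = p(z3 | z1, z2, D = 4), any d <= 3."""
    cc = [rec for rec in data if rec[3] == 4 and rec[:2] == (z1, z2)]
    return sum(rec[2] == z3 for rec in cc) / len(cc)

print(p_cc(1, 0, 0))  # 0.5: among complete cases with (z1, z2) = (0, 0)
```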


The Neighboring-Case Identifying Assumption

The extrapolation distributions could also be obtained from the closest dropout pattern in which Z_ℓ is observed:

  p_NC(z_ℓ | z(ℓ), D = d) ≡ p(z_ℓ | z(ℓ), D = ℓ + 1),

for all ℓ ≥ d, d = 1, ..., T.

• Among observations with D = ℓ + 1 we get to observe z_ℓ and z(ℓ)

• We could think that observations with D = ℓ + 1 are the best ones on which to base extrapolation of the values of Z_ℓ

• For example, among observations where Z_ℓ is available, those who drop out at time ℓ + 1 might be the most similar to those who drop out at time ℓ

• HW4: say T = 3; write down this restriction for ℓ ≥ d, d = 1, 2, 3.

12 / 33
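The NC restriction differs from CC only in which pattern donates the conditional. A sketch with hypothetical records, for ℓ = 2 and T = 3: the missing Z2 of a subject with D = 2 is extrapolated from subjects with D = ℓ + 1 = 3, the closest pattern in which Z2 is observed.

```python
# Sketch of the NC restriction: p_NC(z_l | z_(l), D = d) = p(z_l | z_(l),
# D = l + 1). Hypothetical records (z1, z2, z3, D); None after dropout.

data = [(0, 0, None, 3), (0, 1, None, 3), (1, 1, None, 3),  # D = 3
        (0, None, None, 2)]                                 # D = 2

def p_nc_z2(z2, z1):
    """Empirical p_NC(z2 | z1, D = 2) = p(z2 | z1, D = 3)."""
    nbr = [rec for rec in data if rec[3] == 3 and rec[0] == z1]
    return sum(rec[1] == z2 for rec in nbr) / len(nbr)

print(p_nc_z2(1, 0))  # 0.5: among D = 3 records with z1 = 0
```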


The Available-Case Identifying Assumption

Here, the extrapolation distributions are obtained from all cases in which Z_ℓ is available:

  p_AC(z_ℓ | z(ℓ), D = d) ≡ p(z_ℓ | z(ℓ), D > ℓ),

for all ℓ ≥ d, d = 1, ..., T.

• Among all observations with D > ℓ we get to observe z_ℓ and z(ℓ)

• We could think that this approach maximizes the use of available information when extrapolating the values of Z_ℓ

• HW4: say T = 3; write down this restriction for ℓ ≥ d, d = 1, 2, 3.

• HW4: under monotone nonresponse, the AC assumption is equivalent to MAR

13 / 33
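Under AC the donor pool for z_ℓ is every pattern with D > ℓ, not just one pattern. A sketch with hypothetical records, again for ℓ = 2 and T = 3, pooling the complete cases and the D = 3 dropouts:

```python
# Sketch of the AC restriction: p_AC(z_l | z_(l), D = d) = p(z_l | z_(l),
# D > l), pooling every pattern in which Z_l is observed. Hypothetical
# records (z1, z2, z3, D); entries after dropout are None.

data = [(0, 0, 0, 4), (0, 0, 1, 4), (1, 1, 1, 4),   # complete cases
        (0, 1, None, 3), (0, 0, None, 3),           # observe Z1, Z2
        (1, None, None, 2)]                         # observe Z1 only

def p_ac_z2(z2, z1):
    """Empirical p_AC(z2 | z1, D = d) = p(z2 | z1, D > 2) for d <= 2."""
    avail = [rec for rec in data if rec[3] > 2 and rec[0] == z1]
    return sum(rec[1] == z2 for rec in avail) / len(avail)

print(p_ac_z2(0, 0))  # 0.75: 3 of the 4 records with z1 = 0 and D > 2
```

Pooling all available cases is what makes AC coincide with MAR under monotone nonresponse (the HW4 exercise above).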


Outline

Recap from Previous Lecture

Common Identifying Assumptions for Pattern-Mixture Models

Itemwise Conditionally Independent Nonresponse

Properties of Classes of Full-Data Distributions

Sensitivity Analysis

14 / 33


Itemwise Conditionally Independent Nonresponse

• Identification assumptions can also be expressed as restrictions on the full-data distribution

• The itemwise conditionally independent nonresponse (ICIN)¹ assumption says that

  Z_j ⊥⊥ R_j | Z_−j, R_−j,  for all j = 1, ..., K,

  where Z_−j = (..., Z_{j−1}, Z_{j+1}, ...) and R_−j = (..., R_{j−1}, R_{j+1}, ...)

• Remark: Z_j and R_j being conditionally independent does not imply marginal independence

¹Sadinle & Reiter (Biometrika 2017): https://doi.org/10.1093/biomet/asw063

15 / 33


ICIN Distribution

• Sadinle & Reiter showed how to construct a full-data distribution that encodes ICIN, given an observed-data distribution

• For each missingness pattern r ∈ {0,1}^K with p(z(r), r) > 0, let the function η_r : Z(r) → ℝ be defined recursively as

  η_r(z(r)) = log p(z(r), r) − log ∫_{Z(r̄)} exp{ Σ_{r̄′ ≺ r̄} η_{r′}(z(r′)) } μ(dz(r̄)).

  Then

  p_ICIN(z, r) = exp{ Σ_{r̄′ ⪯ r̄} η_{r′}(z(r′)) }
               = p(z(r), r) · exp{ Σ_{r̄′ ≺ r̄} η_{r′}(z(r′)) } / ∫_{Z(r̄)} exp{ Σ_{r̄′ ≺ r̄} η_{r′}(z(r′)) } μ(dz(r̄))

• Therefore ICIN can be seen as a restriction for pattern-mixture models!

16 / 33


Outline

Recap from Previous Lecture

Common Identifying Assumptions for Pattern-Mixture Models

Itemwise Conditionally Independent Nonresponse

Properties of Classes of Full-Data Distributions

Sensitivity Analysis

17 / 33


Properties of Full-Data Distributions

• Assumptions covered so far: MCAR, MAR, CC, NC, AC, ICIN

• Can we talk about properties of the implied classes of full-data distributions?²

  • Observational equivalence

  • Full-data identifiability

  • (Observed-data) identifiability

  • Nonparametric identifiability

²Taken from Sadinle & Reiter (forthcoming in Biometrika): https://arxiv.org/pdf/1902.06043.pdf

18 / 33


Observational Equivalence

• Two full-data distributions are said to be observationally equivalent if their implied observed-data distributions are the same

• That is, suppose we have two full-data distributions, p_A(z(r̄), z(r), r) and p_B(z(r̄), z(r), r). If

  ∫ p_A(z(r̄), z(r), r) dz(r̄) = ∫ p_B(z(r̄), z(r), r) dz(r̄)

  for all (z(r), r), then they are observationally equivalent

• HW4: the full-data distributions obtained under the CC, NC, and AC assumptions are observationally equivalent (under dropout)

• This is an important feature in sensitivity analysis, because differences in inferences will be due to the different identifying assumptions and not due to different fits to the observed data!

19 / 33
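A tiny numerical sketch of the definition, for a single binary Z (hypothetical numbers): two full-data distributions share the estimable parts but disagree on the unidentified extrapolation, yet marginalize to the same observed-data law.

```python
# Sketch: two observationally equivalent full-data distributions.
# K = 1 binary Z; patterns r = 1 (Z observed) and r = 0 (Z missing).
# Both share p(z, R = 1) and p(R = 0) but differ in p(z | R = 0).

p_obs = {(0, 1): 0.28, (1, 1): 0.42}   # p(z, R = 1), estimable
p_r0 = 0.30                            # p(R = 0), estimable

# Distribution A extrapolates with p_A(Z = 1 | R = 0) = 0.6,
# distribution B with p_B(Z = 1 | R = 0) = 0.1:
p_A = {**p_obs, (0, 0): p_r0 * 0.4, (1, 0): p_r0 * 0.6}
p_B = {**p_obs, (0, 0): p_r0 * 0.9, (1, 0): p_r0 * 0.1}

# Integrating (here: summing) the missing Z out of pattern r = 0 yields
# the same observed-data distribution, so A and B cannot be told apart:
marg_A = p_A[(0, 0)] + p_A[(1, 0)]
marg_B = p_B[(0, 0)] + p_B[(1, 0)]
print(abs(marg_A - marg_B) < 1e-12)   # True: observationally equivalent
```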


Full-Data Identifiability

• C_Θ: class of full-data distributions

• Θ: parameter space, either finite- or infinite-dimensional

• Say we were able to observe Z regardless of the value of R

• Identifiability of C_Θ in the usual sense (e.g., Lehmann & Casella 1998, p. 24) is here referred to as full-data identifiability

• A class of full-data distributions C_Θ is said to be full-data identifiable if there exists a bijection from Θ to C_Θ

• Full-data identifiability is an elementary requirement which simply says that the class is properly parameterized

20 / 33


(Observed-Data) Identifiability

• obs(C_Θ): the class of observed-data distributions implied by C_Θ

• A class of full-data distributions C_Θ is said to be identifiable if there exist bijections from Θ to C_Θ and from obs(C_Θ) to C_Θ

• First bijection: full-data identifiability for C_Θ

• Second bijection: we need a unique way to go back and forth between obs(C_Θ) and C_Θ

• These imply a third bijection between obs(C_Θ) and Θ: the common notion of identifiability applied to obs(C_Θ)

21 / 33
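In symbols, the two bijections above can be restated as injectivity conditions (this display is our own restatement of the slide's definition, with obs(·) denoting the map sending a full-data distribution to the observed-data distribution it implies):

```latex
% Full-data identifiability: the parameterization is one-to-one
\theta \neq \theta' \;\Longrightarrow\; P_\theta \neq P_{\theta'},
\qquad P_\theta,\, P_{\theta'} \in \mathcal{C}_\Theta .
% Observed-data identifiability: obs(.) is invertible on C_Theta, so
\operatorname{obs}(P_\theta) = \operatorname{obs}(P_{\theta'})
\;\Longrightarrow\; P_\theta = P_{\theta'}
\;\Longrightarrow\; \theta = \theta' .
```

Composing the two gives the third bijection on the slide: distinct parameter values yield distinct observed-data distributions.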

Page 52: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Nonparametric Identifiability

I G: all possible observed-data distributions

I Say obs(CΘ) is a proper subset of G: CΘ imposes parametric restrictions on what could be nonparametrically recovered from observed data alone

I CΘ is said to be nonparametrically identifiable if it is identifiable and obs(CΘ) = G

I Also known as nonparametric saturation or just-identification (Robins 1997, Vansteelandt et al. 2006, Hoonhout & Ridder 2018)

I Bijection between Θ and obs(CΘ) = G: we can think of a nonparametrically identifiable class as being indexed by the set of all observed-data distributions G

22 / 33

Page 57: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Nonparametric Identifiability

I Two nonparametrically identifiable classes are necessarily observationally equivalent

I Nonparametric identification additionally guarantees that the identifying restrictions do not constrain the observed-data distribution, and therefore cannot be rejected based on the observed data

I Nonparametric identification is therefore a basic desirable property, particularly useful for comparing inferences under different missing data assumptions

I HW4: the full-data distributions obtained under the CC, NC, and AC assumptions are nonparametrically identifiable

I Fun fact: MAR and ICIN lead to nonparametric identification

23 / 33

Page 62: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Nonparametric Identifiability

Start from the observed-data distribution, apply the identifying assumption, and map the resulting full-data distribution back:

p(z(r), r) --[identifying assumption A]--> pA(z, r)

pA(z, r) --[sum over r]--> pA(z)

pA(z, r) --[∫ · dz(r̄)]--> pA(z(r), r)

If pA(z(r), r) = p(z(r), r), then A leads to nonparametric identification:

I A does not impose restrictions on the observed-data distribution

I A cannot be rejected from observed data alone

24 / 33
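The check on this slide can be carried out numerically in a toy setting. Below is a minimal sketch (our own construction, not from the slides) with one binary outcome Z and response indicator R, under a complete-case-style identifying assumption p(z | R = 0) := p(z | R = 1). Mapping the implied full-data distribution back reproduces the observed-data distribution exactly, so the assumption imposes no observable restrictions:

```python
import numpy as np

# Observed-data distribution: p(Z=z, R=1) for z in {0,1},
# plus the mass p(R=0) for which Z is unobserved.
p_z_r1 = np.array([0.3, 0.5])   # p(Z=z, R=1)
p_r0 = 0.2                      # p(R=0)

# Identifying assumption A: p(z | R=0) := p(z | R=1)
p_z_given_r1 = p_z_r1 / p_z_r1.sum()
pA_z_r0 = p_z_given_r1 * p_r0       # pA(Z=z, R=0)
pA = np.stack([pA_z_r0, p_z_r1])    # full-data distribution pA(z, r)

# Map pA back to an observed-data distribution: pA(z, R=1) is direct;
# for R=0, Z is missing, so sum it out.
pA_obs_r1 = pA[1]
pA_obs_r0 = pA[0].sum()

# The implied observed-data distribution matches the one we started
# from: pA(z(r), r) = p(z(r), r), i.e. nonparametric identification.
assert np.allclose(pA_obs_r1, p_z_r1)
assert np.isclose(pA_obs_r0, p_r0)
```
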

Page 64: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Outline

Recap from Previous Lecture

Common Identifying Assumptions for Pattern-Mixture Models

Itemwise Conditionally Independent Nonresponse

Properties of Classes of Full-Data Distributions

Sensitivity Analysis

25 / 33

Page 65: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Sensitivity Analysis

Based on Scharfstein et al. (Biometrics 2018)3:

I Local sensitivity analysis

I Make a missing-data assumption A, explore changes in inferences implied by departures from A in a “neighborhood” around A

I See Chapter 7 of the lecture notes of Davidian and Tsiatis for more on this

I Global sensitivity analysis

I Make completely different missing-data assumptions, explore inferences under all such assumptions

I We will illustrate global sensitivity analysis with an example presented by Sadinle & Reiter (2017)

3 https://doi.org/10.1111/biom.12729

26 / 33

Page 66: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

The Slovenian Plebiscite Data Revisited

I Slovenians voted for independence from Yugoslavia in a plebiscite in 1991

I The Slovenian public opinion survey included these questions:

1. Independence: Are you in favor of Slovenian independence?
2. Secession: Are you in favor of Slovenia’s secession from Yugoslavia?
3. Attendance: Will you attend the plebiscite?

I Rubin, Stern and Vehovar (1995) analyzed these three questions under MAR

I Plebiscite results give us the proportions of non-attendants and attendants in favor of independence

27 / 33

Page 68: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

The Slovenian Plebiscite Data Revisited

Taken from Rubin, Stern and Vehovar (JASA, 1995)

28 / 33

Page 69: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Slovenian Data: a Bayesian Approach

I Let πz(r),r = p(z(r), r), π = {πz(r),r}

I Draw π(i) from posterior of π given the Slovenian data (problems 9 and 10 of HW3)

I Do, for i = 1, . . . , m:

π(i) --[identifying assumption A]--> p_A^(i)(z, r) --[sum over r]--> p_A^(i)(z) --[parameters of interest]--> p_A^(i)(Independence = Yes, Attendance = Yes), p_A^(i)(Attendance = No)

29 / 33
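The loop above can be sketched in code. The sketch below uses hypothetical counts (NOT the real survey data), collapses the problem to two binary items (Independence, Attendance) with a single both-missing pattern, and takes complete-case missing value as the identifying assumption A for simplicity; the actual analysis uses the full three-item data under MAR, ICIN, and pattern-mixture CC. With a flat Dirichlet prior, the multinomial cell probabilities π have a Dirichlet posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed-data cell counts (made up for illustration).
# Order: (Yes,Yes), (Yes,No), (No,Yes), (No,No), both items missing.
counts = np.array([1191, 8, 158, 68, 96])

m = 1000
results = np.empty((m, 2))
for i in range(m):
    # Draw pi^(i) from the Dirichlet posterior of the cell
    # probabilities (flat Dirichlet(1,...,1) prior).
    pi = rng.dirichlet(counts + 1)

    # Identifying assumption A (complete-case missing value):
    # spread the both-missing mass proportionally over complete cases.
    pA = pi[:4] + pi[4] * pi[:4] / pi[:4].sum()

    # Parameters of interest.
    p_yes_yes = pA[0]            # p_A(Independence=Yes, Attendance=Yes)
    p_att_no = pA[1] + pA[3]     # p_A(Attendance=No)
    results[i] = (p_yes_yes, p_att_no)

print(results.mean(axis=0))  # posterior means of the two parameters
```

Each posterior draw π(i) is pushed through the identifying assumption, giving a posterior sample of the parameters of interest; repeating this under different assumptions A is exactly the global sensitivity analysis illustrated next.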

Page 72: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Global Sensitivity Analysis for The Slovenian Data

[Figure: three panels of posterior samples, one per identifying assumption — ICIN, MAR, and Pattern-Mixture CC. Each panel plots pr(Independence = YES, Attendance = YES) on the horizontal axis (0.82 to 0.90) against pr(Attendance = NO) on the vertical axis (0.04 to 0.10).]

Figure: Samples from joint posterior distributions of p(Independence = Yes, Attendance = Yes) and p(Attendance = No). Red diamond: results of plebiscite.

I All of these identifying assumptions lead to nonparametric identifiability!

30 / 33

Page 74: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Summary

Main take-aways from today’s lecture:

I Pattern-mixture models provide an alternative way of thinking about missing data

I Ensuring identifiability is key when moving away from MAR

I Nonparametric identifiability is an important and desirable property

I Sensitivity analyses are recommended

31 / 33

Page 75: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Summary

Main take-aways from this class:

I The fundamental problem of inference with missing data: it is impossible without extra, usually untestable, assumptions on how missingness arises

I Willing to assume MAR? A plethora of approaches is readily available:

I Frequentist likelihood-based inference (EM algorithm)

I Bayesian inference (Data Augmentation / Gibbs samplers)

I Multiple imputation (although worry about uncongeniality)

I Inverse-probability weighting (also, double robust procedures)

(these could also handle nonignorable missingness – current area of research)

I Avoid ad-hoc approaches such as single imputation and complete-case analysis

32 / 33

Page 76: Statistical Methods for Analysis with Missing Datafaculty.washington.edu › msadinle › teaching › UW-BIOST531...Statistical Methods for Analysis with Missing Data Lecture 16:

Thank you for your patience and participation!

THE END

33 / 33