jomo: an R package for Multilevel Multiple Imputation

Matteo Quartagno

MRC Clinical Trials Unit at UCL

3rd April 2019 (NASH)

MRC CTU at UCL

Missing Data

• Missing data common in medical/social research

• Two possible issues:

– Loss of information (particularly with missing data in covariates);– Bias (if the outcome of analysis model is involved in missingness

mechanism);

• Multiple Imputation has become gold standard way of

handling missing data:

– Makes use of all available information;– Generally used under MAR, i.e. assumption for missing data is

that reason for missingness is in observed variables;– Very flexible, keep desired analysis model with same

interpretation of results.

MRC CTU at UCL

Multiple Imputation

MRC CTU at UCL

Multiple Imputation

• The imputation step is key, several ways of imputing:

– Full conditional specification (most likely the one you used if you ever did MI);

– Joint Modelling

MRC CTU at UCL

Joint Modelling

MRC CTU at UCL

Multiple Imputation

• The imputation step is key, several ways of imputing:

– Full conditional specification

• (most likely the one you used if you ever did MI);

– Joint Modelling

• Data assumed to be (e.g) multivariate normally distributed;

• Binary/Categorical data modelled with latent normals;

• A very important concept is that of congeniality /

compatibility / consistency of analysis and imputation

MRC CTU at UCL

Compatibility

• Example: collect data on two variables, Y and X, for I

individuals nested in J clusters, e.g. children in schools;

• Analysis model is a random intercept model. R code:

fit <- lmer ( Y ~ X + ( 1 | schoolID ) , data )

• With missing data in Y. Imputation model needs to be the

same as analysis model, i.e. clustered by school.

• What if there are missing data in both X and Y? What if X has

a quadratic effect? What if there was a random slope?

– In all of these cases if the structure of analysis model is not reflected in the imputation model, MI estimates are going to be biased.

MRC CTU at UCL

The jomo R package

• jomo is R package built to do imputation guaranteeing

compatibility in such situations;

• Based on Joint Modelling imputation for

clustered/multilevel data

MRC CTU at UCL

Multilevel Joint Modelling MI

MRC CTU at UCL

Multilevel Joint Modelling MI

MRC CTU at UCL

The jomo R package

• jomo is R package built to do imputation guaranteeing

compatibility in such situations;

• Based on Joint Modelling imputation for

clustered/multilevel data

• It can handle missing data in continuous / binary /

categorical variables at level 1 (child) and/or 2 (school).

• Let’s start from a simple example;

MRC CTU at UCL

Jomo: single level example

• Data from Junior School Project:

MRC CTU at UCL

• Data from Junior School Project;

• Model of interest: linear regression of English score at 3

years over 1 year test score and sex

lm(english ~ ravens + sex)

• Missing data in English and ravens. We can assume

simple multivariate normal model with sex as covariate.

jomo can fit and impute from this model.

MRC CTU at UCL

Y <- JSPmiss[, c("english", "ravens")]

JSPmiss$cons <- 1

X <- JSPmiss[, c("cons", "sex")]

set.seed(1569)

imp <- jomo(Y = Y, X = X, nburn = 1000, nbetween =

1000, nimp = 5)

MRC CTU at UCL

• Once we have imputed data in imp, how do we fit model

on each imputed data? How do we combine estimates?

• Can use several packages, here we show how to use

mitools:

MRC CTU at UCL

Including categorical variables

• Easy to include categorical / binary variables. Just need

to be treated as factor:

MRC CTU at UCL

Random intercept model

• Motivation for jomo was multilevel data.

• So assume our substantive analysis model is:

lmer (english ~ ravens + sex + factor(fluent) + (1|school))

• If we have random intercept in analysis model, need to

reflect that in the imputation model as well

• In jomo, only thing that changes is that we need to tell

software about clustering.

MRC CTU at UCL

jomo: multilevel data example

MRC CTU at UCL

• Fitting the random intercept model on all imputed

datasets and combining the estimates is no different

than in the single level example.

MRC CTU at UCL

Random slope model

• If the analysis model has random slopes:

lmer (english ~ ravens + sex + factor(fluent) + (1+ravens|school))

• Or more generally if there is likely heterogeneity between

clusters, option “random” must be used:

• Imp <- jomo(Y = Y, X = X, clus = clus, meth = "random")

MRC CTU at UCL

Jomo: two-level imputation

• Some variables may be at level 2, e.g. school-related

variables;

MRC CTU at UCL

variables;

• Need to make sure we impute properly, i.e. same value

for all individuals in same school

• Easily dealt with by jomo.

MRC CTU at UCL

variables;

• Need to make sure we impute properly, i.e. same value

for all individuals in same school

• Easily dealt with by jomo.

MRC CTU at UCL

Checking convergence

• Jomo imputes by running MCMC. Crucial to check

chains have converged before registering imputations.

• Can do this with jomo.MCMCchain function.

MRC CTU at UCL

Checking convergence

• Jomo imputes by running MCMC. Crucial to check

chains have converged before registering imputations.

• Can do this with jomo.MCMCchain function.

MRC CTU at UCL

Using jomo in practice

• Jomo assumes flat prior for all parameters but

covariance matrices; default is identity, if not suitable

either use expert-informed priors or derived from data;

MRC CTU at UCL

Using jomo in practice

• Jomo assumes flat prior for all parameters but covariance

matrices; default is identity, if not suitable either use expert-

informed priors or derived from data;

• Recommended workflow:

1. Before running the imputation model (which may take some time), perform a ‘dry run’, i.e. run .MCMCchain function with nburn = 2 and check the output;

2. Re-run the same function for a larger number of iterations (e.g. 5000) and analyse trace plots to choose sensible number of burn-in and between-imputation iterations;

3. Run the jomo function for the chosen number of iterations

4. Fit the substantive model on the imputed data sets and apply Rubin’s rules.

MRC CTU at UCL

Mitml: alternative interface to jomo

• Package mitml offers an alternative interface, based on

the classic formula/data framework

MRC CTU at UCL

Jomo and compatibility

• We said the motivation for jomo was imputing when

there are compatibility problems

• Functions used thus far solve compatibility with random

intercept and approximately for random slopes

• However, still no compatibility with interactions, non-

linearities, survival models.

• For this, new “Substantive Model Compatible” functions.

SMC-jomo for friends…

MRC CTU at UCL

SMC-jomo: a simple example

• For example, if our substantive model is random

intercept model with quadratic effect:

MRC CTU at UCL

SMC-jomo

• Currently SMC-jomo available for imputation compatible

with lm and lmer, glm and glmer (binomial only), coxph,

polr and clmm;

• Can handle random intercept and slopes with

unstructured covariance matrix;

• Non-linearities and interactions;

• Future developments: splines, other variance structures,

other models (e.g. frailty), …

MRC CTU at UCL

Conclusions

• MI has become gold standard in handling missing data;

• However, needs to be used carefully. Compatibility of

imputation and analysis model is key;

• Jomo provides a software to impute compatibly when

analysis model is multilevel;

• Additionally the SMC-jomo functions can handle

interactions, nonlinearities, and are the only ones 100%

compatible with random slopes.

MRC CTU at UCL

Limitations/extensions

• Important to always check the sampler has converged

before starting registering imputations.

• Jomo can be very slow with large data sets, particularly

with categorical variables with several categories;

– Plan to speed up package in the future.

• We will keep maintaining the package and including new

functions, for imputation compatible with other analysis

models.

MRC CTU at UCL

Acknowledgements

• Joint work with James Carpenter (MRC-CTU and

LSHTM)

• Presentation based on R Journal paper with Simon

Grund (Leibniz Institute for Science and Mathematics

Education, Kiel, Germany)

jomo: an R package for Multilevel Multiple Imputation · Missing Data • Missing data common in...

Documents

Imputation of Missing Gas Permeability Data for Polymer

Evaluation of Single Missing Value Imputation Techniques

Missing Data and data imputation techniques

Multiple Imputation for Missing Data in KLoSAchipts.ucla.edu/wp-content/uploads/downloads/2012/03/...Multiple Imputation Multiple Imputation: Impute m 2 plausible values for each missing

Chapter 5-9: Missing Data Imputation

Missing Data and Multiple Imputation

Missing data-imputation - Andrew Gelman - Columbia …gelman/arm/missing.pdf · CHAPTER 25 Missing-data imputation ... Section 18.5) or else mitigated ... education levels of New

A Decision Tree-based Missing Value Imputation …crpit.com/confpapers/CRPITV121Rahman.pdf · A Decision Tree-based Missing Value Imputation Tech nique for Data ... very low since

Missing data and Multiple Imputation (II) Henrik Støvring ... · Missing data and Multiple Imputation (II) Henrik Støvring& Morten Frydenberg stovring@ph.au.dk–morten@ph.au.dk

Least-squares imputation of missing data entries

From Predictive Methods to Missing Data Imputation: An

Multiple Imputation of Missing Data › jfox › Books › ... · include mean imputation, replacing missing values for a variable with the mean of the observed values; regression

Multiple Imputation of Missing Categorical Data using ... · Psychological Test and Assessment Modeling, Volume 57, 2015 (4), 542-576 Multiple Imputation of Missing Categorical Data

IMPUTATION PROCESS FOR MISSING DATA IN CARGO TRAIN …

Imputation of missing network data: Some simple procedures

Chapter # Machine Learning Based Missing Value Imputation ... · Chapter # Machine Learning Based Missing Value Imputation Method for Clinical Dataset M. 2Mostafizur Rahman1 and Darryl

A Robust Missing Value Imputation Method Mifoimpute for Incomplete Molecular Descriptor Data and Comparative Analysis With Other Missing Value Imputation Methods

Multiple Imputation for Missing Data · What is Multiple Imputation? Multiple imputation creates multiple imputed values for any one given missing value. There are a variety of methods

Missing data, multiple imputation and associated software

Imputation of missing data under missing not at … · Imputation of missing data under missing not at random assumption & ... Introduction Model for nonignorable ... Yobs and Ymis: