Confounding adjustment: Ideas in Action - a case study

Xiaochun Li, Ph.D.
Associate Professor, Division of Biostatistics
Indiana University School of Medicine

Outline

• Description of the data set
• Quantity to be estimated
• Summary of baseline characteristics
• Approaches to data analyses
• Results
• Discussion

Lindner Center data described and analyzed in Kereiakes et al. (2000)

• 6-month follow-up data on 996 patients who underwent an initial percutaneous coronary intervention (PCI) and were treated with “usual care” alone or usual care plus a relatively expensive blood thinner (IIb/IIIa cascade blocker)

• 10 variables
  Y: 2 outcomes, mort6mo (efficacy) and cardcost (cost)
  X: 1 treatment variable and 7 baseline covariates: stent, height, female, diabetic, acutemi, ejecfrac and ves1proc

Simulation Setup

Baseline characteristics

stent: coronary stent deployment
female: patient sex
diabetic: diabetes mellitus
acutemi: acute myocardial infarction
ves1proc: number of vessels involved in initial PCI
height: in centimeters
ejecfrac: left ventricular ejection fraction (%)

The “LSIM10K” dataset

The simulated data set is based on the Lindner Center data:

• 17 copies of the clustered Lindner data, with fudge factors added to ejfract and hgt, and some clipping
  - same correlations among covariates, same clustering patterns

• Contains the values of 10 simulated variables for 10,325 hypothetical patients

• To simplify analyses, the data contain no missing values

• Details and the dataset are available from Bob’s website

What do we want to estimate?

The population average treatment effect (ATE), i.e.,

E(Y1) - E(Y0)

where Y1 and Y0 are counterfactual outcomes.

In plain words: “what if” scenarios. The ATE is the expected response if treatment had been assigned to the entire study population minus the expected response if control had been assigned to the entire study population.
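To make the definition concrete, here is a minimal Python sketch on a hypothetical eight-patient population in which both counterfactual outcomes are known, which real data never provide. The binary confounder x stands in for a risk factor such as acutemi: patients with x = 1 die more often without treatment and are also more likely to be treated, so the naive treated-minus-control difference does not equal E(Y1) - E(Y0).

```python
# Hypothetical toy population (not the LSIM10K data). Each row is
# (x, t, y0, y1): a binary confounder x, the treatment actually received t,
# and BOTH potential outcomes. Real data reveal only one of y0 / y1 per patient.
population = [
    (0, 0, 0, 0),
    (0, 0, 0, 0),
    (0, 0, 1, 0),
    (0, 1, 0, 0),
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
    (1, 1, 1, 0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# ATE = E(Y1) - E(Y0): each scenario is averaged over the WHOLE population.
true_ate = mean(y1 for _, _, _, y1 in population) - mean(y0 for _, _, y0, _ in population)

# Naive contrast of observed outcomes: y1 is observed for treated rows, y0 for controls.
naive = (mean(y1 for _, t, _, y1 in population if t == 1)
         - mean(y0 for _, t, y0, _ in population if t == 0))

print(f"true ATE = {true_ate:.3f}, naive difference = {naive:.3f}")
```

Here the naive contrast (-0.25) understates the benefit relative to the true ATE (-0.375); the adjustment methods below aim to recover the ATE from observed data only.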

Baseline covariate balance assessment

Variable       C (usual care alone)   T (usual care + Abciximab)   P value
stent          63%                    69%                          <0.001
female         33%                    34%                          0.36
diabetic       23%                    19%                          <0.001
acutemi        7%                     15%                          <0.001
ves1proc       1.4 (±0.6)             1.3 (±0.6)                   <0.001
height (cm)    172.5 (±10)            171.5 (±10)                  <0.001
ejfract        53 (±8)                50 (±10)                     <0.001
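With 10,325 patients, even a 1 cm difference in mean height has p < 0.001, so the p-values above largely reflect sample size. A common sample-size-free complement is the standardized difference. The sketch below assumes the ± values in the table are standard deviations and uses the usual pooled-SD formula; because it works from the rounded summary figures, the results are approximate. An absolute value above 0.1 is a commonly used rule of thumb for meaningful imbalance.

```python
import math
import statistics


def smd_continuous(treated, control):
    """(mean_T - mean_C) / sqrt((var_T + var_C) / 2), using sample variances."""
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return diff / pooled_sd


def smd_from_summary(mean_t, sd_t, mean_c, sd_c):
    """Same formula, computed from group means and SDs."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)


def smd_binary(p_t, p_c):
    """Standardized difference for a binary covariate given group proportions."""
    return (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)


# T vs C summaries from the balance table (assuming the ± values are SDs).
smd = {
    "stent": smd_binary(0.69, 0.63),
    "female": smd_binary(0.34, 0.33),
    "diabetic": smd_binary(0.19, 0.23),
    "acutemi": smd_binary(0.15, 0.07),
    "ves1proc": smd_from_summary(1.3, 0.6, 1.4, 0.6),
    "height": smd_from_summary(171.5, 10, 172.5, 10),
    "ejfract": smd_from_summary(50, 10, 53, 8),
}
for name, d in smd.items():
    print(f"{name:9s} {d:+.3f}")
```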

Visualizing overall imbalance

[Figure: overall covariate imbalance; deep blue = high values]

Analytical Methods for confounding adjustment

The following methods were applied to LSIM10K:
• Outcome regression adjustment (OR)
• Propensity score (PS) stratification
• Inverse-probability-of-treatment weighting (IPTW)
• Doubly robust (DR) estimation
• Matching by
  - Mahalanobis distance
  - PS only
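The slides do not detail the matching algorithm. As a rough sketch of Mahalanobis-distance matching, the Python below matches each treated patient to the nearest control, assuming 1:1 nearest-neighbour matching with replacement and an effect-on-the-treated (ATT) target; both choices, the two hypothetical standardized covariates, and the function names are illustrative assumptions. The covariance matrix is the sample covariance of all patients' covariates.

```python
def sample_cov_2d(points):
    """Sample covariance matrix (n - 1 denominator) of 2-D points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return [[s00, s01], [s01, s11]]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(u, v, inv_cov):
    """Squared Mahalanobis distance between two 2-D points."""
    d = (u[0] - v[0], u[1] - v[1])
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(2) for j in range(2))


def mahalanobis_match_att(treated, controls):
    """treated / controls: lists of ((cov1, cov2), y).
    Matches each treated patient to its nearest control (with replacement) and
    returns (ATT estimate, list of matched control indices)."""
    inv_cov = inverse_2x2(sample_cov_2d([x for x, _ in treated + controls]))
    matches, diffs = [], []
    for x_t, y_t in treated:
        dists = [mahalanobis_sq(x_t, x_c, inv_cov) for x_c, _ in controls]
        j = dists.index(min(dists))
        matches.append(j)
        diffs.append(y_t - controls[j][1])
    return sum(diffs) / len(diffs), matches


# Hypothetical standardized covariates (think: height, ejfract) and outcomes.
treated = [((2.0, 0.0), 1.0), ((-2.0, 0.0), 3.0)]
controls = [((2.0, 0.5), 0.5), ((-2.0, -0.5), 1.0), ((0.0, 3.0), 10.0), ((0.0, -3.0), 10.0)]
att, matches = mahalanobis_match_att(treated, controls)
print(f"ATT estimate = {att}, matched controls = {matches}")
```

PS matching follows the same pattern with the distance replaced by the absolute difference in estimated PS.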

ANALYSIS OF MORT6MO

OR model for mort6mo:
• treatment indicator (trtm)
• main-effect terms for all seven covariates
• quadratic terms for both height and ejfract

Residual deviance: 2410.4 on 10323 degrees of freedom
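Outcome regression adjustment turns a fitted outcome model into μ1 and μ0 by predicting every patient's outcome under treatment and under control, then averaging (standardization). The slides' model is parametric (main effects plus quadratic terms); to keep the sketch self-contained, it swaps in a saturated model for a single hypothetical binary covariate, so the fitted values are simply cell means. Data and function names are illustrative.

```python
from collections import defaultdict

# Hypothetical observed data (x, t, y): binary covariate x, treatment t, binary outcome y.
rows = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 0),
        (1, 0, 1), (1, 1, 0), (1, 1, 1), (1, 1, 0)]


def fit_cell_means(rows):
    """Saturated outcome model: E[Y | x, t] estimated by the mean of y in each (x, t) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, t, y in rows:
        sums[(x, t)] += y
        counts[(x, t)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def g_computation(rows, model):
    """Predict every patient's outcome under t = 1 and under t = 0, then average."""
    n = len(rows)
    mu1 = sum(model[(x, 1)] for x, _, _ in rows) / n
    mu0 = sum(model[(x, 0)] for x, _, _ in rows) / n
    return mu1, mu0, mu1 - mu0


mu1, mu0, delta = g_computation(rows, fit_cell_means(rows))
print(f"mu1 = {mu1:.3f}, mu0 = {mu0:.3f}, delta = {delta:.3f}")
```

With a parametric model, only `fit_cell_means` would change; the predict-then-average step stays the same.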

PS model:
• saturated model for the five categorical covariates (main effects and interaction terms up to fifth order)
• main effects and quadratic terms for height and ejfract
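For categorical covariates, the maximum likelihood fit of a saturated logistic model (all interactions included) reproduces the observed treated fraction within each covariate pattern, so its fitted PS can be computed by counting. The hypothetical sketch below uses two binary covariates standing in for stent and acutemi and omits the continuous height and ejfract terms of the model above. With five categorical covariates the number of patterns grows quickly, and a pattern in which all or none of the patients are treated would receive a PS of exactly 1 or 0.

```python
from collections import Counter

# Hypothetical rows: (stent, acutemi, t) for 13 patients.
rows = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
    (1, 0, 1), (1, 0, 0),
    (0, 1, 1), (0, 1, 1), (0, 1, 0),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0),
]


def saturated_ps(rows):
    """Fitted PS of a saturated logistic model: the treated fraction
    within each covariate pattern."""
    n_pattern = Counter((s, a) for s, a, _ in rows)
    n_treated = Counter((s, a) for s, a, t in rows if t == 1)
    return {pattern: n_treated[pattern] / n for pattern, n in n_pattern.items()}


ps = saturated_ps(rows)
patient_ps = [ps[(s, a)] for s, a, _ in rows]   # each patient's estimated PS
for pattern in sorted(ps):
    print(pattern, round(ps[pattern], 3))
```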

Covariate Balance Evaluations Based on PS Quintiles

[Figure panels: female, diabetic, acutemi, ves1proc and height, by PS quintile]

Height: strata 2 (0.95 cm) and 3 (-1.50 cm)

• Residual confounding exists after adjusting for PS quintiles
• Within-stratum between-group height difference (cm):

             mean     s.d.   p-value
Stratum 2:   0.949    0.44   0.032
Stratum 3:   -1.497   0.43   0.0005

Ejfract: strata 1 (0.81), 2 (-1.32) and 3 (-0.72)

• Residual confounding exists after adjusting for PS quintiles
• Within-stratum between-group ejfract difference:

             mean     s.d.   p-value
Stratum 1:   0.812    0.41   0.0475
Stratum 2:   -1.322   0.33   7.38e-5
Stratum 3:   -0.721   0.32   0.025

PS Stratification

• Residual confounding remains within strata
• In the PS stratification method, height and ejfract are therefore further adjusted for within strata, using a model with:
  - a stratum-specific treatment effect
  - main effects and quadratic terms for height and ejfract
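PS stratification sorts patients by estimated PS, cuts them into strata (quintiles in these analyses), takes the treated-minus-control difference in mean outcome within each stratum, and averages those differences weighted by stratum size. The sketch below shows only that core step on hypothetical data; it omits the within-stratum regression adjustment for height and ejfract described above, and because it splits by rank, tied PS values may straddle a stratum boundary.

```python
def ps_stratification(rows, n_strata=5):
    """rows: (ps, t, y) triples. Returns the stratum-size-weighted average of
    within-stratum differences in mean outcome (treated minus control)."""
    ordered = sorted(rows, key=lambda row: row[0])
    n = len(ordered)
    strata = [[] for _ in range(n_strata)]
    for rank, row in enumerate(ordered):
        strata[rank * n_strata // n].append(row)      # equal-sized groups by PS rank
    estimate = 0.0
    for stratum in strata:
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        if not treated or not control:
            raise ValueError("a stratum has no treated or no control patients")
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        estimate += diff * len(stratum) / n
    return estimate


# Hypothetical (ps, t, y) data for 10 patients; two strata keep the example small.
rows = [
    (0.10, 0, 0), (0.15, 0, 1), (0.20, 1, 0), (0.25, 0, 0), (0.30, 1, 1),
    (0.60, 1, 1), (0.65, 0, 1), (0.70, 1, 0), (0.80, 1, 1), (0.90, 0, 1),
]
naive = (sum(y for _, t, y in rows if t == 1) / sum(t for _, t, _ in rows)
         - sum(y for _, t, y in rows if t == 0) / sum(1 - t for _, t, _ in rows))
stratified = ps_stratification(rows, n_strata=2)
print(f"naive = {naive:.3f}, stratified = {stratified:.3f}")
```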

Results – mort6mo

Method                  μ1      μ0      Δ        SE
Outcome regression      0.010   0.043   -0.032   0.0038
PS strat.               0.012   0.044   -0.033   0.0039
IPTW1                   0.011   0.045   -0.034   0.0038
IPTW2                   0.011   0.045   -0.034   0.0037
DR                      0.011   0.043   -0.032   0.0037
Matching: Mahalanobis   NA      NA      -0.037   0.0044
Matching: PS            NA      NA      -0.036   0.0039

True Δ = -0.036

Results of all methods are consistent, providing evidence that the treatment is effective at preventing death at 6 months.

ANALYSIS OF CARDCOST

cardcost model:
• treatment indicator (trtm)
• main-effect terms for all seven covariates
• quadratic terms for both height and ejfract

PS model: same as before

cardcost model of CA with PS stratification:
• stratum-specific treatment effect
• height, ejfract main effects and their quadratic terms

Model checking – OR: adjusted R-squared 0.0386

Model checking – OR (log-transformed): adjusted R-squared 0.0693

Results – cardcost

Method                  μ1      μ0      Δ      SE
OR: original scale      15308   15300   8      210
OR: log transformed     13536   13702   -166   111
PS strat.               13580   13639   -59    119
IPTW1                   15545   15226   -319   409
IPTW2                   15408   15303   -105   229
DR                      15393   15292   -101   226
Matching: Mahalanobis   NA      NA      150    178
Matching: PS            NA      NA      -3     215

IPTW 1 vs 2

Discussion

• All methods give consistent results on the 2 outcomes
• All PS-based results have similar variance except IPTW1
• IPTW estimators depend on an approximately correct PS model
• OR depends on an approximately correct outcome model
• DR is a fortuitous combination of OR and IPTW: it depends on only one of the two models being right
• Nonparametric versions of either model may be an alternative to parametric models
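The slides do not spell out IPTW1 and IPTW2. Assuming they follow the common convention (e.g., Lunceford and Davidian, 2004) in which IPTW1 divides the weighted outcome sums by n and IPTW2 divides them by the sum of the weights, the sketch below computes both on hypothetical data with known PS. One well-known difference: IPTW2 is unchanged when a constant is added to every outcome, while IPTW1 is not, which may help explain IPTW1's larger SE for cardcost, whose values are around 15,000.

```python
def ipw_estimates(rows):
    """rows: (ps, t, y) triples. Returns (IPTW1, IPTW2) estimates of E(Y1) - E(Y0)."""
    n = len(rows)
    w1 = [t / e for e, t, _ in rows]                  # 1/e for treated, 0 for controls
    w0 = [(1 - t) / (1 - e) for e, t, _ in rows]      # 1/(1-e) for controls, 0 for treated
    s1 = sum(w * y for w, (_, _, y) in zip(w1, rows))
    s0 = sum(w * y for w, (_, _, y) in zip(w0, rows))
    iptw1 = s1 / n - s0 / n                           # weighted sums divided by n
    iptw2 = s1 / sum(w1) - s0 / sum(w0)               # divided by the sum of the weights
    return iptw1, iptw2


# Hypothetical (ps, t, y) data with known propensity scores.
rows = [
    (0.25, 1, 2.0), (0.25, 0, 1.0), (0.25, 0, 3.0), (0.25, 0, 1.0),
    (0.75, 1, 4.0), (0.75, 1, 6.0), (0.75, 0, 2.0), (0.75, 0, 2.0),
]
iptw1, iptw2 = ipw_estimates(rows)

# Shift every outcome by a constant, as if costs were measured from a different baseline.
shifted = [(e, t, y + 1000.0) for e, t, y in rows]
iptw1_s, iptw2_s = ipw_estimates(shifted)
print(f"IPTW1: {iptw1:.3f} -> {iptw1_s:.3f}   IPTW2: {iptw2:.3f} -> {iptw2_s:.3f}")
```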

Double Robustness

Method   PS      outcome   Δ     SE
IPTW2    wrong   NA        464   214
DR       wrong   wrong           217
DR       wrong   right           214
DR       right   wrong           233
DR Δ values: 463, 166

• Wrong PS model: adjusts for one covariate, ‘acutemi’, only
• Wrong OR model for cardcost: adjusts for the treatment indicator ‘trtm’ and the ‘acutemi’ covariate only

By “right”, we mean approximately right.
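The double-robustness property can be checked exactly on noise-free hypothetical data. The sketch below uses the augmented IPW form of the DR estimator (an assumption; the slides do not give the formula) and mimics the misspecifications above: the “wrong” PS ignores the covariate, and the “wrong” outcome model uses treatment alone. The estimate equals the true Δ whenever at least one model is right, and equals the naive difference here when both are wrong.

```python
# Noise-free hypothetical data: y depends only on (x, t), so the truth is known.
CELL_MEAN = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}   # E[Y | x, t]
rows = []
for x, n_treated in [(0, 2), (1, 6)]:            # 8 patients at each level of x
    for i in range(8):
        t = 1 if i < n_treated else 0
        rows.append((x, t, CELL_MEAN[(x, t)]))


def aipw(rows, ps, m1, m0):
    """Doubly robust (augmented IPW) estimate of E(Y1) - E(Y0).
    ps, m1, m0 map x to the fitted PS and the outcome-model predictions under t=1 / t=0."""
    n = len(rows)
    mu1 = sum(t * y / ps[x] - (t - ps[x]) / ps[x] * m1[x] for x, t, y in rows) / n
    mu0 = sum((1 - t) * y / (1 - ps[x]) + (t - ps[x]) / (1 - ps[x]) * m0[x]
              for x, t, y in rows) / n
    return mu1 - mu0


right_ps = {0: 0.25, 1: 0.75}                    # treated fraction at each x
wrong_ps = {0: 0.5, 1: 0.5}                      # ignores x
right_m1 = {x: CELL_MEAN[(x, 1)] for x in (0, 1)}
right_m0 = {x: CELL_MEAN[(x, 0)] for x in (0, 1)}
wrong_m1 = {0: 4.25, 1: 4.25}                    # mean y among treated, ignoring x
wrong_m0 = {0: 1.5, 1: 1.5}                      # mean y among controls, ignoring x

true_delta = sum(CELL_MEAN[(x, 1)] - CELL_MEAN[(x, 0)] for x, _, _ in rows) / len(rows)
results = {
    ("right", "right"): aipw(rows, right_ps, right_m1, right_m0),
    ("right", "wrong"): aipw(rows, right_ps, wrong_m1, wrong_m0),
    ("wrong", "right"): aipw(rows, wrong_ps, right_m1, right_m0),
    ("wrong", "wrong"): aipw(rows, wrong_ps, wrong_m1, wrong_m0),
}
print("true delta:", true_delta)
for (ps_spec, or_spec), est in results.items():
    print(f"PS {ps_spec:5s} OR {or_spec:5s} -> {est:.2f}")
```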

Propensity score estimation

• Most applications in the literature use a parametric logistic regression model that assumes covariates are linear and additive on the log-odds scale; the model may include selected interactions and polynomial terms

• Accurate PS estimation is impeded by
  - high-dimensional covariates: which ones should we deconfound?
  - unknown functional form: how do they relate to treatment selection?

• PS model misspecification can substantially bias the estimated treatment effect

• Nonparametric approaches are flexible enough to accommodate nonlinear/non-additive relationships of covariates to treatment assignment, e.g., trees
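A parametric PS model is usually fitted by maximum likelihood. The pure-Python sketch below (Newton-Raphson; no statistics library assumed) fits two logistic PS models to hypothetical data in which treatment is more common at both extremes of a standardized covariate: a linear-only model, which here assigns every patient the same PS, and one with a quadratic term, as the slides' PS model includes for height and ejfract. Function names are illustrative.

```python
import math


def solve(a, b):
    """Solve the small linear system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def predict(design, beta):
    return [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in design]


def fit_logistic(design, t, max_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson.
    Each row of `design` already contains the intercept and any polynomial terms."""
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = predict(design, beta)
        score = [sum((ti - pi) * row[j] for ti, pi, row in zip(t, p, design)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * row[j] * row[l] for pi, row in zip(p, design))
                 for l in range(k)] for j in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: 4 patients at each standardized covariate value; treatment is
# more common at both extremes (a U-shaped relationship on the log-odds scale).
treated_counts = {-2: 3, -1: 2, 0: 1, 1: 2, 2: 3}
x, t = [], []
for value, k_treated in treated_counts.items():
    for i in range(4):
        x.append(float(value))
        t.append(1 if i < k_treated else 0)

linear = [[1.0, xi] for xi in x]
quadratic = [[1.0, xi, xi * xi] for xi in x]
ps_linear = predict(linear, fit_logistic(linear, t))
beta_quad = fit_logistic(quadratic, t)
ps_quad = predict(quadratic, beta_quad)
print("linear-only PS:", sorted({round(p, 3) for p in ps_linear}))
print("with x^2 term: ", sorted({round(p, 3) for p in ps_quad}))
```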

Nonparametric regression techniques

• Generalized Boosted Models (GBM) to estimate the propensity score function
  Friedman, 2001; Madigan and Ridgeway, 2004; McCaffrey, Ridgeway, and Morral, 2004
  R package: twang

• Regression tree model to predict cardcost
  Ripley, 1996; Therneau and Atkinson, 1997
  R package: rpart

Generalized Boosted Models (GBM)

• A multivariate nonparametric regression technique
• A sum of a large set of simple regression trees modelling the log-odds: gbm finds the MLE of g(x) = log(p(x)/(1 - p(x))), where p(x) = P(T = 1 | x)
• Predicts treatment assignment from a large number of pretreatment covariates, choosing among them adaptively
• Nonlinear
• No need to select variables
• Can model complex interactions
• Invariant to monotone transformations of x, e.g., the same PS estimates whether age, log(age) or age² is used
• Outperforms alternative methods in prediction error
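The sketch below is a stripped-down version of the boosting idea: gradient boosting of the log-odds g(x) with one-split regression trees (stumps) on a single covariate, taking a Newton step in each leaf. It is not the gbm or twang implementation (no subsampling, no tuning of the number of trees, no balance-based stopping rule), and the ages and treatment indicators are hypothetical. Because every split depends only on the ordering of x, fitting on age, log(age) or age² gives identical PS estimates, illustrating the invariance bullet above for increasing transformations.

```python
import math


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def deviance(t, p):
    """Bernoulli deviance: -2 * log-likelihood."""
    return -2 * sum(ti * math.log(pi) + (1 - ti) * math.log(1 - pi) for ti, pi in zip(t, p))


def boost_ps(x, t, n_trees=100, learning_rate=0.1):
    """Gradient boosting of the log-odds g(x) with depth-1 regression trees (stumps)."""
    n = len(x)
    p_bar = sum(t) / n
    g = [math.log(p_bar / (1 - p_bar))] * n         # start from the overall log-odds
    split_points = sorted(set(x))[:-1]             # x <= s goes to the left leaf
    for _ in range(n_trees):
        p = [sigmoid(gi) for gi in g]
        r = [ti - pi for ti, pi in zip(t, p)]      # negative gradient of the deviance
        best = None
        for s in split_points:
            left = [i for i in range(n) if x[i] <= s]
            right = [i for i in range(n) if x[i] > s]
            # least-squares fit of a two-leaf tree to r; larger is better
            gain = (sum(r[i] for i in left) ** 2 / len(left)
                    + sum(r[i] for i in right) ** 2 / len(right))
            if best is None or gain > best[0]:
                best = (gain, left, right)
        for leaf in best[1:]:
            # one Newton step per leaf: sum(r) / sum(p * (1 - p))
            step = sum(r[i] for i in leaf) / sum(p[i] * (1 - p[i]) for i in leaf)
            for i in leaf:
                g[i] += learning_rate * step
    return [sigmoid(gi) for gi in g]


# Hypothetical ages and treatment indicators with a non-monotone pattern.
age = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
t = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]

ps_age = boost_ps(age, t)
ps_log_age = boost_ps([math.log(a) for a in age], t)
ps_age_sq = boost_ps([a * a for a in age], t)
null_dev = deviance(t, [sum(t) / len(t)] * len(t))
print(f"deviance: null {null_dev:.2f}, boosted {deviance(t, ps_age):.2f}")
print("same PS for age, log(age), age^2:", ps_age == ps_log_age == ps_age_sq)
```

In practice the number of trees is the key tuning choice; too many trees overfit the PS, which is why twang stops based on covariate balance rather than training deviance.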

Results – cardcost, nonparametric approach

Method                       μ1      μ0      Δ      SE
DR: parametric models        15393   15292   -101   226
DR: GBM + parametric model   15303   15213   -90    210
DR: GBM + tree               15233   15356   123    172

Future research

• People try quintiles or deciles for propensity score stratification; a data-driven approach (based on the bias-variance tradeoff) is needed to choose the number of strata

• Model selection for the PS model and the outcome model: nonparametric estimation of these models may be intuitive, but the properties of the resulting causal estimates are not clear

• Nonparametric caveat: we still need to define a set of “confounders” based on knowledge of the causal relationships among treatment, outcome and covariates, rather than conditioning indiscriminately on all covariates that are associated with treatment and outcome
