Selection Bias with Linear Probability Models (LPM)

Suneel Chatla, Galit Shmueli

Institute of Service Science, National Tsing Hua University, Taiwan

Outline

• Introduction to self-selection

• Popular methods for selection bias correction
  ◦ Two-step methods (2SLS)
  ◦ Matching methods (PSM)

• Incorporating LPM into 2SLS and PSM

• Simulation study

• Conclusions

Quasi-experiments

Resemble randomized experiments in testing causal hypotheses, but lack random assignment (i.e., units self-select into treatment)

Pros

• Usable when random assignment is impractical or unethical

• Easier to set up; greater external validity

• Minimize threats to ecological validity

Cons

• Estimates can be biased by confounding variables

• No full control over extraneous variables

Why do we need quasi-experiments?

Two Methods for Addressing Selection Bias


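Two-step methods: Heckman vs. Olsen

Heckman (1977), probit selection model:
Stage 1, selection model (T): $E[T \mid X] = \Phi(X\gamma)$
Adjustment (inverse Mills ratio): $\mathrm{IMR} = \phi(X\gamma) / \Phi(X\gamma)$
Stage 2, outcome model (Y): $Y = X\boldsymbol{\beta} + \delta\,\mathrm{IMR} + \varepsilon$

Olsen (1980), LPM selection model:
Stage 1, selection model (T): $E[T \mid X] = X\gamma$
Adjustment: $\lambda = X\gamma - 1$
Stage 2, outcome model (Y): $Y = X\boldsymbol{\beta} + \delta\lambda + \varepsilon$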
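One way to see where the two adjustment terms come from: if the selection error is standard normal (probit), its mean among units with $T = 1$ is the inverse Mills ratio; if it is uniform on (0, 1) (the usual motivation for Olsen's LPM correction), its centered mean among units with $T = 1$ is $(X\gamma - 1)/2$, and the factor 1/2 is absorbed into $\delta$. The Python sketch below checks both identities by Monte Carlo. The function names `heckman_imr` and `olsen_lambda`, the index value 0.3, and the sample size are illustrative choices, not part of the original slides.

```python
import math
import numpy as np

def normal_pdf(z):
    """Standard normal density phi(z)."""
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF Phi(z), computed from the error function."""
    z = np.asarray(z, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def heckman_imr(xg):
    """Heckman adjustment (hypothetical helper): inverse Mills ratio phi(Xg) / Phi(Xg)."""
    return normal_pdf(xg) / normal_cdf(xg)

def olsen_lambda(xg):
    """Olsen adjustment (hypothetical helper): lambda = Xg - 1, Xg = LPM fitted probability."""
    return np.asarray(xg, dtype=float) - 1.0

rng = np.random.default_rng(0)
n = 1_000_000
index = 0.3  # illustrative value of X*gamma

# Probit view: T = 1 iff X*gamma + w > 0, w ~ N(0, 1); E[w | T = 1] should equal the IMR.
w = rng.standard_normal(n)
mc_normal = w[w > -index].mean()

# LPM view: T = 1 iff u < X*gamma, u ~ Uniform(0, 1); E[u - 1/2 | T = 1] should equal lambda / 2.
u = rng.uniform(size=n)
mc_uniform = (u[u < index] - 0.5).mean()

print("normal :", mc_normal, float(heckman_imr(index)))
print("uniform:", mc_uniform, float(olsen_lambda(index)) / 2)
```

Because each adjustment term is (proportional to) the conditional mean of the selection error among selected units, the stage-2 coefficient $\delta$ picks up the covariance between the selection and outcome errors in either case.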

Heckman's correction

• Assumes bivariate normality of the selection and outcome errors

• Inconsistent second-stage standard errors

• Identification issues

• Expensive computation

• Convergence issues

Olsen's correction

• Assumes a linear conditional expectation

• Inconsistent second-stage standard errors

• Identification issues

• Cheaper computation

• No convergence issues

In short: for a continuous outcome

Open Research Questions

1. Selection model with unequal treatment/control sample sizes, continuous outcome

2. Binary outcome model: are the coefficients consistent?

3. Selection model with unequal treatment/control sample sizes, combined with a binary outcome model with unequal sample sizes

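Simulation Design

Selection model: $S^* = -0.5 + 0.5x_1 - 0.5x_2 + 1.5x_3 - x_4 + \omega$, with $T = \begin{cases} 1 & \text{if } S^* > 0 \\ 0 & \text{otherwise} \end{cases}$

Continuous outcome model: $Y = 0.5 - 1.5x_1 + 0.5x_2 + x_3 + \varepsilon$

Binary outcome: $\mathbb{1}(Y > 0)$, equal to 1 if $Y > 0$ and 0 otherwise

Errors: $(\omega, \varepsilon)^\top \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0.5 & -0.4 \\ -0.4 & 0.5 \end{pmatrix}\right)$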
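A minimal Python sketch of this design with both two-step corrections for the continuous outcome. It assumes, as in Heckman's original sample-selection setup, that $Y$ is used only for units with $T = 1$, and it assumes the covariates are independent $N(0, 1)$, which the slide does not state. The probit is fitted by Fisher scoring (iterative), the LPM by one least-squares solve, and the covariate that appears only in the selection model serves as the exclusion restriction. Sample size, seed, and helper names are hypothetical.

```python
import math
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def ols(X, y):
    """Least-squares coefficients of y on the columns of X (closed form, no iterations)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def probit(X, t, max_iter=100, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring (an iterative fit, unlike the LPM)."""
    g = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ g
        p = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
        d = norm_pdf(xb)
        w = d / (p * (1 - p))
        step = np.linalg.solve(X.T @ (X * (d * w)[:, None]), X.T @ ((t - p) * w))
        g += step
        if np.max(np.abs(step)) < tol:
            break
    return g

def simulate(n, rng):
    """Draw data from the slide's design; iid N(0, 1) covariates are an assumption."""
    x = rng.standard_normal((n, 4))
    cov = np.array([[0.5, -0.4], [-0.4, 0.5]])
    omega, eps = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    s_star = -0.5 + 0.5 * x[:, 0] - 0.5 * x[:, 1] + 1.5 * x[:, 2] - x[:, 3] + omega
    t = (s_star > 0).astype(float)
    y = 0.5 - 1.5 * x[:, 0] + 0.5 * x[:, 1] + x[:, 2] + eps
    return x, t, y

def two_step(x, t, y, method):
    """Return (beta_hat, delta_hat) from Heckman's or Olsen's two-step correction."""
    n = len(t)
    Z = np.column_stack([np.ones(n), x])          # selection covariates, incl. the 4th
    X = np.column_stack([np.ones(n), x[:, :3]])   # outcome covariates, 4th excluded
    sel = t == 1                                   # outcome used only where T = 1
    if method == "heckman":
        zg = Z[sel] @ probit(Z, t)
        adj = norm_pdf(zg) / norm_cdf(zg)          # inverse Mills ratio
    else:
        adj = Z[sel] @ ols(Z, t) - 1.0             # Olsen: lambda = X*gamma - 1
    coef = ols(np.column_stack([X[sel], adj]), y[sel])
    return coef[:-1], coef[-1]

rng = np.random.default_rng(1)
x, t, y = simulate(20_000, rng)
beta_true = np.array([0.5, -1.5, 0.5, 1.0])
for method in ("heckman", "olsen"):
    beta_hat, delta_hat = two_step(x, t, y, method)
    print(method, np.round(beta_hat, 3), round(float(delta_hat), 3))
```

Shifting the intercept of $S^*$ would change the treated share, and hence the treatment/control ratio studied in Q1; that outer loop is omitted here.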

Q1 (continuous outcome): the treatment/control sample-size ratio has no influence

Q2 (binary outcome): the coefficients are inconsistent

What about the marginal effects?

Q3 (binary outcome): the Heckman and Olsen marginal effects diverge as the imbalance ratio grows

[Figure panels: columns by outcome cut-off (50%, 25%, 5%); rows by selection cut-off (50%, 25%, 5%)]
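Probit and LPM coefficients are on different scales, so comparisons across models are usually made on marginal effects. A Python sketch of how the two are computed for a binary outcome, leaving out the selection step: the probit average marginal effect of $x_j$ is the sample mean of $\phi(X\hat\beta)$ times $\hat\beta_j$, while the LPM marginal effect is simply the slope. The data reuse the continuous outcome model's coefficients with a hypothetical $N(0, 1)$ error and independent standard normal covariates.

```python
import math
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def probit(X, y, max_iter=100, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring."""
    b = np.zeros(X.shape[1])
    for _ in range(max_iter):
        xb = X @ b
        p = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
        d = norm_pdf(xb)
        w = d / (p * (1 - p))
        step = np.linalg.solve(X.T @ (X * (d * w)[:, None]), X.T @ ((y - p) * w))
        b += step
        if np.max(np.abs(step)) < tol:
            break
    return b

rng = np.random.default_rng(2)
n = 50_000
x = rng.standard_normal((n, 3))                  # hypothetical N(0, 1) covariates
X = np.column_stack([np.ones(n), x])
beta = np.array([0.5, -1.5, 0.5, 1.0])           # continuous outcome model coefficients
y = (X @ beta + rng.standard_normal(n) > 0).astype(float)  # binary outcome 1(Y > 0)

b_probit = probit(X, y)
ame_probit = norm_pdf(X @ b_probit).mean() * b_probit[1:]  # probit average marginal effects
me_lpm = np.linalg.lstsq(X, y, rcond=None)[0][1:]          # LPM slopes = marginal effects

print("probit AME:", np.round(ame_probit, 3))
print("LPM slopes:", np.round(me_lpm, 3))
```

With normally distributed covariates both estimates target the same population quantity (by Stein's lemma, the OLS slopes equal the average derivative of $P(Y = 1 \mid X)$), so here they agree up to sampling noise; the slides report divergence once the selection correction and treatment/control imbalance enter.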

Summary: Heckman vs. Olsen

• Continuous outcome: Heckman and Olsen corrections give similar results, even when the treatment and control groups are unbalanced

• Binary outcome: the marginal effects from the Heckman and Olsen corrections diverge as the imbalance grows

• Using an LPM in both stages (estimated by OLS) provides consistent estimates

• But what about probit?


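Matching Methods

Rosenbaum and Rubin (1985), logit selection model:
Stage 1, selection model (T): $\mathrm{logit}(E[T \mid X]) = X\gamma$
Covariate balance: $|p_{T=1} - p_{T=0}| < \epsilon$
Stage 2, outcome model (Y): $Y = X\boldsymbol{\beta} + \varepsilon$

LPM selection model:
Stage 1, selection model (T): $E[T \mid X] = X\gamma$
Covariate balance: $|p_{T=1} - p_{T=0}| < \epsilon$
Stage 2, outcome model (Y): $Y = X\boldsymbol{\beta} + \varepsilon$

Here $p$ is the estimated propensity score of a treated or control unit and $\epsilon$ is the matching tolerance.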
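The two stage-1 variants differ only in how the propensity score is estimated. A Python sketch, assuming two hypothetical standard normal covariates and a hypothetical logistic selection rule: `logit_scores` fits a logistic regression by Newton-Raphson, `lpm_scores` takes OLS fitted values of $T$ on $X$, and `caliper_match` applies the balance tolerance as a caliper on 1:1 nearest-neighbour matching with replacement. All three names and the caliper value 0.05 are illustrative, not from the slides.

```python
import numpy as np

def logit_scores(X, t, max_iter=100, tol=1e-10):
    """Propensity scores from a logistic regression fitted by Newton-Raphson."""
    g = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ g)))
        step = np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (t - p))
        g += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-(X @ g)))

def lpm_scores(X, t):
    """Propensity scores from a linear probability model: OLS fitted values of T on X."""
    return X @ np.linalg.lstsq(X, t, rcond=None)[0]

def caliper_match(score, t, caliper):
    """Match each treated unit to its nearest control on the score (with replacement);
    keep only pairs whose score gap is below the caliper."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    gaps = np.abs(score[treated][:, None] - score[control][None, :])
    nearest = gaps.argmin(axis=1)
    keep = gaps[np.arange(len(treated)), nearest] < caliper
    return treated[keep], control[nearest[keep]]

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal((n, 2))                                    # hypothetical covariates
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))    # hypothetical selection
t = (rng.uniform(size=n) < p_true).astype(float)

scores = {"logit": logit_scores(X, t), "LPM": lpm_scores(X, t)}
for name, score in scores.items():
    tr, co = caliper_match(score, t, caliper=0.05)                 # illustrative tolerance
    print(f"{name}: {len(tr)} of {int(t.sum())} treated matched, "
          f"mean |gap| = {np.abs(score[tr] - score[co]).mean():.4f}")
```

LPM scores are not constrained to (0, 1), so some may fall outside that interval; since matching only uses distances between scores, this does not by itself prevent matching.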

Propensity Score Matching (PSM)

• Accounts only for observed covariates

• Requires large samples and substantial overlap between the treatment and control groups

• What happens to the ATE if we use an LPM instead of a logit to estimate the propensity scores for matching?

Simulation Design

Selection model: $T \sim \mathrm{Bernoulli}(p)$, where $p$ is a logistic function of $X$

Outcome model: $Y = T + X\beta + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$ and $\sigma \in \{0.1, 1, 5\}$

$X \sim N(0, 1)$ and $\beta = 1$

• Sample size: 1000

• Error standard deviation: 0.1, 1, 5

• Bootstrap replications: 50

• ATE estimate: $\mathrm{mean}(Y_{T=1}) - \mathrm{mean}(Y_{T=0})$
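A Python sketch of this simulation under stated assumptions: the selection probability is taken as the hypothetical $1/(1 + e^{-X})$, "Bootstrap 50" is read as 50 bootstrap resamples of one simulated data set, and each treated unit is matched to its nearest control on the estimated score (with replacement, no caliper). Strictly, matching treated units to controls estimates the effect on the treated; it coincides with the ATE here because the simulated treatment effect is constant and equal to 1. Helper names and the seed are illustrative.

```python
import numpy as np

def logit_scores(X, t, max_iter=100, tol=1e-10):
    """Propensity scores from a logistic regression fitted by Newton-Raphson."""
    g = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ g)))
        step = np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (t - p))
        g += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-(X @ g)))

def lpm_scores(X, t):
    """Propensity scores from a linear probability model (OLS fitted values)."""
    return X @ np.linalg.lstsq(X, t, rcond=None)[0]

def matched_effect(score, t, y):
    """Nearest-neighbour matching on the score (with replacement), then
    mean(Y | treated) - mean(Y | matched controls)."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    gaps = np.abs(score[treated][:, None] - score[control][None, :])
    return y[treated].mean() - y[control[gaps.argmin(axis=1)]].mean()

def simulate(n, sigma, rng):
    x = rng.standard_normal(n)                        # X ~ N(0, 1)
    p = 1.0 / (1.0 + np.exp(-x))                      # hypothetical logistic selection
    t = (rng.uniform(size=n) < p).astype(float)       # T ~ Bernoulli(p)
    y = t + 1.0 * x + rng.normal(0.0, sigma, size=n)  # Y = T + X*beta + eps, beta = 1
    return x, t, y

rng = np.random.default_rng(4)
results = {}
for sigma in (0.1, 1.0, 5.0):
    x, t, y = simulate(1000, sigma, rng)
    X = np.column_stack([np.ones_like(x), x])
    est = {"logit": [], "LPM": []}
    for _ in range(50):                               # 50 bootstrap resamples
        idx = rng.integers(0, len(t), size=len(t))
        Xb, tb, yb = X[idx], t[idx], y[idx]
        est["logit"].append(matched_effect(logit_scores(Xb, tb), tb, yb))
        est["LPM"].append(matched_effect(lpm_scores(Xb, tb), tb, yb))
    results[sigma] = {k: float(np.mean(v)) for k, v in est.items()}
    print(f"sigma={sigma}: logit {results[sigma]['logit']:.3f}, LPM {results[sigma]['LPM']:.3f}")
```

With a single covariate, both estimated scores are increasing functions of $X$ (given a positive fitted slope), so the two methods tend to pick the same or very similar matches.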

Identical ATE from Logit and LPM matching

Summary & Future Research

• LPM performs similarly to logit in terms of the estimated average treatment effect

• Ongoing work: what about binary outcome models?

• Logit faces problems if there is insufficient overlap between the treatment and control groups

• Ongoing work: does LPM have overlap issues?

Thank you!

Data & Analytics
