Propensity Score Matching: A Primer in R

David Zepeda
Assistant Professor, Supply Chain & Information Management
d.zepeda@neu.edu

Center for Health Policy and Healthcare Research, Brown Bag Series
April 1, 2015



Outline

1. Problem description
2. Theory
3. Two-Step Approach
4. Implementation in R
5. Example 1 – Hospitals
6. Example 2 – Primary Care Clinics
7. Example 3 – Farm Land
8. References


Problem


Consider a study with n observational units. There are two treatment states, 0 and 1.

The response of the ith unit is denoted $Y_i(1)$ if the ith unit received treatment 1 and $Y_i(0)$ if the ith unit received treatment 0.

Mean causal effect = $E[Y_i(1)] - E[Y_i(0)]$

The goal is to estimate the average effect over observations.

Let $X_i$ be a vector of the characteristics of the ith unit (i.e., covariates) that are not consequences of the treatment.

The mean causal effect averaged over all units is the average treatment effect (ATE).

ATE = $\frac{1}{n} \sum_{i=1}^{n} E[Y_i(1) - Y_i(0)]$


An observational unit is generally assigned only one of the two treatments.

The treatment is not randomly assigned.

This results in a number of potential problems regarding bias and model dependence.
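The bias problem can be made concrete with a small simulation. The sketch below is written in plain Python (standard library only) rather than R, purely so it is self-contained. Everything in it is an assumption made for illustration: the data-generating process, the true effect of 1.0, and the assignment probabilities of 0.8 and 0.2 are invented and do not come from any study in this talk.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate non-random treatment assignment and compare two quantities:
    the true average effect and the naive treated-minus-control difference."""
    rng = random.Random(seed)
    true_effects, treated_y, control_y = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)             # covariate driving both treatment and response
        y0 = 2.0 * x + rng.gauss(0.0, 1.0)  # response if the unit is not treated
        y1 = y0 + 1.0                       # response if treated (assumed true effect = 1)
        p_treat = 0.8 if x > 0 else 0.2     # assignment depends on x, so it is not random
        t = 1 if rng.random() < p_treat else 0
        true_effects.append(y1 - y0)
        # Each unit reveals only one of its two potential responses.
        if t == 1:
            treated_y.append(y1)
        else:
            control_y.append(y0)
    ate = sum(true_effects) / n
    naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
    return ate, naive

ate, naive = simulate()
print(f"true ATE = {ate:.2f}, naive difference in means = {naive:.2f}")
```

Because units with large x are both more likely to be treated and have larger responses, the naive difference overstates the true effect in this setup.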


Source: Ho, D. E., Imai, K., King, G. & Stuart, E.A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.


Theory

Let T be an indicator of treatment ($T = 1$ if treated, $T = 0$ if control).

Now compare a randomly selected treated unit to a randomly selected control unit. With $Y(1)$ and $Y(0)$ denoting a unit's responses under treatment 1 and treatment 0, the expected difference is

$$E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0]$$

Because this expression conditions on T, the difference between two randomly selected observational units is not in general the mean causal effect $E[Y(1)] - E[Y(0)]$!

But if treatment assignment is random, so that $(Y(1), Y(0)) \perp T$, then we have

$$E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0] = E[Y(1)] - E[Y(0)]$$

A tool used to estimate treatment effects is a “balancing score”. Suppose $b(X)$ is a function of the covariates X such that

$$X \perp T \mid b(X)$$

and such that

$$(Y(1), Y(0)) \perp T \mid b(X)$$

Sampling a treated and a control unit with the same value of $b(X)$, we have

$$E[Y(1) \mid b(X), T = 1] - E[Y(0) \mid b(X), T = 0] = E[Y(1) - Y(0) \mid b(X)]$$


The propensity score is $p(X) = \Pr(T = 1 \mid X)$, the probability that a unit receives treatment given its covariates.

The true propensity score is a “balancing score”: if the treatment and control groups have identical propensity score distributions, then all covariates are balanced between the two groups.

Therefore, we can use logistic regression or another approach (e.g., probit) to estimate p and obtain matched samples!

We can now use the same parametric analysis on the matched dataset as would have been used on the original raw dataset before propensity score matching.
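As a rough illustration of the estimation step, the sketch below fits a logistic regression for Pr(T = 1 | X) by plain gradient ascent and returns fitted propensity scores. It uses only the Python standard library instead of R so that it runs on its own. The function names, the learning rate, the iteration count, and the simulated data (true intercept -0.5 and slope 1.0) are all assumptions for illustration; in practice one would use an established routine such as R's `glm` with a binomial family.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, xi):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], xi))

def fit_logistic(X, t, lr=1.0, n_iter=500):
    """Fit Pr(T = 1 | X) by batch gradient ascent on the mean log-likelihood.
    X is a list of covariate lists, t a list of 0/1 treatment indicators.
    Returns [intercept, slope_1, ..., slope_k]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            err = ti - sigmoid(linear_predictor(beta, xi))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated probability of treatment for each unit."""
    return [sigmoid(linear_predictor(beta, xi)) for xi in X]

# Simulated data with one covariate (values invented for illustration).
rng = random.Random(1)
X, t = [], []
for _ in range(1000):
    x = rng.gauss(0.0, 1.0)
    X.append([x])
    t.append(1 if rng.random() < sigmoid(-0.5 + 1.0 * x) else 0)

beta = fit_logistic(X, t)
scores = propensity_scores(X, beta)
```

The fitted `scores` are what a matching routine would then use as its distance measure.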


Two-Step Approach

1. Preprocessing via matching
- Distributions of the covariates X are matched as closely as possible (i.e., balance)
- Relationship between T and X is eliminated
- Do not introduce bias
- Do not increase inefficiency too much
- Create a dataset closer to an experiment

2. Parametric analysis after matching
- Holding covariates constant
- Increase robustness
- Less sensitive to modeling assumptions
- A variety of treatment effects can be analyzed (e.g., ATE)
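Balance, the target of step 1, is often summarized for each covariate by a standardized mean difference between the two groups. The sketch below (plain Python, standard library only) computes it before and after a hypothetical matching step. Scaling by the treated-group standard deviation is one common convention and is an assumption here, and the covariate values are invented for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def standardized_mean_difference(treated, control):
    """Difference in covariate means, scaled by the treated-group standard deviation."""
    return (mean(treated) - mean(control)) / sample_sd(treated)

# Hypothetical values of one covariate (invented, not taken from any example in this talk).
treated = [120.0, 150.0, 180.0]
all_controls = [60.0, 90.0, 110.0, 140.0, 175.0]
matched_controls = [110.0, 140.0, 175.0]  # controls kept after a hypothetical matching step

smd_before = standardized_mean_difference(treated, all_controls)
smd_after = standardized_mean_difference(treated, matched_controls)
print(f"before matching: {smd_before:.2f}, after matching: {smd_after:.2f}")
```

A value closer to zero after matching indicates that the matched control group resembles the treated group more closely on that covariate.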


Implementation in R

What is R?
- A language and environment for statistical computing and graphics
- Provides a wide variety of statistical and graphical techniques
- Is highly extensible
- Provides an Open Source route to participation
- Great care has been taken over the defaults for the minor design choices in graphics
- User retains full control
- Available as Free Software!
- Allows users to add additional functionality
- Can be extended (easily) via packages

The R Project for Statistical Computing
http://www.r-project.org/


MATCHIT Package
- Dichotomous treatment variable
- Experimental and observational data
- Improving parametric statistical models
- Reduces model dependence
- Semi-parametric and non-parametric preprocessing
- Assess covariate distributions in the two groups (i.e., balance)

Large range of matching methods
- Exact
- Subclassification
- Nearest neighbor
- Optimal
- Genetic


Exact matching
- Simplest version of matching
- Match each treated unit to all possible control units
- Exactly the same values on all the covariates
- Sufficient matches often cannot be found
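A minimal sketch of the exact-matching idea, in plain Python rather than the MATCHIT implementation: units are grouped by their full covariate pattern, and only patterns containing both treated and control units are kept. The record layout and the field names (`id`, `treated`, `region`, `teaching`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def exact_match(units, covariates, treat_key="treated"):
    """Group units by identical covariate values and keep only the groups
    that contain at least one treated and one control unit."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for u in units:
        key = tuple(u[c] for c in covariates)
        arm = "treated" if u[treat_key] == 1 else "control"
        strata[key][arm].append(u["id"])
    return {k: v for k, v in strata.items() if v["treated"] and v["control"]}

# Hypothetical records, invented for illustration.
units = [
    {"id": 1, "treated": 1, "region": "urban", "teaching": True},
    {"id": 2, "treated": 0, "region": "urban", "teaching": True},
    {"id": 3, "treated": 0, "region": "urban", "teaching": True},
    {"id": 4, "treated": 1, "region": "rural", "teaching": False},
    {"id": 5, "treated": 0, "region": "rural", "teaching": True},
]
matched = exact_match(units, ["region", "teaching"])
```

Here treated unit 4 has no control with the same covariate values and is dropped, which illustrates why sufficient exact matches often cannot be found once there are many covariates.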

Subclassification
- Forms subclasses with “close” distributions of covariates
- Various subclassification schemes
- Can be used in conjunction with other matching methods
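One possible subclassification scheme, sketched in plain Python, bins units by quantiles of an estimated propensity score. Taking the cut points from the scores of all units, and the simple rank-based quantile rule, are assumptions made for this sketch; other schemes (for example, cut points based on treated units only) are equally plausible.

```python
def subclassify(scores, n_subclasses=5):
    """Assign each unit a subclass label 0..n_subclasses-1 using
    rank-based quantile cut points of the propensity scores."""
    ranked = sorted(scores)
    n = len(ranked)
    # Cut points at (approximately) the 1/k, 2/k, ... quantiles.
    cuts = [ranked[min(n - 1, (i * n) // n_subclasses)] for i in range(1, n_subclasses)]
    # A unit's label is the number of cut points at or below its score.
    return [sum(s >= c for c in cuts) for s in scores]

# Hypothetical propensity scores, invented for illustration.
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
labels = subclassify(scores, n_subclasses=5)
```

Treated and control units sharing a label can then be compared within each subclass. With tied scores the subclasses may end up unequal in size.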

Nearest neighbor matching
- Selects “best” control matches for each treated unit
- Chooses the control unit not yet matched closest to treated unit
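The greedy rule described above can be sketched as follows, in plain Python rather than the package's own code. This is 1:1 matching without replacement on a scalar distance such as the propensity score. Processing treated units from the highest score to the lowest is one common ordering and is an assumption here, as are the unit labels and scores.

```python
def nearest_neighbor_match(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbor matching without replacement.
    Both arguments map unit id -> propensity score.
    Returns a dict mapping treated id -> matched control id."""
    available = dict(control_scores)
    matches = {}
    # Assumed ordering: treated units with the largest scores are matched first.
    for t_id, t_score in sorted(treated_scores.items(), key=lambda kv: -kv[1]):
        if not available:
            break  # no controls left to match
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        matches[t_id] = c_id
        del available[c_id]  # without replacement: each control is used at most once
    return matches

# Hypothetical scores, invented for illustration.
treated_scores = {"t1": 0.80, "t2": 0.50}
control_scores = {"c1": 0.75, "c2": 0.45, "c3": 0.10}
pairs = nearest_neighbor_match(treated_scores, control_scores)
```

Control `c3` is left unmatched and would be discarded from the matched dataset.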


Optimal matching
- Finds matched samples with smallest average absolute distance
- Helpful when there are not many appropriate control matches
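To see how optimal matching can differ from a greedy approach, the sketch below compares the two on a tiny example in plain Python. The exhaustive search over all assignments is used only because the example is tiny; it is an assumption of this sketch and not how optimal matching is implemented in practice, where network-flow algorithms are typical. The scores are invented.

```python
from itertools import permutations

def total_distance(treated, controls, assignment):
    """Sum of absolute score distances for a 1:1 assignment,
    where assignment[i] is the control index matched to treated unit i."""
    return sum(abs(t - controls[j]) for t, j in zip(treated, assignment))

def optimal_match(treated, controls):
    """Exhaustively find the 1:1 assignment with the smallest total distance."""
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda a: total_distance(treated, controls, a),
    )
    return list(best), total_distance(treated, controls, best)

def greedy_match(treated, controls):
    """Match treated units in the given order to the closest unused control."""
    available = list(range(len(controls)))
    assignment = []
    for t in treated:
        j = min(available, key=lambda c: abs(t - controls[c]))
        assignment.append(j)
        available.remove(j)
    return assignment, total_distance(treated, controls, assignment)

# Hypothetical propensity scores with few suitable controls.
treated = [0.50, 0.60]
controls = [0.55, 0.10]

greedy_assignment, greedy_total = greedy_match(treated, controls)
optimal_assignment, optimal_total = optimal_match(treated, controls)
```

With a fixed number of pairs, minimizing the total distance is the same as minimizing the average distance. In this example the greedy rule gives the first treated unit the only good control, while the optimal assignment reaches a smaller total.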

Genetic matching
- Uses a genetic search algorithm
- Optimal balance achieved after matching
- Performs statistical tests for determining balance

Variety of options for matching methods
- Number of matched control units
- Matching with or without replacement
- Kernel matching
- Discard treated units, control units, or both
- Number of subclasses
- Distance measurement (e.g., logit)


Example 1

Association between hospital system affiliation and hospital inventory in California hospitals (Zepeda, Nyaga, & Young, WP 2015)
- California hospital data from 2007 – 2009
- 878 observations (126 affiliated with smaller hospital systems)
- Preprocessing of data on affiliation with smaller hospital systems
- Genetic matching method
- 2 control observations with replacement for every treated observation
- 126 observations in treatment group
- 156 observations in control group
- Propensity score balancing improved by 95%
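A balance-improvement percentage of this kind is commonly computed as the relative reduction in the absolute treated-minus-control difference in mean propensity score. The sketch below, in plain Python, assumes that definition; the slides do not state the formula, and the input values are invented and are not the study's numbers.

```python
def percent_balance_improvement(before, after):
    """Relative reduction, in percent, of the absolute imbalance:
    100 * (|before| - |after|) / |before|."""
    if before == 0:
        raise ValueError("imbalance before matching must be nonzero")
    return 100.0 * (abs(before) - abs(after)) / abs(before)

# Hypothetical mean differences in propensity score (invented for illustration).
improvement = percent_balance_improvement(before=0.30, after=0.06)
print(f"balance improved by {improvement:.0f}%")
```

A negative result would mean that matching made the imbalance worse on that measure.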


Example 2

Association between IT-leveraging capability and high quality diabetes care in Minnesota primary care clinics (Zepeda & Sinha, WP 2015)
- Minnesota primary care clinics in 2010
- 450 observations (135 with high IT-leveraging capability)
- Preprocessing of data on high IT-leveraging capability
- Optimal matching method
- 1 control observation without replacement for every treated observation
- 135 observations in treatment group
- 135 observations in control group
- Propensity score balancing improved by 76%


Example 3

Effect of easements on the selling price of farms in Minnesota (Taff & Weisberg, 2007)
- Federal Conservation Reserve Program (CRP)
- Temporary conservation easement by USDA (10-15 years)
- Annual payment by USDA for enrolled land
- Land valuation theory predicts that temporary easements should have no effect on value of properties

Data
- Oct 1, 2002 – Sep 30, 2004
- Farm properties with short-term conservation easements
- Farm properties with no conservation easements
- Covariates
- 2,937 property sales (271 were restricted by CRP contracts)


The primary objective
- Compare 271 sales with CRP restrictions to sales without

Standard observational study approach
- Use all sales with no CRP as a comparison group

Potential problem
- Properties sold without a random assignment
- Differences between observable sample and target population may be a cause for bias

Using propensity score matching
- Mimic a randomized experiment
- Sample of non-CRP and CRP sales
- Closely agree on salient property characteristics (i.e., balance)


Figure legend: medians; upper 75%; lower 25%; dotted lines = 95%


Six models developed and tested
- Models 1 – 3: use all data, CRP and portion of land RESTRICTED
- Model 4: restricts data to sales with PRODUCTIVITY measure
- Model 5: matched sample on CRP restriction
- Model 6: matched sample with PRODUCTIVITY measure

Consistency in results
- CRP contracts negatively associated with sales prices
- Most of CRP effect is captured by RESTRICTED amount
- Counter to land valuation theory


References

The R Project for Statistical Computing
http://www.r-project.org/

MATCHIT R Package
http://gking.harvard.edu/matchit
Ho, D. E., Imai, K., King, G. & Stuart, E. A. 2007. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15:199-236.

Examples
Zepeda, D., Nyaga, G., & Young, G. 2015. Supply Chain Risk Management and Hospital Inventory: Effects of System Affiliation. Working Paper.
Zepeda, D. & Sinha, K. IT-Leveraging Capability for Reducing Health Care Disparities: An Empirical Analysis of Primary Care Operations. Working Paper.
Taff, S. J. & Weisberg, S. 2007. Compensated short-term conservation restrictions may reduce sales prices. The Appraisal Journal, Winter.


Thank You!

David Zepeda
Assistant Professor, Supply Chain & Information Management
d.zepeda@neu.edu