Impact Evaluation
Measuring Impact
Sebastian Martinez, The World Bank
Impact Evaluation Methods for Policymakers
Note: slides by Sebastian Martinez. The content of this presentation reflects the views of the author, and not necessarily those of the World Bank. December 2007.
2
Measuring Impact
1) Causal Inference
  Counterfactuals
  Counterfeit Counterfactuals:
    Before and After (pre-post)
    Enrolled-not enrolled (apples and oranges)
2) IE Methods Toolbox:
  Randomized Controls
  Randomized Promotion (IV)
  Discontinuity Design (RDD)
  Difference in Difference (Diff-in-diff)
  Matching (P-score matching)
3
Impact Evaluation
Logical Framework
Theory
Measuring Impact
Identification Strategy
Data
Operational Plan
Resources
5
Our Objective:
Estimate the CAUSAL effect (impact) of
intervention P (program or treatment)
on outcome Y (indicator, measure of success)
Example: what is the effect of a cash transfer program (P) on household consumption (Y)?
6
Causal Inference
What is the effect of P on Y?
Answer:
α= (Y | P=1)-(Y | P=0)
Can we all go home?
7
Problem of MISSING DATA
For a program beneficiary:
we observe (Y | P=1): Consumption level (Y) with a cash transfer program (P)
but we do not observe (Y | P=0):Consumption level (Y) without a cash transfer program (P)
α= (Y | P=1)-(Y | P=0)
8
Solution
Estimate what would have happened to Y in the absence of P
We call this the COUNTERFACTUAL
Hint: The key to a good impact evaluation is a valid counterfactual!
9
Estimating Impact of P on Y
OBSERVE (Y | P=1)
  Intention to Treat (ITT): those offered treatment
  Treatment on the Treated (TOT): those receiving treatment
ESTIMATE counterfactual for (Y | P=0)
  Use comparison or control group
α= (Y | P=1)-(Y | P=0)
IMPACT = outcome with treatment - counterfactual
10
The perfect “Clone”
Beneficiary: 6 Candies
Control: 4 Candies
Impact = 6 - 4 = 2 Candies
11
In reality, use statistics
Beneficiary: Average Y = 6 Candies
Control: Average Y = 4 Candies
Impact = 6 - 4 = 2 Candies
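The comparison of averages can be sketched in a few lines of Python. The candy counts below are hypothetical, chosen only so that the group averages are 6 and 4 as on the slide; the function name is illustrative.

```python
from statistics import mean

def estimated_impact(treated_outcomes, control_outcomes):
    """Average outcome of beneficiaries minus average outcome of the comparison group."""
    return mean(treated_outcomes) - mean(control_outcomes)

# Hypothetical candy counts (assumption): averages are 6 and 4.
beneficiaries = [5, 6, 7, 6]
controls = [4, 3, 5, 4]

print(estimated_impact(beneficiaries, controls))
```

The estimate is only as good as the comparison group: the subtraction is valid as an impact only if the controls are a valid counterfactual.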
12
Getting Good Counterfactuals
Understand the DATA GENERATION process
  Behavioral process by which program participation (treatment) is determined
  How are benefits assigned?
  What are the eligibility rules?
The treated observation and the counterfactual have identical characteristics, except for benefiting from the intervention
Hint: With a good counterfactual, the only reason for different outcomes between treatments and controls is the intervention (P)
13
Case Study
What is the effect of a cash transfer program (P) on household consumption (Y)?
PROGRESA/OPORTUNIDADES Program
National anti-poverty program in Mexico
  Started 1997
  5 million beneficiaries by 2004
  Eligibility based on poverty index
Cash transfers
  Conditional on school and health care attendance
Rigorous impact evaluation with rich data
  506 communities, 24K households
  Baseline 1997, follow-up 2008
Many outcomes of interest. Here we consider:
  Standard of living: consumption per capita
14
Case Study: Eligibility and Enrollment
[Figure: households classified as Eligibles (Poor) or Ineligibles (Non-Poor), and as Enrolled or Not Enrolled]
16
Counterfeit Counterfactuals
Two common counterfactuals to be avoided!
Before and After (pre-post)
  Data on the same individuals before and after an intervention
Enrolled-not enrolled (apples and oranges)
  Data on a group of individuals that enrolled in a program, and another group that did not
  We don’t know why some enrolled and others did not
Both counterfactuals may lead to biased results
17
Counterfeit Counterfactual #1
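Before and After
[Figure: outcome Y against time, from Baseline (T=0) to Endline (T=1). B is the outcome at baseline, A the outcome at endline, and C the counterfactual at endline.]
A-B = 4
A-C = 2
IMPACT?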
18
Case 1: Before and After
What is the effect of a cash transfer program (P) on household consumption (Y)?
2 Points in Time
Measure beneficiaries’:
  Consumption at T=0
  Consumption at T=1
Estimate of counterfactual:
  (Yi,t | P=0) = (Yi,t-1 | P=0)
“Impact” = A-B = 35
[Figure: Y against time. B = 233 at T=0, A = 268 at T=1, α = 35.]
19
Case 1: Before and After
Case 1 - Before and After
          Control - Before   Treatment - After   t-stat
Mean      233.48             268.75              16.3

Case 1 - Before and After
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   35.27**             34.28**
                          (2.16)              (2.11)
** Significant at 1% level
20
Case 1: Before and After
What’s the Problem?
2 Points in Time
Only measure beneficiaries:
  Consumption at T=0
  Consumption at T=1
Estimate of counterfactual:
  (Yi,t | P=0) = (Yi,t-1 | P=0)
“Impact” = A-B = 35
Does not control for time-varying factors
  Boom: Impact = A-C, so A-B is an overestimate
  Recession: Impact = A-D, so A-B is an underestimate
[Figure: Y against time. B = 233 at T=0 (1997), A = 268 at T=1 (1998), α = 35. C? and D? mark possible counterfactuals at T=1, each with its own Impact.]
22
Counterfeit Counterfactual #2
Enrolled-not enrolled
Post-treatment data on 2 groups
  Enrolled: treatment group
  Not-enrolled: “control” group (counterfactual)
    Those ineligible to participate
    Those that choose NOT to participate
Selection Bias
  Reason for not enrolling may be correlated with outcome (Y)
    Control for observables
    But not unobservables!!
  Estimated impact is confounded with other things
23
Case 2: Enrolled-not enrolled
Measure outcomes in post-treatment (1998)
[Figure: Eligibles (Poor) and Ineligibles (Non-Poor). Not Enrolled: Y = 290. Enrolled: Y = 268.]
In what ways might enrolled/not enrolled be different, other than the program?
24
Case 2: Enrolled-not enrolled
Case 2 - Enrolled/Not Enrolled
           Not Enrolled   Enrolled   t-stat
Mean CPC   290.16         268.7541   5.6

Case 2 - Enrolled/Not Enrolled
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   -22.7**             -4.15
                          (3.78)              (4.05)
** Significant at 1% level
25
What is going on??
Case Study
                          Case 1 - Before and After          Case 2 - Enrolled/Not Enrolled
                          Linear       Multivariate Linear   Linear       Multivariate Linear
                          Regression   Regression            Regression   Regression
Estimated Impact on CPC   35.27**      34.28**               -22.7**      -4.15
                          (2.16)       (2.11)                (3.78)       (4.05)
** Significant at 1% level
Which of these do we believe?
Problem with Before-After:
  Cannot control for other time-varying factors
Problem with Enrolled-Not Enrolled:
  Do not know if other factors, beyond the intervention, are affecting the outcome
27
Choosing your methods…..
To identify an IE method for your program, consider:
  Prospective/retrospective
  Eligibility rules
  Roll-out plan (pipeline)
    Is universe of eligibles larger than available resources at a given point in time?
    Budget and capacity constraints?
    Excess demand for program?
    Eligibility criteria?
    Geographic targeting? Etc.
Hint: Choose the most robust strategy that fits the operational context
28
Choosing your methods
Identify the “best” possible design given the operational context
Best design = fewest risks for contamination
  Have we controlled for “everything”?
    Internal validity
  Is the result valid for “everyone”?
    External validity
    Local versus global treatment effect
30
Randomized Controls
When universe of eligibles > # benefits:
  Randomize!
  Lottery for who is offered benefits
  Fair, transparent and ethical way to assign benefits to equally deserving populations
Oversubscription:
  Give each eligible unit the same chance of receiving treatment
  Compare those offered treatment with those not offered treatment (controls)
Randomized phase in:
  Give each eligible unit the same chance of receiving treatment first, second, third…
  Compare those offered treatment first, with those offered treatment later (controls)
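A lottery and a randomized phase-in can both be sketched with a seeded shuffle. This is a minimal illustration, not the procedure used by any particular program; the function names and community identifiers are hypothetical, and the 506/320 split simply borrows the sample sizes from the case study.

```python
import random

def lottery(units, n_offered, seed=0):
    """Offer the program to n_offered units drawn at random; the rest are controls."""
    rng = random.Random(seed)   # fixed seed so the draw can be reproduced and audited
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:n_offered], shuffled[n_offered:]

def phase_in(units, n_phases, seed=0):
    """Randomly order units, then deal them into n_phases roll-out waves."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return [shuffled[i::n_phases] for i in range(n_phases)]

communities = [f"community_{i}" for i in range(506)]   # hypothetical identifiers
treatment, control = lottery(communities, n_offered=320)
print(len(treatment), len(control))
```

Every eligible unit has the same chance of landing in any group, which is what makes the later comparison of groups a fair one.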
31
Randomization
[Figure: 1. Universe (eligible and ineligible units). 2. Random sample of eligibles (external validity). 3. Randomize treatment into enrolled and not enrolled (internal validity).]
32
Unit of Randomization
Choose according to type of program:
  Individual/Household
  School/Health Clinic/catchment area
  Block/Village/Community
  Ward/District/Region
Keep in mind:
  Need “sufficiently large” number of units to detect minimum desired impact
  Spillovers/contamination
  Operational and survey costs
Hint: As a rule of thumb, choose to randomize at the minimum viable unit of implementation.
33
Case 3: Randomization
Oportunidades Evaluation Sample
Unit of randomization: community
Random phase in:
  320 treatment communities (14,446 households)
    First transfers distributed April 1998
  186 control communities (9,630 households)
    First transfers November 1999
34
Case 3: Baseline Balance
RANDOMIZATION
Variables                       Treatment (4,670)   Control (2727)   t-stats
Consumption per capita          233.47              233.4            -0.04
                                1.02                1.3
Head's age                      41.94               42.35            1.2
                                0.2                 0.27
Head's education                2.95                2.81             -2.16
                                0.04                0.05
Spouse's age                    37.02               36.96            -0.38
                                0.7                 0.22
Spouse's education              2.76                2.76             -0.006
                                0.03                0.04
Speaks an indigenous language   41.69               41.95            0.21
                                0.007               0.009
Head is female                  0.073               0.078            0.66
                                0.003               0.005
Household at baseline           5.76                5.7              -1.21
                                0.02                0.038
Bathroom at baseline            0.57                0.56             -1.04
                                0.007               0.009
Total hectares of land          1.63                1.72             1.35
                                0.03                0.05
Min. Distance loc-urban         109.28              106.59           -1.02
                                0.6                 0.81
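A balance check of this kind can be sketched as a two-sample t-statistic. The sketch assumes that the figure printed beneath each mean is its standard error and that the difference is taken as control minus treatment; both are readings of the table, not something the slide states, and the function name is illustrative.

```python
from math import sqrt

def balance_t_stat(mean_t, se_t, mean_c, se_c):
    """t-statistic for the difference in means (control minus treatment),
    treating the two groups as independent samples."""
    return (mean_c - mean_t) / sqrt(se_t ** 2 + se_c ** 2)

# Values read from the balance table (assumed: mean, then standard error beneath it).
print(round(balance_t_stat(233.47, 1.02, 233.40, 1.30), 2))   # consumption per capita
print(round(balance_t_stat(41.94, 0.20, 42.35, 0.27), 2))     # head's age
```

Small t-statistics across baseline characteristics are what we expect when randomization has worked; a few larger values can still occur by chance.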
35
Case 3: Randomization
Case 3 - Randomization
                    Control   Treatment   t-stat
Mean CPC Baseline   233.40    233.47      0.04
Mean CPC Followup   239.5     268.75      9.6

Case 3 - Randomization
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   29.25**             29.79**
                          (3.03)              (3.00)
** Significant at 1% level
36
Case Study
Case                             Method                           Estimated Impact on CPC
Case 1 - Before and After        Multivariate Linear Regression   34.28** (2.11)
Case 2 - Enrolled/Not Enrolled   Multivariate Linear Regression   -4.15 (4.05)
Case 3 - Randomization           Multivariate Linear Regression   29.79** (3.00)
** Significant at 1% level
38
Randomized Promotion (IV)
Common scenarios:
  National Program with universal eligibility
  Voluntary enrollment in program
Can we compare enrolled to not enrolled?
  Selection Bias!
39
Randomized Promotion (IV)
Possible solution: provide additional promotion, encouragement or incentives to a random sub-sample:
  Information
  Encouragement (small gift or prize)
  Transport
  Other help/incentives
Necessary conditions:
  1. Promoted and non-promoted groups are comparable:
     Promotion not correlated with population characteristics
     Guaranteed by randomization
  2. Promoted group has higher enrollment in the program
  3. Promotion does not affect outcomes directly
40
Randomized Promotion
[Figure: Universal Eligibility. Promotion is randomized among eligibles, giving a Promotion group and a No Promotion group. Enrollment types shown: Never, Always.]
41
Randomized Promotion
           NOT Promoted   Promoted   Difference
Enrolled   30%            80%        ∆ Enrolled = 0.5
Y          80             100        ∆ Y = 20
Population types: Always Enroll, Enroll if Encouraged, Never Enroll
IMPACT = ∆ Y / ∆ Enrolled = 20 / 0.5 = 40
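The arithmetic on this slide is a ratio: the difference in average outcomes between promoted and non-promoted groups, divided by the difference in enrollment rates. A minimal sketch, with a hypothetical function name, using the slide's numbers:

```python
def promotion_impact(y_promoted, y_not_promoted, enrolled_promoted, enrolled_not_promoted):
    """Difference in mean outcomes scaled by the difference in enrollment rates."""
    delta_y = y_promoted - y_not_promoted
    delta_enrolled = enrolled_promoted - enrolled_not_promoted
    if delta_enrolled == 0:
        raise ValueError("promotion did not change enrollment; impact is not identified")
    return delta_y / delta_enrolled

# Slide values: Y = 100 vs 80, enrollment 80% vs 30%.
print(promotion_impact(100, 80, 0.80, 0.30))
```

The result applies to those who enroll only because they were promoted (the "Enroll if Encouraged" group), which is why the estimate is a local average treatment effect.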
42
Examples
Maternal Child Health Insurance in Argentina
  Intensive information campaigns
Employment Program in Argentina
  Transport voucher
Community Based School Management in Nepal
  Assistance from NGO
Health Risk Funds in India
  Assistance from Community Resource Teams
43
Randomized Promotion
Pilot test promotion strategy vigorously!
Produces additional information of interest:
  How to increase enrollment
Don’t have to “exclude” anyone, but…
  Strategy depends on success and validity of promotion
  Produces a local average treatment effect
Randomized Promotion is an Instrumental Variable (IV)
  A variable correlated with treatment but nothing else (i.e. random promotion)
  More details in the appendix
44
Case 4: IV
           Randomized Control   Randomized Treatment (Promoted)   Difference
Enrolled   0%                   92%                               ∆ Enrolled = 0.92
Y          239                  268                               ∆ Y = 29
Population types: Enroll if Encouraged, Never Enroll
TOT Impact = 31
45
Case 4: IV - TOT
Estimate TOT effect of Oportunidades on consumption
Run 2SLS regression

Case 4 - IV
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   29.88**             30.44**
                          (3.09)              (3.07)
** Significant at 1% level
47
Discontinuities in Eligibility
Social programs often target benefits according to an eligibility index:
  Anti-poverty programs: targeted to households below a given poverty index
  Pension programs: targeted to population above a certain age
  Scholarships: targeted to students with high scores on a standardized test
Hint: For a discontinuity design, you need:
  - Continuous eligibility index
  - Clearly defined eligibility cut-off
48
Example:
Eligibility index (score) from 1 to 100
  Based on pre-intervention characteristics
Score <=50 are eligible
Score >50 are not eligible
Offer treatment to eligibles
49
[Figure: Regression Discontinuity Design - Baseline. Outcome (60 to 80) plotted against Score (20 to 80).]
50
[Figure: Regression Discontinuity Design - Baseline. Outcome (60 to 80) plotted against Score (20 to 80), with the Eligible and Not Eligible groups marked.]
51
[Figure: Regression Discontinuity Design - Post Intervention. Outcome (65 to 80) plotted against Score (20 to 80).]
52
[Figure: Regression Discontinuity Design - Post Intervention. Outcome (65 to 80) plotted against Score (20 to 80), with the IMPACT marked.]
53
Case 5: Discontinuity Design
Oportunidades assigned benefits based on a poverty index, where:
  Treatment = 1 if score <=750
  Treatment = 0 if score >750
54
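Case 5: Discontinuity Design
Baseline (no treatment)
$y_i = \beta_0 + \beta_1 \mathrm{Treatment}_i + \delta(\mathrm{score}_i) + \varepsilon_i$
[Figure: fitted values (153.578 to 379.224) against the estimated targeting score, "puntaje estimado en focalizacion" (276 to 1294).]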
55
Case 5: Discontinuity Design
Treatment Period
[Figure: fitted values (183.647 to 399.51) against the estimated targeting score, "puntaje estimado en focalizacion" (276 to 1294).]

Case 5 - Regression Discontinuity
                          Multivariate Linear Regression
Estimated Impact on CPC   30.58**
                          (5.93)
** Significant at 1% level
56
Potential Disadvantages of RD
Local average treatment effects: not always generalizable
Power: effect is estimated at the discontinuity, so we generally have fewer observations than in a randomized experiment with the same sample size
Specification can be sensitive to functional form: make sure the relationship between the assignment variable and the outcome variable is correctly modeled, including:
  Nonlinear Relationships
  Interactions
57
Advantages of RD for Evaluation
RD yields an unbiased estimate of treatment effect at the discontinuity
Can often take advantage of a known rule for assigning the benefit; such rules are common in the design of social policy
No need to “exclude” a group of eligible households/individuals from treatment
59
Diff-in-Diff
Compare change in outcomes between treatment and non-treatment groups
Impact is the difference in the change in outcomes
Impact = (Yt1-Yt0) - (Yc1-Yc0)
  t = treatment group, c = comparison group; 0 = before, 1 = after
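The formula translates directly into code. The averages below are hypothetical numbers chosen for easy arithmetic; the function name is illustrative.

```python
def diff_in_diff(y_t0, y_t1, y_c0, y_c1):
    """Change in the treatment group minus change in the comparison group."""
    return (y_t1 - y_t0) - (y_c1 - y_c0)

# Hypothetical group averages (assumption): treatment goes 10 -> 18, comparison 9 -> 12.
print(diff_in_diff(y_t0=10, y_t1=18, y_c0=9, y_c1=12))
```

Here a plain before-and-after comparison would report 8; subtracting the comparison group's change of 3 removes the part of the change that would have happened anyway, provided both groups share the same trend.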
60
[Figure: Outcome against Time for the Treatment Group and the Control Group, with the start of Treatment and points A, B, C and D marked. The Average Treatment Effect is indicated.]
61
[Figure: Outcome against Time for the Treatment Group and the Control Group, with the start of Treatment marked. The Estimated Average Treatment Effect is shown alongside the Average Treatment Effect.]
62
Diff in diff
Fundamental assumption: trends (slopes) are the same in treatments and controls
Need a minimum of three points in time (two pre-intervention) to verify this and estimate the treatment effect
63
Case 6: Diff-in-Diff
Case 6 - Diff in Diff
            Not Enrolled   Enrolled   t-stat
Mean ∆CPC   8.26           35.92      10.31

Case 6 - Diff in Diff
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   27.66**             25.53**
                          (2.68)              (2.77)
** Significant at 1% level
64
Case Study
Case                               Method                           Estimated Impact on CPC
Case 1 - Before and After          Multivariate Linear Regression   34.28** (2.11)
Case 2 - Enrolled/Not Enrolled     Multivariate Linear Regression   -4.15 (4.05)
Case 3 - Randomization             Multivariate Linear Regression   29.79** (3.00)
Case 4 - IV (TOT)                  2SLS                             30.44** (3.07)
Case 5 - Regression Discontinuity  Multivariate Linear Regression   30.58** (5.93)
Case 6 - Diff in Diff              Multivariate Linear Regression   25.53** (2.77)
** Significant at 1% level
66
Matching
Pick the ideal comparison group that matches the treatment group from a larger survey.
The matches are selected on the basis of similarities in observed characteristics.
This assumes no selection bias based on unobservable characteristics.
Source: Martin Ravallion
67
Propensity-Score Matching (PSM)
Controls: non-participants with same characteristics as participants
  In practice, it is very hard. The entire vector of X observed characteristics could be huge.
Rosenbaum and Rubin: match on the basis of the propensity score
  P(Xi) = Pr(Di=1 | X)
  Instead of aiming to ensure that the matched control for each participant has exactly the same value of X, the same result can be achieved by matching on the probability of participation.
  This assumes that participation is independent of outcomes given X.
68
Steps in Score Matching
1. Representative and highly comparable surveys of non-participants and participants.
2. Pool the two samples and estimate a logit (or probit) model of program participation.
3. Restrict samples to assure common support (lack of common support is an important source of bias in observational studies).
4. For each participant, find a sample of non-participants that have similar propensity scores.
5. Compare the outcome indicators. The difference is the estimate of the gain due to the program for that observation.
6. Calculate the mean of these individual gains to obtain the average overall gain.
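Steps 3 to 6 can be sketched as follows, assuming the propensity scores from step 2 have already been estimated. As a simplification, each participant is matched to a single nearest non-participant (with replacement) rather than to a sample of them. The data and function names are hypothetical.

```python
def common_support(treated, controls):
    """Step 3: overlap of the two groups' propensity-score ranges."""
    lo = max(min(p for p, _ in treated), min(p for p, _ in controls))
    hi = min(max(p for p, _ in treated), max(p for p, _ in controls))
    return lo, hi

def matched_gain(treated, controls):
    """Steps 3-6 with one nearest neighbour per participant.
    Each unit is a (propensity_score, outcome) pair."""
    lo, hi = common_support(treated, controls)
    treated = [u for u in treated if lo <= u[0] <= hi]
    controls = [u for u in controls if lo <= u[0] <= hi]
    gains = []
    for p, y in treated:
        _, y_match = min(controls, key=lambda c: abs(c[0] - p))   # step 4: closest score
        gains.append(y - y_match)                                  # step 5: individual gain
    return sum(gains) / len(gains)                                 # step 6: average gain

# Hypothetical (propensity score, outcome) pairs.
participants = [(0.50, 10), (0.70, 12), (0.95, 20)]
non_participants = [(0.10, 3), (0.52, 7), (0.75, 9)]
print(matched_gain(participants, non_participants))
```

The participant with score 0.95 and the non-participant with score 0.10 fall outside the common support and are dropped, which is exactly the restriction step 3 describes.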
69
[Figure: density of propensity scores for participants, on a propensity-score axis from 0 to 1, with the region of common support marked.]
70
PSM vs an experiment
Pure experiment does not require the untestable assumption of independence conditional on observables
PSM requires large samples and good data
71
Lessons on Matching Methods
Typically used when randomization, RD and other quasi-experimental options are not possible (e.g. no baseline)
  Be cautious of ex-post matching
    Matching on endogenous variables
Matching helps control for OBSERVABLE heterogeneity
Matching at baseline can be very useful:
  Estimation:
    Combine with other techniques (e.g. diff in diff)
    Know the assignment rule (match on this rule)
  Sampling:
    Selecting non-randomized evaluation samples
Need good quality data
  Common support can be a problem
72
Case 7: P-Score Matching
Case 7 - PROPENSITY SCORE: Pr(treatment=1)
Variable      Coef.        Std. Err.
Age Head      -0.0282433   0.0024553
Educ Head     -0.054722    0.0086369
Age Spouse    -0.0171695   0.0028683
Educ Spouse   -0.0643569   0.0093801
Ethnicity     0.4166998    0.0397539
Female Head   -0.2260407   0.0714199
_cons         1.6048       0.1013011

P-score Quintiles
              Quintile 1                Quintile 2                Quintile 3                Quintile 4                Quintile 5
Xi            T       C       t-score   T       C       t-score   T       C       t-score   T       C       t-score   T       C       t-score
Age Head      68.04   67.45   -1.2      53.61   53.38   -0.51     44.16   44.68   1.34      37.67   38.2    1.72      32.48   32.14   -1.18
Educ Head     1.54    1.97    3.13      2.39    2.69    1.67      3.25    3.26    -0.04     3.53    3.43    -0.98     2.98    3.12    1.96
Age Spouse    55.95   55.05   -1.43     46.5    46.41   0.66      39.54   40.01   1.86      34.2    34.8    1.84      29.6    29.19   -1.44
Educ Spouse   1.89    2.19    2.47      2.61    2.64    0.31      3.17    3.19    0.23      3.34    3.26    -0.78     2.37    2.72    1.99
Ethnicity     0.16    0.11    -2.81     0.24    0.27    -1.73     0.3     0.32    1.04      0.14    0.13    -0.11     0.7     0.66    -2.3
Female Head   0.19    0.21    0.92      0.42    0.16    -1.4      0.092   0.088   -0.35     0.35    0.32    -0.34     0.008   0.008   0.83
73
Case 7: P-Score Matching
Case 7 - Matching
                          Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC   1.16                7.06+
                          (3.59)              (3.65)
** Significant at 1% level, + Significant at 10% level
74
Case Study: Results Summary
Case                               Method                           Estimated Impact on CPC
Case 1 - Before and After          Multivariate Linear Regression   34.28** (2.11)
Case 2 - Enrolled/Not Enrolled     Multivariate Linear Regression   -4.15 (4.05)
Case 3 - Randomization             Multivariate Linear Regression   29.79** (3.00)
Case 4 - IV (TOT)                  2SLS                             30.44** (3.07)
Case 5 - Regression Discontinuity  Multivariate Linear Regression   30.58** (5.93)
Case 6 - Diff in Diff              Multivariate Linear Regression   25.53** (2.77)
Case 7 - Matching                  Multivariate Linear Regression   7.06+ (3.65)
** Significant at 1% level, + Significant at 10% level
75
Methods Summary
[Figure: Randomization, Randomized Promotion (IV), Discontinuity Design, Diff-in-Diff and Matching compared by Risks, Internal Validity and External Validity.]
76
Measuring Impact
1) Causal Inference
  Counterfactuals
  Counterfeit Counterfactuals:
    Before and After (pre-post)
    Enrolled-not enrolled (apples and oranges)
2) IE Methods Toolbox:
  Randomized Controls
  Randomized Promotion (IV)
  Discontinuity Design (RDD)
  Difference in Difference (Diff-in-diff)
  Matching (P-score matching)
  Combinations of the above
77
Remember…..
Objective of impact evaluation is to estimate the CAUSAL effect of a program on outcomes of interest
In designing the program we must understand the data generation process
  behavioral process that generates the data
  how benefits are assigned
Fit the best evaluation design to the operational context
78
Appendix 1: Two Stage Least Squares (2SLS)
Model with endogenous Treatment (T):
$y = \alpha + \beta_1 T + \beta_2 x + \varepsilon$
Stage 1: Regress endogenous variable on the IV (Z) and other exogenous regressors
$T = \delta_0 + \delta_1 x + \theta_1 Z + \tau$
Calculate predicted value for each observation: T hat ($\hat{T}$)
79
Appendix 1: Two Stage Least Squares (2SLS)
Stage 2: Regress outcome y on predicted variable (and other exogenous variables)
$y = \alpha + \beta_1 \hat{T} + \beta_2 x + \varepsilon$
Need to correct Standard Errors (they are based on T hat rather than T)
In practice just use STATA - ivreg
Intuition: T has been “cleaned” of its correlation with ε.
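The two stages can be sketched by hand for the simplest case: one instrument Z, one endogenous treatment T, and no other regressors x. The data are hypothetical, built to mirror the randomized promotion example (30% vs 80% enrollment, a true effect of 40). The sketch returns the point estimate only; as noted above, standard errors from a hand-run second stage are not correct.

```python
def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def two_stage_least_squares(z, t, y):
    """Point estimate of the effect of t on y, using z as the instrument."""
    d0, theta = ols(z, t)                      # stage 1: regress T on Z
    t_hat = [d0 + theta * zi for zi in z]      # predicted treatment, T hat
    _, beta = ols(t_hat, y)                    # stage 2: regress y on T hat
    return beta

# Hypothetical data: 10 non-promoted units (3 enroll) and 10 promoted units (8 enroll).
z = [0] * 10 + [1] * 10
t = [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2
y = [68 + 40 * ti for ti in t]                 # assumed true effect of enrollment: 40
print(round(two_stage_least_squares(z, t, y), 2))
```

With a single binary instrument and no covariates, this reduces to the ratio of the difference in outcomes to the difference in enrollment between promoted and non-promoted groups.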