
Session 1 general overview


Page 1: Session 1   general overview

Chris Nicoletti

Activity #267: Analysing the socio-economic impact of the Water Hibah on beneficiary households and communities (Stage 1)

Impact Evaluation Training Curriculum Session 1 April 16, 2013

Page 2: Session 1   general overview

This material constitutes supporting material for the "Impact Evaluation in Practice" book. It is made freely available, but please acknowledge its use as follows: Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B. and Vermeersch, C. M. J., 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington, DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.

MEASURING IMPACT: Impact Evaluation Methods for Policy Makers

Page 3: Session 1   general overview

Introduction

• My name is Chris Nicoletti.
• I am a Senior Impact Evaluation Analyst at NORC.
• I have worked in Zambia, Ghana, Cape Verde, the Philippines, Indonesia, Colombia, Burkina Faso, and elsewhere.
• I live in Colorado.
  – I like to ski, hike, climb, and bike.
  – I am married and do not have any children.
• What is your name? Let's go around the room and do introductions…

Page 4: Session 1   general overview

Outline: topics being covered

Tuesday - Session 1: INTRODUCTION AND OVERVIEW
1) Introduction
2) Why is evaluation valuable?
3) What makes a good evaluation?
4) How to implement an evaluation?

Wednesday - Session 2: EVALUATION DESIGN
5) Causal Inference
6) Choosing your IE method/design
7) Impact Evaluation Toolbox

Thursday - Session 3: SAMPLE DESIGN AND DATA COLLECTION
9) Sample Designs
10) Types of Error and Biases
11) Data Collection Plans
12) Data Collection Management

Friday - Session 4: INDICATORS & QUESTIONNAIRE DESIGN
1) Results chain/logic models
2) SMART indicators
3) Questionnaire Design

Page 5: Session 1   general overview

Today, we will answer these questions…

1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?


Page 7: Session 1   general overview

Why Evaluate?

1) Need evidence on what works
   – Limited budget and bad policies could hurt
2) Improve program/policy implementation
   – Design (eligibility, benefits)
   – Operations (efficiency & targeting)
3) Information is key to sustainability
   – Budget negotiations
   – Informing beliefs and the press
   – Results agenda and aid effectiveness

Page 8: Session 1   general overview

What is new about results?

Results-Based Management is a global trend:
• Establishing links between monitoring and evaluation, policy formulation, and budgets.
• Managers are judged by their programs' performance, not their control of inputs: a shift in focus from inputs to outcomes.
• Critical to effective public sector management.

Page 9: Session 1   general overview

Monitoring vs. Evaluation

                       Monitoring                      Evaluation
Frequency              Regular, continuous             Periodic
Coverage               All programs                    Selected programs/aspects
Data                   Universal                       Sample-based
Depth of information   Tracks implementation (WHAT)    Tailored, often to performance and impact (WHY)
Cost                   Spread out                      Can be high
Utility                Continuous program improvement, Major program decisions
                       management

Page 10: Session 1   general overview

Monitoring

A continuous process of collecting and analyzing information to compare how well a project, program, or policy is performing against expected results, and to inform implementation and program management.

Page 11: Session 1   general overview

Impact Evaluation Answers

• What was the effect of the program on outcomes?
• How much better off are the beneficiaries because of the program/policy?
• How would outcomes change if the program design changed?
• Is the program cost-effective?

Page 12: Session 1   general overview

Evaluation

A systematic, objective assessment of an ongoing or completed project, program, or policy, covering its design, implementation, and/or results, to determine the relevance and fulfillment of objectives, development efficiency, effectiveness, impact, and sustainability, and to generate lessons learned that inform decision making, tailored to key questions.

Page 13: Session 1   general overview

Impact Evaluation

An assessment of the causal effect of a project, program, or policy on beneficiaries. It uses a counterfactual:
• to estimate what the state of the beneficiaries would have been in the absence of the program (the control or comparison group), compared to the observed state of beneficiaries (the treatment group), and
• to determine the intermediate or final outcomes attributable to the intervention.

Page 14: Session 1   general overview

Impact Evaluation Answers

• What is the effect of a household (hh) water connection on hh water expenditure?
• Does contracting out primary health care lead to an increase in access?
• Does replacing dirt floors with cement reduce parasites and improve child health?
• Do improved roads increase access to labor markets and raise income?

Page 15: Session 1   general overview

Answer these questions

1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?

Page 16: Session 1   general overview

How to assess impact

e.g., How much does an education program improve test scores (learning)?

What is a beneficiary's test score with the program compared to without the program? Ideally, we would compare the same individual with and without the program at the same point in time.

Formally, program impact is: α = (Y | P=1) − (Y | P=0), where Y is the outcome and P indicates program participation.
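To make the α formula concrete, here is a minimal simulation sketch (not from the training materials; all numbers are hypothetical) of potential outcomes with and without a program:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical potential outcomes for an education program:
# y0 = test score without the program, y1 = test score with it.
y0 = rng.normal(50, 10, n)
y1 = y0 + 5                     # assume the program adds 5 points

# alpha = (Y | P=1) - (Y | P=0), averaged over individuals.
print((y1 - y0).mean())         # ~5.0

# In reality we observe only one of y0, y1 for each person, so this
# difference must be estimated using a counterfactual.
```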

Page 17: Session 1   general overview

Solving the evaluation problem

• We never observe the same individual with and without the program at the same point in time.
• Counterfactual: what would have happened without the program. We need to estimate it.
• Estimated impact is the difference between the treated observation and the counterfactual.
• The counterfactual is the key to impact evaluation.

Page 18: Session 1   general overview

Counterfactual Criteria

The treated and counterfactual groups:
(1) have identical characteristics,
(2) except for benefiting from the intervention.

There is no other reason for differences in the outcomes of the treated and counterfactual groups; the only reason for the difference in outcomes is the intervention.

Page 19: Session 1   general overview

2 Counterfeit Counterfactuals

1. Before and After: the same individual before the treatment.
2. Those not enrolled:
   – those who chose not to enroll in the program, or
   – those who were not offered the program.

Page 20: Session 1   general overview

1. Before and After: Example

Before-and-after comparisons do not take into account things that are changing over the intervention period.

Agricultural assistance program: financial assistance to purchase inputs.
• Compare rice yields before and after.
• Before is a normal rainfall year, but after is a drought year.
• We find a fall in rice yield. Did the program fail?
• We cannot separate (identify) the effect of the financial assistance program from the effect of rainfall.
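A minimal sketch of the rainfall problem, with made-up numbers: the before/after difference mixes the program effect with the drought.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical rice yields (tons/ha). Assume the program truly adds
# +0.3, but the "after" year brings a drought that subtracts 0.8.
before = rng.normal(4.0, 0.5, n)
after = before + 0.3 - 0.8 + rng.normal(0, 0.2, n)

# Before-and-after estimate: program effect and drought are confounded.
print((after - before).mean())   # ~ -0.5, wrongly suggesting failure
```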

Page 21: Session 1   general overview

2. Those not enrolled: Example 1

A job training program is offered; compare the employment and earnings of those who sign up to those who do not.

Who signs up? Those who are most likely to benefit, i.e., those with more ability, who would have higher earnings than non-participants even without job training.

This is a poor estimate of the counterfactual.

Page 22: Session 1   general overview

What's wrong?

1) Selection bias: people choose to participate for specific reasons.
   – Job training: ability and earnings.
   – Health insurance: health status and medical expenditures.
2) Many times those reasons are related to the outcome of interest.
3) We cannot separately identify the impact of the program from these other factors/reasons.
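A hedged simulation sketch of the job-training case (all numbers invented): ability drives both enrollment and earnings, so a naive comparison overstates impact.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
ability = rng.normal(0, 1, n)

# Hypothetical setup: high-ability people are more likely to enroll,
# and ability independently raises earnings. True training effect = 3.
enroll_prob = 1 / (1 + np.exp(-2 * ability))
enrolled = rng.random(n) < enroll_prob
earnings = 30 + 8 * ability + 3 * enrolled + rng.normal(0, 2, n)

# The naive enrolled-vs-not comparison picks up the ability gap too.
naive = earnings[enrolled].mean() - earnings[~enrolled].mean()
print(naive)   # well above 3: selection bias, not program impact
```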

Page 23: Session 1   general overview

Possible Solutions???

We need to guarantee the comparability of the treatment and control groups, so that the ONLY remaining difference between them is the intervention.

In this training we will consider:
• Experimental designs
• Quasi-experiments (regression discontinuity, double differences)
• Non-experimental designs (e.g., instrumental variables)

EXPERIMENTAL DESIGN!!!
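Continuing the same hypothetical earnings model, this sketch shows why randomization works: random assignment balances ability across groups, so a simple difference in means recovers the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000
ability = rng.normal(0, 1, n)

# Same hypothetical model, but treatment is now randomly assigned,
# so ability is balanced across groups by design.
treated = rng.random(n) < 0.5
earnings = 30 + 8 * ability + 3 * treated + rng.normal(0, 2, n)

diff = earnings[treated].mean() - earnings[~treated].mean()
t_stat, p_value = stats.ttest_ind(earnings[treated], earnings[~treated])
print(diff, p_value)   # difference in means ~3: the true effect
```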

Page 24: Session 1   general overview

Answer these questions

1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?

Page 25: Session 1   general overview

When to use Impact Evaluation?

Evaluate impact when the project is:
• Innovative
• Replicable/scalable
• Strategically relevant for reducing poverty
• One where the evaluation will fill a knowledge gap
• One with substantial policy impact

Use evaluation within a program to test alternatives and improve the program.

Page 26: Session 1   general overview

Choosing what to evaluate

Criteria:
• Large budget share
• Affects many people
• Little existing evidence of impact for the target population (IndII examples?)

No need to evaluate everything; spend evaluation resources wisely.

Page 27: Session 1   general overview

IE for ongoing program development

• Are there potential program adjustments that would benefit from a causal impact evaluation?
• Do the implementing parties have specific questions they are concerned with?
• Are there parts of the program that may not be working?

Page 28: Session 1   general overview

How to make an impact evaluation policy focused

Address policy-relevant questions:
• What policy questions need to be answered?
• What outcomes answer those questions?
• What indicators measure those outcomes?
• How much of a change in the outcomes would determine success?

Example: Scale up a pilot? (i.e., the Water Hibah). Criterion: need at least an X% average increase in a beneficiary outcome over a given period.

Page 29: Session 1   general overview

Policy impact of evaluation

• What is the policy purpose?
• Provide evidence for pressing decisions.
• Design the evaluation with policy makers.

IndII examples???

Page 30: Session 1   general overview

Policy impact of evaluation

Cultural shift: from retrospective evaluation (look back and judge) to prospective evaluation:
• Decide what you need to learn.
• Experiment with alternatives.
• Measure and inform.
• Adopt better alternatives over time.

Change in incentives:
• Rewards for changing programs.
• Rewards for generating knowledge.
• Separating job performance from knowledge generation.

Page 31: Session 1   general overview

Choice should come from existing logic models and M&E plans

• Choosing what to evaluate should take time and careful consideration.
• Impact evaluation is more expensive and often requires third-party consultation.
• The questions that require an IE to answer should be evident in your logic models and M&E plans from the beginning.
• Remember: IE is an assessment of the causal effect of a project, program, or policy on beneficiaries.

Page 32: Session 1   general overview

CHOICE #1

Retrospective Design or Prospective Design?

Page 33: Session 1   general overview

Retrospective Analysis

Retrospective analysis is necessary when we have to work with a pre-assigned program (e.g., expanding an existing program) and existing data (baseline?).

Examples:
• Regression discontinuity: Education Project (Ghana)
• Difference-in-differences: RPI (Zambia)
• Instrumental variables: Piso Firme (México)

Page 34: Session 1   general overview

Retrospective Designs

• Use whatever is available: the data were not collected for the purposes at hand.
• The researcher gets to choose what variables to test, based on previous knowledge and theory.
• Subject to misspecification bias.
• Theory is used instrumentally, as a way to provide a structure justifying the identifying assumptions.
• Less money on data collection (sometimes), more money on analysis.
• Does not really require "buy-in" from implementers or field staff.

Page 35: Session 1   general overview

Prospective Analysis

In prospective analysis, the evaluation is designed in parallel with the assignment of the program, and baseline data can be gathered.

Examples:
• Progresa/Oportunidades (México)
• CDSG (Colombia)

Page 36: Session 1   general overview

Prospective Designs

• Intentionally collect data for the purposes of the impact evaluation.
• The variables collected in a prospective evaluation are collected because they were considered potential outcome variables.
• You should report on all of your outcome variables.
• The evaluation itself may be a form of treatment.
• Here it is the experimental design that is instrumental: it gives more power both to test the theory and to challenge it.
• More money on data collection, less money on analysis.
• Requires "buy-in" from implementers and field staff.

Page 37: Session 1   general overview

Prospective Designs

Use opportunities to generate good controls. The majority of programs cannot assign benefits to the entire eligible population, so not all eligibles receive the program:
• Budget limitations: eligible beneficiaries that receive benefits are potential treatments; eligible beneficiaries that do not receive benefits are potential controls.
• Logistical limitations: those that go first are potential treatments; those that go later are potential controls.

Page 38: Session 1   general overview

An example: Socio-economic impact of the Water Hibah (endline)

• The decision to conduct an impact evaluation was made after the program began, and ex post control households were identified.
• We are now trying to use health data from Puskesmas to "fill in the gaps" of the baseline.
• This is a retrospective design, because there was not an experimental design in place for the rollout of the program.

Page 39: Session 1   general overview

CHOICE #2

What type of Evaluation Design do you use?

Page 40: Session 1   general overview

Types of Designs

Prospective:
• Randomized Assignment
• Randomized Promotion
• Regression Discontinuity

Retrospective:
• Regression Discontinuity
• Differences in Differences
• Matching
• Model-based / Instrumental Variables

Page 41: Session 1   general overview

How to choose?

The identification strategy depends on the implementation of the program; the evaluation strategy depends on the rules of operations.

Page 42: Session 1   general overview

Who gets the program?

• Eligibility criteria: Are benefits targeted? How are they targeted?
• Can we rank eligibles by priority? Are the measures good enough for fine rankings?
• Rollout: Does everyone have an equal chance to go first, second, third?

Page 43: Session 1   general overview

Ethical Considerations

Rollout is often based on budget/administrative constraints:
• Equity: equally deserving beneficiaries deserve an equal chance of going first, so give everyone eligible an equal chance.
• If the rollout is ranked on some criteria, the criteria should be quantitative and public.
• Use a transparent and accountable method.
• Do not delay benefits.

Page 44: Session 1   general overview

The method depends on the rules of operation

                               Targeted                 Universal
In stages, without cut-off     Randomization            Randomized Rollout
In stages, with cut-off        RD/DiD, Match/DiD        RD/DiD, Match/DiD
Immediately, without cut-off   Randomized Promotion     Randomized Promotion
Immediately, with cut-off      RD/DiD, Match/DiD        Randomized Promotion
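The table can be read as a simple lookup from rules of operation to candidate methods. A purely illustrative encoding (the keys and strings are ours, not from the deck):

```python
# (targeting, rollout, cutoff) -> candidate IE methods, per the table above.
METHODS = {
    ("targeted",  "in stages",   "without cut-off"): ["Randomization"],
    ("universal", "in stages",   "without cut-off"): ["Randomized Rollout"],
    ("targeted",  "in stages",   "with cut-off"):    ["RD/DiD", "Match/DiD"],
    ("universal", "in stages",   "with cut-off"):    ["RD/DiD", "Match/DiD"],
    ("targeted",  "immediately", "without cut-off"): ["Randomized Promotion"],
    ("universal", "immediately", "without cut-off"): ["Randomized Promotion"],
    ("targeted",  "immediately", "with cut-off"):    ["RD/DiD", "Match/DiD"],
    ("universal", "immediately", "with cut-off"):    ["Randomized Promotion"],
}

print(METHODS[("targeted", "in stages", "with cut-off")])
```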

Page 45: Session 1   general overview

An example: Socio-economic impact of the Water Hibah (endline)

• Provision of services to villages and households under the Water Hibah is not determined by randomization, but by assessment and willingness to pay (WTP).
• The dataset design exhibits some characteristics of a controlled experiment, with connected and unconnected households, but the connection decision is not determined by randomization.
• Household matching is not an efficient method given the potential discrepancies we identified in the pilot test, and it does not work very well with the sample design that was chosen.
• Village-level matching is not feasible because connected and unconnected households usually coexist in a single village (locality).
• The design we have chosen is a pretest-posttest nonequivalent-control-group quasi-experimental design that will use regression-adjusted difference-in-differences impact estimators.
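A minimal sketch of a regression-adjusted DiD estimator of the kind described above. The file name, column names, and covariate are hypothetical placeholders, not the actual Water Hibah data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical household panel: one row per household per survey round.
#   y         outcome (e.g., monthly water expenditure)
#   connected 1 = connected (treatment) household, 0 = comparison
#   endline   1 = endline round, 0 = baseline round
#   hh_size   an illustrative regression adjuster
#   village   cluster identifier
df = pd.read_csv("hibah_panel.csv")   # placeholder file name

# The coefficient on connected:endline is the DiD impact estimate.
model = smf.ols("y ~ connected * endline + hh_size", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["village"]})
print("DiD estimate:", result.params["connected:endline"])
```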

Page 46: Session 1   general overview

CHOICE #3

What type of Sample Design do you use?

Page 47: Session 1   general overview

Types of Sample Designs

• Random sampling
• Multi-stage sampling
• Systematic sampling
• Stratified sampling
• Convenience sampling
• Snowball sampling

Plus any combination of them!

Page 48: Session 1   general overview

A good sample design requires expert knowledge

• Sample design can be extremely complex.
• A good summary is provided by Duflo (2006): the power of the design is the probability that, for a given effect size and a given statistical significance level, we will be able to reject the hypothesis of zero effect. Sample sizes, as well as other (evaluation and sample) design choices, affect the power of an experiment.
• There is a lot to consider:
  – the impact estimator to be used;
  – the test parameters (power level, significance level);
  – the minimum detectable effect;
  – characteristics of the sampled (target) population: population sizes at potential levels of sampling, means, standard deviations, and intra-unit correlation coefficients (if multi-stage sampling is used); and
  – the sample design to be used for the sample survey.

Page 49: Session 1   general overview

The basic process is this…

The power calculation combines:
• the level of power,
• the level of the hypothesis tests (significance),
• the correlations in outcomes within groups (ICCs), and
• the means and variances of outcomes, together with the MDES (minimum detectable effect size).
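A minimal sketch of how these inputs combine, using the standard two-arm minimum-detectable-effect formula inflated by the design effect 1 + (m − 1)·ICC for cluster samples. All numbers are illustrative, not the Water Hibah parameters:

```python
import numpy as np
from scipy.stats import norm

def mde(n_per_arm, sigma, icc=0.0, cluster_size=1, alpha=0.05, power=0.8):
    """Minimum detectable effect for a two-arm difference in means."""
    deff = 1 + (cluster_size - 1) * icc          # design effect for clustering
    se = sigma * np.sqrt(2.0 / n_per_arm * deff)  # SE of the difference
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se

# Illustrative inputs: 3,500 households per arm in clusters of 14, ICC 0.05.
print(mde(n_per_arm=3500, sigma=1.0, icc=0.05, cluster_size=14))
```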

Page 50: Session 1   general overview

The reality is…

• Most times, you do not have all of this information: use existing studies, other data sources, and assumptions.
• You may work backwards from a desired power level, or from the expected level of impact that you want to be able to detect.
• Often you are working backwards to fit a certain budget!
• Build in the marginal costs for each stage of sampling.
• Then decide whether or not to pursue the project.

Page 51: Session 1   general overview

An example: Socio-economic impact of the Water Hibah (endline)

• Outcome indicators: we have simplified versions of them in the baseline, but they have been modified for the endline → use the baseline dataset to calculate ICCs.
• The highest variation in outcome indicators was identified across villages (localities) → the primary sampling unit is the village.
• The number of households in the village was found to improve the efficiency of the design → stratify villages based on the number of households.
• Marginal costs of a village visit vs. a household visit were included.
• The final sample design is stratified multi-stage sampling with 250 villages and 7-14 households per experimental group = 7,000 households.
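Calculating ICCs from the baseline, as the first bullet suggests, can be done with the one-way ANOVA estimator. A sketch with hypothetical file and column names:

```python
import pandas as pd

def icc_anova(df, group_col, y_col):
    """One-way ANOVA intra-cluster correlation:
    ICC = (MSB - MSW) / (MSB + (m0 - 1) * MSW),
    where m0 is the unbalanced-design average cluster size."""
    groups = df.groupby(group_col)[y_col]
    k, n = groups.ngroups, len(df)
    grand = df[y_col].mean()
    n_j = groups.size()
    msb = (n_j * (groups.mean() - grand) ** 2).sum() / (k - 1)
    msw = ((df[y_col] - groups.transform("mean")) ** 2).sum() / (n - k)
    m0 = (n - (n_j ** 2).sum() / n) / (k - 1)
    return (msb - msw) / (msb + (m0 - 1) * msw)

# Hypothetical usage with a baseline file and column names:
# baseline = pd.read_csv("hibah_baseline.csv")
# print(icc_anova(baseline, group_col="village", y_col="water_expenditure"))
```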

Page 52: Session 1   general overview

What can IndII Do?

Ensure your M&E systems are relevant and reliable…

Page 53: Session 1   general overview

Data: Coordinate IE & Monitoring Systems

Projects/programs regularly collect data for management purposes, and this information is needed for impact evaluation. Typical content:
• Lists of beneficiaries
• Distribution of benefits
• Expenditures
• Outcomes
• Ongoing process evaluation

Page 54: Session 1   general overview

Manage M&E for results

Prospective evaluations are easier and better with reliable M&E:
• Tailor policy questions.
• Produce precise, unbiased estimates.
• Use your resources wisely: better methods, cheaper data.
• Provide timely feedback and program changes.
• Improve results on the ground.

Page 55: Session 1   general overview

Evaluation uses monitoring information to verify:

• who is a beneficiary,
• when they started, and
• what benefits were actually delivered.

A necessary condition for a program to have an impact: benefits need to get to the targeted beneficiaries.

Page 56: Session 1   general overview

Overall Messages

Impact evaluation is useful for:
• Validating program design
• Adjusting program structure
• Communicating to the finance ministry and civil society

Evaluation design: a good evaluation requires estimating the counterfactual:
• What would have happened to beneficiaries if they had not received the program?
• We need to know all the reasons why beneficiaries got the program and others did not.

Page 57: Session 1   general overview

Other messages

• Good M&E is crucial not only to effective project management but can be a driver of reform.
• Monitoring and evaluation are separate, complementary functions, but both are key to results-based management.
• Have a good M&E plan before you roll out your project and use it to inform the journey!
• Design the timing and content of M&E results to further evidence-based dialogue.
• Good monitoring systems and administrative data can improve IE.
• Prospective designs are easiest to use.
• Stakeholder buy-in is very important.

Page 58: Session 1   general overview

Outline: topics being covered

Tuesday - Session 1: INTRODUCTION AND OVERVIEW
1) Introduction
2) Why is evaluation valuable?
3) What makes a good evaluation?
4) How to implement an evaluation?

Wednesday - Session 2: EVALUATION DESIGN
5) Causal Inference
6) Choosing your IE method/design
7) Impact Evaluation Toolbox

Thursday - Session 3: SAMPLE DESIGN AND DATA COLLECTION
9) Sample Designs
10) Types of Error and Biases
11) Data Collection Plans
12) Data Collection Management

Friday - Session 4: INDICATORS & QUESTIONNAIRE DESIGN
1) Results chain/logic models
2) SMART indicators
3) Questionnaire Design

Page 59: Session 1   general overview

Thank You!

Page 60: Session 1   general overview

This material constitutes supporting material for the "Impact Evaluation in Practice" book. It is made freely available, but please acknowledge its use as follows: Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B. and Vermeersch, C. M. J., 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington, DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.

MEASURING IMPACT: Impact Evaluation Methods for Policy Makers