
Session 1 general overview


Page 1: Session 1   general overview

Chris Nicoletti

Activity #267: Analysing the socio-economic impact of the Water Hibah on beneficiary households and communities (Stage 1)

Impact Evaluation Training Curriculum Session 1 April 16, 2013

Page 2: Session 1   general overview

This material constitutes supporting material for the "Impact Evaluation in Practice" book. It is made freely available, but please acknowledge its use as follows: Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B. and Vermeersch, C. M. J., 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington, DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.

MEASURING IMPACT: Impact Evaluation Methods for Policy Makers

Page 3: Session 1   general overview

Introduction

• My name is Chris Nicoletti.
• I am a Senior Impact Evaluation Analyst at NORC.
• I have worked in Zambia, Ghana, Cape Verde, the Philippines, Indonesia, Colombia, Burkina Faso, and elsewhere.
• I live in Colorado.
  – I like to ski, hike, climb, and bike.
  – I am married and do not have any children.
• What is your name? Let's go around the room and do introductions…

Page 4: Session 1   general overview

Outline: topics being covered

Tuesday - Session 1: INTRODUCTION AND OVERVIEW
1) Introduction
2) Why is evaluation valuable?
3) What makes a good evaluation?
4) How to implement an evaluation?

Wednesday - Session 2: EVALUATION DESIGN
5) Causal Inference
6) Choosing your IE method/design
7) Impact Evaluation Toolbox

Thursday - Session 3: SAMPLE DESIGN AND DATA COLLECTION
9) Sample Designs
10) Types of Error and Biases
11) Data Collection Plans
12) Data Collection Management

Friday - Session 4: INDICATORS & QUESTIONNAIRE DESIGN
1) Results chain/logic models
2) SMART indicators
3) Questionnaire Design

Page 5: Session 1   general overview

Today, we will answer these questions…

1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?


Page 7: Session 1   general overview

Why Evaluate?

1) Need evidence on what works
   – Limited budget and bad policies could hurt
2) Improve program/policy implementation
   – Design (eligibility, benefits)
   – Operations (efficiency & targeting)
3) Information is key to sustainability
   – Budget negotiations
   – Informing beliefs and the press
   – Results agenda and aid effectiveness

Page 8: Session 1   general overview

What is new about results?

Results-Based Management is a global trend:
• Establishing links between monitoring and evaluation, policy formulation, and budgets.
• Managers are judged by their programs' performance, not their control of inputs: a shift in focus from inputs to outcomes.
• Critical to effective public sector management.

Page 9: Session 1   general overview

Monitoring vs. Evaluation

                       Monitoring                      Evaluation
Frequency              Regular, continuous             Periodic
Coverage               All programs                    Selected programs/aspects
Data                   Universal                       Sample-based
Depth of information   Tracks implementation (WHAT)    Tailored, often to performance and impact (WHY)
Cost                   Spread out                      Can be high
Utility                Continuous program improvement, Major program decisions
                       management

Page 10: Session 1   general overview

Monitoring

A continuous process of collecting and analyzing information to compare how well a project, program, or policy is performing against expected results, and to inform implementation and program management.

Page 11: Session 1   general overview

Impact Evaluation Answers

• What was the effect of the program on outcomes?
• How much better off are the beneficiaries because of the program/policy?
• How would outcomes change if the program design changed?
• Is the program cost-effective?

Page 12: Session 1   general overview

Evaluation

A systematic, objective assessment of an ongoing or completed project, program, or policy, covering its design, implementation, and/or results, to determine the relevance and fulfillment of objectives, development efficiency, effectiveness, impact, and sustainability, and to generate lessons learned that inform decision making, tailored to key questions.

Page 13: Session 1   general overview

Impact Evaluation

An assessment of the causal effect of a project, program, or policy on beneficiaries. It uses a counterfactual:
• to estimate what the state of the beneficiaries would have been in the absence of the program (the control or comparison group), compared to the observed state of beneficiaries (the treatment group), and
• to determine the intermediate or final outcomes attributable to the intervention.

Page 14: Session 1   general overview

Impact Evaluation Answers

• What is the effect of a household (hh) water connection on hh water expenditure?
• Does contracting out primary health care lead to an increase in access?
• Does replacing dirt floors with cement reduce parasites and improve child health?
• Do improved roads increase access to labor markets and raise income?

Page 15: Session 1   general overview

Answer these questions

1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?

Page 16: Session 1   general overview

How to assess impact

e.g., How much does an education program improve test scores (learning)?

What is a beneficiary's test score with the program compared to without the program? Ideally, we would compare the same individual with and without the program at the same point in time.

Formally, program impact is: α = (Y | P=1) − (Y | P=0), where Y is the outcome and P indicates program participation.
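To make the α formula concrete, here is a minimal simulation sketch (not from the training materials; all numbers are hypothetical) of potential outcomes with and without a program:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical potential outcomes for an education program:
# y0 = test score without the program, y1 = test score with it.
y0 = rng.normal(50, 10, n)
y1 = y0 + 5                     # assume the program adds 5 points

# alpha = (Y | P=1) - (Y | P=0), averaged over individuals.
print((y1 - y0).mean())         # ~5.0

# In reality we observe only one of y0, y1 for each person, so this
# difference must be estimated using a counterfactual.
```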

Page 17: Session 1   general overview

Solving the evaluation problem

• We never observe the same individual with and without the program at the same point in time.
• Counterfactual: what would have happened without the program. We need to estimate it.
• Estimated impact is the difference between the treated observation and the counterfactual.
• The counterfactual is the key to impact evaluation.

Page 18: Session 1   general overview

Counterfactual Criteria

The treated and counterfactual groups:
(1) have identical characteristics,
(2) except for benefiting from the intervention.

There is no other reason for differences in the outcomes of the treated and counterfactual groups; the only reason for the difference in outcomes is the intervention.

Page 19: Session 1   general overview

2 Counterfeit Counterfactuals

1. Before and After: the same individual before the treatment.
2. Those not enrolled:
   – those who chose not to enroll in the program, or
   – those who were not offered the program.

Page 20: Session 1   general overview

1. Before and After: Example

Before-and-after comparisons do not take into account things that are changing over the intervention period.

Agricultural assistance program: financial assistance to purchase inputs.
• Compare rice yields before and after.
• Before is a normal rainfall year, but after is a drought year.
• We find a fall in rice yield. Did the program fail?
• We cannot separate (identify) the effect of the financial assistance program from the effect of rainfall.
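A minimal sketch of the rainfall problem, with made-up numbers: the before/after difference mixes the program effect with the drought.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical rice yields (tons/ha). Assume the program truly adds
# +0.3, but the "after" year brings a drought that subtracts 0.8.
before = rng.normal(4.0, 0.5, n)
after = before + 0.3 - 0.8 + rng.normal(0, 0.2, n)

# Before-and-after estimate: program effect and drought are confounded.
print((after - before).mean())   # ~ -0.5, wrongly suggesting failure
```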

Page 21: Session 1   general overview

2. Those not enrolled: Example 1

A job training program is offered; compare the employment and earnings of those who sign up to those who do not.

Who signs up? Those who are most likely to benefit, i.e., those with more ability, who would have higher earnings than non-participants even without job training.

This is a poor estimate of the counterfactual.

Page 22: Session 1   general overview

What's wrong?

1) Selection bias: people choose to participate for specific reasons.
   – Job training: ability and earnings.
   – Health insurance: health status and medical expenditures.
2) Many times those reasons are related to the outcome of interest.
3) We cannot separately identify the impact of the program from these other factors/reasons.
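A hedged simulation sketch of the job-training case (all numbers invented): ability drives both enrollment and earnings, so a naive comparison overstates impact.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
ability = rng.normal(0, 1, n)

# Hypothetical setup: high-ability people are more likely to enroll,
# and ability independently raises earnings. True training effect = 3.
enroll_prob = 1 / (1 + np.exp(-2 * ability))
enrolled = rng.random(n) < enroll_prob
earnings = 30 + 8 * ability + 3 * enrolled + rng.normal(0, 2, n)

# The naive enrolled-vs-not comparison picks up the ability gap too.
naive = earnings[enrolled].mean() - earnings[~enrolled].mean()
print(naive)   # well above 3: selection bias, not program impact
```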

Page 23: Session 1   general overview

Possible Solutions???

We need to guarantee the comparability of the treatment and control groups, so that the ONLY remaining difference between them is the intervention.

In this training we will consider:
• Experimental designs
• Quasi-experiments (regression discontinuity, double differences)
• Non-experimental designs (e.g., instrumental variables)

EXPERIMENTAL DESIGN!!!
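Continuing the same hypothetical earnings model, this sketch shows why randomization works: random assignment balances ability across groups, so a simple difference in means recovers the true effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100_000
ability = rng.normal(0, 1, n)

# Same hypothetical model, but treatment is now randomly assigned,
# so ability is balanced across groups by design.
treated = rng.random(n) < 0.5
earnings = 30 + 8 * ability + 3 * treated + rng.normal(0, 2, n)

diff = earnings[treated].mean() - earnings[~treated].mean()
t_stat, p_value = stats.ttest_ind(earnings[treated], earnings[~treated])
print(diff, p_value)   # difference in means ~3: the true effect
```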

Page 24: Session 1   general overview

Answer these questions

1) Why is evaluation valuable?
2) What makes a good impact evaluation?
3) How to implement an impact evaluation?

Page 25: Session 1   general overview

When to use Impact Evaluation?

Evaluate impact when the project is:
• Innovative
• Replicable/scalable
• Strategically relevant for reducing poverty
• One where the evaluation will fill a knowledge gap
• One with substantial policy impact

Use evaluation within a program to test alternatives and improve the program.

Page 26: Session 1   general overview

Choosing what to evaluate

Criteria:
• Large budget share
• Affects many people
• Little existing evidence of impact for the target population (IndII examples?)

No need to evaluate everything; spend evaluation resources wisely.

Page 27: Session 1   general overview

IE for ongoing program development

• Are there potential program adjustments that would benefit from a causal impact evaluation?
• Do the implementing parties have specific questions they are concerned with?
• Are there parts of the program that may not be working?

Page 28: Session 1   general overview

How to make an impact evaluation policy focused

Address policy-relevant questions:
• What policy questions need to be answered?
• What outcomes answer those questions?
• What indicators measure those outcomes?
• How much of a change in the outcomes would determine success?

Example: Scale up a pilot? (i.e., the Water Hibah). Criterion: need at least an X% average increase in a beneficiary outcome over a given period.

Page 29: Session 1   general overview

Policy impact of evaluation

• What is the policy purpose?
• Provide evidence for pressing decisions.
• Design the evaluation with policy makers.

IndII examples???

Page 30: Session 1   general overview

Policy impact of evaluation

Cultural shift: from retrospective evaluation (look back and judge) to prospective evaluation:
• Decide what you need to learn.
• Experiment with alternatives.
• Measure and inform.
• Adopt better alternatives over time.

Change in incentives:
• Rewards for changing programs.
• Rewards for generating knowledge.
• Separating job performance from knowledge generation.

Page 31: Session 1   general overview

Choice should come from existing logic models and M&E plans

• Choosing what to evaluate should take time and careful consideration.
• Impact evaluation is more expensive and often requires third-party consultation.
• The questions that require an IE to answer should be evident in your logic models and M&E plans from the beginning.
• Remember: IE is an assessment of the causal effect of a project, program, or policy on beneficiaries.

Page 32: Session 1   general overview

CHOICE #1

Retrospective Design or Prospective Design?

Page 33: Session 1   general overview

Retrospective Analysis

Retrospective analysis is necessary when we have to work with a pre-assigned program (e.g., expanding an existing program) and existing data (baseline?).

Examples:
• Regression discontinuity: Education Project (Ghana)
• Difference-in-differences: RPI (Zambia)
• Instrumental variables: Piso Firme (México)

Page 34: Session 1   general overview

Retrospective Designs

• Use whatever is available: the data were not collected for the purposes at hand.
• The researcher gets to choose what variables to test, based on previous knowledge and theory.
• Subject to misspecification bias.
• Theory is used instrumentally, as a way to provide a structure justifying the identifying assumptions.
• Less money on data collection (sometimes), more money on analysis.
• Does not really require "buy-in" from implementers or field staff.

Page 35: Session 1   general overview

Prospective Analysis

In prospective analysis, the evaluation is designed in parallel with the assignment of the program, and baseline data can be gathered.

Examples:
• Progresa/Oportunidades (México)
• CDSG (Colombia)

Page 36: Session 1   general overview

Prospective Designs

• Intentionally collect data for the purposes of the impact evaluation.
• The variables collected in a prospective evaluation are collected because they were considered potential outcome variables.
• You should report on all of your outcome variables.
• The evaluation itself may be a form of treatment.
• Here it is the experimental design that is instrumental: it gives more power both to test the theory and to challenge it.
• More money on data collection, less money on analysis.
• Requires "buy-in" from implementers and field staff.

Page 37: Session 1   general overview

Prospective Designs

Use opportunities to generate good controls. The majority of programs cannot assign benefits to the entire eligible population, so not all eligibles receive the program:
• Budget limitations: eligible beneficiaries that receive benefits are potential treatments; eligible beneficiaries that do not receive benefits are potential controls.
• Logistical limitations: those that go first are potential treatments; those that go later are potential controls.

Page 38: Session 1   general overview

An example: Socio-economic impact of the Water Hibah (endline)

• The decision to conduct an impact evaluation was made after the program began, and ex post control households were identified.
• We are now trying to use health data from Puskesmas to "fill in the gaps" of the baseline.
• This is a retrospective design, because there was not an experimental design in place for the rollout of the program.

Page 39: Session 1   general overview

CHOICE #2

What type of Evaluation Design do you use?

Page 40: Session 1   general overview

Types of Designs

Prospective:
• Randomized Assignment
• Randomized Promotion
• Regression Discontinuity

Retrospective:
• Regression Discontinuity
• Differences in Differences
• Matching
• Model-based / Instrumental Variables

Page 41: Session 1   general overview

How to choose?

The identification strategy depends on the implementation of the program; the evaluation strategy depends on the rules of operations.

Page 42: Session 1   general overview

Who gets the program?

• Eligibility criteria: Are benefits targeted? How are they targeted?
• Can we rank eligibles by priority? Are the measures good enough for fine rankings?
• Rollout: Does everyone have an equal chance to go first, second, third?

Page 43: Session 1   general overview

Ethical Considerations

Rollout is often based on budget/administrative constraints:
• Equity: equally deserving beneficiaries deserve an equal chance of going first, so give everyone eligible an equal chance.
• If the rollout is ranked on some criteria, the criteria should be quantitative and public.
• Use a transparent and accountable method.
• Do not delay benefits.

Page 44: Session 1   general overview

The method depends on the rules of operation

                               Targeted                 Universal
In stages, without cut-off     Randomization            Randomized Rollout
In stages, with cut-off        RD/DiD, Match/DiD        RD/DiD, Match/DiD
Immediately, without cut-off   Randomized Promotion     Randomized Promotion
Immediately, with cut-off      RD/DiD, Match/DiD        Randomized Promotion
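The table can be read as a simple lookup from rules of operation to candidate methods. A purely illustrative encoding (the keys and strings are ours, not from the deck):

```python
# (targeting, rollout, cutoff) -> candidate IE methods, per the table above.
METHODS = {
    ("targeted",  "in stages",   "without cut-off"): ["Randomization"],
    ("universal", "in stages",   "without cut-off"): ["Randomized Rollout"],
    ("targeted",  "in stages",   "with cut-off"):    ["RD/DiD", "Match/DiD"],
    ("universal", "in stages",   "with cut-off"):    ["RD/DiD", "Match/DiD"],
    ("targeted",  "immediately", "without cut-off"): ["Randomized Promotion"],
    ("universal", "immediately", "without cut-off"): ["Randomized Promotion"],
    ("targeted",  "immediately", "with cut-off"):    ["RD/DiD", "Match/DiD"],
    ("universal", "immediately", "with cut-off"):    ["Randomized Promotion"],
}

print(METHODS[("targeted", "in stages", "with cut-off")])
```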

Page 45: Session 1   general overview

An example: Socio-economic impact of the Water Hibah (endline)

• Provision of services to villages and households under the Water Hibah is not determined by randomization, but by assessment and willingness to pay (WTP).
• The dataset design exhibits some characteristics of a controlled experiment, with connected and unconnected households, but the connection decision is not determined by randomization.
• Household matching is not an efficient method given the potential discrepancies we identified in the pilot test, and it does not work very well with the sample design that was chosen.
• Village-level matching is not feasible because connected and unconnected households usually coexist in a single village (locality).
• The design we have chosen is a pretest-posttest nonequivalent-control-group quasi-experimental design that will use regression-adjusted difference-in-differences impact estimators.
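A minimal sketch of a regression-adjusted DiD estimator of the kind described above. The file name, column names, and covariate are hypothetical placeholders, not the actual Water Hibah data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical household panel: one row per household per survey round.
#   y         outcome (e.g., monthly water expenditure)
#   connected 1 = connected (treatment) household, 0 = comparison
#   endline   1 = endline round, 0 = baseline round
#   hh_size   an illustrative regression adjuster
#   village   cluster identifier
df = pd.read_csv("hibah_panel.csv")   # placeholder file name

# The coefficient on connected:endline is the DiD impact estimate.
model = smf.ols("y ~ connected * endline + hh_size", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["village"]})
print("DiD estimate:", result.params["connected:endline"])
```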

Page 46: Session 1   general overview

CHOICE #3

What type of Sample Design do you use?

Page 47: Session 1   general overview

Types of Sample Designs

• Random sampling
• Multi-stage sampling
• Systematic sampling
• Stratified sampling
• Convenience sampling
• Snowball sampling

Plus any combination of them!

Page 48: Session 1   general overview

A good sample design requires expert knowledge

• Sample design can be extremely complex.
• A good summary is provided by Duflo (2006): the power of the design is the probability that, for a given effect size and a given statistical significance level, we will be able to reject the hypothesis of zero effect. Sample sizes, as well as other (evaluation and sample) design choices, affect the power of an experiment.
• There is a lot to consider:
  – the impact estimator to be used;
  – the test parameters (power level, significance level);
  – the minimum detectable effect;
  – characteristics of the sampled (target) population: population sizes at potential levels of sampling, means, standard deviations, and intra-unit correlation coefficients (if multi-stage sampling is used); and
  – the sample design to be used for the sample survey.

Page 49: Session 1   general overview

The basic process is this…

The power calculation combines:
• the level of power,
• the level of the hypothesis tests (significance),
• the correlations in outcomes within groups (ICCs), and
• the means and variances of outcomes, together with the MDES (minimum detectable effect size).
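A minimal sketch of how these inputs combine, using the standard two-arm minimum-detectable-effect formula inflated by the design effect 1 + (m − 1)·ICC for cluster samples. All numbers are illustrative, not the Water Hibah parameters:

```python
import numpy as np
from scipy.stats import norm

def mde(n_per_arm, sigma, icc=0.0, cluster_size=1, alpha=0.05, power=0.8):
    """Minimum detectable effect for a two-arm difference in means."""
    deff = 1 + (cluster_size - 1) * icc          # design effect for clustering
    se = sigma * np.sqrt(2.0 / n_per_arm * deff)  # SE of the difference
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) * se

# Illustrative inputs: 3,500 households per arm in clusters of 14, ICC 0.05.
print(mde(n_per_arm=3500, sigma=1.0, icc=0.05, cluster_size=14))
```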

Page 50: Session 1   general overview

The reality is…

• Most times, you do not have all of this information: use existing studies, other data sources, and assumptions.
• You may work backwards from a desired power level, or from the expected level of impact that you want to be able to detect.
• Often you are working backwards to fit a certain budget!
• Build in the marginal costs for each stage of sampling.
• Then decide whether or not to pursue the project.

Page 51: Session 1   general overview

An example: Socio-economic impact of the Water Hibah (endline)

• Outcome indicators: we have simplified versions of them in the baseline, but they have been modified for the endline → use the baseline dataset to calculate ICCs.
• The highest variation in outcome indicators was identified across villages (localities) → the primary sampling unit is the village.
• The number of households in the village was found to improve the efficiency of the design → stratify villages based on the number of households.
• Marginal costs of a village visit vs. a household visit were included.
• The final sample design is stratified multi-stage sampling with 250 villages and 7-14 households per experimental group = 7,000 households.
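Calculating ICCs from the baseline, as the first bullet suggests, can be done with the one-way ANOVA estimator. A sketch with hypothetical file and column names:

```python
import pandas as pd

def icc_anova(df, group_col, y_col):
    """One-way ANOVA intra-cluster correlation:
    ICC = (MSB - MSW) / (MSB + (m0 - 1) * MSW),
    where m0 is the unbalanced-design average cluster size."""
    groups = df.groupby(group_col)[y_col]
    k, n = groups.ngroups, len(df)
    grand = df[y_col].mean()
    n_j = groups.size()
    msb = (n_j * (groups.mean() - grand) ** 2).sum() / (k - 1)
    msw = ((df[y_col] - groups.transform("mean")) ** 2).sum() / (n - k)
    m0 = (n - (n_j ** 2).sum() / n) / (k - 1)
    return (msb - msw) / (msb + (m0 - 1) * msw)

# Hypothetical usage with a baseline file and column names:
# baseline = pd.read_csv("hibah_baseline.csv")
# print(icc_anova(baseline, group_col="village", y_col="water_expenditure"))
```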

Page 52: Session 1   general overview

What can IndII Do?

Ensure your M&E systems are relevant and reliable…

Page 53: Session 1   general overview

Data: Coordinate IE & Monitoring Systems

Projects/programs regularly collect data for management purposes, and this information is needed for impact evaluation. Typical content:
• Lists of beneficiaries
• Distribution of benefits
• Expenditures
• Outcomes
• Ongoing process evaluation

Page 54: Session 1   general overview

Manage M&E for results

Prospective evaluations are easier and better with reliable M&E:
• Tailor policy questions.
• Produce precise, unbiased estimates.
• Use your resources wisely: better methods, cheaper data.
• Provide timely feedback and program changes.
• Improve results on the ground.

Page 55: Session 1   general overview

Evaluation uses monitoring information to verify:

• who is a beneficiary,
• when they started, and
• what benefits were actually delivered.

A necessary condition for a program to have an impact: benefits need to get to the targeted beneficiaries.

Page 56: Session 1   general overview

Overall Messages

Impact evaluation is useful for:
• Validating program design
• Adjusting program structure
• Communicating to the finance ministry and civil society

Evaluation design: a good evaluation requires estimating the counterfactual:
• What would have happened to beneficiaries if they had not received the program?
• We need to know all the reasons why beneficiaries got the program and others did not.

Page 57: Session 1   general overview

Other messages

• Good M&E is crucial not only to effective project management but can be a driver of reform.
• Monitoring and evaluation are separate, complementary functions, but both are key to results-based management.
• Have a good M&E plan before you roll out your project and use it to inform the journey!
• Design the timing and content of M&E results to further evidence-based dialogue.
• Good monitoring systems and administrative data can improve IE.
• Prospective designs are easiest to use.
• Stakeholder buy-in is very important.

Page 58: Session 1   general overview

Outline: topics being covered

Tuesday - Session 1: INTRODUCTION AND OVERVIEW
1) Introduction
2) Why is evaluation valuable?
3) What makes a good evaluation?
4) How to implement an evaluation?

Wednesday - Session 2: EVALUATION DESIGN
5) Causal Inference
6) Choosing your IE method/design
7) Impact Evaluation Toolbox

Thursday - Session 3: SAMPLE DESIGN AND DATA COLLECTION
9) Sample Designs
10) Types of Error and Biases
11) Data Collection Plans
12) Data Collection Management

Friday - Session 4: INDICATORS & QUESTIONNAIRE DESIGN
1) Results chain/logic models
2) SMART indicators
3) Questionnaire Design

Page 59: Session 1   general overview

Thank You!

Page 60: Session 1   general overview

This material constitutes supporting material for the "Impact Evaluation in Practice" book. It is made freely available, but please acknowledge its use as follows: Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B. and Vermeersch, C. M. J., 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington, DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.

MEASURING IMPACT: Impact Evaluation Methods for Policy Makers