
Using Randomized Evaluations to test Development Effectiveness

Dean Karlan, Kamilla Gumede, Annie Duflo

3ie Conference on Perspectives on Evaluations,

Cairo, April 1, 2009

Why Evaluate?

• Three simple reasons:

1. To motivate those with money to give more

2. To know where to spend limited resources

3. To know how to improve programs

Peter Singer

• Utilitarian

• Would you save a child drowning in a lake if it would cost you $100 in ruined clothing or a missed appointment?

• Would you send $100 right now to an NGO in a poor country to save a child?

• Why are these questions not the same?

– Some say because “who really knows if my $100 can save a child? Maybe it will just get wasted.”

– This is a common excuse for inaction.

• Evaluation rebuts this.


Key Themes

• Evaluation is an investment, not a cost

• Context matters: replication and theory are needed to make reliable prescriptions

• Quantitative vs. qualitative is a false debate: it reflects confusion between survey methodology and measurement of the counterfactual

Why is More Evidence Needed?

• Knowing what to do (several ideas seem good). Example: can community empowerment or external auditors best reduce corruption in road construction?

• Sometimes conventional wisdom needs to be rethought. Examples: “group liability is an essential and necessary aspect of successful microfinance schemes”; “consumer lending is not beneficial, so focus on entrepreneurial credit.”

• Evaluation can teach us how to improve design. Examples: reminders to save, marketing of rainfall insurance, providing computer games to improve math skills.

Different Types of Evaluation

(1) Process evaluation

• Audit and monitoring

• Did the intended policy actually happen?

• How many people were reached, books distributed, etc.?

(2) Impact evaluation

• What effect (if any) did the policy have?

• How would individuals who did benefit from the program have fared in its absence?

• How would those who did not benefit have fared if they had been exposed to the program?

Why is Measuring Impact so Hard?

• To know the impact of a program, we must answer a counterfactual question: how would each individual have fared without the program? But we can’t observe the same individual both with and without the program.

• We need an adequate comparison group: individuals who, except for the fact that they were not beneficiaries of the program, are similar to those who received it.

• Common approaches: before-and-after comparisons and cross sections. But programs are done in a particular place at a particular time for a reason.

• Even more sophisticated approaches can’t control for unobservables: a study of 1.9 million voters matched on 10 characteristics still gave the wrong policy conclusion (Arceneaux, Gerber, Green, 2004).

• There is no simple way of determining when alternatives to randomization will give you the right answer. (The sketch below makes the counterfactual problem precise.)
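To make the counterfactual problem precise, here is a standard potential-outcomes sketch (textbook notation, not from the slides). Let Y_i(1) and Y_i(0) be individual i’s outcomes with and without the program, and T_i the treatment indicator. The individual impact Y_i(1) − Y_i(0) is never observed, because only one of the two outcomes is. When T_i is assigned randomly, it is independent of (Y_i(1), Y_i(0)), so the difference in observed group means identifies the average treatment effect:

E[Y_i | T_i = 1] − E[Y_i | T_i = 0] = E[Y_i(1)] − E[Y_i(0)] = ATE

Without random assignment, the left-hand side picks up a selection-bias term, E[Y_i(0) | T_i = 1] − E[Y_i(0) | T_i = 0], which is exactly what an adequate comparison group is meant to eliminate.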

Randomized Evaluation

• Assign units to treatment and control randomly

• By construction, program beneficiaries are not more motivated, richer, or more educated than non-beneficiaries

• Gives clean results that are hard to manipulate or dispute (see the sketch below)

• Randomization can be incorporated in many different ways

• Must be planned ex ante

• Can be done ethically (in many cases it is more ethical, as it is fair and avoids favoritism, nepotism, politicking, etc.)

• Can measure externalities or spillovers
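To make this concrete, here is a minimal sketch in Python of the analysis a randomized evaluation enables: random assignment followed by a simple difference in means between treatment and control. The data are simulated purely for illustration; none of the names or numbers come from the slides.

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n = 1000

# Random assignment: exactly half treated, half control
treated = rng.permutation(np.repeat([1, 0], n // 2))

# Simulated outcome with a true effect of 0.2 (illustration only)
outcome = rng.normal(loc=0.0, scale=1.0, size=n) + 0.2 * treated

# With random assignment, the difference in means estimates the impact
impact = outcome[treated == 1].mean() - outcome[treated == 0].mean()
t_stat, p_value = stats.ttest_ind(outcome[treated == 1], outcome[treated == 0])
print(f"estimated impact = {impact:.3f}, p-value = {p_value:.3f}")

Because assignment is random, no regression adjustment is needed for an unbiased estimate; the comparison of group means is the whole analysis.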

How to Introduce Randomness

1. Lottery, e.g. whether you get into the training program

2. Randomize the order of phase-in of a program, e.g. the order in which you clean up springs

3. Randomly encourage some more than others, e.g. offer a savings commitment scheme to some bank account holders

(A sketch of all three designs follows below.)
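A hedged sketch, again in Python, of how each of the three designs above could be implemented; the `units` list and the group sizes are hypothetical, not from the slides.

import random

random.seed(1)
units = [f"unit_{i:02d}" for i in range(20)]  # hypothetical applicants/sites

# 1. Lottery: randomly select who gets into the program
lottery_winners = set(random.sample(units, k=10))

# 2. Phase-in: everyone is treated eventually; randomize only the order
phase_order = random.sample(units, k=len(units))  # a random permutation
first_wave, second_wave = phase_order[:10], phase_order[10:]

# 3. Encouragement: all are eligible; a random subset gets the extra offer
encouraged = set(random.sample(units, k=10))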

Constraints and Opportunities for Evaluation

• Not all programs can (or should) be evaluated using the randomized methodology

• It is hard to evaluate monetary policy or freedom of the press

• Projects that are most straightforward to evaluate:
– Serve specific beneficiaries (individuals or communities)
– Have a limited budget or organizational constraints that give a natural rationale for phasing in

• Opportunities to do a randomized evaluation:
– Beta testing
– A pilot project
– Expanding a project into a new area over several years
– A popular program is oversubscribed
– Testing the impact of a national program if take-up is not yet 100 percent

• Can measure outcomes that might seem hard to measure:
– Empowerment of girls
– How to reduce corruption

Post-Conflict Sierra Leone


Sierra Leone Background

• Devastating 10-year civil war.

• The conflict was intergenerational, not ethnic, reflecting a lack of power and control among the young as well as economic mismanagement.

• Many decision makers were killed; schools and other infrastructure were destroyed.

• The Government of Sierra Leone, with the World Bank, is implementing an ambitious decentralization program.

• It is piloting a Community Driven Development (CDD) program that gives money to villages, which decide their own priorities for investment. It includes processes designed to promote participation of excluded groups.

Sierra Leone Evaluation

• 250 villages in two districts: half get the CDD pilot, half do not

• Outcomes: trust, participation

• Measures of trust: frequency of common actions that require trusting a neighbor

• Measures of participation:
– follow common decisions
– observe the role of youth, women, and outsiders
– how often do they speak, and do people respond?
– are outcomes more linked to the preferences of youth or of elders?

(A sketch of the village-level randomization follows below.)
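As an illustration of the design, here is a sketch of village-level (cluster) randomization stratified by district. Only the 250-villages, half-and-half structure comes from the slide; the district names, village identifiers, and even split per district are assumptions for illustration.

import random

random.seed(7)

# Hypothetical village lists; the slide says 250 villages in two districts
districts = {
    "district_1": [f"d1_village_{i}" for i in range(125)],
    "district_2": [f"d2_village_{i}" for i in range(125)],
}

# Stratify by district so each district is split evenly between arms
assignment = {}
for district, villages in districts.items():
    cdd_villages = set(random.sample(villages, k=len(villages) // 2))
    for v in villages:
        assignment[v] = "CDD pilot" if v in cdd_villages else "comparison"

Randomizing whole villages (rather than individuals) fits outcomes like trust and participation, which are properties of communities and would spill over within a village.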

Women’s Empowerment in India


Women’s Empowerment

• Does having a woman leader make a difference?

• In 1992, India devolved power over expenditures on local public goods to Gram Panchayats. One third were randomly chosen to be headed by women.

• Many believed the policy had little impact, as women leaders appeared to be deferring to their husbands.

• We collected data on women’s preferences and on public works carried out in reserved and unreserved villages in West Bengal and Rajasthan.

• Women leaders invested more in goods that were of higher priority to women in their state.

• Perception surveys suggest that even when women are doing a better job than men (e.g. on water quality), they are perceived as doing a worse job. This is a potential justification for quotas of some kind.

Comparative Costs in Education


Calculating comparative costs

• The MDGs for education seek 100% participation in primary school and gender equality in education participation more generally

• Many different approaches have been tried to increase enrolment

• Reducing the cost of school is clearly effective in increasing attendance:
– Progresa
– Providing free uniforms to school children
– School meals for preschoolers

• Health interventions can have an important impact, and be very cost-effective

• Gender equity:
– Do women teachers make a difference?
– Scholarships for girls
– General programs had an important impact on girls

• Quantity and quality

Which proved true, which false?

1. Giving children in poor schools computer math games increases math test scores.

2. Group liability in microfinance produces higher repayment rates than individual-liability loans.

3. Monitoring by locals is more effective in combating corruption in local community projects than bringing in an outside auditor from the government auditing agency (Indonesia).

Impact Evaluations: A Public Good

• Evaluation results benefit everybody. Knowledge of what works and what does not is a global public good.

• If it is difficult for donors to distinguish good and bad evaluations, promoters will prefer biased or imprecise methods and select the best result for their program.

• Sponsors understand the game that is being played and rationally discount any estimate presented to them.
– Money flows to the most eloquent, impassioned, and connected?

• If nobody knows what works and what does not, efficiency and support for development assistance fall.

• Randomized evaluation can cut through this:
– Results are difficult to manipulate
– A simple standard lets sponsors recognize quality
– Results can be presented simply, with no fancy econometrics

What We Have Learned So Far

Kamilla Gumede

3ie Conference on Perspectives on Evaluations,

Cairo, April 1, 2009

J-PAL has 181 completed and ongoing projects in 30 countries

Sectors we work in


General findings

• Aid effectiveness: there is a lot to learn from randomized experiments; impacts cannot be assumed.

• Poor, rational … and human: randomized evaluations help shape economic theory.

• Important cross-sectoral similarities.

Aid effectiveness

• Impacts cannot be assumed; they should be rigorously tested.

• We lack intuition about sub-program cost-effectiveness.

• Cheap, effective, and scalable anti-poverty programs do exist.

Poor, rational … and human

• Procrastination affects the poor too (much)

• Demand for commitment devices

• Something special about zero


Cross-sectoral similarities

• Problem of provider absence

• Take-up (incentives, commitment needed)

• Sensitivity to positive prices


Conclusion

• Randomized experiments are feasible.

• A new generation of studies goes beyond impacts to look inside the ‘black box’ and understand why projects work (or don’t) and for whom.

• Impacts cannot be assumed, must be tested.

• Many similarities across sectors and locations.

Working with implementing organizations

• Randomized controlled trials can be used to evaluate impact (an intervention vs. no intervention)

• But also to improve products and programs and to understand what works best (one program variation vs. another)

• They can be very easy to implement and in some cases fairly cheap, e.g. product-innovation testing in microfinance

How can RCTs be useful for practitioners?

Problem | Solution | Implementation
Vendors caught in a debt trap | Safe savings products; financial literacy | Voluntary? Commitment mechanisms?
Defaults due to health events | Health prevention? Insurance? | Independent insurance? Bundled product?
Farmers often do not make productive investments | Innovative financial product design? Training? |

Our partners

• We work with different types of partners: NGOs, microfinance institutions, governments and government bodies, and international organizations

• Some examples:
– Pratham, an Indian NGO focused on education across India
– Green Bank, a microfinance institution in the Philippines
– Care International in Ghana, Malawi, and Uganda
– The Rajasthan police in India
– The Government of Sierra Leone

How do we work with organizations

• First step: talk to the partner organization and jointly identify issues they deal with and interventions they’d like to test

• Seva Mandir, an Indian NGO, was concerned about health status in the district in which it works
– We conducted a baseline survey to identify the main health issues in the district and relevant interventions to test

• Green Bank in the Philippines was concerned about a specific problem: why don’t people save more?
– We brainstormed about possible product designs before deciding to evaluate one

• In other cases, organizations are interested in evaluating the impact of a project they have already designed
– Micro-credit programs
– Care International’s Village Savings and Loan Associations program

How do we work with organizations

• Second step: take the organization’s staff through the steps of implementing an RCT and its implications for the organization

• Third step: identify the best way to introduce a randomized design without disturbing the organization’s objectives
– In India, one of the main issues we identified in the baseline was a very low rate of immunization
– We designed a health camp + incentives program
– There were not enough funds to run the program everywhere: it had to be implemented in a limited number of villages
– In the Philippines, the organization was not sure whether the new product would work and wanted to pilot it first
– Randomizing was a way to pilot the program on a small scale, in such a way that it could be rigorously tested

How do we work with organizations

• Fourth step: identify the right area and the right sample frame
– Either in new areas or among existing beneficiaries, depending on the evaluation
– Need to do the evaluation in a representative area
– Need to understand what population is targeted by the program, so that the study is carried out among such populations

How do we work with organizations

• Fifth step: start the evaluation!
– Conduct a baseline
– Randomize (i.e. randomly assign individuals, or communities, to receive the new program or not)
– Start the intervention in treatment groups
– After some time, conduct a follow-up survey

• Note that in some cases, the data we are interested in can be found in the organization’s database, e.g. when we are interested in a new savings product aiming to get people to save more

(A sketch of a baseline balance check follows below.)
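One step worth making explicit between the baseline and the intervention is a balance check: comparing baseline covariates across treatment and control to confirm the randomization produced similar groups. This is a minimal sketch; the slides do not prescribe it, and the file name and column names here are hypothetical.

import pandas as pd
from scipy import stats

# Hypothetical baseline data with a 0/1 'treated' assignment column
df = pd.read_csv("baseline_survey.csv")

for covariate in ["age", "household_income", "years_schooling"]:
    t = df.loc[df["treated"] == 1, covariate]
    c = df.loc[df["treated"] == 0, covariate]
    _, p_value = stats.ttest_ind(t, c, nan_policy="omit")
    print(f"{covariate}: treatment mean {t.mean():.2f}, "
          f"control mean {c.mean():.2f}, p = {p_value:.2f}")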

How do organizations use our results

• Pratham Read India: the program has been scaled up to more than 20,000,000 children in India, and will scale to 60,000,000

• Seva Mandir immunization program (in India)

• The flip-chart program was stopped after it was proven not effective

Scaling up beyond those organizations

• Once we have findings about what works and what does not in different contexts, one of our goals is to encourage scaling up of those ideas

• Deworm the World
– An NGO created to scale up a deworming program
– Deworming was evaluated in Kenya and proved very effective at keeping children in school

• Microfinance products:
– Through practitioners’ manuals and technical assistance
– We are starting such an exercise for text reminders that encourage people to save
