IFPRI-RIMS Workshop

Nicholas Minot (IFPRI/Uganda)

Atsuko Toda (IFAD/Vietnam)

Nguyen Ngoc Ahn (DEPOCEN)

RIMS+ surveys:

A tool for project design and evaluation

Background on RIMS in Vietnam

Results and Information Management System (RIMS): third-level results are associated with project impact on child malnutrition and household living standards.

The IFPRI project focuses on the household survey used to collect third-level results.

Background on RIMS

RIMS survey guidelines

Should be implemented for large, national IFAD projects

Should be done before, and at end of project

Sample size: 900 beneficiary households

Returning to same households not recommended

Concern about concentration of IFAD program efforts

Administrative complications of finding old households

Background on RIMS

RIMS questionnaire

Objective is to measure assets and

child nutrition

Divided into three sections:

Section 1 – Household demographics

Section 2 – Housing, assets, and food security

Section 3 – Anthropometry

Background on RIMS

Standardization of RIMS questionnaire

Ensures comparability across countries

Makes analysis relatively quick

Assures quality

But little flexibility in questionnaire design & analysis

Does not collect intermediary indicators

Changes in RIMS+

Overview of changes

Changes | Rationale
1. Expanded questionnaire | Collect additional information to diagnose farmer constraints, improve design of interventions, and measure impact on intermediate indicators
2. Use of control group | Better measurement of impact of project by controlling for broader changes in rural conditions
3. Additional training and supervision | Improve quality of data
4. GPS to geo-reference households | Facilitate return to same households (panel) and better supervision of enumerators
5. Flexible questionnaire & analysis | Address information needs of the IFAD project and IFAD planning in general

Changes in RIMS+

1. Expanded questionnaire

RIMS+ | RIMS | New info in RIMS+
A. Member characteristics | 1. Household demographics | + ethnicity, school attendance, & reasons for not attending
B. Housing | 2. Survey questions | + roof material, ownership status, location of toilet
C. Assets | 2. Survey questions | + agricultural equipment
D. Land | (no info) | Farm size, ownership, irrigation, distance
E. Crop production | (no info) | Production, sales, & prices for 25 crops; cost of 6 inputs
F. Livestock & fisheries | (no info) | Herd size, sales, & costs for 12 types of animals, use of vet services, type of feeding

Changes in RIMS+

1. Expanded questionnaire (continued)

RIMS+ | RIMS | New info in RIMS+
G. Extension & market access | (no info) | Access to extension, who uses, cooperatives, details of sales, distance to markets
H. Non-farm activities | (no info) | Income and business expenses for 11 non-farm income sources, gender roles
I. Food security | 2. Survey questions | + coping strategies and quality of diet
J. Credit & borrowing | (no info) | Access to credit, info on loans received
K. Socio-Economic Development Plan | (no info) | Knowledge of and participation in SEDP process
L. Risk & vulnerability | (no info) | Perceived risk of six natural disasters
M. Anthropometry | 3. Anthropometry | No new information

Changes in RIMS+

2. Use of control group

Control group is 300 households that are similar to

beneficiaries but not in project area

Useful to control for changes in rural areas due to other factors

Example | Beneficiary households | Control households | Impact according to current before-after comparison | Actual impact using info from control group
Example 1 | Income rises 8% | Income rises 4% due to economic growth | Suggests that project caused 8% increase in income | Actually, only a 4% increase due to project
Example 2 | Income does not change | Income falls 4% due to drought | Suggests that project had no effect | Actually, 4% increase in income due to project
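Both examples reduce to one subtraction: the change among beneficiaries minus the change among control households. A minimal Python sketch follows; the function name `actual_impact` is an illustrative assumption, and changes are expressed in percentage points.

```python
def actual_impact(beneficiary_change, control_change):
    """Project impact in percentage points: change among beneficiaries
    minus the change among control households over the same period."""
    return beneficiary_change - control_change


# Example 1: income rises 8% for beneficiaries and 4% for controls (economic growth).
example_1 = actual_impact(8, 4)
# Example 2: income is flat for beneficiaries and falls 4% for controls (drought).
example_2 = actual_impact(0, -4)

print(example_1, example_2)
```

In both cases the simple before-after comparison (8 and 0) would have missed the 4-point effect that the control group reveals.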

Changes in RIMS+

2. Use of control group (continued)

[Figure: outcome indicator plotted over time (before project, after project) for the control group and beneficiary households. A further line is the hypothetical path of beneficiary households without the project, based on growth in the control group. The figure marks both the before-after difference and the actual effect of the project.]

Changes in RIMS+

3. Additional training and supervision

Because questionnaire is longer and somewhat more complicated, need for additional training & supervision of enumerators

IFPRI & DEPOCEN prepared detailed enumerator manual

DEPOCEN provided 5 days of training plus testing of questionnaire

DEPOCEN also provided additional supervision during data collection, particularly important in first week of data collection

Changes in RIMS+

4. Use of GPS units

GPS units are sometimes used in

RIMS surveys

Main purpose is to make it easier

to find household to interview in

later round of survey

Additional benefit of verifying

that enumerators have visited

households in village

Changes in RIMS+

5. Flexible questionnaire & analysis of results

Original RIMS is analyzed in a “black box”

Advantage is that analysis is fast, reliable, and comparable

But little opportunity to customize results for project

RIMS+ questionnaire can be customized for project

Type of IFAD project | Possible customization of questionnaire
Farmer training & extension | Access to extension, sources of info, perception of usefulness, adoption of advice, yield
Linking farmers to market | Travel time to markets, types of buyers, degree of competition, prices received, share sold
Promotion of non-farm enterprises | Number & composition of NFEs, profitability, training needs, perceived constraints, factors affecting success
Improved access to credit | Sources of credit, interest rates paid, use of credit, reasons for use of informal credit, factors affecting repayment rate

Changes in RIMS+

5. Flexible questionnaire & analysis of results

RIMS+ analysis can be customized to address questions relevant for project design & implementation:

Is access to extension services different for female-headed farm households?

Can pepper be successfully grown by small-scale farmers with limited resources?

Is targeting landless households more (or less) pro-poor than targeting farmers with less than 0.5 hectares?

Is satisfaction with project services higher in one district than in another?

Expanded questionnaire

More information and more complicated questionnaire

Requires additional training and supervision

Longer interview time (double at least)

Requires a new data entry program

Separate data entry in CSPro for 1,200 questionnaires

At least 2 days to prepare the CSPro data entry form

Another 2 days of training in CSPro data entry, in addition to RIMS training

Increased complexity in analysis and reporting

Cost and implementation issues

Use of control group

Increased workload with financial implications (additional 300 non-project households)

Implementing survey in non-project area is more difficult due to

logistics, cooperation

Data entry in both RIMS and CSPro

RIMS software to enter RIMS core questions for 900 beneficiary households

Data entry in CSPro of the full questionnaire for the 1,200-household sample

Additional training/supervision

Project managers do not see immediate benefit


Use of GPS


Increased training time (1/2 day) and additional time at

household (10 minutes)

Not easy to use due to language barrier

Additional burden due to the fact that interviewers already

have to carry weight and scale


Component | First-time costs | Per survey costs
Expanded questionnaire in data collection | Already carried out under IFAD-IFPRI Partnership | Interview time is approximately doubled
Use of control group | No fixed cost | Increases field costs by 50-100%
Additional training & supervision | Enumerator manual prepared under Partnership | Approximately US$ 10-15k per survey
Use of GPS units | Cost to purchase = US$ 100 x 20 units = US$ 2000 | Modest: GPS units can be shared across projects or rented
Analysis of data | Large initial cost of preparing analysis programs, already undertaken by Partnership | For standard analysis, negligible. For customized analysis, requires Stata skills

Cost estimates

Questions

Results of Vietnam RIMS+

Which crops are pro-poor?

How does crop commercialization vary across farmers?

Do female-headed farm households have equal access to modern inputs?

How important is income from non-farm activities?

How do farmers perceive the risks of natural disasters?

Is food security threatened by crop commercialization?

How involved are farmers in the preparation of the Socio-Economic Development Plans?

Will raising farmer income improve child nutrition?

Which crops are pro-poor?


• Rice is grown by the majority of the poor, but by fewer high-income households

• Maize, groundnut, red onion, bananas, tea, and vegetables are grown by both poor and non-poor

• Avocado, mango, durian, pepper, sugarcane, coffee, and cashew are grown disproportionately by high-income farms

• This is not to say they can’t be grown by poor farmers, but any untargeted support to these crops will not be pro-poor

Is input use less among female-headed households?


• Not much evidence that input use per hectare is lower

• But smaller farm sizes lead to smaller crop production and lower

income

What is the importance of non-farm income?


• Even the 20% of farms with the smallest area (less than 0.10 hectares) earn the bulk of their income from crop production

• 45% of the smallest farms rent, sharecrop, borrow, or illegally use other land

How do farmers perceive the risk of different natural

disasters?


• Perception of disaster risk varies by province

• Also, perception of likely losses is greater for poor households

Is food security threatened by commercialization?


• Commercialization is defined as the share of the value of crop

production that is sold

• Relationship holds even after controlling for per capita income

and farm size in regression analysis

Will raising farmer income improve child nutrition?


• Yes, but effect is weak

• Many other variables influence child nutrition: sanitation, health care,

education, child rearing practices, etc.

[Figure: lowess curves of length/height-for-age Z-score (haz06) and weight-for-length/height Z-score (whz06) against the log of per capita income (lnpcinc).]

Summary & conclusions

RIMS+ surveys probably not suitable for all IFAD projects because of additional costs

Conditions under which it is most suitable:

IFAD project design is flexible and can be revised in light of new information from the survey

IFAD project focuses on a new topic or new region, so there is a need for information

There are gaps in knowledge about farm household livelihoods and behavior relevant to the project

IFAD project is relatively large, implying an adequate M&E budget

When is RIMS+ most suitable?

Additional issues

Size of control group

At the moment, the sample is 900 treatment households (to meet the standard RIMS requirement) and 300 control households

But typically the control group is similar in size to the treatment group

It would reduce costs to develop a Core Module and additional modules that are selected depending on project (e.g. agricultural marketing, credit, extension)

RIMS+ would require additional capacity building for IFAD project staff

Project has prepared an enumerator manual and data entry programs and could also prepare implementation guidelines if needed

Summary & conclusions


Objective of Impact Evaluation

Measure the effect of the program on its beneficiaries (and eventually on its non-beneficiaries) by answering the counterfactual question:

How would individuals who participated in a program have fared in the absence of the program?

How would those who were not exposed to the program have fared in the presence of the program?

Two main problems arise: confounding factors and selection biases.


Comparing averages

Individual-level measure of impact: what would the outcome (e.g. farm income) have been had he/she not participated in the program (in our case, the treatment)?

Compare the individual with the program to the same individual without the program, at the same time?

- Can never observe both: a missing data problem.

Instead: Average impact on given groups of individuals

Compare mean outcome in group of participants (Treatment group) to mean outcome in similar group of non-participants (Control group)

Average Treatment effect on the treated (ATT):
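In standard notation (an assumption here, not necessarily the notation used in the workshop), let $Y_1$ and $Y_0$ be a household's outcome with and without the program, and let $D = 1$ mark participants. The ATT can then be written as:

```latex
\mathrm{ATT} = E[\,Y_1 - Y_0 \mid D = 1\,]
             = E[\,Y_1 \mid D = 1\,] - E[\,Y_0 \mid D = 1\,]
```

The first term is observed among participants. The second is the counterfactual that can never be observed directly, which is why a comparable control group is needed to estimate it.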


Building a control group

Compare what is comparable.

“Treatment” and “Control” groups must look the same in the absence of the program.

Generally, those individuals who benefit from the program initially differ from those

who don’t.

External selection: programs are explicitly targeted (Particular areas, Particular individuals).

Self selection: the decision to participate is voluntary.

Problem with comparing beneficiaries and non-beneficiaries: the difference can be attributed to the impact, to the original differences, or to both.

SELECTION BIAS - when individuals or groups are selected or self select for

treatment on characteristics that may also affect their outcomes.


[Figure (from Bernard 2006): Program selection does not lead to selection bias. Selection splits the initial population, shown by income quintile from Quintile I (poorer) to Quintile V (richer), into a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact = Y Exp – Y Control.]


[Figure: Program selection leads to selection bias. Selection splits the initial population, shown by income quintile from Quintile I (poorer) to Quintile V (richer), into a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact ≠ Y Exp – Y Control.]


“Sign” of the selection bias (1)

Program targeted on “worse-off” households

[Figure: outcomes for the treatment and control groups. The observed difference is negative; the actual impact is marked separately.]


“Sign” of the selection bias (2)

Program targeted on “better-off” households

[Figure: outcomes for the treatment and control groups. The observed difference is very large; the actual impact is marked separately.]
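The two targeting cases can be reproduced with a small numerical sketch. All incomes and the assumed program gain (`TRUE_IMPACT`) are hypothetical, chosen only to show how targeting changes the sign and size of a naive treatment-versus-control comparison.

```python
from statistics import mean

# Hypothetical baseline incomes for ten households (illustrative numbers only).
baseline = [10, 12, 14, 16, 18, 30, 32, 34, 36, 38]
TRUE_IMPACT = 5  # assumed gain for every household that receives the program


def naive_difference(treated_idx):
    """Mean outcome of treated households minus mean outcome of the rest."""
    treated = [baseline[i] + TRUE_IMPACT for i in treated_idx]
    control = [y for i, y in enumerate(baseline) if i not in treated_idx]
    return mean(treated) - mean(control)


worse_off = naive_difference({0, 1, 2, 3, 4})   # program targets the poorest half
better_off = naive_difference({5, 6, 7, 8, 9})  # program targets the richest half

print(worse_off, better_off)
```

With these numbers the naive comparison is negative when the program targets worse-off households and far larger than the assumed gain of 5 when it targets better-off households, even though the true impact is identical in both cases.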

Impact evaluation for policy decisions

Impact evaluations are needed to inform policy decisions, ranging from

curtailing inefficient programs,

to scaling up interventions,

to adjusting program benefits,

to selecting among various program alternatives.

The Mexican Progresa/Oportunidades evaluation became influential because

the program was innovative in nature

its impact evaluation provided credible and strong evidence


Role of qualitative data

Qualitative data are a key supplement to quantitative impact evaluations, providing complementary perspectives on a program’s performance.

Employ mixed methods (Bamberger, Rao & Woolcock 2010).

Approaches include focus group discussions (FGD), expert elicitation, and key informant interviews (Rao and Woolcock 2003).

Useful in three ways:

1. Can be used to develop hypotheses as to how and why the program would work

2. Before quantitative IE results become available, qualitative work can provide quick insights into what is happening in the program

3. In the analysis stage, it can provide context and explanations for the quantitative results


Focusing on quantitative methods

A central feature of IE is the use of longitudinal data to apply “difference-in-differences” or “double difference” methods.

These methods rely on baseline data collected before project implementation and follow-up data collected after it starts, which together give a “before/after” comparison.

Data are collected from households that receive the program and from those that do not (“with the program” / “without the program”).


Double difference methods: continued

Why are both “before/after” and “with/without” data necessary?

Suppose data were collected only from beneficiaries.

Suppose that, between the baseline and follow-up, some adverse event occurs, so that the benefits of the program are more than offset by the damage from the bad event. These effects would show up in the difference over time in the intervention group, in addition to the effects attributable to the program.

More generally, restricting the evaluation to only “before/after” comparisons makes it impossible to separate program impacts from the influence of other events that affect beneficiary households.

To guard against this, add a second dimension to the evaluation design that includes data on households “with” and “without” the program.


Illustration of double difference

Survey round | Intervention group (Group I) | Control group (Group C) | Difference across groups
Follow-up | I1 | C1 | I1 – C1
Baseline | I0 | C0 | I0 – C0
Difference across time | I1 – I0 | C1 – C0 | Double-difference: (I1 – C1) – (I0 – C0)
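The double difference can be computed from four group means. The sketch below uses hypothetical household incomes and checks that differencing across groups first, or across time first, gives the same number.

```python
from statistics import mean

# Hypothetical household incomes by survey round (illustrative numbers only).
I0 = mean([100, 110, 90])   # intervention group, baseline
I1 = mean([125, 130, 105])  # intervention group, follow-up
C0 = mean([105, 95, 100])   # control group, baseline
C1 = mean([110, 100, 105])  # control group, follow-up

across_groups = (I1 - C1) - (I0 - C0)  # difference across groups, then over time
across_time = (I1 - I0) - (C1 - C0)    # difference over time, then across groups

print(across_groups, across_time)
```

Here the intervention group gains 20 and the control group gains 5, so the double difference attributes 15 to the program under the assumption that both groups would otherwise have followed the same trend.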


Randomization

With random program assignment, all individuals have the same chance of receiving the program.

With a well-done randomized evaluation design, beneficiaries and non-beneficiaries have, on average, the same observed and, more important, unobserved characteristics (the latter being more difficult to control for).

In this way a credible basis for comparison is established, freed from selectivity concerns, and the direction of causality is certain.

A further advantage of a randomized design is that program impact is easy to calculate and easier to understand and explain.

Heckman and Smith (1995), however, point to two potential problems:

Randomization bias: the process of randomization itself leads to a different beneficiary pool than would otherwise have been treated

Substitution bias: non-beneficiaries obtain similar treatments from different sources, a form of “contamination”


Matching

Matching methods of program evaluation construct a comparison group by “matching” treatment households to comparison group households based on observable characteristics.

The impact is estimated as the average, across treatment households, of the difference between each treatment household’s outcome and a weighted average of the outcomes of similar comparison group households from the matched sample.

Matching methods differ in the selection of the matched comparison and in how

these weighted average differences in outcomes are constructed.

One popular approach is propensity score matching (PSM).
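As a rough illustration of the matching idea, the sketch below pairs each treatment household with the single comparison household closest in farm size and averages the outcome gaps. This is a simplified nearest-neighbour match on one observed characteristic, not propensity score matching, and all data are hypothetical.

```python
from statistics import mean

# (farm size in hectares, income) for hypothetical households, illustrative only.
treatment = [(0.5, 120), (1.0, 150), (2.0, 220)]
comparison = [(0.4, 100), (0.8, 125), (1.1, 140), (1.9, 190), (3.0, 260)]


def nearest_match(farm_size, pool):
    """Return the comparison household closest in farm size (with replacement)."""
    return min(pool, key=lambda hh: abs(hh[0] - farm_size))


# Outcome gap between each treatment household and its matched comparison.
gaps = [income - nearest_match(size, comparison)[1] for size, income in treatment]
impact = mean(gaps)

print(gaps, impact)
```

Propensity score matching follows the same pattern but replaces farm size with an estimated probability of participation that summarizes many observable characteristics at once.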


Regression discontinuity

The regression discontinuity design (RDD) is a method that can be used for programs that have a continuous eligibility index with a clearly defined cutoff score to determine eligibility.

To apply RDD, two main conditions are needed:

1. A continuous eligibility index.

2. A clearly defined cutoff score, that is, a point on the index

above or below which the population is classified as eligible

for the program.


RDD: continued

The regression discontinuity design measures the difference in post-intervention outcomes, such as incomes, between units just above and just below the eligibility cutoff.

The difference is estimated using a regression based on the sub-sample around the cutoff point.
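One common way to carry out that regression is to fit a separate line on each side of the cutoff, using only units within a chosen bandwidth, and compare the two fitted values at the cutoff. The sketch below does this on synthetic, noise-free data; the cutoff, bandwidth, and jump of 8 are assumptions for illustration.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


CUTOFF = 50      # hypothetical eligibility score; units below it are eligible
BANDWIDTH = 10   # only units within this distance of the cutoff are used

# Synthetic data: income rises with the score, plus a jump of 8 for eligible units.
data = [(s, 20 + 0.5 * s + (8 if s < CUTOFF else 0)) for s in range(30, 71)]

below = [(s, y) for s, y in data if CUTOFF - BANDWIDTH <= s < CUTOFF]
above = [(s, y) for s, y in data if CUTOFF <= s <= CUTOFF + BANDWIDTH]

a_below, b_below = fit_line(below)
a_above, b_above = fit_line(above)

# Estimated impact: gap between the two fitted lines evaluated at the cutoff.
rdd_estimate = (a_below + b_below * CUTOFF) - (a_above + b_above * CUTOFF)

print(round(rdd_estimate, 6))
```

With real survey data the estimate would carry sampling error and would depend on the bandwidth, so it is usual to report results for several bandwidths.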


Encouragement design

Encouragement design is useful when the intervention cannot be randomly administered to some and not others.

The method requires that a randomly selected group of beneficiaries receive extra encouragement to undertake the intervention.

Encouragement takes the form of additional information or incentives.

By randomizing encouragement and carefully tracking outcomes for those who do and do not receive encouragement, it is possible to obtain reliable estimates of the effect of the encouragement and of the intervention itself.

Compare results for the randomly selected encouraged group with results for the randomly selected not-encouraged group. This quantity of interest, known as the “Intention-to-Treat” effect, or ITT, is the effect of the encouragement itself.


Encouragement design: continued

The effect of the treatment, the local average treatment effect (LATE), is obtained by adjusting the ITT for the amount of non-compliance:

LATE = ITT / Compliance rate

Compliance rate = Fraction of subjects treated in the treatment group – Fraction of subjects treated in the control group

With a 100% compliance rate, LATE = ITT: all those assigned to the treatment take the treatment and all those assigned to the control do not.

The compliance rate can be thought of as the fraction of subjects that fall into

the sub-population of “compliers”, the group for whom the decision to take

treatment was directly affected by the assignment.

This is the group induced by the encouragement to take advantage of the

treatment.
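The adjustment from ITT to LATE can be sketched in a few lines. The function name and the numbers below are hypothetical; the arithmetic follows the two formulas given above.

```python
def late(mean_encouraged, mean_not_encouraged,
         takeup_encouraged, takeup_not_encouraged):
    """Return (ITT, compliance rate, LATE) for an encouragement design."""
    itt = mean_encouraged - mean_not_encouraged
    compliance = takeup_encouraged - takeup_not_encouraged
    if compliance <= 0:
        raise ValueError("encouragement did not raise take-up")
    return itt, compliance, itt / compliance


# Hypothetical numbers: mean income 106 vs 100; take-up 75% vs 25%.
itt, compliance, effect = late(106.0, 100.0, 0.75, 0.25)
print(itt, compliance, effect)
```

In this illustration encouragement raises mean income by 6, but only half the sample changes behaviour because of it, so the implied effect of the treatment on compliers is twice the ITT.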


Finally on encouragement design

Compliers are the people who stick to the experimental protocol: they take the treatment if assigned to the treatment group and do not if assigned to control.

For policy, compliers are the only ones who are actually affected by the encouragement.

Usually, the compliance rate < 1

LATE estimates the effect of treatment only for the sub-population of compliers; it does not constitute the effect of the treatment for the whole sample.

Special case: when the control group can be excluded from taking the treatment, non-compliance can occur only in the treatment group and LATE = ATT.

In general, the compliance rate depends on the encouragement.


Power calculation

Power

The ability of a study to detect an impact. Conducting a power calculation is a crucial step in impact evaluation design.

Power calculation

A calculation of the sample required for the impact evaluation,

which depends on the minimum effect size and required level of

confidence.


Power: continued

We discuss the basic intuition behind power calculations by focusing on the simplest case: an evaluation conducted using an RCT and assuming that noncompliance is not an issue.

Power calculations indicate the minimum sample size needed to conduct IE.

Assess whether existing data sets are large enough for the purpose of

conducting an impact evaluation.

Avoid collecting too much information, which can be very costly.


Large samples better resemble population (both

treatment and control) (World Bank 2008)


Type I and Type II errors

A type I error is made when an evaluation concludes that a

program has had an impact, when in reality it had no impact.

A type II error occurs when an evaluation concludes that the

program has had no impact, when in fact it has had an

impact.

The likelihood of a type I error can be set by a parameter called the “confidence level.”

Many factors affect the likelihood of committing a type II

error, but the sample size is crucial


Power: continued

If the average weight of 50,000 treated children is the same as the average weight of 50,000 comparison children, then one can probably conclude with confidence that the program has had no impact.

By contrast, if a sample of two treatment children weigh on average the same as

a sample of two comparison children, it is harder to reach a reliable conclusion.

The power (or statistical power) of an impact evaluation is the probability that it

will detect a difference between the treatment and comparison groups, when in

fact one exists. An impact evaluation has a high power if there is a low risk of

not detecting real program impacts, that is, of committing a type II error.


Power calculations: continued (World

Bank 2008)

Involves the following steps:

1. Does the program create clusters?

2. What is the outcome indicator?

3. Is it required to compare program impacts between subgroups?

4. What is the minimum level of impact that would justify the investment made in the intervention?

5. What is a reasonable level of power for the evaluation being conducted?

6. What are the baseline mean and variance of the outcome indicators?


Power calculations: continued

Power calculations involve different steps, depending on whether the program randomly assigns benefits among clusters or simply assigns benefits randomly among all units in a population.

No clusters: take a random sample of the entire population

If impacts are to be compared between subgroups (for example, male and female), a larger sample is needed

What is the minimum level of impact below which the program will be treated as not successful?

To identify small differences in mean outcomes, the sample must be larger, so the minimum detectable effect should be chosen carefully

Different power levels can be chosen; the standard is 80 percent, i.e. the evaluation finds an impact in 80 percent of cases when one has occurred

Get the baseline mean and variance right: more variance in the baseline requires a larger sample to capture the effect

Consider how sensitive the sample size is to the assumptions: a lower expected impact, higher variance in the outcome indicator, or a higher power level all increase the required sample
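For the unclustered case, these ingredients combine in a textbook sample size formula for comparing two means. The sketch below assumes equal group sizes, a two-sided test, a known standard deviation, and the normal approximation; the function name and the example values are illustrative, not taken from the workshop.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(min_effect, sd, power=0.80, significance=0.05):
    """Households needed in each group for a two-sided comparison of means:
    n = 2 * ((z_alpha + z_power) * sd / min_effect) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - significance / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / min_effect) ** 2)


# Hypothetical outcome with standard deviation 40.
base = sample_size_per_group(min_effect=10, sd=40)
smaller_effect = sample_size_per_group(min_effect=5, sd=40)
higher_power = sample_size_per_group(min_effect=10, sd=40, power=0.90)

print(base, smaller_effect, higher_power)
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why that threshold deserves careful thought before fieldwork is budgeted.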


Brief blurb on power with clusters

In the presence of clustering, the guiding principle is that the number of clusters matters more than the number of individuals within the clusters. A sufficient number of clusters is required to test whether a program has had an impact by comparing outcomes in samples of treatment and comparison units.

If the district is the cluster, compare 2 districts with 100 districts: on average, the latter is more likely to give similar treatment and comparison groups, but can be costly

All steps 1-6 as before, with one addition:

How variable is the outcome indicator within clusters?

In general, higher intra-cluster correlation in outcomes increases the number of clusters required to achieve a given power level: less is gained by adding one more person from the same village than by adding one from another village
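The effect of intra-cluster correlation is often summarized by the design effect, 1 + (m - 1) * rho, where m is the number of units per cluster and rho is the intra-cluster correlation. The sketch below applies it under the assumption of equal cluster sizes; the sample of 252 households and the villages of 20 households are hypothetical.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Variance inflation from sampling whole clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def clusters_needed(n_unclustered, cluster_size, icc):
    """Clusters per group needed to match an unclustered sample of n units."""
    n_clustered = n_unclustered * design_effect(cluster_size, icc)
    return ceil(n_clustered / cluster_size)


# Hypothetical: 252 households per group needed without clustering,
# 20 households interviewed per village.
low_icc = clusters_needed(252, 20, icc=0.05)
high_icc = clusters_needed(252, 20, icc=0.25)

print(low_icc, high_icc)
```

Raising the intra-cluster correlation from 0.05 to 0.25 roughly triples the number of villages required here, while interviewing more households within the same villages would help much less.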


In this project

There are both clustered and unclustered interventions


Data & Analytics
