Monitoring and Evaluation: Evaluation Designs



Objectives of the Session

By the end of this session, participants will be able to:

• Understand the purpose, strengths, and shortcomings of different study designs

• Distinguish between study designs that enable us to causally link program activities to observed changes and study designs that do not

• Link evaluation designs to the types of decisions that need to be made


Causality Requirements

• A precedes B.

• B is present only when A is present.

• We can rule out all other possible causes of B.


The Basic Experimental Principle

• The intervention is the only difference between two groups

• This is achieved by random assignment


Class Activity

Can you name situations in which random assignment can be used in evaluation?


An Experimental Design

Experimental group:   O1   X   O2
Control group:        O3        O4

(RA = random assignment; O = observation/measurement; X = intervention; time runs left to right)


An Experimental Design-Cont’d.

• In this design, there are two groups: an experimental group and a control group. Both are formed by random assignment (RA), and both complete the pre-test. Only the experimental group receives the intervention; then both groups complete the post-test.
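In this notation, a standard way to estimate the program effect is to compare each group's pre/post change, so that change the control group experienced anyway is subtracted out:

$$\hat{\Delta} = (O_2 - O_1) - (O_4 - O_3)$$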


An Experimental Design-Cont’d.

Steps (sketched in code below)

1. Identify people or groups, some of which could get the intervention.
2. Pre-test everyone.
3. Randomly assign participants to either the control group or the experimental group.
4. Deliver the intervention to the experimental group. The control group may receive an alternative intervention or nothing at all.
5. Post-test both groups with the same instrument under the same conditions.
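A minimal sketch of these steps in Python; the participant list and the `pre_test`, `intervention`, and `post_test` callables are hypothetical placeholders, not part of any specific toolkit:

```python
import random

def run_experiment(participants, pre_test, intervention, post_test):
    """Randomly assign participants, intervene on the experimental
    group only, and compare mean pre/post changes across groups."""
    pre = {p: pre_test(p) for p in participants}        # step 2: pre-test everyone

    pool = list(participants)                           # step 3: random assignment
    random.shuffle(pool)
    half = len(pool) // 2
    experimental, control = pool[:half], pool[half:]

    for p in experimental:                              # step 4: deliver intervention
        intervention(p)

    post = {p: post_test(p) for p in participants}      # step 5: post-test everyone

    def mean_change(group):
        return sum(post[p] - pre[p] for p in group) / len(group)

    # Effect estimate: experimental group's change minus the control group's
    return mean_change(experimental) - mean_change(control)
```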


Factors that May Lead Us to Make Invalid Conclusions

• Dropout: There may be loss to follow-up.

• Instrumentation effects: Occur when a questionnaire is changed between pre-test and post-test.

• Testing effects: Occur because study participants remember questions that were asked of them at pre-test and perform better at post-test because they are familiar with the questions.


A Second Experimental Design

Experimental group:   X   O2
Control group:            O4

(RA = random assignment)


A Second Experimental Design-Cont’d

• In this design, experimental and control groups are formed; however, there is no pre-test. Instead, the experimental group gets the intervention and then both groups are measured at the end of the program.
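Since random assignment makes the two groups comparable at baseline, the effect can be estimated directly from the difference in post-test results:

$$\hat{\Delta} = O_2 - O_4$$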


A Non-Experimental Design

Experimental group:   O1   X   O2      (time runs left to right)


A Non-Experimental Design-Cont’d

• In this method of evaluation, only people who are participating in the program get the pre- and post-test.

Steps

1. Pre-test everyone in the program.

2. Deliver the intervention.

3. Post-test the same individuals.

This design does not provide any information about what kinds of results might have occurred without the program and is the weakest in terms of scientific rigor.
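The only estimate this design supports is the raw pre/post change, which mixes the program effect with anything else that changed over the same period (history and testing effects, for example):

$$\hat{\Delta}_{\text{naive}} = O_2 - O_1$$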


Another Factor that May Lead to Invalid Conclusions

• History effects: These occur when extraneous events (events that occur outside the study) influence study-measured outcomes.


A Second Non-Experimental Design

Experimental group:   O1   O2   O3   X   O4   O5   O6      (time runs left to right)


A Second Non-Experimental Design-Cont’d

• For this design, a survey is administered multiple times: before, during, and after the program.


A Second Non-Experimental Design-Cont’d

Steps (see the sketch after this list)

1. Select a program-outcome measure that can be used repeatedly.
2. Decide who will be in the experimental group. Will it be the same group of people measured many times, or successive groups of different people?
3. Collect at least three measurements at regular intervals prior to the intervention.
4. Check the implementation of the intervention.
5. Continue to collect measurements, at least through the duration of the program.
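One way to analyze such a series is a simple segmented regression that separates the pre-existing trend from a level shift at the intervention; a minimal sketch with hypothetical values for O1 through O6:

```python
import numpy as np

obs = np.array([12.0, 13.1, 12.8, 16.2, 17.0, 17.9])  # O1..O6 (hypothetical)
time = np.arange(len(obs), dtype=float)                # 0..5, regular intervals
after = (time >= 3).astype(float)                      # X falls between O3 and O4

# Design matrix: intercept, pre-existing trend, post-intervention level shift
X = np.column_stack([np.ones_like(time), time, after])
(intercept, trend, shift), *_ = np.linalg.lstsq(X, obs, rcond=None)

# 'shift' estimates the jump at the intervention net of the ongoing trend;
# without the trend term, that trend would be misread as program effect.
print(f"pre-trend per period: {trend:.2f}, level shift at X: {shift:.2f}")
```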


A Quasi-Experimental Design

Experimental group:   O1   X   O2
---------------------------------
Comparison group:     O3        O4

(time runs left to right; the dashed line indicates groups not formed by random assignment)


A Quasi-Experimental Design-Cont’d.

• In this design, two groups which are similar, but which were not formed by random assignment, are measured both before and after one of the groups gets the program intervention.


A Quasi-Experimental Design-Cont’d.

Steps (a difference-in-differences sketch follows)

1. Identify people who will be getting the program.
2. Identify people who are not getting the program but who are in other ways very similar.
3. Pre-test both groups.
4. Deliver the intervention to the experimental group. The comparison group may receive an alternative intervention or nothing at all.
5. Post-test both groups.
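With non-randomized groups, a common analysis is the difference-in-differences shown earlier for the experimental design: net out the comparison group's change as an estimate of the shared trend. A minimal sketch with hypothetical rates (selection effects can still bias this if the groups differ in ways that affect their trends):

```python
# Hypothetical pre/post means for each group (e.g., ORS-use rates)
o1, o2 = 0.42, 0.61   # experimental group: pre-test, post-test
o3, o4 = 0.40, 0.48   # comparison group:   pre-test, post-test

# Difference-in-differences: experimental change minus comparison change
effect = (o2 - o1) - (o4 - o3)
print(f"estimated program effect: {effect:.2f}")  # 0.11 here
```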


Threat to Validity

• Selection effects: Occur when people selected for a comparison group differ from the experimental group.


Summary Features of Different Study Designs

• True experiment: partial coverage/new programs; control group; strongest design; most expensive.

• Quasi-experiment: partial coverage/new programs; comparison group; weaker than the experimental design; moderately expensive.

• Non-experimental: full coverage programs; no control or comparison group; weakest design; least expensive.


Summary Features of Different Study Designs-Cont’d.

I. Non-experimental (One-Group, Post-Only):
   IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM

II. Non-experimental (One-Group, Pre- and Post-Program):
   ASSESS TARGET GROUP BEFORE PROGRAM → IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM


Summary Features of Different Study Designs-Cont’d.

III. Experimental (Pre- and Post-Program with Control Group):

RANDOMLY ASSIGN PEOPLE FROM THE SAME TARGET POPULATION TO GROUP A OR GROUP B

Target group A:  ASSESS GROUP A → IMPLEMENT PROGRAM WITH GROUP A → ASSESS GROUP A
Control group B: ASSESS GROUP B → (no program) → ASSESS GROUP B


Summary Features of Different Study Designs

IV. Quasi-Experimental (Pre- and Post-Program with Non-Randomized Comparison Group):

Target group:     ASSESS TARGET GROUP BEFORE PROGRAM → IMPLEMENT PROGRAM → ASSESS TARGET GROUP AFTER PROGRAM
Comparison group: ASSESS COMPARISON GROUP BEFORE PROGRAM → ASSESS COMPARISON GROUP AFTER PROGRAM


Summary Features of Different Study Designs-Cont’d.

• The different designs vary in their capacity to produce information that allows you to link program outcomes to program activities.

• The more confident you want to be about making these connections, the more rigorous the design and the more costly the evaluation.

• Your evaluator will help determine which design will maximize your program’s resources and answer your team’s evaluation questions with the greatest degree of certainty.


Important Issues to Consider When Choosing a Design

• Complex evaluation designs are more costly, but allow for greater confidence in a study’s findings.

• Complex evaluation designs are more difficult to implement, and so require higher levels of expertise in research methods and analysis.

• Be prepared to encounter stakeholder resistance to the use of comparison or control groups, such as a parent wondering why his or her child will not receive a potentially beneficial intervention.

• No evaluation design is immune to threats to its validity; there is a long list of possible complications associated with any evaluation study. However, your evaluator will help you maximize the quality of your evaluation study.


Exercise

• A maternity hospital wishes to determine whether offering postpartum family-planning methods will increase contraceptive use among women who deliver at the hospital.

• What study design would you recommend to test the hypothesis that women who are offered postpartum family-planning services are more likely to use family planning than women who are not offered services?


Exercise

• You have been asked to evaluate the impact of a national mass-media AIDS-prevention campaign on condom use.

• What study design would you choose and why?


Linking Evaluation Design to Decision-Making


Deciding Upon An Appropriate Evaluation Design

• Indicators: What do you want to measure?
  – Provision
  – Utilization
  – Coverage
  – Impact

• Type of inference: How sure do you want to be?
  – Adequacy
  – Plausibility
  – Probability

• Other factors

Source: Habicht, Victora, and Vaughan (1999)


Clarification of Terms

Types of evaluation:

Performance or process evaluation
  – Provision: Are the services available? Are they accessible? Is their quality adequate?
  – Utilization: Are the services being used?
  – Coverage: Is the target population being reached?

Impact evaluation
  – Impact: Were there improvements in disease patterns or health-related behaviors?


Clarification of Terms

Adequacy assessment
  • Did the expected changes occur?
  • Are objectives being met?
  • Were activities performed as planned?
  • May or may not require a before/after comparison; does not require controls.

Plausibility assessment
  • Did the program seem to have an effect above and beyond other external influences?
  • Requires a before-and-after comparison with controls and treatment of confounding factors.

Probability assessment
  • Did the program have an effect (P < x%)?
  • Determines the statistical probability that the intervention caused the effect.
  • Requires a before/after comparison with randomized controls.


Adequacy Assessment

• Adequacy studies only describe whether a condition is met or not.
  – They typically address provision, utilization, or coverage; no control group or pre/post data are needed in such cases.
  – Hypothesis tested: Are expected levels achieved?

• They can also answer questions of impact (magnitude of change), provided pre/post data are available.
  – Hypothesis tested: The difference is equal to or greater than expected.


Features of Adequacy Assessment

• Simplest (and cheapest) of evaluation models, as it does not try to control for external effects. Data are needed only for outcomes.

• If only input or output results are needed, then the lack of controls is not a problem.

• When measuring impact, however, it is not possible to attribute observed change to the program, because there are no controls.

• Also, if there is no change, it will not be possible to say whether this reflects program failure or whether the program prevented further deterioration.


Class Activity

For each of the following outcomes of interest, provide indicators that would be useful in the evaluation of a program for control of diarrheal diseases aimed at young children, with emphasis on the promotion of oral rehydration salts (ORS):

– Provision: Are the services available? Are services accessible? Is their quality adequate?

– Utilization: Are the services being used?

– Coverage: Is the target population being reached?

– Impact: Were there improvements in disease patterns or health behaviors?


Adequacy Assessment Inferences

• Are objectives being met?
  – Compares program performance with previously established adequacy criteria, e.g., an 80% ORT-use rate
  – No control group
  – 2+ measurements to assess adequacy of change over time

• Provision, utilization, coverage
  – Are activities being performed as planned?

• Impact
  – Are observed changes in health or behavior of the expected direction and magnitude?

• Cross-sectional or longitudinal

Source: Habicht, Victora and Vaughan (1999)


Class Activity

• What are the advantages of adequacy evaluations?

• What are the limitations of adequacy evaluations?

• If an adequacy evaluation shows a lack of change in indicators, how can this be interpreted?

• Which of the study designs discussed earlier can be used for adequacy evaluations?


Plausibility Assessment Inferences (1)

• The program appears to have an effect above and beyond the impact of non-program influences
• Includes a control group:
  – Historical control group
    • Compares changes in the community before and after the program and attempts to rule out external factors
    • Same target population
  – Internal control group
    • Compares groups/individuals with different intensities of exposure to the program (dose-response)
    • Compares previous exposure to the program between individuals with and without the disease (case-control)
  – External control group
    • Compares communities/geographic areas with and without the program
    • A population that was never targeted by the intervention, but that shares key characteristics with the beneficiaries

Source: Habicht, Victora and Vaughan (1999)


Plausibility Assessment Inferences (2)

• Provision, utilization, coverage
  – Intervention group appears to have better performance than control
  – Cross-sectional, longitudinal, longitudinal-control

• Impact
  – Changes in health/behavior appear to be more beneficial in the intervention group than in the control group
  – Cross-sectional, longitudinal, longitudinal-control, case-control

Source: Habicht, Victora and Vaughan (1999)


Controls and Confounding Factors

• For all types of controls, the groups being compared should be similar in all respects except their exposure to the intervention.

• That is almost never fully achievable; some factor usually influences one group more than the other (a confounding factor). For example, a decline in diarrhea mortality may be due to better access to drinking water rather than to the ORS program.

• To address this problem, confounding factors must be measured and statistically treated, via matching, standardization, or multivariate analysis (sketched below).
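A minimal sketch of the multivariate option, using the slide's example; the data, effect sizes, and variable names are all hypothetical, and the point is only that including the confounder in the model separates its influence from the program's:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
water = rng.integers(0, 2, n).astype(float)     # confounder: improved water access
program = rng.random(n) < (0.3 + 0.4 * water)   # uptake correlated with water access
mortality = 5.0 - 1.0 * program - 2.0 * water + rng.normal(0.0, 1.0, n)

# An unadjusted comparison mixes the program effect with the water effect...
unadjusted = mortality[program].mean() - mortality[~program].mean()

# ...while a multivariate model that includes the confounder separates the two
X = np.column_stack([np.ones(n), program.astype(float), water])
(icept, prog_effect, water_effect), *_ = np.linalg.lstsq(X, mortality, rcond=None)
print(f"unadjusted: {unadjusted:.2f}, adjusted program effect: {prog_effect:.2f}")
```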


Probability Assessment Inferences

• There is only a small probability that the differences between program and control areas were due to chance (P < .05); a test sketch follows below
• Requires a control group
• Requires randomization
• Often not feasible for assessing program effectiveness:
  – Randomization needed before the program starts
  – Political factors
  – Scale-up
  – Inability to generalize results
  – Known efficacy of the intervention

Source: Habicht, Victora and Vaughan (1999)
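A minimal sketch of such a test, with hypothetical outcome rates from randomized program and control areas; scipy's standard two-sample t-test stands in here for whatever analysis the study design actually calls for:

```python
from scipy import stats

program_areas = [0.61, 0.58, 0.66, 0.63, 0.59, 0.64]  # hypothetical ORS-use rates
control_areas = [0.47, 0.52, 0.49, 0.51, 0.46, 0.50]

t_stat, p_value = stats.ttest_ind(program_areas, control_areas)
if p_value < 0.05:
    print(f"difference unlikely to be due to chance alone (p = {p_value:.4f})")
else:
    print(f"chance cannot be ruled out (p = {p_value:.4f})")
```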


Summary

• Adequacy (assessment of change in outcome). Objective: assess whether the expected impact was reached. What it says: indicates whether resources were well spent or not. Data needs: outcome data collected among beneficiaries.

• Plausibility (before/after comparison controlling for confounding factors). Objective: understand what affects the outcomes. What it says: helps understand the determinants of the program’s success or failure. Data needs: outcome data plus confounders, collected among beneficiaries and controls.

• Probability (causal analysis of before/after differences). Objective: determine the causal effect of an intervention on the outcome. What it says: establishes precise causation between action and effect. Data needs: outcome data collected among beneficiaries and controls.


Discuss with Decision-Makers Before Choosing Evaluation Design


Possible Areas of Concern to Different Decision-Makers

Across the provision, utilization, coverage, and impact dimensions:

• Adequacy evaluations concern health center managers, district health managers, and international agencies.
• Plausibility evaluations concern international agencies, donor agencies, and scientists.
• Probability evaluations concern donor agencies and scientists.

Source: Habicht, Victora and Vaughan (1999)


Evaluation Flow from Simpler to More Complex Designs

• Adequacy: provision (1st), utilization (2nd), coverage (3rd), impact (4th (b))
• Plausibility: 4th (a), 5th
• Probability: --

Source: Habicht, Victora and Vaughan (1999)


Key Issues to Discuss with Decision Makers Before Choosing a Design

• Is there a need for collecting new data? If so, at what level?

• Does the design include an intervention-control or a before-after comparison?

• How rare is the event to be measured?

• How small is the difference to be detected?

• How complex will the data analysis be?

• How much will alternative designs cost?

Source: Habicht, Victora and Vaughan (1999)


References

• Adamchak S et al. (2000). A Guide to Monitoring and Evaluating Adolescent Reproductive Health Programs. Focus on Young Adults, Tool Series 5. Washington, D.C.: Focus on Young Adults.

• Fisher A et al. (2002). Designing HIV/AIDS Intervention Studies: An Operations Research Handbook. New York: The Population Council.

• Habicht JP, Victora CG, and Vaughan JP (1999). Evaluation Designs for Adequacy, Plausibility, and Probability of Public Health Programme Performance and Impact. International Journal of Epidemiology, 28: 10-18.

• Rossi P et al. (1999). Evaluation: A Systematic Approach. Thousand Oaks: Sage Publications.