PPA 502 – Program Evaluation Lecture 3c – Strategies for Impact Assessment

Page 1

PPA 502 – Program Evaluation

Lecture 3c – Strategies for Impact Assessment

Page 2

Introduction

The ultimate purpose of a social program is to ameliorate some social problem or improve some social condition. If the program theory is sound and the program plan well implemented, those social benefits are expected to follow. Rarely are those benefits assured, however. Practical and conceptual shortcomings, combined with the intractable nature of many social problems, all too easily undermine the effectiveness of social programs.

Page 3

Introduction

A general principle applies: The more rigorous the research design, the more plausible the resulting estimate of intervention effects.

The design of impact evaluations faces two competing pressures:

– Evaluations should be undertaken with sufficient rigor that relatively firm conclusions can be reached.

– Practical considerations of time, money, cooperation, and protection of participants limit the design options and methodological procedures that can be employed.

Page 4

Introduction

Evaluators assess the effects of social programs by:

– Comparing information about outcomes for participants and nonparticipants,
– Making repeated measurements on participants before and after intervention,
– Or using other methods that attempt to achieve the equivalent of such comparisons.

The basic aim of impact assessment is to produce an estimate of the net effects of an intervention.
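To make these two comparison strategies concrete, here is a minimal sketch in Python; all outcome values are invented for illustration.

```python
import numpy as np

# Invented outcome scores for illustration only.
participants_after = np.array([62., 70., 68., 75., 71.])
nonparticipants_after = np.array([60., 64., 59., 66., 61.])
participants_before = np.array([58., 66., 63., 70., 65.])

# Strategy 1: compare participants with nonparticipants.
between_estimate = participants_after.mean() - nonparticipants_after.mean()

# Strategy 2: repeated measurements on the same participants.
within_estimate = (participants_after - participants_before).mean()

print(f"participant vs. nonparticipant estimate: {between_estimate:.1f}")
print(f"before vs. after estimate:               {within_estimate:.1f}")
```

Either number estimates the net effect only if the comparison is fair: the nonparticipants must resemble the participants, and the before-after change must not reflect outside trends. These are precisely the confounding factors discussed later in this lecture.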

Page 5

Introduction

Impact assessment is relevant at many stages of the process.
– Pilot demonstrations to estimate whether a proposed program will work.
– Program design to test the most effective ways to develop and integrate the various program elements.
– Program initiation to test the efficacy of the program at a limited number of sites.
– Program modification to test the effects of the changes.
– Program continuation to inform decisions about sunset legislation, funding renewal, or program defense.

Page 6

Key Concepts in Impact Assessment

The experimental model.
– The optimal way to assess impact is a randomized field experiment (a minimal simulation follows below).
  • Random assignment.
  • Treatment and control groups.
  • Net outcome assessment.

Prerequisites for assessing impact.
– The program’s objectives must be well articulated, to make it possible to specify credible measures of the expected outcomes.
– The intervention must be sufficiently well implemented that there is no question that its critical elements have been delivered to the appropriate targets.
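A minimal simulation of this experimental model; the sample size, score distributions, and the assumed true effect of 5 points are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of 200 eligible targets with a baseline score.
n = 200
baseline = rng.normal(50, 10, n)

# Random assignment: every target has the same chance of landing
# in the treatment group or the control group.
treated = rng.permutation(n) < n // 2

# Simulated outcomes under an assumed true net effect of 5 points.
TRUE_EFFECT = 5.0
outcome = baseline + rng.normal(0, 5, n) + TRUE_EFFECT * treated

# Net outcome assessment: with random assignment, the difference in
# group means estimates the net effect.
estimate = outcome[treated].mean() - outcome[~treated].mean()
print(f"estimated net effect: {estimate:.2f} (true: {TRUE_EFFECT})")
```

Because assignment is random, preexisting differences between the two groups are due only to chance, so the difference in means is an unbiased estimate of the net effect.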

Page 7

Key Concepts in Impact Assessment

Linking interventions to outcomes.
– Establishing impact essentially amounts to establishing causality.
– Most causal relationships in social science are expressed as probabilities.
– Conditions that limit causal attribution:
  • External conditions and causes.
  • Biased selection.
  • Other social programs with similar targets.

Page 8

Key Concepts in Impact Assessment

“Perfect” versus “good enough” impact assessments.
– The intervention and its targets may not allow a perfect design.
– Time and resource constraints limit the options.
– The importance of the program often determines the rigor required.
– Review the design options to determine the most appropriate one.

Page 9

Key Concepts in Impact Assessment

Gross versus net outcomes.

Gross outcome = Effects of intervention (net effect)
                + Effects of other processes (extraneous confounding factors)
                + Design effects
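To make this identity concrete, a toy decomposition with invented numbers: an observed (gross) change of 12 points might contain only 7 points of true program impact.

```python
# Toy decomposition of a gross outcome (all values invented).
net_effect = 7.0        # change attributable to the intervention itself
other_processes = 4.0   # e.g., secular drift, maturation, interfering events
design_effects = 1.0    # e.g., measurement error, sampling artifacts

gross_outcome = net_effect + other_processes + design_effects
print(gross_outcome)    # 12.0 -- impact assessment tries to isolate the 7.0
```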

Page 10

Extraneous Confounding Factors

Uncontrolled selection.
– Preexisting differences between treatment and control groups.
– Self-selection.
– Program location and access.
– Deselection processes (attrition bias).

Endogenous change.
– Secular drift.
– Interfering events.
– Maturational trends.

Page 11

Design Effects

Stochastic effects.
– Significance (Type I error).
– Power (Type II error).
– The key is finding the proper balance between the two; the simulation below illustrates the trade-off.
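The trade-off can be examined by simulation. This sketch estimates the power of a two-group comparison by Monte Carlo, assuming a hypothetical true effect of 0.3 standard deviations, 100 targets per group, and a two-sided 5% significance level:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo power estimate for a two-group design (hypothetical
# parameters: true effect 0.3 SD, n = 100 per group, alpha = .05).
n, effect, reps = 100, 0.3, 2000
z_crit = 1.96  # two-sided 5% significance level

rejections = 0
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    z = (treated.mean() - control.mean()) / se
    rejections += abs(z) > z_crit

print(f"estimated power: {rejections / reps:.2f}")  # roughly 0.56 here
```

Raising the sample size or the detectable effect increases power (fewer Type II errors); tightening the significance level reduces Type I errors but lowers power, which is exactly the balance the design must strike.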

Measurement reliability.
– Does the measure produce the same results repeatedly?
– Unreliability dilutes and obscures real differences.
– Reproducibility should not fall below 75 or 80 percent (see the sketch below).
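A minimal sketch of checking reproducibility by test-retest agreement, assuming a hypothetical yes/no survey item administered twice to the same ten respondents:

```python
import numpy as np

# Responses to the same item on two administrations (invented data).
first = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
second = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

# Percent agreement across the two administrations.
agreement = (first == second).mean()
print(f"reproducibility: {agreement:.0%}")  # 80% -- at the threshold
```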

Page 12

Design Effects

Measurement validity.
– Does the instrument measure what it is intended to measure?
– Criteria:
  • Consistency with usage.
  • Consistency with alternative measures.
  • Internal consistency.
  • Consequential predictability.

Page 13

Design Effects

Choice of outcome measures.
– A critical measurement problem in evaluations is selecting the best measures for assessing outcomes:
  • Conceptualization.
  • Reliability.
  • Feasibility.
  • Proxy measures.

The Hawthorne effect.
– Program delivery and measured outcomes can be affected by the research context itself, such as targets’ awareness of being observed.

Page 14

Design Effects

Missing information.
– Missing information is generally not randomly distributed.
– It often must be supplemented by alternative survey items, unobtrusive measures, or estimates.

Page 15

Design Effects

Sample design effects.
– The evaluator must select an unbiased sample of the universe of interest (a sketch follows the list).
  • Select a relevant, sensible universe.
  • Design a means of selecting an unbiased (random) sample.
  • Implement the sample design with fidelity.
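A minimal sketch of the selection step, assuming a hypothetical sampling frame of 10,000 target IDs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sampling frame: 10,000 target IDs in the universe of interest.
universe = np.arange(10_000)

# Simple random sampling without replacement gives every member of the
# universe the same chance of selection, which avoids selection bias.
sample = rng.choice(universe, size=500, replace=False)
print(sample[:10])
```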

Minimizing design effects.
– Planning.
– Pretesting.
– Sampling.

Page 16

Design Strategies for Isolating the Effects of Extraneous Factors

– Randomized controls.
– Regression-discontinuity controls.
– Matched construct controls.
– Statistically equated controls.
– Reflexive controls.
– Repeated-measures reflexive controls.
– Time-series reflexive controls.
– Generic controls.

Page 17

Design Strategies for Isolating the Effects of Extraneous Factors

Full- versus partial-coverage programs.
– Full coverage means the absence of an unexposed control group, so reflexive controls (targets compared with themselves before the intervention) must be used.

Page 18

A Catalog of Impact Assessment Designs

(For each research design: how the intervention is assigned, the types of controls used, and the data collected.)

I. Designs for partial-coverage programs

A. Randomized or “true” experiments
   – Intervention assignment: random, controlled by the researcher.
   – Controls used: experimental and control groups randomly selected.
   – Data collection: minimum data needed are after-intervention outcome measures.

B. Quasi-experiments

   1. Regression-discontinuity
      – Intervention assignment: nonrandom, but fixed and known to the researcher.
      – Controls used: selected targets compared to unselected targets, holding selection constant.
      – Data collection: typically multiple before- and after-intervention outcome measures.

   2. Matched controls
      – Intervention assignment: nonrandom and unknown.
      – Controls used: intervention group matched with controls selected by the researcher.
      – Data collection: typically before- and after-intervention measures.

   3. Statistically equated controls (see the sketch after this catalog)
      – Intervention assignment: nonrandom and often nonuniform.
      – Controls used: exposed and unexposed targets compared by means of statistical controls.
      – Data collection: before-and-after or after-only outcome measures, plus control variables.

   4. Generic controls
      – Intervention assignment: nonrandom.
      – Controls used: exposed targets compared with outcome measures available for the general population.
      – Data collection: after-intervention outcome measures on targets, plus publicly available norms of outcome levels in the general population.

II. Designs for full-coverage programs

A. Simple before-and-after studies
   – Intervention assignment: nonrandom and uniform.
   – Controls used: targets measured before and after intervention.
   – Data collection: outcomes measured on exposed targets before and after intervention.

B. Cross-section studies for nonuniform programs
   – Intervention assignment: nonrandom and nonuniform.
   – Controls used: targets differentially exposed to the intervention, compared with statistical controls.
   – Data collection: after-intervention outcome measures and control variables.

C. Panel studies (several repeated measures) for nonuniform programs
   – Intervention assignment: nonrandom and nonuniform.
   – Controls used: targets measured before, during, and after intervention.
   – Data collection: repeated measures of exposure to the intervention and of outcomes.

D. Time series (many repeated measures)
   – Intervention assignment: nonrandom and nonuniform.
   – Controls used: large aggregates compared before and after intervention.
   – Data collection: many repeated before- and after-intervention outcome measures on large aggregates.
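As one illustration from the catalog, the statistically equated controls design (I.B.3) compares exposed and unexposed targets while adjusting for measured differences between them. Below is a minimal sketch of regression adjustment on simulated data; the variable names and values (age as the lone confounder, a true effect of 4.0) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated observational data: exposure is NOT randomly assigned --
# older targets are more likely to enroll, and age also raises the outcome.
n = 500
age = rng.uniform(20, 60, n)
exposed = (rng.random(n) < (age - 20) / 40).astype(float)
outcome = 10 + 0.5 * age + 4.0 * exposed + rng.normal(0, 5, n)

# A naive comparison confounds the program effect with the age difference.
naive = outcome[exposed == 1].mean() - outcome[exposed == 0].mean()

# Statistically equated controls: hold age constant by including it
# as a control variable in a least-squares regression.
X = np.column_stack([np.ones(n), exposed, age])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {beta[1]:.2f} (true effect: 4.0)")
```

The adjusted coefficient recovers the net effect only to the extent that all confounders are measured and modeled correctly, which is why this design is weaker than randomization.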

Page 19

Judgmental Approaches to Impact Assessment

– Connoisseurial impact assessments.
– Administrator impact assessments.
– Participants’ assessments.

The use of judgmental assessments is appropriate given:
– Limited funds.
– No preintervention measures.
– Full-coverage, uniform programs.

Page 20

Inference Validity Issues in Impact Assessment

Reproducibility (can a researcher using the same design in the same setting achieve the same results?).
– Power of the design.
– Fidelity of implementation.
– Appropriateness of statistical models.

Page 21

Inference Validity Issues in Impact Assessment

Generalizability (the applicability of the findings to similar situations that were not studied).
– Unbiased sample.
– Faithful representation of the actual program.
– Common settings.

Stress reproducibility first through several iterations, then focus on generalizability.

Pooling evaluations: meta-analysis.
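One common pooling approach (not the only one) is fixed-effect, inverse-variance weighting of the net-effect estimates from the separate evaluations. A minimal sketch with invented estimates and standard errors:

```python
import numpy as np

# Net-effect estimates and standard errors from four hypothetical
# evaluations of similar programs (values invented for illustration).
effects = np.array([4.2, 5.1, 2.8, 6.0])
se = np.array([1.5, 2.0, 1.2, 2.5])

# Fixed-effect meta-analysis: weight each study by its precision (1/SE^2).
weights = 1.0 / se**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"pooled net effect: {pooled:.2f}")
print(f"95% CI: ({pooled - 1.96 * pooled_se:.2f}, {pooled + 1.96 * pooled_se:.2f})")
```

Precision weighting lets large, well-measured evaluations count for more than small, noisy ones, tightening the overall estimate of program impact.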