“Progress and Prospects for Place-based Randomized Trials” Rockefeller Foundation, Bellagio, Nov 11-15, 2002 Cluster Randomized Trials for Evaluating Problem

Cluster Randomized Trials for Evaluating Problem Behavior Prevention Programs

Adapted from Paper Presented at Conference on “Progress and Prospects for Place-based Randomized Trials”

Rockefeller Foundation Study and Conference Center, Bellagio, Italy, November 11-15, 2002

With additional notes from Howard Bloom and Stephen Raudenbush


Phases of Research in Intervention Development

• Basic Research & Hypothesis Development

• Component Development and Pilot Studies

• Prototype Studies of Complete Programs

• Efficacy Trials of Refined Programs
  – Well controlled randomized trials

• Treatment Effectiveness Trials
  – Generalizability of effects under standardized delivery

• Implementation Effectiveness Trials
  – Effectiveness with real-world variations in implementation

• Demonstration Studies
  – Implementation and evaluation in multiple whole systems


Limited Use of Phases

• Few researchers have followed the sequence of phases systematically
  – Botvin comes closest

• Different research skills and interests are needed at the different phases

• Higher stages are often too costly for standard research grant funding

• Holder and colleagues suggested modifications to incorporate interventions developed outside the research environment


How to measure impacts?

• Outcome = what happened in the presence of a program

• Counterfactual = what would have happened in the absence of the program

• Impact = Outcome minus Counterfactual


Why randomize?

• To eliminate selection bias, and

• Thereby produce internally valid impact estimates
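The point about selection bias can be illustrated with a small simulation. This is only a sketch under invented assumptions: the "motivation" trait, the outcome model, and the true impact of 2.0 are all hypothetical and are not taken from any study discussed here. It compares the program-minus-comparison difference when people choose the program themselves with the difference when a coin flip decides.

```python
import random
from statistics import mean

TRUE_IMPACT = 2.0  # assumed program effect, for illustration only


def simulate(assign, n=20000, seed=0):
    """Return the program-minus-comparison difference in mean outcomes.

    `assign(motivation, rng)` decides whether one person gets the program.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)  # unobserved trait that also drives the outcome
        in_program = assign(motivation, rng)
        outcome = 10 + 3 * motivation + rng.gauss(0, 1)
        if in_program:
            outcome += TRUE_IMPACT
        (treated if in_program else untreated).append(outcome)
    return mean(treated) - mean(untreated)


# Self-selection: the more motivated people opt in.
self_selected = simulate(lambda m, rng: m > 0)
# Randomization: a coin flip decides, independent of motivation.
randomized = simulate(lambda m, rng: rng.random() < 0.5)

print(f"true impact:            {TRUE_IMPACT:.2f}")
print(f"self-selected estimate: {self_selected:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

Under self-selection the comparison group is a poor stand-in for the counterfactual, so the estimate mixes the program effect with the pre-existing difference in motivation. Under randomization the two groups differ only by chance, and the difference in means recovers the impact up to sampling error.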


Why randomize groups?

• Because of the program
  – Spillover effects
  – Service delivery efficiencies
  – Spatially concentrated problems

• Because of the evaluation
  – To reduce opposition to randomization
  – To separate the treatment and control groups


On Rationale for Group-Based Trials

Key Assumptions in Clinical Trials:

* There is only one "version" of each treatment
* A person's outcome does not depend on the treatment assignment of other persons

Consider Education

* Teacher enacts, therefore defines, treatment
* Classmates create a context

Parallels in Mental Health

* Therapists
* Group settings


Why Random Assignment of Schools

• Intervention is delivered to intact classrooms

• Random assignment of classes is subject to contamination across classes within schools

• Some programs include school-wide components

• Credible causal statements require group equivalence at both the group and individual levels
  – On the outcome variable
  – On presumed mediating variables
  – On motivation or desire to change


What are some other examples?

• Gosnell (block groups)

• PROGRESA (villages)

• Sesame Street (block groups)

• Jobs-Plus (public housing developments)

• Hot-spots (police patrol areas)


School-Based Randomized Prevention Research Studies

• First Waterloo Study was the first with a sufficient N of schools for randomization to be “real”
  – Some earlier studies were claimed as “randomized” with only one or two schools per condition

• Other smoking prevention studies in early-mid 80’s
  – McAlister, Hansen/Evans, Murray/Luepker, Perry/Murray, Biglan/Ary, Dielman

• Other substance abuse prevention studies in late 80’s and 90’s
  – Johnson/Hansen/Flay/Pentz, Botvin and colleagues

• Extended to sexual behavior, AIDS, & violence in 90’s
  – Also MacArthur Network initiated trials of Comer

• Character Education in 2002
  – New DoE funding


Issues re Randomization

• Ethical resistance to the idea of randomization is rare

• Control schools like to have a program too
  – Use usual Health Education (treatment as usual)
  – Offer special, but unrelated, program
    • E.g., Aban Aya Health Enhancement Curriculum as control for Social Development (violence, sex, drug prevention) Curriculum
  – Pay schools for access to collect data from students, parents and teachers: $500-$2,000 per year

• Currently, many schools are too busy to be in intervention condition
  – Too many teaching and testing demands
  – Too many other special programs already


Approaches to Randomization

• Pure randomization from a large population

• Obtain agreement first
  – Even prior agreements can break down (Waterloo)

• Then randomize from matched sets defined by
  – Presumed predictors of the outcome (Graham et al., Aban Aya)
  – Actual predictors of the outcome (Hawaii Positive Action trial)
  – Pretest levels of the outcome (has anyone ever achieved this?)

• If schools refuse or drop out, replace from the same set
  – Only one school of 15 initial selections/assignments for Aban Aya refused and was replaced

• Or if there are no more cases in the set, drop the set (and watch out for representativeness)
  – We had to drop multiple sets in the Hawaii Positive Action trial because of refusal by schools assigned to the program
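Randomizing from matched sets can be sketched as follows. This is a minimal illustration, not the procedure used in any of the trials named above: the function name `randomize_matched_pairs`, the single matching variable `risk`, and the six schools are all hypothetical. Schools that have already agreed to participate are ranked on the predictor, adjacent schools form a pair, and a random draw within each pair decides which one receives the program.

```python
import random


def randomize_matched_pairs(schools, predictor, seed=None):
    """Pair schools with adjacent predictor values, then randomize within each pair.

    schools   : list of dicts, one per school that has agreed to participate
    predictor : key of the matching variable (hypothetical, e.g. a risk index)
    Returns a list of (program_school, control_school) name pairs.
    """
    if len(schools) % 2:
        raise ValueError("need an even number of schools to form pairs")
    rng = random.Random(seed)
    ranked = sorted(schools, key=lambda s: s[predictor])
    pairs = []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random order decides which school gets the program
        pairs.append((pair[0]["name"], pair[1]["name"]))
    return pairs


# Made-up schools and risk scores, for illustration only.
schools = [
    {"name": "A", "risk": 0.62}, {"name": "B", "risk": 0.15},
    {"name": "C", "risk": 0.58}, {"name": "D", "risk": 0.21},
    {"name": "E", "risk": 0.40}, {"name": "F", "risk": 0.44},
]
assignment = randomize_matched_pairs(schools, "risk", seed=2002)
for program, control in assignment:
    print(f"program: {program}   control: {control}")
```

Keeping the pair structure in the returned value makes it straightforward to replace a refusing school from the same set, or to drop the whole set, as described above.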


The Hawaii Positive Action® Research Project

• Funded by NIH/NIDA (five-year grant)

• Being done in Hawaii schools

• 10 Program schools and 10 matched Control schools

• Random assignment from matched pairs


Ethnic distribution of students in P, C & all Hawaii Elem. Schools (PA Trial)

[Figure: bar chart. x-axis: Ethnic Group; y-axis: Percent (0 to 40); series: Program Schools, Control Schools, All of Hawaii.]


Characteristics of P and C schools (Hawaii PA Trial)

[Figure: bar chart, y-axis 0 to 100; series: Program Schools, Control Schools. Panels and categories:
School Demographics: Enrol/10, Stability %, Lunch %, Budget/day, Per Cap/1000, Sp Ed %, LEP %
Achievement: SAT Read % (Below Average, Above Average), SAT Math % (Below Average, Above Average)
Behavior: Absentees %, Disciplinary Referrals, Disp. R's per 10 students]


Expense --> Small Ns?

• Yes, in many cases
  – Average efficacy trial (where research funds support the intervention) has 4-8 schools per condition, and costs ~$500,000 per year.
  – Effectiveness trials (where intervention is less costly) have 10-20 schools per condition for $500,000 per year.

• Limit costs by using more small schools
  – Raises questions about generalizability of results to large schools

• Limit costs by limiting variability between schools
  – Also limits generalizability of results


The Nature of Control Groups

• Medical model suggests use of a placebo and double blinding, neither of which is possible for educational programs

• Subjects (both students and schools) should have equal expectations of what they will get from the program

• Few studies have used alternative programs to control for Hawthorne effect or student expectancies
  – TVSFP, Sussman, Aban Aya

• It is not possible to have pure controls in schools today: they all have multiple programs
  – Must monitor what other programs both sets of schools have


Implications of no blinding

• Requires careful monitoring of program delivery

• Assessment of acceptance of, involvement in, and expectations of program by target audience

• Monitoring of what happens in control schools

• Data collectors blinded to conditions
  – Or at least to comparisons being made
  – This condition has rarely been met in prevention research

• Data collectors not known to students
  – To ensure greater confidentiality and more honest reports of behavior

• Classroom teachers should not be present (or be unobtrusive) during student surveys

• Use unobtrusive measures (rarely used so far)
  – Use of archival data and playground observations are possibilities
  – Though they have their own problems


Breakdown of Randomization/Design

• Failure of randomization
  – Don’t use posttest-only designs (to my knowledge none have)

• Schools drop out during course of study
  – Use signed agreements (none dropped out of Aban Aya)

• Configuration of schools is changed during course of study
  – E.g., a school is closed, two schools are combined
  – Drop the paired school as well (& add replacement set if it’s soon enough)

• A program school refuses to deliver the program, or delivers it poorly
  – E.g., the Schaps Child Development Study had only 5 of 12 schools implement the program well, and reports emphasize results from these 5.
  – Botvin often reported results only for students who received more than 60% of the lessons
  – “Intention to Treat” analysis should be reported first. Reporting results for the high-implementation group is appropriate only as a secondary level of analysis
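The difference between the intention-to-treat comparison and a high-implementation comparison can be shown with a few lines of arithmetic. The school-level means below are invented for illustration, and the `implemented_well` flag is a hypothetical stand-in for whatever implementation measure a study actually collects.

```python
from statistics import mean

# Made-up school-level mean outcomes (lower = less problem behavior).
schools = [
    {"condition": "program", "implemented_well": True,  "outcome": 2.0},
    {"condition": "program", "implemented_well": True,  "outcome": 2.4},
    {"condition": "program", "implemented_well": False, "outcome": 3.1},
    {"condition": "program", "implemented_well": False, "outcome": 3.3},
    {"condition": "control", "implemented_well": False, "outcome": 3.2},
    {"condition": "control", "implemented_well": False, "outcome": 3.0},
    {"condition": "control", "implemented_well": False, "outcome": 3.4},
    {"condition": "control", "implemented_well": False, "outcome": 3.2},
]


def mean_outcome(rows):
    return mean(r["outcome"] for r in rows)


program = [s for s in schools if s["condition"] == "program"]
control = [s for s in schools if s["condition"] == "control"]
high_impl = [s for s in program if s["implemented_well"]]

# Primary analysis: every school is analyzed in the condition it was assigned to.
itt = mean_outcome(program) - mean_outcome(control)
# Secondary analysis: only well-implementing program schools vs. all controls.
secondary = mean_outcome(high_impl) - mean_outcome(control)

print(f"intention-to-treat difference:    {itt:+.2f}")
print(f"high-implementation difference:   {secondary:+.2f}")
```

The secondary comparison looks more favorable in this toy data, but the well-implementing schools were not chosen at random, so that contrast no longer carries the protection against selection bias that randomization gives the intention-to-treat estimate.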


Parental Consent Issues

• Historical use of “passive” consent
  – Parents informed, but only respond if they want to “opt out” their child or themselves

• Some IRBs require active signed consent

• When is active consent required?
  – If asking “sensitive” questions
    • Drug use, sexual behavior, illegal behavior, family relationships
  – If students “required” to participate
    • Protection of Pupil Rights Amendment (PPRA)
  – Data are not anonymous (or totally confidential)
  – There is more than minimal risk if data become non-confidential

• Thus, passive consent should be allowed if:
  – Not asking about sensitive issues
    • Allows surveys of young students (K-3/4)
  – Students not required to participate
    • By NIH rules, students already must be given the opportunity to opt out of complete surveys or to skip questions
    • Requires careful “assent” procedures
  – Strict non-disclosure protocols are followed
    • Multiple levels of ID numbers for tracking
    • No individual-level (or classroom- or school-level) data released


Changes in Student Body During a Study

• Transfers out and in
  – Students who transfer out of or into a study school are, on average, at higher risk than other students
  – Are transfers out replaced by transfers in, or are rates different?
  – Are rates the same across experimental conditions?

• Absenteeism
  – Students with higher rates of absenteeism are also, on average, at higher risk than others
  – Are rates the same across experimental conditions?

• Rates of transfers in/out, absenteeism, or dropout that are differential by condition present the most serious problem
  – Requires careful assessment and analysis
  – Missing data techniques are of limited value when rates are differential, because the data are then not missing completely at random (MCAR)
  – But may be useful if data are missing at random (MAR), that is, if missingness is predictable
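A first screen for differential attrition is simply to compare loss rates across conditions. The sketch below uses a two-proportion z-test on made-up counts; the function name and the numbers are hypothetical. It treats students as independent, which ignores clustering within schools, so the p-value is a rough screen rather than a proper test for a group-randomized design.

```python
from math import sqrt
from statistics import NormalDist


def attrition_difference(lost_p, n_p, lost_c, n_c):
    """Compare attrition rates between program and control students.

    Returns (rate_program, rate_control, z, two_sided_p) from a
    two-proportion z-test that ignores clustering within schools.
    """
    rate_p, rate_c = lost_p / n_p, lost_c / n_c
    pooled = (lost_p + lost_c) / (n_p + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_p + 1 / n_c))
    z = (rate_p - rate_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_p, rate_c, z, p_value


# Made-up counts: students lost to follow-up out of those enrolled at baseline.
rate_p, rate_c, z, p = attrition_difference(lost_p=180, n_p=600, lost_c=120, n_c=600)
print(f"program attrition: {rate_p:.0%}, control attrition: {rate_c:.0%}")
print(f"z = {z:.2f}, two-sided p = {p:.5f}")
```

A clear difference in rates, as in this invented example, signals that the remaining samples may no longer be comparable and that the reasons for loss need to be examined by condition.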


Complex Outcomes, Intensive Measurement and Long-term Follow-up

• Many expected outcomes and mediators lead to extensive and intensive measurement

• Early concern with measurement reactivity
  – Led to recommendation of complex designs to rule it out
  – No longer considered very seriously
  – “If only behavior were so easily changed!”

• Long-term follow-up imperative
  – Few programs with documented effects into or through high school

• The longer the study, the more the attrition
  – Due to drop-outs, transfers, absenteeism, refusals


The Nature of the Target Population

• Universal, Selective and Indicated Interventions
  – Universal = complete population
  – Selective = those at higher risk
  – Indicated = those already evidencing early stages

• Implications of variation in risk levels of students in universal interventions
  – Suggests multi-level/nested interventions might be desirable
    • E.g., Fast Track
  – Suggests analyses by risk level


Hypothetical example of differential effects by risk level

[Figure: line chart. x-axis: Time of measurement (T1 to T5); y-axis: Level of behavior (0 to 6); series: Hi Risk Program, Hi Risk Control, Med Risk Program, Med Risk Control, Lo Risk Program, Lo Risk Control.]


Further implications

• The design imperative: maximize degrees of freedom.

• The analytic imperative: analyze impacts in accord with randomization.

• A common mishap: “Randomization by group accompanied by an analysis appropriate to randomization by individual is an exercise in self-deception.” (Cornfield, 1978)


How should the sample be designed?

• Design parameters to consider
  – The number of individuals per randomized group
  – The number of randomized groups: this is the MOST IMPORTANT
  – The proportion of groups randomized to the program or to control status


Minimum Detectable Effect Size For Randomizing Schools

Schools        Students per school
Randomized     20     40     60     80    100    120
    10       0.56   0.50   0.48   0.47   0.46   0.46
    20       0.38   0.34   0.32   0.32   0.31   0.31
    30       0.30   0.27   0.26   0.25   0.25   0.25
    40       0.26   0.23   0.22   0.22   0.21   0.21
    50       0.23   0.21   0.20   0.19   0.19   0.19
    60       0.21   0.19   0.18   0.18   0.17   0.17
    70       0.20   0.17   0.17   0.16   0.16   0.16
    80       0.18   0.16   0.16   0.15   0.15   0.15
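The pattern in the table (large gains from adding schools, small gains from adding students per school) can be explored with a commonly used approximation for a two-level design without covariates: MDES = M × sqrt( ρ / (P(1−P)J) + (1−ρ) / (P(1−P)Jn) ), where J is the number of schools, n the students per school, P the proportion of schools assigned to the program, ρ the intraclass correlation, and M a multiplier set by the significance level and power. The sketch below is illustrative only. The slides do not state the assumptions behind the table, so the intraclass correlation of 0.05 and the normal-approximation multiplier used here are assumptions, and the output should not be expected to match the table cell for cell.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n_groups, n_per_group, icc, prop_treated=0.5, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect size for a group-randomized design.

    Uses a normal approximation for the multiplier (about 2.8 for a two-sided
    5% test with 80% power); with few groups a t-based multiplier would be larger.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    pq = prop_treated * (1 - prop_treated)
    variance = icc / (pq * n_groups) + (1 - icc) / (pq * n_groups * n_per_group)
    return multiplier * sqrt(variance)


ICC = 0.05  # assumed; the table does not state the value it used
sizes = (20, 60, 120)
print("schools  " + "  ".join(f"n={n:<4d}" for n in sizes))
for j in (10, 20, 40, 80):
    print(f"{j:>7d}  " + "  ".join(f"{mdes(j, n, ICC):<6.2f}" for n in sizes))
```

Under these assumptions, doubling the number of schools from 10 to 20 at 20 students each lowers the detectable effect more than raising the number of students per school from 20 to 120 with 10 schools, which is the same qualitative message as the table.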


Some implications for sample design

1. It is extremely important to randomize an adequate number of groups.

2. It is often far less important how many individuals per group you have.

3. Choosing the “best” allocation of groups to the program or control status can raise complex issues.


Unit of Analysis

• Has received the most persistent attention

• Early studies were analyzed at the student level

• Early recommendation was to analyze at the school level, the level of random assignment

• Much attention to intraclass correlation
  – Typically only in the .01-.05 range
  – With 4-10 schools per condition, analyses at the student and school level can produce same p values

• Development of multi-level analysis techniques
  – Bryk & Raudenbush, Goldstein, Hedeker & Gibbons
  – Longitudinal data seen as another level of nesting
  – Growth curve analyses becoming popular
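One way to see why the intraclass correlation receives so much attention is the design effect, 1 + (n − 1)ρ, which gives the factor by which clustering inflates the variance of a mean relative to a simple random sample of the same size. The sketch below applies it to the .01-.05 range mentioned above; the choice of 20 schools with 100 students each is a hypothetical example, not a figure from any study discussed here.

```python
def design_effect(n_per_group, icc):
    """Variance inflation from clustering: 1 + (n - 1) * ICC."""
    return 1 + (n_per_group - 1) * icc


def effective_sample_size(n_groups, n_per_group, icc):
    """Number of independent students the clustered sample is 'worth'."""
    return n_groups * n_per_group / design_effect(n_per_group, icc)


# Hypothetical design: 20 schools with 100 students each.
for icc in (0.01, 0.05):
    deff = design_effect(100, icc)
    n_eff = effective_sample_size(20, 100, icc)
    print(f"ICC={icc:.2f}: design effect={deff:.2f}, "
          f"2000 students carry the information of about {n_eff:.0f}")
```

Even intraclass correlations that look small can matter once each school contributes many students, which is why a student-level analysis that ignores clustering tends to overstate precision.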


Complex Interventions

• Always thought of as curricula, or whole programs, not separate components
  – Few field-based tests of efficacy of separate components to date
  – But curricula/programs based on basic and hypothesis-driven research

• Programs have grown more complex over the years

• Multiple outcomes are the norm
  – Multiple behaviors + Character + Achievement

• Also multiple ecologies are involved
  – School-wide
  – Involvement of parents/families
  – Involvement of community (e.g., Aban Aya)

• Therefore, multiple mediators, both distal and proximal
  – Distal: Family patterns, school climate, community involvement
  – Proximal: Attitudes, normative beliefs, self-efficacy, intentions


Examples

• Example of major moderation from Aban Aya
  – Effects for males only

• Examples of mediation from Aban Aya
  – Following slides

• Example of another kind of process analysis
  – Later slide from Positive Action

• Example of another kind of moderation
  – Later slide from Positive Action


School-based Prevention/Promotion Research Needs More…

• Larger randomized trials
  – With more schools per condition

• Comparisons with “treatment as usual”

• Measurement of implementation process and program integrity

• Assessment of effects on presumed mediators
  – Helps test theories

• Multiple measures/sources of data
  – Surveys of students, parents, teachers, staff, community
  – Teacher and parent reports of behavior
  – School records for behavior and achievement

• Multiple, independent trials of promising programs
  – At both efficacy and effectiveness levels

• Cost-effectiveness analyses
