Classifying Designs of MSP Evaluations
Lessons Learned and Recommendations
Barbara E. Lovitts
June 11, 2008
The Sample of Projects
First level of screening
1. Final year – APR start and end dates.
2. Type of evaluation design:
• Experimental (random assignment)
• Quasi-experimental
– Comparison group study with equating
– Regression-discontinuity study
Findings – Final Year
• Started with 124 projects.
• Ended with 88 projects (results are not final).
• Projects eliminated based on:
– Evidence in project narrative or evaluation report.
– Information provided by the Project Director.
Evaluation Design
Type of Design       Starting Number   Ending Number
Experimental         3*                0
Quasi-Experimental   47*               19*

*Results are not final.
Findings - Quasi-Experimental Designs
• Many studies had a one-group pre-/post design (eliminated).
• In many treatment/comparison group studies, the comparison teachers were in the same school at the same grade level as the treatment teachers (not eliminated).
Applying the Rubric
Challenges
• Projects used different designs to evaluate different outcomes (e.g., content knowledge, pedagogy, efficacy).
• Projects used different designs to evaluate different participant groups (e.g., teachers, students).
• Projects used different designs at different grade levels or for different instruments.
Applying the Rubric
Solution
• Identify each measured outcome and group (e.g., 5th grade teachers – earth science content knowledge).
• Apply the rubric to each outcome/group combination that was evaluated using an experimental or a quasi-experimental design.
Applying the Rubric
A. Baseline Equivalence of Groups (Quasi-Experimental Only)
Criterion:
• No significant pre-intervention differences between treatment and comparison on variables related to the study’s key outcomes; or
• Adequate steps were taken to address the lack of baseline equivalence in the statistical analysis.
Applying the Rubric
Common Issues:
• No pre-test information on outcome-related measures.
• Within-group pre-test results are given for the treatment and comparison groups, but no tests of between-group differences are reported.
• Projects match groups on unit of assignment (e.g., schools, teachers), but do not provide data on unit of assessment (e.g., teachers, students).
Applying the Rubric
Recommendation: Baseline Equivalence
Participant Group and Outcome   Treatment Pre-test   Comparison Pre-test   p-value
_____________________________   mean or percent      mean or percent       _______
_____________________________   mean or percent      mean or percent       _______
_____________________________   mean or percent      mean or percent      _______
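The between-group test the rubric asks for can be produced directly from pre-test scores. A minimal sketch in Python, using Welch's t statistic with a large-sample normal approximation for the p-value; the scores and group sizes below are hypothetical, for illustration only:

```python
from statistics import NormalDist, mean, stdev

def welch_t_test(x, y):
    """Two-sample Welch t-test on pre-test scores.
    Returns (t, p); p uses a large-sample normal approximation."""
    nx, ny = len(x), len(y)
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    se = (vx / nx + vy / ny) ** 0.5          # standard error of the mean difference
    t = (mean(x) - mean(y)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))   # two-sided
    return t, p

# Hypothetical pre-test scores (percent correct), for illustration only.
treatment = [62, 58, 71, 66, 64, 59, 68, 63, 70, 61]
comparison = [60, 65, 57, 69, 63, 62, 66, 58, 64, 67]

t, p = welch_t_test(treatment, comparison)
print(f"t = {t:.2f}, p = {p:.3f}")  # p > 0.05 is consistent with baseline equivalence
```

A non-significant difference satisfies the first branch of the criterion; a significant one would require statistical adjustment instead.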
Applying the Rubric
B. Sample Size
Criterion:
• Sample size was adequate
– Based on a power analysis with recommended:
• significance level = 0.05
• power = 0.8
• minimum detectable effect informed by the literature or otherwise justified
Applying the Rubric
Common Issues:
• Power analyses are rarely conducted.
• Different sample sizes are given at different points in the APR and Evaluation Report.
• Sample sizes in the APR and the Evaluation Report do not match.
• Projects report sample sizes for teachers but not for students, or for students but not for teachers.
• Subgroup sizes:
– are not reported
– are reported inconsistently
– vary by discipline, subdiscipline (e.g., earth science, physical science), and/or grade level
Applying the Rubric
Recommendation: Sample Size
Participant Group and Outcome   Treatment (Final Sample Size)   Comparison (Final Sample Size)   Power Calculation Assumptions (if available)
_____________________________   N                               N                                Alpha = ___  Power = ___  MDE = ___
_____________________________   N                               N                                Alpha = ___  Power = ___  MDE = ___

Recommended: significance level (alpha) = 0.05, power = 0.8, minimal detectable effect (MDE) informed by the literature.
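With the recommended alpha and power, the required sample size per group can be approximated from standard normal quantiles. A sketch, assuming a two-sided, two-sample comparison of means with the MDE expressed as a standardized effect size (Cohen's d); the d = 0.5 example value is illustrative:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha=0.05, power=0.8, mde=0.5):
    """Approximate sample size per group for a two-sided two-sample
    test of means, with MDE as a standardized effect size (Cohen's d)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_power = z(power)           # ~0.84 for power = 0.8
    return ceil(2 * ((z_alpha + z_power) / mde) ** 2)

print(n_per_group(mde=0.5))  # 63 per group under the normal approximation
```

Halving the detectable effect roughly quadruples the required sample, which is why the MDE assumption needs an explicit justification.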
C. Quality of the Data Collection Methods
Criterion:
• The study used existing data collection instruments that had already been deemed valid and reliable to measure key outcomes; or
• The study used data collection instruments developed specifically for the study that were sufficiently pre-tested with subjects who were comparable to the study sample.
Applying the Rubric
Common Issues:
• Locally developed instruments are not tested for validity or reliability.
• Projects identify an instrument in the APR and select “not tested for validity or reliability,” even though a Google search shows that the instrument has been tested for validity and reliability.
• Projects use many instruments but do not report validity or reliability for all of them.
• Projects do not provide results for all instruments.
Applying the Rubric
Recommendation: Data Collection Instruments
Participant Group and Outcome                Name of Instrument                          Evidence for Validity and Reliability
Teacher content knowledge – math             DTAMS                                       {cite website or other reference where evidence can be found}
Teacher content knowledge – marine biology   Locally developed instrument                Narrative description of the evidence
Teacher content knowledge – science          Borrowed items from [instrument name(s)]    Total # of items; # of items borrowed from each instrument
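For a locally developed instrument, one commonly reported piece of reliability evidence is internal consistency (Cronbach's alpha), which can be computed from pilot-test item scores. A minimal sketch; the item scores below are hypothetical, and the 0.7 rule of thumb is a general convention, not from the source:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-respondent score lists
    (one score per item, same items for every respondent)."""
    k = len(item_scores[0])                         # number of items
    columns = list(zip(*item_scores))               # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 5-item instrument piloted with 6 respondents.
scores = [
    [3, 4, 3, 4, 3],
    [2, 2, 3, 2, 2],
    [4, 5, 4, 4, 5],
    [3, 3, 3, 4, 3],
    [1, 2, 2, 1, 2],
    [4, 4, 5, 4, 4],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")  # values >= 0.7 are often considered acceptable
```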
Applying the Rubric
D. Quality of the Data Collection Methods
Criterion:
• The methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups were the same.
Applying the Rubric
Common Issues:
• Little to no information is provided about data collection in general.
• Information is provided for the treatment group but not for the comparison group.
• Treatment teachers typically receive a pre-test before the summer institute, a post-test at the end of the summer institute, and sometimes another post-test at the end of the school year.
• Comparison teachers receive a pre-test at the beginning of the school year and a post-test at the end of the school year.
• Comparison teachers receive only a single test at the beginning of the year.
Applying the Rubric
Recommendation: Quality of Data Collection Methods
1. Participant Group and Outcome ______________
A. Method/procedure for collecting data from treatment group (describe):
B. Was the same method/procedure used to collect data from the comparison group? ___ Yes ___ No
If no, please describe how the method/procedure was different:
(continued)
Applying the Rubric
C. Time Frame (Month and Year)

Participant Group and Outcome   Pre-test   Post-test   Repeated Post-test
Treatment group                 ________   _________   __________________
Comparison group                ________   _________   __________________
Applying the Rubric
E. Data Reduction Rates
Criterion:
• The study measured the key outcome variable(s) in the post-tests for at least 70% of the original study sample (treatment and comparison groups combined), or there is evidence that the high rate of data reduction was unrelated to the intervention; AND
• The proportion of the original study sample that was retained in the follow-up data collection activities (e.g., post-intervention surveys) and/or for whom post-intervention data were provided (e.g., test scores) was similar for the treatment and comparison groups (i.e., a difference of 15 percentage points or less), or the proportions differed and sufficient steps were taken in the statistical analysis to address the differential attrition.
Applying the Rubric
Common Issues:
• Attrition information is typically not reported.
• Abt can sometimes calculate attrition, but it is difficult because sample and subsample sizes are not reported consistently.
• If projects provide data on attrition or if Abt can calculate it, it is usually for the treatment group only.
• Projects rarely provide data on student attrition; some mention high student mobility but do not quantify it.
Applying the Rubric
Recommendation: Data Reduction Rates
Participant Group and Outcome   Original Sample Size   Pre-test Sample Size   Post-test Sample Size   Post-test N / Pre-test N   Post-test N / Original N
Treatment                       ____                   ____                   ____                    ____                       ____
Comparison                      ____                   ____                   ____                    ____                       ____
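The ratios in the table follow directly from the reported counts. A sketch checking the rubric's 70% overall-retention and 15-percentage-point differential-attrition thresholds; the counts below are hypothetical:

```python
def retention(original_n, posttest_n):
    """Fraction of the original sample with post-test data."""
    return posttest_n / original_n

# Hypothetical counts, for illustration only.
treat_orig, treat_post = 50, 42
comp_orig, comp_post = 48, 35

# Rubric check 1: combined retention of at least 70%.
overall = (treat_post + comp_post) / (treat_orig + comp_orig)

# Rubric check 2: group retention rates within 15 percentage points.
differential = abs(retention(treat_orig, treat_post) - retention(comp_orig, comp_post))

print(f"overall retention = {overall:.0%}")            # 79%
print(f"differential attrition = {differential:.1%}")  # 11.1%
```

With these counts both thresholds are met; reporting the four underlying counts lets the reviewer reproduce the check.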
Applying the Rubric
F. Relevant Data
Criterion:
• The final report includes treatment and comparison group post-test means and tests of significance for key outcomes; or
• Provides sufficient information for calculation of statistical significance (e.g., mean, sample size, standard deviation/standard error).
Applying the Rubric
Common Issues:
• Projects report that results were significant or non-significant but do not provide supporting data.
• Projects provide p-values but not means or percents.
• Projects provide means/percents and p-values but not standard deviations.
• Projects provide within-group data for the treatment and comparison groups but do not provide between-group tests of significance.
• Projects with treatment and comparison groups provide data for the treatment group only.
• Projects report significant results but do not identify the type of statistical test performed.
• Projects provide an overwhelming amount of data for a large number of subgroups (e.g., on individual test or survey items).
Applying the Rubric
Recommendation: Relevant Data
Participant Group and Outcome   Mean or Percent   SD or SE   t, F, or Chi-square   p-value
Treatment                       ____              ____       ____                  ____
Comparison                      ____              ____       ____                  ____
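This is why means, SDs, and Ns are "sufficient information": when a report gives them but no test statistic, significance can be recovered from the summary data alone. A sketch using Welch's t with a normal approximation for the p-value; the summary values below are hypothetical:

```python
from statistics import NormalDist

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t statistic and two-sided p-value (normal approximation)
    computed from reported means, standard deviations, and sample sizes."""
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    t = (m1 - m2) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical post-test summary data: treatment vs. comparison.
t, p = t_from_summary(78.4, 9.2, 40, 72.1, 10.1, 38)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 under the normal approximation
```

Reporting all three quantities per group, as in the table above, lets a reviewer run exactly this calculation.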