Classifying Designs of MSP Evaluations
Lessons Learned and Recommendations
Barbara E. Lovitts
June 11, 2008
The Sample of Projects
First level of screening
1. Final year – APR start and end dates.
2. Type of evaluation design:
• Experimental (random assignment)
• Quasi-experimental
– Comparison group study with equating
– Regression-discontinuity study
Findings – Final Year
• Started with 124 projects.
• Ended with 88 projects (results are not final).
• Projects eliminated based on:
– Evidence in project narrative or evaluation report.
– Information provided by the Project Director.
Evaluation Design
Type of Design       Starting Number   Ending Number
Experimental         3*                0
Quasi-Experimental   47*               19*

*Results are not final.
Findings - Quasi-Experimental Designs
• Many studies had a one-group pre-/post design (eliminated).
• In many treatment/comparison group studies, the comparison teachers were in the same school at the same grade level as the treatment teachers (not eliminated).
Applying the Rubric
Challenges
• Projects used different designs to evaluate different outcomes (e.g., content knowledge, pedagogy, efficacy).
• Projects used different designs to evaluate different participant groups (e.g., teachers, students).
• Projects used different designs at different grade levels or for different instruments.
Applying the Rubric
Solution
• Identify each measured outcome and group (e.g., 5th grade teachers – earth science content knowledge).
• Apply the rubric to each outcome/group combination that was evaluated using an experimental or a quasi-experimental design.
Applying the Rubric
A. Baseline Equivalence of Groups (Quasi-Experimental Only)
Criterion:
• No significant pre-intervention differences between treatment and comparison on variables related to the study’s key outcomes; or
• Adequate steps were taken to address the lack of baseline equivalence in the statistical analysis.
Applying the Rubric
Common Issues:
• No pre-test information on outcome-related measures.
• Within-group pre-test results are given for the treatment and comparison groups, but no tests of between-group differences are reported.
• Projects match groups on unit of assignment (e.g., schools, teachers), but do not provide data on unit of assessment (e.g., teachers, students).
Applying the Rubric
Recommendation: Baseline Equivalence
Participant Group and Outcome   Treatment Pre-test   Comparison Pre-test   p-value
_____________________________   mean or percent      mean or percent       _______
_____________________________   mean or percent      mean or percent       _______
_____________________________   mean or percent      mean or percent      _______
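The between-group test the rubric asks for can be produced directly from pre-test scores. A minimal sketch in Python, using Welch's t statistic with a large-sample normal approximation for the p-value; the scores and group sizes below are hypothetical, for illustration only:

```python
from statistics import NormalDist, mean, stdev

def welch_t_test(x, y):
    """Two-sample Welch t-test on pre-test scores.
    Returns (t, p); p uses a large-sample normal approximation."""
    nx, ny = len(x), len(y)
    vx, vy = stdev(x) ** 2, stdev(y) ** 2
    se = (vx / nx + vy / ny) ** 0.5          # standard error of the mean difference
    t = (mean(x) - mean(y)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))   # two-sided
    return t, p

# Hypothetical pre-test scores (percent correct), for illustration only.
treatment = [62, 58, 71, 66, 64, 59, 68, 63, 70, 61]
comparison = [60, 65, 57, 69, 63, 62, 66, 58, 64, 67]

t, p = welch_t_test(treatment, comparison)
print(f"t = {t:.2f}, p = {p:.3f}")  # p > 0.05 is consistent with baseline equivalence
```

A non-significant difference satisfies the first branch of the criterion; a significant one would require statistical adjustment instead.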
Applying the Rubric
B. Sample Size
Criterion:
• Sample size was adequate
– Based on a power analysis with recommended:
• significance level = 0.05
• power = 0.8
• minimum detectable effect informed by the literature or otherwise justified
Applying the Rubric
Common Issues:
• Power analyses are rarely conducted.
• Different sample sizes are given at different points in the APR and Evaluation Report.
• Sample sizes in the APR and the Evaluation Report do not match.
• Projects report sample sizes for teachers but not for students, or for students but not for teachers.
• Subgroup sizes:
– are not reported
– are reported inconsistently
– vary by discipline, subdiscipline (e.g., earth science, physical science), and/or grade level
Applying the Rubric
Recommendation: Sample Size
Participant Group and Outcome   Treatment (Final Sample Size)   Comparison (Final Sample Size)   Power Calculation Assumptions (if available)
_____________________________   N                               N                                Alpha = ___  Power = ___  MDE = ___
_____________________________   N                               N                                Alpha = ___  Power = ___  MDE = ___

Recommended: significance level (alpha) = 0.05, power = 0.8, minimal detectable effect (MDE) informed by the literature.
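With the recommended alpha and power, the required sample size per group can be approximated from standard normal quantiles. A sketch, assuming a two-sided, two-sample comparison of means with the MDE expressed as a standardized effect size (Cohen's d); the d = 0.5 example value is illustrative:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha=0.05, power=0.8, mde=0.5):
    """Approximate sample size per group for a two-sided two-sample
    test of means, with MDE as a standardized effect size (Cohen's d)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_power = z(power)           # ~0.84 for power = 0.8
    return ceil(2 * ((z_alpha + z_power) / mde) ** 2)

print(n_per_group(mde=0.5))  # 63 per group under the normal approximation
```

Halving the detectable effect roughly quadruples the required sample, which is why the MDE assumption needs an explicit justification.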
C. Quality of the Data Collection Methods
Criterion:
• The study used existing data collection instruments that had already been deemed valid and reliable to measure key outcomes; or
• The study used data collection instruments developed specifically for the study that were sufficiently pre-tested with subjects who were comparable to the study sample.
Applying the Rubric
Common Issues:
• Locally developed instruments are not tested for validity or reliability.
• Projects identify an instrument in the APR and select “not tested for validity or reliability,” even though a Google search shows that the instrument has been tested for validity and reliability.
• Projects use many instruments but do not report validity or reliability for all of them.
• Projects do not provide results for all instruments.
Applying the Rubric
Recommendation: Data Collection Instruments
Participant Group and Outcome                Name of Instrument                          Evidence for Validity and Reliability
Teacher content knowledge – math             DTAMS                                       {cite website or other reference where evidence can be found}
Teacher content knowledge – marine biology   Locally developed instrument                Narrative description of the evidence
Teacher content knowledge – science          Borrowed items from [instrument name(s)]    Total # of items; # of items borrowed from each instrument
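For a locally developed instrument, one commonly reported piece of reliability evidence is internal consistency (Cronbach's alpha), which can be computed from pilot-test item scores. A minimal sketch; the item scores below are hypothetical, and the 0.7 rule of thumb is a general convention, not from the source:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-respondent score lists
    (one score per item, same items for every respondent)."""
    k = len(item_scores[0])                         # number of items
    columns = list(zip(*item_scores))               # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 5-item instrument piloted with 6 respondents.
scores = [
    [3, 4, 3, 4, 3],
    [2, 2, 3, 2, 2],
    [4, 5, 4, 4, 5],
    [3, 3, 3, 4, 3],
    [1, 2, 2, 1, 2],
    [4, 4, 5, 4, 4],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")  # values >= 0.7 are often considered acceptable
```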
Applying the Rubric
D. Quality of the Data Collection Methods
Criterion:
• The methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups were the same.
Applying the Rubric
Common Issues:
• Little to no information is provided about data collection in general.
• Information is provided for the treatment group but not for the comparison group.
• Treatment teachers typically receive a pre-test before the summer institute, a post-test at the end of the summer institute, and sometimes another post-test at the end of the school year.
• Comparison teachers receive a pre-test at the beginning of the school year and a post-test at the end of the school year.
• Comparison teachers receive only a single test at the beginning of the year.
Applying the Rubric
Recommendation: Quality of Data Collection Methods
1. Participant Group and Outcome ______________
A. Method/procedure for collecting data from treatment group (describe):
B. Was the same method/procedure used to collect data from the comparison group? ___ Yes ___ No
If no, please describe how the method/procedure was different:
(continued)
Applying the Rubric
C. Time Frame (Month and Year)

Participant Group and Outcome   Pre-test   Post-test   Repeated Post-test
Treatment group                 ________   _________   __________________
Comparison group                ________   _________   __________________
Applying the Rubric
E. Data Reduction Rates
Criterion:
• The study measured the key outcome variable(s) in the post-tests for at least 70% of the original study sample (treatment and comparison groups combined), or there is evidence that the high rate of data reduction was unrelated to the intervention; AND
• The proportion of the original study sample that was retained in the follow-up data collection activities (e.g., post-intervention surveys) and/or for whom post-intervention data were provided (e.g., test scores) was similar for the treatment and comparison groups (i.e., a difference of 15 percentage points or less), or the proportions differed and sufficient steps were taken in the statistical analysis to address the differential attrition.
Applying the Rubric
Common Issues:
• Attrition information is typically not reported.
• Abt can sometimes calculate attrition, but it is difficult because sample and subsample sizes are not reported consistently.
• If projects provide data on attrition or if Abt can calculate it, it is usually for the treatment group only.
• Projects rarely provide data on student attrition; some mention high student mobility but do not quantify it.
Applying the Rubric
Recommendation: Data Reduction Rates
Participant Group and Outcome   Original Sample Size   Pre-test Sample Size   Post-test Sample Size   Post-test N / Pre-test N   Post-test N / Original N
Treatment                       ____                   ____                   ____                    ____                       ____
Comparison                      ____                   ____                   ____                    ____                       ____
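The ratios in the table follow directly from the reported counts. A sketch checking the rubric's 70% overall-retention and 15-percentage-point differential-attrition thresholds; the counts below are hypothetical:

```python
def retention(original_n, posttest_n):
    """Fraction of the original sample with post-test data."""
    return posttest_n / original_n

# Hypothetical counts, for illustration only.
treat_orig, treat_post = 50, 42
comp_orig, comp_post = 48, 35

# Rubric check 1: combined retention of at least 70%.
overall = (treat_post + comp_post) / (treat_orig + comp_orig)

# Rubric check 2: group retention rates within 15 percentage points.
differential = abs(retention(treat_orig, treat_post) - retention(comp_orig, comp_post))

print(f"overall retention = {overall:.0%}")            # 79%
print(f"differential attrition = {differential:.1%}")  # 11.1%
```

With these counts both thresholds are met; reporting the four underlying counts lets the reviewer reproduce the check.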
Applying the Rubric
F. Relevant Data
Criterion:
• The final report includes treatment and comparison group post-test means and tests of significance for key outcomes; or
• Provides sufficient information for calculation of statistical significance (e.g., mean, sample size, standard deviation/standard error).
Applying the Rubric
Common Issues:
• Projects report that results were significant or non-significant but do not provide supporting data.
• Projects provide p-values but not means or percents.
• Projects provide means/percents and p-values but not standard deviations.
• Projects provide within-group data for the treatment and comparison groups but do not provide between-group tests of significance.
• Projects with treatment and comparison groups provide data for the treatment group only.
• Projects report significant results but do not identify the type of statistical test performed.
• Projects provide an overwhelming amount of data for a large number of subgroups (e.g., on individual test or survey items).
Applying the Rubric
Recommendation: Relevant Data
Participant Group and Outcome   Mean or Percent   SD or SE   t, F, or Chi-square   p-value
Treatment                       ____              ____       ____                  ____
Comparison                      ____              ____       ____                  ____
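This is why means, SDs, and Ns are "sufficient information": when a report gives them but no test statistic, significance can be recovered from the summary data alone. A sketch using Welch's t with a normal approximation for the p-value; the summary values below are hypothetical:

```python
from statistics import NormalDist

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t statistic and two-sided p-value (normal approximation)
    computed from reported means, standard deviations, and sample sizes."""
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    t = (m1 - m2) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical post-test summary data: treatment vs. comparison.
t, p = t_from_summary(78.4, 9.2, 40, 72.1, 10.1, 38)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 under the normal approximation
```

Reporting all three quantities per group, as in the table above, lets a reviewer run exactly this calculation.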