
Page 1:

Are Teacher-Level Value-Added Estimates Biased? An Experimental Validation of Non-Experimental Estimates

Thomas J. Kane (HGSE)
Douglas O. Staiger (Dartmouth College)

Page 2:

LAUSD Data

Grades 2 through 5.

Three Time Periods:
- Years before random assignment: Spring 2000 through Spring 2003
- Year of random assignment: either Spring 2004 or Spring 2005
- Years after random assignment: Spring 2005 (or 2006) through Spring 2007

Outcomes:
- California Standards Test (Spring 2004-2007)
- Stanford 9 Tests (Spring 2000 through 2002)
- California Achievement Test (Spring 2003)
All test scores standardized by grade and year (see the short sketch at the end of this slide).

Covariates:
- Student: baseline math and reading scores (interacted with grade), race/ethnicity (Hispanic, white, black, other or missing), ever retained, Title I, eligible for free lunch, gifted and talented, special education, English language development (level 1-5).
- Peers: means of all of the above for students in the classroom.
- Fixed Effects: School x Grade x Track x Year.

Sample Exclusions:
- Special education exclusion: classes with more than 20 percent special education students.
- Small and large class exclusion: classes with fewer than 5 or more than 36 students.
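
A minimal sketch of the standardization step (Python; the column names grade, year, and score are hypothetical, not taken from the paper's data files):

```python
import pandas as pd

def standardize_by_grade_year(df: pd.DataFrame, score_col: str = "score") -> pd.Series:
    """Return z-scores computed within each grade-by-year cell."""
    grouped = df.groupby(["grade", "year"])[score_col]
    return (df[score_col] - grouped.transform("mean")) / grouped.transform("std")

# Toy usage:
df = pd.DataFrame({"grade": [2, 2, 3, 3],
                   "year": [2003, 2003, 2003, 2003],
                   "score": [310.0, 330.0, 295.0, 305.0]})
df["score_std"] = standardize_by_grade_year(df)
```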

Page 3:

Experimental Design

Sample of NBPTS applicants from Los Angeles area.

Sample of Comparison teachers working in same school, grade, calendar track.

LAUSD chief of staff wrote letters to principals inviting them to draw up two classrooms that they would be willing to assign to either teacher.

If principal agreed, classroom rosters (not individual students) were randomly assigned by LAUSD on the day of switching.

Yielded 78 pairs of teachers (156 classrooms and 3500 students) for whom we had estimates of “value-added” impacts from the pre-experimental period.

Page 4:

Step 1: Estimate a Variety of Non-Experimental Specifications using Pre-Experimental Data

$$A_{ijt} = X_{ijt}\beta + \mu_j + \theta_{jt} + \varepsilon_{ijt}$$

where
- $A_{ijt}$ = student test score (level or gain)
- $X_{ijt}$ = student and classroom-level covariates
- $\mu_j$ = teacher effect
- $\theta_{jt}$ = non-persistent classroom-by-year shock
- $\varepsilon_{ijt}$ = student-by-year error

Generate Empirical Bayes estimates ($VA_j$) of teacher effects using a variety of specifications of $A$, $X$.
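
A simplified sketch of this step (Python; column names score, baseline_math, baseline_read, grade, teacher_id, and year are hypothetical, and the covariate list is abbreviated relative to the specifications above): residualize scores on student covariates, then average residuals within each teacher-year cell. The Empirical Bayes shrinkage applied to these cell means is sketched on Page 22.

```python
import pandas as pd
import statsmodels.formula.api as smf

def teacher_year_effects(df: pd.DataFrame) -> pd.DataFrame:
    """Residualize standardized scores on student covariates, then average the
    residuals within each teacher-year cell (assumes no missing values)."""
    model = smf.ols("score ~ baseline_math + baseline_read + C(grade)", data=df).fit()
    cells = df[["teacher_id", "year"]].copy()
    cells["resid"] = model.resid
    return (cells.groupby(["teacher_id", "year"])
                 .agg(mu_hat=("resid", "mean"), n=("resid", "size"))
                 .reset_index())
```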

Page 5:

Step 2: Test Validity of VAj in Predicting Within-Pair Experimental Differences

At the classroom level:

$$Y_{jp} = \alpha_p + \beta\,VA_{jp} + \varepsilon_{jp} \quad \text{for } j = 1,2 \text{ and } p = 1,\ldots,78$$

When the dependent variable $Y$ is equal to...
- Baseline characteristics: $H_o\!: \beta = 0$
- Test scores, experimental year: $H_o\!: \beta = 1$
- Test scores, experimental year 1: $H_o\!: \beta = 1$
- Test scores, experimental year 2: $H_o\!: \beta = 1$

Differencing within each pair, p = 1 through 78:

$$Y_{2p} - Y_{1p} = \beta\,(VA_{2p} - VA_{1p}) + \tilde{\varepsilon}_p \quad \text{for } p = 1,\ldots,78$$
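
A minimal sketch of the within-pair validation regression (Python; dY and dVA stand for the 78 within-pair differences in outcomes and in pre-experimental value-added, hypothetical arrays):

```python
import numpy as np
import statsmodels.api as sm

def validate_within_pair(dY, dVA):
    """Regress within-pair outcome differences on within-pair differences in
    pre-experimental value-added, with no constant, and test H0: beta = 1."""
    res = sm.OLS(np.asarray(dY), np.asarray(dVA)).fit(cov_type="HC1")  # robust SEs
    beta, se = res.params[0], res.bse[0]
    return beta, se, (beta - 1.0) / se  # coefficient, SE, t-stat against beta = 1
```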

Page 6:

Summary of Sample Comparisons

The experimental sample of teachers was more experienced. (15 vs. 10.5 years in LAUSD)

The pre-experimental mean and s.d. of VAj were similar in the experimental and non-experimental samples.

Could not reject the hypothesis of no relationship between VA2p-VA1p and differences in mean baseline characteristics.

Could not reject the hypothesis of no differential attrition or teacher switching.

Page 7:

Table 3: Non-experimental Specifications of Teacher Effects
Standard deviation of each component (in student-level standard deviation units)

Specification Used for Non-experimental Teacher Effect   Teacher Effects   Teacher-by-Year Random Effect   Mean Sample Size per Teacher
Math Levels with...
  No Controls                                             0.455             0.224                           48.612
  Student/Peer Controls (incl. prior scores)              0.228             0.180                           42.843
  Student/Peer Controls (incl. prior scores) & School F.E. 0.216            0.178                           42.843
  Student Fixed Effects                                   0.098             0.072                           48.612
Math Gains with...
  No Controls                                             0.228             0.223                           45.171
  Student/Peer Controls                                   0.227             0.220                           45.171
  Student/Peer Controls & School F.E.                     0.217             0.221                           45.171

Note: The above estimates are based on the total variance in estimated teacher fixed effects using observations from the pre-experimental data (years 1999-2000 through 2002-03). See the text for discussion of the estimation of the decomposition into teacher-by-year random effects, student-level error, and "actual" teacher effects. The sample was limited to schools with teachers in the experimental sample. Any individual students who were in the experiment were dropped from the pre-experimental estimation, to avoid any spurious relationship due to regression to the mean, etc.

Page 8:

Why would student fixed-effect models underestimate differences in teacher value added?

- When we demean student data, we subtract off 1/T of the current teacher's effect (T = number of years of data on each student), so we underestimate the magnitude of the teacher effect by 1/T (i.e., a degrees-of-freedom correction is needed). In our data, the typical student had 2-4 years of data, so the magnitude is biased down by 1/2 to 1/4. (See the derivation sketched below.)
- We subtract off even more of the teacher effect if some of the current teacher's effect persists into scores in future years (the FE model assumes no persistence):
  - underestimate the magnitude by 1/T for the teacher in year T (since this teacher's effect appears only in the last year's score);
  - underestimate the magnitude by more than 1/T for teachers in earlier years, with the downward bias largest for the first teacher.
- If the first teacher's effect were completely persistent, we would subtract off all of the effect and estimate no variance in first-year teacher effects.
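
A short derivation of the 1/T argument above (own notation, with teacher effects treated as non-persistent): student i's year-t score is $A_{it} = \mu_{j(i,t)} + \varepsilon_{it}$, so demeaning over the student's T years gives

$$\tilde{A}_{it} = A_{it} - \frac{1}{T}\sum_{s=1}^{T} A_{is} = \Big(1 - \tfrac{1}{T}\Big)\mu_{j(i,t)} - \frac{1}{T}\sum_{s \neq t}\mu_{j(i,s)} + \tilde{\varepsilon}_{it},$$

so only $(1 - 1/T)$ of the current teacher's effect survives the within-student transformation; with T = 2 to 4, the magnitude is attenuated by 1/2 to 1/4.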

Page 9:

Table 6: Predicting Outcomes During Experimental Period (Pair-level Regressions)

Specification Used for Non-experimental Teacher Effect      Test Score First Year: Coef. (SE)   R²      N    Test Score Second Year: Coef. (SE)   Test Score Third Year: Coef. (SE)
Math Levels with...
  No Controls                                               0.495*** (0.103)                    0.183   78   0.273** (0.103)                      0.128 (0.097)
  Student/Peer Controls (incl. prior scores)                0.863*** (0.178)                    0.213   78   0.378* (0.174)                       0.068 (0.137)
  Student/Peer Controls (incl. prior scores) & School F.E.  0.918*** (0.180)                    0.230   78   0.410* (0.178)                       0.102 (0.140)
  Student Fixed Effects                                     1.987*** (0.488)                    0.161   78   0.915 (0.471)                        0.344 (0.428)
Math Gains with...
  No Controls                                               0.833*** (0.204)                    0.168   78   0.373 (0.191)                        0.054 (0.153)
  Student/Peer Controls                                     0.841*** (0.211)                    0.170   78   0.381 (0.201)                        0.054 (0.159)
  Student/Peer Controls & School F.E.                       0.878*** (0.217)                    0.176   78   0.405 (0.210)                        0.067 (0.165)

Note: Each year's classroom average test score was used as the dependent variable, regressing the within-pair difference in average test score on different non-experimental estimates of teacher effects. The coefficients were estimated in separate bivariate regressions with no constant. Robust standard errors are reported in parentheses.

Page 10:

[Figure 1: Within-Pair Differences in Pre-experimental Value-Added and End-of-First-Year Test Score (Mathematics). Scatter of the within-pair difference in the end-of-first-year test score (vertical axis, roughly -1.5 to 1.5) against the within-pair difference in pre-experimental value-added (horizontal axis, 0 to 0.8), showing observed values, the linear fit, the lowess fit, and the 45-degree line.]

Page 11:

Table 6 (repeated): Predicting Outcomes During Experimental Period (Pair-level Regressions) — identical to the table shown on Page 9.

Page 12:

Structural Model for Estimating Fade-out Parameter, δ

$$A_{ijt} = A^{*}_{ijt} + \theta_{jt} + \varepsilon_{ijt}, \qquad A^{*}_{ijt} = \mu_{jt} + \delta\,A^{*}_{ij,t-1}$$

where
- $A_{ijt}$ = student test score (level or gain)
- $A^{*}_{ijt}$ = cumulative school/teacher impact
- $\delta$ = fade-out parameter for school/teacher impact
- $\mu_{jt}$ = effect of teacher $j$ in year $t$
- $\theta_{jt}$ = non-persistent classroom-by-year shock
- $\varepsilon_{ijt}$ = serially correlated student-by-year error
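
A tiny numerical illustration of what this process implies for a single teacher's effect (Python; toy numbers, not estimates from the paper): a year-0 teacher effect of size mu contributes delta**k * mu to cumulative achievement k years later.

```python
def teacher_effect_path(mu, delta, horizon):
    """Contribution of a single year-0 teacher effect mu to cumulative achievement
    in years 0..horizon, under A*_t = mu_t + delta * A*_{t-1}."""
    return [mu * delta ** k for k in range(horizon + 1)]

# With a fade-out rate in the ballpark reported later in the deck:
print(teacher_effect_path(mu=0.2, delta=0.5, horizon=3))  # [0.2, 0.1, 0.05, 0.025]
```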

Page 13:

IV Strategy for Estimating Fade-Out Parameter (δ) in Non-Exp Data

We can rewrite the error-component model as:

$$A_{ijt} = \delta\,A_{ij,t-1} + \mu_{jt} + \theta_{jt} + \varepsilon_{ijt} - \delta\,(\theta_{j,t-1} + \varepsilon_{ij,t-1})$$

- OLS estimates of $\delta$ are biased, because $A_{ij,t-1}$ is correlated with the composite error (it contains last year's shocks).
- Use prior-year teacher dummies to instrument for $A_{ij,t-1}$.
- This assumes that prior-year teacher assignment is not correlated with $\varepsilon_{ijt}$ or $\varepsilon_{ij,t-1}$.
- Control for current teacher or classroom fixed effects to capture current teacher/classroom effects.

A code sketch of the 2SLS step follows.
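
A sketch of the 2SLS step under the stated assumptions (Python, assuming the linearmodels package; the column names score, lag_score, prior_teacher_id, and classroom_id are hypothetical, and the paper's additional controls, such as grade-by-year F.E. and student covariates, are omitted for brevity):

```python
import pandas as pd
from linearmodels.iv import IV2SLS  # assumption: the linearmodels package is available

def estimate_fadeout_2sls(df: pd.DataFrame):
    """2SLS sketch: instrument last year's score (lag_score) with prior-year
    teacher dummies, controlling for current-classroom dummies."""
    dep = df["score"]
    endog = df[["lag_score"]]
    instruments = pd.get_dummies(df["prior_teacher_id"].astype(str), prefix="pt",
                                 drop_first=True, dtype=float)
    exog = pd.get_dummies(df["classroom_id"].astype(str), prefix="cls",
                          drop_first=True, dtype=float)
    exog.insert(0, "const", 1.0)
    res = IV2SLS(dep, exog, endog, instruments).fit(
        cov_type="clustered",
        clusters=df["classroom_id"].astype("category").cat.codes)
    return res.params["lag_score"], res.std_errors["lag_score"]
```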

Page 14:

Table 11: IV Estimates of Teacher Effect Fade-out Coefficient

                               (A)         (B)         (C)
Math                           0.489***    0.478***    0.401***
                               (0.006)     (0.006)     (0.007)
  N                            89,277      89,277      89,277
English Language Arts          0.533***    0.514***    0.413***
                               (0.007)     (0.007)     (0.009)
  N                            87,798      87,798      87,798
Current Teacher F.E.           Yes         No          No
Current Classroom F.E.         No          Yes         Yes
Student Controls               No          No          Yes

Note: Coefficients were estimated using separate 2SLS regressions with student test score as the dependent variable. Each specification included controls as indicated, plus grade-by-year F.E. The baseline test score is instrumented using a teacher dummy variable for the teacher associated with the baseline test.

Page 15:

Joint Validity of Non-Experimental Estimates of δ and VAj

$$Y_{2pt} - Y_{1pt} = \beta\,\hat{\delta}^{\,t}\,(VA_{2p} - VA_{1p}) + \tilde{\varepsilon}_{pt}$$

Test of Joint Validity: $H_o\!: \beta = 1$.
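
A minimal sketch of the pooled version of this test (Python; dY_by_year maps each experimental year t = 0, 1, 2 to the 78 within-pair outcome differences, dVA holds the within-pair VA differences, and delta_hat is a fade-out estimate such as those in Table 11 — all hypothetical inputs):

```python
import numpy as np
import statsmodels.api as sm

def joint_validity_test(dY_by_year, dVA, delta_hat):
    """Pool years t = 0, 1, 2: regress within-pair outcome differences on
    delta_hat**t times the within-pair VA difference (no constant), H0: beta = 1."""
    years = sorted(dY_by_year)
    y = np.concatenate([np.asarray(dY_by_year[t]) for t in years])
    x = np.concatenate([delta_hat ** t * np.asarray(dVA) for t in years])
    res = sm.OLS(y, x).fit(cov_type="HC1")
    return res.params[0], res.bse[0], (res.params[0] - 1.0) / res.bse[0]
```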

Page 16:

Table 12: Predicting Outcomes in Future Years Using Estimated Fade-out Coefficients

Specification Used for Non-experimental Teacher Effect                 Year 0             Year 1            Year 2            Years 0, 1, and 2 Pooled   P-value for Test of Coefficients Equivalent Across Years
Math Levels with Student/Peer Controls (incl. prior scores)            0.852*** (0.177)   0.894* (0.429)    0.209 (0.826)     0.843*** (0.207)           0.311
Math Gains with Student/Peer Controls                                  0.828*** (0.207)   0.889 (0.477)     0.060 (0.941)     0.819*** (0.239)           0.289
Language Arts Levels with Student/Peer Controls (incl. prior scores)   0.987*** (0.277)   1.155 (0.689)     2.788 (1.454)     1.054** (0.343)            0.144
Language Arts Gains with Student/Peer Controls                         0.826** (0.262)    0.668 (0.631)     1.880 (1.413)     0.829** (0.319)            0.170
N                                                                      78                 78                78                234

Note: Each year's classroom average test score was used as the dependent variable, regressing the within-pair difference in average test score on different non-experimental estimates of teacher effects discounted in year two by the coefficients in column "C" of Table 11 and in year three by the square of those same coefficients. The coefficients were estimated in separate regressions with no constant. Robust standard errors are reported in parentheses.

Page 17:

Potential Sources of Fade-out

- Unused knowledge may become inoperable.
- Grade-specific content is not entirely reflected in future achievement (e.g., whether or not you remember logarithms may matter little in calculus).

Page 18:

Potential Sources of Fade-out

- Unused knowledge becomes inoperable.
- Grade-specific content is not entirely relevant for future achievement (e.g., whether or not you remember logarithms may matter little in calculus).
- It takes more effort to keep students at a high performance level than at a low performance level.
- Students of the best teachers are mixed with students of the worst teachers in the following year, and the new teacher will focus effort on the students who are behind. (There would be no fade-out from this channel if all teachers were equally effective.)

Page 19:

Is Teacher-Student Sorting Different in Los Angeles?

Table 13: Comparing Assortative Matching in Los Angeles to Other Urban Districts

                                                                 Los Angeles        New York City      Boston
                                                                 Math      ELA      Math      ELA      Math      ELA
Standard Deviation in Teacher's Value-added                      0.184     0.135    0.157     0.121    0.191     0.162
Standard Deviation in Baseline Achievement in Teacher's
  Classroom                                                      0.400     0.408    0.512     0.513    0.528     0.539
Correlation between Teacher's Value-added and Baseline
  Achievement in Teacher's Classroom                             0.120     0.118    0.041     0.083    0.114     0.103

Note: Estimated using non-experimental samples of 4th and 5th graders in years 2000-2003 for Los Angeles, 2000-2006 for New York City, and 2006-2007 for Boston. Teacher value-added and baseline achievement were estimated including student-level controls for baseline test scores, race/ethnicity, special education, ELL, and free-lunch status; classroom peer means of the student-level characteristics; and grade-by-year F.E.

Page 20:

Summary of Main Findings:

- All non-experimental specifications provided information regarding experimental outcomes, but those controlling for baseline scores yielded unbiased predictions with the highest explanatory power.
- The experimental impacts in both math and English language arts appear to fade out at an annual rate of 0.4 to 0.6.
- Similar fade-out was observed non-experimentally.
- Depending on its source, fade-out has important implications for calculating the long-term benefits of improvements in average teacher effects.

Page 21:

Next steps:

Test for "complementarities" in teacher effects across years (e.g., what is the effect of having a high or low value-added teacher in two consecutive years?). The current experiment won't help answer this, but the STAR experiment might.

Page 22:

Empirical Methods:

2. Generating Empirical Bayes Estimates of Non-Experimental Teacher Effects

For each teacher $j$, combine the annual estimates $\hat{\mu}_{jt}$ (mean residuals for teacher $j$'s classroom in year $t$, based on $n_{jt}$ students) using precision weights, and then shrink toward zero by the estimated signal share:

$$h_{jt} = \frac{1}{\hat{\sigma}^2_{\theta} + \hat{\sigma}^2_{\varepsilon}/n_{jt}}, \qquad \bar{\mu}_j = \frac{\sum_t h_{jt}\,\hat{\mu}_{jt}}{\sum_t h_{jt}}, \qquad \operatorname{Var}(\bar{\mu}_j \mid \mu_j) = \frac{1}{\sum_t h_{jt}}$$

$$VA_j = \bar{\mu}_j \times \frac{\hat{\sigma}^2_{signal}}{\hat{\sigma}^2_{signal} + \hat{\sigma}^2_{noise}} = \bar{\mu}_j \times \frac{\hat{\sigma}^2_{\mu}}{\hat{\sigma}^2_{\mu} + 1/\sum_t h_{jt}}$$
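
A compact sketch of this shrinkage step (Python; `cell` is a hypothetical teacher-by-year frame with columns teacher_id, mu_hat, and n, such as the output of the residualization sketch on Page 4, and the three variance components are taken as given):

```python
import pandas as pd

def empirical_bayes_va(cell: pd.DataFrame, var_mu: float, var_theta: float,
                       var_eps: float) -> pd.Series:
    """Precision-weight each teacher's annual mean residuals, then shrink by the
    estimated signal share, returning one VA_j per teacher."""
    cell = cell.copy()
    cell["h"] = 1.0 / (var_theta + var_eps / cell["n"])   # weight on each teacher-year
    cell["h_mu"] = cell["h"] * cell["mu_hat"]
    agg = cell.groupby("teacher_id")[["h", "h_mu"]].sum()
    mu_bar = agg["h_mu"] / agg["h"]                       # weighted average across years
    noise_var = 1.0 / agg["h"]                            # Var(mu_bar | mu_j)
    return mu_bar * var_mu / (var_mu + noise_var)         # shrinkage toward zero
```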

Page 23:

Table 1: Sample Comparison - Teachers

                                    Experimental School                            Non-experimental School
                                    Experimental Sample   Non-experimental Sample  Non-experimental Sample
Mean Teacher Effect in Math         -0.009                -0.003                   0.005
  S.D.                              0.195                 0.196                    0.196
Mean Teacher Effect in ELA          -0.010                0.003                    0.003
  S.D.                              0.149                 0.148                    0.147
Black, Non-Hispanic                 0.166                 0.138                    0.123
Hispanic                            0.258                 0.311                    0.325
White, Non-Hispanic                 0.466                 0.447                    0.425
Other, Non-Hispanic                 0.110                 0.102                    0.123
Teacher Race/Ethnicity Missing      0.000                 0.003                    0.003
Years of Experience                 15.490                10.542                   10.758
N                                   165                   1,785                    11,352

Note: Descriptive statistics based on the experimental years (2003-04 and 2004-05). The mean teacher effects in math and ELA were estimated using the full sample of schools and teachers, controlling for baseline scores, student characteristics, and peer controls.

Page 24:

Table 2: Sample Comparison - Students

                         Experimental School                            Non-experimental School
                         Experimental Sample   Non-experimental Sample  Non-experimental Sample
Math Scores
  2004 Mean              0.027                 -0.110                   0.024
       S.D.              0.931                 0.941                    1.008
  2005 Mean              -0.008                -0.113                   0.028
       S.D.              0.936                 0.940                    1.007
  2006 Mean              0.001                 -0.100                   0.037
       S.D.              0.960                 0.941                    1.006
  2007 Mean              -0.016                -0.092                   0.030
       S.D.              0.956                 0.941                    1.006
ELA Scores
  2004 Mean              0.038                 -0.113                   0.023
       S.D.              0.913                 0.936                    1.008
  2005 Mean              0.009                 -0.117                   0.027
       S.D.              0.920                 0.930                    1.009
  2006 Mean              0.039                 -0.096                   0.037
       S.D.              0.923                 0.928                    1.001
  2007 Mean              0.018                 -0.095                   0.037
       S.D.              0.940                 0.936                    1.000
N                        3,554                 43,766                   273,525

Note: Descriptive statistics based on the experimental years (2003-04 and 2004-05). Students present in both years are counted only once.

Page 25:

Table 2: Sample Comparison - Students (cont.)

                         Experimental School                            Non-experimental School
                         Experimental Sample   Non-experimental Sample  Non-experimental Sample
Black, Non-Hispanic      0.112                 0.115                    0.113
Hispanic                 0.768                 0.779                    0.734
White, Non-Hispanic      0.077                 0.060                    0.088
Other, Non-Hispanic      0.044                 0.046                    0.066
Grade 2                  0.377                 0.280                    0.288
Grade 3                  0.336                 0.201                    0.207
Grade 4                  0.113                 0.215                    0.211
Grade 5                  0.131                 0.305                    0.294
N                        3,554                 43,766                   273,525

Note: Descriptive statistics based on the experimental years (2003-04 and 2004-05). Students present in both years are counted only once.

Page 26:

Table 4: Baseline Student Characteristics Regressed on Non-Experimental Teacher Effects

                                          Baseline Scores             Baseline Demographics & Program Participation
Specification Used for                    Math        Language        Gifted &     Special                                Free       Eng. Lang. Status
Non-experimental Teacher Effect           Score       Score           Talented     Education   Hispanic    Black          Lunch      (Level 1 to 3)
Math Levels with Student/Peer Controls    -0.081      0.036           -0.014       -0.049      -0.053      0.008          0.031      -0.026
                                          (0.230)     (0.268)         (0.022)      (0.033)     (0.041)     (0.041)        (0.061)    (0.071)
  N                                       44          44              78           78          78          78             78         78
ELA Levels with Student/Peer Controls     0.089       0.296           0.023        -0.066      -0.037      0.008          0.084      -0.097
                                          (0.323)     (0.359)         (0.032)      (0.051)     (0.097)     (0.066)        (0.085)    (0.132)
  N                                       44          44              78           78          78          78             78         78

Note: Each baseline characteristic listed in the columns was used as a dependent variable, regressing the within-pair difference in the mean baseline characteristic on different non-experimental estimates of teacher effects. The coefficients were estimated in separate bivariate regressions with no constant. Robust standard errors are reported in parentheses. Baseline math and language arts scores were missing for the pairs that were in second grade.

Page 27:

Table 5: Attrition and Teacher Switching

                                            Switched      Missing Test Score
Specification Used for                      Teacher       First Year    Second Year    Third Year
Non-experimental Teacher Effect
Math Levels with Student/Peer Controls      -0.004        0.029         -0.018         -0.028
                                            (0.049)       (0.057)       (0.058)        (0.133)
  N                                         78            78            78             78
ELA Levels with Student/Peer Controls       -0.035        0.006         0.030          -0.148
                                            (0.077)       (0.084)       (0.097)        (0.171)
  N                                         78            78            78             78

Note: Each outcome listed in the columns (an indicator for switching teachers and indicators for a missing test score in each year) was used as a dependent variable, regressing the within-pair difference in the outcome on different non-experimental estimates of teacher effects. The coefficients were estimated in separate bivariate regressions with no constant. Robust standard errors are reported in parentheses.

Page 28:

Table 7: Predicting Experimental Performance in Math and ELA (Student-level Regressions)

Specification Used for                      First Year    Second Year Score           Third Year Score
Non-experimental Teacher Effect             Score         (1)          (2)            (1)          (2)
Math Levels with Student/Peer Controls      0.845***      0.423*       0.421*         0.08         0.076
                                            (0.181)       (0.178)      (0.185)        (0.145)      (0.290)
  N                                         2,905         2,685        2,305          2,504        1,892
ELA Levels with Student/Peer Controls       1.073***      0.605*       0.718*         0.589*       0.626
                                            (0.271)       (0.275)      (0.280)        (0.249)      (0.376)
  N                                         2,903         2,691        2,312          2,503        1,891

Student-level controls: No in all columns. Column (2) under the second-year score adds second-year teacher F.E.; column (2) under the third-year score adds second-by-third-year teacher F.E.

Note: The above were estimated with student-level regressions using fixed effects for each experimental teacher pair. Robust standard errors (in parentheses) allow for clustering at the teacher-pair level. The sample for specifications including teacher fixed effects is limited to students in grades 3-5, as teacher identifiers for secondary grades are not yet available.

Page 29:

Table 8: Estimating Fade-Out in the Non-Experimental Sample (Student-level Regressions)

Specification Used for                      2004-05                     2005-06                     2006-07
Non-experimental Teacher Effect             (1)          (2)            (1)          (2)            (1)          (2)
Math Levels with Student/Peer Controls      1.096***     0.952***       0.246***     0.144***       0.115***     0.008
                                            (0.016)      (0.010)        (0.011)      (0.016)        (0.012)      (0.026)
  N                                         114,767      108,505        97,908       67,079         88,993       32,429
ELA Levels with Student/Peer Controls       0.869***     0.745***       0.223***     0.135***       0.140***     0.067*
                                            (0.022)      (0.012)        (0.013)      (0.020)        (0.015)      (0.032)
  N                                         114,963      108,656        98,009       67,140         89,028       32,442

Student-level controls: No in column (1) for 2004-05, Yes in all other columns. Column (2) under 2005-06 adds second-year teacher F.E.; column (2) under 2006-07 adds second-by-third-year teacher F.E.

Note: The 2004-05 teacher effect is estimated using data from 1999-2000 through 2002-03, excluding schools that participated in the experiment. Above we report the coefficients on that estimated 2004-05 teacher effect in predicting a student's 2004-05, 2005-06, and 2006-07 scores, respectively. The sample for specifications including teacher fixed effects is limited to students in grades 3-5, as teacher identifiers for secondary grades are not yet available.

Page 30:

Table 9: Baseline Student Characteristics Regressed on Non-Experimental Teacher Effects

Specification and Sample                          Baseline     Gifted &     Special                                  Free        Eng. Lang. Status
                                                  Score        Talented     Education    Hispanic      Black        Lunch       (Level 1 to 3)
Math Levels (incl. prior scores, student/peer controls)
  Experimental Sample                             -0.109       -0.016       -0.047       -0.045        0.006        0.033       -0.014
                                                  (0.242)      (0.031)      (0.035)      (0.033)       (0.039)      (0.049)     (0.075)
    N                                             1,840        3,038        3,038        3,038         3,038        3,038       3,038
  Non-experimental Teachers in
  Experimental Schools                            0.130+       0.028+       -0.032**     0.016         0.000        -0.006      -0.018
                                                  (0.071)      (0.015)      (0.013)      (0.032)       (0.018)      (0.010)     (0.045)
    N                                             24,864       34,196       34,196       34,196        34,196       34,196      34,196
  Non-experimental Teachers in
  Non-experimental Schools                        0.241***     0.038***     -0.021***    0.001         -0.011*      -0.005      -0.006
                                                  (0.033)      (0.007)      (0.005)      (0.007)       (0.005)      (0.004)     (0.014)
    N                                             258,533      359,368      359,368      359,368       359,368      359,368     359,368
ELA Levels (incl. prior scores, student/peer controls)
  Experimental Sample                             0.398        0.036        -0.075       -0.011        0.000        0.075       -0.100
                                                  (0.378)      (0.036)      (0.054)      (0.090)       (0.073)      (0.067)     (0.138)
    N                                             1,843        3,038        3,038        3,038         3,038        3,038       3,038
  Non-experimental Teachers in
  Experimental Schools                            0.190        0.042*       -0.029+      0.003         -0.009       -0.014      -0.051
                                                  (0.127)      (0.021)      (0.017)      (0.045)       (0.028)      (0.012)     (0.055)
    N                                             24,914       34,196       34,196       34,196        34,196       34,196      34,196
  Non-experimental Teachers in
  Non-experimental Schools                        0.233***     0.042***     -0.021***    0.000         -0.004       -0.003      -0.032
                                                  (0.044)      (0.009)      (0.006)      (0.010)       (0.007)      (0.005)     (0.020)
    N                                             258,885      359,368      359,368      359,368       359,368      359,368     359,368

Note: Each baseline characteristic listed in the columns was used as a dependent variable, regressing the baseline characteristic on the non-experimental estimate of a student's 2003-04 or 2004-05 teacher effect. The coefficients were estimated in separate regressions. Robust standard errors are reported in parentheses. All specifications include school-by-grade and grade-by-year fixed effects.

Page 31:

Why would current gains be related to prior teacher assignments?

- We find the teacher effect fading out. Let $VA_t$ = value added of the teacher in year $t$ and $a_k$ = the share of that effect remaining after $k$ years. Then cumulative achievement is $A_t = VA_t + a_1 VA_{t-1} + a_2 VA_{t-2} + \ldots$
- This implies that gains include a share of the prior teacher's effect: $(A_t - A_{t-1}) = VA_t + (a_1 - 1)VA_{t-1} + (a_2 - a_1)VA_{t-2} + \ldots$
- Our estimate of $a_1 \approx 0.5$ implies that the variance of the prior-teacher term would be roughly 25% of the variance of the current teacher effect (see the calculation below), and that the prior teacher effect would enter with a negative sign.
- Does fade-out mean the non-structural approach would be biased? Do we need to estimate the full human capital production function? It depends partially on the correlation among $VA_{jt}, VA_{j,t-1}, \ldots$
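
The 25% figure is just the variance of the prior-teacher term, assuming value-added draws of equal variance across years:

$$\operatorname{Var}\big[(a_1 - 1)\,VA_{t-1}\big] = (1 - a_1)^2\operatorname{Var}(VA) \approx (1 - 0.5)^2\operatorname{Var}(VA) = 0.25\,\operatorname{Var}(VA).$$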

Page 32:

Why would current gains be related to future teacher assignments?

- Students are assigned to future teachers based on current performance (e.g., tracking, student sorting).
- This is why the unadjusted mean end-of-year score was a biased measure of teacher effects. (If differences in baseline scores were just random noise, mean student scores from the non-experimental period would have been a noisy but unbiased estimator.)
- In a value-added regression, this generates a relationship between future teacher assignment (in t+1) and the current end-of-year score (in t); that is, future teacher assignments are endogenous to current-year gains.
- We would therefore expect future teacher assignments to be related to current gains, as Rothstein (2007) reports.

Page 33:

Table 10: Effects of Current and Prior Year Teachers Using Non-Experimental Sample (Student-level Regressions)
Dependent variable: 2004-05 test score

Non-experimental Teacher Effect Regressor                      (1)          (2)          (3)
Math Levels with Student/Peer Controls - 2004-05 Teacher       0.843***     0.852***     0.849***
                                                               (0.033)      (0.033)      (0.033)
Math Levels with Student/Peer Controls - 2003-04 Teacher                    -0.488***    -0.488***
                                                                            (0.020)      (0.020)
Math Levels with No Controls - 2005-06 Teacher,
  using baseline score as dependent variable                                             0.068**
                                                                                         (0.025)
Student-level Controls                                         Yes          Yes          Yes
N                                                              40,672       40,672       40,672

Note: The 2003-04, 2004-05, and 2005-06 teacher effect regressors were estimated using data from 1999-2000 through 2002-03, excluding schools that participated in the experiment. Above we report the coefficients on the estimated teacher effects in predicting a student's 2004-05 score. All specifications include school-by-grade and grade-by-year fixed effects, and student-level controls.

Page 34:

What is the variance in teacher effects on student achievement?

Non-Experimental Studies:
- Armour (1971), Hanushek (1976), McCaffrey et al. (2004), Murnane and Phillips (1981), Rockoff (2004), Hanushek, Rivkin and Kain (2005), Jacob and Lefgren (2005), Aaronson, Barrow and Sander (2007), Kane, Rockoff and Staiger (2006), Gordon, Kane and Staiger (2006)
- Standard deviation in teacher effects estimated at .10 to .25 student-level standard deviations.

Experimental Study (TN Class-Size Experiment): Nye, Konstantopoulos and Hedges (2004)
- Teachers and students were randomly assigned to classes of various sizes in grades K through 3.
- Looked at the teacher effect, net of class-size category effects and school effects.
- Standard deviation in teacher effects estimated at .08 to .11 student-level standard deviations; even higher (.10 to .18) in low-SES schools.

Page 35:

Table 3: Non-experimental Specifications of Teacher Effects (cont.)
Standard deviation of each component (in student-level standard deviation units)

Specification Used for Non-experimental Teacher Effect     Teacher Effects   Teacher-by-Year Random Effect   Mean Sample Size per Teacher
English Language Arts Levels with...
  No Controls                                               0.458             0.220                           48.391
  Student/Peer Controls (incl. prior scores)                0.182             0.169                           42.730
  Student/Peer Controls (incl. prior scores) & School F.E.  0.173             0.168                           42.730
  Student Fixed Effects                                     0.082             0.041                           48.391
English Language Arts Gains with...
  No Controls                                               0.186             0.205                           44.366
  Student/Peer Controls                                     0.177             0.202                           44.366
  Student/Peer Controls & School F.E.                       0.170             0.202                           44.366

Note: The above estimates are based on the total variance in estimated teacher fixed effects using observations from the pre-experimental data (years 1999-2000 through 2002-03). See the text for discussion of the estimation of the decomposition into teacher-by-year random effects, student-level error, and "actual" teacher effects. The sample was limited to schools with teachers in the experimental sample. Any individual students who were in the experiment were dropped from the pre-experimental estimation, to avoid any spurious relationship due to regression to the mean, etc.

Page 36:

Interpretation of Coefficient on Lagged Student Performance

- We estimate several non-experimental specifications: $\beta_o = 0$ (no controls), $\beta_o = 1$ ("gains"), $\beta_o < 1$ ("quasi-gains"), and ask:
  - Which yields unbiased estimates of teacher effects ($\mu_j$)?
  - Which minimizes the mean squared error in predicting student outcomes?
- We place no structural interpretation on $\beta_o$.
- $\beta_o$ presumably reflects a number of different forces: (i) systematic selection of students to teachers, (ii) fade-out of prior educational inputs, and (iii) measurement error.
- These separate roles are difficult to identify, and the various biases they introduce may or may not be offsetting.

Here $\beta_o$ is the coefficient on lagged student performance in a specification of the form

$$A_{ijt} = \beta_o A_{ij,t-1} + X_{ijt}\gamma + \mu_j + \theta_{jt} + \varepsilon_{ijt}.$$