Teacher evaluation present

Using Tests and Value-Added Modeling for Teacher and Administrator Evaluation: A Consumer’s Guide

John Cronin, Ph.D. – Senior Director of Education Research, Northwest Evaluation Association

What NWEA supports

• The evaluation process should focus on helping teachers improve.

• The principal or designated evaluator should control the evaluation.

• Tests should inform principal decision-making and not be the deciding factor in an evaluation.

• Multiple measures should be used.

Distinguishing teacher effectiveness from teacher evaluation

• Teacher effectiveness – The judgment of a teacher’s ability to positively impact learning in the classroom.

• Teacher evaluation – The judgment of a teacher’s overall performance including:

– Teacher effectiveness

– Common standards of job performance

– Participation in the school community

– Adherence to professional standards

A simple framework for teacher evaluation

Effective teaching and professional job performance

• Evidence of professional practice – The evaluation of teaching by classroom observation and use of artifacts.

• Evidence of professional responsibilities – The evaluation of the teacher’s effectiveness in making progress toward their goals and fulfilling the responsibilities of a professional educator.

• Evidence of student learning – The evaluation of a teacher’s contribution to student learning and growth.



A simple framework for teacher evaluation – Texas style

Domains 1 – 4:

• Instructional planning and delivery

• Knowledge of students and learning

• Content knowledge and expertise

• Learning environment

Domain 6: Professional practices and responsibilities

Domain 5: Data-driven practice

20% Measure of student growth




Purposes of summative evaluation

• Make an accurate and defensible judgment of an educator’s job performance.

• Provide ratings of performance that provide meaningful differentiation across educators.

• Help educators focus on their students and their practice.

• Retain your top educators.

• Dismiss ineffective educators.

The greatest tragedy of this century in education, so far, was the number of young, talented teachers who lost their positions in the last recession.

Employment of Elementary Teachers, 2007-2012

Year  Number of teachers
2007  1,538,000
2008  1,544,270
2009  1,544,300
2010  1,485,600
2011  1,415,000
2012  1,360,380

Source: Bureau of Labor Statistics (2012, May). Occupational Employment Statistics. Numbers exclude special education and kindergarten teachers.

The elementary school teacher workforce shrank by 178,000 teachers (11%) between May 2007 and May 2012.
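The size of the decline can be checked directly from the charted values. The sketch below assumes the six numbers on the chart correspond to 2007 through 2012 in left-to-right order; that pairing is an assumption, although it agrees with the rounded figures quoted above.

```python
# Elementary teacher employment by year, as read from the chart above.
# Assumption: the six charted values map to 2007..2012 in order.
employment = {
    2007: 1_538_000,
    2008: 1_544_270,
    2009: 1_544_300,
    2010: 1_485_600,
    2011: 1_415_000,
    2012: 1_360_380,
}


def change(start_year, end_year, series=employment):
    """Return (absolute change, percent change) between two years."""
    start, end = series[start_year], series[end_year]
    return end - start, 100.0 * (end - start) / start


lost, pct = change(2007, 2012)
print(f"2007-2012: {lost:+,} teachers ({pct:+.1f}%)")
```

Under that assumption the loss is 177,620 teachers, or about 11.5% of the 2007 workforce, which is in line with the rounded 178,000 and 11% on the slide.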

The impact of seniority-based layoffs on school quality

Source: Boyd, D., Lankford, H., Loeb, S., and Wyckoff, J. (2011). Center for Education Policy. Stanford University.

In a simulation study of a layoff of 5% of teachers using New York City data, reliance on seniority-based layoffs would:

• Result in 25% more teachers laid off.

• Lay off teachers who were .31 standard deviations more effective (using a value-added criterion) than those lost using an effectiveness criterion.

• Retain 84% of teachers with unsatisfactory ratings.

What teacher effectiveness implies

• Evidence of Learning – A claim that the improvement in learning (or lack of it) reflected on one or more tests is caused by the teacher.

• Evidence of good practice – A claim that the observers’ ratings or conclusions are reliable and associated with behaviors that cause improved learning in the classroom.

The evolving evaluation landscape – principal observation

Teacher observation as a part of teacher evaluation

Systematic observation of teacher performance is a central part of every state’s teacher evaluation plan.

If performance ratings aren’t consistent with teacher growth, the media and public will demand to know why.

“The (Race to the Top teacher evaluation) changes, already under way in some cities and states, are intended to provide meaningful feedback and, critically, to weed out weak performers. And here are some of the early results:

In Florida, 97 percent of teachers were deemed effective or highly effective in the most recent evaluations. In Tennessee, 98 percent of teachers were judged to be “at expectations.” In Michigan, 98 percent of teachers were rated effective or better.”

Source: New York Times (2013, March 30). Curious Grade for Teachers: Nearly all Pass. Retrieved from: http://www.nytimes.com/2013/03/31/education/curious-grade-for-teachers-nearly-all-pass.html?pagewanted=all&_r=0


Learn from the experience of others

Results of Georgia Teacher Evaluation Pilot

[Figure: evaluator ratings by category (Ineffective, Minimally Effective, Effective, Highly Effective); values shown: 1%, 2%, 75%, 23%.]

Teacher Evaluation Ratings in Eleven Florida Schools - 2013

Florida District  Highly Effective  Effective  Needs Improvement  Developing  Unsatisfactory  VA Score  Florida Ranking  Ranking
1                 44.4%             55.6%      0.0%               0.0%        0.0%            0.39      109              1
2                 25.0%             75.0%      0.0%               0.0%        0.0%            0.37      121              2
3                 90.9%             9.1%       0.0%               0.0%        0.0%            -0.14     2802             9
4                 60.7%             39.3%      0.0%               0.0%        0.0%            -0.14     2797             8
5                 81.2%             18.8%      0.0%               0.0%        0.0%            -0.16     2831             10
6                 37.3%             54.2%      1.7%               0.0%        6.8%            0.12      880              5
7                 81.3%             18.8%      0.0%               0.0%        0.0%            0.22      402              3
8                 41.7%             55.6%      1.4%               1.4%        0.0%            -0.34     3274             11
9                 52.2%             47.8%      0.0%               0.0%        0.0%            0.16      664              4
10                27.0%             66.2%      1.4%               0.0%        5.4%            0         1764             6
11                7.1%              72.6%      9.5%               10.7%       0.0%            -0.08     2445             7
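One way to ask whether evaluator ratings track growth in the table above is a rank correlation between the share of teachers rated Highly Effective and the VA score. The sketch below is an illustration only, not an analysis from the presentation: it uses the eleven rows as listed, implements Spearman's rank correlation by hand (Pearson correlation of average ranks), and the function names are ours.

```python
from math import sqrt

# Percent rated Highly Effective and VA score for rows 1-11 of the table above.
highly_effective = [44.4, 25.0, 90.9, 60.7, 81.2, 37.3, 81.3, 41.7, 52.2, 27.0, 7.1]
va_score = [0.39, 0.37, -0.14, -0.14, -0.16, 0.12, 0.22, -0.34, 0.16, 0.0, -0.08]


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rho = spearman(highly_effective, va_score)
print(f"Spearman rank correlation: {rho:.2f}")
```

For these eleven rows the correlation comes out weakly negative (roughly -0.2). With so few units this is far from conclusive, but it does show that higher shares of Highly Effective ratings do not line up with higher value-added scores here.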


The actual proportion of teachers for whom student growth can be measured through the state assessment.

25%

http://www.nwea.org/sites/www.nwea.org/files/resources/MakeAssessmentMatter_5-2014.pdf

Ineffective (Growth Measures) Developing (Growth Measures) Effective (Growth Measures) Highly Effective (Growth Measures)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Ineffective (Observational)

0 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

1 2 3 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

2 2 4 5 6 6 6 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9

3 2 5 6 7 7 8 8 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12

4 3 5 7 8 9 9 10 10 11 11 11 12 12 12 12 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 15 15 15 15 15 15 15 15 15 15 15

5 3 6 8 9 10 11 11 12 12 13 13 14 14 14 14 15 15 15 15 16 16 16 16 16 16 16 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18

6 3 6 8 10 11 12 13 13 14 14 15 15 16 16 16 17 17 17 17 18 18 18 18 18 19 19 19 19 19 19 19 20 20 20 20 20 20 20 20 20 21

7 3 7 9 11 12 13 14 15 15 16 16 17 17 18 18 18 19 19 19 20 20 20 20 20 21 21 21 21 21 22 22 22 22 22 22 22 23 23 23 23 23

8 3 7 10 11 13 14 15 16 17 17 18 18 19 19 20 20 20 21 21 21 22 22 22 23 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25 25 25

9 3 8 10 12 14 15 16 17 18 18 19 20 20 21 21 22 22 23 23 23 24 24 24 24 25 25 25 25 26 26 26 26 26 27 27 27 27 27 27 28 28

10 3 8 11 13 14 16 17 18 19 20 20 21 22 22 23 23 24 24 25 25 25 26 26 26 27 27 27 27 28 28 28 28 29 29 29 29 29 29 30 30 30

11 3 8 11 13 15 17 18 19 20 21 22 22 23 24 24 25 25 26 26 27 27 27 28 28 28 29 29 29 30 30 30 30 31 31 31 31 31 32 32 32 32

12 4 8 12 14 16 17 19 20 21 22 23 24 24 25 26 26 27 27 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 33 33 34 34 34 34

13 4 9 12 14 16 18 20 21 22 23 24 25 26 26 27 28 28 29 29 30 30 31 31 31 32 32 33 33 33 34 34 34 34 35 35 35 35 36 36 36 36

14 4 9 12 15 17 19 20 22 23 24 25 26 27 27 28 29 30 30 31 31 32 32 33 33 33 34 34 35 35 35 36 36 36 37 37 37 37 38 38 38 38

15 4 9 13 15 18 19 21 23 24 25 26 27 28 29 29 30 31 31 32 33 33 34 34 35 35 35 36 36 37 37 37 38 38 38 39 39 39 40 40 40 40

Developing (Observational)

16 4 9 13 16 18 20 22 23 25 26 27 28 29 30 31 31 32 33 33 34 35 35 36 36 37 37 37 38 38 39 39 39 40 40 40 41 41 41 42 42 42

17 4 9 13 16 19 21 23 24 25 27 28 29 30 31 32 33 33 34 35 35 36 37 37 38 38 39 39 39 40 40 41 41 42 42 42 43 43 43 44 44 44

18 4 10 14 17 19 21 23 25 26 28 29 30 31 32 33 34 35 35 36 37 37 38 38 39 40 40 41 41 41 42 42 43 43 44 44 44 45 45 45 46 46

19 4 10 14 17 20 22 24 26 27 28 30 31 32 33 34 35 36 36 37 38 39 39 40 40 41 42 42 43 43 43 44 44 45 45 46 46 46 47 47 47 48

20 4 10 14 17 20 22 24 26 28 29 31 32 33 34 35 36 37 38 38 39 40 41 41 42 42 43 43 44 45 45 45 46 46 47 47 48 48 48 49 49 49

21 4 10 14 18 21 23 25 27 29 30 31 33 34 35 36 37 38 39 40 40 41 42 42 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 50 51 51

22 4 10 15 18 21 23 26 27 29 31 32 34 35 36 37 38 39 40 41 42 42 43 44 44 45 46 46 47 47 48 48 49 49 50 50 51 51 52 52 52 53

23 4 10 15 18 21 24 26 28 30 31 33 34 36 37 38 39 40 41 42 43 43 44 45 46 46 47 48 48 49 49 50 50 51 51 52 52 53 53 54 54 54

24 4 11 15 19 22 24 27 29 31 32 34 35 36 38 39 40 41 42 43 44 45 45 46 47 48 48 49 50 50 51 51 52 52 53 53 54 54 55 55 56 56

25 4 11 15 19 22 25 27 29 31 33 34 36 37 39 40 41 42 43 44 45 46 47 47 48 49 50 50 51 52 52 53 53 54 54 55 55 56 56 57 57 58

26 4 11 16 19 23 25 28 30 32 34 35 37 38 39 41 42 43 44 45 46 47 48 49 49 50 51 51 52 53 53 54 55 55 56 56 57 57 58 58 59 59

27 4 11 16 20 23 26 28 30 32 34 36 37 39 40 42 43 44 45 46 47 48 49 50 50 51 52 53 53 54 55 55 56 57 57 58 58 59 59 60 60 61

28 4 11 16 20 23 26 29 31 33 35 37 38 40 41 42 44 45 46 47 48 49 50 51 52 52 53 54 55 55 56 57 57 58 59 59 60 60 61 61 62 62

29 4 11 16 20 24 26 29 31 34 35 37 39 40 42 43 45 46 47 48 49 50 51 52 53 54 54 55 56 57 57 58 59 59 60 61 61 62 62 63 63 64

30 4 11 16 20 24 27 30 32 34 36 38 40 41 43 44 45 47 48 49 50 51 52 53 54 55 56 56 57 58 59 59 60 61 61 62 62 63 64 64 65 65

Effective (Observational)

31 4 11 17 21 24 27 30 32 35 37 39 40 42 43 45 46 47 49 50 51 52 53 54 55 56 57 57 58 59 60 61 61 62 63 63 64 64 65 66 66 67

32 4 11 17 21 25 28 30 33 35 37 39 41 43 44 46 47 48 50 51 52 53 54 55 56 57 58 59 59 60 61 62 62 63 64 64 65 66 66 67 68 68

33 4 12 17 21 25 28 31 33 36 38 40 42 43 45 46 48 49 50 52 53 54 55 56 57 58 59 60 61 61 62 63 64 64 65 66 66 67 68 68 69 69

34 4 12 17 21 25 28 31 34 36 38 40 42 44 46 47 49 50 51 53 54 55 56 57 58 59 60 61 62 63 63 64 65 66 66 67 68 68 69 70 70 71

35 4 12 17 22 25 29 32 34 37 39 41 43 45 46 48 49 51 52 53 55 56 57 58 59 60 61 62 63 64 64 65 66 67 68 68 69 70 70 71 72 72

36 4 12 17 22 26 29 32 35 37 39 41 43 45 47 49 50 52 53 54 55 57 58 59 60 61 62 63 64 65 66 66 67 68 69 69 70 71 72 72 73 74

37 4 12 17 22 26 29 32 35 38 40 42 44 46 48 49 51 52 54 55 56 58 59 60 61 62 63 64 65 66 67 68 68 69 70 71 71 72 73 74 74 75

38 4 12 18 22 26 30 33 36 38 40 43 45 46 48 50 52 53 55 56 57 58 60 61 62 63 64 65 66 67 68 69 69 70 71 72 73 73 74 75 75 76

39 4 12 18 22 26 30 33 36 39 41 43 45 47 49 51 52 54 55 57 58 59 61 62 63 64 65 66 67 68 69 70 71 71 72 73 74 75 75 76 77 77

40 4 12 18 23 27 30 33 36 39 41 44 46 48 50 51 53 55 56 57 59 60 61 63 64 65 66 67 68 69 70 71 72 73 73 74 75 76 77 77 78 79

41 4 12 18 23 27 31 34 37 39 42 44 46 48 50 52 54 55 57 58 60 61 62 63 65 66 67 68 69 70 71 72 73 74 75 75 76 77 78 78 79 80

42 5 12 18 23 27 31 34 37 40 42 45 47 49 51 53 54 56 58 59 60 62 63 64 66 67 68 69 70 71 72 73 74 75 76 76 77 78 79 80 80 81

43 5 12 18 23 27 31 34 37 40 43 45 47 49 51 53 55 57 58 60 61 63 64 65 66 68 69 70 71 72 73 74 75 76 77 78 78 79 80 81 82 82

44 5 12 18 23 28 31 35 38 41 43 46 48 50 52 54 56 57 59 60 62 63 65 66 67 69 70 71 72 73 74 75 76 77 78 79 80 80 81 82 83 84

45 5 13 19 24 28 32 35 38 41 44 46 48 51 53 54 56 58 60 61 63 64 66 67 68 69 71 72 73 74 75 76 77 78 79 80 81 82 82 83 84 85

Highly Effective (Observational)

46 5 13 19 24 28 32 35 39 41 44 47 49 51 53 55 57 59 60 62 63 65 66 68 69 70 71 73 74 75 76 77 78 79 80 81 82 83 83 84 85 86

47 5 13 19 24 28 32 36 39 42 45 47 49 52 54 56 58 59 61 63 64 66 67 69 70 71 72 74 75 76 77 78 79 80 81 82 83 84 85 85 86 87

48 5 13 19 24 29 32 36 39 42 45 47 50 52 54 56 58 60 62 63 65 66 68 69 71 72 73 74 76 77 78 79 80 81 82 83 84 85 86 87 87 88

49 5 13 19 24 29 33 36 40 43 45 48 50 53 55 57 59 61 62 64 66 67 69 70 71 73 74 75 77 78 79 80 81 82 83 84 85 86 87 88 89 89

50 5 13 19 24 29 33 37 40 43 46 48 51 53 55 57 59 61 63 65 66 68 69 71 72 74 75 76 77 79 80 81 82 83 84 85 86 87 88 89 90 90

51 5 13 19 25 29 33 37 40 43 46 49 51 54 56 58 60 62 64 65 67 69 70 72 73 74 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 92

52 5 13 19 25 29 33 37 41 44 47 49 52 54 56 58 61 62 64 66 68 69 71 72 74 75 77 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93

53 5 13 19 25 30 34 37 41 44 47 50 52 55 57 59 61 63 65 67 68 70 72 73 75 76 77 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94

54 5 13 20 25 30 34 38 41 44 47 50 53 55 57 60 62 64 66 67 69 71 72 74 75 77 78 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95

55 5 13 20 25 30 34 38 41 45 48 50 53 56 58 60 62 64 66 68 70 71 73 75 76 78 79 80 82 83 84 85 87 88 89 90 91 92 93 94 95 96

56 5 13 20 25 30 34 38 42 45 48 51 54 56 58 61 63 65 67 69 70 72 74 75 77 78 80 81 82 84 85 86 87 89 90 91 92 93 94 95 96 97

57 5 13 20 25 30 35 38 42 45 48 51 54 56 59 61 63 65 67 69 71 73 74 76 78 79 81 82 83 85 86 87 88 90 91 92 93 94 95 96 97 98

58 5 13 20 26 30 35 39 42 46 49 52 54 57 59 62 64 66 68 70 72 73 75 77 78 80 81 83 84 85 87 88 89 90 92 93 94 95 96 97 98 99

59 5 13 20 26 31 35 39 43 46 49 52 55 57 60 62 64 66 68 70 72 74 76 77 79 81 82 83 85 86 88 89 90 91 92 94 95 96 97 98 99 100

60 5 13 20 26 31 35 39 43 46 49 52 55 58 60 63 65 67 69 71 73 75 76 78 80 81 83 84 86 87 88 90 91 92 93 95 96 97 98 99 100 101

The New York Evaluation Matrix

[Figure: Principal Rating compared with Value-added rating; axis ticks run from 0 to 100 and from 60 to 100.]

Why differentiating ratings is important

New York Teacher Ratings by Component (percent of teachers)

Rating            Value-Added  Local Assessment  Principal Observation
Ineffective       3.65%        1.43%             0.33%
Developing        7.22%        4.59%             1.94%
Effective         44%          38.71%            45.33%
Highly Effective  44.23%       55.27%            52.40%

New York Teacher Ratings (number of teachers)

Rating            Value-Added  Local Assessment  Principal Observation
Ineffective       4216         1347              306
Developing        8337         4334              1793
Effective         51660        36508             41953
Highly Effective  51080        52132             48503
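The percentages charted for each component can be recomputed from the counts of teachers. The sketch below does that arithmetic for the three components; small differences from the charted value-added percentages may reflect rounding or truncation on the original slide.

```python
# Teacher counts per rating category, in the order listed in RATINGS,
# taken from the New York Teacher Ratings counts above.
RATINGS = ["Ineffective", "Developing", "Effective", "Highly Effective"]
counts = {
    "Value-Added": [4216, 8337, 51660, 51080],
    "Local Assessment": [1347, 4334, 36508, 52132],
    "Principal Observation": [306, 1793, 41953, 48503],
}


def shares(component_counts):
    """Convert counts to percentages of that component's total, rounded to 2 places."""
    total = sum(component_counts)
    return [round(100.0 * c / total, 2) for c in component_counts]


percent = {name: shares(values) for name, values in counts.items()}

for name, values in percent.items():
    row = ", ".join(f"{rating}: {p:.2f}%" for rating, p in zip(RATINGS, values))
    print(f"{name} -> {row}")
```

Reading the result by component makes the differentiation point concrete: the share of teachers rated Ineffective is roughly ten times larger under value-added than under principal observation.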

Bill and Melinda Gates Foundation (2013, January). Ensuring Fair and Reliable Measures of Effective Teaching: Culminating Findings from the MET Project’s Three-Year Study.

Reliability of a variety of teacher observation implementations

The reliability coefficient is relative to state test value-added gain; the last column is the proportion of test variance explained.

Observation by                                                                            Reliability coefficient  Variance explained
Principal – 1                                                                             .51                      26.0%
Principal – 2                                                                             .58                      33.6%
Principal and other administrator                                                         .67                      44.9%
Principal and three short observations by peer observers                                  .67                      44.9%
Two principal observations and two peer observations                                      .66                      43.6%
Two principal observations and two different peer observers                               .69                      47.6%
Two principal observations, one peer observation, and three short observations by peers   .72                      51.8%
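The two numeric columns in the reliability table appear to be linked by a simple rule: the proportion of variance explained is the square of the coefficient. The sketch below checks that reading against the values shown; the relationship is inferred from the numbers, not stated in the source.

```python
# Reliability coefficients and reported variance explained (percent),
# in the order the observation designs are listed above.
reported = [
    (0.51, 26.0),
    (0.58, 33.6),
    (0.67, 44.9),
    (0.67, 44.9),
    (0.66, 43.6),
    (0.69, 47.6),
    (0.72, 51.8),
]


def variance_explained(r):
    """Percent of variance explained by a correlation-type coefficient r."""
    return 100.0 * r * r


for r, shown in reported:
    computed = variance_explained(r)
    print(f"r = {r:.2f}: r^2 = {computed:.1f}% (table shows {shown:.1f}%)")
```

Seen this way, even the most intensive design listed (coefficient .72) leaves nearly half of the variance unexplained.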

Non-cognitive factors

Jackson (2012) argues that teachers may have more impact on non-cognitive factors that are essential to student success like attendance, grades, and suspensions.

In education, value-added measurement has focused policy-makers on the teacher’s contribution to academic success, as reflected in test scores.

These are not the only measures that matter, however.

Employing value-added methodologies, Jackson found that teachers had a substantive effect on non-cognitive outcomes that was independent of their effect on test scores:

• Lowered the average student absenteeism by 7.4 days.

• Improved the probability that students would enroll in the next grade by 5 percentage points.

• Reduced the likelihood of suspension by 2.8%

• Improved the average GPA by .09 (Algebra) or .05 (English)

Source: Jackson, K. (2013). Non-Cognitive Ability, Test Scores and Teacher Quality: Evidence from 9th Grade Teachers in North Carolina. Northwestern University and NBER


The evolving evaluation landscape – testing and growth measurement

Two ways tests are used in evaluation and their claims

Value-added measures

• Produce rankings of teachers relative to each other based on assessment results.

• Introduce controls to account for factors that may influence growth that are outside the teacher’s influence.

• Advance a claim of causation – that the teacher’s ranking is based on learning caused.

• Can be applied to as few as 20% of the teachers in a school system (Whitehurst, 2013).

Whitehurst, G. J. (2013). Teacher value-added: Do we want a ten percent solution? The Brown Center Chalkboard, April 24. Washington, DC: Brookings Institution. Retrieved October 2, 2014, from www.brookings.edu/blogs/brown-center-chalkboard/posts/2013/04/24-merit-pay-whitehurst
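The idea of introducing controls and then ranking teachers can be illustrated with a deliberately small sketch. It is not any state's or vendor's model: it assumes a single control (each student's prior score), fits one ordinary least squares line across all students, and treats a teacher's value-added as the average amount by which their students beat or miss the prediction. All student data and teacher labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (teacher, prior-year score, current-year score).
students = [
    ("T1", 200, 212), ("T1", 210, 221), ("T1", 190, 204),
    ("T2", 205, 209), ("T2", 195, 200), ("T2", 215, 218),
    ("T3", 200, 208), ("T3", 220, 227), ("T3", 185, 195),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def value_added(rows):
    """Mean residual (actual minus predicted current score) per teacher."""
    slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
    residuals = defaultdict(list)
    for teacher, prior, current in rows:
        residuals[teacher].append(current - (slope * prior + intercept))
    return {t: sum(v) / len(v) for t, v in residuals.items()}


estimates = value_added(students)
ranking = sorted(estimates, key=estimates.get, reverse=True)
for teacher in ranking:
    print(f"{teacher}: {estimates[teacher]:+.2f}")
```

Operational models add many more controls and adjust for measurement error, which is one reason different models can rank the same teacher differently.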

Student Learning Objectives

• Are a contract negotiated between the principal and teacher around student results.

• Do not produce rankings that compare teacher results across settings.

• Do not introduce controls to account for factors that may influence growth that are outside the teacher’s influence.

• Do not advance a claim of causation – teacher competence is demonstrated by fulfillment of the contract.

Percent of students who say they do not receive their state accountability test results.

37%

Make Assessment Matter: Students and Educators Want Tests that Support Learning (2014). Portland, OR: NWEA and Grunwald Associates LLC.

http://www.nwea.org/sites/www.nwea.org/files/resources/MakeAssessmentMatter_5-2014.pdf

Issues in the use of growth and value-added measures

Differences among value-added models

Los Angeles Times Study

Los Angeles Times Study #2

http://projects.latimes.com/value-added/value-added-comparison#stacey-nelson

http://projects.latimes.com/value-added/value-added-comparison#arturo-gustavo-abarca

Issues in the use of value-added measures

Control for statistical error

All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
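What it means for a score to fall within the range of statistical error can be shown with a short sketch. It assumes each teacher has a value-added estimate and a standard error, and asks whether an interval of the estimate plus or minus z standard errors excludes zero (the average). The teachers, numbers, and the default z of 1.96 are illustrative assumptions, not values from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    teacher: str
    value_added: float  # estimated effect, 0 = average
    std_error: float    # standard error of that estimate


def classify(est, z=1.96):
    """Label an estimate by whether its interval (estimate +/- z * SE) excludes zero."""
    low = est.value_added - z * est.std_error
    high = est.value_added + z * est.std_error
    if low > 0:
        return "above average"
    if high < 0:
        return "below average"
    return "not distinguishable from average"


# Hypothetical teachers for illustration only.
estimates = [
    Estimate("A", 4.0, 1.5),
    Estimate("B", 1.2, 1.5),
    Estimate("C", -3.5, 1.4),
    Estimate("D", -0.8, 2.0),
]

for est in estimates:
    print(f"Teacher {est.teacher}: {est.value_added:+.1f} -> {classify(est)}")
```

In this toy example two of the four teachers cannot be separated from average, even though their point estimates differ in sign.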

What Makes Schools Work Study - Mathematics

[Figure: scatter plot of the value-added index by teacher, with Year 1 and Year 2 on the axes; both axes run from -10.0 to 15.0.]

Data used represent a portion of the teachers who participated in Vanderbilt University’s What Makes Schools Work Project, funded by the federal Institute of Education Sciences.

Issues in the use of value-added measures

The choice of value-added model

The choice of model has an important impact on teacher ratings.

Issues in the use of growth measures

The choice of test

Many assessments are not designed to measure growth. Others do not measure growth equally well for all students.

Tests are not equally accurate for all students

California STAR NWEA MAP

Range of teacher value-added estimates

[Figure: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Vertical axis: Average Growth Index Score and Range, from -12.00 to 12.00. Legend: Q5, Q4, Q3, Q2, Q1.]

Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.

Issues in the use of growth and value-added measures

“Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, S., Jennings, J., & Beveridge, A. (2010). Teacher Effectiveness on High and Low Stakes Tests. Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI.

Three ways tests are used in evaluation and their issues

Student Learning Objectives

• Do not provide evidence of teacher effectiveness.

• Teachers using SLOs may be evaluated against less rigorous criteria than teachers evaluated by value-added methods.

• Goals are not consistent in difficulty.

• Goals are not consistent across teachers.

New York Teacher Ratings (number of teachers)

Rating            Value-Added  Local Assessment  Principal Observation
Ineffective       4216         1347              306
Developing        8337         4334              1793
Effective         51660        36508             41953
Highly Effective  51080        52132             48503

Ultimately – the principal should decide

• Evaluation inherently involves judgment – not a bad thing.

• Evidence should inform and not direct their judgment.

• The implemented system should differentiate performance.

• Courts respect the judgment of school administrators relative to personnel decisions.

If evaluators do not differentiate their ratings, then all differentiation comes from the test.

“The (Race to the Top teacher evaluation) changes, already under way in some cities and states, are intended to provide meaningful feedback and, critically, to weed out weak performers. And here are some of the early results:

In Florida, 97 percent of teachers were deemed effective or highly effective in the most recent evaluations. In Tennessee, 98 percent of teachers were judged to be “at expectations.” In Michigan, 98 percent of teachers were rated effective or better.”

Source: New York Times (2013, March 30). Curious Grade for Teachers: Nearly all Pass. Retrieved from: http://www.nytimes.com/2013/03/31/education/curious-grade-for-teachers-nearly-all-pass.html?pagewanted=all&_r=0


The importance of non-cognitive factors in teacher evaluation

Solving one problem can sometimes create another.

Suggested reading

Baker, B., Oluwole, J., & Green, P. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the Race to the Top Era. Education Policy Analysis Archives, Vol. 21, No. 5.