
Connecticut measuring and modeling growth


Page 1: Connecticut measuring and modeling growth

Presenter - John Cronin, Ph.D.

Contacting us: Rebecca Moore: 503-548-5129. E-mail: [email protected]

Visit our website: www.kingsburycenter.org

Using tests for high-stakes evaluation: what educators need to know in Connecticut

Page 2: Connecticut measuring and modeling growth

Connecticut requirements

• Components of the evaluation
  – Student growth (45%), including the state test, one non-standardized indicator, and (optional) one other standardized indicator.
    • Requires a beginning-of-year, mid-year, and end-of-year conference
  – Teacher practice and performance (40%)
    • First and second year teachers: 3 in-class observations
    • Developing or below standard: 3 in-class observations
    • Proficient or exemplary: 3 observations of practice, one in-class
  – Whole-school learning indicator or student feedback (5%)
  – Parent or peer feedback (10%)
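The four component weights above sum to 100%. As a rough illustration of how such weights could be combined, here is a minimal sketch. It assumes every component is scored on the same 1 to 4 scale and that the summative rating is a simple weighted average. Neither assumption comes from the slides, and the example scores are hypothetical.

```python
# Component weights as listed on the slide (percent of the evaluation).
WEIGHTS = {
    "student_growth": 45,
    "teacher_practice": 40,
    "whole_school_or_student_feedback": 5,
    "parent_or_peer_feedback": 10,
}


def weighted_rating(scores):
    """Weighted average of component scores.

    Assumption (not from the slides): each component is scored on the
    same numeric scale, here 1 (below standard) to 4 (exemplary).
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four components")
    total_weight = sum(WEIGHTS.values())  # 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


# Hypothetical teacher: strong observed practice, weak student growth.
example = {
    "student_growth": 2,
    "teacher_practice": 4,
    "whole_school_or_student_feedback": 3,
    "parent_or_peer_feedback": 3,
}
print(round(weighted_rating(example), 2))
```

Because student growth carries the largest weight, a low growth score pulls the combined figure down even when observed practice is rated at the top of the scale.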

Page 3: Connecticut measuring and modeling growth

Connecticut requirements

Requirements for goal setting

• Each teacher sets one to four goals with their principal. The goals should:
  • Take into account the academic track record and overall needs and strengths of the students the teacher is teaching that year/semester;
  • Address the most important purposes of a teacher’s assignment through self-reflection;
  • Be aligned with school, district and state student achievement objectives;
  • Take into account their students’ starting learning needs vis-à-vis relevant baseline data when available.
• Consideration of control factors tracked by the state-wide public school information system that may influence teacher performance ratings, including, but not limited to, student characteristics, student attendance and student mobility.

Page 4: Connecticut measuring and modeling growth

What changes for educators?

1. The proficiency standards get higher.
2. Teachers become accountable for all students.

Page 5: Connecticut measuring and modeling growth

Difficulty of ACT college readiness standards

Page 6: Connecticut measuring and modeling growth

Moving from Proficiency to Growth

All students count when accountability is measured through growth.
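A small worked example may make the contrast concrete. The scores, cut score, and growth target below are hypothetical, chosen only to show how a proficiency count and a growth count can respond to different students.

```python
# Hypothetical (fall score, spring score) pairs for five students.
students = [(180, 186), (198, 201), (199, 207), (215, 216), (230, 238)]

CUT_SCORE = 200     # hypothetical proficiency cut score
GROWTH_TARGET = 5   # hypothetical growth target, in scale-score points

# Proficiency only registers students who cross the cut score.
proficient_fall = sum(fall >= CUT_SCORE for fall, _ in students)
proficient_spring = sum(spring >= CUT_SCORE for _, spring in students)

# Growth registers every student, wherever they started.
met_growth = sum(spring - fall >= GROWTH_TARGET for fall, spring in students)

print(f"Proficient in fall:   {proficient_fall} of {len(students)}")
print(f"Proficient in spring: {proficient_spring} of {len(students)}")
print(f"Met growth target:    {met_growth} of {len(students)}")
```

In this made-up class, the student who moved from 180 to 186 met the growth target but never shows up in the proficiency count, while the student who moved from 215 to 216 counts as proficient despite gaining one point.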

Page 7: Connecticut measuring and modeling growth

One district’s change in 5th grade math performance relative to Kentucky cut scores (chart series: proficiency, college readiness)

Page 8: Connecticut measuring and modeling growth

Number of 5th grade students meeting math growth target in the same district

Page 9: Connecticut measuring and modeling growth

How does the process work?

Page 10: Connecticut measuring and modeling growth

How does the process work?

Page 11: Connecticut measuring and modeling growth

Connecticut requirements

• Criteria for student growth indicator
  – Fair to students
    • The indicator of academic growth and development is used in such a way as to provide students an opportunity to show that they have met or are making progress in meeting the learning objective. The use of the indicator of academic growth and development is as free as possible from bias and stereotype.
  – Fair to teachers
    • The use of an indicator of academic growth and development is fair when a teacher has the professional resources and opportunity to show that his/her students have made growth and when the indicator is appropriate to the teacher’s content, assignment and class composition.
  – Reliable
  – Valid
  – Useful
    • The indicator may be used to provide the teacher with meaningful feedback about student knowledge, skills, perspective and classroom experience that may be used to enhance student learning and provide opportunities for teacher professional growth and development.

Page 12: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Measurement design of the instrument

Many assessments are not designed to measure growth. Others do not measure growth equally well for all students.

Page 13: Connecticut measuring and modeling growth

Tests are not equally accurate for all students

Chart panels: California STAR, NWEA MAP

Page 14: Connecticut measuring and modeling growth

Tests are not equally accurate for all students

Grade 6 New York Mathematics

Page 15: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Measurement sensitivity

Assessments must align with the curriculum and should be instructionally sensitive.

Page 16: Connecticut measuring and modeling growth

College and career readiness assessments will not necessarily be instructionally sensitive

When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction.

…when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction.

A third case might arise in the discussion of ethical and moral dimensions of science, where maturity, rather than intelligence or curriculum exposure might be the most important factor. Here it may well be that the assessment is not particularly sensitive to instruction.

Black, P. and Wiliam, D. (2007) 'Large-scale assessment systems: Design principles drawn from international comparisons', Measurement: Interdisciplinary Research & Perspective, 5(1), 1–53.

Page 17: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Measurement sensitivity

Classroom tests, which are designed to measure mastery, may not measure improvement well.

Page 18: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Instructional alignment

Tests should align to the teacher’s instructional responsibilities.

Page 19: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Uncovered Subjects and Teachers

High quality tests may not be administered, or available, for many teachers and grades. Subjects like social studies may be particularly problematic.

Page 20: Connecticut measuring and modeling growth

Considerations for developing your own assessment and student learning objectives

• Developing valid instruments is very time-consuming and resource-intensive.

• The assessments developed must discriminate between effective and ineffective teachers.

• The assessments must be valid in other respects.
  – Aligned to curriculum
  – Unbiased items

• The assessments can’t be open to security violations or cheating.

Page 21: Connecticut measuring and modeling growth

How does the process work?

Page 22: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Control for statistical error

All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
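One common way to express "within the range of statistical error" is an interval around each estimate. The sketch below assumes a normal approximation, a scale on which 0 represents the average teacher, and hypothetical estimates and standard errors. It illustrates the idea only and is not the procedure of any particular value-added model.

```python
def interval(estimate, std_error, z=1.96):
    """Approximate 95% interval, assuming normally distributed error."""
    return (estimate - z * std_error, estimate + z * std_error)


def distinguishable_from_average(estimate, std_error, z=1.96):
    """True only if the whole interval lies on one side of 0 (the average)."""
    low, high = interval(estimate, std_error, z)
    return low > 0 or high < 0


# Hypothetical teachers: (value-added estimate, standard error).
teachers = {
    "Teacher A": (0.10, 0.08),
    "Teacher B": (0.30, 0.10),
    "Teacher C": (-0.05, 0.09),
}

for name, (estimate, std_error) in teachers.items():
    low, high = interval(estimate, std_error)
    verdict = distinguishable_from_average(estimate, std_error)
    print(f"{name}: {estimate:+.2f} [{low:+.2f}, {high:+.2f}] distinguishable={verdict}")
```

Teachers A and C have intervals that span zero, so on this evidence alone neither can be separated from an average teacher even though their point estimates differ.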

Page 23: Connecticut measuring and modeling growth

Sources of error in assessment

• The students.
• The testing conditions.
• The assessments.

Measurement error in the assessments can be dwarfed by error introduced by the testing conditions and the students.

Page 25: Connecticut measuring and modeling growth

Range of teacher value-added estimates

Page 26: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

“Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford.”

Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).

Page 27: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Instability of results

A variety of factors can cause value-added results to lack stability.

Results are more likely to be stable at the extremes. The use of multiple years of data is highly recommended.
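A rough sense of why multiple years help: if each year's estimate carries the same standard error and the yearly errors are independent (both simplifying assumptions, not claims from the slides), averaging n years shrinks the standard error by a factor of the square root of n. The 0.10 starting value is hypothetical.

```python
import math


def averaged_std_error(yearly_std_error, years):
    """Standard error of the mean of `years` independent yearly estimates.

    Assumes equal standard errors and independent errors across years.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    return yearly_std_error / math.sqrt(years)


for years in (1, 2, 3, 4):
    print(years, round(averaged_std_error(0.10, years), 3))
```

Under these assumptions it takes four years of data to halve the error of a single-year estimate, so the gain from each added year gets smaller.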

Page 29: Connecticut measuring and modeling growth

“Significant evidence of bias plagued the value-added model estimated for the Los Angeles Times in 2010, including significant patterns of racial disparities in teacher ratings both by the race of the student served and by the race of the teachers (see Green, Baker and Oluwole, 2012). These model biases raise the possibility that Title VII disparate impact claims might also be filed by teachers dismissed on the basis of their value-added estimates. 

Additional analyses of the data, including richer models using additional variables mitigated substantial portions of the bias in the LA Times models (Briggs & Domingue, 2010).”

Baker, B. (2012, April 28). If it’s not valid, reliability doesn’t matter so much! More on VAM-ing & SGP-ing Teacher Dismissal.

Possible racial bias in models

Page 30: Connecticut measuring and modeling growth

“The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.” 

Ballou, D., Mokher, C. and Cavalluzzo, L. (2012) Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes.

Instability at the tails of the distribution

Chart panels: LA Times Teacher #1, LA Times Teacher #2

Page 31: Connecticut measuring and modeling growth

Reliability of teacher value-added estimates

Teachers with growth scores in lowest and highest quintile over two years using NWEA’s Measures of Academic Progress

          Bottom quintile Y1&Y2   Top quintile Y1&Y2
Number    59/493                  63/493
Percent   12%                     13%

r = .64, r² = .41

Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010)
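The figures on this slide can be checked with a few lines of arithmetic. The two benchmark values at the end are not from the slide: they are the shares one would expect if year-to-year quintile membership were fully independent (20% of 20%) or perfectly stable (20%), added here only as reference points.

```python
TOTAL_TEACHERS = 493
bottom_both_years = 59
top_both_years = 63
r = 0.64

bottom_pct = 100 * bottom_both_years / TOTAL_TEACHERS
top_pct = 100 * top_both_years / TOTAL_TEACHERS
r_squared = r ** 2

# Reference points (assumptions, not slide data), as a share of all teachers:
independent_pct = 100 * 0.20 * 0.20   # quintile membership unrelated across years
stable_pct = 100 * 0.20               # same teachers in the quintile both years

print(f"Bottom quintile both years: {bottom_pct:.1f}%")
print(f"Top quintile both years:    {top_pct:.1f}%")
print(f"r squared:                  {r_squared:.2f}")
print(f"Reference range:            {independent_pct:.0f}% to {stable_pct:.0f}%")
```

The observed 12% and 13% sit between the two reference points, and an r of .64 means about 41% of the variance in one year's growth scores is shared with the other year's.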

Page 32: Connecticut measuring and modeling growth

How does the process work?

Page 33: Connecticut measuring and modeling growth

Challenges with goal setting

• Lack of a “racing form”. What have this teacher and these students done in the past?

• Lack of comparison groups. What have other teachers done in the past?

• What is the objective? Is the objective to meet a standard of performance or demonstrate improvement?

• Do you set safety goals or stretch goals?

Page 34: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Model Wars

There are a variety of models in the marketplace. These models may come to different conclusions about the effectiveness of a teacher or school. Differences in findings are more likely to happen at the extremes.

Page 35: Connecticut measuring and modeling growth

Issues in the use of growth and value-added measures

Lack of random assignment

The use of a value-added model assumes that the school doesn’t add a source of variation that isn’t controlled for in the model.

For example, young teachers may be assigned disproportionate numbers of students with poor discipline records.

Page 36: Connecticut measuring and modeling growth

How does the process work?

Page 37: Connecticut measuring and modeling growth

New York Rating System

• 60 points assigned from classroom observation
• 20 points assigned from state assessment
• 20 points assigned from local assessment
• A score of 64 or less is rated ineffective.
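The point allocation above has a consequence worth working through. The sketch below assumes the three components are simply added to form the composite score. The function names and example teachers are hypothetical, and only the "ineffective" threshold is taken from the slide.

```python
# Maximum points per component, as listed on the slide.
MAX_POINTS = {
    "observation": 60,
    "state_assessment": 20,
    "local_assessment": 20,
}
INEFFECTIVE_MAX = 64  # a composite of 64 or less is rated ineffective


def composite(points):
    """Sum the component points after checking each is within its range."""
    for name, maximum in MAX_POINTS.items():
        if not 0 <= points[name] <= maximum:
            raise ValueError(f"{name} must be between 0 and {maximum}")
    return sum(points[name] for name in MAX_POINTS)


def is_ineffective(points):
    return composite(points) <= INEFFECTIVE_MAX


# Hypothetical teacher with a perfect observation score but low test points.
perfect_observation = {"observation": 60, "state_assessment": 2, "local_assessment": 2}
print(composite(perfect_observation), is_ineffective(perfect_observation))

# One more point on either assessment moves the composite above the threshold.
one_more_point = {"observation": 60, "state_assessment": 3, "local_assessment": 2}
print(composite(one_more_point), is_ineffective(one_more_point))
```

Under this additive reading, a teacher with the maximum 60 observation points still needs at least 5 of the 40 assessment points to avoid an ineffective rating.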

Page 38: Connecticut measuring and modeling growth
Page 39: Connecticut measuring and modeling growth

Connecticut requirements

Page 40: Connecticut measuring and modeling growth

Other issues

Security and Cheating

When measuring growth, one teacher who cheats disadvantages the next teacher.

Page 41: Connecticut measuring and modeling growth

Other issues

(1) Each district shall define effectiveness and ineffectiveness utilizing a pattern of summative ratings derived from the new evaluation system.

(2) At the request of a district or employee, the State Department of Education or a third-party entity approved by the SDE will audit the evaluation components that are combined to determine an individual's summative rating in the event that such components are significantly dissimilar (i.e., include both exemplary and below standard ratings) to determine a final summative rating.

(3) The State Department of Education or a third party designated by the SDE will audit evaluation ratings of exemplary and below standard to validate such exemplary or below standard ratings by selecting ten districts at random annually.
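The two audit triggers in items (2) and (3) can be sketched as simple checks. The function names, rating labels as strings, and district names are hypothetical, and the real audit process is defined by the state rather than by this code.

```python
import random


def significantly_dissimilar(component_ratings):
    """Item (2): components include both exemplary and below standard ratings."""
    labels = {rating.strip().lower() for rating in component_ratings}
    return "exemplary" in labels and "below standard" in labels


def pick_audit_districts(districts, count=10, seed=None):
    """Item (3): choose `count` districts at random, without repeats."""
    rng = random.Random(seed)
    return rng.sample(list(districts), count)


# Hypothetical component ratings for one teacher.
ratings = ["Exemplary", "Proficient", "Below standard", "Proficient"]
print(significantly_dissimilar(ratings))

# Hypothetical district list, for illustration only.
districts = [f"District {number}" for number in range(1, 31)]
print(pick_audit_districts(districts, seed=2012))
```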

Page 42: Connecticut measuring and modeling growth

Other issues

Security and Cheating

When measuring growth, one teacher who cheats disadvantages the next teacher.

Page 43: Connecticut measuring and modeling growth

Cheating

• Atlanta Public Schools
• Crescendo Charter Schools
• Philadelphia Public Schools
• Washington DC Public Schools
• Houston Independent School District
• Michigan Public Schools

Page 44: Connecticut measuring and modeling growth

Case Study #1 - Mean value-added performance in mathematics by school – fall to spring

Page 45: Connecticut measuring and modeling growth

Case Study #1 - Mean spring and fall test duration in minutes by school

Page 46: Connecticut measuring and modeling growth

Case Study #1 - Mean value-added growth by school and test duration

Page 47: Connecticut measuring and modeling growth

Case Study #2

Chart panels: Differences in fall-spring test durations; Differences in growth index score based on fall-spring test durations

Page 48: Connecticut measuring and modeling growth

Case Study #2

Chart panels: Differences in spring-fall test durations; Differences in raw growth by spring-fall test duration

How much of summer loss is really summer loss?

Page 49: Connecticut measuring and modeling growth

Case Study #2

Differences in fall-spring test duration (yellow-black) and differences in growth index scores (green) by school

Page 50: Connecticut measuring and modeling growth

Security considerations

• Teachers should not be allowed to view the contents of the item bank or record items.

• Districts should have policies for accommodation that are based on student IEPs.

• Districts should consider having both the teacher and a proctor in the test room.

• Districts should consider whether other security measures are needed for the protection of both teachers and administrators.

Page 51: Connecticut measuring and modeling growth

Other issues

Proctoring

Proctoring both with and without the classroom teacher raises possible problems.

Documentation that test administration procedures were properly followed is important.

Page 52: Connecticut measuring and modeling growth

Potential Litigation Issues

The use of value-added data for high-stakes personnel decisions does not yet have a strong, coherent body of case law.

Until a body of case law is established, expect litigation if value-added results are the linchpin evidence in a teacher-dismissal case.

Page 53: Connecticut measuring and modeling growth

Possible legal issues

• Title VII of the Civil Rights Act of 1964 – Disparate impact of sanctions on a protected group.

• State statutes that provide tenure and other related protections to teachers.

• Challenges to a finding of “incompetence” stemming from the growth or value-added data.

Page 54: Connecticut measuring and modeling growth

Recommendations

• Embrace the formative advantages of growth measurement as well as the summative.

• Create comprehensive evaluation systems with multiple measures of teacher effectiveness (Rand, 2010).

• Select measures as carefully as value-added models.

• Use multiple years of student achievement data.

• Understand the issues and the tradeoffs.

Page 55: Connecticut measuring and modeling growth

Presenter - John Cronin, Ph.D.

Contacting us: NWEA Main Number: 503-624-1951. E-mail: [email protected]

The presentation and recommended resources are available at our website: www.kingsburycenter.org

Thank you for attending