It’s a myth: High stakes cause test score inflation Richard P. Phelps researchED 2017 National Conference 7 October, 2017 Brooklyn, NY



Educational testing in the US: early 1980s

researchED, Brooklyn High stakes & test score inflation 7 October, 2017

Student testing with stakes reintroduced late 1970s, early 1980s

Debra P. v. Turlington

“Truth in testing” laws


Educational testing in the US: 1980s

Residency in rural, poor Appalachia, 1980s

Surprised by claims that state and school district scored “above average” on national tests

Investigated: all US states claimed to be “above average”

John J. Cannell, M.D.


“Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” - Garrison Keillor, A Prairie Home Companion


Cannell’s suspects

• Lax security

• Outdated or invalid norms

• Deliberate educator manipulation (i.e., cheating)
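The “outdated norms” suspect is largely a matter of arithmetic. The sketch below is illustrative only: it assumes scores are normally distributed, and the mean, standard deviation, and drift values are hypothetical numbers, not figures from Cannell's reports. It shows how, once the tested population has drifted above the level of the original norming sample, well over half of test takers can score “above average” against the old norm.

```python
from statistics import NormalDist


def share_above_old_average(old_mean, current_mean, sd):
    """Fraction of today's test takers scoring above the old norm-group mean.

    Assumes (for illustration) that current scores are normally distributed.
    """
    current = NormalDist(mu=current_mean, sigma=sd)
    return 1.0 - current.cdf(old_mean)


# Hypothetical values chosen only to illustrate the effect.
old_norm_mean = 50.0  # average of the sample used when the test was normed
score_sd = 10.0       # spread of scores, assumed unchanged since norming

for drift in (0.0, 2.0, 5.0):
    share = share_above_old_average(old_norm_mean, old_norm_mean + drift, score_sd)
    print(f"drift of {drift} points: {share:.0%} score above the old average")
```

With no drift, exactly half of test takers are above the old average; any upward drift since norming pushes that share past one half, even though nothing about the test's stakes has changed.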


US Education Establishment Responds


“While supporting Cannell’s general finding … our analyses lead us to conclusions that are different, and certainly less sensational, than the ones he reached.”

— Linn, Graue, Sanders, CRESST, 1990


“There are many reasons for the Lake Wobegon Effect, most of which are less sinister than those emphasized by Cannell.”

— Linn, CRESST, 2000

CRESST’s Lake Wobegon suspects

Outdated or invalid norms

High stakes that induce “teaching to the test” (i.e., test coaching) under pressure


“We know that tests that are used for accountability tend to be taught to in ways that produce inflated scores.”

— Daniel Koretz, CRESST, 1992

“Corruption of indicators is a continuing problem where tests are used for accountability or other high-stakes purposes.”

— Robert Linn, CRESST, 2000


CRESST counters Cannell’s Lake Wobegon study with their own, 1991

Students took a test for a few years. Scores rose. Then they took a “competing test” the district had used before. Scores fell.


CRESST 1991 “Generalization” Study

Unnamed school district

Unnamed tests

Neither replicable nor falsifiable

A conference presentation; not peer-reviewed.


CRESST 1991 “Generalization” Study

3 tests in the study

1. Annual NRT

2. Parallel form

3. A “competing” NRT


1991 CRESST “Generalization” Study

School district test was only “perceived to be high stakes.”


1991 CRESST “Generalization” Study

Study’s assumptions

1. Publication of aggregate results = “high stakes”

2. “Competing” NRTs should get same results

3. “Test coaching” improves scores

4. Low-stakes test scores are reliable and can be used to benchmark unreliable high stakes scores

5. High-stakes cause test-score inflation?

Jim Popham “high stakes” definition 1987

“... Such tests include the many statewide achievement tests whose results are reported by local newspapers on a school-by-school or district-by-district basis.”


1. Publication of aggregate results = high stakes?

Jim Popham “high stakes” definition 1992

A test “subject to legal scrutiny.”

Tests such as those used “for employment, licensure, or a high school graduation requirement”


1. Publication of aggregate results = high stakes?

“High-stakes test. A test used to provide results that have important, direct consequences for examinees, programs, or institutions involved in the testing.” (p.176)

“Low-stakes test. A test used to provide results that have only minor or indirect consequences for examinees, programs, or institutions involved in the testing.” (p.178)

Standards for Educational and Psychological Testing


1. Publication of aggregate results = high stakes?


“...tests taken to obtain admission to an educational program or taken during and at the conclusion of a program to obtain a qualification.”

“…high-stakes decisions, such as whether a student will move on to the next grade level or receive a diploma.”

1. Publication of aggregate results = high stakes?

A high-stakes test is a test with important consequences for the test taker. Passing has important benefits, such as a high school diploma, a scholarship, or a license to practice a profession.

Wikipedia


1. Publication of aggregate results = high stakes?

2. Research: Comparability of different tests

Scores comparable: ?

Scores not comparable:

NRTs: Freeman, Kuhs, Porter, Floden, Schmidt, Schwille (1983); Debra P. v. Turlington (1984); Cohen, Spillane (1993); La Marca, Redfield, Winter, Bailey, and Despriet (2000); Wainer (2011)

Standards: Archbald (1994); Buckendahl, Plake, Impara, Irwin (2000); Bhola, Impara, Buckendahl (2003); Phelps (2005)

CRTs: Massell, Kirst, Hoppe (1997); Wiley, Hembry, Buckendahl, Forte, Towles, Nebelsick-Gullett (2015)


3. Research: Effects of test coaching

It works: significant score increase from learning format tricks

Alderman & Powers (1980); Samson (1985); Scruggs (1985); Roznowski & Bassett (1992); McMann (1994); Holmes, Keffer (1995); Camel & Chung (2002); Filizola (2008)


4. Research: Low-stakes test reliability

Reliable (“no incentive to manipulate scores”):

Kiplinger, Linn (1992); O’Neil, Sugrue, Baker (1995); *Hout, Elliott (2011)

* 1 of 2 groups

Not reliable (student effort varies; scores easy to manipulate):

Rothe (1947); Jennings (1953); Uguroglu, Walberg (1979); Taylor & White (1981); Arvey, et al. (1990); Schmit, Ryan (1992); Brown & Walberg (1993); Kim, McLean (1995); Wolf, Smith (1995); Wolf, Smith, DiPaulo (1996); Schiel (1996); Sundre (1999); Sundre, Moore (2002); Sundre, Wise (2003); DeMars (2000); Wise (2006a, 2006b); Wise, DeMars (2005, 2005, 2006, 2010); Wise, et al. (2009); Hoyt (2001); Eklof (2006, 2007, 2010); … etc.


“…for consequential exams, the average score on the motivation scale was quite high with a low standard deviation. Essentially, most of the students were displaying uniformly high levels of motivation (i.e., ceiling effect).

However, for the nonconsequential groups, motivation played an important role in predicting test performance. The overall motivation scores for the no consequence groups were lower than the motivation for the consequential groups, with much greater variability.”

—Cole, Bergin, Whittaker (2008), p. 612

4. Research: Low-stakes test reliability

5. High stakes cause test score inflation?


Then, why no score inflation with certification and licensure tests?

More left-out-variable bias

CRESST’s Linn (2000) cites higher gains on a federal anti-poverty program’s pre-post testing over 9 months than over 12 as evidence of inflation


Cannell found score inflation in elementary school tests in dozens of states – none of those tests had high stakes.

Cannell also found score inflation in secondary school tests in dozens of states – only one had high stakes.


Test Score Inflation Occurs where Security is Lax

Cannell’s test categorizations confirmed


Confusions from misinformation

1. Tests sample from larger domains

2. Campbell’s Law

3. “Teaching to the test” & “Narrowing the curriculum”

4. Incentives and causes

5. Educators face many incentives; “high stakes” only one

6. Today’s tests have much higher stakes than past tests

7. No one wants to be responsible for test security


1. Tests only sample larger domains

"Tests are about making a measurement, and generally, tests are trying to measure something huge." — Daniel Koretz


TRUE of many tests, e.g., NRTs, aptitude, IQ tests

NOT TRUE of well-done standards-based tests

2. Campbell’s Law — a truism


"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

Social indicators can be beneficial:

- for understanding

- monitoring progress

- benchmarking

- setting goals

- process improvements

3. Teaching to the test; narrowing the curriculum


4. Incentives and causes


Question:

Do high stakes present an incentive to cheat on tests?

Answer:

Of course they do

5. Educators face many incentives


The incentive of test “stakes” is just one

6. Today’s tests have higher stakes


Exactly the opposite is true.

Koretz: Stakes in the 1980s and 1990s were “chicken feed” compared to today’s tests.

7. No one inside education wishes to be responsible for test security


… including test development firms.

Large-scale test, tight security


Large-scale test, lax security


Harms of disinformation

1. Acceptance of low standard for research as valid

2. Unfairly discredits useful evaluation tool

3. Test security (in U.S.) remains shoddy

4. Teachers given mixed messages

5. Corruption of test standards barely averted

6. Now spreading worldwide


1. Acceptance of very low quality standard for popular research results


CRESST studies:

- no controls

- secret test

- secret location

- secret definitions

Non-replicable, Non-falsifiable

2. Uniquely useful evaluation tool is discredited


…and, in the US, the only objective measure available to the public (i.e., not under the control of insiders).

3. Test security (in U.S.) remains shoddy


ACT, SAT, PARCC, SBAC now administered statewide by schools, on varying dates. Test firms save money and hassle, and gain customers, by outsourcing (or ignoring) test security.

4. Teachers given mixed messages


“Teaching to the test” is unethical; Don’t do it! Teach content beyond the standards.

“Teaching to the test works! You and your students will be better off if you do it!”

5. Standards corruption barely averted


6. Disinformation spreading worldwide


• Motive alone is not sufficient if test security is tight.

• Means and opportunity exist only in the absence of security measures and form and item rotation.

Artificial test score gains (score inflation) are caused by lax security; they require means and opportunity.


Test Security in South Carolina:

“Unlike their other two tests, … teachers are allowed to look at test booklets, … teachers may obtain test booklets before the day of testing, … booklets are not sealed, and … testing is not routinely monitored by state officials. … Outside test proctors are not used, … test questions have not been rotated every year, and … answer sheets have not been scanned for suspicious erasures or analyzed for cluster variance. … There are no state regulations that govern test security and test administration for norm-referenced testing done independently in the local school districts.”


Cannell’s score-inflated test
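The quoted passage mentions scanning answer sheets for suspicious erasures. As a rough illustration of what such a scan might look like, the sketch below flags classrooms whose average count of wrong-to-right erasures sits far above that of other classrooms. Everything here is an assumption made for illustration: the function name, the z-score cutoff, and the made-up erasure counts. It is not the procedure used by South Carolina or by any testing firm.

```python
from statistics import mean, pstdev


def flag_classrooms(erasures_by_class, z_cutoff=2.0):
    """Return ids of classrooms whose mean erasure count is unusually high.

    erasures_by_class maps a classroom id to a list of per-student counts of
    wrong-to-right erasures. The cutoff is an arbitrary illustrative choice.
    """
    class_means = {c: mean(counts) for c, counts in erasures_by_class.items()}
    overall = mean(class_means.values())
    spread = pstdev(class_means.values())
    if spread == 0:
        return []  # every classroom looks the same; nothing stands out
    return sorted(
        c for c, m in class_means.items() if (m - overall) / spread > z_cutoff
    )


# Made-up counts: five ordinary classrooms and one with many erasures.
example = {
    "A": [1, 0, 2, 1],
    "B": [0, 1, 1, 2],
    "C": [2, 1, 0, 1],
    "D": [1, 1, 1, 1],
    "E": [0, 2, 1, 1],
    "F": [9, 8, 10, 9],
}
print(flag_classrooms(example))
```

A flag of this kind is only a prompt for closer review, not proof of tampering; real erasure analyses would also need larger samples and a defensible baseline.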

Test Security in South Carolina:

“South Carolina also administers a graduation exam and a criterion referenced test, both of which have significant security measures. … Teachers are not allowed to look at either of these two test booklets, … teachers may not obtain booklets before the day of testing, … the graduation test booklets are sealed, … testing is routinely monitored by state officials, … special education students are generally included in all tests, … outside test proctors administer the graduation exam, and … most test questions are rotated every year on the criterion referenced test.”


Tests not in Cannell’s study

Lessons Learned

If terms can be defined arbitrarily, and not specified, any research result is possible.

Cleverly disguised falsehoods and obfuscation can be well rewarded in US education schools (e.g., with endowed professorships at Harvard and Stanford).


US education: Research quality standards extremely low for popular results; impossibly high for unpopular results

http://nonpartisaneducation.org/Review/Articles/v6n3.htm

