A prospective approach to standard setting

Copyright © 2006 Educational Testing Service

Listening. Learning. Leading.

Isaac I. Bejar, Henry I. Braun, Rick Tannenbaum

Educational Testing Service

Presented at

ASSESSING AND MODELING COGNITIVE DEVELOPMENT IN SCHOOL: INTELLECTUAL GROWTH AND STANDARD SETTING

Maryland Assessment Research Center for Education Success, University of Maryland

October 19-20, 2006


Outline

• Present rationale for a prospective approach to the standard setting process in K-12 that is explicitly informed by learning and developmental considerations

• Review the evolution of validity over the last 60 years focusing on the implications for standard setting and assessment design

• Review conceptual developments in standard setting and argue that a prospective approach is a natural step in the evolution of the standard setting process

• Sketch the steps in a prospective standard setting

• Discuss remaining challenges


Why are performance standards important?

• Increasingly, academic performance is being communicated in terms of standards (e.g. 30% of students at or above proficient)

• Consequential decisions about students and/or schools are being made on the basis of results framed in terms of standards

• Policy-makers and the public make inferences about public schools based on their interpretations of the standards and standards-based reports


What are we making inferences about?

“Standard setting still can not be reduced to a problem of statistical estimation. Fundamentally, standard setting involves the development of a policy about what is to be required for each level of performance. This policy is stated in the performance standards and implemented through the cutscores.” (Kane, 2001, p. 85, emphasis added)


Some inferences of interest

• Inferences about individual students’ level of achievement at one point in time

• Inferences about individual students’ performance next year

• Population inferences about the proportion of students at different levels of achievement

• Inferences as to the progress of a school or district

[Figure: performance levels Basic, Proficient, Advanced]


Problems with current standard setting practice

• Historically, standard setting has been a retrospective judgmental process carried out

– independently of other factors that inform the design of the assessment,

– after the assessment is administered the first time.

• The consequences of a retrospective approach are

– Reliance on subject matter expertise rather than research on student learning and development

– Potential conflation of policy and psychometrics

– Difficulty in achieving coherence of cut scores across grades

• Risks

– Cut scores may not be well supported psychometrically

– Insufficient evidence to adequately support desired inferences



Key turning points in the evolution of test validity


Validity overview

• Validation as theory testing

– Cronbach and Meehl (1955): Gathering evidence for score interpretation follows scientific principles

“The investigation of a test's construct validity is not essentially different from the general scientific procedures for developing and confirming theories.”

• Items increasingly seen as validity-building blocks

– Fischer (1973): LLTM

– Embretson (1983): Construct representation


Validity overview (cont.)

• Validity is an ongoing argument that seeks to clarify what a measurement means and to understand the limitations of each score interpretation (adapted from Cronbach, 1988)

• Validity as consequence

– “Validity is an overall evaluative judgment, founded on empirical evidence and theoretical rationales, of the adequacy and appropriateness of inferences and actions based on test scores.” (Messick, 1989)


Validity overview (cont.)

• Validity as argument (Kane, 2004)

– Kane elaborates Cronbach’s “validation as argument” thesis through specification of

– Interpretive argument: Build a chain of reasoning from the test construction process to the desired claims.

– Validity argument: Amass theoretical and empirical support for the truthfulness of the claims and set appropriate boundaries.


Validity through design: ECD (e.g., Mislevy et al. 2003)

• Evidence Centered Design

– Make explicit the claim(s) you will want to make about scores at individual and aggregate levels

– Determine the student observables that would provide support for the claims we wish to make.

– Carefully design and write tasks that would elicit those observables.

– Assemble assessments targeted to support the desired claims as strongly as possible



Evolution of standard setting


Some history

• Through the 1980s, standard setting was mainly concerned with procedural issues, but signs of concern begin to emerge, e.g., Glass (1978), Shepard (1980)

• NAGB calls for the use of performance standards (see Lissitz and Bourke, 1995)

• Kane (1993) emphasizes the need to separate policy from procedure

• Performance level descriptors become more prominent (Hansche, 1998)

• The judgmental task imposed on standard setting panelists is strongly criticized (Pellegrino, Jones, and Mitchell, 1999)

• Response by Hambleton et al. (1999) does not address the basic criticism


Some history (cont.)

• Cizek (2001)

– Zieky (2001) on how standard setting has changed

– Kane (2001) on how standard setting has not changed and the importance of separating policy and method

– Camilli et al. (2001)

– “In the long run, standard setting will make its most valuable contribution to teaching and learning at all levels if procedures are developed that are more closely aligned with cognitive and developmental models of competence in content disciplines” (2001, p. 471, italics added).

• Validity oriented standard setting and the idea of “canonical response patterns”

– Haertel and Lorie (2004)

– Lorie (2001)

• On the importance of coherent standards (Ferrara, Johnson, Cheng, 2005; Lewis and Haug, 2005)



A prospective approach


Outline of an approach

• Standard-Setting for K-12

– Mastery of material at grade “n” is not an end in itself but a milestone in a student’s progression through school.

– Common-sense meanings of achieving proficiency in grade n are:

i. Student has met requirements for grade n

ii. All things being equal, the student has a high probability of achieving proficiency in grade n+1


Standard-Setting for K-12

• Ideally, (i) and (ii) should be consistent. To support forward-looking inferences, we should have:

– A developmental perspective in the creation of content frameworks and content standards (e.g., Wilson, 2004).

– A prospective approach to standard-setting in which both content frameworks and preliminary performance standards guide the assessment design process

• “In a coherent educational assessment system, all components should work to prepare the student to meet or exceed that cut score; each component suggests the cut score” (Lewis and Haug, 2005, p. 12, emphasis added)
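Meaning (ii) above, a high probability of achieving proficiency in grade n+1, can be checked empirically once matched records across two grades exist. The following is a minimal sketch, not a procedure from the presentation: the function name, the four-level ordering, and the records are all hypothetical, and "achieving proficiency" is assumed to mean reaching Proficient or above.

```python
# Ordered performance levels (assumed ordering, lowest to highest).
LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]

def prob_proficient_next_year(records, level_now):
    """Share of students at `level_now` in grade n who reach
    Proficient or above in grade n+1.

    records: iterable of (level in grade n, level in grade n+1) pairs.
    """
    threshold = LEVELS.index("Proficient")
    next_levels = [LEVELS.index(nxt) for now, nxt in records if now == level_now]
    if not next_levels:
        raise ValueError(f"no students observed at level {level_now!r}")
    return sum(lvl >= threshold for lvl in next_levels) / len(next_levels)

# Hypothetical matched records, for illustration only.
records = (
    [("Proficient", "Proficient")] * 6
    + [("Proficient", "Advanced")] * 2
    + [("Proficient", "Basic")] * 2
    + [("Basic", "Basic")] * 7
    + [("Basic", "Proficient")] * 3
)

p_from_proficient = prob_proficient_next_year(records, "Proficient")
p_from_basic = prob_proficient_next_year(records, "Basic")
```

If (i) and (ii) are consistent, the probability for students classified Proficient in grade n should be high and clearly larger than the one for students classified Basic. How high is "high" remains a policy judgment that this sketch does not settle.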


[Figure: diagram of the prospective approach across grades n-1, n, and n+1, with Design and Develop phases. Labels: multi-grade content standards; research-based competency model; performance standards; performance level descriptors (PLDs); task model library; pro-forma canonical response patterns; test specifications (blueprint); pragmatic and psychometric constraints; assessment instrument developed; assessment administered, calibrated, and scaled; final cut-scores.]

Hansche, L., Hambleton, R., Mills, C. N., Jaeger, R. M. (1998) Handbook for the development of performance standards.



[Figures: NCTM material for grades n-1 and n. Downloaded from http://www.nctm.org/focalpoints/downloads.asp, on October 10, 2005]


Research-based Competency model

• A competency model is a recasting and fleshing out of a broad framework, such as the NCTM curricular guidelines, for developing assessments

• A competency model is assembled from various sources, including basic research on student learning

• A central goal in developing a competency model is to structure it so as to facilitate the translation of policy into performance standards.


Performance level descriptors (PLDs)

• Performance level descriptors are typically narratives that elaborate the meaning of performance standards

• PLDs are developed with reference to a competency model

• PLDs are associated with “evidence rules”


Fragment of a PLD

The student is capable of formulating a persuasive argument appropriate to a specific audience or recipient.

If [evidence (T3, T10, T11)]
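One way to read the link between a PLD and its evidence rule is as a check over the task models named in the rule. The sketch below is only an illustration under stated assumptions: the slide gives the condition `evidence (T3, T10, T11)` without the decision logic, so the class name `EvidenceRule`, the `min_score` threshold, and the "every listed task must meet the threshold" rule are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvidenceRule:
    """Ties one PLD claim to the task models that supply evidence for it."""
    claim: str
    task_models: tuple
    min_score: int = 1  # assumed threshold for satisfactory performance

    def is_supported(self, scores):
        """Assumed rule: the claim is supported only if every listed
        task model was answered at or above min_score."""
        return all(scores.get(t, 0) >= self.min_score for t in self.task_models)

rule = EvidenceRule(
    claim=("Formulates a persuasive argument appropriate to a "
           "specific audience or recipient"),
    task_models=("T3", "T10", "T11"),
)

# Hypothetical student scores keyed by task model.
partial = rule.is_supported({"T3": 1, "T10": 1, "T11": 0})
complete = rule.is_supported({"T3": 1, "T10": 2, "T11": 1})
```

Here `partial` is false because T11 falls below the threshold, while `complete` is true. A real evidence rule could weight tasks or allow compensation across them; the conjunctive rule is chosen only because it is the simplest to state.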


[Figure: test specifications (blueprint) shown with pragmatic and psychometric constraints, performance standards, PLDs for grades n-1, n, and n+1, and a task model library containing task models T1 through Tn]


[Figure: pro-forma Canonical Response Patterns (CRP) for grade n-1 over tasks T1 through T9 and the levels Basic, Proficient, and Advanced, showing the CRP for top basic, the CRP for bottom proficient, the CRP for top proficient, and the CRP for bottom advanced]


Setting final cut scores

• The panel starts with preliminary cut scores that have been obtained by directly mapping canonical response patterns to a scale once it is available. Are there any inconsistencies?

• The panel’s role is to accept or adjust preliminary cut scores in light of data from the administration.

• The panel’s cognitive task is less burdensome than the usual standard setting task

• Arbitrariness (Glass, 1978) is greatly reduced since much thought has gone into where the cuts should be
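The first bullet, mapping canonical response patterns to a scale to obtain preliminary cut scores, can be illustrated under a simple measurement model. This is a minimal sketch, assuming a Rasch model with already calibrated task difficulties; the difficulties, the two response patterns, and the midpoint rule for placing the cut are hypothetical choices, not the procedure described in the presentation.

```python
import math

def expected_score(theta, difficulties):
    """Expected raw score at ability theta under a Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def theta_for_pattern(pattern, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Locate a 0/1 response pattern on the scale by bisection.

    Under the Rasch model the maximum-likelihood ability solves
    expected_score(theta) == raw score, so only the raw score matters.
    """
    raw = sum(pattern)
    if not 0 < raw < len(pattern):
        raise ValueError("all-0 or all-1 patterns have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def preliminary_cut(crp_below, crp_above, difficulties):
    """Assumed rule: midpoint between the two adjacent CRP locations."""
    return (theta_for_pattern(crp_below, difficulties)
            + theta_for_pattern(crp_above, difficulties)) / 2.0

# Hypothetical calibrated difficulties for tasks T1..T9 (illustrative only).
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
crp_top_basic = [1, 1, 1, 1, 0, 0, 0, 0, 0]
crp_bottom_proficient = [1, 1, 1, 1, 1, 0, 0, 0, 0]

cut = preliminary_cut(crp_top_basic, crp_bottom_proficient, difficulties)
```

Because the hypothetical difficulties are symmetric around zero and the two patterns have raw scores 4 and 5 out of 9, the two locations mirror each other and the preliminary Basic/Proficient cut lands at approximately 0 on the logit scale. An inconsistency of the kind the panel would look for appears when the location of a "top" pattern for one level is not below the location of the "bottom" pattern for the next level.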

[Figure: pro-forma canonical response patterns and final cut-scores]


[Figure: content strands I, II, III, and IV, with labels WI, WII, WIII, and WIV, plotted against the levels Below Basic (BB), Basic (B), Proficient (P), and Advanced, with values 0 to 3]


Some attributes of the model

• Prospective: The competency model influences test development through the early specification of performance standards

• Progressive: The approach calls for coordination in content frameworks and performance standards across grades

• Predictive: PLDs and performance standards are explicitly based on theoretical and empirical evidence about trajectories of student learning and development


Rationale redux

• A prospective approach

– requires a coordinated set of standards, which encourages articulated pedagogy across grades and reduces the possibility of confusing accountability outcomes.

– provides better support for forward-looking inferences

– strengthens foundations for consequential validity


Some specific challenges

• Explicate the approach to an operational level

• Address complications entailed by intervening treatment of variable effectiveness (i.e. next year’s instruction).

• Formulate and implement feasible validation strategies?



Thank you
