A prospective approach to standard setting. Isaac I. Bejar, Henry I. Braun, Rick Tannenbaum, Educational Testing Service. Presented at ASSESSING AND MODELING COGNITIVE DEVELOPMENT IN SCHOOL: INTELLECTUAL GROWTH AND STANDARD SETTING, Maryland Assessment Research Center for Education Success.
Copyright © 2006 Educational Testing Service
Listening. Learning. Leading.
A prospective approach to standard setting
Isaac I. Bejar, Henry I. Braun, Rick Tannenbaum
Educational Testing Service
Presented at
ASSESSING AND MODELING COGNITIVE DEVELOPMENT IN SCHOOL: INTELLECTUAL GROWTH AND STANDARD SETTING
Maryland Assessment Research Center for Education Success
University of Maryland
October 19-20, 2006
Outline
• Present rationale for a prospective approach to the standard setting process in K-12 that is explicitly informed by learning and developmental considerations
• Review the evolution of validity over the last 60 years focusing on the implications for standard setting and assessment design
• Review conceptual developments in standard setting and argue that a prospective approach is a natural step in the evolution of the standard setting process
• Sketch the steps in a prospective standard setting process
• Discuss remaining challenges
Why are performance standards important?
• Increasingly, academic performance is being communicated in terms of standards (e.g. 30% of students at or above proficient)
• Consequential decisions about students and/or schools are being made on the basis of results framed in terms of standards
• Policy-makers and the public make inferences about public schools based on their interpretations of the standards and standards-based reports
What are we making inferences about?
“Standard setting still can not be reduced to a problem of statistical estimation. Fundamentally, standard setting involves the development of a policy about what is to be required for each level of performance. This policy is stated in the performance standards and implemented through the cutscores.” (Kane, 2001, p. 85, emphasis added)
Some inferences of interest
• Inferences about individual students’ level of achievement at one point in time
• Inferences about individual students’ performance next year
• Population inferences about the proportion of students at different levels of achievement
• Inferences as to the progress of a school or district
[Figure: achievement levels Basic, Proficient, and Advanced]
Problems with current standard setting practice
• Historically, standard setting has been a retrospective judgmental process carried out
– independently of other factors that inform the design of the assessment,
– after the assessment is administered the first time.
• The consequences of a retrospective approach are
– Reliance on subject matter expertise rather than research on student learning and development
– Potential conflation of policy and psychometrics
– Difficulty in achieving coherence of cut scores across grades
• Risks
– Cut scores may not be well supported psychometrically
– Insufficient evidence to adequately support desired inferences
Key turning points in the evolution of test validity
Validity overview
• Validation as theory testing
– Cronbach and Meehl (1955): Gathering evidence for score interpretation follows scientific principles
“The investigation of a test's construct validity is not essentially different from the general scientific procedures for developing and confirming theories.”
• Items increasingly seen as validity-building blocks
– Fischer (1973): LLTM
– Embretson (1983): Construct representation
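For readers unfamiliar with the acronym, LLTM is the linear logistic test model. The sketch below writes it out in one common notation; the symbols are chosen here for illustration and are not taken from the slides. It shows why items can act as validity building blocks: item difficulty is explained by the cognitive operations an item requires.

```latex
P(X_{vi} = 1 \mid \theta_v) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)},
\qquad
\beta_i = \sum_{k=1}^{K} q_{ik}\,\eta_k + c
```

Here \theta_v is the ability of person v, \beta_i is the difficulty of item i, q_{ik} is the weight of operation k in item i, \eta_k is the difficulty contribution of operation k, and c is a normalization constant.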
Validity overview (cont.)
• Validity is an ongoing argument that seeks to clarify what a measurement means and to understand the limitations of each score interpretation (adapted from Cronbach, 1988)
• Validity as consequence
– “Validity is an overall evaluative judgment, founded on empirical evidence and theoretical rationales, of the adequacy and appropriateness of inferences and actions based on test scores.” (Messick, 1989)
Validity overview (cont.)
• Validity as argument (Kane, 2004)
– Kane elaborates Cronbach’s “validation as argument” thesis through specification of
  – Interpretive argument
    – Build a chain of reasoning from the test construction process to the desired claims.
  – Validity argument
    – Amass theoretical and empirical support for the truthfulness of the claims and set appropriate boundaries.
Validity through design: ECD (e.g., Mislevy et al. 2003)
• Evidence Centered Design
– Make explicit the claim(s) you will want to make about scores at individual and aggregate levels
– Determine the student observables that would provide support for the claims we wish to make.
– Carefully design and write tasks that would elicit those observables.
– Assemble assessments targeted to support the desired claims as strongly as possible
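The chain from claims to observables to tasks can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the observable labels, and the tasks are hypothetical and do not come from the ECD literature or from the slides. It shows one check that the design steps imply, namely whether every observable a claim depends on is elicited by at least one task.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    elicits: set[str]  # observables this task is designed to elicit


@dataclass
class Claim:
    text: str
    observables: set[str]  # observables that would support the claim


def uncovered_observables(claim: Claim, tasks: list[Task]) -> set[str]:
    """Return observables the claim needs that no assembled task elicits."""
    elicited = set().union(*(t.elicits for t in tasks))
    return claim.observables - elicited


# Hypothetical claim and task pool, invented for illustration only.
claim = Claim(
    text="Student can formulate a persuasive argument for a specific audience",
    observables={"states_position", "uses_evidence", "adapts_tone"},
)
tasks = [
    Task("T1", {"states_position"}),
    Task("T2", {"uses_evidence", "states_position"}),
]

# Observables that still need a task before the claim can be supported.
gaps = uncovered_observables(claim, tasks)
print(gaps)
```

A non-empty result signals that the assembled assessment cannot yet support the claim as strongly as intended.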
Evolution of standard setting
Some history
• Through the 1980s, standard setting is mainly concerned with procedural issues, but signs of concern begin to emerge (e.g., Glass, 1978; Shepard, 1980)
• NAGB calls for the use of performance standards (see Lissitz and Bourque, 1995)
• Kane (1993) emphasizes the need to separate policy from procedure
• Performance level descriptors become more prominent (Hansche, 1998)
• The judgmental task imposed on standard setting panelists is strongly criticized (Pellegrino, Jones, and Mitchell, 1999)
• Response by Hambleton et al. (1999) does not address the basic criticism
Some history (cont.)
• Cizek (2001)
– Zieky (2001) on how standard setting has changed
– Kane (2001) on how standard setting has not changed and the importance of separating policy and method
– Camilli et al. (2001)
  – “In the long run, standard setting will make its most valuable contribution to teaching and learning at all levels if procedures are developed that are more closely aligned with cognitive and developmental models of competence in content disciplines” (2001, p. 471, italics added).
• Validity oriented standard setting and the idea of “canonical response patterns”
– Haertel and Lorie (2004)
– Lorie (2001)
• On the importance of coherent standards (Ferrara, Johnson, and Cheng, 2005; Lewis and Haug, 2005)
A prospective approach
Outline of an approach
• Standard-Setting for K-12
– Mastery of material at grade “n” is not an end in itself but a milestone in a student’s progression through school.
– Common-sense meanings of achieving proficiency in grade n are:
  i. Student has met requirements for grade n
  ii. All things being equal, the student has a high probability of achieving proficiency in grade n+1
Standard-Setting for K-12
• Ideally, (i) and (ii) should be consistent. To support forward-looking inferences, we should have:
– A developmental perspective in the creation of content frameworks and content standards (e.g., Wilson, 2004).
– A prospective approach to standard-setting in which both content frameworks and preliminary performance standards guide the assessment design process
• “In a coherent educational assessment system, all components should work to prepare the student to meet or exceed that cut score; each component suggests the cut score” (Lewis and Haug, 2005, p. 12, emphasis added)
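One way to check whether meanings (i) and (ii) are consistent is to estimate, from matched student records, how often students at or above the grade n cut score also reach the grade n+1 cut score. The sketch below is a minimal illustration, not the authors' procedure: the function name, the score pairs, and both cut scores are invented, and the estimate ignores the effect of intervening instruction.

```python
def prob_proficient_next_year(pairs, cut_n, cut_n1):
    """Empirical P(score in grade n+1 >= cut_n1 | score in grade n >= cut_n).

    `pairs` holds (grade n score, grade n+1 score) for the same students.
    Returns None when no student reaches the grade n cut.
    """
    followed = [s1 for s0, s1 in pairs if s0 >= cut_n]
    if not followed:
        return None
    return sum(s1 >= cut_n1 for s1 in followed) / len(followed)


# Hypothetical matched scores and cut scores, for illustration only.
pairs = [(310, 325), (295, 300), (330, 350), (305, 298), (280, 290), (320, 331)]
p = prob_proficient_next_year(pairs, cut_n=300, cut_n1=320)
print(p)  # 3 of the 4 students proficient in grade n are proficient in grade n+1
```

A low value would suggest that the two cut scores are not coherent in the forward-looking sense, or that the grade n standard does not capture what grade n+1 requires.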
[Figure: Overview of the prospective approach, shown for grades n-1, n, and n+1, with "Design" and "Develop" phases. Components: multi grade content standards; research-based competency model; performance standards; performance level descriptors (PLDs); task model library; pragmatic and psychometric constraints; test specifications (blueprint); pro-forma canonical response patterns; assessment instrument developed; assessment administered, calibrated, and scaled; final cut-scores.]
Hansche, L., Hambleton, R., Mills, C. N., Jaeger, R. M. (1998) Handbook for the development of performance standards.
Multi grade content standards
Downloaded from http://www.nctm.org/focalpoints/downloads.asp, on October 10, 2005
Multi grade content standards
Downloaded from http://www.nctm.org/focalpoints/downloads.asp, on October 10, 2005
[Figure: content standards for adjacent grades n-1 and n]
Research-based Competency model
• A competency model is a recasting and fleshing out of a broad framework, such as the NCTM curricular guidelines, for developing assessments
• A competency model is assembled from various sources, including basic research on student learning
• A central goal in developing a competency model is to structure it so as to facilitate the translation of policy into performance standards.
Performance level descriptors (PLDs)
• Performance level descriptors are typically narratives that elaborate the meaning of performance standards
• PLDs are developed with reference to a competency model
• PLDs are associated with “evidence rules”
Fragment of a PLD
The student is capable of formulating a persuasive argument appropriate to a specific audience or recipient.
If [(evidence (T3, T10, T11)]
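The evidence rule on the slide is truncated, so its exact form is unknown. As a hedged sketch, assume the rule means that the PLD statement is credited only when the student shows the targeted evidence on every listed task model (T3, T10, and T11). The function name and the student record below are hypothetical.

```python
def evidence_rule(required_tasks, observed):
    """Return True when every required task shows the targeted evidence.

    Assumption: the rule is a conjunction over the listed tasks; the slide
    does not say whether partial evidence would also count.
    """
    return all(observed.get(task, False) for task in required_tasks)


PLD_RULE = ("T3", "T10", "T11")  # task models named in the PLD fragment

# Hypothetical student record: evidence observed per task model.
student = {"T3": True, "T10": True, "T11": False}
meets_pld = evidence_rule(PLD_RULE, student)
print(meets_pld)  # False, because evidence is missing on T11
```

Linking each PLD statement to named task models in this way is what lets the PLDs drive the test blueprint rather than being written after the fact.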
Test Specifications (blueprint)
[Figure: Inputs to the test specifications for grades n-1, n, and n+1: performance standards, PLDs for each grade, the task model library (T1, T2, ..., Tn), and pragmatic and psychometric constraints.]
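Pro-forma Canonical Response Patterns (CRP)
[Figure: For grade n-1, canonical response patterns over tasks T1, T2, ..., T9 at the boundaries of the Basic, Proficient, and Advanced levels: CRP for top basic, CRP for bottom proficient, CRP for top proficient, and CRP for bottom advanced.]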
Setting final cut scores
• The panel starts with preliminary cut scores that have been obtained by directly mapping canonical response patterns to a scale once it is available. Are there any inconsistencies?
• The panel’s role is to accept or adjust preliminary cut scores in light of data from the administration.
• The panel’s cognitive task is less burdensome than the usual standard setting task
• Arbitrariness (Glass, 1978) is greatly reduced since much thought has gone into where the cuts should be
[Figure: pro-forma canonical response patterns lead to final cut-scores]
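The slides do not say how canonical response patterns are mapped to the scale. One possible sketch, under loud assumptions: the tasks are scored 0/1, the scale is a Rasch scale with known task difficulties, each pattern is placed at its maximum-likelihood ability, and the preliminary cut is the midpoint between the "top basic" and "bottom proficient" patterns. The difficulties and patterns below are invented for illustration.

```python
import math


def expected_score(theta, difficulties):
    """Expected number-correct score at ability theta under a Rasch model."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def theta_for_pattern(pattern, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Rasch maximum-likelihood theta for a 0/1 pattern, found by bisection.

    Under the Rasch model the estimate depends only on the raw score, so we
    solve expected_score(theta) == sum(pattern) on the interval [lo, hi].
    """
    raw = sum(pattern)
    if raw == 0 or raw == len(pattern):
        raise ValueError("all-0 or all-1 patterns have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical calibrated difficulties for tasks T1..T9.
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# Hypothetical canonical response patterns on either side of the cut.
crp_top_basic = [1, 1, 1, 1, 0, 0, 0, 0, 0]
crp_bottom_proficient = [1, 1, 1, 1, 1, 0, 0, 0, 0]

theta_basic = theta_for_pattern(crp_top_basic, difficulties)
theta_prof = theta_for_pattern(crp_bottom_proficient, difficulties)

# Assumption: take the midpoint as the preliminary Basic/Proficient cut.
preliminary_cut = (theta_basic + theta_prof) / 2.0
print(theta_basic, theta_prof, preliminary_cut)
```

Because the Rasch estimate uses only the raw score, this sketch discards which tasks were answered correctly; a panel reviewing the preliminary cut would still want to inspect the patterns themselves for inconsistencies with the administration data.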
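[Figure: Example across four content strands (I, II, III, IV) with weights WI, WII, WIII, and WIV. Rows correspond to the Advanced, Proficient, Basic, and Below Basic levels; cells are labeled P, B, and BB; axes run from 0 to 3, with Basic and Below basic marked.]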
Some attributes of the model
• Prospective: The competency model influences test development through the early specification of performance standards
• Progressive: The approach calls for coordination in content frameworks and performance standards across grades
• Predictive: PLDs and performance standards are explicitly based on theoretical and empirical evidence about trajectories of student learning and development
Rationale redux
• A prospective approach
– requires a coordinated set of standards, which encourages articulated pedagogy across grades and reduces the possibility of confusing accountability outcomes.
– provides better support for forward-looking inferences
– strengthens foundations for consequential validity
Some specific challenges
• Explicate the approach to an operational level
• Address complications entailed by an intervening treatment of variable effectiveness (i.e., next year’s instruction).
• Formulate and implement feasible validation strategies?
Thank you