1
The use of asynchronously scored items in adaptive test sessions.
Marty McCall, Smarter Balanced Assessment Consortium
2015 CCSSO NCSA, San Diego, CA
2
Smarter Balanced Theory of Action
Assessments produce evidence of student performance on challenging tasks that evaluate the Common Core State Standards (CCSS). . . . They emphasize deep knowledge of core concepts and ideas within and across the disciplines, along with analysis, synthesis, problem solving, communication, and critical thinking, thereby requiring a focus on complex performances as well as on specific concepts, facts, and skills (Smarter Balanced, 2010, Theory of Action, p. 1).
3
Nature of tasks under CCSS and ECD
• Influence of cognitive psychology
  – Interest in mental processes and models
  – The purpose of a task is to provide evidence confirming or refuting hypotheses about what a student knows
  – Evidence-Centered Design
  – Tasks exemplify desired learning
• Integration of skills into complex tasks
  – Core foundational concepts build across years
  – Tasks demand the use of several skills and concepts
  – Emphasis on communication and insight
4
6 Key Components of Evidence-Centered Design
1. Define the domain: Common Core Standards, Math/ELA
2. Define claims to be made: 4 ELA and 4 Math claims (Content Specifications)
3. Define assessment targets: Knowledge, Skills, and Abilities
4. Define evidence required: Evidence to be elicited from the student
5. Develop task models: Methods for eliciting evidence
6. Develop items or performance tasks
5
Approved English Language Arts Claims
• Overall claim
  – Reading
  – Writing
  – Speaking and Listening
  – Research/Inquiry
• Claims provide the overall test and reporting structure
  – Targets are nested within claims
6
Test design: assessments have an adaptive and a non-adaptive component
Adaptive session | PT (performance task)
C1: Reading (Literary, Informational)
C2: Writing
C3: Listening
C4: Research
7
Reading Targets (informational and literary)
• CENTRAL IDEAS
• REASONING & EVIDENCE
• KEY DETAILS
• WORD MEANINGS
• ANALYSIS WITHIN OR ACROSS TEXTS
• TEXT STRUCTURES & FEATURES
• LANGUAGE USE
8
Writing Targets
• WRITE/REVISE BRIEF TEXTS - Narrative strategies
• WRITE/REVISE BRIEF TEXTS - Organizing ideas
• WRITE/REVISE BRIEF TEXTS - Provide support for opinions
• COMPOSE FULL TEXTS
• EDIT/CLARIFY
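Because targets are nested within claims, reporting can roll item results up to their parent claim. A minimal sketch in Python, assuming a simple dictionary encoding of the hierarchy; CLAIM_TARGETS, TARGET_TO_CLAIM and group_by_claim are hypothetical names, and only the reading and writing targets listed on these slides are included (the listening and research targets are not given here).

```python
from collections import defaultdict

# Hypothetical encoding of the claim -> target nesting, using target names from the slides.
CLAIM_TARGETS = {
    "C1: Reading": [
        "CENTRAL IDEAS", "REASONING & EVIDENCE", "KEY DETAILS", "WORD MEANINGS",
        "ANALYSIS WITHIN OR ACROSS TEXTS", "TEXT STRUCTURES & FEATURES", "LANGUAGE USE",
    ],
    "C2: Writing": ["WRITE/REVISE BRIEF TEXTS", "COMPOSE FULL TEXTS", "EDIT/CLARIFY"],
}

# Invert the nesting: every target has exactly one parent claim.
TARGET_TO_CLAIM = {t: claim for claim, targets in CLAIM_TARGETS.items() for t in targets}


def group_by_claim(items):
    """Group (item_id, target) pairs under the claim that owns each target."""
    groups = defaultdict(list)
    for item_id, target in items:
        groups[TARGET_TO_CLAIM[target]].append(item_id)
    return dict(groups)


report = group_by_claim([("i1", "CENTRAL IDEAS"), ("i2", "EDIT/CLARIFY"), ("i3", "KEY DETAILS")])
print(report)  # {'C1: Reading': ['i1', 'i3'], 'C2: Writing': ['i2']}
```

Keeping the inverse map means a target can only ever report under one claim, which matches the "targets are nested within claims" structure.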
9
Targets that require written responses
• WRITE BRIEF TEXTS - Narrative strategies
• WRITE BRIEF TEXTS - Organizing ideas
• WRITE BRIEF TEXTS - Provide support for opinions
• CENTRAL IDEAS
• REASONING & EVIDENCE
10
Test design: requirements for every adaptive test component
Adaptive session
• 1 written response for a WRITE BRIEF TEXTS target
• 2 written responses in reading, 1 for a literary passage and 1 for an informational passage, each addressing either:
  – CENTRAL IDEAS
  – REASONING & EVIDENCE
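The written-response requirement can be expressed as a check over the items delivered in one adaptive session. This is a sketch, assuming the slide's counts are exact (not minimums) and that each delivered item carries its target name and passage type; WrittenResponse and meets_written_response_blueprint are hypothetical names, not part of any Smarter Balanced system.

```python
from dataclasses import dataclass
from typing import List, Optional

READING_WR_TARGETS = {"CENTRAL IDEAS", "REASONING & EVIDENCE"}


@dataclass(frozen=True)
class WrittenResponse:
    """One written-response item delivered in the adaptive session (hypothetical record)."""
    target: str                          # e.g. "WRITE BRIEF TEXTS - Organizing ideas"
    passage_type: Optional[str] = None   # "literary", "informational", or None if standalone


def meets_written_response_blueprint(responses: List[WrittenResponse]) -> bool:
    """Exactly 1 WRITE BRIEF TEXTS response, plus exactly 1 literary and
    1 informational reading response on CENTRAL IDEAS or REASONING & EVIDENCE."""
    brief = sum(r.target.startswith("WRITE BRIEF TEXTS") for r in responses)
    literary = sum(
        r.passage_type == "literary" and r.target in READING_WR_TARGETS for r in responses
    )
    informational = sum(
        r.passage_type == "informational" and r.target in READING_WR_TARGETS for r in responses
    )
    return brief == 1 and literary == 1 and informational == 1


session = [
    WrittenResponse("WRITE BRIEF TEXTS - Narrative strategies"),
    WrittenResponse("CENTRAL IDEAS", "literary"),
    WrittenResponse("REASONING & EVIDENCE", "informational"),
]
print(meets_written_response_blueprint(session))      # True
print(meets_written_response_blueprint(session[:2]))  # False: no informational response
```

If the program treats the counts as minimums instead, the `== 1` comparisons would become `>= 1`.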
11
How do these fit in a CAT session?
• Chosen adaptively, based on best information
• Standalone written-response items are chosen like any other polytomous item, using the best expected information of the item as a whole
• Passages are chosen as passage sets of items: the passage with the highest expected information value is selected from the set of passages that contain a written-response item
• No change to the ability estimate is made when the written response is administered, since it is scored later; the test proceeds adaptively from that point
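The selection rules above can be sketched as maximum-information choice. This assumes the generalized partial credit model (GPCM) for polytomous items (a 2PL item is the one-step case), a scaling constant D = 1.7, and Fisher information at the current ability estimate as a stand-in for "expected information"; the item and passage dictionaries and all function names are hypothetical.

```python
import math

D = 1.7  # assumed logistic scaling constant


def gpcm_probs(theta, a, steps):
    """Category probabilities under the generalized partial credit model.
    `steps` holds step difficulties b_1..b_m; a single step gives a 2PL item."""
    logits = [0.0]  # score 0 has an empty sum
    for b in steps:
        logits.append(logits[-1] + D * a * (theta - b))
    top = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, a, steps):
    """GPCM Fisher information: (D*a)^2 times the variance of the item score."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return (D * a) ** 2 * var


def pick_standalone(theta, items):
    """A standalone written-response item competes like any polytomous item:
    the whole item's information at theta decides."""
    return max(items, key=lambda it: item_information(theta, it["a"], it["steps"]))


def pick_passage(theta, passages):
    """Among passages that carry a written-response item, take the one whose
    item set has the largest summed information at theta."""
    eligible = [p for p in passages if p["has_written_response"]]
    return max(
        eligible,
        key=lambda p: sum(item_information(theta, it["a"], it["steps"]) for it in p["items"]),
    )


items = [
    {"id": "easy", "a": 1.0, "steps": [-2.0]},
    {"id": "wr_mid", "a": 0.8, "steps": [-0.5, 0.5]},  # 0-2 point written response
    {"id": "hard", "a": 1.0, "steps": [2.0]},
]
passages = [
    {"id": "P1", "has_written_response": True,
     "items": [{"a": 1.0, "steps": [0.0]}, {"a": 0.8, "steps": [-0.5, 0.5]}]},
    {"id": "P2", "has_written_response": False,  # most informative, but not eligible
     "items": [{"a": 1.5, "steps": [0.0]}, {"a": 1.5, "steps": [0.1]}, {"a": 1.5, "steps": [-0.1]}]},
    {"id": "P3", "has_written_response": True, "items": [{"a": 1.0, "steps": [2.5]}]},
]
print(pick_standalone(0.0, items)["id"], pick_passage(0.0, passages)["id"])  # wr_mid P1
```

Summing information over a passage set is one plausible reading of a passage's "expected information value"; a program could instead weight the items or integrate information over the posterior for ability.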
12
How are they scored and combined with other responses?
• Responses are scored separately
• Human or AI scoring
• Scored responses are combined with CAT and PT responses to make up the complete test event
• Overall test scores are calculated from all scored responses using IRT parameters
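A sketch of the final scoring step: one likelihood over every scored response (CAT, written response, performance task), maximized for ability. It assumes GPCM item parameters, D = 1.7, and a bounded grid-search maximum-likelihood estimate; all item parameters and scores below are hypothetical. The in-session estimate leaves out the written response, mirroring the rule that no ability update is made when it is administered.

```python
import math

D = 1.7  # assumed logistic scaling constant


def gpcm_probs(theta, a, steps):
    """GPCM category probabilities; a single step gives a dichotomous 2PL item."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + D * a * (theta - b))
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(theta, responses):
    """Sum of log P(observed score | theta) across all scored responses."""
    return sum(math.log(gpcm_probs(theta, r["a"], r["steps"])[r["score"]]) for r in responses)


def ml_theta(responses, lo=-4.0, hi=4.0, n=801):
    """Maximum-likelihood theta by grid search on [lo, hi] (0.01 spacing by default)."""
    grid = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return max(grid, key=lambda t: log_likelihood(t, responses))


cat_responses = [  # scored in real time during the CAT session
    {"a": 1.0, "steps": [-1.0], "score": 1},
    {"a": 1.2, "steps": [0.0], "score": 1},
    {"a": 0.9, "steps": [0.5], "score": 0},
]
async_responses = [  # written response, scored later by a human or an AI engine
    {"a": 0.8, "steps": [-0.5, 0.5], "score": 2},
]
pt_responses = [  # performance-task response, also scored after the session
    {"a": 0.7, "steps": [-1.0, 0.0, 1.0], "score": 1},
]

# Estimate the CAT worked from: the written response did not move it.
theta_session = ml_theta(cat_responses)
# Reported estimate: every scored response in the complete test event.
theta_final = ml_theta(cat_responses + async_responses + pt_responses)
print(round(theta_session, 2), round(theta_final, 2))
```

A grid search keeps the sketch short and always bounded; an operational engine would more likely use Newton-Raphson or a Bayesian (EAP/MAP) estimator.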
13
AI Scoring
• Issues
  – AI scoring is controversial, particularly in higher education and in the ELA educator community
  – Reliability for short answers is not quite there, although it is improving rapidly
  – AI engines aren't built to be a real-time plug-in
• Promising results
  – Some items (but not others) can be scored reliably enough to be used on summative tests
  – If you want to use AI engines, involve the AI experts in task development; AI engines can aid in finding exemplars and outliers
  – Essays scored once by hand and once by AI can achieve high accuracy, higher than essays scored by 2 humans
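The slide does not say how accuracy was measured. One statistic commonly used to compare human and automated essay scores is quadratic weighted kappa (QWK), which could be computed for a human-human pair and a human-AI pair scoring the same responses. A minimal sketch, assuming integer scores on a known range; the function name is hypothetical.

```python
from typing import Sequence


def quadratic_weighted_kappa(r1: Sequence[int], r2: Sequence[int],
                             min_score: int, max_score: int) -> float:
    """Agreement between two raters on an integer scale: 1 = perfect, 0 = chance level."""
    n = max_score - min_score + 1
    observed = [[0] * n for _ in range(n)]
    for x, y in zip(r1, r2):
        observed[x - min_score][y - min_score] += 1
    total = len(r1)
    row = [sum(observed[i]) for i in range(n)]                        # rater 1 marginals
    col = [sum(observed[i][j] for i in range(n)) for j in range(n)]   # rater 2 marginals
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2       # quadratic disagreement penalty
            num += weight * observed[i][j]
            den += weight * row[i] * col[j] / total    # expected count if raters were independent
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))  # 1.0
```

For example, quadratic_weighted_kappa(human_1, human_2, 0, 4) could be set against quadratic_weighted_kappa(human_1, engine, 0, 4) for score lists on a hypothetical 0 to 4 rubric.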
14
Some risks
• The SA items tend to be difficult
• Long administration time
• Expensive to administer and score
• Don't contribute to the ongoing score estimate
• Easy stand-alone items may be overexposed
• Items embedded in passages will often be mismatched informationally
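The overexposure risk can be illustrated with a toy pool. Assuming a 2PL model with D = 1.7, pure maximum-information selection, and a hypothetical written-response pool with one easy item and several hard ones, the easy item becomes the pick for most lower-ability examinees. Operational CATs typically add exposure control to counter exactly this; the sketch leaves it out on purpose.

```python
import math
from collections import Counter

D = 1.7  # assumed logistic scaling constant


def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: (D*a)^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def exposure_rates(thetas, pool):
    """Share of examinees (one per theta) for whom each item is the max-information pick."""
    picks = Counter(max(pool, key=lambda name: info_2pl(t, *pool[name])) for t in thetas)
    return {name: picks[name] / len(thetas) for name in pool}


# Hypothetical pool of (a, b) parameters: mostly hard items, one easy one.
pool = {"easy": (1.0, -1.0), "hard1": (1.0, 1.0), "hard2": (1.0, 1.5), "hard3": (1.0, 2.0)}
thetas = [-2.0, -1.5, -1.0, -0.5, 0.25, 0.75]  # hypothetical examinee abilities
rates = exposure_rates(thetas, pool)
print(rates)
```

Here the single easy item is chosen for four of the six examinees, while two of the hard items are never used.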
15
Some Advantages
• Everyone is in favor of getting out of the multiple-choice (MC) box.
• Allows the kind of skill integration promoted in current instruction
• For high-stakes tests, prevents the proxy from becoming the objective