The use of asynchronously scored items in adaptive test sessions

Marty McCall, Smarter Balanced Assessment Consortium

2015 CCSSO NCSA, San Diego, CA

Smarter Balanced Theory of Action

Assessments produce evidence of student performance on challenging tasks that evaluate the Common Core State Standards (CCSS). ... They emphasize deep knowledge of core concepts and ideas within and across the disciplines – along with analysis, synthesis, problem solving, communication, and critical thinking – thereby requiring a focus on complex performances as well as on specific concepts, facts, and skills (Smarter Balanced, 2010, Theory of Action, p. 1).

Nature of tasks under CCSS and ECD

• Influence of cognitive psychology
– Interest in mental processes and models
– Purpose of task is to provide evidence confirming or refuting hypotheses about what a student knows – Evidence Centered Design
– Tasks exemplify desired learning
• Integration of skills into complex tasks
– Core foundational concepts building across years
– Tasks demand use of several skills and concepts
– Emphasis on communication and insight

6 Key Components of Evidence-Centered Design

1. Define the domain: Common Core Standards Math/ELA
2. Define claims to be made: 4 ELA & 4 Math Claims, Content Specifications
3. Define assessment targets: Knowledge, Skills, & Abilities
4. Define evidence required: Evidence to be Elicited from Student
5. Develop Task Models: Methods for Eliciting Evidence
6. Develop Items or Performance Tasks

Approved English Language Arts Claims

• Overall Claim
– Reading
– Writing
– Speaking and Listening
– Research/Inquiry
• Claims provide overall test and reporting structure
– Targets are nested within claims

Test design – Assessments have an adaptive and a non-adaptive component

[Diagram: Adaptive session and PT components, with claims C1: Reading (Literary, Informational), C2: Writing, C3: Listening, C4: Research]

Reading Targets – info and lit

• CENTRAL IDEAS
• REASONING & EVIDENCE
• KEY DETAILS
• WORD MEANINGS
• REASONING & EVIDENCE
• ANALYSIS WITHIN OR ACROSS TEXTS
• TEXT STRUCTURES & FEATURES
• LANGUAGE USE

Writing Targets

• WRITE/REVISE BRIEF TEXTS - Narrative strategies
• WRITE/REVISE BRIEF TEXTS - Organizing ideas
• WRITE/REVISE BRIEF TEXTS - provide support for opinions
• COMPOSE FULL TEXTS
• EDIT/CLARIFY

Require written responses

• WRITE BRIEF TEXTS - Narrative strategies
• WRITE BRIEF TEXTS - Organizing ideas
• WRITE BRIEF TEXTS - provide support for opinions
• CENTRAL IDEAS
• REASONING & EVIDENCE

Test design – Every adaptive test component must have:

Adaptive session
• 1 written response for a WRITE BRIEF TEXTS target
• 2 written responses in reading, 1 for a literary passage and 1 for an informational passage, each addressing either
– CENTRAL IDEAS, or
– REASONING & EVIDENCE
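The written-response requirement above can be read as a small blueprint check. The following is a minimal sketch, not the consortium's implementation: the item dictionary fields (`claim`, `target`, `passage_type`, `written_response`) and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumed encoding of the slide's requirement: how many written responses
# each (claim, passage type) cell of the adaptive session needs.
REQUIRED_WRITTEN = {
    ("writing", None): 1,             # one WRITE BRIEF TEXTS item
    ("reading", "literary"): 1,       # one for a literary passage
    ("reading", "informational"): 1,  # one for an informational passage
}
# Reading targets that may carry the written response, per the slide.
READING_WR_TARGETS = {"CENTRAL IDEAS", "REASONING & EVIDENCE"}


def missing_written_responses(items):
    """Return the blueprint cells not yet satisfied, with the shortfall."""
    counts = Counter()
    for item in items:
        if not item["written_response"]:
            continue  # only written-response items count toward these cells
        if item["claim"] == "writing" and item["target"].startswith("WRITE BRIEF TEXTS"):
            counts[("writing", None)] += 1
        elif item["claim"] == "reading" and item["target"] in READING_WR_TARGETS:
            counts[("reading", item.get("passage_type"))] += 1
    return {
        cell: need - counts[cell]
        for cell, need in REQUIRED_WRITTEN.items()
        if counts[cell] < need
    }
```

An empty result means the session meets the written-response requirement. A non-empty result names the cells an item selector would still have to fill.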

How do these fit in a CAT session?

• Chosen adaptively, based on best information
• Standalone written response items are chosen like any other polytomous item, using best expected information of the item as a whole
• Passages are chosen as sets of items. The passage with the highest expected information value is selected from the set of passages that include a written response item.
• No change to the ability estimate is made. The test proceeds adaptively from that point.
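The selection rule in this slide can be sketched as follows. The slides do not name the IRT model or define "expected information", so this sketch makes three labeled assumptions: items follow the generalized partial credit model (GPCM), information is evaluated at the current ability estimate, and a passage's value is the sum of its items' information. The field names (`a`, `steps`, `written_response`, `items`) are hypothetical.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for scores 0..len(steps) under the GPCM."""
    # Cumulative logits: 0 for score 0, then running sums of a * (theta - b_v).
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_information(theta, a, steps):
    """Fisher information of one item: a^2 times the score variance at theta."""
    probs = gpcm_probs(theta, a, steps)
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return a * a * (second - mean * mean)


def pick_passage(theta, passages):
    """Pick the most informative passage among those with a written-response item.

    Assumption: passage information is the sum over its items.
    """
    eligible = [
        p for p in passages
        if any(i["written_response"] for i in p["items"])
    ]
    return max(
        eligible,
        key=lambda p: sum(
            item_information(theta, i["a"], i["steps"]) for i in p["items"]
        ),
    )
```

Because the written response is not scored during the session, `theta` here is whatever the estimate was before the passage was administered. The same `item_information` function would rank a standalone written-response item against other polytomous items.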

How are they scored and combined with other responses?

• Responses are scored separately
• Human or AI scoring
• Scored responses are combined with CAT and PT responses to make up complete test event
• Overall test scores are calculated from all scored responses using IRT parameters
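The final scoring step described here can be illustrated with a small example. This is a sketch under stated assumptions, not the operational scoring procedure: it assumes the generalized partial credit model, uses a simple grid search for the maximum-likelihood ability estimate, and uses hypothetical field names (`a`, `steps`, `score`).

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities for scores 0..len(steps) under the GPCM."""
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(theta, responses):
    """Log-likelihood of all scored responses at a given ability."""
    return sum(
        math.log(gpcm_probs(theta, r["a"], r["steps"])[r["score"]])
        for r in responses
    )


def ml_theta(responses, lo=-4.0, hi=4.0, points=801):
    """Grid-search maximum-likelihood ability estimate over [lo, hi]."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return max(grid, key=lambda t: log_likelihood(t, responses))


# Machine-scored responses available during the adaptive session.
cat_responses = [
    {"a": 1.0, "steps": [0.0], "score": 1},
    {"a": 1.0, "steps": [0.0], "score": 0},
]
# Written response scored later by a human or AI engine.
async_responses = [{"a": 1.0, "steps": [0.0, 1.0], "score": 2}]

interim = ml_theta(cat_responses)                  # estimate used during the CAT
final = ml_theta(cat_responses + async_responses)  # estimate for the complete test event
```

The interim estimate ignores the written response because its score does not exist yet. The final estimate pools every scored response, which is why the asynchronously scored items affect the reported score but not item selection.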

AI Scoring

• Issues
– AI scoring is controversial, particularly with higher ed and in the ELA educator community
– Reliability for short answers is not quite there, although improving rapidly
– AI engines aren’t built to be a real-time plug-in
• Promising results
– Some items can be scored reliably enough to be used on summative tests (but not others).
– If you want to use AI engines, have the AI experts involved in task development. AI engines can aid in finding exemplars & outliers
– Can achieve high accuracy when essays are scored once by hand and once by AI, higher than with 2 humans

Some risks

• The SA items tend to be difficult
• Long administration time
• Expensive to administer and score
• Don’t contribute to ongoing score estimate
• Easy stand-alone items may be overexposed
• Items embedded in passages will often be mismatched informationally

Some Advantages

• Everyone is in favor of getting out of the MC box.
• Allows the kind of skill integration promoted in current instruction
• For high-stakes tests, prevents the proxy from becoming the objective.

Thank you for your attention

Questions?

Contact [email protected]
