
Copyright © 2014 by Educational Testing Service. All rights reserved.

Practical Issues and Challenges in Operationalizing K-12 CAT

Presentation at the National Conference on Student Assessment (NCSA) 2015, San Diego, CA

Yi Du, Ph.D.

Educational Testing Service


Introduction

• Using computerized adaptive tests (CAT) for state standards-based assessments has become quite attractive.

• The theoretical foundations of CAT, such as the five basic components of a CAT procedure, have been well researched (Weiss, 1984).

• Implementation questions for CAT have also been well studied (Way, 2005; Davey, 2011).

• However, issues arising from operational CAT practice remain.


Issues Arising from Operational CAT Practices

– How can a student be measured accurately when he or she did not complete a CAT?

– Does the item selection mechanism make use of an exposure control procedure?

– How can we ensure that the item selection mechanism meets all required test specifications?

– How can students' CAT results be replicated at the State end?

– How can the accuracy of the CAT results be ensured?

– How can we ensure that the item selection mechanism delivers items of appropriate difficulty, frustrating neither low- nor high-achieving students?

– How should CAT results be communicated to test users?

– Does the CAT allow students to skip items during the exam?


How to Ensure the Quality of CAT Results?

• Rigorous quality control (QC) procedures and post hoc analyses are well established throughout the entire assessment process for paper-and-pencil tests (PPT) in most states.

• Comprehensive QC procedures and post hoc analyses for online tests, especially for CAT, may not be well established yet:

– QC results from CAT may not be as straightforward as those from PPT.

– Most technical characteristics of a CAT are not visible to test users.

– Those invisible parts have a significant impact on the quality of CAT results.


Objectives of the Presentation

• Focus on the issues and challenges arising from operational CAT practices

• Provide thoughts for practitioners on operational CAT

• Discuss how post hoc analysis, as a tool, can help us understand these issues better

• Provide examples of how to use post hoc analysis to examine and ensure the quality of CAT scores


Issues Addressed in This Presentation

Major practical issues of concern related to item selection and score estimation:

– Test specifications (blueprints)

– Item performance

– Item exposure and overlap

– Attemptedness of a test

– Incompletion of tests

– Accuracy of the final results


Test Specifications (Blueprints)

• Test specifications:

– Ensure that a CAT system accurately assesses the full range of standards.

– Should be determined through simulations based on the item pools, prior to administering the CAT.

• In CAT, every student takes a unique test form; this form is built on the fly.

• Components for specifications: item pool, item selection algorithm, simulations.


Test Specifications

• Is it necessary that every student meets the test specifications in CAT?

• Did every student actually meet the test specifications in a CAT?

What analysis can help answer these questions?

• Conventional analysis

• Post hoc real-data simulation (a minimal blueprint check is sketched below)
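To illustrate the second question, here is a minimal post hoc blueprint check in Python. The column names, blueprint categories, and ranges are hypothetical placeholders, not an operational specification:

```python
# Post hoc blueprint check: count the items each student was administered
# in each content category and compare the counts against the blueprint's
# min/max range. Column names and blueprint ranges are hypothetical.
import pandas as pd

# Blueprint: category -> (minimum items, maximum items)
BLUEPRINT = {"Vocabulary": (7, 8), "Literary": (7, 8),
             "Informational": (7, 8), "Listening": (10, 10)}

def check_blueprint(admin_records: pd.DataFrame) -> pd.DataFrame:
    """admin_records: one row per administered item, with columns
    ['student_id', 'item_id', 'category']."""
    counts = (admin_records
              .groupby(["student_id", "category"])
              .size()
              .unstack(fill_value=0))
    flags = {}
    for cat, (lo, hi) in BLUEPRINT.items():
        n = counts.get(cat, 0)  # 0 if the category was never administered
        flags[cat + "_met"] = (n >= lo) & (n <= hi)
        flags[cat + "_less"] = n < lo
        flags[cat + "_more"] = n > hi
    return pd.DataFrame(flags, index=counts.index)

# Percentage of students meeting / under / over each category's range:
# check_blueprint(records).mean().mul(100).round(1)
```

Aggregating these per-student flags yields exactly the kind of "% Met / % Less / % More" summary shown on the next slide.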


Examining Whether the Test Specifications Were Met

Specifications vs. actually tested (percentage of students who received the specified number of items, fewer, or more):

| Content | Category | Items (Spec) | Total Items by Domain (Spec) | % Met | % Less | % More |
|---|---|---|---|---|---|---|
| Reading | Vocabulary | 7-8 | 21-24 | 82% | 1% | 17% |
| Reading | Literary | 7-8 | 21-24 | 89% | 0% | 11% |
| Reading | Informational | 7-8 | 21-24 | 96% | 1% | 3% |
| Writing | Organization/Purpose | 5 | 10 | 100% | 0% | 0% |
| Writing | Evidence/Elaboration | 5 | 10 | 100% | 0% | 0% |
| Writing | Conventions | 5 | 5 | 100% | 0% | 0% |
| Listening | Listening | 10 | 10 | 95% | 0% | 5% |
| Overall | | 41-44 | 41-44 | 100% | 0% | 0% |


Test Specifications

Advanced approaches to examining the test specifications:

• Post hoc (real data) simulations

• Software

– Open source

• SimulCAT (Kyung Han, 2012)

• FireStar (Choi, 2009)

• SimuMCAT (Lihua Yao, 2011)

• Concerto (David Magis, 2014)

– Commercial

• CATSim (David Weiss)
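As a concrete illustration of what such tools do, here is a minimal post hoc (real data) simulation sketch, assuming a 2PL model, maximum-information item selection, and interim EAP scoring. It replays a student's recorded responses through the selection rule to trace the path the algorithm would take; all names and defaults are hypothetical simplifications:

```python
# Minimal post hoc (real data) simulation: replay recorded responses
# through a maximum-information selection rule, tracking interim EAP
# theta estimates. Assumes a 2PL model; names are hypothetical.
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    p = p_2pl(theta, a, b)
    return a**2 * p * (1 - p)

def eap(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP estimate under a standard normal prior."""
    prior = np.exp(-grid**2 / 2)
    like = np.ones_like(grid)
    for u, ai, bi in zip(responses, a, b):
        p = p_2pl(grid, ai, bi)
        like *= p**u * (1 - p)**(1 - u)
    post = prior * like
    return np.sum(grid * post) / np.sum(post)

def replay(real_resp, a, b, test_len=20):
    """real_resp: dict item_index -> observed 0/1 response."""
    theta, used, path = 0.0, [], []
    for _ in range(test_len):
        # pick the most informative unused item the student actually saw
        cand = [i for i in real_resp if i not in used]
        if not cand:
            break
        nxt = max(cand, key=lambda i: info_2pl(theta, a[i], b[i]))
        used.append(nxt)
        theta = eap([real_resp[i] for i in used],
                    [a[i] for i in used], [b[i] for i in used])
        path.append((nxt, theta))
    return path
```

Comparing the replayed path and interim thetas with the vendor's logged administration records is one way to verify the selection engine end to end.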


Attemptedness Status

• Determines whether a student's score should be counted, valid, and reported.

• What counts as attemptedness in CAT?

– Just logging in to the test, or

– responding to items?

• How many responses qualify a student for a score?

• How many responses qualify a student for a valid score?

– Should the attemptedness rule differ for overall scores and for domain scores?

• Examine how many students were affected by the policy and whether the policy is appropriate (a sketch of one such rule follows).
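For example, a minimal attemptedness rule might look like the following sketch; the thresholds and status labels are hypothetical, since actual policies are set by each program:

```python
# A minimal sketch of an attemptedness rule. The thresholds below are
# hypothetical placeholders, not an operational policy.
MIN_FOR_SCORE = 10   # responses needed for any score
MIN_FOR_VALID = 20   # responses needed for a valid (reported) score

def attemptedness_status(n_responded: int, logged_in: bool) -> str:
    if not logged_in:
        return "not attempted"
    if n_responded == 0:
        return "logged in, no responses"
    if n_responded < MIN_FOR_SCORE:
        return "attempted, no score"
    if n_responded < MIN_FOR_VALID:
        return "incomplete, provisional score"
    return "complete, valid score"
```

Running such a classifier over the administration records produces counts like those in the attemptedness tables on the next two slides.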


Attemptedness Analysis

| Score | N Completed | Incomplete with Valid Score | Incomplete with Lowest Score | NS | Total N |
|---|---|---|---|---|---|
| Overall | 20529 | 58 | 0 | 12 | 20599 |
| Domain 1 | 20599 | 0 | 0 | 0 | 20599 |
| Domain 2 | 20524 | 0 | 0 | 75 | 20599 |
| Domain 3 | 20599 | 0 | 0 | 0 | 20599 |
| Domain 4 | 20587 | 0 | 0 | 12 | 20599 |


Attemptedness Status Analysis

| Test | % Incorrect | % Correct (1,2,3,4) | % Omitted | % Not Seen | % NS |
|---|---|---|---|---|---|
| ELA3 | 49.29 | 50.65 | 0.01 | 0.02 | 0.03 |
| ELA4 | 48.99 | 50.35 | 0.03 | 0.05 | 0.10 |
| ELA5 | 47.96 | 51.97 | 0.03 | 0.02 | 0.02 |


Incompletion of a CAT Test

• Tests are considered "complete" if students respond to at least the minimum number of operational items. Otherwise, the tests are "incomplete."

• CAT rules:

– Is a student allowed to skip an item in the middle of the CAT?

• If yes, and if the test is considered attempted and scored:

– Omit: the student skips items in the middle but completes the test.

– Incomplete: the student stops in the middle and never completes the test.

• An item may end up in one of several states:

– Seen and responded to

– Seen and not responded to

– Not presented

• Scoring rules and QC procedures should account for all of these cases.


Score Adjustment for Incomplete but Valid Tests

Several approaches exist for scoring incomplete tests (a sketch of the first follows this list):

• Score the unanswered items as incorrect.

– The item parameters of all unanswered items can be imputed using:

• the average value of items in the item pool,

• the average value of the items a student answered, or

• the range of the items a student answered.

• Score only the portion actually completed, and adjust the incomplete portion proportionally using the student's ability estimate.

• This remains an ongoing research topic.
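A minimal sketch of the first approach, assuming a 2PL model and imputing pool-average parameters for the unanswered items; `eap()` is the estimator from the earlier simulation sketch, and everything here is illustrative rather than an operational rule:

```python
# Score unanswered items as incorrect, imputing 2PL parameters for the
# missing slots from the pool average. Hypothetical illustration only.
import numpy as np

def impute_and_score(resp, a_seen, b_seen, pool_a, pool_b, n_required):
    """resp/a_seen/b_seen: responses and parameters for answered items;
    pool_a/pool_b: parameters of the full item pool;
    n_required: minimum operational test length."""
    n_missing = max(0, n_required - len(resp))
    # treat each unanswered slot as an incorrect response to an
    # "average" pool item
    a = np.concatenate([a_seen, np.full(n_missing, np.mean(pool_a))])
    b = np.concatenate([b_seen, np.full(n_missing, np.mean(pool_b))])
    u = np.concatenate([resp, np.zeros(n_missing)])
    return eap(u, a, b)  # EAP estimator from the earlier sketch
```

Substituting the average (or range) of the items the student actually answered for the pool averages gives the other imputation variants listed above.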


Accuracy of Final Scores

– Vendors should provide a detailed technical description of the methodology for the provisional and final score computation process.

– Vendors should also provide the statistical properties of the final scores, such as test reliability, standard errors of measurement, and test information functions, for evaluating bias and precision.

– At the State end, State psychometricians or researchers may need to conduct additional analyses to ensure the quality of scoring.


Scope of Scoring QC

• Final ability estimation procedures

– MLE, Bayesian (EAP or MAP), inverse TCC

• Overall scores and domain scores

• Theta to scale scores (a minimal sketch follows)

– Transformation from theta to scale score

– Achievement level assignment

– Score range
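For instance, the theta-to-scale transformation and achievement-level assignment can be replicated with a small script like this sketch; the slope, intercept, score bounds, and cut scores are hypothetical placeholders:

```python
# Theta-to-scale conversion and achievement-level assignment.
# All constants below are hypothetical placeholders.
SLOPE, INTERCEPT = 40.0, 500.0          # scale = SLOPE * theta + INTERCEPT
LOWEST, HIGHEST = 300, 700              # reporting-scale bounds
CUTS = [(440, 1), (480, 2), (520, 3)]   # ascending cut scores -> level

def to_scale(theta: float) -> int:
    ss = round(SLOPE * theta + INTERCEPT)
    return min(max(ss, LOWEST), HIGHEST)  # clip to the reported range

def achievement_level(scale_score: int) -> int:
    for cut, level in CUTS:
        if scale_score < cut:
            return level
    return 4  # at or above the highest cut
```

Independently recomputing scale scores and levels from the vendor's thetas, and flagging any mismatches, is a straightforward QC check at the State end.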


Conventional Analysis

Percentage of Students in Each Achievement Level

| Test | Level 1 | Level 2 | Level 3 | Level 4 | Top Two |
|---|---|---|---|---|---|
| EL3 | 38.60 | 26.72 | 18.90 | 15.78 | 34.68 |
| EL4 | 43.44 | 21.89 | 19.81 | 14.86 | 34.67 |
| EL5 | 35.90 | 22.57 | 26.93 | 14.60 | 41.53 |
| EL6 | 31.32 | 29.54 | 27.64 | 11.50 | 39.14 |
| EL7 | 31.93 | 27.11 | 30.75 | 10.22 | 40.97 |
| EL8 | 26.18 | 30.98 | 32.74 | 10.10 | 42.84 |
| MA3 | 37.72 | 27.54 | 23.97 | 10.77 | 34.74 |
| MA4 | 32.96 | 36.18 | 20.46 | 10.41 | 30.87 |
| MA5 | 45.11 | 29.00 | 13.83 | 12.07 | 25.90 |
| MA6 | 40.15 | 32.00 | 16.40 | 11.44 | 27.84 |
| MA7 | 40.29 | 30.45 | 17.78 | 11.48 | 29.26 |
| MA8 | 43.47 | 27.25 | 15.70 | 13.58 | 29.28 |


Conventional Analysis

Student Theta Mean and Standard Deviation across Grades

| Test | OP 2015 Mean | OP 2015 SD | OP 2014 Mean | OP 2014 SD |
|---|---|---|---|---|
| EL3 | -1.30 | 1.01 | -1.23 | 1.05 |
| EL4 | -0.86 | 1.05 | -0.74 | 1.11 |
| EL5 | -0.32 | 1.06 | -0.31 | 1.10 |
| EL6 | -0.05 | 1.06 | -0.05 | 1.11 |
| EL7 | 0.22 | 1.08 | 0.12 | 1.14 |
| EL8 | 0.47 | 1.06 | 0.39 | 1.14 |
| MA3 | -1.39 | 0.98 | -1.27 | 0.96 |
| MA4 | -0.85 | 0.99 | -0.70 | 1.00 |
| MA5 | -0.55 | 1.10 | -0.33 | 1.07 |
| MA6 | -0.28 | 1.25 | -0.08 | 1.18 |
| MA7 | -0.09 | 1.32 | 0.03 | 1.34 |
| MA8 | 0.13 | 1.41 | 0.28 | 1.33 |


Accuracy of Student Testing Results

• The error variance provides estimates of precision.

| SEM | Grade 3 N | Grade 3 % | Grade 4 N | Grade 4 % |
|---|---|---|---|---|
| SEM ≥ 2.5 | 0 | 0.00 | 0 | 0.00 |
| 1.5 ≤ SEM < 2.5 | 1 | 0.00 | 0 | 0.00 |
| 0.5 ≤ SEM < 1.5 | 281 | 0.57 | 169 | 0.58 |
| 0.3 ≤ SEM < 0.5 | 11599 | 23.57 | 13196 | 45.18 |
| 0 ≤ SEM < 0.3 | 37330 | 75.86 | 15842 | 54.24 |
| Total | 49211 | 100.00 | 29207 | 100.00 |


Advanced Analysis

• Check the ability estimation procedures: write a program to compute MLE, EAP, MAP, or inverse TCC estimates, based on the IRT model used (a sketch follows),

• Replicate the vendor's results,

• Track students' response paths to examine whether the item selection algorithm is effective (measurement precision, security, content balance, maximum item usage).
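As one example, the MLE procedure can be replicated with a short Newton-Raphson routine; this sketch assumes a 2PL model, whereas operational engines may use 3PL or polytomous models and different convergence rules:

```python
# Replicate MLE theta estimates under a 2PL model via Newton-Raphson.
# Note: a finite MLE requires a mixed response pattern; all-correct or
# all-incorrect patterns diverge and need special handling.
import numpy as np

def mle_theta(u, a, b, theta0=0.0, tol=1e-4, max_iter=50):
    """u: 0/1 responses; a, b: 2PL discrimination/difficulty arrays."""
    theta = theta0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        grad = np.sum(a * (u - p))          # dlogL/dtheta
        info = np.sum(a**2 * p * (1 - p))   # Fisher information
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta

# QC check: flag cases where |mle_theta(u, a, b) - vendor_theta|
# exceeds a small tolerance.
```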


Conventional Item Analysis

| Item ID | a-param | b-param | c-param | d-param | Score Points | Score of 0 | Score of 1 | Score of 2 | P-value | N Students |
|---|---|---|---|---|---|---|---|---|---|---|
| Item1 | 0.7049 | -0.06291 | 0.21 | | 0,1 | 13% | 87% | | 0.87 | 23 |
| Item2 | 0.58321 | 1.83536 | 0.25 | | 0,1 | 60% | 40% | | 0.40 | 3457 |
| Item3 | 0.65237 | 0.36911 | 0.25 | | 0,1 | 39% | 61% | | 0.61 | 3457 |
| Item4 | 0.3744 | 1.90488 | 0.20 | | 0,1 | 84% | 16% | | 0.16 | 25 |
| Item5 | 0.57374 | 1.90652 | | 0.39802, -0.39802 | 0,1,2 | 63% | 11% | 26% | 0.31 | 27 |
| Item6 | 0.43612 | 2.75943 | 0.23 | | 0,1 | 89% | 11% | | 0.11 | 27 |
| Item7 | 0.16635 | 0.9321 | 0.34 | | 0,1 | 32% | 68% | | 0.68 | 19 |
| Item8 | 0.47194 | 0.82735 | 0.25 | | 0,1 | 42% | 58% | | 0.58 | 19 |
| Item9 | 0.78146 | 0.96798 | 0.27 | | 0,1 | 53% | 47% | | 0.47 | 19 |
| Item10 | 0.36417 | 1.04496 | 0.31 | | 0,1 | 35% | 65% | | 0.65 | 1383 |


Item Exposure Control

• Item exposure is an important consideration for test security in the continuous testing environment of CAT.

• High item exposure rates pose a serious threat to test security.

• It is important to check whether item exposure control has been implemented for security (a sketch of computing empirical exposure rates follows).
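A minimal sketch of computing empirical exposure rates from administration records; the column names are hypothetical, and the 0.20 ceiling is only a commonly cited example value, not a universal standard:

```python
# Empirical item exposure rates from administration records.
# Column names are hypothetical placeholders.
import pandas as pd

def exposure_rates(admin_records: pd.DataFrame) -> pd.Series:
    """admin_records: one row per (student_id, item_id) administration.
    Exposure rate = number of students who saw the item / total students."""
    n_students = admin_records["student_id"].nunique()
    return (admin_records.groupby("item_id")["student_id"]
            .nunique()
            .div(n_students)
            .sort_values(ascending=False))

# Flag items exceeding an example exposure ceiling of r_max = 0.20:
# rates = exposure_rates(records); flagged = rates[rates > 0.20]
```

Tables like the two that follow can then be built by cross-tabulating these rates (or raw exposure counts) against item difficulty and content domain.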


Visual Inspection

Number of items by exposure frequency and item difficulty (b-value):

| b-value | N items | ≥3000 | 1000-3000 | 500-1000 | 100-500 | 1-100 | 0 |
|---|---|---|---|---|---|---|---|
| (-2.5,-2.0] | | | | | | | |
| (-2.0,-1.5] | 3 | 3 | | | | | |
| (-1.5,-1.0] | 12 | 1 | 4 | 7 | | | |
| (-1.0,-0.5] | 48 | 3 | 5 | 8 | 9 | 23 | |
| (-0.5,0.0] | 98 | 11 | 13 | 12 | 18 | 42 | 2 |
| (0.0,0.5] | 149 | 13 | 18 | 24 | 25 | 63 | 6 |
| (0.5,1.0] | 185 | 12 | 14 | 31 | 34 | 86 | 8 |
| (1.0,1.5] | 198 | 11 | 16 | 53 | 41 | 67 | 10 |
| (1.5,2.0] | 270 | 17 | 19 | 33 | 94 | 74 | 33 |
| (2.0,2.5] | 202 | 5 | 9 | 21 | 65 | 69 | 33 |
| (2.5,3.0] | 151 | 8 | 4 | 11 | 56 | 52 | 20 |
| (3.0,3.5] | 109 | 2 | 6 | 1 | 49 | 38 | 13 |
| (3.5,4.0] | 72 | 1 | 1 | 2 | 17 | 44 | 7 |
| (4.0,4.5] | 52 | 1 | 2 | 7 | 35 | 7 | |
| (4.5,5.0] | 21 | 2 | 1 | 18 | | | |
| (5.0,5.5] | 7 | 1 | 5 | 1 | | | |
| (5.5,6.0] | 4 | 4 | | | | | |
| >6 | | | | | | | |


Item Exposure Rate by Item Difficulty Level

[Figure: item exposure rates plotted by item difficulty level]


Item Exposure by Content Domains

| Domain | # Items | Freq ≥ 3000 | 1000 ≤ Freq < 3000 | 500 ≤ Freq < 1000 | 100 ≤ Freq < 500 | 1 ≤ Freq < 100 | Freq = 0 |
|---|---|---|---|---|---|---|---|
| 1 | 499 | 40 | 22 | 15 | 39 | 276 | 107 |
| 2 | 437 | 9 | 39 | 96 | 242 | 37 | 14 |
| 3 | 334 | 26 | 28 | 4 | 6 | 264 | 6 |
| 4 | 311 | 10 | 18 | 84 | 136 | 50 | 13 |
| Overall | 1581 | 85 | 107 | 199 | 423 | 627 | 140 |


Final Comments

• Post hoc analysis can help:

– a State explore the invisible technical characteristics of a CAT system,

– fully understand the capabilities and limitations of a CAT system in operational testing, and

– build efficient QC procedures for CAT.

• Many options for post hoc analysis of CAT exist.


Thank you very much for your time and consideration!
