P. Fotaris, King’s College London
T. Mastoras, University of Macedonia
INTEGRATING IRT ANALYSIS INTO LMS FOR ITEM POOL OPTIMISATION
Computer-Aided Assessment (CAA): use of computers in the assessment process.
Self-assessment tests.
Multiple-choice questions.
Automatic evaluation of the responses provided by examinees.
A major trend in MOOCs and academic institutions worldwide.
COMPUTER-AIDED ASSESSMENT
Criticism of test quality: many test items (questions) are flawed at the initial stage of their development. Up to 50% of test items will fail to perform as intended (Haladyna, 1999). The result is unreliable measurement of examinee performance.
Two major approaches to item evaluation using item response data:
Classical Test Theory (CTT): item difficulty, item discrimination.
Item Response Theory (IRT): item difficulty (b), item discrimination (a), guessing (c).
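For reference, the three IRT parameters enter the three-parameter logistic (3PL) model. A minimal sketch in Python (purely illustrative; the system described here delegates all estimation to ICL):

```python
import math

def p_correct(theta, a, b, c):
    """3PL model: probability that an examinee of ability theta answers
    an item correctly, given discrimination a, difficulty b and
    pseudo-guessing c (D = 1.7 is the conventional scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# At theta == b the probability sits midway between c and 1:
print(round(p_correct(0.0, a=1.0, b=0.0, c=0.2), 2))  # 0.6
```

The guessing parameter c is the floor of the curve: even very low-ability examinees answer correctly with probability at least c, which is why items with high c are flagged later in the deck.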
TEST QUALITY ISSUES
CTT
Relatively simple to compute and understand.
True score is defined in the context of a specific test.
Not as sensitive to items that discriminate differentially across different levels of ability.
Not as effective in identifying items that are statistically biased.
IRT
Parameters are not sample- or test-dependent.
Allows a meaningful comparison between the ability of a person and the difficulty of an item.
Requires relatively complex estimation procedures.
Requires large sample sizes (>1,000) (Swaminathan and Gifford, 1983).
Lack of widely available, user-friendly software; not included in standard statistical packages such as SPSS and SAS.
CTT VS. IRT
Item analysis results can be used to evaluate performance and improve item quality.
Test developers can use the results to determine whether an item: can be reused as is; should be revised before reuse; or should be taken out of the active item pool.
ITEM POOL OPTIMISATION
Open-source IRT analysis tool: ICL
Hanson (2002) provided stand-alone software for estimating the parameters of IRT models, called IRT Command Language (ICL). ICL consists of IRT estimation functions embedded into a fully featured programming language called TCL (“tickle”).
Open-source LMS: Dokeos
It is implemented in PHP and requires Apache as a web server and MySQL as a database management system.
IRT ANALYSIS TOOL “ICL” & LMS “DOKEOS”
1. Defining the acceptable limits for the IRT parameters: a) discrimination, b) difficulty, and c) guessing.
2. If the estimated parameters of an item violate any of the validity rules, the item is flagged for review of its content.
3. The assessment report displays the estimated proficiency θ per examinee.
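The validity check of steps 1–2 can be sketched as a small rule function (illustrative Python; the default thresholds below are the ones reported in the experimental slides, and the function name is an assumption):

```python
def flag_item(a, b, c, lower_a=0.5, lower_b=-1.7, upper_b=1.7, upper_c=0.25):
    """Return the validity rules an item violates; an empty list means
    the item may remain in the active pool."""
    flags = []
    if a < lower_a:
        flags.append("low discrimination (a < lower_a)")
    if not (lower_b <= b <= upper_b):
        flags.append("difficulty out of range")
    if c > upper_c:
        flags.append("high guessing (c > upper_c)")
    return flags

# Two items from the sample ICL parameter output in this deck:
print(flag_item(1.597597, 1.506728, 0.128515))  # []  -> item may be reused
print(flag_item(0.240009, 1.054301, 0.245737))  # ['low discrimination (a < lower_a)']
```

An item that returns a non-empty list is flagged for review of its content, matching step 2 above.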
INTEGRATING IRT ANALYSIS IN LMS
DEFINING VALIDITY THRESHOLDS
PERFORMING IRT ANALYSIS
IRT ANALYSIS RESULTS
SYSTEM ARCHITECTURE
[Architecture diagram: examinees take an assessment in the LMS “Dokeos” through the web server, while the test developer defines the assessment profile and calibration rules. The LMS hands the assessment results, a parameter estimation script, and a theta estimation script to the IRT tool “ICL”, which returns the estimated parameters and the estimated theta values; the LMS then presents the IRT analysis results. The four numbered steps are described below.]
The LMS exports the assessment results to a data file and generates a TCL script to process them.
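This export can be sketched as follows (illustrative Python; the real LMS is written in PHP, and the file stem test0140 and the 40-item count are taken from the example files in this deck):

```python
def dat_lines(responses):
    """One 0/1 string per examinee, as expected by ICL's read_examinees.
    `responses` is a list of per-examinee lists of item scores."""
    return ["".join(str(score) for score in row) for row in responses]

def estimation_script(stem, n_items):
    """Parameter-estimation script mirroring the test0140.tcl example."""
    return "\n".join([
        "output -no_print",
        "allocate_items_dist %d" % n_items,
        "read_examinees %s.dat %di1" % (stem, n_items),
        "starting_values_dichotomous",
        "EM_steps -max_iter 200",
        "print -item_param",
        "release_items_dist",
    ])

# Toy 3-item assessment with two examinees:
print(dat_lines([[1, 0, 1], [0, 0, 1]]))                  # ['101', '001']
print(estimation_script("test0140", 40).splitlines()[2])  # read_examinees test0140.dat 40i1
```

The LMS would write the two strings to test0140.dat and test0140.tcl respectively before invoking ICL.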
Assessment results (test0140.dat file). Parameter estimation script (test0140.tcl file).
STEP ONE
01010010001001111110010101011000001001110101000100010001100000111001100000100110000000000000001100000011000100001000100000010100001100101000001111011100100001000100010000000001100000000001001010000100011101110111011111110111111111110111111111110010011100000000000111010100001011000110000000010011101000110000001000000110...……… one row per examinee ………
output -no_print
allocate_items_dist 40
read_examinees test0140.dat 40i1
starting_values_dichotomous
EM_steps -max_iter 200
print -item_param
release_items_dist
The LMS calls up ICL in order to create a data file containing the a, b, and c values for each test item. At the same time it prepares a second TCL script to process these IRT parameters (the θ estimation script).
Estimated parameters (test0140.par file). θ estimation script (test0140t.tcl file).
STEP TWO
1   1,597597  1,506728  0,128515
2   1,377810  -0,87616  0,223903
3   1,258461  0,549362  0,140593
4   1,031856  0,495642  0,079279
5   1,077831  1,004437  0,136324
6   0,479151  1,544218  0,218270
7   1,439241  1,279352  0,082382
8   0,898259  1,310215  0,129570
9   1,837514  1,349520  0,032675
10  0,467694  0,934207  0,206085
11  0,607603  0,265524  0,181212
12  0,240009  1,054301  0,245737
13  0,945631  1,451464  0,050895
...
……… one row per item ………
output -no_print
allocate_items_dist 40
read_examinees test0140.dat 40i1
read_item_param test0140.par
set estep [new_estep]
estep_compute $estep 1 1
delete_estep $estep
set eapfile [open test0140.theta w]
for {set i 1} {$i <= [num_examinees]} {incr i} {...}
close $eapfile
release_items_dist
The LMS calls up ICL to make a data file with the examinees’ θ values.
Estimated θ (test0140.theta file).
STEP THREE
0,378453   0,434304   19
-0,149162  -0,096175  14
-1,523733  -5,999491  7
-0,238032  -0,172708  15
-0,964941  -1,001566  8
1,658672   1,737581   34
-0,343387  -0,312642  16
-0,665486  -0,666954  12
...
……… one row per examinee ………
The LMS imports the two ICL-produced data files (*.par and *.theta) into its database for further processing as part of the intended calibration of the assessment test.
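The import step can be sketched as a row parser (illustrative Python; note that the sample ICL output files above use a comma as the decimal separator, which the importer has to normalise before storing the values):

```python
def parse_icl_row(line):
    """Split one row of an ICL output file (*.par or *.theta):
    comma-decimal tokens become floats, bare integers stay ints."""
    values = []
    for token in line.split():
        if "," in token:
            values.append(float(token.replace(",", ".")))
        else:
            values.append(int(token))
    return values

# One row from each sample file:
print(parse_icl_row("1 1,597597 1,506728 0,128515"))  # [1, 1.597597, 1.506728, 0.128515]
print(parse_icl_row("-0,149162 -0,096175 14"))        # [-0.149162, -0.096175, 14]
```

The parsed tuples can then be inserted into the calibration tables of the LMS database described next.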
STEP FOUR
Entity-Relationship diagram of the LMS database extensions.
LMS DATABASE EXTENSIONS
[ER diagram: the initial Dokeos tables — quiz (assessment), quiz_question (item), quiz_answer (option), quiz_rel_question (m-m relation between quizzes and questions), user (examinee), track_e_exercices (result) and track_e_answers (score details) — are extended with a quiz_version table (id, version, quiz_id, lower_a, lower_b, upper_b, upper_c) that records each assessment version together with its validity thresholds.]
Each assessment can have multiple versions based on its revised items. By monitoring the examinees’ performance on each item, test developers can determine whether a given modification of a specific item positively affected its quality.
Each examinee’s score per item is recorded for every test being administered.
Test developers can establish a new set of rules for each version of the assessment.
ADDITIONAL FUNCTIONALITIES
EXPERIMENTAL RESULTS
Item discrimination parameter values (a): a ≥ 0.5 (white area).
Item difficulty parameter values (b): -1.7 ≤ b ≤ 1.7 (white area).
Item guessing parameter values (c): c ≤ 0.25 (white area).
EXAMPLES OF DEFECTIVE TEST ITEMS
Initial item (low level of difficulty; the key answer deemed too plausible):
Q: In the paged memory allocation scheme, the operating system retrieves data from secondary storage in same-size blocks called:
A. pages  B. frames  C. segments  D. partitions
Revised item:
Q: In which memory allocation scheme does the operating system retrieve data from secondary storage in several blocks of different sizes?
A. segmented  B. paged  C. demand paging  D. partitioned
Initial item (low degree of discrimination; the key answer confused examinees of both high and low abilities):
Q: The transfer layer protocol of TCP/IP is called:
A. TCP  B. UDP  C. IP  D. A and B
Revised item:
Q: The transfer layer protocol of TCP/IP is called:
A. TCP/UDP  B. FTP  C. IP  D. HTTP
Initial item (high guessing value, probably due to the graduated answers):
Q: How many basic control structures are there in programming?
A. one  B. two  C. three  D. four
Revised item:
Q: The control structure used to choose among alternative courses of action is called:
A. sequence  B. repetition  C. selection  D. iteration
The suggested approach can assist test developers in their continuous effort to improve flawed test items.
The user-friendly interface allows users with no previous expertise in statistics to comprehend and utilize the IRT analysis results.
The initial experiment produced encouraging results, showing that the system can effectively evaluate item performance.
The proposed methodology can be easily adopted by different e-learning environments and would be ideally suited for a MOOC environment.
CONCLUSION
References
Haladyna, T.M. (1999) Developing and Validating Multiple-Choice Test Items (2nd edition), Lawrence Erlbaum Associates, Mahwah, New Jersey.
Swaminathan, H. & Gifford, J.A. (1983) ‘Estimation of Parameters in the Three-parameter Latent Trait Model’, in Weiss, D.J. (ed.), New Horizons in Testing, Academic Press, New York.
Hanson, B.A. (2002) IRT Command Language (ICL). Obtained through the Internet: http://www.b-a-h.com/software [accessed 26/6/2013].
Thank you very much!
Questions?