P. Fotaris, King’s College London
T. Mastoras, University of Macedonia
INTEGRATING IRT ANALYSIS INTO LMS FOR ITEM POOL OPTIMISATION
Computer-Aided Assessment (CAA): use of computers in the assessment process.
Self-assessment tests.
Multiple-choice questions.
Automatic evaluation of the responses provided by examinees.
A major trend in MOOCs and academic institutions worldwide.
COMPUTER-AIDED ASSESSMENT
Criticism of test quality: many test items (questions) are flawed at the initial stage of their development. Up to 50% of test items will fail to perform as intended (Haladyna, 1999). The result is unreliable measurement of examinee performance.
Two major approaches to item evaluation using item response data:
Classical Test Theory (CTT): item difficulty, item discrimination.
Item Response Theory (IRT): item difficulty (b), item discrimination (a), guessing (c).
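For reference, the three IRT parameters enter the three-parameter logistic (3PL) model. A minimal sketch in Python (purely illustrative; the system described here delegates all estimation to ICL):

```python
import math

def p_correct(theta, a, b, c):
    """3PL model: probability that an examinee of ability theta answers
    an item correctly, given discrimination a, difficulty b and
    pseudo-guessing c (D = 1.7 is the conventional scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# At theta == b the probability sits midway between c and 1:
print(round(p_correct(0.0, a=1.0, b=0.0, c=0.2), 2))  # 0.6
```

The guessing parameter c is the floor of the curve: even very low-ability examinees answer correctly with probability at least c, which is why items with high c are flagged later in the deck.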
TEST QUALITY ISSUES
CTT
Relatively simple to compute and understand.
True score is defined in the context of a specific test.
Not as sensitive to items that discriminate differentially across different levels of ability.
Not as effective in identifying items that are statistically biased.
IRT
Parameters are not sample- or test-dependent.
Allows a meaningful comparison between the ability of a person and the difficulty of an item.
Requires relatively complex estimation procedures.
Requires large sample sizes (>1,000) (Swaminathan and Gifford, 1983).
Lack of widely available, user-friendly software; not included in standard statistical packages such as SPSS and SAS.
CTT VS. IRT
Item analysis results can be used to evaluate performance and improve item quality.
Test developers can use the results to determine whether an item: can be reused as is; should be revised before reuse; or should be taken out of the active item pool.
ITEM POOL OPTIMISATION
Open-source IRT analysis tool: ICL
Hanson (2002) provided stand-alone software for estimating the parameters of IRT models, called IRT Command Language (ICL). ICL consists of IRT estimation functions embedded into a fully featured programming language called TCL (“tickle”).
Open-source LMS: Dokeos
It is implemented in PHP and requires Apache as a web server and MySQL as a database management system.
IRT ANALYSIS TOOL “ICL” & LMS “DOKEOS”
1. Defining the acceptable limits for the IRT parameters: a) discrimination, b) difficulty, and c) guessing.
2. If the estimated parameters of an item violate any of the validity rules, the item is flagged for review of its content.
3. The assessment report displays the estimated proficiency θ per examinee.
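The validity check of steps 1–2 can be sketched as a small rule function (illustrative Python; the default thresholds below are the ones reported in the experimental slides, and the function name is an assumption):

```python
def flag_item(a, b, c, lower_a=0.5, lower_b=-1.7, upper_b=1.7, upper_c=0.25):
    """Return the validity rules an item violates; an empty list means
    the item may remain in the active pool."""
    flags = []
    if a < lower_a:
        flags.append("low discrimination (a < lower_a)")
    if not (lower_b <= b <= upper_b):
        flags.append("difficulty out of range")
    if c > upper_c:
        flags.append("high guessing (c > upper_c)")
    return flags

# Two items from the sample ICL parameter output in this deck:
print(flag_item(1.597597, 1.506728, 0.128515))  # []  -> item may be reused
print(flag_item(0.240009, 1.054301, 0.245737))  # ['low discrimination (a < lower_a)']
```

An item that returns a non-empty list is flagged for review of its content, matching step 2 above.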
INTEGRATING IRT ANALYSIS IN LMS
DEFINING VALIDITY THRESHOLDS
PERFORMING IRT ANALYSIS
IRT ANALYSIS RESULTS
SYSTEM ARCHITECTURE
[Architecture diagram: examinees take an assessment in the LMS “Dokeos” through the web server, while the test developer defines the assessment profile and calibration rules. The LMS hands the assessment results, a parameter estimation script, and a theta estimation script to the IRT tool “ICL”, which returns the estimated parameters and the estimated theta values; the LMS then presents the IRT analysis results. The four numbered steps are described below.]
The LMS exports the assessment results to a data file and generates a TCL script to process them.
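This export can be sketched as follows (illustrative Python; the real LMS is written in PHP, and the file stem test0140 and the 40-item count are taken from the example files in this deck):

```python
def dat_lines(responses):
    """One 0/1 string per examinee, as expected by ICL's read_examinees.
    `responses` is a list of per-examinee lists of item scores."""
    return ["".join(str(score) for score in row) for row in responses]

def estimation_script(stem, n_items):
    """Parameter-estimation script mirroring the test0140.tcl example."""
    return "\n".join([
        "output -no_print",
        "allocate_items_dist %d" % n_items,
        "read_examinees %s.dat %di1" % (stem, n_items),
        "starting_values_dichotomous",
        "EM_steps -max_iter 200",
        "print -item_param",
        "release_items_dist",
    ])

# Toy 3-item assessment with two examinees:
print(dat_lines([[1, 0, 1], [0, 0, 1]]))                  # ['101', '001']
print(estimation_script("test0140", 40).splitlines()[2])  # read_examinees test0140.dat 40i1
```

The LMS would write the two strings to test0140.dat and test0140.tcl respectively before invoking ICL.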
Assessment results (test0140.dat file). Parameter estimation script (test0140.tcl file).
STEP ONE
01010010001001111110010101011000001001110101000100010001100000111001100000100110000000000000001100000011000100001000100000010100001100101000001111011100100001000100010000000001100000000001001010000100011101110111011111110111111111110111111111110010011100000000000111010100001011000110000000010011101000110000001000000110...……… one row per examinee ………
output -no_print
allocate_items_dist 40
read_examinees test0140.dat 40i1
starting_values_dichotomous
EM_steps -max_iter 200
print -item_param
release_items_dist
The LMS calls up ICL in order to create a data file containing the a, b, and c values for each test item. At the same time it prepares a second TCL script to process these IRT parameters (the θ estimation script).
Estimated parameters (test0140.par file). θ estimation script (test0140t.tcl file).
STEP TWO
1   1,597597  1,506728  0,128515
2   1,377810  -0,87616  0,223903
3   1,258461  0,549362  0,140593
4   1,031856  0,495642  0,079279
5   1,077831  1,004437  0,136324
6   0,479151  1,544218  0,218270
7   1,439241  1,279352  0,082382
8   0,898259  1,310215  0,129570
9   1,837514  1,349520  0,032675
10  0,467694  0,934207  0,206085
11  0,607603  0,265524  0,181212
12  0,240009  1,054301  0,245737
13  0,945631  1,451464  0,050895
...
……… one row per item ………
output -no_print
allocate_items_dist 40
read_examinees test0140.dat 40i1
read_item_param test0140.par
set estep [new_estep]
estep_compute $estep 1 1
delete_estep $estep
set eapfile [open test0140.theta w]
for {set i 1} {$i <= [num_examinees]} {incr i} {...}
close $eapfile
release_items_dist
The LMS calls up ICL to make a data file with the examinees’ θ values.
Estimated θ (test0140.theta file).
STEP THREE
0,378453   0,434304   19
-0,149162  -0,096175  14
-1,523733  -5,999491  7
-0,238032  -0,172708  15
-0,964941  -1,001566  8
1,658672   1,737581   34
-0,343387  -0,312642  16
-0,665486  -0,666954  12
...
……… one row per examinee ………
The LMS imports the two ICL-produced data files (*.par and *.theta) into its database for further processing as part of the intended calibration of the assessment test.
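The import step can be sketched as a row parser (illustrative Python; note that the sample ICL output files above use a comma as the decimal separator, which the importer has to normalise before storing the values):

```python
def parse_icl_row(line):
    """Split one row of an ICL output file (*.par or *.theta):
    comma-decimal tokens become floats, bare integers stay ints."""
    values = []
    for token in line.split():
        if "," in token:
            values.append(float(token.replace(",", ".")))
        else:
            values.append(int(token))
    return values

# One row from each sample file:
print(parse_icl_row("1 1,597597 1,506728 0,128515"))  # [1, 1.597597, 1.506728, 0.128515]
print(parse_icl_row("-0,149162 -0,096175 14"))        # [-0.149162, -0.096175, 14]
```

The parsed tuples can then be inserted into the calibration tables of the LMS database described next.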
STEP FOUR
Entity-Relationship diagram of the LMS database extensions.
LMS DATABASE EXTENSIONS
[ER diagram: the initial Dokeos tables — quiz (assessment), quiz_question (item), quiz_answer (option), quiz_rel_question (m-m relation between quizzes and questions), user (examinee), track_e_exercices (result) and track_e_answers (score details) — are extended with a quiz_version table (id, version, quiz_id, lower_a, lower_b, upper_b, upper_c) that records each assessment version together with its validity thresholds.]
Each assessment can have multiple versions based on its revised items. By monitoring the examinees’ performance on each item, test developers can determine whether a given modification of a specific item positively affected its quality.
Each examinee’s score per item is recorded for every test being administered.
Test developers can establish a new set of rules for each version of the assessment.
ADDITIONAL FUNCTIONALITIES
EXPERIMENTAL RESULTS
Item discrimination parameter values (a): a ≥ 0.5 (white area).
Item difficulty parameter values (b): -1.7 ≤ b ≤ 1.7 (white area).
Item guessing parameter values (c): c ≤ 0.25 (white area).
EXAMPLES OF DEFECTIVE TEST ITEMS
Initial item (low level of difficulty; the key answer deemed too plausible):
Q: In the paged memory allocation scheme, the operating system retrieves data from secondary storage in same-size blocks called:
A. pages  B. frames  C. segments  D. partitions
Revised item:
Q: In which memory allocation scheme does the operating system retrieve data from secondary storage in several blocks of different sizes?
A. segmented  B. paged  C. demand paging  D. partitioned
Initial item (low degree of discrimination; the key answer confused examinees of both high and low abilities):
Q: The transfer layer protocol of TCP/IP is called:
A. TCP  B. UDP  C. IP  D. A and B
Revised item:
Q: The transfer layer protocol of TCP/IP is called:
A. TCP/UDP  B. FTP  C. IP  D. HTTP
Initial item (high guessing value, probably due to the graduated answers):
Q: How many basic control structures are there in programming?
A. one  B. two  C. three  D. four
Revised item:
Q: The control structure used to choose among alternative courses of action is called:
A. sequence  B. repetition  C. selection  D. iteration
The suggested approach can assist test developers in their continuous effort to improve flawed test items.
The user-friendly interface allows users with no previous expertise in statistics to comprehend and utilize the IRT analysis results.
The initial experiment produced encouraging results, showing that the system can effectively evaluate item performance.
The proposed methodology can be easily adopted by different e-learning environments and would be ideally suited for a MOOC environment.
CONCLUSION
References
Haladyna, T.M. (1999) Developing and Validating Multiple-Choice Test Items (2nd edition), Lawrence Erlbaum Associates, Mahwah, New Jersey.
Swaminathan, H. & Gifford, J.A. (1983) ‘Estimation of Parameters in the Three-parameter Latent Trait Model’, in Weiss, D.J. (ed.), New Horizons in Testing, Academic Press, New York.
Hanson, B.A. (2002) IRT Command Language (ICL). Obtained through the Internet: http://www.b-a-h.com/software [accessed 26/6/2013].
Thank you very much!
Questions?