29
Some topics in nonparametric and parametric IRT, with some thoughts about the future Brian Junker Department of Statistics Carnegie Mellon University 232 Baker Hall Pittsburgh PA 15213 [email protected] May 1, 2000 On leave at Learning Research and Development Center, 3939 O’Hara Street, University of Pittsburgh, Pittsburgh PA 15260. Some of the work reported here was performed in the course of preparing a commissioned paper for the National Research Council Committee on Educational and Psychological Foundations of Assessment, United States National Academy of Sciences. Preparation of this paper was supported in part by NSF grants #SES-99.07447 and #DMS-97.05032, and the hospitality of the Learning Research and Development Center. Thanks to Anne Boomsma, Klaas Sijtsma, Lou Mariano, Matt Johnson, and L. Andries van der Ark for their comments and suggestions.

Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

Sometopicsin nonparametricandparametricIRT, with somethoughtsaboutthefuture

BrianJunker�

Departmentof StatisticsCarnegie Mellon University

232Baker HallPittsburghPA 15213

[email protected]

May 1, 2000

�OnleaveatLearningResearchandDevelopmentCenter, 3939O’HaraStreet,Universityof Pittsburgh,Pittsburgh

PA 15260. Someof thework reportedherewasperformedin thecourseof preparinga commissionedpaperfor theNationalResearchCouncilCommitteeon EducationalandPsychologicalFoundationsof Assessment,UnitedStatesNationalAcademyof Sciences.Preparationof this paperwassupportedin part by NSF grants#SES-99.07447and#DMS-97.05032,andthehospitalityof theLearningResearchandDevelopmentCenter. Thanksto AnneBoomsma,KlaasSijtsma,Lou Mariano,Matt Johnson,andL. AndriesvanderArk for their commentsandsuggestions.

Page 2: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 1

Abstract

Item responsetheory (IRT) hasarguablybeenoneof the mostsuccessfulandwidely usedstatisticalmodelingtechniquesin psychometrics,with applicationsin developmental,social,educationalandcognitivepsychologyfor example,aswell asin medicalresearch,demographyandothersocialsciencesettings.Inthis paperI will try to briefly summarizesomeof the importantresearchin nonparametricandparametricIRT today. I will try to show that a broadunderstandingof IRT asan instanceof mathematicalstatisticsin theserviceof substantive psychology, togetherwith anappreciationof someof thecurrentmeasurementchallengesin educationandcognitive psychology, leadusto assessmentmodelsthatdonot look verymuchlike today’s IRT models,but for whichthetoolsandconceptualframework of nonparametricandparametricIRT arestill quitewell suited.

1 Introduction

In introducingSusanEmbretson’s 1999PresidentialAddressat the EuropeanMeetingof the Psychomet-ric Societyin Luneburg Germany, Ivo Molenaardefinedpsychometricsas“mathematicalstatisticsin theserviceof substantive psychology”:thatdefinitioncutsa prettywide swath,andindicatesjust how generalthe psychometricenterprisecanandshouldbe. Item responsetheory(IRT; e.g.,FischerMolenaar, 1995;Van der Linden Hambleton,1997) is a psychometricapproachto modelingdatafrom socialsurveys andeducationalandpsychologicaltests,datingbackat leastto Lord (1952)andRasch(1960),andto theworkof LoevingerandGuttmanbeforethem. IRT enablesus to studythecharacteristicsof testor survey itemsacrossmultiple respondentpopulations,andto studyrespondents’propensitiesto answerpositively acrossvariousitems.IRT hasarguablybeenoneof themostsuccessfulandwidely usedtechniquesin psychomet-rics,with applicationsin developmental,social,educationalandcognitive psychologyfor example,aswellasin medicalresearch,demographyandothersocialsciencesettings.

In this paperI will try to briefly summarizesomeof theimportantresearchin nonparametricandpara-metricIRT today, emphasizingtheinterplaybetweenparametricandnonparametricmodelsthatis thehall-markof theapproachinitiated in theNetherlandsby Mokkenandpursuedby Molenaar, Sijtsma,andtheircolleagues,andre-ignitedin theU.S.by Holland,Rosenbaum,andtheir colleagues.I will try to show thata broadunderstandingof IRT asan instanceof “mathematicalstatisticsin the serviceof substantive psy-chology”, togetherwith an appreciationof someof the currentmeasurementchallengesin educationandcognitivepsychology, leadusto assessmentmodelsthatdonot look verymuchlike today’s IRT models,butfor which the tools andconceptualframework of nonparametricandparametricIRT areparticularlywellsuited.

2 Nonparametric IRT: Scale Construction

To save spaceandpreserve focus,my summaryof nonparametricIRT will concentrateon thescalingthe-ory techniquesintroducedby Mokken andpursuedby Molenaar, Sijtsma,andtheir colleagues.Importantrelatedwork on nonparametricessentialunidimensionality(e.g.,Stout1987,1990;ZhangStout,1996;andStout,Habing,Douglas,Kim, RoussosZhang,1996),nonparametricregressionestimatesof item response

Page 3: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 2

functionsandtestresponsesurfaces(especiallyRamsay, 1991,1995,1996),andrelatedparametricandnon-parametricwork (e.g.,Meijer, 1996;Cliff Donoghue,1992;Drasgow, Levine,Tsien,Williams Mead,1995;Samejima,1997)will notbeconsideredin detail.

2.1 Monotone Homogeneity / Strict Unidimensionality

TheMokken(1971)modelof monotonehomogeneitystartswith veryfew assumptions.A collectionof items� ����� ������ � ��� , which may includedichotomous,orderedpolytomous,or even continuousresponseitems,satisfiesthismodel,if � , thelatenttrait, is a real-valuedrandomvariable(unidimensionality);if eachitem stepresponsefunction(ISRF) � � ��������� ��� is non-decreasingin � for eachitem responsevariable

���andeachrealthreshold

�(monotonicity);andif

��� � � ��� ������ � ������ � ��� � �!�#" � ��� ���$���%�&� ��� (1)

for all possiblecutoffs�%�

(local independence).Whentheresponseis dichotomous(0/1) I referto � ��� � �('��� ���)�+*&� ��� astheitem responsefunction(IRF). In thatcase,equation(1) reducesto thefamiliar form

��� � � ��, �-����.� � �/��,0� � � �!�1" � � �&� � �3254 � *76 � �8� � � � �%9 254 (2)

Thedatawe observe whenexamineesor subjectsrespondto testor survey itemscanbe thoughtof asi.i.d. samplesfrom themarginal discretemultivariatedistribution,��� � � �:� �����;� � ������ � �=< � � � � ��� �����;� � ������0� ���;>&? � � � � (3)

wheretheintegrandcomesfrom eitherequation(1) or equation(2), and >&? � � � representsthedistribution of� in thepopulationof interest.Theassumptionsof unidimensionality, monotonicityandlocal independencecanberelaxed in variousways(e.g.,Sections2.2,3.2 and4.1.3below), but this basicmodelhasbeenthefoundationof muchnonparametricscaleconstruction.

2.1.1 Nonparametric scale construction

For dichotomousitems(���@�BA

or 1 indicatingincorrector correctanswer)a theoryof scaleconstruction—selectinggroupsof items that hangtogetherwell in the sensethat the monotonehomogeneitymodel isprobablyappropriatefor them—hasexistedat leastsinceMokken (1971;1997;seealsoMolenaar, 1991;1997). Theprincipal toolsof that theoryareadaptationsof Loevinger’s (1948) C coefficients,comparingthe marginal covarianceCov

���/D � � � � of eachitem pair with the maximumcovarianceCov E(F 2 ���/D � � � �possible,preservingthemarginsof theobserved

� DHG ���table. TheboundCov E(F 2 ��� D � ���.� is obtained

by adjustingthetableto removeGuttmanerrors(e.g.,Molenaar, 1991);andindeedtheoriginal formulasforthe C coefficientswereexpressedasratiosof Guttmanerrors(Mokken,1997).

A relatedmodelingcondition for dichotomousitems is invariant item ordering which saysthat forevery item pair I and J , either � D � � �LK � �&� � � or the reverseinequality is maintaineduniformly for all � .

Page 4: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 3

Rosenbaum(1987a,b)andSijtsmaandJunker (1996,1997)exploreandextendthis ideaandprovide scaleconstructionexamples.Mokken’s doublemonotonicitymodelincorporatesboththemonotonehomogeneityandinvariantitemorderingassumptions.

The C coefficients aredirectly sensitive only to high or low correlationsbetweenitems, ratherthanto local independencegiven � as in equations(1) and (2). If the correlationsarenearzero, we may beunsatisfiedto assumethatsucha � exists(seefor examplethediscussionof co-monotonicityin Junker Ellis,1997). While a perfectGuttmanscalewould produceC coefficients equalto one, large C coefficientsprovide only indirect evidenceof a � “explaining” covariation in the item responsesin the senseof localindependence.

More direct attackson the problemof establishingsucha � from dataanalysishave beenpursuedbyStout,Ramsayandtheir studentsandcolleagues.Stout (1990; andsubsequentwork, for exampleStout,Habing,Douglas,Kim, RoussosZhang,1996)basicallyconstructsa proxy for � from the total scoreona specially-selectedsubsetof the itemsandusesit to testa weakenedversionof monotonehomogeneity,Stout’s essentialunidimensionalitymodel. MolenaarandStout (personalcommunication)arecurrentlyworking to find commongroundbetweenthesetwo approaches.Ramsay(1991)constructsnonparametricregressionestimatesof item(step)responsefunctionsusingtotalscoreor restscore(seeSection2.1.2belowfor definitions)asa proxy for � , which allows oneto explorenon-monotonicity. Ramsay(1995)constructsnonparametricregressionestimatesof thejoint responsesurfaceof all itemsonthetest,usinganappropriateproximity measureto determinewhichresponsepatternsare“closetogether”,whichenablestheexplorationof bothnon-monotonicityandlackof unidimensionality. Stout’s andRamsay’s methodsaregenerallymorecomputationallycomplex, andseemto requirelargerexamineeanditemsamplesizes,thanthemethodsini-tiatedby Mokkenanddevelopedby Molenaar, Sijtsma,Rosenbaumandtheircolleaguesandstudents.ThustheMokkentechniqueshave beenmorewidely usedin smallersocialsurvey andexperimentalpsychologysettings.

ExtendingMokken scalingand invariant item orderingmethodsto the caseof polytomousitem re-sponseshasbeenmorerecent. Molenaar(1991)first observed that a direct generalizationof the C coef-ficientsbasedon covariancesmadesense,andprovided an efficient computationalmethodfor obtainingCov E(F 2 ��� D � ���.� in thepolytomouscase.This providesthebasisfor Mokken-styleexplorationfor mono-tonehomogeneityin polytomousitems,as illustratedby Hemker, SijtsmaandMolenaar(1995). Gener-alizing invariant item orderingto the polytomouscaseturns out to be somewhat delicate. For example,Molenaar’s (1997)doublemonotonicitymodelrequires��� ���M�N� � � ��� K � � �LOP�Q�-R&� ��� , or the oppositeinequality, to hold uniformly in � , for each J , S ,

� � and�-R

. But this model fails to maintain inequali-ties like T � ���8� �U� K T � �LO�� ��� uniformly in � , which area naturalgeneralizationof the invariant item or-deringconditiongiven above for dichotomousitems. Scheiblechner’s (1995) ISOPmodel requiresthat��� ���/�B��� ��� K ��� �LO��B��� ��� uniformly in

�and � , but allows ISRF’s for items J and S at differentthresh-

olds� � and

�-Rto cross;this modeldoesmaintaintheorderof expecteditem scoresacross� . SeeSijtsma

andHemker (1998)for a completeaccount.Anotherapproachto understandingscalingby the Mokken model—indichotomous,polytomous,and

moregeneralsettings—wasinitiatedby Holland(1981),Rosenbaum(1984),andHollandandRosenbaum(1986;seealsoMeredith,1965),who moreor lessindependentlydevelopedits threefundamentalassump-tions underthe name“monotoneunidimensionallatentvariablemodel”, andcontinuedby Junker (1993),Ellis andJunker (1997)andJunker andEllis (1997)underthename“strict unidimensionalitymodel”. They

Page 5: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 4

combineHollandandRosenbaum’s (1996)conditionalassociation(CA)conditionVpartitions

�W�X�ZY �-[ � � VL\ �^] non-decreasing_ V ` � [ � �Cov

� \ �ZYa� �b] �ZYc��� ` � [ �%�ed:A � (4)

with a vanishingconditionaldependenceconditionV^f � as gihkj � ��� �����.� � �l� becomeindependent,given

��� ��m � ����� � ��m n � � (5)

to obtainacompletecharacterizationof aninfinite-item-poolformulationof thebasicmonotonehomogene-ity model,in which � is bothgenuinelylatentandconsistentlyestimable,in termsof the joint distributionof observableitemresponses.

2.1.2 Stochastic ordering

A sideeffect of theeffort to understandhow to characterizeandtestthemonotonehomogeneitymodelhasbeena selectionof othermodeltestingcriteria,suchasJunker’s (1993) “manifestmonotonicity”propertyfor dichotomousitemsfollowing themonotonehomogeneitymodel,��� ���)�o*&� �Pp 9 �1qm �=r � is non-decreasingin

r (6)

This property, and examplesshowing that it doesnot hold when the “rest score”� p 9 �1qm �ks Dut"�� �/D

isreplacedby the totalscore

��mP�=s �D " � � D , areincludedin unpublishedwork of MolenaarandTomSnijders(Junker, 1993;Junker Sijtsma,2000).

Proving manifestmonotonicitydependson establishinga “stochasticordering” propertyfor � , giventhetotal score

� m[or equivalentlytherestscore

� p 9 �#qm]:��� � ����� ��mP�vr � is non-decreasingin

r,V � (7)

Hemker, Sijtsma,MolenaarandJunker (1996,1997)call this property “SOL” (StochasticOrderingof theLatenttrait by the sum score),and show that, surprisingly, this propertydoesnot generalizeto “most”nonparametricordered-polytomousresponseIRT models.Thusfor example,rulesbasedon cutoffs for

��mneednotbemostpowerful for “masterydecisions”in thesenseof � �:�

; ontheotherhand,suchcutoff rulesfor

� maremostpowerful for masterydecisionsin thenonparametricdichotomousresponsecase(Grayson,

1988;Huynh,1994).In the processof developing thesestochasticorderingideas,Hemker et al. (1997)andHemker and

Sijtsma(1999)have developeda taxonomyof nonparametricandparametricitem responsemodels,thatusefullycomplementsthe taxonomyof ThissenandSteinberg (1986). TheHemker taxonomyis basedonthecumulative,continuation-ratio,andadjacent-category logitsthatarecommonlyusedto defineparametricfamiliesof polytomousIRT models. Commonforms of gradedresponsemodels(GRM; Samejima,1997for example),sequentialmodels(SM; Tutz, 1990; Mellenbergh, 1995; Samejima,1969,1995),andpar-tial credit models(PCM; Masters1982)assume,respectively, that the logit functionslogit ��� ���P�w��� ��� ,logit � � ���a�x�U� � �x�y6z* � ��� , and logit ��� ���{�|�@}X*&� ���c~o�;� � ��}o*�� � ��� are linear in � . Hemker’s

Page 6: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 5

analogousnonparametricmodelclasses,the np-GRM, np-SMand np-PCM,assumeonly thattheselogitsarenon-decreasingin � .

This taxonomyis a powerful way to organizeideasaboutmodeldefinitionandmodeldevelopmentinapplicationsof bothparametricandnonparametricIRT. It follows thatthenp-PCMclassis nestedwithin thenp-SMclass,whichis nestedwithin thenp-GRMclass;moreoverall threelinear-logit familiesabove(GRM,SM andPCM) arein fact subsetsof the np-PCMclass. As Hemker andSijtsma(1999)andVan der Ark(1999)show, thisapproachalsohighlightslinks betweenpolytomousIRT andthemachineryof generalizedlinearmodels(McCullaghNelder, 1989),just asit haslong beenrealizedthatparametricdichotomousIRTis basicallymultivariatemixed effects logistic regression(e.g.,DouglasQui, 1997; seealsoLee Nelder,1996). It is importantto realizehowever that of all of the modelsstudiedby Hemker andhis colleagues,only theparametricPCMof Masters(1982)andits specialcases,have thenicestochasticorderingpropertySOL(see7 above).

2.2 Some Interesting Questions

An ongoingquestionin this areais developingadequatedataanalysismethodology. Most of whatcannowbe done,in the dichotomousandpolytomouscases,is encodedin the computerprogramMSP (MolenaarSijtsma,1999).Sijtsma(1998)providesanexcellentsurvey of nonparametricIRT approachesto theanalysisof dichotomousitemscores;Molenaar(1997)andSijtsmaandVanderArk (thisvolume)survey extensionsto the polytomouscase. Snijders(this volume) introducesMokken scalingtools for multilevel dataaswell. Ellis (1994) hasre-examinedMokken’s hypothesistestingframework for the C coefficients, anddeveloped,in principle,new testsbasedon the theoryof order-restrictedinferenceof Robertson,WrightandDykstra(1988).Thesamemethodsmaybeusefulto developtestsof manifestmonotonicity. In additionto HollandandRosenbaum’s (1986)applicationsof theMantel-Haenzeltest,YuanandClarke (1999)havealsodevelopedasymptotictheoryfor testingtheconditionsof Junker (1993),thatshouldbeadaptableto theconditionsof Junker andEllis (1997). A moredirectapplicationof thetheoryof order-restrictedinferenceto testingCA andrelatedconditionswasgivenby BartolucciandForcina(in press).

In termsof themodelsthemselves,SijtsmaandVanderArk (this volume)discussesseveral interestingproblems,and currentprogress,relatedto the lack of SOL (7) in orderedpolytomousIRT modelsandto the sensitivity and specificity of the manifestmonotonicitycondition (6) for detecting(violations of)the monotonehomogeneitymodel. Oneof the strengthsof the nonparametricapproachto dichotomousIRT is that it usuallyassuresus, undervery generalcircumstances,that simplesummariesof the dataareinformative aboutinferenceswe wish to make, yet the currentevidencesuggeststhat thereare no suchsimplesummariesfor inferencesaboutordered-polytomousdata.Understandingtheimpactthatthishasonprogresswith theoryandapplicationsof polytomousIRT modelswill surelyentail facingandsolving theproblemsthatSijtsmaandVanderArk discuss.

Finally, someof themachinerydevelopedto characterizemonotonehomogeneitymodelsseemsreadyto applyto commonmodificationsof thisbasicmodel.For example,conditionalassociation(4) is basicallyanextremesharpeningof thewell-known fact that inter-item correlationsarenonnegative undermonotonehomogeneity. Post(1992;PostSnijders,1993)hasestablishedasimilar factaboutaclassof nonparametricprobabilisticunfoldingmodels:theinter-item correlationmatrixhasabandof positive correlationsnearthemaindiagonal,borderedby bandsof negativecorrelations.Is thereasharpeningof thePostresultanalogousto conditionalassociation?Could this be combinedwith the vanishingconditionaldependencecondition

Page 7: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 6

of (5) to producea characterizationof Post’s models?Could sucha result follow from a “folding” of themonotonehomogeneitymodelto producenonparametricunfoldingmodels,alongthelinesof VerhelstandVerstralen(1993)or Andrich (1996)? In anotherdirection,muchof the work in Ellis andJunker (1997)andJunker andEllis (1997)doesnotdependon thelatentvariable� beingunidimensional.Junker andEllis(1997)for examplepoint out that their item-step“true scores”��� �������U� ������� � shouldform a manifoldofthesamedimensionastheunderlyinglatentvariable � . A characterizationtheoremmay againresult,if aweakeningof conditionalassociationto accommodatemultidimensional� couldbedeveloped.Suchtheo-remsarevery helpful in focusingour ideasaboutwhat theobservabledatafrom a monotonehomogeneitymodel,a probabilisticunfolding model,or a multidimensionalnonparametricIRT model,shouldbehavelike.

3 Parametric IRT: Modeling Dependence

ParametricIRT, assurveyedfor examplein theeditedvolumesof FischerandMolenaar(1995)andVanderLin-den and Hambleton(1997), is a well-established,wildly successfulstatisticalmodelingenterprise. IRTmodelshave greatlyextendedthedataanalyticreachof psychometricians,socialscientists,andeducationalmeasurementspecialists.My summaryof parametricIRT will be even lesscomplete,relative to the vastparametricIRT literature,thanmy summaryof nonparametricIRT.

A basicandfamiliar model in this areais the “two-parameterlogistic”, or 2PL, model for dichoto-mousitemresponsevariables(e.g.,Chapter1 of VanderLindenHambleton,1997),givenby themonotonehomogeneityassumptionsin Section2 andtheassumptionof a logistic form for theitemresponsefunctions� �&� � D _5� � �%� ����' ��� � D �@�o*&� � D � � � �%� � � � **(}��-�l�^�b6 � � � � D 6 � � � � � (8)

describingthedichotomousresponseof examineeI to item J . The“discrimination” parameter� � controlstherateof increaseof this logistic curve, andis directly relatedto theFisherinformationfor estimating� ,andthe“dif ficulty” parameter� � is thelocationonthe � scaleatwhichtheinformationis maximal;notealsothatat � D(� � � , � � �/D � ��* � �x*;���

. The3PL (three-parameterlogistic) modelextendsthe2PL modelbyaddinganon-zerolowerasymptoteto eachitemresponsefunction;ontheotherhandtheRaschor 1PL(one-parameterlogistic) modelis a restrictionof the2PL modelobtainedby setting � � identicallyequalto someconstant,usually1. SuchparametricIRT models,extendedby hierarchicalmixture/Bayesianmodelingandestimationstrategies,make it possiblein principle andin practiceto incorporatecovariatesandotherstructure. Many violationsof the basiclocal independenceassumptionof IRT modelsare in fact due tounmodeledheterogeneityof subjectsanditems,thatcannow beexplicitly modeledusingthesemethods.

The main purposeof this sectionis to introducea generalmodelingframework andhighlight a fewdevelopmentsin parametricIRT, someold andsomenew, thatwill berelevantto my discussionof applyingIRT andrelatedmodelsto cognitive assessmentproblemsin Section4 below.

3.1 Two-Way Hierarchical Structure

Theestimationof groupeffectsandtheuseof examineeanditem covariatesin estimatingitem parametersplays an importantrole in the analysisof large multi-site educationalassessmentssuchas the National

Page 8: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 7

Assessmentof EducationalProgress(NAEP; e.g.,Algina, 1992;Johnson,Mislevy andThomas,1994;andZwick 1992).Theseefforts,which go backat leastto Mislevy (1985;seealsoMislevy Sheehan,1989)canbe recognizedas the weddingof hierarchicallinear or multi-level modelingmethodologywith standarddichotomousandpolytomousIRT models. The generalmodel is a two-way hierarchicalstructurefor �individualsand

fresponsevariables,asfollows

First level:� D � � � � � D _�� � �3��� 4 � *H6 � � � D _�� � � � p �%9 ��� 4 q �I �o* ����;� �c_0J �+* ����;� f

Secondlevel: � D � \ D � � ������� � eachI� � � ] ��� � ������� � eachJThird level:

��� � ����� ���&����¡� ���&� ���.�

¢ ££££££££££££¤££££££££££££¥(9)

where� D is theusualpersonparameter, � � is thevectorof itemparametersfor item J (e.g.,� � �o� � � �%� �.� inthe2PL modelabove), andwhereindependenceis assumedbetweenJ ’s conditionalon � D at thefirst levelandbetweenI ’satthesecondlevel. Theterms

���and

���representsetsof hyperparametersneededto specify

thepersondistributions\ D

anditem distributions ] � , with hyperpriordistributions���

and���

, respectively.Termsin the first level, for example,aremultiplied togetherto producethe usualjoint likelihoodfor the� G f

item responsematrix � � D � � ; the secondandthird levels canbe usedto imposeconstraintson thefirst level parametersand latentvariable,to deducewhat integrationsareneededfor marginal likelihoodapproaches,etc. The model(9) is expressedfor dichotomousitems,for simplicity of exposition,but caneasilybe generalizedto polytomousitems,or combinationsof item types(seefor examplePatz Junker,1999a;1999b).It is alsousualto assumefor

\ D � � � a singlelatenttrait distribution not dependingon I , andsimilarly for ] � .

Recentadvancesrelaxtheseassumptions,allowing for muchmoreflexible parametricmodelingof itemresponsedata.We mayallow thedistribution of � to dependhierarchicallyon examineecovariates,that is,insteadof taking

\ D ��¦§�to bea singlelatenttrait distribution in (9), we canallow it to dependon examinee

covariatesto modelpopulationheterogeneity, asin themulti-groupIRT modelsof Mislevy (1985)andBockandZimowski (1997),or to reflecthierarchicallinear structureas in Fox andGlas(1998). We may alsoelaborate] � ��¦§� , for exampleby building linearstructureinto the item parameters.For examplein the2PLmodel,where� � �o� � � �%� ��� , we might take¨©©©©©©ª � �� R

...� � 9 �� �«­¬¬¬¬¬¬® �°¯ ¨©©©©ªc± �± R...±�²

«­¬¬¬¬® � (10)

where¯

is an appropriatedesignmatrix of full columnrank, to reflect commonsources( ± O ’s) of itemdifficulty ( � � ’s)acrossitems.In thecaseof Rasch(1PL) IRF’s, this is thelinearlogistic testmodel(LLTM;Scheiblechner, 1972;Fischer, 1973).This modelandits variousgeneralizations(e.g.,GlasVerhelst,1989;

Page 9: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 8

PatzJunker, 1999b)continuesto beusedfor psychologicalexperimentswith multiple outcomespersubject(e.g., Fischerand Molenaar, 1995 and the referencestherein)and for researchin cognitively-motivatedtestdesign(Embretson,1995; 1999). Thereis no reasonto restrictattentionto the � ’s, andfor exampleEmbretson(1999)hasexploredasimilardecompositionof the � ’s in a2PLmodel.

As BockandZimowski (1997)pointout,thesegeneralizationsof thebasicIRT modelbothsimplify andunify parametricapproachesto many thorny testanalysisquestions,includingdifferentialitem functioninganditemparameterdrift, nonequivalentgroupsandverticalequating,two-stagetestingandmatrix-samplededucationalassessmentsurvey work, etc. The computerprogramsConQuest(Wu, AdamsWilson, 1997)and BILOG-MG (Zimowski, Muraki, Mislevy Bock, 1997) provide fairly generalE-M basedsolutionswhen the underlyingIRT model is the 1PL (ConQuest),or 2PL or 3PL (BILOG-MG). Fox Glas (1998)andPatz andJunker (1999a;1999b)give two differentMarkov chainMonte Carlo (MCMC) approachesto the problem. Generalizationto polytomousitems,facets-styleratedresponsemodels,andmixturesofitem typesareconceptually, andoftencomputationally, straightforward; seefor exampleGlasandVerhelst(1989)andPatzandJunker (1999a;1999b).

3.2 Some Multidimensional Models

Researchin multidimensionalIRT modelshasconcentratedon additive andconjunctive combinationsofmultiple traits to produceprobabilitiesof response.Additive models,known as compensatorymodelsinmuchof theliterature,replacetheunidimensionallatenttrait � with anitem-specific,known (e.g.,Embret-son,1991;Stegelmann,1983;KeldermanandRijkes,1994;andAdams,Wilson Wang,1997)or unknown(e.g.,Reckase,1985;Wilson,WoodGibbons,1983;FraserMacDonald,1988;Muraki Carlson,1995)linearcombinationof components³ � � � � }z¦�¦�¦&} ³ �5´ � ´ of a > -dimensionallatenttrait vector, for examplein thedichotomousresponsecase��� ���@�+*&� � �����;� � ´ � � � � ³ � � � � }�¦�¦�¦�} ³ �#´ � ´ 6 � �.� �where � ��µ¶� might be the logistic or probit responsefunction for example. BeguinandGlas(1998)sur-vey the areawell (seealsoseveral contributed chaptersin Van der Linden Hamilton, 1997)andgive anMCMC algorithmfor estimatingthesemodels;GibbonsandHedeker (1997)pursuerelateddevelopmentsin biostatisticalandpsychiatricapplications.

Conjunctive modelsareoftenreferredto asnoncompensatoryor componentialmodelsin theliterature.Thesemodels(e.g.,Embretson,1985,1997)combineunidimensionalmodelsfor componentsof responseconjunctively, sothat ��� ���)�+*&� � �����;� � ´ � � ´!· " � � � · � � · �where � � · � � · � areparametricunidimensionaldichotomousresponsefunctions. Theusualinterpretationisthat the � � · � � · � representskills or subtasksall of which mustbeperformedcorrectlyin orderto generateacorrectresponseto theitem itself. JanssenandDeBoeck(1997)givea recentapplication.

Compensatorystructuresareattractive becauseof their conceptualsimilarity to factoranalysismodels.They have beenvery successfulin aiding the understandingof how studentresponsescanbe sensitive tomajor contentandskill componentsof items,andin aidingparalleltestconstructionwhentheunderlying

Page 10: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 9

responsebehavior is multidimensional(e.g.,Ackerman,1994).Noncompensatorymodelsarelargely moti-vatedfrom adesireto modelcognitiveaspectsof itemresponse,a topic to whichwewill returnin Section4.Embretson(1997)alsoreviews blendsof thesetwo approaches(hergeneralcomponentlatenttrait models;GLTM).

3.3 Models That Accommodate Extra Behavioral Features of Assessment

In additionto providing a way to modeldependenceof item responseson specificexamineeanditem co-variates,thehierarchicalor multi-level approachto IRT alsoallows usto modelextrabehavioral featuresofassessment.This is a largely unexploredarea,but onewhich is worth furtherstudy, sincecurrentmodelsandmethodslargely ignorethesefeatures,relegatingthemto the“error distribution” of themodel.

Oneexampleof thissortof work involvesrecentefforts to moreelaboratelymodelthebehavior of ratersin rateditem responsedata.Whenonly oneraterrateseachitem, it maybesufficient to treateachratingasa different,locally independentpseudo-item—sothat thefirst level in (9) containsonefactorfor eachraterG

itemG

examineecombination—andto modeltheratereffectasa linearinfluenceon theitem’s difficultyparameter� � . Mathematicallythis is equivalentto theLLTM modelsketchedabove, but it hascometo beknown in thissettingasthe“Facetsmodel” (Linacre,1989;Engelhard,1994,1996).

For bothformativeandsummativeevaluationof raters,anumberof multiple-readratingdesignsarenowcommonplace(Wilson Hoskens,1999),includingdesignswith asmany assix ratersper item (e.g.,SykesHeidorn,1999).Thuseachexamineeperformanceis measuredseveralcorrelatedbut fallible times.JunkerandPatz (1998)showed that theusualFacetsmodelformulationin which the likelihoodis theproductofLLTM-style factorsfor eachratingof eachitem,accumulatesinformationabout� toooptimistically, sothatwith only oneitem response,in thelimit asthenumberof ratersgrows, thestandarderrorfor estimatingorpredicting � apparentlygoesto zero,contradictingthenotion(e.g.,Junker, 1993)that thenumberof itemsshouldtendto infinity in orderto make theerrorof estimationof � vanishinglysmall. Instead,modelsareneededthat appropriatelyaccumulateinformationfrom multiple ratingsto the singleitem responsebeingrated,andthenaccumulateinformationacrossitemresponsesto learnabout� itself.

Wilson andHoskens’ (1999)raterbundlemodel (RBM) attacksthis problemby replacingthe Facetsproductacrossratersfor eachitem in level oneof (9) with a log-linearmodelthatmodelsthedependencebetweenratingsof the item, conditionalon � . Their approachseemsvery usefulfor, e.g.,modeling“tableeffects”andotherraterdependencephenomenathatfollow whenratersareallowedor encouragedto discussratingsamongstthemselvesto increaseratingqualityandinter-raterreliability.

Patz,JunkerandJohnson’s (1999)hierarchicalratermodel(HRM) providesanalternative approachthatpositsa“latent rating” ¸ D � (notunlikeMaris’, 1995,notionof latentresponses;alsonotetheconnectionwiththedataaugmentationmethodin Bayesiancomputation,e.g.,Tanner, 1996)thatfollows aconventionalIRTmodel,suchas ��� ¸ D �@�+*&� � D � � � �%� � � � **¹}��-�l�^�b6 � � � � D 6 � � � �(or apolytomousvariant),anda “signaldetectionmodel” for eachrater º ratingthatitemresponsesuchas��� � D �¶»H�+*&� ¸ D � � �+�b*H6½¼�¾ � »;�3¿ � 4 ¼ �%9 ¿ � 4� ¾b» �where

¼ �%� » is theprobability that rater º ratesa responsewith latentrating ¸ D �L�À*asa 0, and

¼ � ¾b» is theprobability that rater º ratesa responsewith latent rating ¸ D �P�ÁA

as a 1. The factors ��� � D �¶»��Â*&� ¸ D � �

Page 11: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 10

now appearat level oneof the hierarchy(9), andthe factors ��� ¸ D ���Ã*&� � D � � � �%� � � effectively form a newlevel betweenlevelsoneandtwo of (9). This producesa modelthatbehaves,for multiple discreteratings,muchasastandardgeneralizabilitytheorymodelwouldbehavefor multiplecontinuousratings(seeVerhelstVerstralen,1999,for asimilar development).In additionto providing anappropriateway to combineinfor-mationfrom multiple ratersto learnaboutstudentperformance,the HRM makespossiblecalibrationandmonitoringof individual raterbehavioral effectssuchas,in thecaseof polytomousresponses,raterseverityandraterprecision.

3.4 Some Current Questions

Almost any assessmentphenomenon—frombetween-examineedependencedueto institutionalor socio-logical factors,to behavioral aspectsof raters,to the analysisof item responsesinto requisiteexamineeskills or item features—canbeexpressedin thehierarchicalmixture/Bayesmodelingframework, becauseof its conceptualsimplicity. Recentadvancesin computation,andMCMC methodsin particular, havemadeit possibleto estimatea vastly wider variety of thesemodelsthanwould have beenimaginableeven tenyearsago.Questionsmotivatingthiswork inevitably involve identifyingphenomenathatareworthdetailedparametricmodeling,andseeingif thecomputationalmachinerycanbepushedto estimatemodelsof thesephenomena.Recentexamplesincludemultidimensional(GibbonsHedeker, 1992)andhierarchical(Brad-low, WainerWang,1999)modelingof testlets;blendingIRT andhierarchicallinearmodels,andbehavioralmodelssuchastheratermodelsdescribedabove.

Speedingup the computationswith approximations(including formal and informal applicationsofLaplace’s methodsuchas Rigdon Tsutakawa, 1983 and Kass,Tierney Kadane,1990; blendsof MonteCarlo andE-M approachesassurveyed in Tanner, 1996; andvariationalmethods,e.g.,Jaakkola Jordan,1999)continuesto beanessentialandfruitful avenueof research.

An avenuethat hasnot beenexploredasmuch in IRT work is choosingmodelsfor which sufficientstatisticsaresimpleandinterpretable.Themostfamiliar “basicmodel” of this sort is of coursetheRaschmodel,but aswe shall seein thenext sectionsomecurrentcognitively-motivatedmeasurementproblemsmayrequirea new “basicmodel”.

4 Measurement Challenges Posed by Cognitive and Embedded Assessments

In recentyears,ascognitive theoriesof learningand instructionhave becomericher, andcomputationalmethodsto supportassessmenthavebecomemorepowerful, therehasbeenincreasingpressureto makeas-sessmentstruly criterionreferenced,thatis, to “report” onstudentachievementrelative to theory-drivenlistsof examineeskills, beliefsandothercognitive featuresneededto performtasksin a particularassessmentdomain. For exampleBaxterandGlaser(1998)andNichols andSugrue(1999)presentcompellingcasesthatassessingexaminees’cognitive characteristicscanandshouldbethe focusof assessmentdesign.In asimilar vein,ResnickandResnick(1992)advocatestandards-referenced or criterion-referenced assessmentcloselytied to curriculum,asaway to inform instructionandenhancestudentlearning.

Appropriatecriterion-referencedtestingcanalsobeaneffectiveteachingtoolwhenembeddeddirectlyinteachingpractice.Indeedthereis substantialargumentandevidence,assummarizedfor exampleby Bloom(1984),thatpartof whatdistinguisheshigherstudentachievementin “masterylearning”andindividualized

Page 12: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 11

tutoringsettingsasopposedto theconventionalclassroom,is theuseof frequentandrelatively unobtrusiveformative testscoupledwith feedbackfor the studentsandcorrective interventionsby the instructor, andfollow-up teststo determinehow muchthe interventionshelped.This approachcontinuesto beadvocatedaspartof anaturalandeffectiveapprenticeshipstyleof humaninstruction(e.g.,Gardner, 1992),andit is thebasisof many computer-basedintelligent tutoringsystems(ITS’s, e.g.,Anderson,1993;andmorebroadlyShutePsotka,1996).Heretoo,a decompositionof assessmentitemsinto appropriatecognitive attributesisimportant:feedbackand/orcorrective actionin a masteryclassor from anITS dependson knowing whichcognitive attributestheexamineehasandhasnotmastered.

Cognitive assessmentmodelsmust generallydeal with a more complex goal than linearly orderingexaminees,or partially orderingthemin a low-dimensionalEuclideanspace,which is what IRT hasbeendesignedandoptimizedto do. The goal of cognitive assessmentcanbe thoughtof producing,for eachexaminee,achecklistof skills or othercognitive attributesthattheexamineemayor maynotpossess,basedon the evidenceof tasksperformedby the examinee. The checklistof cognitive featuresin a cognitiveassessmentgenerallycomesfrom ananalysisof thecognitiveattributesneededto successfullyperformeachtaskin adomainof interest.For aparticularsetof tasks,thisanalysiscanbeencodedin anincidencematrix,the

¯-matrix (e.g.,Tatsuoka,1990,1995),whichwewill write asa

f G�Ämatrix

¯°� � ¯ � O � of 0’sand1’swith entries ¯ � O��QÅ * � if attribute S is requiredby task JA � if not

(11)

While the¯

-matrixdoesnotcaptureall of thestructurewemaybeinterestedin (¯

treatstheskills in aflat,non-time-orderedmanner, andtheremaybebothhierarchicalandtime-orderstructurein theskills astheyareappliedto a task),it is a usefulbookkeepingdevice.

Many attempts(e.g.,assurveyedby Roussos,1994)to blendIRT andcognitive measurementarebasedon a linear decompositionof item parameters,asin theLLTM, or on a linear decompositionof the latenttrait � , asin themultidimensionalcompensatoryIRT models.CompensatoryIRT models,like factoranaly-sismodels,canbesensitive to relatively largecomponentsof variationin examineeability or propensitytoansweritemscorrectly, but they aregenerallynotsensitive to thefinercomponentsof variationthatareoftenof interestin cognitive assessment.Of course,whethercognitive assessmentdataactuallysupportsmodelsthat track this finer level of variationis an empiricalmatter. LLTM-style modelscanbesensitive to thesefiner componentsof variationamongitemsbut arenot at all sensitive to componentsof variationamongexaminees. Noncompensatoryapproaches(e.g.,Embretson,1985,1997)areintendedto besensitive to finervariationsamongexaminees,in situationsin which several cognitive componentsarerequiredsimultane-ously for successfultaskperformance;but they may be attemptingto estimatemore(per-skill latent traitvaluesandskill-responseparameters)thanassessmentdataof thetypewe will considercanbeexpectedtosupport.Thesemodelsarealsorelatedto theprobabilitymatrix decompositionmodelsof Maris,De BoeckandVanMechelen(1996).

4.1 Three Approaches to Cognitive Assessment

To illustratethedifferencesbetweentraditionalIRT approachesandcognitively-motivatedapproachesto as-sessment,considertwo publishedmodelsintendedto dealwith essentiallythesamedata:taskperformanceby studentslearningtheLISP programminglanguageusingoneof thecomputerbasedintelligent tutoring

Page 13: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 12

systemsdevelopedby JohnR. Andersonandhis colleaguesat Carnegie Mellon University(e.g.,Anderson,Corbett,KoedingerPelletier, 1995). Thefirst modelis theassessmentmodelactuallyembeddedin thetu-toring software,asdescribedby Corbett,AndersonandO’Brien (1995);thesecondis anIRT-basedmodelfor essentiallythesamedata,aspresentedby Draney, Pirolli andWilson (1995).ThenI will presenta thirdmodelto illustratethat,althoughthemodelingtraditionsin IRT arenotmuchlike thosein cognitive assess-mentmodeling,the fundamentaltechniquesandconceptsfrom IRT modelingarequiteusefulin cognitiveassessment.

4.1.1 The Corbett/Anderson/O’Brien model

For the“knowledgetracingmodel”embeddedin theLISPtutorsoftware(Corbett,AndersonO’Brien,1995)for thisdata,define� D �8��Æ^�Ç� *

orA

indicatingwhetheror not studentI performedtask J correctly attime

Ư � O � *orA

indicatingwhetheror not task Jrequiresskill S� D Ol��Æ^�Ç� *

orA

indicatingwhetheror not studentI possessesskill S at timeƸ D ����Æ^�Ç� È O.É-Ê 4�Ë " � � D OÌ��Æ^� indicatingwhetheror not studentI hastheskills neededfor task J

at timeÆrÍ� � � �/D � ��Æ^�Î�BA�� ¸ D � ��Æ^���o* � auniversalslip parameter] � � � � D �8��Æ^�Î�+*&� ¸ D �&��Æ^���vA � auniversalguessingparameter

Themodelof taskperformanceembeddedin theknowledgetracingmodelis essentially��� � D �8��Æ^���+* � � ��� ¸ D �8��Æ^���Ï* � �b*76ar�� }v�b*H6 ��� ¸ D �&��Æ^�Î�+* � � ]�� (12)

where��� ¸ D �&��Æ^�Î�+* � follows asimpleconjunctive model(thoughmorecomplex expressionsalongthelinesof Mislevy, 1996,couldbeimagined):��� ¸ D � ��Æ^�Î�+* � � ²!O " � � ��� D Ol��Æ^�ed:¯ � O � (13)

It is worth notingthatthe ¸ D ����Æ^� ’s [or the � D Ol��Æ^� ’s] canbeinterpretedasplayingtherole of Maris’s (1995)latent responses(they can also be interpretedin termsof the methodof dataaugmentationin Bayesiancomputation,e.g.,Tanner, 1996).

It is the probabilities � ��� D O���Æ^�Ldw¯ � O � , treating � D O8��Æ^� asthe unknown or randomquantity, that areactuallyof primaryinterestin assessmentbasedon thismodelof taskperformance.Givencurrentestimatesof ����� D O���Æ^�$dQ¯ � O � , the tutor canboth identify which skills needadditionalpractice,andselectitemsofsuitabledifficulty thatexercisethoseskills, to assignto thestudentnext.

Corbett,Andersonand O’Brien (1995) were particularly interestedin how to gatherevidenceabout����� D OÌ��Æ^� dw¯ � O � asthe numberof opportunitiesÆ

to apply rule S increases—i.e.they areinterestedin

Page 14: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 13

modelinglearning.To accountfor theorderin which correctandincorrectactionsareobservedwhenskillS is calledfor, they suggesttreating � D Ol��Æ^� asahiddenMarkov chainwith anabsorbingstate,����� D OÌ��Æ^�Ð� � D OÌ��ÆÑ6�*;� � � Ò����� D OÌ��Æ^�Ð� � D Ol��ÆÑ6�*;� � � *��� � D OÌ��Æ^�Ð� � D OÌ��ÆÑ6�*;� � � *76MÒ��� � D O8��Æ^�Ð� � D Ol��ÆÑ6�*;� � � A � (14)

where“ � D Ol��Æ^� ” standsfor “ � D OÌ��Æ^�¹�+*” and“ � D OÌ��Æ^� ” standsfor “ � D Ol��Æ^���BA

”.Thetutoringsystemis arrangedto directlyobserve evidence³ D Ol��Æ^���BA

or 1 thatthecorrectactionwasperformedwhenskill S wascalledfor, greatlysimplifying theinferentialtask.Corbettetal. (1995)positthefollowing relationshipsbetweenthehiddenstate� D Ol��Æ^�H�oA

or 1 andtheobservableevidence³ D OÌ��Æ^�7�oAor 1: ����� D Ol��Æ^�Ð� ³ D OÌ��Æ^� � � ����� D OÌ��ÆÑ6�*;�e� ³ D OÌ��Æ^� �}z�b*�6 ����� D Ol��ÆÑ6�*;�Ð� ³ D OÌ��Æ^� � �&Ò����� D Ol��Æ^�Ð� ³ D OÌ��Æ^� � � ����� D OÌ��ÆÑ6�*;�e� ³ D OÌ��Æ^� �}z�b*�6 ����� D Ol��ÆÑ6�*;�Ð� ³ D OÌ��Æ^� � �&Ò� ��� D Ol��ÆÑ6�*;�e� ³ D OÌ��Æ^� � � �8�b*H6cr�� ����� D Ol��ÆÑ6�*;� � ��y�8�b*H6ar�� ����� D Ol��ÆÑ6�*;� �} ] ��� � D Ol��ÆÑ6�*;� � �� ��� D Ol��ÆÑ6�*;�e� ³ D OÌ��Æ^� � � �.r ����� D O8��ÆÓ6Ô*;� � ��y�.r � �.� D OÌ��ÆÑ6�*;� �}z�b*H6 ] � � � � D OÌ��ÆÑ6:*;� � � .

(15)

Given a-priori fixed valuesofÒ

,r, ] , anda probability

¼�¾that eachskill is in the learnedstatebefore

the tutoringbegins,we maysubstitute¼�¾

for ����� D OÌ��ÆM6B*;� � whenÆ��w*

, andtheabove formulasgive analgorithmfor recursively updating����� D Ol��Æ^� � on thebasisof theobservedsequenceof correctandincorrectactionsin the first

Æopportunitiesto apply rule S . This is a particularly simple and fast computational

method,capableof updatingthetutor’s modelof thestudent’s skills in real time asthestudentworkswiththetutor.

It is interestingto notethat in orderto producea well-fitting model,Corbettet al. (1995)hadto allowthe probabilities

Ò,r, ] , andthe probability

¼�¾that eachskill wasalreadyin the learnedstatebeforethe

tutoring began, to be perturbeddifferently from overall populationvaluesfor eachstudentI ; effectively,they allowedtheseparametersto becomerandomeffects.Thus,in additionto theindividual differencesinskills acquisitionthatthemodelhadbeendesignedto detect,thereweresubstantialindividualdifferencesinstartingknowledgeof thestudents,in tendency to slip or guess,andin therateof learning,underthismodel.

4.1.2 The Draney/Pirolli/Wilson model

Draney, Pirolli andWilson (1995)developanLLTM-style modelto analyzeessentiallythesamedata.Themodel they considerbegins with an indicator ³ D � OÌ��Æ^�y��*

if studentI performscorrectly when theÆ

thopportunityto useskill S occurs,undercondition J ; and ³ D � OÌ��Æ^�Î��A

otherwise.These³ D � OÌ��Æ^� differ from

Page 15: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 14

Corbettetal.’s (1995) ³ D Ol��Æ^� only in thatthetaskthatprovidesacontext for performingtheskill is allowedto affect thedifficulty of correctskill performance.In theDraney et al. (1995)model,the “skill responsefunctions”aregivenby��� ³ D � Ol��Æ^�Î�+*�� � D � �¶� �5Õ O �bÖ � � **(}��-�l�^�b6 � D }×�¶�¹} Õ O)6 Ö)ØÚÙUÛ ��Æ^�%� This modelessentiallydecomposesthedifficulty parameter� in theRaschmodelaccordingto a

¯matrix,

asin equations(10) and(11), in termsof parametersfor task,� �

, skill, Õ O , andslopeof thelearningcurve,Ö . ThelogarithmicdependenceonÆ

is intendedto follow thedevelopmentof Anderson(1993,AppendixtoChapter3). If thedecompositionof tasksinto skills is complete,andtheskills areof a suitablegranularity,Anderson’s (1993)ACT-R theorypredictsthat skill “performances”will beapproximatelyindependentofoneanother, given the relevant difficulty andstudentparameters.This is essentiallya statementof localindependence,so that the “skill responsefunctions”above maybemultiplied togetherin theusualway toform anIRT likelihood.

It is interestingto comparetheCorbettet al. andDraney et al. modelingapproaches.For example,foradatasetsimilar to thatanalyzedby Draney etal., theassessmentmodelof Corbett,AndersonandO’Brien(1995,p. 32;seealsoDraney, Pirolli Wilson,1995,p. 115)wouldemploy essentiallyfour continuouslatentvariables,and33 dichotomouslatentattribute indicators,per studenttested—in additionto 132parametersto characterizefeaturesof theskills beingassessed.UsingtheirvariationonstandardIRT models,Draney etal. providedequivalentor betterfit to learningcurves,employing onecontinuouslatentvariableperstudenttested(Draney, Pirolli Wilson, 1995,p. 109),and36 parametersfor theskills beingtested(Draney, PirolliWilson,1995,p. 115).Thus,if thegoalis to modellearningcurves,clearlythemorecomplex Corbettetal.modelis notneeded.

However, theusesto which thetwo modelscanbeput, andthesubstantive interpretationsof estimatedparametersin thetwo models,arevery different. TheCorbettet al. modelis immediatelyusefulfor diag-nosingindividualdifferencesin studenttaskperformancebehavior by relatingit directly to specificskills inthetaskdecompositionof their taskdomain,but canonly indirectlyassessthevalidity andreliability of thetasksin theassessment,throughfit to learningcurvesor othersummariesof studenttaskperformance.

The Draney et al. model is not immediatelyuseful for studentdiagnosis,sinceits studentparameteris one-dimensional.After fitting themodel,Draney et al. go backandrank studentsbasedon (empiricalBayes)estimated� ’s (in the context of learningcurvesanalysis,the � ’s essentiallycodestudents’initialfacility in thetaskdomainprior to tutoring,muchastherandomeffect versionof

¼ ¾doesin theCorbettet

al. model),andcompareestimatedskill performancedifficultiesto thestudents’aggregate� distribution; seefor exampletheir Figure5.1,p. 112.Suchdisplaysallow usto predictwhich skills a “typical” studentwithsomefixedvalueof � might beexpectedto performcorrectly, andarevery usefulcommunicationdevices.However, detailedcognitive diagnosisof individual studentson thebasisof the � ’s is not possible,withouta post-hocanalysisof somesort. Indeed,for assessingwhetherindividual studentshave learnedparticularskills, Draney et al. replacetheIRT modelwith a Bayesianinferencenetwork that is focusedon inferringtheprobabilitythatanindividualhaslearnedoneor moreskills, usingpriorsconstructedfrom thefitted IRTmodelandnew datafrom furtherattemptsto performtheskill(s).

The utility of the Draney et al. IRT model for analyzingimportant task performancefeatures,andaggregateexamineebehaviors, shouldnot be minimized however. For example,Junker, Koedingerand

Page 16: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 15

Trottini (2000)areusingessentiallythe samemodelingframework to develop a semi-automaticstepwisevariableconstruction/modelselectionprocedure,with the goal of identifying skills that were either toonarrowly or too broadlydefinedin a cognitive tutor (theseresultin stylizeddeviations,or “blips” from thetheoreticallypredictedlearningcurvesfor theskills). In a similar vein, Huguenard,et al. (1997;seealsoPatzetal., 1996)appliedapolytomousversionof theLLTM to studytherelationshipbetweentaskfeaturesandworking memoryload, usingthe IRT � parameterto soakup residualbetween-subjectsvariationnotmodeledby experimentalandworkingmemoryfactors.

4.1.3 An IRT-like cognitive assessment model

As the precedingexamplemakesclear, traditionalparametricIRT approachesmay not be well suitedtoindividual assessmenttasksin computerbasedintelligent tutoringsystems.Thesameissuescanalsoarisein standaloneassessmentsthataredesignedto assesspresenceor absenceof specificskills—ratherthantosortexamineesalongalinearscale—basedonperformanceonafixedsetof tasksgivenasastandalonetest.

To illustrate this we considera modifiedversionof the assessmentmodelof Corbett,AndersonandO’Brien (1995). We omit the hidden-Markov learningmodelandassumethat examineebehavior is onlyobserved at the tasklevel, not theskill level. This modelcanalsobeconnectedto multidimensionalnon-compensatoryIRT models,sinceit is interpretableas a simplified versionof Embretson’s (1985, 1997)multicomponentlatenttrait (MLTM) model.Define� D �

= 1 or 0 indicating whetheror not student Iperformedtask J correctly¯ � O

= 1 or 0 indicatingwhetheror not task J re-quiresskill S� D O = 1 or 0 indicating whetheror not student Ipossessesskill S¸ D � =

È O.É-Ê 4�Ë " � � D O indicating whetheror not student Ihastheskills neededfor task Jr#�

= ��� � D �)�BA�� ¸ D �@�+* � aper-problemslip parameter] � = ��� � D �)�+*&� ¸ D �@�BA � aper-problemguessingparameter

Notethat the¯yD �

aretheusualconstantentriesin the¯

-matrix, andthe ¸ D � aredeterministicfunctionsof the � D O ’s and

¯. The goal is to try to estimatethe � D O ’s, or moreprecisely ����� D OÜ�Ã* � , from the task

performancedata.Ourbasicresponsemodel[level onein thehierarchy(9)] is��� � D �@�+*&� Ý �#Þ���ß � � ¸ D ���b*H6cr#�;�à}v�b*H6 ¸ D �;� ] �� �b*H6cr#�;� ¿ � 4 ] �%9 ¿ � 4� �

andsofor theentireexamineesby tasksmatrix � � D � � of taskresponses,��� � D �@�B, D � � V I � J ��Ý �#Þ���ß �� È D È �âá �b*H6Pr#��� ¿ � 4 ] �%9 ¿ � 4� ã 2 � 4 á *ä6:�b*H6ar1�;� ¿ � 4 ] �%9 ¿ � 4� ã �%9 2 � 4� È D È �âá �b*H6Pr � � 2 � 4 r �%9 2 � 4� ã ¿ � 4 á ] 2 � 4� �b*H6 ] � � �%9 2 � 4 ã �%9 ¿ � 4 (16)

Page 17: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 16

To seehow estimationof theparametersdependson thedata,it is instructive to setup thefirst stagesofanestimationalgorithmfor themodel. In particular, let’s assumethatprior distributionshave beenchosen[for levelstwo andthreein thehierarchy(9)], sothat

r1�)�°å�æ;�çr#���, ] ���°å��&� ] �.� , � D O��°å è � ËO �b*76På�O�� �%9 è � Ë

,andperhaps

å�OÑ�wå��Zå�O��. Theseprior distributionswill be usedas“placeholders”in the notationbelow;

their particularform will not affect our conclusionsin any substantialway. We will calculatetheso-called“completeconditional”distributions(e.g.,Gelman,Carlin,SternRubin,1995)of eachparameter, giventhedataandtherestof theparameters.Sucha calculationis directly usefulin settingup anMCMC algorithmfor Bayesianestimationof the modelparameters(e.g.,Patz Junker, 1999a;1999b),and is alsouseful insomeversionsof theE-M algorithm,suchasECME (e.g.,Liu Rubin,1998). However, evenwhenE-M is“possible” for a modellike (16), it neednot bepractical,dueto theneedto sumover all configurationsofthe latent é vector(seeequation17 below). For casesin which relatively few é ’s carrymostof the latenttrait distribution, MCMC canbe a moreefficient—albeitapproximate—way to estimatethe model. Thisphenomenonhasbeenfound in othermodelswith complex discretelatentstructureaswell (e.g.,Seltman,1999;SnijdersNowick, 1997;andTerHofstede,SteenkampWedel,1999).

As usualin settingupanMCMC calculation,wenotethatthecompleteconditionaldistribution for eachparameteror latentvariableis proportionalto theproductof only thoselikelihoodandprior factorsin (9)dependingon thatparameter. Thuswe obtainfor example¼��çr � �

rest�Çê �b*76ar � �%ë � 2 � 4 ¿ � 4 r ë � p �%9 2 � 4 q ¿ � 4� å æ �çr � � �¼�� ] � � rest�Çê ] ë � 2 � 4 p �%9 ¿ � 4 q� �b*H6 ] � �%ë � p �%9 2 � 4 q p �%9 ¿ � 4 q å � � ] � � �J �k* ����;� f , where“rest” standsfor the dataand the restof the parametersin the model. From these

it is easyto seethe intuitively plausiblefactsthat that the slip parameterr1�

is sensitive only to successesandfailuresof examineeswho we hypothesize(throughthevaluesof ¸ D � uponwhichwe have conditioned)do have the requisiteskills to performtask J , andsimilarly the guessingparameter] � is sensitive only tosuccessesandfailuresof examineeswho we hypothesizedo nothave therequisiteskills. Theseparametersmight begivenBetadistribution priors,to make life easyfor estimation.

More centralto thegoalof assessingexaminees,we seethat thecompleteconditionaldistributionsfor� D O areof theform:¼ì� � D O�� rest�ê !� É�Ê 4�Ë�í�î á �b*76ar#�.� 2 � 4 r �%9 2 � 4� ã è � Ë ¿#ïñð Ëbò� 4 á ] 2 � 4� �b*H6 ] �.� �%9 2 � 4 ã �%9 è � Ë ¿1ïñð Ëbò� 4

G å è � ËO �b*76På�OU� �%9 è � Ë �where ¸ p 9 O qD � �QÈ · t" O;É Ê 4çó " � � D · , which indicatespresenceof all skills neededfor task J , exceptfor skill S .Fromthesecompleteconditionalswe canseethatô When ¸ p 9 O qD � �+*

, thesuggestedmodelfor � D � is somesortof Bernoulli,whichmakessense.ô Whenthereareno taskssuchthatboth¯ � O��õ*

and ¸ p 9 O qD � �X*, then � D O is sensitive only to its prior

distributionå è � ËO �b*76På�OU� �%9 è � Ë

: no learningfrom dataoccurs.

Page 18: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 17

Thesecondobservation is really a versionof thecredit/blameassignmentproblem(e.g.,VanLehnNiu, inpress):we cannotinfer whether � D O waslearned,if we arehypothesizingthatanotherneededskill is stillunlearned.Roughly, theremustbe information in the taskperformancedatato allow us to assigncredit(whena taskis performedcorrectly)or blame(whenit is performedincorrectly)to everycognitive attributerelatedto thetask.

Thereare essentiallytwo ways to avoid the credit/blameproblem. In somesituations,skills canbescoreddirectly; this is possiblefor examplewithin Anderson’s ITS’s for LISP, algebraandgeometry, be-causestudentsarerequiredto successfullyperformeachsubgoal/skill,with hintsandrepeatedattemptsifnecessary, beforemoving on thenext subgoalin the task(seealsoEmbretson,1985). If onecannotscorethe taskssubgoalby subgoal,onecantry to designthe assessment(e.g.,by handor usingmethodsfromtraditionalstatisticalexperimentaldesign)so that the tasksefficiently exerciseall skills in the skillset insuchaway that,takentogether, thetaskperformancedatainformsusabouteachskill. VanLehn,Niu, Siler,andGertner(1998) illustratethe inferentialdifficulties that canresultwhenitem setsarenot constructedwith thegoalof designingaroundthecredit/blameproblem.

Finally if we wantto estimatetheskill baserateså�O

in thepopulation(a measureof skill difficulty) wemayincludea fourthsetof completeconditionals¼��Zå�Ol�

rest�ÎêBå ö ËO �b*H6På�O��b÷ 9 ö Ë å��Zå�O�� �

where�-Oâ� ë D � D O is thenumberof studentswhoarepresentlyestimatedto haveskill S (again,aBetaprior

densityforå�O

is suggested).

4.2 A Role for Nonparametric IRT Methods?

Suchmodelsas(16) mayseemvery far removedfrom thesettingin which nonparametricIRT methodsarefamiliar. I wantto suggestseveralwaysin which they arenot sofar removed.

First,supposethattheskill variables� D O varyindependentlyof oneanotherin apopulationof studentsorexaminees,aswouldbeconsistentwith e.g.,Anderson’s (1993)ACT-R theory, andsupposethattheslip andguessingparameters

r1�and] � arefixedandsatisfytheplausibleinequality

�b*�6Lr1���Ðd ] � (butr#�

and ] � neednot be known). A model for the taskperformanceof a randomly-sampledexamineefrom the populationwould thenbe��� �ø�Bù �� ú é ! � á �b*H6ar1�;� ¿ 4 p é q ] �%9 ¿ 4 p é q� ã 254 á *76:�b*H6Pr1�.� ¿ 4 p é q ] �%9 ¿ 4 p é q� ã �%9 254 ¼�� é �� ú é ! � � �&� ¸ �8� é �%� 254 � *ä6 � �&� ¸ �8� é �%� � �%9 254 ¼�� é � (17)

where¼�� é � is a productmeasureover thespaceof binaryskills é �À� � ������ � ² � , and � �8� ¸ �&� é �%� is the

monotonefunction � �&� ¸ �&� é �%���X�b*H6ar1�;� ¿ 4 p é q ] �%9 ¿ 4 p é q�of ¸ �&� é �%� ; hence� �&� ¸ �&� é �%� is alsomonotonein thecoordinates� O of é . Note thesimilarity of equation(17) to (2) and(3).

Page 19: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 18

It follows immediatelyfrom Lemma2 of Holland and Rosenbaum(1986; seealso Kamae,KrengelO’Brien, 1977)that for any non-decreasingsummary] �Z�=�

of� �û��� �����;� � �l� , T � ] �Z�=��� éH� is non-

decreasingin eachcoordinate� D O of é ; this implies the SOM (StochasticOrderingof the Manifestscore��mby the latent trait) propertyof Hemker et al. (1997), that ��� ��mQ�ü��� é7� is non-decreasingin each

coordinate� D O of é . Not muchis known abouttheinverseandmoreusefulpropertySOL (see7) whenthelatent“trait” is multidimensional.Wemight conjecturefor examplethat� á � O��+*$ýýý ú� É Ê 4�Ë " � ���@�=r ã (18)

would benon-decreasinginr. As in conventionaldichotomousIRT modelswe hopethatsucha property

is true,becauseit meansthata mostpowerful testof masteryof skill S canbe basedon a simpletotal ofcorrectly-performedtasksinvolving skill S .

It alsofollows from Theorem8 of Holland andRosenbaum(1986; seealsoJogdeo,1978)that sinceé is a collection of independent,or more generally, associated,randomvariables,the collection of re-sponsevariables

�is alsoassociated,thatis, for any two non-decreasingsummaries

\ �Z�v�and ] �Z�v�

, thatCov

� \ �Z� �b] �Z�=�%�@dÍA In general,conditionalassociationappearsto bea difficult propertyto prove whenthe underlyinglatentvariable, é in this case,is continuousandmultidimensional,but it may be easiertoeitherderive conditionsfor thebinarycoordinates� D O underwhich conditionalassociationdoeshold—inwhich casemonotonehomogeneityIRT modelsbegin to be competitorsfor modelingthe sameresponsedataasthecognitively-motivatedmodel—orat leastto obtaintheappropriategeneralizationof conditionalassociationfor cognitive assessmentmodelssuchasthis.

Next, an invariantitem orderingproperty, suchas � �8� ¸ ��� é �%�äK � Ol� ¸ OÌ� é �%� uniformly in é , appearstobedifficult to obtainunless

*)6�r1�, ] � , and ¸ ��� é � , arecomonotoneas J varies,for all é . This is a kind of

Guttmanscalingconditiononthelatentresponses�, thatsaysfor examplethateasierguessingis associated

with lower skill requirements,andvice-versa.Thus,our cognitive assessmentmodel(16) providesfertilegroundfor thinking aboutinvariant item ordering,without necessarilybeingtied to preconceptionsaboutcontinuousunidimensionalIRT models.

More broadly, Hoijtink andMolenaar(1997)show how nonparametricmodelfeaturessuchascondi-tional association(4) and manifestmonotonicity(6) can be directly relevant to assessingmodel fit in aparametricBayesiansetting. Given a completeand interestingset of nonparametricmodel featuresformodelslike (16),a similarapproachto modelfit maybetakenhere.

Finally, amongmany who work in the nonparametricIRT traditionsof Mokken, Molenaar, Sijtsma,Holland, Rosenbaumandtheir colleagues,it is the sourceof somebemusementthat we work so hardtoestablishthatscoressuchas

��m, whichareperfectlygoodin theRaschmodel,themoststringentof logistic

IRT models,alsosuffice to orderexaminees,assessmonotonicitypropertiesof theunderlyingmodelfromobservabledata,makemasterydecisions,etc.,in generalnonparametricsettings.Certainlycomparisonsbe-tweenRaschandMokkenscalingarenotnew (e.g.,Meijer, SijtsmaSmid,1990),andmorerecentlyaspectsof bothHolland’s (1990)“Dutch identity” work andScheiblechner’s (1995;seealsoJunker, 1998)“ISOP-model” work canbeseenpartly asattemptsto formalizetheconnectionbetweenRaschandnonparametricIRT methods.

I believethattheconnectionis actuallyrathersimple,andis nearlyobviousfrom Holland’s (1990)work:TheRaschmodelis a very well-behavedexponentialfamily modelwith immediatelyunderstandablesuffi-cientstatisticsfor items,given personparameters,andimmediatelyunderstandablesufficient statisticsfor

Page 20: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 19

persons,given item parameters.Much of the work on monotonehomogeneitymodelsand their cousinsis directedat understandingjust how generallytheseunderstandable,but no longer formally sufficient,statisticsyield sensibleinferencesaboutexamineesanditems. The model(16) providesus with a differ-ent “basic”—albeitnot exactly exponential-family—modelfor cognitive assessment,in which parametersdependon immediatelyunderstandablesummariesof the data,as illustratedby the completeconditionalcalculationsabove. Wemayaskhow complex therelationshipbetweentheexamineeskill parametersé andthe taskperformancedata

�canget,andstill have thesesummariesbe informative aboutguessing,slips

presenceor absenceof skills, etc. We mayalsotake anexcursioninto parametricmodels,asI have done,andaskwhetherthis is the“right” parametricmodelon which to basea nonparametrictheoryof cognitiveassessment.For example,otherpossiblemodelswe might considerinsteadof (16) asa startingpoint forsucha nonparametrictheory includethe constrainedlatentclassmodelof Haertel(1989)and the hybridmodelof YamamotoandGitomer(1993).

In additionto theintrinsic interestof thisenterprise,theresultingnonparametrictheoryof cognitive as-sessmentmayhave somepracticalutility. Therelative easewith which theoriginal Corbett,AndersonandO’Brien (1995)modelcanbeestimatedis a consequenceof both its carefultailoring to thepsychologicaltheory, andthefactthatdatacouldbecollectedat theskill level (samegranularityasthepsychologicalthe-ory), ratherthanat thetasklevel, by theLISPtutor. But otherassessmentsettingsmaynotpermitsuchtightbindingof datacollectiondesignandpsychologicaltheory;andourdiscussionof themodel(16)shows thatevenrelatively minor modificationsalongtheselinescanmake the inferentialtaskmoredifficult. Estima-tion probablyrequiresE-M, MCMC, or someothercomputationallyintensive method,andgreatcaremustbetakensothatevery parameteris identifiablefrom thedata.Modelsbeingproposedin thepsychometricliteraturetodayto help unify the traditional IRT anddiscrete-cognitive-attributes approaches,suchastheUnified Model of DiBello, JiangandStout(in press),alsoappearto requiresuchtreatment.In many cases,theestimationmethod,while feasible,maywell betooslow for usein real-timefeedback,aswith computerbasedintelligenttutoringsystems,andtoo complicatedfor teacherscoringof assessmentsembeddedin in-struction. A cleartheoryof which fasterdatasummariesarerelevant to thecognitive inferenceswe wishto make, over a wide varietyof cognitive assessmentmodels,would beanimportantcontribution from theinterfacebetweennonparametricandparametric“IRT” models.

References

Ackerman,T. A. (1994). Usingmultidimensionalitem responsetheoryto understandwhat itemsandtestsaremeasuring.AppliedMeasurementin Education,7, 255–278.

Adams,R. J., Wilson, M., Wang,W.-C. (1997). The multidimensionalrandomcoefficientsmultinomiallogit model.AppliedPsychological Measurement,21,1–23.

Algina, J. (1992).Specialissue:theNationalAssessmentof EducationalProgress(Editor’s Note).Journalof EducationalMeasurement,29,93–94.

Anderson,J.R. (ed.) (1993).Rulesof themind.Hillsdale,NJ: LawrenceErlbaumAssociates.

Anderson,J. R., Corbett,A. T., Koedinger, K. R., Pelletier, R. (1995). Cognitive tutors: lessonslearned.TheJournalof theLearningSciences,4, 167-207.

Page 21: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 20

Andrich,D. (1996). A hyperboliccosinelatenttrait modelfor unfoldingpolytomousresponses:Reconcil-ing ThurstoneandLikert methodologies.British Journal of Mathematicaland StatisticalPsychology,49,347–365.

Bartolucci,F., Forcina,A. (in press).A likelihoodratio test for MTPR

within binary variables. In press,Annalsof Statistics.

Baxter, G. P., Glaser, R. (1998).Investigatingthecognitive complexity of scienceassessments.EducationalMeasurement:IssuesandPractice, 17,37–45.

Beguin,A.A., Glas,C.A.W. (1999). MCMC estimationof multidimensionalIRT models. (ResearchRe-port 98-14). Departmentof EducationalMeasurementand Data Analysis, University of Twente,theNetherlands.[Online]. Available: http://to-www.e dt e. ut wente .n l/ TO/o md/ re port 98-/rr9814.htm . Accessed28April 2000.

Bloom, B. S. (1984). The 2-sigmaproblem: the searchfor methodsof group instructionaseffective asone-to-onetutoring.EducationalResearcher, 13,4–16.

Bock, R. D., Zimowski, M. F. (1997). Multi-group IRT. In W. J. VanderLinden,R. K. Hambleton(Eds.),Handbookof modernitemresponsetheory(pp. 433–448).New York: SpringerVerlag.

Bradlow, E. T., Wainer, H., Wang,X. (1999).A Bayesianrandomeffectsmodelfor testlets.Psychometrika,64,153–168.

Cliff, N., Donoghue,J. R. (1992). Ordinal testfidelity estimatedby an item samplingmodel. Psychome-trika, 57,217–236.

Corbett,A. T., Anderson,J. R., O’Brien, A. T. (1995). Studentmodelingin theACT programmingtutor.In P. D. Nichols,S. F. Chipman,R. L. Brennan(Eds.),Cognitivelydiagnosticassessment(pp. 19–41).Hillsdale,NJ: LawrenceErlbaumAssociates.

DiBello, L., Jiang,H., Stout,W. F. (in press).A multidimensionalIRT modelfor practicalcognitive diag-nosis.In Press,AppliedPsychological Measurement.

Douglas,J., Qui, P. (1997). Generalizedlinear factor analysiswith Markov chain MonteCarlo. Unpub-lishedmanuscript.

Draney, K. L., Pirolli, P., Wilson, M. (1995).A measurementmodelfor a complex cognitive skill. In P. D.Nichols,S.F. Chipman,R. L. Brennan(Eds.),Cognitivelydiagnosticassessment(pp. 103–125).Hills-dale,NJ: LawrenceErlbaumAssociates.

Drasgow, F., Levine,M. V., Tsien,S.,Williams, B., Mead,A. D. (1995).Fitting polytomousitem responsetheorymodelsto multiple-choicetests.AppliedPsychological Measurement,19,143–165.

Ellis, J.L. (1994).Foundationsof monotonelatentvariablemodels.Nijmegen:NijmegenInstitutefor Cog-nition andInformation.

Page 22: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 21

Ellis, J. L., Junker, B. W. (1997). Tail-measurabilityin monotonelatentvariablemodels.Psychometrika,62,495–523.

Embretson,S.E. (1985).Multicomponentlatenttrait modelsfor testsdesign.In S.E. Embretson(Ed.),Testdesign:developmentsin psychology andpsychometrics(pp. 195–218).New York: AcademicPress.

Embretson,S. E. (1991). A multidimensionallatenttrait modelfor measuringlearningandchange.Psy-chometrika,56,495–515.

Embretson,S.E. (1995).Developmentstowardacognitive designsystemfor psychologicaltests.In D. Lu-binski,R. V. Dawis (Eds.),Assessingindividual differencesin humanbehavior:new concepts,methodsandfindings(pp17–48).PaloAlto CA: Davies-BlackPublishing.

Embretson,S. E. (1997). Multicomponentresponsemodels. In W. J. Van der Linden R. K. Hambleton(Eds.),Handbookof modernitemresponsetheory(pp. 305–321).New York: SpringerVerlag.

Embretson,S.E. (1999).Generatingitemsduringtesting:psychometricissuesandmodels.Psychometrika,64,407–433.

Engelhard,G., Jr. (1994). Examiningratererrorsin the assessmentof written compositionwith many-facetedRaschmodels.Journal of EducationalMeasurement,31,93–112.

Engelhard,G., Jr. (1996). Evaluatingrateraccuracy in performanceassessment.Journal of EducationalMeasurement,33,56–70.

Fischer, G. H. (1973).Linear logistic testmodelasaninstrumentin educationalresearch.ActaPsycholog-ica, 37,359–374.

Fischer, G. H., Molenaar, I. W. (eds.)(1995).Rasch models:foundations,recentdevelopments,andappli-cations.New York: Springer-Verlag.

Fox,G.J.A.,C.A.W. Glas(1998).Amulti-level IRTmodelwith measurementerror in thepredictorvariables.(ResearchReport98-16). Departmentof EducationalMeasurementandDataAnalysis,University ofTwente,the Netherlands.[Online]. Available: http://to-www. edt e. ut went e.n l/ TO/o md-/report98/repo rt 98. ht m. Accessed28April 2000.

Fraser, C.,McDonald,R.P. (1988).NOHARM: Leastsquaresitemfactoranalysis.MultivariateBehavioralResearch 23,267–269.

Gardner, H. (1992). Assessmentin context: the alternative to educationaltesting. In B. R. Gifford, M.C. O’Connor(Eds.),Changingassessments:alternativeviewsof aptitude, achievement,andinstruction(pp. 77–119).Norwell, MA: Kluwer AcademicPublishers.

Gelman,A., Carlin, J.B., Stern,H. S.,Rubin,D. B. (1995).Bayesiandataanalysis.New York: ChapmanandHall.

Gibbons,R. D., Hedeker, D. R. (1992). Full-informationitem bi-factoranalysis.Psychometrika,57, 423–436.

Page 23: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 22

Gibbons,R. D., Hedeker, D. R. (1997).Randomeffectsprobitandlogistic regressionmodelsfor three-leveldata.Biometrics,53,1527–1537.

Glas,C. A. W., Verhelst,N. D. (1989).Extensionsof thepartialcreditmodel.Psychometrika,54,635–659.

Grayson,D. A. (1988). Two-groupclassificationin latent trait theory: scoreswith monotonelikelihoodratio. Psychometrika,53,383–392.

Haertel,E. H. (1989). Usingrestrictedlatentclassmodelsto maptheskill structureof achievementitems.Journalof EducationalMeasurement,26,301–321.

Hemker, B. T., SijtsmaK. (1999,July). A comparisonof threegeneral typesof unidimensionalIRT modelsfor polytomousitems.PaperpresentedataSymposiumonNonparametricItemResponseTheoryModelsin Action, EuropeanMeetingof thePsychometricSocietyin Luneburg, Germany.

Hemker, B. T., Sijtsma,K., Molenaar, I. W. (1995). Selectionof unidimensionalscalesfrom a multidi-mensionalitem bankin thepolytomousMokken IRT model. AppliedPsychological Measurement,19,337–352.

Hemker, B. T., SijtsmaK., Molenaar, I. W., Junker, B. W. (1996). PolytomousIRT modelsandmonotonelikelihoodratio of thetotal score.Psychometrika,61,679–693.

Hemker, B. T., SijtsmaK., Molenaar, I. W., Junker, B. W. (1997). Stochasticorderingusingthe latenttraitandthesumscorein polytomousIRT models.Psychometrika, 62,331–347.

Hoijtink, H., Molenaar, I. W. (1997). A multidimensionalitem responsemodel: ConstrainedlatentclassanalysisusingtheGibbssamplerandposteriorpredictive checks.Psychometrika,62,171–189.

Holland,P. W. (1981).Whenareitem responsemodelsconsistentwith observeddata?Psychometrika,46,79–92.

Holland,P. W. (1990).TheDutchidentity: anew tool for thestudyof itemresponsemodels.Psychometrika,55,5–18.

Holland,P. W., Rosenbaum,P. R. (1986).Conditionalassociationandunidimensionalityin monotonelatenttrait models.Annalsof Statistics,14,1523–1543.

Huguenard,B. R., Lerch,F. J., Junker, B. W., Patz,R. J., Kass,R. E. (1997). Working memoryfailure inphone-basedinteraction.ACM TransactionsonComputer-HumanInteraction,4, 67–102.

Huynh,H. (1994).A new proof for monotonelikelihoodratio for thesumof independentBernoulli randomvariables.Psychometrika,59,77–79.

Jaakkola,T. S.,Jordan,M. I. (1999).Bayesianparameterestimationvia variationalmethods.In press,Statis-tics andComputing. [Online]. Available: http://www.cs. berk ele y. edu/ ˜j ord an/p ubli -cations.html .Accessed28April 2000.

Page 24: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 23

Janssen,R., De Boeck,P. (1997).Psychometricmodelingof componentiallydesignedsynonym tasks.Ap-pliedPsychological Measurement,21,37–50.

Jogdeo,K. (1978).On aprobabilityboundof MarshallandOlkin. Annalsof Statistics,6, 232–234.

Johnson,E.G.,Mislevy, R.J.,Thomas,N. (1994).Theoreticalbackgroundandphilosophyof NAEPscalingprocedures.In E. G. Johnson,J. Mazzeo,D. L. Kline (Eds.),Technical Reportof theNAEP1992TrialStateAssessmentProgramin Reading(pp. 133–146).Washington,DC: Office of EducationalResearchandImprovement,U.S.Departmentof Education.

Junker, B. W. (1993). Conditionalassociation,essentialindependenceandmonotoneunidimensionalitemresponsemodels.Annalsof Statistics,21,1359–1378.

Junker, B. W. (1998). Someremarkson Scheiblechner’s treatmentof ISOPmodels. Psychometrika,63,73–85.

Junker, B. W., Ellis, J. L. (1997). A characterizationof monotoneunidimensionallatentvariablemodels.Annalsof Statistics,25,1327–1343.

Junker, B. W., Koedinger, K. R.,Trottini, M. (2000).Finding improvementsin studentmodelsfor intelligenttutoring systemsvia variable selectionfor a linear logistic testmodel. Paperpresentedat the AnnualNorth AmericanMeetingof thePsychometricSociety, July2000,Vancouver, BC, Canada.

Junker, B. W., Patz,R. J. (June1998).Thehierarchical rater modelfor ratedtestitems.PaperpresentedattheAnnualNorth AmericanMeetingof thePsychometricSociety, Champaign-UrbanaIL, USA.

Junker, B. W., Sijtsma,K. (2000). Latentandmanifestmonotonicityin item responsemodels. AppliedPsychological Measurement,24,65–81.

Kass,R. E., Tierney, L., Kadane,J. B. (1990). The validity of posteriorexpansionsbasedon Laplace’smethod. In S. Geisser, J. S. Hodges,S. J. PressA. Zellner (Eds.),Bayesianand likelihoodmethodsinstatisticsandeconometrics:Essaysin honorof George A. Barnard (pp. 473–488).New York: North-Holland.

Kelderman,H., Rijkes,C. P. M. (1994). LoglinearmultidimensionalIRT modelsfor polytomouslyscoreditems.Psychometrika,59,149–176.

Kamae,T. Krengel,U., O’Brien, G. L. (1977). Stochasticinequalitieson partially orderedspaces.Annalsof Probability, 5, 899–912.

Lee,Y., Nelder, J.A. (1996).Hierarchicalgeneralizedlinearmodels.Journal of theRoyalStatisticalSoci-ety, SeriesB, 58,619–678.

Linacre,J.M. (1989).Many-facetedRasch Measurement. Chicago:MesaPress.

Liu, C., Rubin,D. B. (1998).Maximumlikelihoodestimationof factoranalysisusingtheECMEalgorithmwith completeandincompletedata.StatisticaSinica,8, 729–747.

Page 25: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 24

Lord, F. M. (1952).A theoryof testscores.PsychometricMonographs,No. 7.

Loevinger, J. (1948).Thetechniqueof homogeneoustestscomparedwith someaspectsof “scaleanalysis”andfactoranalysis.Psychological Bulletin,45,507–530.

Maris,E. (1995).Psychometriclatentresponsemodels.Psychometrika,60,523–547.

Maris, E., De Boeck,P., Van Mechelen,I. (1996). Probabilitymatrix decompositionmodels. Psychome-trika, 61,7–29.

Masters,G. N. (1982).A Raschmodelfor partialcreditscoring.Psychometrika,47,149–174.

Meijer, R. R.,Sijtsma,K. Smid,N. G. (1990).Theoreticalandempiricalcomparisonof theMokkenandtheRaschapproachto IRT. AppliedPsychological Measurement,14,283–298.

Meijer, R. R. (1996).Person-fitresearch:An introduction.(Guesteditor’s introductionto theSpecialIssue:Person-fitresearch:Theoryandapplications.)AppliedMeasurementin Education,9, 3–8.

Meredith,W. (1965).Someresultsbasedonageneralstochasticmodelfor mentaltests.Psychometrika,30,419–440.

Mislevy, R. J. (1985). Estimationof latentgroupeffects. Journal of theAmericanStatisticalAssociation,80,993–997.

Mislevy, R. J. (1996).Testtheoryreconceived.Journal of EducationalMeasurement,33,379–416.

Mislevy, R.J.,Sheehan,K. M. (1989).Theroleof collateralinformationaboutexamineesin itemparameterestimationPsychometrika,54,661–679.

Mokken,R. J. (1971).A theoryandprocedure of scaleanalysis.TheHague:Mouton.

Mokken,R. J. (1997).Nonparametricmodelsfor dichotomousitems.In W. J.VanderLinden,R. K. Ham-bleton(Eds.),Handbookof modernitemresponsetheory(pp. 351–368).New York: SpringerVerlag.

Molenaar, I. W. (1991). A weightedLoevinger H-coefficient extendingMokken scalingto multicategoryitems.KwantitatieveMethoden,37,97–117.

Molenaar, I. W. (1997).Nonparametricmethodsfor polytomousresponses.In W. J.VanderLinden,R. K.Hambleton(Eds.),Handbookof modernpsychometrics(pp. 369–380).New York: SpringerVerlag.

Molenaar, I. W., Sijtsma,K. (1999).MSPfor Windows.Groningen:iecProGAMMA.

Molenaar, I. W., Stout,W. F. (January5, 2000).Personalcommunication.

Muraki, E., Carlson,J. E. (1995). Full-informationFactorAnalysisfor PolytomousItem Responses.Ap-pliedPsychological Measurement,19,73–90.

McCullagh,P., Nelder, J. A. (1989). GeneralizedLinear Models(2ndEdition). New York: ChapmanandHall.

Page 26: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 25

Nichols,P., Sugrue,B. (1999).Thelackof fidelity betweencognitively complex constructsandconventionaltestdevelopmentpractice.EducationalMeasurement:IssuesandPractice, 18,18–29.

Patz,R. J.,Junker, B. W. (1999a).A straightforward approachto Markov ChainMonteCarlomethodsforitemresponsemodels.Journal of EducationalandBehavioral Statistics,24,146–178.

Patz, R. J., Junker, B. W. (1999b). Applicationsandextensionsof MCMC in IRT: Multiple item types,missingdata,andratedresponses.Journalof EducationalandBehavioral Statistics,24,xxx–xxx.

Patz,R.J.,Junker, B. W., Johnson,M. S.(1999,April). Thehierarchical ratermodelfor ratedtestitemsandits applicationto large-scaleeducationalassessmentdata. Paperpresentedat a Symposiumon MakingtheMostof ConstructedResponses:TheInformationin Multiple Ratings,AnnualMeetingof theAmer-icanEducationalResearchAssociation,MontrealCanada.

Patz,R. J.,Junker, B. W., Lerch,F. J.,Huguenard,B. R. (1996).Analyzingsmallpsychological experimentswith item responsemodels(CMU StatisticsDepartmenttechnicalreport #644). [Online]. Available:http://www.sta t. cmu.e du/c mu-st at s/ tr /t r64 4/ tr 644. htm l .Accessed28April 2000.

Post,W. J. (1992). Nonparametricunfoldingmodels.A latentstructure approach. Leiden: DSWO Press,LeidenUniversity, TheNetherlands.

Post,W. J.,Snijders,T. A. B. (1993). Nonparametricunfoldingmodelsfor dichotomousdata.Methodika,7, 130–156.

Ramsay, J.O. (1991).Kernelsmoothingapproachesto nonparametricitem characteristiccurve estimation.Psychometrika,56,611–630.

Ramsay, J.O. (1995).A similarity-basedsmoothingapproachto nondimensionalitemanalysis.Psychome-trika, 60,323–339.

Ramsay, J.O. (1996).A geometricalapproachto itemresponsetheory. Behaviormetrika,23,3–17.

Reckase,M. D. (1985).Thedifficulty of testitemsthatmeasuremorethanoneability. AppliedPsychologi-cal Measurement,9, 401–412.

Resnick,L. B., Resnick,D. P. (1992).Assessingthethinkingcurriculum:new toolsfor educationalreform.In B. R. Gifford, M. C. O’Connor(Eds.),Changingassessments:alternativeviewsof aptitude, achieve-ment,andinstruction(pp37–75).Norwell, MA: Kluwer AcademicPublishers.

Rigdon,S. E., Tsutakawa, R. K. (1983). Parameterestimationin latent trait models. Psychometrika,48,567–574.

Robertson,T., Wright, F. T., Dykstra,R. L. (1988)Order restrictedstatisticalinference. New York: Wiley.

Rosenbaum,P. R. (1984). Testingtheconditionalindependenceandmonotonicityassumptionsof item re-sponsetheory. Psychometrika,49, 425–435.

Page 27: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 26

Rosenbaum,P. R. (1987a).Probabilityinequalitiesfor latentscales.British Journal of MathematicalandStatisticalPsychology, 40,157–168.

Rosenbaum,P. R. (1987b).Comparingitemcharacteristiccurves.Psychometrika,52,217–233.

Roussos,L. (1994).Summaryandreview of cognitive diagnosismodels.Unpublishedmanuscript.

Samejima,F. (1997). Departurefrom normalassumptions:a promisefor future psychometricswith sub-stantive mathematicalmodeling.Psychometrika,62,471–493.

Scheiblechner, H. (1972). DasLernenundLosenkomplexer Denkaufgaben.[The learningandsolvingofcomplex reasoningitems.]Zeitschrift fur ExperimentelleundAngewandtePsychologie, 3, 456–506.

Scheiblechner, H. (1995).Isotonicordinalprobabilisticmodels(ISOP).Psychometrika,60,281–304.

Seltman,H. (1999). HiddenStochasticModelsfor Biological RhythmData. UnpublishedPh.D.Disserta-tion, Departmentof Statistics,Carnegie Mellon University, Pittsburgh PA, USA.

Sijtsma,K. (1998). Methodologyreview: NonparametricIRT approachesto the analysisof dichotomousitemscores.AppliedPsychological Measurement,22,3–32.

Sijtsma,K., VanderArk, L. A. (this volume). Progressin IRT analysisof polytomousitem scores:dilem-masandpracticalsolutions.In A. Boomsma,T. Snijders,M. VanDuijn (Eds.),Essaysin ItemResponseModeling(pp. xxx–xxx). New York: Springer-Verlag.

Sijtsma,K., Hemker, B. T. (1998).NonparametricpolytomousIRT modelsfor invariantitemordering,withresultsfor parametricmodels.Psychometrika,63,183–200.

Sijtsma,K., Junker, B. W. (1996).A survey of theoryandmethodsof invariantitem ordering.British Jour-nal of MathematicalandStatisticalPsychology, 49,79–105.

Sijtsma,K., Junker, B. W. (1997).Invariantitemorderingof transitive reasoningtasks.In J.Rost,R. Lange-heine(Eds.),Applicationsof latent trait and latent classmodelsin the social sciences(pp. 97–107).Munster:WaxmannVerlag.

Shute,V. J.,Psotka,J. (1996). Intelligent tutoringsystems:Past,PresentandFuture.In D. Jonassen(Ed.),Handbookof Research on EducationalCommunicationsand Technology (pp. 570–600). New York:MacmillanPress..

Snijders,T.A.B. (thisvolume).Two-level non-parametricscalingfor dichotomousdata.In A. Boomsma,T.Snijders,M. VanDuijn (Eds.),Essaysin ItemResponseModeling(pp. xxx–xxx). New York: Springer-Verlag.

Snijders,T.A.B. (1997). Estimationandpredictionfor stochasticblockmodelsfor graphswith latentblockstructure.Journalof Classification,14 75–100.

Stegelmann,W. (1983). ExpandingtheRaschmodelto a generalmodelhaving morethanonedimension.Psychometrika,48,259–267.

Page 28: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 27

Stout,W. F. (1987).A nonparametricapproachfor assessinglatenttrait unidimensionality. Psychometrika,52,589–617.

Stout,W. F. (1990).A new itemresponsetheorymodelingapproachwith applicationsto unidimensionalityassessmentandability estimation.Psychometrika,55,293–325.

Stout,W. F., Habing,B., Douglas,J., Kim, H. R., Roussos,L., Zhang,J. (1996). Conditionalcovariance-basednonparametricmultidimensionalityassessment.AppliedPsychological Measurement,20, 331–354.

Sykes,R. C., Heidorn,M. (1999,April). Theassignmentof raters to items: controlling for rater bias. Pa-per presentedat the annualmeetingof the NationalCouncil on Measurementin Education,Montreal,Canada.

Tanner, M. A. (1996).Toolsfor statisticalinference:methodsfor theexploration of posteriordistributionsandlikelihoodfunctions.3

»%´Edition. New York: Springer-Verlag.

Tatsuoka,K. K. (1990).Towardanintegrationof item responsetheoryandcognitive errordiagnosis.In N.Fredriksen,R. Glaser, A. Lesgold,M. G. Shafto(Eds.),Diagnosticmonitoringof skill and knowledgeacquisition(pp. 453–488).Hillsdale,NJ: LawrenceErlbaumAssociates.

Tatsuoka,K. K. (1995). Architectureof knowledgestructuresandcognitive diagnosis:a statisticalpatternrecognitionandclassificationapproach.In P. D. Nichols,S. F. Chipman,R. L. Brennan(Eds.),Cogni-tivelydiagnosticassessment(pp. 327–359).Hillsdale,NJ: LawrenceErlbaumAssociates.

Ter Hofstede,F. Steenkamp,J.-B.E. M., Wedel,M. (1999). Identifying spatiallycontiguousinternationaltargetmarkets.Manuscriptsubmittedfor publication.

Thissen,D., Steinberg, L. (1986).A taxonomyof item responsemodels.Psychometrika,51,567-577.

VanderArk, L. A. (1999,July). A referencecard for therelationsbetweenIRTmodelsfor polytomousitemsandsomerelevantproperties.PaperpresentedataSymposiumonNonparametricItemResponseTheoryModelsin Action, EuropeanMeetingof thePsychometricSocietyin Luneburg, Germany.

VanderLinden,W. J., Hambleton,R. K. (eds.) (1997). Handbookof modernitemresponsetheory. NewYork: SpringerVerlag.

VanLehn,K., Niu, Z. (in press). Bayesianstudentmodeling,userinterfacesandfeedback:A sensitivityanalysis.In Press,InternationalJournalof Artificial Intelligencein Education.

VanLehn,K., Niu, Z., Siler, S.,Gertner, A. (1998).Studentmodelingfromconventionaltestdata:aBayesianapproachwithoutpriors. In B. P. Goetl,H. M. Halff, C.L. Redfield,V. J.Shute(Eds.),Proceedingsof theIntelligent Tutoring SystemsFourth InternationalConference, ITS98 (pp. 434–443).Berlin: Springer-Verlag.

Verhelst,N. D., Verstralen,H. H. F. M. (1993).A stochasticunfoldingmodelderivedfrom thepartialcreditmodel.KwantitatieveMethoden,42,73–92.

Page 29: Some topics in nonparametric and parametric IRT, with some ...stat.cmu.edu/tr/tr710/tr710.pdfSome topics in nonparametric and parametric IRT, with some thoughts about the future Brian

NonparametricandParametricIRT, andtheFuture 28

Verhelst,N.D., Verstralen,H.H.F.M. (this volume). IRT modelsfor multiple raters. In A. Boomsma,T.Snijders,and M. Van Duijn (Eds.), Essaysin Item ResponseModeling (pp. xxx–xxx). New York:Springer-Verlag.

Wilson, M. R., Hoskens,M. (1999,April). Therater bundlemodel. Paperpresentedat a SymposiumonMaking theMost of ConstructedResponses:The Informationin Multiple Ratings,Annual MeetingoftheAmericanEducationalResearchAssociation,MontrealCanada.

Wilson, D., Wood, R. L., Gibbons,R. (1983). TESTFACT: testscoringanditem factoranalysis. [Com-puterprogram]. Chicago:ScientificSoftware Inc. Online descriptionavailable: http://ssicen-tral.com/irt/t es tfa ct .h tm . Accessed28 April 2000.

Wu, M. L., Adams,R. J.,Wilson, M. R. (1997).ConQuest:Generalizeditemresponsemodelingsoftware.ACER.

Yamamoto,K., Gitomer, D. H. (1993). Applicationof a HYBRID modelto a testof cognitive skill repre-sentation.In N. Fredriksen,R. J.Mislevy (Eds.),Testtheoryfor a new generation of tests(pp. 275–295).Hillsdale,NJ: LawrenceErlbaumAssociates.

Yuan,A., Clarke,B. (1999).Manifestcharacterizationandtestingfor two latenttraits. Manuscriptsubmit-tedfor publication.

Zhang,J., Stout,W. F. (1996,April). A theoreticalindex of dimensionalityandits estimation.Paperpre-sentedat theAnnualMeetingof theNationalCouncilonMeasurementin Education,New York.

Zimowski, M. F., Muraki, E., Mislevy, R. J., Bock, R. D. (1997). BILOG-MG. [Computerprogram].Chicago:ScientificSoftwareInc. Onlinedescriptionavailable: http://ssicent ral .c om/i rt -/bilogmg.htm .Accessed28April 2000.

Zwick, R. (1992). Specialissueon the NationalAssessmentof EducationalProgress.Journal of Educa-tional Measurement,17,93–94.