
Adaptive item-based learning environments based on the item response theory: possibilities and challenges

K. Wauters,*† P. Desmet*‡ & W. Van den Noortgate*†
*iTEC, Interdisciplinary Research on Technology, Education and Communication, Katholieke Universiteit Leuven, 8500 Kortrijk, Belgium
†Faculty of Psychology and Educational Sciences, Katholieke Universiteit Leuven, 3000 Leuven, Belgium
‡Franitalco, Research on French, Italian and Comparative Linguistics, Katholieke Universiteit Leuven, 3000 Leuven, Belgium

Abstract The popularity of intelligent tutoring systems (ITSs) is increasing rapidly. In order to make learning environments more efficient, researchers have been exploring the possibility of an automatic adaptation of the learning environment to the learner or the context. One of the possible adaptation techniques is adaptive item sequencing by matching the difficulty of the items to the learner's knowledge level. This is already accomplished to a certain extent in adaptive testing environments, where the test is tailored to the person's ability level by means of the item response theory (IRT). Even though IRT has been a prevalent computerized adaptive test (CAT) approach for decades and applying IRT in item-based ITSs could lead to similar advantages as in CAT (e.g. higher motivation and more efficient learning), research on the application of IRT in such learning environments is highly restricted or absent. The purpose of this paper was to explore the feasibility of applying IRT in adaptive item-based ITSs. Therefore, we discussed the two main challenges associated with IRT application in such learning environments: the challenge of the data set and the challenge of the algorithm. We concluded that applying IRT seems to be a viable solution for adaptive item selection in item-based ITSs provided that some modifications are implemented. Further research should shed more light on the adequacy of the proposed solutions.

Keywords adaptive item selection, e-learning, intelligent tutoring systems, IRT, item-based learning.

Introduction

Learning environments increasingly include an electronic component, offering some additional possibilities, such as making education more accessible for a large public by creating the possibility for learners to study at their own pace, anytime and anywhere.

However, one shortcoming of most learning environments is that they are static, in the sense that they provide for each learner the same information in the same structure using the same interface. Research in the field of educational sciences suggests that learners differ from each other with respect to their learning characteristics, preferences, goals and knowledge level and that these factors, as well as context characteristics (e.g. location, time, etc.), have an influence on the learning effectiveness (Kelly & Tangney 2006; Verdú et al. 2008). Therefore, adaptive learning systems have been developed that aim at optimizing learning conditions (Bloom 1984; Wasson 1993; Lee 2001; Verdú et al. 2008).

In the context of electronic learning environments, the term 'adaptivity' is used to refer to the adjustment of

Accepted: 14 May 2010
Correspondence: Kelly Wauters, Katholieke Universiteit Leuven – Campus Kortrijk, Etienne Sabbelaan 53, 8500 Kortrijk, Belgium. Email: [email protected]

doi: 10.1111/j.1365-2729.2010.00368.x

Original article

© 2010 Blackwell Publishing Ltd. Journal of Computer Assisted Learning (2010), 26, 549–562


one or more characteristics of the learning environment in function of the learner's needs and preferences and/or the context. More specifically, 'adaptivity' can relate to three dimensions: the form, the source and the medium. The form of adaptivity refers to the adaptation techniques that are implemented (Brusilovsky 1999; Brusilovsky & Peylo 2003; Paramythis & Loidl-Reisinger 2004). Forms can be summarized in three categories: adaptive form representation, adaptive content representation and adaptive curriculum sequencing. Adaptive form representation refers to the way the content is presented to the learner and includes, for instance, whether pictures and videos are added to the text and whether links are visible or highlighted. Adaptive content representation concentrates on providing the learner with intelligent help at each step in the problem-solving process, based on discovered knowledge gaps of learners. Adaptive curriculum sequencing is intended to select the optimal question at any moment in order to learn certain knowledge in an efficient and effective way. The source of adaptivity refers to the features involved in the adaptivity process and can be classified into three main categories: course/item features, such as the difficulty level and topic of the course; person features, composed of the learner's knowledge level, motivation, cognitive load, interests and preferences; context features, which contain the time when, the place from which and the device on which the learner works in the learning environment; or a combination of these, for instance, the sequencing of the curriculum might be based on adapting the item's difficulty level to the learner's knowledge level. Regarding the medium of adaptivity, a main distinction can be made between two types of adaptive learning environments: adaptive hypermedia (AH) and intelligent tutoring systems (ITSs) (Brusilovsky 1999). While AH systems are composed of a large amount of learning material presented by hypertext, ITSs often provide only a limited amount of learning material. Instead, ITSs intend to support a learner in the process of problem solving. ITSs can be further divided into task-based and item-based ITSs. Task-based ITSs, such as ASSISTance and assessments (ASSISTments) (Razzaq et al. 2005), are composed of substantial tasks or problems that are often tackled by means of scaffolding problems, which break the main problem down into smaller, learnable chunks. The learner tries to come to a reasonable solution by applying knowledge and skills that are needed to address these problems.

Item-based ITSs, such as Franel (Desmet 2006) and SIETTE, the Spanish Intelligent Evaluation System using Tests for Tele-Education (Conejo et al. 2004; Guzmán & Conejo 2005), are composed of simple questions that can be combined with hints and feedback. In the remainder of this paper, we will focus on 'adaptive curriculum sequencing in item-based ITSs by matching the item difficulty level to the learner's knowledge level'.

Recently, interest in adaptive item sequencing in learning environments has grown (e.g. Pérez-Marín et al. 2006; Leung & Li 2007). Generally, it is found that excessively difficult course materials can frustrate learners, while excessively easy course materials can cause learners to lack any sense of challenge. Learners prefer learning environments where the item selection procedure is adapted to their knowledge level. Moreover, adaptive item sequencing can lead to more efficient and effective learning. Even though several methods are adopted to match the item difficulty level to the learner's knowledge level (e.g. Bayesian networks; Collins et al. 1996; Millán et al. 2000; Conati et al. 2002), it seems obvious to make use of the item response theory (IRT; Van der Linden & Hambleton 1997) as it is frequently used in computerized adaptive tests (CATs) for adapting the item difficulty level to the person's knowledge level (Lord 1970; Wainer 2000). IRT models are measurement models that specify the probability of a discrete outcome, such as the correctness of a response to an item, in terms of person and item parameters. In CAT, IRT is often used to generate a 'calibrated item bank', to estimate and update an examinee's ability estimate and to decide on the next item to render. CAT can potentially decrease testing time and the number of testing items and increase measurement precision and motivation (Van der Linden & Glas 2000; Wainer 2000).

Despite the fact that IRT is a well-established approach in testing environments, for instance within CAT, to date the implementation of IRT within learning environments is limited. This paper will elaborate on this issue. More precisely, the aim of this paper is to identify and discuss the possibilities and challenges of extrapolating the application of IRT for adaptive item sequencing from testing applications to item-based ITSs, partly based on our own experiences with this type of item-based ITSs. Furthermore, based on techniques used for other types of learning and testing environments, we suggest possible tracks for dealing with those challenges.


A third purpose of the paper is to guide future empirical research in this field, including the empirical evaluation of the ideas proposed in this paper.

In what follows, we will first briefly address adaptive item sequencing in testing environments, since it can help to understand adaptive item sequencing in learning environments. Next, we will discuss the differences between applying IRT for adaptive item sequencing in learning environments and in testing environments. In particular, we will address the possibilities and challenges of applying IRT for this purpose in item-based ITSs. Challenges regarding the data set and the algorithms will be highlighted, and some solutions for dealing with those challenges will be suggested.

Adaptive item sequencing in testing environments

As indicated earlier, the process of tailoring a test to a person can be based on IRT. IRT models are measurement models describing responses based on person and item characteristics. The Rasch model (Rasch 1960) is the simplest IRT model and expresses the probability of observing a particular response to an item as a function of the item difficulty level and the person's ability level.
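For reference, the Rasch model for the probability of a correct response can be written as follows, with theta_p the ability of person p and b_i the difficulty of item i (this notation is ours; the paper itself does not spell out the formula):

    P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}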

A prerequisite for adaptive item sequencing, and hence for CAT, is to have items with a known difficulty level. Therefore, an initial development of an item bank with items of which the item difficulty level is known is needed before an adaptive test can be administered. This item bank should be large enough to include at any time an item with a difficulty level within the optimal range that was not yet presented to the person. To calibrate the items of the item bank, i.e. to estimate the item parameters (e.g. difficulty level), typically a non-adaptive test is taken by a large sample of persons. Guidelines for the number of persons needed to obtain reliable item parameter estimates vary between 200 and 1000 persons (e.g. Huang 1996; Wainer & Mislevy 2000). Following item calibration, several steps are taken within a CAT, as illustrated in Fig 1. When no information is provided about the ability level of the person, the first items that are administered are often items with an average item difficulty level. After a few items are administered, the person's ability parameter is estimated by means of his or her answers to these items. Next, the item selection algorithm chooses the item that provides the most information about the current ability estimate as the next item.

Within the framework of the Rasch model, based on which item difficulties and person abilities are directly comparable, this is an item whose difficulty level is equal to the ability level of the person. Next, the person's ability parameter is re-estimated based on all previous items. Based on the updated estimate, an additional item is selected. These steps are repeated until a certain stopping criterion is met, for instance, when the ability estimate hardly changes or when a specific number of items is reached.
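As a minimal sketch of this cycle under the Rasch model (the function names, the grid-search ability estimator and the stopping rule are simplifications of our own, not a description of any particular CAT system), the procedure could look as follows in Python:

    import math

    def rasch_prob(theta, b):
        # Rasch probability of a correct response for ability theta and difficulty b.
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def estimate_ability(responses, difficulties, lo=-4.0, hi=4.0, steps=200):
        # Crude maximum-likelihood estimate of theta over a fixed grid.
        best_theta, best_ll = 0.0, float("-inf")
        for k in range(steps + 1):
            theta = lo + (hi - lo) * k / steps
            ll = 0.0
            for x, b in zip(responses, difficulties):
                p = rasch_prob(theta, b)
                ll += math.log(p) if x == 1 else math.log(1.0 - p)
            if ll > best_ll:
                best_theta, best_ll = theta, ll
        return best_theta

    def run_cat(item_bank, answer_item, max_items=20, tol=0.01):
        # item_bank: dict item_id -> calibrated difficulty.
        # answer_item: callable returning 1 (correct) or 0 (incorrect) for an item_id.
        administered, responses, difficulties = [], [], []
        theta = 0.0  # start around an average difficulty when nothing is known
        for _ in range(max_items):
            candidates = [i for i in item_bank if i not in administered]
            if not candidates:
                break
            # Under the Rasch model, the most informative item is the one whose
            # difficulty is closest to the current ability estimate.
            item = min(candidates, key=lambda i: abs(item_bank[i] - theta))
            x = answer_item(item)
            administered.append(item)
            responses.append(x)
            difficulties.append(item_bank[item])
            new_theta = estimate_ability(responses, difficulties)
            # Stopping criterion: the ability estimate hardly changes any more.
            if len(responses) > 3 and abs(new_theta - theta) < tol:
                theta = new_theta
                break
            theta = new_theta
        return theta, administered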

Adaptive item sequencing in learning environments

The research to date has tended to focus on applying IRT for adaptive item sequencing in testing environments rather than in learning environments. Furthermore, the research on IRT in learning environments is mainly concentrated on assessment, both formative and summative (Millán et al. 2003; Guzmán & Conejo 2005; Chen et al. 2006, 2007; Leung & Li 2007; Baylari & Montazer 2009; Chen & Chen 2009). However, some learning environments already make use of IRT to adaptively sequence the learning material (Chen et al. 2005, 2006; Chen & Chung 2008; Chen & Duh 2008; Chen & Hsu 2008). These researchers have applied original and modified IRT with success in AHs.

Fig 1 Computerized adaptive testing procedure.


Nevertheless, our experience with item-based ITSs, such as Franel (Desmet 2006), shows that the application of IRT in item-based ITSs has distinct possibilities and challenges. Franel is a freely accessible item-based learning environment for French-speaking persons to learn Dutch and for Dutch-speaking persons to learn French. To optimize language learning, Franel is composed of several didactical item types, for instance multiple choice, fill in the blank and translation, and several media features, such as video and audio. Learners are also guided towards the correct response by means of hints and elaborated feedback. This elaborated feedback, such as error-specific feedback, can depend on the answer the learner has given and can therefore identify and correct knowledge gaps. Furthermore, learners can choose to follow a guided path or to follow their own learning path. Analysis has shown that learners tend to follow their own learning path, in which they can freely select the items they want to make. The item pool contains thousands of items. The items can be chosen from a navigation menu. The appearance of the navigation menu is not adapted to the learner, which means that every learner is presented the same navigation menu with the items in the same order. The learner can choose whether he or she wants the items to be categorized within chapters/topics (e.g. hobby, work, etc.) or within domains (e.g. grammar, vocabulary, etc.). Besides the topic and domain, there is no further logical order in which the items are listed within the navigation menu. Once a learner has selected an item, the learner is free to answer the item, ask for a hint or the solution, or go to the next item without making any attempt.

That the implementation of IRT for item-based ITSs has received substantially less research attention could be due to the fact that extrapolating the ideas of CAT and IRT to such learning environments is not straightforward. Our experience with the tracking and logging data of item-based ITSs such as Franel (Desmet 2006) taught us that two major challenges arise when we want to apply IRT for adaptive item sequencing in such learning environments. A first challenge implies building a data set that yields sufficient and accurate information about the person and item characteristics. A second challenge is the question of the model to be applied for adaptive item selection in item-based ITSs. In what follows, each of those challenges will be discussed.

The challenge of the data set

The difference in the data-gathering procedure of learning and testing environments has implications for IRT application in learning environments. In learning environments, data are generally collected in a less structured way, resulting in a missing data problem. Moreover, analysis and interpretation can be hampered by the skipping of items.

Missing values

In item-based ITSs, learners are free to choose the exercises they want to make. This, combined with the possibly vast amount of exercises provided within a learning environment, leads to the finding that many exercises are only made by few learners. Even though IRT can deal with structurally incomplete data sets, the structure and huge amount of missing values found in the tracking and logging data of learning environments can easily lead to non-converging estimations of the IRT model parameters. Hence, the challenge of the missing values is twofold: we need to obtain the structure of data needed for item calibration and we need to obtain the amount of data needed for reliable item parameter estimation.

In practical applications of IRT, the number of items that have to be calibrated is often so large that it is practically not feasible to administer the entire set of items to one sample of learners. IRT models are able to analyse incomplete data under certain circumstances (Eggen 1993). More specifically, even though learners have not answered all items, a common measurement scale can be constructed on which all items can be located. There is, however, a problem regarding the structure of the data if there is no overlap between the items solved by one or several persons and those solved by other persons. Without making additional assumptions, this lack of overlap does not allow assessing the difficulty of the items and/or the ability of the persons since in IRT, this is performed relative to the other items or persons. Unfortunately, this lack of overlap might be encountered in learning environments, especially if learners are completely free in selecting items from a large item pool, as is the case in Franel.

A method that can be applied to obtain the structure needed for item calibration is to administer some common items to all learners. This would result in a calibration design such as the one presented in Fig 2.


This incomplete calibration design with a common items anchor has the advantage that neither the equivalence of learners nor the equivalence of items needs to be assumed. Additionally, an advantage for the course creator is that he or she has some control over the item administration, such that non-overlapping parts in the data set can be avoided. A drawback of the approach, however, is that at the end, we have a lot of information about the common items but much less about the other items. Moreover, while item development is a very expensive and time-intensive enterprise, not all items will be used to the same degree, and possibly, some items will hardly be used. This problem of unequal use of items might also occur in learning environments in which learners can choose items freely out of a fixed navigation menu but select items according to a certain pattern (for instance, learners try to solve the items in the order they are listed within the navigation menu). A solution to this problem is to vary the order of the items as listed in the learning environment.

For instance, if all learners select some items from the first topic, fewer items from the second topic, even fewer items from the third topic, etc., systematically rotating the order in which the topics are presented in the navigation menu (the last topic comes first, the first second, the second third and so on) could lead to a calibration design similar to the one presented in Fig 3. Applying this incomplete calibration design with a block-interlaced anchor within learning environments has two main advantages: an anchor–item effect is achieved while the number of administrations across items is kept more equal, and both the course creator and the learner have some control over the item administration. A drawback is that it is still possible, though less likely, to end up with a data set with non-overlapping parts.
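As a small illustration of such a rotation (the function name and the per-sample rotation step are our own illustrative choices, not part of Franel), the topic order shown to each learner sample could be generated as follows:

    def rotated_menu(topics, sample_index):
        # Rotate the topic order for a given learner sample so that, across
        # samples, every topic appears early in the navigation menu equally often.
        k = sample_index % len(topics)
        return topics[k:] + topics[:k]

    # Example: rotated_menu(['topic 1', 'topic 2', 'topic 3'], 1)
    # returns ['topic 2', 'topic 3', 'topic 1'].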

The amount of data refers to the data-centric character of IRT models, which implies that enough data are required to estimate the item parameters. Some researchers have used prior calibration in learning environments for item difficulty estimation (Tai et al. 2001; Guzmán & Conejo 2005; Chen & Duh 2008). However, this is a time-consuming and costly procedure and therefore less appropriate for learning environments with lots of items. Alternatively, online calibration can be employed, in which both the learner's knowledge level and the difficulty level of the new items are estimated during the learning sessions. Makransky and Glas (2010) explored a continuous updating strategy that yields a low mean average error for the ability estimate.

Fig 2 A common items design. The black cells represent the common items. The lighter cells represent the non-common items.

Fig 3 A block-interlaced anchoring design. The cells range from dark grey to pale grey. The lighter the cells, the smaller the number of items belonging to that item set that are made by each individual learner within the learner sample and the smaller the overlap across the different learners in items made.


In this continuous updating strategy, items are initially randomly selected. After each exposure, the items are (re)calibrated. Only when an item has been administered a certain number of times does it become eligible for adaptive item selection. A second issue is that more persons are required to achieve the same level of precision with larger item banks. Besides, when the number of available items becomes large, which is the case in our item-based ITS, it will take longer for randomly selected items to be administered a certain number of times in order to become eligible for adaptive item selection. A method to influence the administration or exposure rate of items, such that the amount of data needed for reliable item calibration is obtained faster than with random item selection, is applying an item exposure control algorithm. Item exposure control methods are often applied in CAT (Georgiadou et al. 2007). The underlying philosophy of CAT, namely selecting items that provide the most information about the current ability estimate, leads to overexposure of the most informative items, while other items, which provide less information, are rarely selected. Several item exposure control strategies have been proposed to prevent overexposure of some items and to increase the use rate of rarely or never selected items (Georgiadou et al. 2007). Therefore, in CAT, the objective of such item exposure control strategies is to equalize the frequency with which all items are administered. Yet, the objective of item exposure control strategies in item-based ITSs would rather be to increase the administration frequency of items that are close to being reliably calibrated. A major advantage of item exposure control strategies in item-based ITSs is the decrease in the time needed for items to become eligible for adaptive item selection. An item exposure control method that imposes item-ineligibility constraints on the assembly of shadow tests seems to be a reasonable solution that can be applied within item-based ITSs (Van der Linden & Veldkamp 2004, 2007). So far, this method has only been implemented in testing environments. In this item exposure control strategy, the selection of each new item is preceded by an online assembly of a shadow test. This implies that items are not selected directly from the item pool but from a shadow test. The shadow test is a full-sized test that is optimal at the person's current ability estimate and contains all items, administered or not, that meet all the constraints of the adaptive test. The optimal item at the ability estimate from the free items in the shadow test is subsequently selected for administration.

After the item is administered, the items in the shadow test that are not yet administered are returned to the item pool, the ability estimate is updated and the procedure is repeated. An asset of this method is that it can deal with a large range of constraints as long as there is a computer algorithm available. Hence, an item-ineligibility constraint can be implemented that reduces the exposure of overexposed items, that has a positive effect on the exposure rate of underexposed items (Van der Linden & Chang 2003; Van der Linden & Veldkamp 2004) or that, in the light of item-based ITSs, has a positive effect on the exposure rate of items that have an exposure rate close to the one required for reliable item calibration.
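A minimal sketch of the continuous updating idea described above (random administration until an item has been seen often enough, after which it becomes eligible for adaptive selection) is given below; the class name, the exposure threshold and the smoothed proportion-correct difficulty proxy are our own illustrative assumptions, not the recalibration procedure of Makransky and Glas (2010) nor the shadow-test method:

    import math
    import random

    class OnlineCalibratingPool:
        # Illustrative item pool: items are administered at random until they
        # have been seen min_exposures times; from then on they are eligible for
        # adaptive selection based on a running (placeholder) difficulty estimate.

        def __init__(self, item_ids, min_exposures=30):
            self.min_exposures = min_exposures
            self.stats = {i: {"n": 0, "correct": 0} for i in item_ids}

        def record(self, item_id, correct):
            # (Re)calibration hook: here we only update counts; a real system
            # would re-estimate the IRT item parameters after each exposure.
            self.stats[item_id]["n"] += 1
            self.stats[item_id]["correct"] += int(correct)

        def difficulty(self, item_id):
            # Placeholder difficulty proxy on a logit scale: items answered
            # correctly less often get a higher difficulty value.
            s = self.stats[item_id]
            p = (s["correct"] + 0.5) / (s["n"] + 1.0)  # smoothed proportion correct
            return -math.log(p / (1.0 - p))

        def next_item(self, theta):
            eligible = [i for i, s in self.stats.items() if s["n"] >= self.min_exposures]
            if eligible:
                # Adaptive selection: difficulty closest to the current ability estimate.
                return min(eligible, key=lambda i: abs(self.difficulty(i) - theta))
            # Otherwise keep gathering data through random administration.
            return random.choice(list(self.stats))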

Skipped items

Besides the large amount and pattern of missing values, intentionally skipped items may pose problems. In Franel, learners are not constrained to answer each selected item. The decision to skip an item may depend on the learner's knowledge level and on the item difficulty, making the missing data neither missing at random nor missing completely at random. Research has indicated that treating the omitted responses as wrong is not appropriate (Lord 1983; Ludlow & O'Leary 1999; De Ayala et al. 2001). Goegebeur et al. (2006) proposed an IRT model for dealing with this kind of omitted responses, more specifically, responses missing not at random (MNAR), in a testing environment. This IRT model models the missing observations simultaneously with the observed responses by treating the data as if there were three response categories: correct, incorrect and no response. The model also accounts for test speed. Goegebeur et al. (2006) applied this model to the Chilean Sistema de Medición de la Calidad de la Educación mathematics test data set, and the results show that the parameter estimates are improved with this model compared with the estimates based on an extended one-parameter logistic model of complete profiles. However, the authors emphasize that caution is in order since the results can be strongly affected by the mechanism underlying the MNAR framework.

The challenge of the algorithm

Next to the challenge of the data set, the IRT application in learning environments is confronted with a second challenge, namely the challenge of the algorithm.


This challenge is partly derived from the difference in the data-gathering procedure between learning and testing environments and partly from the difference in the objective of learning and testing environments. We divide the challenge of the algorithm into three subproblems, namely the problem of estimating the item difficulty level, the problem of estimating the learner's ability level and the problem of defining the item selection algorithm.

Item difficulty estimation

Because of the different problems facing IRT-based item calibration in learning environments that arise from the data-gathering procedure, more specifically the missing values and the skipped items, other approaches to estimate the item difficulty level have been proposed.

A simple approach to estimate item difficulty that is not directly related to the IRT model is the number of learners who have answered the item correctly divided by the number of learners who have answered the item. This proportion of correct answers has the benefit that it is not based on a prior study but can be calculated online. The lower the proportion of persons who answer an item correctly, the more difficult the item is. Johns et al. (2006) have compared the item difficulty levels obtained by training an IRT model with the proportion of correct answers. Even though the correlation between those two measures of item difficulty across 70 items was relatively high (r = +0.68), the proportion of correct answers is subject to the knowledge level of the learners who have answered that item and to the number of learners who have answered that item. Hence, although attractive due to its simplicity, the approach throws away an important strength of IRT: the proportion of correct answers for an item depends on the sample of persons who answered the item, and therefore, these proportions are only comparable over items if the groups of persons are comparable. Another method that has been used to estimate item difficulty in AHs is by means of the learner's feedback on simple questions after each item (e.g. Chen et al. 2005, 2006; Chen & Duh 2008), such as 'Do you understand the content of the recommended course materials?' After a learner has given feedback, the scores are aggregated with those of other learners who previously answered this question.

The new difficulty level of the course material is based on a weighted linear combination of the course difficulty as defined by course experts and the course difficulty determined from the collaborative feedback of the learners. The difficulty parameters slowly approach a steady value as the number of learners increases. Another approach to obtain item parameter estimates is allowing subject domain experts to estimate the value of the difficulty parameter (Yao 1991; Linacre 2000; Fernandez 2003; Lu et al. 2007). There is some evidence in the measurement literature that test specialists are capable of estimating item difficulties with reasonable accuracy (e.g. Chalifour & Powers 1989), although other studies found contradictory results (Hambleton et al. 1998). Another method that has been used in testing environments to estimate the item difficulty level of new items is the paired comparison method (Ozaki & Toyoda 2006, 2009). In this method, items for which the difficulty parameter has to be estimated are compared with multiple items of which the item difficulty parameter is known. This is performed sequentially or simultaneously, i.e. a one-to-one comparison or a one-to-many comparison, respectively. Learner feedback, expert feedback and paired comparison have two common limitations. First, these three alternative estimation approaches are less applicable for the calibration of an entire, large, already existing item bank, since asking the learner for feedback after each item administration requires both considerable time and mental effort, possibly interrupting the learning process. However, these estimation approaches seem to be viable solutions for estimating the item difficulty level of newly added items. Second, these estimation approaches are more subjective than IRT-based calibration.
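As a sketch of the weighted linear combination used in the learner-feedback approach (the function name and the default weight are our own assumptions for illustration, not values taken from the cited studies):

    def collaborative_difficulty(expert_difficulty, feedback_scores, expert_weight=0.5):
        # Weighted linear combination of an expert-assigned difficulty and the
        # difficulty suggested by the aggregated learner feedback on the item.
        learner_estimate = sum(feedback_scores) / len(feedback_scores)
        return expert_weight * expert_difficulty + (1.0 - expert_weight) * learner_estimate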

Ability estimation

The objective of adaptive learning environments is to foster efficient learning by providing learners with a personalized course/item sequence. Obtaining the learner's ability estimate at an early stage of the learning process and following the learning curve adequately are required in order for adaptive learning environments to be efficient. Hence, the problem of the ability estimation is twofold. On the one hand, we need to estimate the learner's ability level when little information is provided, which is referred to as the cold start problem. On the other hand, we need to be able to follow the progression of the learner.


The cold start problem refers to the situation where a learner starts working in the learning environment. After having answered a few items, the ability estimate is not very accurate, reducing the efficiency of adaptive item selection. Therefore, two partial solutions can be provided. On the one hand, it can be feasible to use personal information, such as previous education, occupation, age and gender, to obtain and improve the learner's ability estimate. This technique can be extended by assigning weights to these criteria to model their relative importance (Masthoff 2004). On the other hand, it is advisable to apply other item selection methods than the one based on maximizing the Fisher information at the current estimated ability level, because using Fisher information when the estimated ability level is not close to the true ability level could be less efficient than assumed. A potential solution to this problem is modifying the item selection algorithm by taking into consideration the uncertainty of the estimated ability level. Several modified item selection algorithms have been proposed: Fisher interval information (Veerkamp & Berger 1997), Fisher information with a posterior distribution (Van der Linden 1998), Kullback–Leibler information (Chang & Ying 1996) and Kullback–Leibler information with a posterior distribution (Chang & Ying 1996). For a comparison of these item selection algorithms, see Chen et al. (2000).

The problem of the evolving ability level arises because learning is likely to take place on the basis of hints and feedback. Besides the change in the ability level within a learning session, the ability of the learner might increase between two successive learning sessions, for instance due to learning in other courses they follow, or might decrease, for instance due to forgetting. This monitoring of the learner's progress is an important research focus in the field of educational measurement. One method is progress testing. In progress testing, tests are frequently administered to allow for a quick intervention when atypical growth patterns are observed. However, a major drawback of this approach is that it requires regular ability assessments that should be long enough to be accurate, making it less appropriate for learning environments. Another method to model the learner's ability progress is by updating the learner's knowledge level after each item administration, as is the case in the Elo rating system (Elo 1978), which was recently implemented in the educational field (Brinkhuis & Maris 2009).

In item-based ITSs, the Elo rating system can be seen as an instance of paired comparison where the learner is seen as a player and the item is seen as its opponent. In the formula of Brinkhuis and Maris (2009), the new ability rating after an item administration is a function of the pre-administration rating, a weight given to the new observation and the difference between the actual score on the new observation and the expected score on the new observation. This expected score is calculated by means of the Rasch model. This means that when the difference between the expected score and the observed score is large, the change in the ability estimate is large. A merit of this algorithm is that it makes it possible to quickly follow changing knowledge levels and to rapidly perceive an atypical growth pattern.
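A sketch of such an update under the above description (the weights k_learner and k_item, and the symmetric item update, are our own illustrative choices; the text only specifies the ability update with a Rasch-based expected score):

    import math

    def elo_update(theta, b, observed, k_learner=0.4, k_item=0.4):
        # Expected score: Rasch probability that the learner answers the item correctly.
        expected = 1.0 / (1.0 + math.exp(-(theta - b)))
        # The ability rating moves up when the learner does better than expected.
        theta_new = theta + k_learner * (observed - expected)
        # Optional Elo-style counterpart: the item difficulty moves the other way.
        b_new = b - k_item * (observed - expected)
        return theta_new, b_new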

Item selection algorithm

The difference in objectives between a testing and a learning environment also asks for a revision of the item selection algorithm as it is implemented in testing environments.

The objective of a testing environment is to measure as precisely as possible the person's ability level. It can be shown that for the Rasch model, items with a success probability of 50% are optimally informative. However, the administration of items for which the person has a success rate of only 50% can decrease the person's motivation (Andrich 1995) and increase test anxiety (Rocklin & Thompson 1985), possibly resulting in an underestimation of the person's ability (Betz & Weiss 1976; Rocklin & Thompson 1985). Administering easier items in order to increase motivation and decrease test anxiety is non-optimal from a psychometric point of view. However, the impact on the measurement precision and test length is modest (Bergstorm et al. 1992; Eggen & Verschoor 2006). Another method applied in CAT resulting in a decrease in test anxiety and an increase in motivation and performance is allowing the person to select the item difficulty level of the next item from among a number of difficulty levels (Wise et al. 1992; Vispoel & Coffman 1994; Roos et al. 1997). Although most persons choose a difficulty level that lies close to their estimated ability level (Johnson et al. 1991; Wise et al. 1992), a problem with these self-adapted tests still remains that a few persons choose a difficulty level that is not well-matched to their ability, resulting in a flaw in the psychometric demands.


In contrast, the objective of a learning environment is to optimize learning efficiency. The effect of motivation herein is even of greater importance than in testing, and especially than in high-stakes testing. Unmotivated learners will likely stop using the learning environment. Hence, especially for learning environments, items for which the person has a success probability of above 50% should be considered. Next to increasing the success probability in order to increase motivation, it is possible to give the learner some control over the item difficulty level, comparable to self-adapted tests. Besides motivation, the learning outcome should also be kept optimal. Research results regarding the effect of item difficulty on the learning progress are not conclusive. Some studies have indicated that learners learn more from easy items (Pavlik & Anderson 2008), while some theories suggest that learners learn more from items that are slightly more difficult given the learner's ability level (Hootsen et al. 2007), and yet other studies have indicated that learners learn as much from easy groups of learning opportunities as from difficult ones (Feng et al. 2008).

Discussion and conclusion

Because IRT has been a prevalent approach for adaptive item sequencing in testing environments, the question tackled within this paper is whether IRT is suitable for adaptive item sequencing in item-based ITSs. Based on our own experiences with item-based ITSs and ideas borrowed from testing environments and other types of learning environments, we identified some challenges in applying IRT for adaptive item sequencing in item-based ITSs and made some suggestions to handle these challenges (see Table 1).

A first main challenge that is brought up is that the structure of the data set and the extent of missing values can make item difficulty estimation problematic. Solutions that are suggested include the implementation of a calibration design, such as a common items design and a block-interlaced anchoring design. The former has the advantage that neither the equivalence of learners nor the equivalence of items needs to be assumed. However, the common items are administered more frequently and are therefore more reliably estimated than the non-common items. The latter has the same advantages as the common items design without its disadvantage.

Because learners seem to like some control over the learning system (e.g. selecting the chapter), a block-interlaced anchoring design seems to be preferable in item-based ITSs as it provides both the course creator and the learner some control over the item administration. Furthermore, an online calibration method to evolve from random item administration to adaptive item administration is suggested. The advantage of this continuous updating strategy is that items can easily be added to the learning environment without the requirement of prior calibration while keeping the measurement error of the learner's knowledge level low. Finally, an item exposure control algorithm can be specified that makes items for which the item exposure rate is close to the one required for reliable item difficulty estimation eligible for adaptive item administration. This item exposure control method, which imposes item-ineligibility constraints, has the benefit of reducing the time needed for item calibration. Combining the proposed solutions might also be considered. For example, when we want to measure a learner's ability level while calibrating an item bank, it might be reasonable to combine a continuous updating calibration strategy with an item exposure control algorithm. In such situations, a part of the learning session is composed of randomly selected items with an unknown difficulty level. In another part, items are selected that have already been administered a certain number of times but for which the difficulty level cannot yet be reliably estimated. Items are selected that are closest to being reliably calibrated, and the item difficulty parameter estimate is updated after each item administration. In a last part, adaptive item selection can be applied using items for which the difficulty level is already sufficiently known. A difficulty is that we do not know in advance how many items a learner will complete in one learning session.

A second main challenge, largely intertwined with the challenge of the data set, concerns the implemented algorithms, which are composed of the item difficulty estimation, the learner's ability level estimation and the item selection algorithm. The alternative applicable techniques for item difficulty estimation that are proposed in this paper are proportion correct, learner's feedback, expert rating and paired comparison. Compared with IRT-based calibration, these alternative techniques are more sample-dependent (proportion correct) or more subjective because persons have to make a decision (learner's feedback, expert rating and paired comparison).


Table 1. Challenges and proposed solutions of applying IRT for adaptive item sequencing in item-based ITSs.

Challenge: Data set

Problem: Missing values – structure of data
• Proposed solution: Common items design. Advantages: no assumption needed on the equivalence of learners and items. Disadvantages: non-equal administration of items. Application: the calibration of an item bank. Reference: Eggen 1993.
• Proposed solution: Block-interlaced anchoring design. Advantages: no assumption needed on the equivalence of learners and items; more equal administration of all items. Application: the calibration of an item bank. Reference: Eggen 1993.

Problem: Missing values – amount of data
• Proposed solution: Continuous updating strategy. Advantages: easiness in adding items to the item pool without prior calibration; low measurement error in ability estimation. Disadvantages: length of the learning session is preferably known. Application: the calibration of an item bank. Reference: Makransky and Glas 2010.
• Proposed solution: Item-ineligibility constraints on the assembly of shadow tests. Advantages: faster calibration. Application: the calibration of an item bank. Reference: Van der Linden & Veldkamp 2004, 2007.

Problem: Skipped items
• Proposed solution: Simultaneous modelling of the missing observations with the observed responses. Advantages: more accurate parameter estimates. Disadvantages: more complex model and hence more data required for estimation. Application: estimation of the learner's ability level. Reference: Goegebeur et al. 2006.

Challenge: Algorithm

Problem: Item difficulty
• Proposed solution: Proportion correct. Advantages: always estimable. Disadvantages: subject to the sample of learners used. Applications: estimation of the item difficulty level when item calibration is not possible; prior in Bayesian estimation algorithm. Reference: Johns et al. 2006.
• Proposed solution: Learner's feedback. Advantages: always estimable. Disadvantages: interrupts the learning process; time consuming. Applications: estimation of the item difficulty level of new items; prior in Bayesian estimation algorithm. Reference: Chen et al. 2005.
• Proposed solution: Expert's feedback. Advantages: always estimable. Disadvantages: time consuming. Applications: estimation of the item difficulty level of new items; prior in Bayesian estimation algorithm. Reference: Linacre 2000.
• Proposed solution: Paired comparison. Advantages: always estimable. Disadvantages: interrupts the learning process; time consuming. Applications: estimation of the item difficulty level of new items; prior in Bayesian estimation algorithm. Reference: Ozaki & Toyoda 2006, 2009.

Problem: Ability level
• Proposed solution: Weighted adaptation based on background characteristics. Advantages: indication of the learner's ability level. Disadvantages: loss of some information. Applications: ability level estimation on the basis of few responses; in combination with the Elo rating system. Reference: Masthoff 2004.
• Proposed solution: Alternative item selection algorithm. Advantages: more efficient than Fisher information when the ability level is unknown. Disadvantages: more radical than weighted adaptation based on background characteristics. Application: ability level estimation on the basis of few responses. Reference: Chen et al. 2000.
• Proposed solution: Elo rating system. Advantages: perception of atypical learning patterns; easily extendible. Application: tracking of the ability level in item-based learning environments. Reference: Brinkhuis & Maris 2009.


Besides, learner's feedback, expert rating and paired comparison are less applicable for the calibration of an entire item bank, as this is time-consuming and can interrupt the learning process. Nevertheless, all the mentioned alternative estimation methods can be applied for item difficulty estimation of new items. Furthermore, we can implement the item difficulty values that are obtained with one of those alternative estimation methods into the IRT-based estimation algorithm, yielding a faster acquisition of reliable item difficulty estimates (Swaminathan et al. 2003). The algorithm to estimate the learner's ability level faces two problems: the cold start problem and the problem of the evolving ability level. The former problem can be solved by weighted adaptation based on background characteristics and a modified item selection algorithm. The latter problem can be solved by implementing the Elo rating system to track the learner's ability. The advantages of the Elo rating system are that it can quickly follow the change in knowledge level and that it can easily be extended. An example of this last advantage is the incorporation of background characteristics. When the learner starts working in the learning environment, a realistic starting value of the learner's knowledge level based on background characteristics can enhance the estimation process in the Elo rating system. The problem of the item selection algorithm broaches the question of what item selection criterion should be implemented and whether learners should have some control over it. The first question focuses on whether to select items that maximize the measurement precision, thereby leaving the learner with a probability of 50% of answering an item correctly, or whether to select easier or more difficult items. This question should be explored in more detail in further experimental research. The second question concerns the learner's control over the item selection algorithm. More specifically, should the learner be provided the freedom to choose the item difficulty level of the next item or not? Studies in CAT have shown that giving a person control leads to higher motivation. However, because persons do not always select the item difficulty that matches their current ability estimate, the measurement precision is biased. We could combine this research result with the research results regarding the modest impact of using psychometrically non-optimal items on the measurement precision.

This could yield an item-based ITS where learners have the possibility to select the item difficulty level of the next item out of a specific item difficulty range, chosen such that it does not have a large impact on the measurement precision. In such a situation, the learners perceive that they have some control over the learning environment, which might lead to higher motivation.

We can conclude that IRT is potentially a valuable method to adapt the item sequencing in item-based ITSs to the learner's knowledge level. Applying IRT to learning environments, however, is not straightforward. In this paper, we wanted to identify associated problems and suggested solutions based on the literature on testing and learning environments, whether they were IRT-based or not. Further research in which these ideas are empirically evaluated is required.

References

Andrich D. (1995) Review of the book Computerized Adaptive Testing: A Primer. Psychometrika 4, 615–620.

Baylari A. & Montazer G.A. (2009) Design a personalized e-learning system based on item response theory and artificial neural network approach. Expert Systems with Applications 36, 8013–8021.

Bergstorm B.A., Lunz M.E. & Gershon R.C. (1992) Altering the level of difficulty in computer adaptive testing. Applied Measurement in Education 5, 137–149.

Betz N.E. & Weiss D.J. (1976) Psychological Effect of Immediate Knowledge of Results and Adaptive Ability Testing (Rep. No. 76-4). University of Minnesota, Department of Psychology, Psychometric Methods Program, Minneapolis, MN.

Bloom B.S. (1984) The 2-sigma problem: the search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher 13, 4–16.

Brinkhuis M.J.S. & Maris G. (2009) Dynamic Parameter Estimation in Student Monitoring Systems. Measurement and Research Department Reports (Rep. No. 2009-1). Cito, Arnhem.

Brusilovsky P. (1999) Adaptive and intelligent technologies for Web-based education. Künstliche Intelligenz 4, 19–25.

Brusilovsky P. & Peylo C. (2003) Adaptive and intelligent Web-based educational systems. International Journal of Artificial Intelligence in Education 13, 156–169.

Chalifour C.L. & Powers D.E. (1989) The relationship of content characteristics of GRE analytical reasoning items to their difficulties and discriminations. Journal of Educational Measurement 26, 120–132.

Chang H.H. & Ying Z.L. (1996) A global information approach to computerized adaptive testing. Applied Psychological Measurement 20, 213–229.


Chen C.M. & Chen M.C. (2009) Mobile formative assessmenttool based on data mining techniques for supporting Web-based learning. Computers & Education 52, 256–273.

Chen C.M. & Chung C.J. (2008) Personalized mobile Englishvocabulary learning system based on item response theoryand learning memory cycle. Computers & Education 51,624–645.

Chen C.M. & Duh L.J. (2008) Personalized Web-based tutor-ing system based on fuzzy item response theory. ExpertSystems with Applications 34, 2298–2315.

Chen C.M., Hsieh Y.L. & Hsu S.H. (2007) Mining learnerprofile utilizing association rule for Web-based learningdiagnosis. Expert Systems with Applications 33, 6–22.

Chen C.M. & Hsu S.H. (2008) Personalized intelligent mobilelearning system for supporting effective English learning.Educational Technology & Society 11, 153–180.

Chen C.M., Lee H.M. & Chen Y.H. (2005) Personalizede-learning system using item response theory. Computers& Education 44, 237–255.

Chen C.M., Liu C.Y. & Chang M.H. (2006) Personalized cur-riculum sequencing utilizing modified item response theoryfor Web-based instruction. Expert Systems with Applica-tions 30, 378–396.

Chen S.Y., Ankenmann R.D. & Chang H.H. (2000) Acomparison of item selection rules at the early stages ofcomputerized adaptive testing. Applied PsychologicalMeasurement 24, 241–255.

Collins J.A., Greer J.E. & Huang S.X. (1996) Adaptive assess-ment using granularity hierarchies and Bayesian nets. InProceedings of the Third International Conference on Intel-ligent Tutoring Systems (eds C. Frasson, G. Gauthier & A.Lesgold), pp. 569–577. Springer-Verlag, London.

Conati C., Gertner A. & Vanlehn K. (2002) Using Bayesiannetworks to manage uncertainty in student modeling. UserModeling and User-Adapted Interaction 12, 371–417.

Conejo R., Guzmán E., Millán M.T.E., Trell M., Perez-de-la-Cruz J.L. & Rios A. (2004) SIETTE: a Web-based tool foradaptive testing. International Journal of Artificial Intelli-gence in Education 14, 29–61.

De Ayala R.J., Plake B.S. & Impara J.C. (2001) The impact ofomitted responses on the accuracy of ability estimation initem response theory. Journal of Educational Measurement38, 213–234.

Desmet P. (2006) L’apprentissage/enseignement des languesà l’ère du numérique: tendances récentes et défis. RevueFrançaise de Linguistique Appliquée 11, 119–138.

Eggen T.J.H.M. (1993) Itemresponstheorie en onvolledigegegevens. In Psychometrie in de Praktijk (eds T.J.H.M.Eggen & P.F. Sanders), pp. 239–284. Cito, Arhnem.

Eggen T.J.H.M. & Verschoor A.J. (2006) Optimal testingwith easy or difficult items in computerized adaptive

testing. Applied Psychological Measurement 30, 379–393.

Elo A.E. (1978) The Rating of Chess Players, Past andPresent. B.T. Batsford Ltd., London.

Feng M., Heffernan N., Beck J.E. & Koedinger K. (2008)Can we predict which groups of questions students willlearn from? In Proceedings of the 1st International Con-ference on Education Data Mining (eds R.S.J.D. Baker& J.E. Beck), pp. 218–225. Retrieved from http://www.educationaldatamining.org/EDM2008/uploads/proc/full%20proceedings.pdf.

Fernandez G. (2003) Cognitive scaffolding for a Web-basedadaptive learning environment. In Advances in Web-BasedLearning, Lecture Notes on Computer Science (eds G.Goos, J. Hartmanis & J. van Leeuwen) 2783, pp. 12–20.Springer, Berlin.

Georgiadou E., Triantafillou E. & Economides A.A. (2007) Areview of item exposure control strategies for computerizedadaptive testing developed from 1983 to 2005. Journal ofTechnology, Learning and Assessment 5, 4–38.

Goegebeur Y., De Boeck P., Molenberghs G. & Pino G. (2006) A local-influence-based diagnostic approach to a speeded item response theory model. Journal of the Royal Statistical Society Series C, Applied Statistics 55, 647–676.

Guzmán E. & Conejo R. (2005) Self-assessment in a feasible, adaptive Web-based testing system. IEEE Transactions on Education 48, 688–695.

Hambleton R.K., Bastari B. & Xing D. (1998) Estimating Item Statistics. Laboratory of Psychometric and Evaluative Research (Rep. No. 298). University of Massachusetts, School of Education, Amherst, MA.

Hootsen G., van der Werf R. & Vermeer A. (2007) E-learning op maat: automatische geïndividualiseerde materiaalselectie in het tweede-taalonderwijs. Toegepaste Taalwetenschap in Artikelen 78, 119–130.

Huang S.X. (1996) A content-balanced adaptive testing algorithm for computer-based training systems. In Intelligent Tutoring Systems, Lecture Notes in Computer Science (eds C. Frasson, G. Gauthier & A. Lesgold), pp. 306–314. Springer, Heidelberg.

Johns J., Mahadevan S. & Woolf B. (2006) Estimating student proficiency using an item response theory model. Intelligent Tutoring Systems, Lecture Notes in Computer Science 4053, 473–480.

Johnson P.L., Roos L.L., Wise S.L. & Plake B.S. (1991) Correlates of examinee item choice behavior in self-adapted testing. Mid-Western Educational Researcher 4, 25–29.

Kelly D. & Tangney B. (2006) Adapting to intelligence profile in an adaptive educational system. Interacting with Computers 18, 385–409.


Lee M.G. (2001) Profiling students’ adaptation styles in Web-based learning. Computers & Education 36, 121–132.

Leung E.W.C. & Li Q. (2007) An experimental study of a personalized learning environment through open-source software tools. IEEE Transactions on Education 50, 331–337.

Linacre J.M. (2000) Computer-adaptive testing: a methodology whose time has come. In Development of Computerized Middle School Achievement Test (eds S. Chae, U. Kang, E. Jeon & J.M. Linacre), pp. 3–58. Komesa Press, Seoul.

Lord F.M. (1970) Some test theory for tailored testing. In Computer-Assisted Instruction, Testing, and Guidance (ed. W.H. Holtzman), pp. 139–183. Harper & Row, New York.

Lord F.M. (1983) Maximum-likelihood estimation of item response parameters when some responses are omitted. Psychometrika 48, 477–482.

Lu F., Li X., Liu Q.T., Yang Z.K., Tan G.X. & He T.T. (2007) Research on personalized e-learning system using fuzzy set based clustering algorithm. Computational Science – ICCS 2007 4489, 587–590.

Ludlow L.H. & O’Leary M. (1999) Scoring omitted and not-reached items: practical data analysis implications. Educational and Psychological Measurement 59, 615–630.

Makransky G. & Glas C.A.W. (2010) Bootstrapping an item bank: an automatic online calibration design in adaptive testing. Journal of Applied Testing Technology (in press).

Masthoff J. (2004) Group modeling: selecting a sequence of television items to suit a group of viewers. User Modeling and User-Adapted Interaction 14, 37–85.

Millán E., Garcia-Hervas E., Riscos E.G.D., Rueda A. & Perez-de-la-Cruz J.L. (2003) TAPLI: an adaptive Web-based learning environment for linear programming. Current Topics in Artificial Intelligence 3040, 676–685.

Millán E., Perez-de-la-Cruz J.L. & Suarez E. (2000) Adaptive Bayesian networks for multilevel student modeling. Intelligent Tutoring Systems, Proceedings 1839, 534–543.

Ozaki K. & Toyoda H. (2006) Paired comparison IRT model by 3-value judgment: estimation of item parameters prior to the administration of the test. Behaviormetrika 33, 131–147.

Ozaki K. & Toyoda H. (2009) Item difficulty parameter estimation using the idea of the graded response model and computerized adaptive testing. Japanese Psychological Research 51, 1–12.

Paramythis A. & Loidl-Reisinger S. (2004) Adaptive learning environment and e-learning standards. Electronic Journal of eLearning 2, 181–194.

Pavlik P.I. & Anderson J.R. (2008) Using a model to compute the optimal schedule of practice. Journal of Experimental Psychology: Applied 14, 101–117.

Pérez-Marín D., Alfonseca E. & Rodriguez P. (2006) On the dynamic adaptation of computer assisted assessment of free-text answers. In Adaptive Hypermedia and Adaptive Web-Based Systems, Lecture Notes in Computer Science (eds V. Wade, H. Ashman & B. Smith) 4018, pp. 374–377. Springer, Berlin.

Rasch G. (1960) Probabilistic Models for Some Intelligence and Attainment Tests. Institute of Educational Research, Copenhagen.

Razzaq L., Feng M., Nuzzo-Jones G., Heffernan N.T., Koedinger K.R., Junker B., Ritter S., Knight A., Aniszczyk C., Choksey S., Livak T., Mercado E., Turner T.E., Upalekar R., Walonoski J.A., Macasek M.A. & Rasmussen K.P. (2005) The Assistment project: blending assessment and assisting. In Proceedings of the 12th International Conference on Artificial Intelligence in Education (eds C.K. Looi, G. McCalla, B. Bredeweg & J. Breuker), pp. 555–562. IOS Press, Amsterdam.

Rocklin T. & Thompson J.M. (1985) Interactive effects of test anxiety, test difficulty, and feedback. Journal of Educational Psychology 77, 368–372.

Roos L.L., Wise S.L. & Plake B.S. (1997) The role of item feedback in self-adapted testing. Educational and Psychological Measurement 57, 85–98.

Swaminathan H., Hambleton R.K., Sireci S.G., Xing D.H. & Rizavi S.M. (2003) Small sample estimation in dichotomous item response models: effect of priors based on judgmental information on the accuracy of item parameter estimates. Applied Psychological Measurement 27, 27–51.

Tai D.W.-S., Tsai T.-A. & Chen F.M.-C. (2001) Performance study on learning Chinese keyboarding skills using the adaptive learning system. Global Journal of Engineering Education 5, 153–161.

Van der Linden W.J. (1998) Bayesian item selection criteria for adaptive testing. Psychometrika 63, 201–216.

Van der Linden W.J. & Chang H.H. (2003) Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach. Applied Psychological Measurement 27, 107–120.

Van der Linden W.J. & Glas C.A.W. (2000) Computerized Adaptive Testing: Theory and Practice. Kluwer, Norwell, MA.

Van der Linden W.J. & Hambleton R.K. (1997) Handbook of Modern Item Response Theory. Springer, New York.

Van der Linden W.J. & Veldkamp B.P. (2004) Constraining item exposure in computerized adaptive testing with shadow tests. Journal of Educational and Behavioral Statistics 29, 273–291.

Van der Linden W.J. & Veldkamp B.P. (2007) Conditional item-exposure control in adaptive testing using item-ineligibility probabilities. Journal of Educational and Behavioral Statistics 32, 398–418.

Veerkamp W.J.J. & Berger M.P.F. (1997) Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics 22, 203–226.

Verdú E., Regueras L.M., Verdú M.J., De Castro J.P. & Pérez M.A. (2008) An analysis of the research on adaptive learning: the next generation of e-learning. WSEAS Transactions on Information Science and Applications 5, 859–868.

Vispoel W.P. & Coffman D.D. (1994) Computerized-adaptive and self-adapted music-listening tests: psychometric features and motivational benefits. Applied Measurement in Education 7, 25–51.

Wainer H. (2000) Computerized Adaptive Testing: A Primer. Erlbaum, London.

Wainer H. & Mislevy R.J. (2000) Item response theory, item calibration, and proficiency estimation. In Computerized Adaptive Testing: A Primer (ed. H. Wainer), pp. 61–100. Erlbaum, London.

Wasson B. (1993) Automating the development of intelligent learning environments: a perspective on implementation issues. In Automated Instructional Design, Development and Delivery (ed. R.D. Tennyson), pp. 153–170. Springer, Berlin.

Wise S.L., Plake B.S., Johnson P.L. & Roos L.L. (1992) A comparison of self-adapted and computerized adaptive tests. Journal of Educational Measurement 29, 329–339.

Yao T. (1991) CAT with a poorly calibrated item bank. Rasch Measurement Transactions 5, 141.

