
IRT Fixed Parameter Calibration and Other Approaches to Maintaining Item Parameters on a Common Ability Scale

Seonghoon Kim, PhD
Keimyung University

Email: [email protected]

Presented at Measured Progress on July 10, 2008

Overview

I. Nature of IRT Ability Scale
II. Three Approaches to Maintaining Item Parameters on a Common Scale
III. Principle of Fixed Parameter Calibration (FPC)
IV. Use of Computer Programs for FPC
V. Applications of FPC for Scaling and Equating

Reference Guide

This presentation was prepared based on my articles,

Kim, S. (2006a). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.

Kim, S. (2006b). A study on IRT fixed parameter calibration methods using BILOG-MG. Journal of Educational Evaluation, 19(1), 323-342.

Kim, S., & Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19(4), 357-381.

Kim, S., & Lee, W. (2006). An extension of four IRT linking methods for mixed-format tests. Journal of Educational Measurement, 43(1), 53-76.

Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32(4), 371-397.

and on my recent thoughts and work on FPC.

I. Nature of IRT Ability Scale: Indeterminacy in IRT Modeling

Item response function (IRF) and metrics
  Two-parameter logistic (2PL) model:
    P(θ | a, b) = 1 / [1 + exp(-Da(θ - b))]
  Suppose that θO = A θN + B.
  If aO = aN / A and bO = A bN + B, then
    P(θO | aO, bO) = P(θN | aN, bN)

Therefore, the IRF is invariant under a linear transformation of the ability scale, provided that the item parameters are transformed accordingly.

Thus, in practice, either θO or θN can be used; this is what scale indeterminacy means.
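The invariance stated above can be checked numerically. The following is a minimal sketch in Python; the values of A, B, the item parameters, and the ability point are hypothetical and chosen only for illustration, and D = 1.7 is assumed as the scaling constant.

```python
import math

D = 1.7  # assumed scaling constant for the logistic model


def irf_2pl(theta, a, b):
    """2PL item response function P(theta | a, b)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical values, for illustration only.
A, B = 1.2, 0.5        # theta_O = A * theta_N + B
a_n, b_n = 0.9, -0.3   # item parameters on the new scale
theta_n = 0.8          # an ability value on the new scale

# Transform the ability value and the item parameters to the old scale.
theta_o = A * theta_n + B
a_o = a_n / A
b_o = A * b_n + B

p_new = irf_2pl(theta_n, a_n, b_n)
p_old = irf_2pl(theta_o, a_o, b_o)

# The two probabilities agree up to floating-point error.
print(abs(p_new - p_old) < 1e-12)
```

The check works because aO(θO - bO) = (aN / A)(A θN + B - A bN - B) = aN(θN - bN), so the exponent of the IRF is unchanged.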

I. Nature of IRT Ability Scale: “0, 1” Scaling vs. Rasch Scaling

“0, 1” scaling
  Scaling by arbitrarily assuming that the mean (M) and standard deviation (SD) of the ability distribution are equal to 0 (origin) and 1 (unit).
  Such arbitrary but “standardized” fixing is unavoidable when the M and SD are unknown.

Rasch scaling
  Setting the origin (0) of the scale at the average difficulty of all items involved, while fixing the unit at 1.
  The fixed unit is guaranteed by the Rasch model.

I. Nature of IRT Ability Scale: Need for a Fixed Common Ability Scale

A fixed common scale should be used across test administrations for several reasons:
  To check the invariance property of item parameters
  To achieve comparability between item parameters from different administrations
  To develop an item pool
  To conduct IRT equating

I. Nature of IRT Ability Scale: Need for a Fixed Common Ability Scale

Developing a common ability scale requires all new scales to be linked to the fixed old scale θO.

[Figure: the new scales θN1, θN2, and θN3 are each linked to the fixed old scale θO.]

I. Nature of IRT Ability Scale: Factors for Development of a Common Scale

Development of a fixed common scale is subject to
  Data collection design for IRT scaling and equating test forms
    Random groups design vs. common-item nonequivalent groups design
  Scaling convention
    “0, 1” scaling vs. Rasch scaling
  Item parameter estimation method
    Marginal maximum likelihood (MML) estimation vs. joint maximum likelihood (JML) estimation

The Context Assumed in This Presentation

Data collection design for IRT scaling and equating test forms
  Common-item nonequivalent groups (CING) design
  Anchor items (i.e., common items) link two test forms

Scaling convention
  “0, 1” scaling
    Group dependent
    In a random groups design, two “0, 1” scales from alternative forms may be considered equivalent.

Marginal maximum likelihood (MML) estimation
  Estimation of item parameters
  Estimation of the underlying ability distribution
    Quadrature weights are estimated at quadrature points.

Data Structure Illustration for the CING Design

                     Old Form Unique Items   Common Items (Anchor)   New Form Unique Items
Old Form (Group 1)   administered            administered            not administered
New Form (Group 2)   not administered        administered            administered

II. Three Approaches to Maintaining a Common Scale

Separate calibration by form and linking
  Estimate transformation coefficients A and B using two sets of item parameter estimates for the anchor items
  Use A and B to transform new form item parameter estimates into those on the old scale

Fixed parameter calibration (FPC)
  Holding the old form anchor item parameters fixed and estimating the new form non-anchor item parameters

Concurrent calibration (aka multiple-group estimation)
  Combining new and old form data and estimating all item parameters together with the underlying ability distributions, with the old group designated as the reference-scale group
  Will not be addressed in detail in this presentation

II. Maintaining the Old Scale: Separate Calibration by Form and Linking

“0, 1” scales from two test forms
  Old form scale: θO (reference)
  New form scale: θN (arbitrary)

Scheme of linking two “0, 1” scales
  θO = A θN + B

[Figure: two number lines, θN (arbitrary origin and unit) and θO (fixed origin and unit), each marked at -1, 0, and 1. B marks the shift of origin and A the change of unit between the two scales.]

II. Maintaining the Old Scale: Separate Calibration by Form and Linking

Linking ability scales is completed by placing all item parameters from separate calibrations onto the fixed old scale.

In the case of the 2PL model, given A and B, the aN and bN parameters from a new scale are transformed into
  a* = aN / A and b* = A bN + B

In practice, A and B are estimated with item parameter estimates from the old and new scales.
  Mean-Sigma Method (Marco, 1977)
  Mean-Mean Method (Loyd & Hoover, 1980)
  Haebara Method (Haebara, 1980)
  Stocking-Lord Method (Stocking & Lord, 1983)
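As an illustration of the two moment methods listed above, here is a minimal Python sketch. It assumes the standard definitions: mean-sigma takes A as the ratio of the SDs of the anchor b estimates (old over new), mean-mean takes A as the ratio of the mean anchor a estimates (new over old), and both take B = mean(bO) - A mean(bN). The function names and the anchor values are hypothetical; the characteristic curve methods require numerical minimization and are not sketched here.

```python
import statistics


def mean_sigma(b_old, b_new):
    """Mean-sigma coefficients from anchor difficulty estimates."""
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B


def mean_mean(a_old, a_new, b_old, b_new):
    """Mean-mean coefficients from anchor discrimination and difficulty estimates."""
    A = statistics.mean(a_new) / statistics.mean(a_old)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B


def to_old_scale(a_new, b_new, A, B):
    """Transform new-scale 2PL parameters: a* = aN / A, b* = A * bN + B."""
    return a_new / A, A * b_new + B


# Hypothetical anchor estimates on the new scale.
a_new = [0.8, 1.1, 0.6, 1.3]
b_new = [-1.2, -0.4, 0.3, 1.0]

# Old-scale values built with a known transformation, so the recovery is exact.
A_true, B_true = 1.04, 1.03
a_old = [a / A_true for a in a_new]
b_old = [A_true * b + B_true for b in b_new]

A_ms, B_ms = mean_sigma(b_old, b_new)
A_mm, B_mm = mean_mean(a_old, a_new, b_old, b_new)
a_star, b_star = to_old_scale(a_new[0], b_new[0], A_ms, B_ms)
print(round(A_ms, 3), round(B_ms, 3), round(A_mm, 3), round(B_mm, 3))
```

With real estimates the two methods generally give slightly different coefficients, because sampling error affects the a and b estimates differently.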

II. Maintaining the Old Scale: Comparative Performance

Suppose that a characteristic curve method (Haebara or Stocking-Lord) is employed as the linking method for the “separate calibration and linking” approach.

The relative performance of the three approaches to maintaining the old scale depends on whether the new form items are common items or not (Hanson & Béguin, 2002; Kim, 2006b; Kim & Kolen, in process).

For the common items, concurrent calibration would perform best, due mainly to the larger sample size (new group + old group) available for them than for the non-common items.

For the non-common items, the three approaches would perform almost equally.

II. Maintaining the Old Scale: Comparative Performance

Possibility of Old Form Item Parameters Being Replaced (When New Form Items Are Calibrated)

Method                             Unique Items   Anchor Items
Separate Calibration and Linking   No             Yes *
Concurrent Calibration             Yes **         Yes **
Fixed Parameter Calibration        No             No

Note. It is assumed that old form item parameters were obtained before.
* Parameters of old form anchor items can be replaced with those of new form anchor items.
** Old form item parameters can be changed by the inclusion of new form items, which may be remarkable for anchor items.

II. Maintaining the Old Scale: When Is FPC Most Appropriate?

When using the “stable” old form anchor item parameters to obtain or diagnose the parameters of new form non-anchor items on the fixed old scale

Note
  Placing the parameters of new form non-anchor items on the old scale is the focus.
  Updating the old form item parameters is not a concern at all.
  The old form anchor items are assumed to have stable parameter estimates because a large sample was used for obtaining them.

III. Principle of FPC: Basics

Why
  To place the parameters of new form non-anchor items onto the fixed old scale

How
  Holding the old form anchor item parameters fixed and estimating the new form non-anchor item parameters

Critical Process
  Estimating the underlying distribution of ability for the new form on the fixed old scale so that the new item parameters may be properly expressed on the old scale.
  Under the IRT model, the underlying distribution can be estimated using both the new form data and the fixed anchor item parameters.

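III. Principle of FPC: Schematic Illustration of Updating Priors and Underlying Distributions of Ability

[Figure: sequence of EM iterations on the θO scale.
  1. 1st initial prior, with the anchor item parameters a1O, b1O, a2O, b2O, … held fixed.
  2. EM iterations give new item parameter estimates (a1N, b1N, …, bJN) and the 1st estimated ability distribution, which becomes the 2nd initial prior.
  3. EM iterations give updated new item parameter estimates and the 2nd estimated ability distribution, which becomes the 3rd initial prior.
  4. The process ends with the final estimated ability distribution and the estimated new item parameters on the θO scale.]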

III. Principle of FPC: Numerical Expression: Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM)

Likelihood function for estimating new form non-anchor item parameters (iteration s, quadrature point k, person i, data y, parameters Δ):

$$f(\Delta_{NEW}) = \sum_{k=1}^{K} \sum_{i=1}^{N} \log p(\mathbf{y}_i \mid q_k, \Delta_{NEW}) \; p(q_k \mid \mathbf{y}_i, \Delta_{OLD}, \hat{\Delta}_{NEW}^{(s-1)}, \hat{\boldsymbol{\pi}}^{(s-1)})$$

Closed-form formula for estimating quadrature weights of the underlying ability distribution from the new form data:

$$\hat{\pi}_k^{(s)} = \frac{1}{N} \sum_{i=1}^{N} p(q_k \mid \mathbf{y}_i, \Delta_{OLD}, \hat{\Delta}_{NEW}^{(s-1)}, \hat{\boldsymbol{\pi}}^{(s-1)})$$

Refer to Kim (2006a) for numerical details.
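The closed-form weight update can be sketched in a few lines of Python. This is a simplified illustration, not the full MWU-MEM procedure: it assumes the 2PL model, uses only the fixed anchor items in the likelihood (so no new item parameters are re-estimated between updates), and simulates data with hypothetical anchor parameters. All names and values below are assumptions made for the example.

```python
import math
import random

D = 1.7  # assumed scaling constant


def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def likelihood_matrix(responses, items, points):
    """L[i][k] = p(y_i | q_k, fixed item parameters)."""
    matrix = []
    for y in responses:
        row = []
        for q in points:
            lik = 1.0
            for u, (a, b) in zip(y, items):
                p = p2pl(q, a, b)
                lik *= p if u == 1 else 1.0 - p
            row.append(lik)
        matrix.append(row)
    return matrix


def update_weights(matrix, weights):
    """One update: pi_k <- (1/N) * sum_i p(q_k | y_i, fixed items, previous pi)."""
    new = [0.0] * len(weights)
    for row in matrix:
        joint = [lik * w for lik, w in zip(row, weights)]
        total = sum(joint)
        for k, value in enumerate(joint):
            new[k] += value / total  # posterior probability of q_k for this person
    return [value / len(matrix) for value in new]


random.seed(1)
# Hypothetical fixed anchor parameters (a, b) on the old scale.
anchors = [(0.6 + 0.03 * j, -2.0 + 0.2 * j) for j in range(20)]
# Simulated new group with ability drawn from N(1, 1) on the old scale.
thetas = [random.gauss(1.0, 1.0) for _ in range(500)]
responses = [[1 if random.random() < p2pl(t, a, b) else 0 for a, b in anchors]
             for t in thetas]

points = [-4.0 + 8.0 * k / 30 for k in range(31)]
raw = [math.exp(-0.5 * q * q) for q in points]
raw_total = sum(raw)
weights = [w / raw_total for w in raw]  # N(0, 1)-shaped starting prior

matrix = likelihood_matrix(responses, anchors, points)
for _ in range(60):
    weights = update_weights(matrix, weights)

mean = sum(q * w for q, w in zip(points, weights))
print(round(mean, 2))
```

Because the anchor parameters are held fixed on the old scale, repeated updates move the estimated distribution away from the N(0, 1) starting prior and toward the location of the new group on that scale.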

III. Principle of FPC: Summary of Key Points

The values of the fixed anchor item parameters are expressed on the fixed old scale, so the origin and unit of the ability scale for the new form data have already been set. That is, we do not need to use “0, 1” scaling for the new form data.

New form non-anchor item parameters should be estimated using the new form underlying distribution that is properly recovered on the fixed old scale.

As with ability estimates, the underlying distribution can be estimated using the new form data and the fixed anchor item parameters.

Fixing the anchor item parameters gradually pulls the underlying distribution onto the old scale. Accordingly, the new form item parameters are also pulled onto the old scale.

III. Principle of FPC: Concerns about the Unstable Estimates of Anchor Item Parameters

Unstable estimates of the fixed item parameters might adversely affect the performance of FPC.

However, Kim (2006a) showed that FPC is robust to sampling errors of the fixed item parameter estimates in calibrating non-anchor items.

This seems to be because the new form data work together with the fixed item parameters in “revealing” the old scale.

In other words, as long as the sample size of the new group is large enough, unstable estimates of the fixed item parameters would not greatly affect the proper estimation of either the underlying distribution for the new group or the non-anchor item parameters.

III. Principle of FPC: Two Alternatives to the MWU-MEM Method

Some computer programs, such as BILOG-MG, do not update the prior quadrature weights during EM cycles when conducting FPC.

The resulting posterior (quadrature) weights would not properly represent the underlying ability distribution for the new form data.

Two ad hoc methods can be used to obtain good estimates of the quadrature weights for the underlying distribution.
  Simple Transformation Prior Update (STPU) Method
  Iterative-Run Prior Update (IRPU) Method

Simple Transformation Prior Update (STPU) Method
  Uses A and B from a linking method to update the prior ability distribution by transforming the posterior distribution obtained from the regular, separate calibration of the new form. FPC is then conducted with the updated prior ability distribution.

Iterative-Run Prior Update (IRPU) Method
  Uses iteratively updated prior ability distributions through multiple FPC runs of BILOG-MG. The posterior distribution estimated in one calibration run is used as the prior distribution in the next run, and the runs continue until the difference between the two distributions becomes negligible.


Kim (2006b) shows that the two ad hoc methods for updating the prior ability distribution work very well.
  In recovering the parameters of non-anchor items, the two methods perform almost as well as the Stocking-Lord linking method and concurrent calibration.

In practice, the STPU method may be preferred due to its simplicity.

The IRPU method works on the same principle as the MWU-MEM method, except that it requires multiple runs of FPC. Thus, theoretically, the IRPU method may be more acceptable than the STPU method.


III. Principle of FPC: Caveats against Using “Constrained” Estimation for FPC

One might think that imposing strong Bayesian priors on the item parameters to be fixed and freeing the non-anchor item parameters would function similarly to FPC.
  A rationale for such constrained estimation can be found in, for example, the BILOG (Mislevy & Bock, 1990) manual.

In theory, it sounds reasonable.
  But my experience suggests that using strong priors to fix the anchor item parameters tends to distort the non-fixed item parameters.

III. Principle of FPC: Caveats against Using “Constrained” Estimation for FPC

Note that in constrained estimation the anchor item parameters are to be estimated (although almost fixed), while in FPC they are excluded from the parameter list to be estimated.

Without a facility to update ability prior weights, both the underlying distribution and non-anchor item parameters would be distorted.

IV. Use of Computer Programs for FPC

BILOG-MG 7.0 (Zimowski et al., 2003)
  The “FIX” option does not function properly because the prior weights are not updated during EM cycles (Kim, 2006a).
  The STPU or IRPU method can be used.

PARSCALE 4.1 (Muraki & Bock, 2003)
  For FPC to work properly, the “POSTERIOR” option should be used (Kim, 2006a).
  Without the “POSTERIOR” option, the STPU or IRPU method can be used.

IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG

Data
  3,000 examinees for the new form data
    The data were obtained by simulating examinees from a N(1, 1) distribution, against the old group with a N(0, 1) distribution.
  25-item multiple-choice (MC) test

FPC
  First 20 items fixed (item parameters are ready for use)
  Last 5 items freed
  The three-parameter logistic (3PL) model is used for item analyses.

Comparison of Default, STPU, and IRPU FPC methods

Command File (to Use the Default FPC Facility)

Default FPC with BILOG-MG
The examinee group (2) was sampled from N(1,1)
>COMMENT Fixed-parameter calibration
>GLOBAL DFNAME='New.txt', PRNAME='Sample.PRM', NPARM=3, SAVE;
>SAVE PAR='itempar';
>LENGTH NITEMS=25;
>INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
>ITEMS INUM=(1(1)25), INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=1, NOADJUST;
>SCORE NOPRINT;



Data File (New.txt)

11111111111111111101111111111110100111111110011100111110100011111101011111111111101111111111111111111111111110111111011011111111111111011101110101111101111001000001000010011110110011110111111010011111111111111111111111111111101111011111110111100111111111111111111110111111111111101011111111111101111111111111100111111000111101111111111111111111111111. . . . . . . . . . . . . . . .. . . . . . . .

Item Responses for Anchor Items

Fixed Parameter File (Sample.PRM)

The first line gives the number of fixed items; each following line gives the item number and its a, b, and c parameters.

20
01 0.48877 -1.76191 0.18850
02 0.78980 -1.51222 0.18301
03 0.86113 -1.46012 0.17266
04 0.59502 -1.07553 0.20835
05 0.81096 -0.79854 0.20981
06 0.84988 -0.62070 0.12481
07 0.59386 -0.30609 0.17302
08 0.79144 -0.07422 0.23463
09 0.51684 0.48596 0.20394
10 0.90287 1.19854 0.16761
11 0.50175 -2.00058 0.21263
12 0.81267 -1.53418 0.15649
13 1.16172 -1.22405 0.13872
14 0.52306 -1.01148 0.18519
15 0.74785 -0.84378 0.20893
16 0.77883 -0.68332 0.19013
17 0.88805 -0.41610 0.18126
18 0.90752 0.08592 0.17534
19 0.62818 0.65946 0.26229
20 0.85275 1.82052 0.13813

Command File for the STPU Method (Before Transformation)

Single Group “0, 1” Scaling,
Although the examinee group was sampled from N(1,1).
>COMMENT STPU FPC before Transformation of Ability Points
>GLOBAL DFNAME='New.txt', NPARM=3, SAVE;
>SAVE PAR='sampleSim01.PAR';
>LENGTH NITEMS=25;
>INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
>ITEMS INUM=(1(1)25), INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
>TEST TNAME=NO_FIX;
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=0;
>SCORE NOPRINT;

Posterior Distribution from “0, 1” Scaling for the STPU Method

QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:

                  1            2            3            4            5
POINT      -0.4036E+01  -0.3767E+01  -0.3498E+01  -0.3229E+01  -0.2960E+01
POSTERIOR   0.2163E-04   0.7268E-04   0.2169E-03   0.5802E-03   0.1392E-02

                  6            7            8            9           10
POINT      -0.2691E+01  -0.2422E+01  -0.2153E+01  -0.1884E+01  -0.1615E+01
POSTERIOR   0.3030E-02   0.6054E-02   0.1104E-01   0.1842E-01   0.2878E-01

                 11           12           13           14           15
POINT      -0.1346E+01  -0.1076E+01  -0.8074E+00  -0.5384E+00  -0.2693E+00
POSTERIOR   0.4281E-01   0.5985E-01   0.7752E-01   0.9294E-01   0.1036E+00

                 16           17           18           19           20
POINT      -0.2361E-03   0.2688E+00   0.5379E+00   0.8069E+00   0.1076E+01
POSTERIOR   0.1074E+00   0.1034E+00   0.9265E-01   0.7725E-01   0.6001E-01

                 21           22           23           24           25
POINT       0.1345E+01   0.1614E+01   0.1883E+01   0.2152E+01   0.2421E+01
POSTERIOR   0.4343E-01   0.2927E-01   0.1837E-01   0.1073E-01   0.5841E-02

                 26           27           28           29           30
POINT       0.2690E+01   0.2959E+01   0.3228E+01   0.3498E+01   0.3767E+01
POSTERIOR   0.2957E-02   0.1399E-02   0.6105E-03   0.2514E-03   0.9631E-04

                 31
POINT       0.4036E+01
POSTERIOR   0.3212E-04

MEAN 0.00000
S.D. 1.00000

Command File for the STPU Method (After Transformation)

STPU FPC with Transformed Prior Points
The examinee group was sampled from N(1,1).

Omitted (The same as the commands for before-transformation “0, 1” calibration)

>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1, NOADJUST;
>QUAD POINTS=(
  -3.1663E+000 -2.8864E+000 -2.6065E+000 -2.3266E+000 -2.0467E+000
  -1.7668E+000 -1.4869E+000 -1.2070E+000 -9.2710E-001 -6.4720E-001
  -3.6730E-001 -8.6352E-002 1.9314E-001 4.7304E-001 7.5305E-001
  1.0330E+000 1.3130E+000 1.5930E+000 1.8729E+000 2.1529E+000
  2.4328E+000 2.7127E+000 2.9926E+000 3.2725E+000 3.5524E+000
  3.8323E+000 4.1122E+000 4.3921E+000 4.6731E+000 4.9530E+000
  5.2329E+000),
  WEIGHTS=(
  2.1630E-005 7.2680E-005 2.1690E-004 5.8020E-004 1.3920E-003
  3.0300E-003 6.0540E-003 1.1040E-002 1.8420E-002 2.8780E-002
  4.2810E-002 5.9850E-002 7.7520E-002 9.2940E-002 1.0360E-001
  1.0740E-001 1.0340E-001 9.2650E-002 7.7250E-002 6.0010E-002
  4.3430E-002 2.9270E-002 1.8370E-002 1.0730E-002 5.8410E-003
  2.9570E-003 1.3990E-003 6.1050E-004 2.5140E-004 9.6310E-005
  3.2120E-005);
>SCORE NOPRINT;

Notes on the command file
  POINTS: rescaled points by θ* = Aθ + B, with A = 1.040535 and B = 1.033264
  WEIGHTS: from “0, 1” scaling (not transformed)
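The STPU rescaling of the quadrature points is a one-line linear transformation. The sketch below, in Python, applies θ* = Aθ + B with the A and B given on this slide to the first three points of the “0, 1” posterior output and compares them with the first three POINTS in the command file. The helper names are hypothetical, and the three-point distribution at the end is a toy example used only to show how the mean and SD shift.

```python
def transform_points(points, A, B):
    """Rescale quadrature points by theta* = A * theta + B (weights stay unchanged)."""
    return [A * q + B for q in points]


def dist_mean_sd(points, weights):
    """Mean and SD of a discrete distribution on quadrature points."""
    total = sum(weights)
    mean = sum(q * w for q, w in zip(points, weights)) / total
    var = sum(w * (q - mean) ** 2 for q, w in zip(points, weights)) / total
    return mean, var ** 0.5


A, B = 1.040535, 1.033264
first_points = [-4.036, -3.767, -3.498]  # first three points from the "0, 1" output
rescaled = [round(q, 4) for q in transform_points(first_points, A, B)]
print(rescaled)  # [-3.1663, -2.8864, -2.6065]

# Toy symmetric distribution: its mean moves to B and its SD is multiplied by A.
toy_points, toy_weights = [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25]
m0, s0 = dist_mean_sd(toy_points, toy_weights)
m1, s1 = dist_mean_sd(transform_points(toy_points, A, B), toy_weights)
```

Since the “0, 1” posterior has mean 0 and SD 1, the transformed prior supplied to FPC has mean B and SD A on the old scale.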

2nd Command File for the IRPU Method

IRPU FPC with Updated Prior Weights
The examinee group was sampled from N(1,1).

Omitted (The same as the commands for the default FPC run)

>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1, NOADJUST;
>QUAD POINTS=(
  -4.0000E+000 -3.7330E+000 -3.4670E+000 -3.2000E+000 -2.9330E+000
  -2.6670E+000 -2.4000E+000 -2.1330E+000 -1.8670E+000 -1.6000E+000
  -1.3330E+000 -1.0670E+000 -8.0000E-001 -5.3330E-001 -2.6670E-001
  -7.7720E-016 2.6670E-001 5.3330E-001 8.0000E-001 1.0670E+000
  1.3330E+000 1.6000E+000 1.8670E+000 2.1330E+000 2.4000E+000
  2.6670E+000 2.9330E+000 3.2000E+000 3.4670E+000 3.7330E+000
  4.0000E+000),
  WEIGHTS=(
  8.8370E-007 3.0840E-006 1.0040E-005 3.1720E-005 9.4690E-005
  2.5560E-004 6.3580E-004 1.4490E-003 3.0500E-003 6.0110E-003
  1.1060E-002 1.8890E-002 3.0200E-002 4.5590E-002 6.4400E-002
  8.4190E-002 1.0160E-001 1.1300E-001 1.1550E-001 1.0830E-001
  9.2970E-002 7.3160E-002 5.2690E-002 3.4660E-002 2.0800E-002
  1.1400E-002 5.7180E-003 2.6290E-003 1.1160E-003 4.3390E-004
  1.5790E-004);
>SCORE NOPRINT;

Notes on the command file
  POINTS: fixed points (-4.0 to 4.0)
  WEIGHTS: updated weights (= posterior weights from the 1st run of IRPU FPC)

History of Updated Posterior Distributions by the IRPU Method

Iter#   Mean    Std. Dev.
0       0.000   1.000
1       0.699   0.923   (from default FPC)
2       0.876   0.921
3       0.933   0.932
4       0.954   0.943
5       0.963   0.951
6       0.967   0.956
7       0.969   0.960
8       0.971   0.963
9       0.972   0.965
10      0.973   0.966
11      0.973   0.967
12      0.974   0.968

Iterations stopped because the M and SD did not change by more than the 0.001 limit.
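The IRPU control flow can be sketched as a simple loop in Python. This is only an outline under stated assumptions: `run_fpc` is a hypothetical callable standing in for one FPC run of the calibration program (it takes points and prior weights and returns posterior weights), and the stand-in used below merely moves the weights halfway toward a made-up target so that the loop can be exercised. The stopping rule compares successive means and SDs against a tolerance, in the spirit of the 0.001 limit above.

```python
def dist_mean_sd(points, weights):
    """Mean and SD of a discrete distribution on quadrature points."""
    total = sum(weights)
    mean = sum(q * w for q, w in zip(points, weights)) / total
    var = sum(w * (q - mean) ** 2 for q, w in zip(points, weights)) / total
    return mean, var ** 0.5


def irpu(run_fpc, points, prior, tol=0.001, max_runs=50):
    """Repeat FPC runs, feeding each posterior back in as the next prior."""
    mean_prev, sd_prev = dist_mean_sd(points, prior)
    history = [(mean_prev, sd_prev)]
    for _ in range(max_runs):
        posterior = run_fpc(points, prior)  # one calibration run (hypothetical)
        mean, sd = dist_mean_sd(points, posterior)
        history.append((mean, sd))
        converged = abs(mean - mean_prev) <= tol and abs(sd - sd_prev) <= tol
        prior, mean_prev, sd_prev = posterior, mean, sd
        if converged:
            break
    return prior, history


# Stand-in for a real FPC run: move the weights halfway toward a made-up target.
target = [0.05, 0.15, 0.30, 0.30, 0.20]


def fake_run(points, prior):
    return [0.5 * w + 0.5 * t for w, t in zip(prior, target)]


points = [-2.0, -1.0, 0.0, 1.0, 2.0]
start = [0.1, 0.2, 0.4, 0.2, 0.1]
final_weights, history = irpu(fake_run, points, start)
final_mean, final_sd = dist_mean_sd(points, final_weights)
target_mean, target_sd = dist_mean_sd(points, target)
```

In an actual application each call to the stand-in would be replaced by writing the current weights into the command file, running the program, and reading the posterior weights from its output.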

FPC Estimates of Non-Anchor Item Parameters on the Fixed Old Scale

        Mean/Sigma                Default FPC
Item    a       b        c        a       b        c
21      0.591   -1.947   0.212    0.650   -1.994   0.214
22      0.831   -1.643   0.222    0.922   -1.699   0.230
23      1.027   -1.781   0.196    1.128   -1.850   0.198
24      0.566   -0.988   0.213    0.635   -1.089   0.220
25      0.605   -0.727   0.206    0.681   -0.847   0.216

        STPU FPC                  IRPU FPC
Item    a       b        c        a       b        c
21      0.605   -1.909   0.210    0.624   -1.844   0.208
22      0.863   -1.587   0.222    0.887   -1.542   0.217
23      1.065   -1.723   0.196    1.100   -1.663   0.195
24      0.575   -0.991   0.209    0.594   -0.952   0.207
25      0.614   -0.729   0.205    0.637   -0.689   0.206

FPC Estimates of Mean and SD of the Underlying Distribution on the Fixed Old Scale

Method        Mean    Std. Dev.
Default FPC   0.699   0.923   (under-estimation)
STPU FPC      1.003   1.018
IRPU FPC      0.974   0.968

Mean-Sigma: B = 1.033, A = 1.041

Note. The new group examinees were from a N(1,1) distribution that was expressed on the fixed old scale.

IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE

Data
  3,000 examinees for the new form data
    The data were obtained by simulating examinees from a N(0.5, 1.2^2) distribution, against the old group with a N(0, 1) distribution.
  A mixed-format test of 15 MC items and 2 five-category constructed-response (CR) items

FPC
  First 10 MC items fixed (item parameters are ready for use)
  Last 5 MC and 2 CR items freed
  The 3PL model for MC items and the generalized partial credit (GPC) model for CR items

Comparison of STPU and MWU-MEM methods

Command File (MWU-MEM FPC)

MWU-MEM FPC with PARSCALE
The examinee group was sampled from N(0.5, 1.2^2)
>COMMENT 10 common items fixed and 2 CR items calibration
>FILE DFNAME='new.txt', IFNAME='MC10FIX.IFN', SAVE;
>SAVE PARM='MC10FIX';
>INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
(5A1, T1, 17A1)
>TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
>BLOCK BNAME=FIXED, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), REP=10, SKIP;
>BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
>BLOCK BNAME=FREED, NITEMS=1, NCAT=5, ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
>CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
>SCORE NOSCORE;



Data File (New.txt)

111111011111111321111111111111114411111011001111032111111111011111341111110001111103111111111111111144111101100001011131101010001111111101111101001111144000001010001000010001100000110010011111101101111122. . . . . . . . . . . . . . . . . .. . . . . . . . .

Item Responses for Anchor Items

Item Responses for CR Items

Command File to Prepare IFNAME File (MC10FIX.IFN)

MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
>COMMENT 10 common items fixed and 2 CR items calibration
>FILE DFNAME='new.txt', SAVE;
>SAVE PARM='MC10FIX';
>INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
(5A1, T1, 17A1)
>TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
>BLOCK BNAME=FIXED, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=10;
>BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
>BLOCK BNAME=FREED, NITEMS=1, NCAT=5, ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
>CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
>SCORE NOSCORE;

Notes on the command file
  >FILE: no IFNAME
  First >BLOCK (FIXED): no SKIP

Item Parameter Output File from “0, 1” Scaling

MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
I10FIX 17 17 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GROUP 01
FIXED 20001 0.94308 0.07058 -1.12375 0.14908 0.26134 0.07792 0.00000 0.00000 0.00000 0.00000
BLOCK 20002 0.98019 0.06877 -0.93880 0.12173 0.21813 0.06540 0.00000 0.00000 0.00000 0.00000
BLOCK 20003 1.18582 0.07723 -0.72689 0.08253 0.19030 0.04856 0.00000 0.00000 0.00000 0.00000

(Omitted)

FREED 50016 1.16556 0.03437 -0.14845 0.01309 0.00000 0.00000 0.00000 1.25729 0.29044 -0.33537 -1.21236 0.00000 0.04262 0.03157 0.02902 0.03037
BLOCK 50017 1.42147 0.04095 -0.19171 0.01178 0.00000 0.00000 0.00000 1.29058 0.38858 -0.50917 -1.16999 0.00000 0.03895 0.02653 0.02434 0.02606

Modified Item Parameter File (MC10FIX.IFN)

MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
I10FIX 17 17 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GROUP 01
FIXED 20001 0.69300 0.00000 -1.50000 0.00000 0.12500 0.00000 0.00000 0.00000 0.00000 0.00000
BLOCK 20002 0.78600 0.00000 -1.00000 0.00000 0.18500 0.00000 0.00000 0.00000 0.00000 0.00000
BLOCK 20003 0.89700 0.00000 -0.60000 0.00000 0.23300 0.00000 0.00000 0.00000 0.00000 0.00000

(Omitted)

FREED 50016 1.16556 0.03437 -0.14845 0.01309 0.00000 0.00000 0.00000 1.25729 0.29044 -0.33537 -1.21236 0.00000 0.04262 0.03157 0.02902 0.03037
BLOCK 50017 1.42147 0.04095 -0.19171 0.01178 0.00000 0.00000 0.00000 1.29058 0.38858 -0.50917 -1.16999 0.00000 0.03895 0.02653 0.02434 0.02606

Replacing for the 10 fixed items
  1st value after the item number: replaced with fixed a
  3rd value after the item number: replaced with fixed b
  5th value after the item number: replaced with fixed c

Command File for the STPU Method (After Transformation)

STPU FPC with Transformed Prior Points
The examinee group was sampled from N(0.5, 1.2^2).

Omitted (The same as the commands for MWU-MEM FPC)

>CALIB NQPT=31, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, DIST=4, QPREAD;
>QUADP POINTS=(
  -5.2976E+000 -4.9280E+000 -4.5598E+000 -4.1902E+000 -3.8206E+000
  -3.4524E+000 -3.0828E+000 -2.7132E+000 -2.3450E+000 -1.9754E+000
  -1.6059E+000 -1.2377E+000 -8.6808E-001 -4.9891E-001 -1.2988E-001
  2.3929E-001 6.0846E-001 9.7749E-001 1.3467E+000 1.7162E+000
  2.0844E+000 2.4540E+000 2.8236E+000 3.1918E+000 3.5614E+000
  3.9310E+000 4.2992E+000 4.6688E+000 5.0384E+000 5.4066E+000
  5.7761E+000),
  WEIGHTS=(
  1.2430E-005 3.4290E-005 8.7330E-005 2.0480E-004 4.4420E-004
  9.1150E-004 1.8720E-003 4.1960E-003 1.0550E-002 2.5160E-002
  4.2780E-002 5.0290E-002 5.8510E-002 8.6110E-002 9.9290E-002
  8.6880E-002 9.7990E-002 1.0840E-001 9.0140E-002 7.7730E-002
  6.4860E-002 4.3230E-002 2.4440E-002 1.3010E-002 6.7400E-003
  3.3710E-003 1.6010E-003 7.1410E-004 2.9710E-004 1.1500E-004
  4.1380E-005);
>SCORE NOSCORE;

Notes on the command file
  POINTS: rescaled points by θ* = Aθ + B, with A = 1.38 and B = 0.24
  WEIGHTS: from “0, 1” scaling (not transformed)

FPC Estimates of Non-Anchor Item Parameters on the Fixed Old Scale

STPU Method
Item    a       b        c       c2       c3       c4      c5
11      0.741   -1.361   0.194
12      0.767   -0.995   0.238
13      0.741   -0.906   0.185
14      0.942   -0.442   0.140
15      1.181   -0.113   0.234
16      0.920    0.025           -1.569   -0.343   0.449   1.562
17      1.120   -0.031           -1.667   -0.522   0.615   1.452

MWU-MEM Method
Item    a       b        c       c2       c3       c4      c5
11      0.741   -1.361   0.194
12      0.768   -0.994   0.238
13      0.741   -0.908   0.184
14      0.942   -0.444   0.139
15      1.180   -0.113   0.234
16      0.921    0.025           -1.568   -0.342   0.450   1.561
17      1.120   -0.030           -1.666   -0.522   0.615   1.454

FPC Estimates of Mean and SD of the Underlying Distribution on the Fixed Old Scale

Method        Mean    Std. Dev.
STPU FPC      0.460   1.242
MWU-MEM FPC   0.456   1.227

Mean-Sigma: B = 0.239, A = 1.384

Note. The new group examinees were from a N(0.5, 1.2^2) distribution that was expressed on the fixed old scale.

Relative to the generating distribution, the means are under-estimated and the SDs are over-estimated.

V. Applications of FPC for Scaling and Equating

Online Calibration in Computerized Adaptive Testing (CAT)

Calibration of Pretest Items on the Fixed Operational Scale in Regular, Non-CAT Administration

In a Mixed-Format Test, Separate Calibration of CR Items from MC Items
  To Minimize Effects of Bad CR Items on MC Item Calibration

Equating Test Forms in the CING Design

V. Applications of FPC: Online Calibration in CAT

In CAT, different sets of operational items are adaptively administered to examinees, with pretest items “seeded” in a certain common block of examinee groups.

Because the operational items were already calibrated, their parameters are known in CAT.

Thus, FPC may be the best way to calibrate and diagnose the pretest items on the scale of the operational items, without affecting the operational item parameters.

V. Applications of FPC: Calibration of Pretest Items on the Fixed Operational Scale

To develop test forms, pretest items are often administered together with operational items to examinees.

However, it would be wise to calibrate operational items separately from pretest items, because the operational item parameters could be contaminated by bad pretest items.

In this case, the ability distribution that is estimated using only the operational items can be reasonably used as the prior ability distribution for FPC with the pretest items, while the operational item parameters are used to fix the operational items in the FPC.

V. Applications of FPC: FPC with Different Formats of Items

A mixed-format test contains different types of items; for instance, some are MC items and others are CR items.

Simultaneous calibration with both types of items can be conducted, assuming that a dominant factor underlies examinees’ responses to items.

However, practitioners may want to calibrate MC items separately from CR items, because calibration with bad CR items might adversely affect the estimation of MC item parameters.

In this case, MC items are first calibrated and then CR items are calibrated while fixing the MC item parameters.

V. Applications of FPC: Equating Test Forms in the CING Design

Test equating using IRT requires all item parameters to be placed on a common scale (which is usually the old form scale).

Once all item and ability parameters are placed on a common scale, IRT true score or observed score equating is conducted.

Thus, FPC can be effectively used for placing all item parameters on the fixed old scale. Naturally, the anchor consists of the items common to the new and old forms.
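Once the new form item parameters are on the old scale, IRT true score equating reduces to inverting one test characteristic curve and evaluating another. The Python sketch below shows this under the 3PL model; it is an illustration, not part of the original presentation. The item parameters are hypothetical, bisection is used as one simple way to solve for θ, and only true scores strictly between the sum of the c parameters and the number of items can be equated this way.

```python
import math

D = 1.7  # assumed scaling constant


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def theta_for_true_score(score, items, lo=-10.0, hi=10.0, iters=100):
    """Solve tcc(theta) = score by bisection (the curve increases with theta)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equate_true_score(score_new, new_items, old_items):
    """Old-form true score equivalent of a new-form true score."""
    lower = sum(c for _, _, c in new_items)
    if not lower < score_new < len(new_items):
        raise ValueError("true score must lie between sum of c and number of items")
    theta = theta_for_true_score(score_new, new_items)
    return tcc(theta, old_items)


# Hypothetical (a, b, c) parameters, both forms expressed on the old scale.
new_form = [(0.9, -1.0, 0.2), (1.1, -0.3, 0.2), (0.8, 0.2, 0.2),
            (1.2, 0.8, 0.2), (1.0, 1.5, 0.2)]
old_form = [(a, b - 0.3, c) for a, b, c in new_form]  # an easier old form

same = equate_true_score(3.0, new_form, new_form)
easier = equate_true_score(3.0, new_form, old_form)
print(round(same, 4), easier > 3.0)
```

Equating a form to itself returns the input score, and equating to an easier old form returns a higher old-form equivalent, which is the expected direction.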

END

Thank You
