IRT Fixed Parameter Calibration and Other Approaches to Maintaining Item Parameters on a Common Ability Scale
Seonghoon Kim, PhD, Keimyung University
Presented at Measured Progress on July 10, 2008
Overview
I. Nature of IRT Ability Scale
II. Three Approaches to Maintaining Item Parameters on a Common Scale
III. Principle of Fixed Parameter Calibration (FPC)
IV. Use of Computer Programs for FPC
V. Applications of FPC for Scaling and Equating
Reference Guide
This presentation was prepared based on my articles listed below and on my recent thoughts and work on FPC.
Kim, S. (2006a). A comparative study of IRT fixed parameter calibration methods. Journal of Educational Measurement, 43(4), 355-381.
Kim, S. (2006b). A study on IRT fixed parameter calibration methods using BILOG-MG. Journal of Educational Evaluation, 19(1), 323-342.
Kim, S., & Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19(4), 357-381.
Kim, S., & Lee, W. (2006). An extension of four IRT linking methods for mixed-format tests. Journal of Educational Measurement, 43(1), 53-76.
Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32(4), 371-397.
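I. Nature of IRT Ability Scale: Indeterminacy in IRT Modeling
Item response function (IRF) and metrics
Two-parameter logistic (2PL) model: $P(\theta \mid a, b) = 1 / \{1 + \exp[-Da(\theta - b)]\}$
Suppose that $\theta_O = A\theta_N + B$. If $a_O = a_N / A$ and $b_O = A b_N + B$, then $P(\theta_O \mid a_O, b_O) = P(\theta_N \mid a_N, b_N)$, because $a_O(\theta_O - b_O) = (a_N / A)(A\theta_N + B - A b_N - B) = a_N(\theta_N - b_N)$.
Therefore, the IRF is invariant under a linear transformation of the ability scale, provided the item parameters are transformed accordingly.
Thus, in practice, either θO or θN can be used; this is the indeterminacy of the IRT scale.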
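To make the invariance argument concrete, the following Python sketch evaluates the 2PL IRF for a hypothetical item on the new scale and for its transformed parameters on the old scale. The item parameters, the values of A and B, and the scaling constant D = 1.7 are illustrative assumptions, not values from this presentation.

```python
import math

D = 1.7  # logistic scaling constant (assumed for this sketch)

def irf_2pl(theta, a, b):
    """2PL IRF: P(theta | a, b) = 1 / (1 + exp(-D * a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def to_old_scale(a_new, b_new, A, B):
    """Item parameters for theta_O = A * theta_N + B: a_O = a_N / A, b_O = A * b_N + B."""
    return a_new / A, A * b_new + B

# Hypothetical item and linking coefficients, for illustration only.
a_new, b_new = 1.2, -0.5
A, B = 1.04, 1.03
a_old, b_old = to_old_scale(a_new, b_new, A, B)

for theta_new in (-2.0, 0.0, 1.5):
    theta_old = A * theta_new + B  # the same examinee expressed on the old scale
    p_new = irf_2pl(theta_new, a_new, b_new)
    p_old = irf_2pl(theta_old, a_old, b_old)
    print(f"theta_N={theta_new:+.1f}  theta_O={theta_old:+.3f}  P_new={p_new:.6f}  P_old={p_old:.6f}")
```

The two probabilities agree at every θ, which is why the response data alone cannot distinguish the θN metric from the θO metric.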
I. Nature of IRT Ability Scale: “0, 1” Scaling vs. Rasch Scaling
“0, 1” scaling
- Scaling by arbitrarily assuming that the mean (M) and standard deviation (SD) of the ability distribution are equal to 0 (origin) and 1 (unit).
- Such arbitrary but “standardized” fixing is unavoidable when the M and SD are unknown.
Rasch scaling
- Setting the origin (0) of the scale at the average difficulty of all items involved, while fixing the unit at 1.
- The fixed unit is guaranteed by the Rasch model.
I. Nature of IRT Ability Scale: Need for a Fixed Common Ability Scale
A fixed common scale should be used across test administrations for several reasons:
- To check the invariance property of item parameters
- To achieve comparability between item parameters from different administrations
- To develop an item pool
- To conduct IRT equating
Developing a common ability scale requires all new scales to be linked to the fixed old scale θO.
[Figure: the new scales θN1, θN2, and θN3 are each linked to the fixed old scale θO.]
I. Nature of IRT Ability Scale: Factors for Development of a Common Scale
Development of a fixed common scale is subject to:
- Data collection design for IRT scaling and equating test forms: random groups design vs. common-item nonequivalent groups design
- Scaling convention: “0, 1” scaling vs. Rasch scaling
- Item parameter estimation method: marginal maximum likelihood (MML) estimation vs. joint maximum likelihood (JML) estimation
The Context Assumed in This Presentation
Data collection design for IRT scaling and equating test forms
- Common-item nonequivalent groups (CING) design: anchor items (i.e., common items) link the two test forms.
Scaling convention
- “0, 1” scaling, which is group dependent. (In a random groups design, two “0, 1” scales from alternative forms may be considered equivalent.)
Marginal maximum likelihood (MML) estimation
- Estimation of item parameters
- Estimation of the underlying ability distribution: quadrature weights are estimated at quadrature points.
Data Structure Illustration for the CING Design
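[Figure: Group 1 (old group) takes the Old Form, consisting of the Old Form unique items and the common items; Group 2 (new group) takes the New Form, consisting of the common items and the New Form unique items. The common items (anchor) are administered to both the old and new groups.]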
II. Three Approaches to Maintaining a Common Scale
Separate calibration by form and linking
- Estimate transformation coefficients A and B using two sets of item parameter estimates for the anchor items.
- Use A and B to transform the new form item parameter estimates onto the old scale.
Fixed parameter calibration (FPC)
- Holding the old form anchor item parameters fixed and estimating the parameters of the new form non-anchor items
Concurrent calibration (aka multiple-group estimation)
- Combining new and old form data and estimating all item parameters and the underlying ability distributions, with the old group designated as the reference-scale group
- Will not be addressed in detail in this presentation
II. Maintaining the Old Scale: Separate Calibration by Form and Linking
“0, 1” scales from two test forms
- Old form scale: θO (reference)
- New form scale: θN (arbitrary)
Scheme of linking two “0, 1” scales: θO = A θN + B
[Figure: the θN axis (arbitrary origin and unit) is mapped onto the θO axis (fixed origin and unit); the origin of θN falls at B on θO, and one θN unit spans A units of θO.]
Linking ability scales is completed by placing all item parameters from separate calibrations onto the fixed old scale.
In the case of the 2PL model, given A and B, the parameters aN and bN from a new scale are transformed into a* = aN / A and b* = A bN + B.
In practice, A and B are estimated from the anchor item parameter estimates on the old and new scales, using, for example:
- Mean-sigma method (Marco, 1977)
- Mean-mean method (Loyd & Hoover, 1980)
- Haebara method (Haebara, 1980)
- Stocking-Lord method (Stocking & Lord, 1983)
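A minimal Python sketch of how A and B might be obtained from anchor item estimates, assuming hypothetical 3PL values. The mean-sigma and mean-mean formulas follow their usual definitions. The Stocking-Lord part minimizes the squared difference between the anchor test characteristic curves on a fixed grid of old-scale θ values using a simple compass search; that optimizer, the θ grid, and D = 1.7 are assumptions of this sketch, not what any particular linking program does.

```python
import math
from statistics import mean, pstdev

D = 1.7  # logistic scaling constant (assumed)

def p3(theta, a, b, c):
    """3PL item response function (c = 0 gives the 2PL)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def mean_sigma(b_old, b_new):
    """Mean-sigma: A = SD(b_O) / SD(b_N), B = M(b_O) - A * M(b_N)."""
    A = pstdev(b_old) / pstdev(b_new)
    return A, mean(b_old) - A * mean(b_new)

def mean_mean(a_old, a_new, b_old, b_new):
    """Mean-mean: A = M(a_N) / M(a_O), B = M(b_O) - A * M(b_N)."""
    A = mean(a_new) / mean(a_old)
    return A, mean(b_old) - A * mean(b_new)

def stocking_lord(old, new, thetas, step=0.5, min_step=1e-7):
    """Stocking-Lord criterion minimized by compass search.

    old, new: (a, b, c) tuples for the anchor items on the old and new scales.
    thetas: old-scale ability values at which the two TCCs are compared.
    """
    tcc_old = [sum(p3(t, a, b, c) for a, b, c in old) for t in thetas]

    def loss(A, B):
        # new-scale parameters transformed with a* = a / A, b* = A * b + B (c unchanged)
        return sum((t_o - sum(p3(t, a / A, A * b + B, c) for a, b, c in new)) ** 2
                   for t, t_o in zip(thetas, tcc_old))

    A, B = 1.0, 0.0
    best = loss(A, B)
    while step > min_step:
        for dA, dB in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            if A + dA > 0:
                val = loss(A + dA, B + dB)
                if val < best:
                    A, B, best = A + dA, B + dB, val
                    break
        else:
            step /= 2  # no improving move at this step size
    return A, B

# Hypothetical anchor item estimates on the old scale.
OLD = [(0.8, -1.5, 0.20), (1.1, -0.6, 0.18), (0.9, 0.0, 0.22), (1.3, 0.7, 0.15), (0.7, 1.4, 0.20)]
# New-scale estimates built as an exact transformation (A = 1.25, B = 0.4), so the answer is known.
A_TRUE, B_TRUE = 1.25, 0.4
NEW = [(a * A_TRUE, (b - B_TRUE) / A_TRUE, c) for a, b, c in OLD]

a_o, b_o, _ = zip(*OLD)
a_n, b_n, _ = zip(*NEW)
thetas = [-4.0 + 0.2 * k for k in range(41)]

print("mean-sigma   :", mean_sigma(b_o, b_n))
print("mean-mean    :", mean_mean(a_o, a_n, b_o, b_n))
print("Stocking-Lord:", stocking_lord(OLD, NEW, thetas))
```

Because the new-scale values here are an exact linear transformation of the old ones, all three methods should recover A = 1.25 and B = 0.4 (the compass search only approximately); with real estimates that contain sampling error, the methods generally yield different A and B.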
II. Maintaining the Old Scale: Comparative Performance
Suppose that a characteristic curve method (Haebara or Stocking-Lord) is employed as the linking method for the “separate calibration and linking” approach.
The performance of the three alternative approaches to maintaining the old scale differs depending on whether the new form items are common items or not (Hanson & Béguin, 2002; Kim, 2006b; Kim & Kolen, in process).
For the common items, concurrent calibration would perform best, due mainly to the larger sample size (new group + old group) available for those items, unlike the non-common items.
For the non-common items, the three approaches would perform almost equally.
Possibility of Old Form Item Parameters Being Replaced (When New Form Items Are Calibrated)
Method                              Unique Items   Anchor Items
Separate Calibration and Linking    No             Yes*
Concurrent Calibration              Yes**          Yes**
Fixed Parameter Calibration         No             No
Note. It is assumed that old form item parameters were obtained before.
* Parameters of old form anchor items can be replaced with those of new form anchor items.
** Old form item parameters can be changed by the inclusion of new form items; the change may be remarkable for anchor items.
II. Maintaining the Old Scale: When Is FPC Most Appropriate?
When the “stable” old form anchor item parameters are used to obtain or diagnose the parameters of new form non-anchor items on the fixed old scale
Note:
- Placing the parameters of new form non-anchor items on the old scale is the focus.
- Updating the old form item parameters is not a concern at all.
- The old form anchor items are assumed to have stable parameter estimates because a large sample was used to obtain them.
III. Principle of FPC: Basics
Why: To place the parameters of new form non-anchor items onto the fixed old scale
How: Holding the old form anchor item parameters fixed and estimating the parameters of the new form non-anchor items
Critical process: Estimating the underlying distribution of ability for the new form on the fixed old scale, so that the new item parameters may be properly expressed on the old scale
- Through IRT modeling, the underlying distribution can be estimated using both the new form data and the fixed anchor item parameters.
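III. Principle of FPC: Schematic Illustration of Updating Priors and Underlying Distributions of Ability
[Figure: With a1O, b1O, a2O, b2O, … held fixed on the θO scale, EM iterations start from the 1st initial prior. The 1st estimated ability distribution becomes the 2nd initial prior, the 2nd estimated ability distribution becomes the 3rd initial prior, and so on; at each stage the new item parameters a1N, b1N, …, bJN are re-estimated. The process ends with the final estimated ability distribution and the estimated new item parameters on the θO scale.]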
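III. Principle of FPC: Numerical Expression of Multiple Prior Weights Updating and Multiple EM Cycles (MWU-MEM)
Likelihood function for estimating new form non-anchor item parameters (iteration s, quadrature point k, person i, data y, parameters Δ):
$$f(\boldsymbol{\Delta}_{NEW}) = \sum_{k=1}^{K} \sum_{i=1}^{N} p\left(q_k \mid \mathbf{y}_i, \boldsymbol{\Delta}_{OLD}, \hat{\boldsymbol{\Delta}}_{NEW}^{(s-1)}, \hat{\boldsymbol{\pi}}^{(s-1)}\right) \log p\left(\mathbf{y}_i \mid q_k, \boldsymbol{\Delta}_{NEW}\right)$$
Closed-form formula for estimating quadrature weights of the underlying ability distribution from the new form data:
$$\hat{\pi}_k^{(s)} = \frac{1}{N} \sum_{i=1}^{N} p\left(q_k \mid \mathbf{y}_i, \boldsymbol{\Delta}_{OLD}, \hat{\boldsymbol{\Delta}}_{NEW}^{(s-1)}, \hat{\boldsymbol{\pi}}^{(s-1)}\right)$$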
Refer to Kim (2006a) for numerical details.
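The quadrature weight update can be sketched in Python with NumPy under a simplifying assumption: only the 20 fixed anchor items from the BILOG-MG illustration (Sample.PRM) are used, so no non-anchor item parameters are estimated and the likelihood at each quadrature point stays constant across cycles. The posterior at point q_k is the current weight times the likelihood, normalized over the K points. Responses are simulated from N(1, 1) with D = 1.7; the seed, the 31-point grid, the tolerance, and the function names are arbitrary choices for illustration. Running a single cycle with the N(0, 1) prior held fixed mimics a calibration that never updates the prior weights.

```python
import numpy as np

D = 1.7  # logistic scaling constant (assumed)

# Fixed 3PL parameters (a, b, c) of the 20 anchor items listed in Sample.PRM (old scale).
ITEMS = np.array([
    [0.48877, -1.76191, 0.18850], [0.78980, -1.51222, 0.18301],
    [0.86113, -1.46012, 0.17266], [0.59502, -1.07553, 0.20835],
    [0.81096, -0.79854, 0.20981], [0.84988, -0.62070, 0.12481],
    [0.59386, -0.30609, 0.17302], [0.79144, -0.07422, 0.23463],
    [0.51684, 0.48596, 0.20394], [0.90287, 1.19854, 0.16761],
    [0.50175, -2.00058, 0.21263], [0.81267, -1.53418, 0.15649],
    [1.16172, -1.22405, 0.13872], [0.52306, -1.01148, 0.18519],
    [0.74785, -0.84378, 0.20893], [0.77883, -0.68332, 0.19013],
    [0.88805, -0.41610, 0.18126], [0.90752, 0.08592, 0.17534],
    [0.62818, 0.65946, 0.26229], [0.85275, 1.82052, 0.13813],
])

def p3pl(theta, a, b, c):
    """3PL probabilities: rows are ability values in theta, columns are items."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta[:, None] - b)))

def mwu_mem_weights(y, items, points, prior, max_cycles=500, tol=1e-7):
    """Update the weights every EM cycle: pi_k = (1/N) * sum_i p(q_k | y_i, ...)."""
    a, b, c = items.T
    P = p3pl(points, a, b, c)                                   # K x J
    loglik = y @ np.log(P).T + (1.0 - y) @ np.log(1.0 - P).T    # N x K, fixed items only
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))    # per-examinee rescaling
    w = prior / prior.sum()
    for _ in range(max_cycles):
        post = lik * w
        post /= post.sum(axis=1, keepdims=True)                 # posterior over the K points
        w_new = post.mean(axis=0)                                # closed-form weight update
        done = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if done:
            break
    return w

def moments(points, w):
    m = float(np.sum(w * points))
    return m, float(np.sqrt(np.sum(w * (points - m) ** 2)))

# Hypothetical simulation: 3,000 examinees from N(1, 1) answering the 20 anchor items.
rng = np.random.default_rng(2008)
theta = rng.normal(1.0, 1.0, size=3000)
a, b, c = ITEMS.T
y = (rng.random((theta.size, len(ITEMS))) < p3pl(theta, a, b, c)).astype(float)

points = np.linspace(-4.0, 4.0, 31)
prior = np.exp(-0.5 * points ** 2)                                 # N(0, 1) starting prior

one_step = mwu_mem_weights(y, ITEMS, points, prior, max_cycles=1)  # prior never updated
updated = mwu_mem_weights(y, ITEMS, points, prior)                 # updated every cycle
print("fixed N(0,1) prior, one E-step: mean=%.3f sd=%.3f" % moments(points, one_step))
print("weights updated every cycle:    mean=%.3f sd=%.3f" % moments(points, updated))
```

In this setup the single-cycle mean should fall noticeably below 1, while the repeatedly updated weights should come much closer to the generating N(1, 1) distribution; exact values depend on the simulated sample.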
III. Principle of FPC: Summary of Key Points
The values of the fixed anchor item parameters are expressed on the fixed old scale, so the origin and unit of the ability scale for the new form data have already been set. That is, we do not need to use “0, 1” scaling for the new form data.
New form non-anchor item parameters should be estimated using the new form underlying distribution that is properly recovered on the fixed old scale.
As with ability estimates, the underlying distribution can be estimated using the new form data and the fixed anchor item parameters.
Fixing the anchor item parameters gradually pulls the underlying distribution onto the old scale over the EM cycles. Accordingly, the new form item parameters are also pulled onto the old scale.
III. Principle of FPC: Concerns about the Unstable Estimates of Anchor Item Parameters
Unstable estimates of the fixed item parameters might adversely affect the performance of FPC.
However, Kim (2006a) showed that FPC is robust to sampling errors of the fixed item parameter estimates in calibrating non-anchor items.
This seems to be because the new form data collaborate with the fixed item parameters in “revealing” the old scale.
In other words, as long as the sample size of the new group is large enough, unstable estimates of the fixed item parameters would not greatly affect the proper estimation of either the underlying distribution for the new group or the non-anchor item parameters.
III. Principle of FPC: Two Alternatives to the MWU-MEM Method
Some computer programs, such as BILOG-MG, do not update the prior quadrature weights during EM cycles when conducting FPC.
The resulting posterior (quadrature) weights would not properly represent the underlying ability distribution for the new form data.
Two ad hoc methods can be used to obtain good estimates of the quadrature weights for the underlying distribution:
- Simple Transformation Prior Update (STPU) method
- Iterative-Run Prior Update (IRPU) method
Simple Transformation Prior Update (STPU) method: Uses A and B from a linking method to update the prior ability distribution simply, by transforming the posterior distribution from the regular, separate calibration of the new form. FPC is then conducted with the updated prior ability distribution.
Iterative-Run Prior Update (IRPU) method: Uses iteratively updated prior ability distributions through multiple FPC runs of BILOG-MG. The posterior distribution estimated in one calibration run is used as the prior distribution in the next run, until the difference between the two distributions becomes negligible.
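The IRPU loop can be sketched as a small Python driver. The callable passed as run_fpc is a hypothetical placeholder for one complete FPC run in which the prior weights are held fixed (for example, a BILOG-MG run with FIX); it must return posterior weights on the same quadrature points. The stopping rule (no change in the mean or SD larger than 0.001) mirrors the BILOG-MG illustration in this presentation. The toy_run stand-in in the example performs no calibration at all; it only moves the moments toward an arbitrary target so that the loop can be exercised.

```python
import math

def moments(points, weights):
    """Mean and SD of a discrete ability distribution defined on quadrature points."""
    total = sum(weights)
    m = sum(q * w for q, w in zip(points, weights)) / total
    var = sum(w * (q - m) ** 2 for q, w in zip(points, weights)) / total
    return m, math.sqrt(var)

def irpu(run_fpc, points, prior, tol=0.001, max_runs=50):
    """Iterative-run prior update: each run's posterior becomes the next run's prior.

    Stops when neither the mean nor the SD changes by more than tol between runs.
    """
    history = [moments(points, prior)]
    for _ in range(max_runs):
        prior = run_fpc(points, prior)
        history.append(moments(points, prior))
        (m0, s0), (m1, s1) = history[-2], history[-1]
        if abs(m1 - m0) <= tol and abs(s1 - s0) <= tol:
            break
    return prior, history

def toy_run(points, prior):
    """Hypothetical stand-in for one FPC run: returns normal-shaped weights whose mean and
    SD move 60% of the way from the prior's values toward an arbitrary target of 0.97."""
    m, s = moments(points, prior)
    m, s = m + 0.6 * (0.97 - m), s + 0.6 * (0.97 - s)
    w = [math.exp(-0.5 * ((q - m) / s) ** 2) for q in points]
    total = sum(w)
    return [x / total for x in w]

points = [-4.0 + 8.0 * k / 30 for k in range(31)]  # fixed points, -4.0 to 4.0
start = [math.exp(-0.5 * q * q) for q in points]    # N(0, 1) starting prior
final, history = irpu(toy_run, points, start)
for run, (m, s) in enumerate(history):
    print(f"{run:2d}  mean={m:.3f}  sd={s:.3f}")
```

In a real application, run_fpc would write the prior into the program's command file, run the calibration, and read back the posterior weights; that plumbing is program specific and is not shown.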
Kim (2006b) shows that the two ad hoc methods for updating the prior ability distribution work very well. In recovering the parameters of non-anchor items, the two methods perform almost the same as the Stocking-Lord linking method and concurrent calibration.
In practice, the STPU method may be preferred for its simplicity.
The IRPU method shares the key feature of the MWU-MEM method (repeated updating of the prior weights), except that the updating is done through multiple runs of FPC. Thus, theoretically, the IRPU method may be more acceptable than the STPU method.
III. Principle of FPC: Caveats against Using “Constrained” Estimation for FPC
Someone might think that imposing strong Bayesian priors on the anchor item parameters and freeing the non-anchor item parameters would function similarly to FPC. A rationale for such constrained estimation can be found in, for example, the BILOG (Mislevy & Bock, 1990) manual.
In theory, this sounds reasonable. However, my experience suggests that using strong priors to fix the anchor item parameters tends to distort the non-fixed item parameters.
Note that in constrained estimation the anchor item parameters are to be estimated (although almost fixed), while in FPC they are excluded from the parameter list to be estimated.
Without a facility to update ability prior weights, both the underlying distribution and non-anchor item parameters would be distorted.
IV. Use of Computer Programs for FPC
BILOG-MG 7.0 (Zimowski et al., 2003)
- The “FIX” option does not function properly because the prior weights are not updated during EM cycles (Kim, 2006a).
- The STPU or IRPU method can be used instead.
PARSCALE 4.1 (Muraki & Bock, 2003)
- For FPC to work properly, the “POSTERIOR” option should be used (Kim, 2006a).
- Without the “POSTERIOR” option, the STPU or IRPU method can be used.
IV. Use of Computer Programs for FPC: Illustration of FPC with BILOG-MG
Data: 3,000 examinees for the new form data
- The data were obtained by simulating examinees from a Normal(1, 1) distribution, against the old group of N(0, 1) distribution.
- 25-item multiple-choice (MC) test
FPC
- First 20 items fixed (item parameters are ready for use)
- Last 5 items freed
- The three-parameter logistic (3PL) model is used for item analyses.
Comparison of Default, STPU, and IRPU FPC methods
Command File (to Use the Default FPC Facility)
Default FPC with BILOG-MG
The examinee group (2) was sampled from N(1,1)
>COMMENT
Fixed-parameter calibration
>GLOBAL DFNAME='New.txt', PRNAME='Sample.PRM', NPARM=3, SAVE;
>SAVE PAR='itempar';
>LENGTH NITEMS=25;
>INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
>ITEMS INUM=(1(1)25), INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=1, NOADJUST;
>SCORE NOPRINT;
Data File (New.txt)
[Data excerpt: one record per examinee containing 25 item responses scored 0/1; the first 20 columns are the responses to the anchor items. The remaining records are omitted.]
Fixed Parameter File (Sample.PRM)
20   (number of fixed items)
01 0.48877 -1.76191 0.18850
02 0.78980 -1.51222 0.18301
03 0.86113 -1.46012 0.17266
04 0.59502 -1.07553 0.20835
05 0.81096 -0.79854 0.20981
06 0.84988 -0.62070 0.12481
07 0.59386 -0.30609 0.17302
08 0.79144 -0.07422 0.23463
09 0.51684 0.48596 0.20394
10 0.90287 1.19854 0.16761
11 0.50175 -2.00058 0.21263
12 0.81267 -1.53418 0.15649
13 1.16172 -1.22405 0.13872
14 0.52306 -1.01148 0.18519
15 0.74785 -0.84378 0.20893
16 0.77883 -0.68332 0.19013
17 0.88805 -0.41610 0.18126
18 0.90752 0.08592 0.17534
19 0.62818 0.65946 0.26229
20 0.85275 1.82052 0.13813
(Columns: item number, a, b, c)
Command File for the STPU Method (Before Transformation)
Single Group “0, 1” Scaling,
Although the examinee group was sampled from N(1,1).
>COMMENT
STPU FPC before Transformation of Ability Points
>GLOBAL DFNAME='New.txt', NPARM=3, SAVE;
>SAVE PAR='sampleSim01.PAR';
>LENGTH NITEMS=25;
>INPUT NTOT=25, SAMPLE=3000, NALT=5, NID=4;
>ITEMS INUM=(1(1)25), INAMES=(O01(1)O20, P01, P02, P03, P04, P05);
>TEST TNAME=NO_FIX;
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=0;
>SCORE NOPRINT;
Posterior Distribution from “0, 1” Scaling for the STPU Method
QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:
 #   POINT         POSTERIOR
 1   -0.4036E+01   0.2163E-04
 2   -0.3767E+01   0.7268E-04
 3   -0.3498E+01   0.2169E-03
 4   -0.3229E+01   0.5802E-03
 5   -0.2960E+01   0.1392E-02
 6   -0.2691E+01   0.3030E-02
 7   -0.2422E+01   0.6054E-02
 8   -0.2153E+01   0.1104E-01
 9   -0.1884E+01   0.1842E-01
10   -0.1615E+01   0.2878E-01
11   -0.1346E+01   0.4281E-01
12   -0.1076E+01   0.5985E-01
13   -0.8074E+00   0.7752E-01
14   -0.5384E+00   0.9294E-01
15   -0.2693E+00   0.1036E+00
16   -0.2361E-03   0.1074E+00
17    0.2688E+00   0.1034E+00
18    0.5379E+00   0.9265E-01
19    0.8069E+00   0.7725E-01
20    0.1076E+01   0.6001E-01
21    0.1345E+01   0.4343E-01
22    0.1614E+01   0.2927E-01
23    0.1883E+01   0.1837E-01
24    0.2152E+01   0.1073E-01
25    0.2421E+01   0.5841E-02
26    0.2690E+01   0.2957E-02
27    0.2959E+01   0.1399E-02
28    0.3228E+01   0.6105E-03
29    0.3498E+01   0.2514E-03
30    0.3767E+01   0.9631E-04
31    0.4036E+01   0.3212E-04
MEAN 0.00000 S.D. 1.00000
Command File for the STPU Method (After Transformation)
STPU FPC with Transformed Prior Points
The examinee group was sampled from N(1,1).
(Omitted: the same as the commands for the before-transformation “0, 1” calibration)
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1, NOADJUST;
>QUAD POINTS=( -3.1663E+000 -2.8864E+000 -2.6065E+000 -2.3266E+000 -2.0467E+000 -1.7668E+000 -1.4869E+000 -1.2070E+000 -9.2710E-001 -6.4720E-001 -3.6730E-001 -8.6352E-002 1.9314E-001 4.7304E-001 7.5305E-001 1.0330E+000 1.3130E+000 1.5930E+000 1.8729E+000 2.1529E+000 2.4328E+000 2.7127E+000 2.9926E+000 3.2725E+000 3.5524E+000 3.8323E+000 4.1122E+000 4.3921E+000 4.6731E+000 4.9530E+000 5.2329E+000),
 WEIGHTS=( 2.1630E-005 7.2680E-005 2.1690E-004 5.8020E-004 1.3920E-003 3.0300E-003 6.0540E-003 1.1040E-002 1.8420E-002 2.8780E-002 4.2810E-002 5.9850E-002 7.7520E-002 9.2940E-002 1.0360E-001 1.0740E-001 1.0340E-001 9.2650E-002 7.7250E-002 6.0010E-002 4.3430E-002 2.9270E-002 1.8370E-002 1.0730E-002 5.8410E-003 2.9570E-003 1.3990E-003 6.1050E-004 2.5140E-004 9.6310E-005 3.2120E-005);
>SCORE NOPRINT;
Note. The POINTS are rescaled by θ* = Aθ + B, with A = 1.040535 and B = 1.033264; the WEIGHTS are taken from the “0, 1” scaling (not transformed).
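The STPU rescaling step amounts to applying θ* = Aθ + B to every quadrature point while leaving the weights unchanged. The Python sketch below applies it to the 31 points and posterior weights printed for the “0, 1” calibration, using A = 1.040535 and B = 1.033264 as given above; the rounded results can be compared with the POINTS list in the command file. The helper names are illustrative only.

```python
import math

# Quadrature points and posterior weights printed by the "0, 1" calibration of the new form.
POINTS = [-4.036, -3.767, -3.498, -3.229, -2.960, -2.691, -2.422, -2.153, -1.884, -1.615,
          -1.346, -1.076, -0.8074, -0.5384, -0.2693, -0.0002361, 0.2688, 0.5379, 0.8069,
          1.076, 1.345, 1.614, 1.883, 2.152, 2.421, 2.690, 2.959, 3.228, 3.498, 3.767, 4.036]
WEIGHTS = [2.163e-05, 7.268e-05, 2.169e-04, 5.802e-04, 1.392e-03, 3.030e-03, 6.054e-03,
           1.104e-02, 1.842e-02, 2.878e-02, 4.281e-02, 5.985e-02, 7.752e-02, 9.294e-02,
           1.036e-01, 1.074e-01, 1.034e-01, 9.265e-02, 7.725e-02, 6.001e-02, 4.343e-02,
           2.927e-02, 1.837e-02, 1.073e-02, 5.841e-03, 2.957e-03, 1.399e-03, 6.105e-04,
           2.514e-04, 9.631e-05, 3.212e-05]

A, B = 1.040535, 1.033264  # linking coefficients reported for this example

def stpu_points(points, A, B):
    """Rescale quadrature points onto the old scale: theta* = A * theta + B."""
    return [A * q + B for q in points]

def moments(points, weights):
    """Mean and SD of a discrete distribution on the given points."""
    total = sum(weights)
    m = sum(q * w for q, w in zip(points, weights)) / total
    sd = math.sqrt(sum(w * (q - m) ** 2 for q, w in zip(points, weights)) / total)
    return m, sd

new_points = stpu_points(POINTS, A, B)  # the weights are reused unchanged
print([f"{q:.4E}" for q in new_points[:3]], f"{new_points[-1]:.4E}")
print("prior before: mean=%.3f sd=%.3f" % moments(POINTS, WEIGHTS))
print("prior after:  mean=%.3f sd=%.3f" % moments(new_points, WEIGHTS))
```

Because the transformation is linear, the transformed prior has mean A·M + B and SD A·SD; with the printed posterior close to mean 0 and SD 1, the updated prior is centered near B with spread near A.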
2nd Command File for the IRPU Method
IRPU FPC with Updated Prior Weights
The examinee group was sampled from N(1,1).
(Omitted: the same as the commands for the default FPC run)
>TEST TNAME=G2_FIX, INUM=(1(1)25), FIX=(1(0)20, 0(0)5);
(4A1, T1, 25A1)
>CALIB NQPT=31, CYCLE=100, CRIT=0.001, NEWTON=0, IDIST=1, NOADJUST;
>QUAD POINTS=( -4.0000E+000 -3.7330E+000 -3.4670E+000 -3.2000E+000 -2.9330E+000 -2.6670E+000 -2.4000E+000 -2.1330E+000 -1.8670E+000 -1.6000E+000 -1.3330E+000 -1.0670E+000 -8.0000E-001 -5.3330E-001 -2.6670E-001 -7.7720E-016 2.6670E-001 5.3330E-001 8.0000E-001 1.0670E+000 1.3330E+000 1.6000E+000 1.8670E+000 2.1330E+000 2.4000E+000 2.6670E+000 2.9330E+000 3.2000E+000 3.4670E+000 3.7330E+000 4.0000E+000),
 WEIGHTS=( 8.8370E-007 3.0840E-006 1.0040E-005 3.1720E-005 9.4690E-005 2.5560E-004 6.3580E-004 1.4490E-003 3.0500E-003 6.0110E-003 1.1060E-002 1.8890E-002 3.0200E-002 4.5590E-002 6.4400E-002 8.4190E-002 1.0160E-001 1.1300E-001 1.1550E-001 1.0830E-001 9.2970E-002 7.3160E-002 5.2690E-002 3.4660E-002 2.0800E-002 1.1400E-002 5.7180E-003 2.6290E-003 1.1160E-003 4.3390E-004 1.5790E-004);
>SCORE NOPRINT;
Note. The POINTS are fixed (-4.0 to 4.0); the WEIGHTS are updated, being the posterior weights from the 1st run of IRPU FPC.
History of Updated Posterior Distributions by the IRPU Method
Iter#   Mean    Std. Dev.
 0      0.000   1.000
 1      0.699   0.923   (from default FPC)
 2      0.876   0.921
 3      0.933   0.932
 4      0.954   0.943
 5      0.963   0.951
 6      0.967   0.956
 7      0.969   0.960
 8      0.971   0.963
 9      0.972   0.965
10      0.973   0.966
11      0.973   0.967
12      0.974   0.968
Iterations stopped after iteration 12 because the M and SD did not change beyond the 0.001 limit.
FPC Estimates of Non-Anchor Item Parameters on the Fixed Old Scale
        Mean/Sigma               Default FPC
Item    a      b       c         a      b       c
21      0.591  -1.947  0.212     0.650  -1.994  0.214
22      0.831  -1.643  0.222     0.922  -1.699  0.230
23      1.027  -1.781  0.196     1.128  -1.850  0.198
24      0.566  -0.988  0.213     0.635  -1.089  0.220
25      0.605  -0.727  0.206     0.681  -0.847  0.216
        STPU FPC                 IRPU FPC
Item    a      b       c         a      b       c
21      0.605  -1.909  0.210     0.624  -1.844  0.208
22      0.863  -1.587  0.222     0.887  -1.542  0.217
23      1.065  -1.723  0.196     1.100  -1.663  0.195
24      0.575  -0.991  0.209     0.594  -0.952  0.207
25      0.614  -0.729  0.205     0.637  -0.689  0.206
FPC Estimates of Mean and SD of the Underlying Distribution on the Fixed Old Scale
Method        Mean        Std. Dev.
Default FPC   0.699       0.923        (under-estimation)
STPU FPC      1.003       1.018
IRPU FPC      0.974       0.968
Mean-Sigma    B = 1.033   A = 1.041
Note. The new group examinees were from a N(1,1) distribution that was expressed on the fixed old scale.
IV. Use of Computer Programs for FPC: Illustration of FPC with PARSCALE
Data: 3,000 examinees for the new form data
- The data were obtained by simulating examinees from a Normal(0.5, 1.2²) distribution, against the old group of N(0, 1) distribution.
- A mixed-format test of 15 MC items and 2 five-category constructed-response (CR) items
FPC
- First 10 MC items fixed (item parameters are ready for use)
- Last 5 MC and 2 CR items freed
- The 3PL model for MC items and the generalized partial credit (GPC) model for CR items
Comparison of STPU and MWU-MEM methods
Command File (MWU-MEM FPC)
MWU-MEM FPC with PARSCALE
The examinee group was sampled from N(0.5, 1.2^2)
>COMMENT
10 common items fixed and 2 CR items calibration
>FILE DFNAME='new.txt', IFNAME='MC10FIX.IFN', SAVE;
>SAVE PARM='MC10FIX';
>INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
(5A1, T1, 17A1)
>TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
>BLOCK BNAME=FIXED, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), REP=10, SKIP;
>BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
>BLOCK BNAME=FREED, NITEMS=1, NCAT=5, ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
>CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
>SCORE NOSCORE;
Data File (New.txt)
[Data excerpt: one record per examinee containing 17 responses; the first 15 columns are MC responses scored 0/1, of which the first 10 are the anchor items, and the last 2 columns are CR responses scored 0 to 4. The remaining records are omitted.]
Command File to Prepare IFNAME File (MC10FIX.IFN)
MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
>COMMENT
10 common items fixed and 2 CR items calibration
>FILE DFNAME='new.txt', SAVE;
>SAVE PARM='MC10FIX';
>INPUT NTOT=17, TAKE=3000, NID=5, NTEST=1, LENGTH=17;
(5A1, T1, 17A1)
>TEST TNAME=I10FIX, ITEMS=(1(1)45), NBLOCK=17;
>BLOCK BNAME=FIXED, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=10;
>BLOCK BNAME=FREEMC, NITEMS=1, NCAT=2, ORI=(0, 1), MOD=(1, 2), GPARM=0.2, GUESS=(2, EST), REP=5;
>BLOCK BNAME=FREED, NITEMS=1, NCAT=5, ORI=(0,1,2,3,4), MOD=(1,2,3,4,5), REP=2;
>CALIB NQPT=41, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, POSTERIOR;
>SCORE NOSCORE;
Note. In this run, the >FILE command has no IFNAME keyword and the FIXED block has no SKIP keyword.
Item Parameter Output File from “0, 1” Scaling
MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
I10FIX 17 17 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GROUP 01
FIXED 20001 0.94308 0.07058 -1.12375 0.14908 0.26134 0.07792 0.00000 0.00000 0.00000 0.00000
BLOCK 20002 0.98019 0.06877 -0.93880 0.12173 0.21813 0.06540 0.00000 0.00000 0.00000 0.00000
BLOCK 20003 1.18582 0.07723 -0.72689 0.08253 0.19030 0.04856 0.00000 0.00000 0.00000 0.00000
(Omitted)
FREED 50016 1.16556 0.03437 -0.14845 0.01309 0.00000 0.00000 0.00000 1.25729 0.29044 -0.33537 -1.21236 0.00000 0.04262 0.03157 0.02902 0.03037
BLOCK 50017 1.42147 0.04095 -0.19171 0.01178 0.00000 0.00000 0.00000 1.29058 0.38858 -0.50917 -1.16999 0.00000 0.03895 0.02653 0.02434 0.02606
Modified Item Parameter File (MC10FIX.IFN)
MWU-MEM FPC with PARSCALE
No Fix, “0, 1” Scaling
I10FIX 17 17 7 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
GROUP 01
FIXED 20001 0.69300 0.00000 -1.50000 0.00000 0.12500 0.00000 0.00000 0.00000 0.00000 0.00000
BLOCK 20002 0.78600 0.00000 -1.00000 0.00000 0.18500 0.00000 0.00000 0.00000 0.00000 0.00000
BLOCK 20003 0.89700 0.00000 -0.60000 0.00000 0.23300 0.00000 0.00000 0.00000 0.00000 0.00000
(Omitted)
FREED 50016 1.16556 0.03437 -0.14845 0.01309 0.00000 0.00000 0.00000 1.25729 0.29044 -0.33537 -1.21236 0.00000 0.04262 0.03157 0.02902 0.03037
BLOCK 50017 1.42147 0.04095 -0.19171 0.01178 0.00000 0.00000 0.00000 1.29058 0.38858 -0.50917 -1.16999 0.00000 0.03895 0.02653 0.02434 0.02606
Note. For each of the 10 fixed items, the a, b, and c values from the “0, 1” calibration are replaced with the fixed a, b, and c values (the entries next to them are set to 0.00000).
Command File for the STPU Method (After Transformation)
STPU FPC with Transformed Prior Points
The examinee group was sampled from N(0.5, 1.2^2).
(Omitted: the same as the commands for MWU-MEM FPC)
>CALIB NQPT=31, PAR, LOG, SCALE=1.7, CYCLE=200, NEWTON=0, FREE=(NOADJUST, NOADJUST), ESTORDER, SPRI, GPRI, DIST=4, QPREAD;
>QUADP POINTS=( -5.2976E+000 -4.9280E+000 -4.5598E+000 -4.1902E+000 -3.8206E+000 -3.4524E+000 -3.0828E+000 -2.7132E+000 -2.3450E+000 -1.9754E+000 -1.6059E+000 -1.2377E+000 -8.6808E-001 -4.9891E-001 -1.2988E-001 2.3929E-001 6.0846E-001 9.7749E-001 1.3467E+000 1.7162E+000 2.0844E+000 2.4540E+000 2.8236E+000 3.1918E+000 3.5614E+000 3.9310E+000 4.2992E+000 4.6688E+000 5.0384E+000 5.4066E+000 5.7761E+000),
 WEIGHTS=( 1.2430E-005 3.4290E-005 8.7330E-005 2.0480E-004 4.4420E-004 9.1150E-004 1.8720E-003 4.1960E-003 1.0550E-002 2.5160E-002 4.2780E-002 5.0290E-002 5.8510E-002 8.6110E-002 9.9290E-002 8.6880E-002 9.7990E-002 1.0840E-001 9.0140E-002 7.7730E-002 6.4860E-002 4.3230E-002 2.4440E-002 1.3010E-002 6.7400E-003 3.3710E-003 1.6010E-003 7.1410E-004 2.9710E-004 1.1500E-004 4.1380E-005);
>SCORE NOSCORE;
Note. The POINTS are rescaled by θ* = Aθ + B, with A = 1.38 and B = 0.24; the WEIGHTS are taken from the “0, 1” scaling (not transformed).
FPC Estimates of Non-Anchor Item Parameters on the Fixed Old Scale
STPU Method
Item  a      b       c      c2      c3      c4     c5
11    0.741  -1.361  0.194
12    0.767  -0.995  0.238
13    0.741  -0.906  0.185
14    0.942  -0.442  0.140
15    1.181  -0.113  0.234
16    0.920   0.025         -1.569  -0.343  0.449  1.562
17    1.120  -0.031         -1.667  -0.522  0.615  1.452
MWU-MEM Method
Item  a      b       c      c2      c3      c4     c5
11    0.741  -1.361  0.194
12    0.768  -0.994  0.238
13    0.741  -0.908  0.184
14    0.942  -0.444  0.139
15    1.180  -0.113  0.234
16    0.921   0.025         -1.568  -0.342  0.450  1.561
17    1.120  -0.030         -1.666  -0.522  0.615  1.454
FPC Estimates of Mean and SD of the Underlying Distribution on the Fixed Old Scale
Method        Mean                          Std. Dev.
STPU FPC      0.460                         1.242
MWU-MEM FPC   0.456                         1.227
Mean-Sigma    B = 0.239 (under-estimation)  A = 1.384 (over-estimation)
Note. The new group examinees were from a N(0.5, 1.2²) distribution that was expressed on the fixed old scale.
V. Applications of FPC for Scaling and Equating
- Online Calibration in Computerized Adaptive Testing (CAT)
- Calibration of Pretest Items on the Fixed Operational Scale in Regular, Non-CAT Administration
- In a Mixed-Format Test, Separate Calibration of CR Items from MC Items, to Minimize the Effects of Bad CR Items on MC Item Calibration
- Equating Test Forms in the CING Design
V. Applications of FPC: Online Calibration in CAT
In CAT, different sets of operational items are adaptively administered to examinees, while pretest items are “seeded” as a common block across groups of examinees.
Because the operational items were already calibrated, their parameters are known in CAT.
Thus, FPC may be the best way to calibrate and diagnose the pretest items on the scale of the operational items, without affecting the operational item parameters.
V. Applications of FPC: Calibration of Pretest Items on the Fixed Operational Scale
To develop test forms, pretest items are often administered to examinees together with operational items.
However, it would be wise to calibrate the operational items separately from the pretest items, because the operational item parameters could be contaminated by bad pretest items.
In this case, the ability distribution estimated using only the operational items can reasonably be used as the prior ability distribution for FPC with the pretest items, while the operational item parameters are held fixed in the FPC.
V. Applications of FPC: FPC with Different Formats of Items
A mixed-format test contains different types of items; for instance, some are MC items and others are CR items.
Simultaneous calibration of both types of items can be conducted, assuming that a dominant factor underlies examinees’ responses to the items.
However, practitioners may want to calibrate MC items separately from CR items, because calibration with bad CR items might adversely affect the estimation of MC item parameters.
In this case, the MC items are calibrated first, and then the CR items are calibrated while the MC item parameters are held fixed.
V. Applications of FPC: Equating Test Forms in the CING Design
Test equating using IRT requires all item parameters to be placed on a common scale (usually the old form scale).
Once all item and ability parameters are placed on the common scale, IRT true score or observed score equating is conducted.
Thus, FPC can be used effectively to place all item parameters on the fixed old scale, with the common items between the new and old forms serving as the anchor.
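A minimal Python sketch of IRT true score equating, assuming that the 3PL parameters of both forms are already on the common old scale (for example, after FPC). All item parameters and D = 1.7 are hypothetical. For a new form number-correct score x, the sketch solves τN(θ) = x for θ by bisection, which works because the test characteristic curve is increasing in θ, and returns the old form true score τO(θ). Scores at or below the sum of the c parameters cannot be inverted this way and would need a separate convention, not shown here.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p3(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p3(theta, a, b, c) for a, b, c in items)

def theta_for_true_score(x, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Solve true_score(theta) = x by bisection on [lo, hi]."""
    if not true_score(lo, items) < x < true_score(hi, items):
        raise ValueError("x is outside the range of the TCC on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equate_true_score(x_new, new_items, old_items):
    """Old form true score equivalent of new form score x_new (both forms on the old scale)."""
    return true_score(theta_for_true_score(x_new, new_items), old_items)

# Hypothetical (a, b, c) values, assumed to be on the fixed old scale already.
NEW_ITEMS = [(0.9, -1.2, 0.20), (1.1, -0.4, 0.18), (0.8, 0.1, 0.22), (1.3, 0.6, 0.15), (0.7, 1.3, 0.20)]
OLD_ITEMS = [(1.0, -1.0, 0.20), (0.8, -0.5, 0.20), (1.2, 0.0, 0.17), (0.9, 0.8, 0.21), (1.1, 1.5, 0.18)]

for x in (2, 3, 4):
    print(f"new form score {x} -> old form equivalent {equate_true_score(x, NEW_ITEMS, OLD_ITEMS):.3f}")
```

Observed score equating would instead use the item parameters together with the estimated ability distributions to generate number-correct score distributions for both forms and then apply equipercentile equating; that procedure is longer and is not sketched here.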
END
Thank You
EXPLORE FPC