Running Head: IRT ANALYSIS OF IPIP
Item Response Theory Analysis of the IPIP Big-Five Scales
D. Matthew Trippe and Robert J. Harvey
Virginia Polytechnic Institute and State University
Abstract
We used Samejima's (1969) graded-response item response theory model to evaluate the Five
Factor Model scales of the public-domain International Personality Item Pool (IPIP; Goldberg,
1999). Test information and standard error functions showed that the IPIP scales provided
relatively good measurement precision across most of the scale ranges.
IRT Analysis of the IPIP - 2
Item Response Theory Analysis of the IPIP Big-Five Scales
The Five Factor Model (FFM) or “Big Five” approach has emerged as the dominant
taxonomic approach in the realm of personality research (e.g., Digman, 1990; John, 1990).
Although FFM critics exist (e.g., Block, 1995), many researchers and practitioners have
embraced the common framework provided by the FFM (e.g., McCrae & John, 1992; Goldberg,
1993; Costa & McCrae, 1995). In applied use, FFM constructs have been shown to be valid predictors
of a wide variety of criteria (e.g., Paunonen & Ashton, 2001), and in particular, Industrial-
Organizational psychologists have produced a large body of research examining the predictive
relationships between FFM personality dimensions and job performance (e.g., Barrick & Mount,
1991; Mount & Barrick, 1995; Salgado, 1997; Hurtz & Donovan, 2000). Although some of
these studies have reached more optimistic conclusions than others, the bulk of the available
evidence suggests that FFM personality dimensions (especially Conscientiousness) exhibit at
least a small-to-moderate predictive relationship with respect to job performance, with improved
validity being obtained when the linkage between the predictor and criterion is logically and
theoretically justified (e.g., Hurtz & Donovan, 2000; Paunonen & Ashton, 2001), and when
problems regarding under-specification of the job performance criterion are avoided (e.g., Austin
& Villanova, 1992).
The increased focus that has been seen with respect to using personality traits for
employee selection and promotion purposes has led to a concomitant increase in the need for
researchers and practitioners to evaluate the quality and measurement precision of the personality
instruments they use to estimate the FFM dimensions. Item response theory (IRT) presents an
excellent methodology for evaluating personality instruments in this regard, given that unlike
classical test theory (CTT) it does not assume that tests are equally precise across the full range
of possible test scores. That is, rather than providing a point estimate of the standard error of
measurement (SEM) for a personality scale as in CTT, IRT provides a test information function
(TIF) and a test standard error (TSE) function to index the degree of measurement precision
across the full range of the latent trait (denoted θ). Using IRT, personality tests can be evaluated
in terms of the amount of information and precision they provide at specific ranges of test scores
that are of particular interest (e.g., when used for top-down employee selection purposes,
measurement precision at the upper end of the θ continuum would likely be the primary focus,
and even a relatively large lack of precision at the lower end of the scale might be excused).
Because many standardized tests tend to provide their highest levels of measurement precision in
the middle range of scores, with declines in precision being seen at the high and low ends of the
scale (e.g., Harvey, Murry, & Markham, 1994), it is quite possible that a test might be deemed
adequate for assessing individuals scoring in the middle range of the scale, but unacceptably
imprecise at the high or low ends (which, depending on the direction of the test’s scale, may
represent precisely the most relevant ranges of scores for employee selection purposes).
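The inverse relation between test information and measurement error described above can be sketched numerically. The information values below are purely hypothetical (a center-weighted TIF of the kind many standardized tests exhibit), not taken from any actual instrument:

```python
import numpy as np

# Hypothetical, center-weighted test information function (TIF).
theta = np.linspace(-3, 3, 13)            # latent trait grid
info = 10 * np.exp(-0.5 * theta**2) + 1.0

# The conditional standard error of measurement is the reciprocal
# square root of the test information at each theta level.
se = 1.0 / np.sqrt(info)

# Precision is best where information peaks (theta = 0 here) and
# worst at the extremes -- the region that matters most for
# top-down selection.
print(se[6] < se[0] and se[6] < se[-1])   # prints True
```

Under this sketch, a test whose information peaks mid-scale yields small standard errors for typical examinees but noticeably larger ones for exactly the high (or low) scorers a top-down rule would rank.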
Despite the advantages offered by IRT over the older CTT-based methods of assessing
test performance, relatively few IRT analyses of personality inventories have been reported to
date. For example, Harvey, Murry, and Markham (1994) evaluated the Myers Briggs Type
Indicator, finding that short-form versions of the instrument provided considerably less
measurement precision than the full-length form, and that precision for all of the forms was quite
low at both the high and low ranges of scores (a finding that would strongly caution against
using its scales in a top-down selection situation). Likewise, Rouse, Finger, and Butcher (1999)
evaluated the Minnesota Multiphasic Personality Inventory-2 Personality Psychopathology-
Five scales, finding that although three scales provided peak information at the high end of the θ
scale, very little information was produced at low levels of θ, a finding that suggests that
practitioners should be cautious when interpreting low and moderate scores. Finally, of more
direct relevance to the present investigation, McBride and Harvey (2002) examined the
performance of both the NEO-PI-R (Costa & McCrae, 1992) and scales formed from subsets of
the 1,200 items contained in Goldberg’s (1999) public domain International Personality Item
Pool (IPIP) that were designed to parallel the FFM scales of the NEO-PI. Using the graded-
response model of Samejima (1969) to analyze the Likert-type item responses, McBride and
Harvey (2002) found that although 20-item IPIP scales were less precise than the NEO-PI
scales, 60-item IPIP scales outperformed the NEO-PI across most of the range of scores.
The present study was performed to further investigate the performance of the 60-item
IPIP scales formed by Goldberg (1999) to parallel the FFM dimensions measured by the NEO-
PI; two factors suggested the need for additional study. First, the McBride and Harvey (2002)
analyses were conducted using archival data collected as part of the IPIP development project
from carefully selected, compensated study participants who completed a wide range of survey
instruments over the course of the study; the degree to which similar results would be found in
additional samples (especially, ones not participating in a long-term research project) needed to
be determined (e.g., the archival participants in the original IPIP study may well have exhibited
appreciably higher levels of factors such as dedication, candor, veracity, etc., that potentially
may exert an impact on the obtained item parameters). Second, although McBride and Harvey
(2002) evaluated the degree to which some of IRT’s assumptions were satisfied (primarily,
unidimensionality), a closer examination of the performance of the graded-response IRT model
may be warranted (e.g., Drasgow, Levine, Tsien, Williams, & Mead, 1995).
In the present study, we were particularly interested in determining the degree of
measurement precision that was present in these IPIP item pools for the ranges of scores that
would most likely be of interest to practitioners when using the FFM scales for employee
selection purposes (i.e., the desirable pole of each dimension). That is, the McBride and Harvey
(2002) results showed that although the TIF and TSE functions were relatively flat (a desirable
characteristic for a general-purpose instrument), all five of the FFM dimensions for both the
NEO-PI and IPIP scales showed their lowest levels of measurement precision for the pole of the
scale that would presumably be most useful in a selection context (i.e., at the high end of the
Agreeableness, Conscientiousness, Extraversion, and Openness scales, and at the low end of the
Neuroticism scale). If such results are found to be generalizable to additional samples of raters,
the importance of modifying the item pools to increase the performance of the IPIP scales in
these target score ranges (i.e., by including additional items having their points of maximum
information toward these poles) would be underscored.
Method
Participants
Participants were recruited primarily from the Introductory Psychology participant pool
at a large southeastern university; participants received extra credit toward their final grade. All
participants completed the IPIP in an online testing session in which they answered the test items
via a web browser, and in which test items were presented in alternating fashion with the five
scales intermixed in screens of 8 items per screen; respondents were not allowed to go back to
change answers from earlier screens. Because the IPIP was hosted on a public web server, a
number of non-student participants also volunteered to participate. Out of a
total sample of approximately 700 individuals, respondents were discarded if they took less than
15 minutes to complete the online survey, or if their responses exhibited very low variance
across items (e.g., by selecting ‘2’ for all item responses), producing a final sample of N = 624
individuals for the IRT analyses.
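The screening rules just described can be sketched as a simple filter. The variance threshold below is an illustrative value of ours, not one reported in the study:

```python
import numpy as np

def keep_respondent(responses, minutes_taken,
                    min_minutes=15.0, min_variance=0.05):
    """Return True if a respondent passes both screens.

    responses: sequence of Likert answers (1-5).
    min_variance is an illustrative threshold, not the authors' value.
    """
    responses = np.asarray(responses, dtype=float)
    if minutes_taken < min_minutes:       # completed too quickly
        return False
    if np.var(responses) < min_variance:  # e.g., '2' for every item
        return False
    return True

# A straight-lining respondent (all 2s) is dropped regardless of time;
# a too-fast respondent is dropped regardless of response variability.
print(keep_respondent([1, 2, 3, 4, 5] * 60, minutes_taken=25))  # True
print(keep_respondent([2] * 300, minutes_taken=25))             # False
print(keep_respondent([1, 2, 3, 4, 5] * 60, minutes_taken=10))  # False
```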
Measure
The items from the IPIP used in this study were the 300 items identified by Goldberg
(2001; see http://ipip.ori.org/ipip/ for items and descriptive information) that paralleled the FFM
constructs of Agreeableness, Conscientiousness, Extraversion, Neuroticism and Openness as
measured by the subscales of the NEO-PI-R (Costa & McCrae, 1992). Each of the broad five
facet scales contained 60 items (10 for each of the NEO-PI subfactors), with each item composed
of a short statement (e.g., “love order and regularity”) to which participants responded on a five-
point Likert scale (1 = “very inaccurate” to 5 = “very accurate”).
Scale Dimensionality
Most IRT models assume that a response to any one item is unrelated to other item
responses if the latent trait is controlled for (e.g., Lord & Novick, 1968). Consequently, IRT
models assume that the latent trait construct space is either strictly unidimensional, or as a
practical matter, dominated by a general underlying factor. Reckase (1979) recommended that
the first factor account for at least twenty percent of the variance in order to obtain stable item
parameters; to evaluate the predominance of the factors underlying each FFM pool in the IPIP,
each scale was subjected to a common factor analysis using maximum likelihood estimation.
Modified parallel analysis (MPA; e.g., Drasgow & Lissak, 1983) was also used to assess the
dimensionality of each scale; MPA is an extension of the Humphreys and Montanelli (1975)
method of parallel analysis in which the eigenvalues from a synthetically created data set (i.e.,
one that satisfies the unidimensionality assumption of IRT) are compared to those estimated
from the actual data. In this approach, the synthetic data set is generated from item and person
parameters based on the actual data set. Appreciable multidimensionality is said to be present
when the second eigenvalue obtained from the actual data set is significantly larger than the
second eigenvalue obtained from the synthetic data set.
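The second-eigenvalue comparison at the heart of MPA can be sketched as follows. This is a simplified illustration with hypothetical data: the "synthetic" set here is generated from a simple one-factor linear model, standing in for data simulated from the estimated graded-response item parameters as in the actual MPA procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def second_eigenvalue(data):
    """Second-largest eigenvalue of the inter-item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1][1]

n, k = 500, 20
theta = rng.normal(size=(n, 1))            # dominant latent trait

# 'Synthetic' data: strictly unidimensional (a stand-in for data
# generated from the estimated item and person parameters).
synthetic = 0.7 * theta + rng.normal(scale=0.5, size=(n, k))

# 'Real' data: same dominant factor plus a nuisance factor loading
# on half the items, mimicking mild multidimensionality.
nuisance = rng.normal(size=(n, 1))
loadings = np.array([0.5] * (k // 2) + [0.0] * (k // 2))
real = 0.7 * theta + nuisance * loadings + rng.normal(scale=0.5, size=(n, k))

# MPA flags multidimensionality when the real data's second
# eigenvalue clearly exceeds the synthetic second eigenvalue.
print(second_eigenvalue(real) > second_eigenvalue(synthetic))  # True
```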
Item Parameter Estimation and Model Fit
MULTILOG 6.0 (Thissen, 1991) was used to estimate Samejima’s (1969) Graded
Response Model parameters for items in each of the five scales. The maximum number of
iterations (cycles) was set to 2000, and all scales converged before reaching this maximum. The
fit of parameters obtained from each scale was evaluated using the graphical and statistical
procedures recommended by Drasgow, Levine, Tsien, Williams and Mead (1995). Results from
the graphical procedure are not reported due to space limitations; χ2 fit statistics were computed
for item singles, doubles and triples, with large χ2 values seen as indicative of poor fit. Drasgow
et al. (1995) recommended interpreting values lower than 3.0 as indications of acceptable fit.
Given our inability to perform a cross-validation in light of our sample size (i.e., dividing the
sample into two groups of 312 would fall below the Reise & Yu, 1990, recommended minimum
sample size of at least 500 to recover stable item parameters), some degree of caution may be
appropriate when interpreting our results.
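A heavily simplified sketch of one such fit index follows: a chi-square/df ratio comparing observed and model-implied category counts for a single item (a "singlet"). The counts below are hypothetical, and the actual Drasgow et al. (1995) procedure adjusts the degrees of freedom for estimated parameters and extends to item pairs and triples:

```python
import numpy as np

def chisq_df_ratio(observed, expected):
    """Chi-square/df ratio for one item's category counts.
    Simplified: df = categories - 1, with no adjustment for the
    number of estimated item parameters. Ratios below ~3.0 are
    read as acceptable fit."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    chisq = np.sum((observed - expected) ** 2 / expected)
    df = len(observed) - 1
    return chisq / df

# Hypothetical counts for a 5-category Likert item, N = 624:
obs = [40, 120, 210, 180, 74]   # observed frequencies
exp = [45, 115, 205, 185, 74]   # model-implied frequencies
print(round(chisq_df_ratio(obs, exp), 3))  # 0.258 -- acceptable fit
```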
Results
Dimensionality
Table 1 presents statistics relevant to scale dimensionality; internal consistency using
CTT methods was reasonably strong, ranging from .90 for the Openness scale to .95 for the
Neuroticism scale. Although the variance explained by the first eigenvalue of each scale
exceeds Reckase’s (1979) criterion of 20%, and the results in Figure 1 (which presents the
eigenvalue plots from the MPA) show that a clear and dominant first factor is present in all five
scales, the second eigenvalues from the real data are all somewhat larger than the second
eigenvalues from the synthetic data, indicating that all five IPIP FFM scales are to some
degree multidimensional.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Insert Table 1 and Figure 1 about here
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Of course, no objective or straightforward criterion exists to indicate how
much multidimensionality is “too much” for the recovery of stable and accurate item parameters
in IRT. In a series of studies on Monte Carlo and real data sets, Reckase (1979) found that when
a dominant first factor was present, IRT models primarily estimated the first factor, and that
model fit was directly related to the size of the first eigenvalue. That is, as the size of the first
eigenvalue increased, the deviations from fit decreased in approximately linear fashion. Reckase
concluded that an eigenvalue that accounted for at least 20% of the total variance is needed for
reasonable ability estimates and stable item parameters. Additionally, research by Drasgow and
Parsons (1983) and Parsons and Hulin (1982) reported evidence that dichotomous models are
considerably robust to rather severe violations of the unidimensionality assumption. For
example, Parsons and Hulin (1982) were able to recover reasonably stable item parameters when
analyzing the Job Descriptive Index (JDI), which is considerably multidimensional (i.e., the JDI
contains 4 distinct satisfaction facets, but still contains a dominant general satisfaction factor).
Similarly, in a series of simulated data sets Drasgow and Parsons (1983) found that as the
predominance of the general factor decreased, the estimation program LOGIST was drawn to the
strongest factor. They concluded that estimating parameters on moderately heterogeneous data
sets, such as those found in achievement tests and attitude assessment is justified. Kirisci, Hsu,
and Yu (2001) investigated the robustness of polytomous item parameter estimation using
MULTILOG to violations of the unidimensionality assumption, concluding that (a) when data
are multidimensional, a test length of more than 20 items and a sample size of over 250 are
necessary to recover stable parameter estimates, and (b) when there is one dominant dimension
with several minor dimensions, a unidimensional IRT model is likely justified.
Thus, given past research, we viewed the moderate violations of strict unidimensionality
seen in the IPIP personality scales in Figure 1 and Table 1 as being unlikely to exert an
appreciable distorting effect on IRT parameter estimation. To further assess model fit, χ2
statistics are reported in Table 2. The mean χ2 value for all 5 scales is well below Drasgow et
al.’s (1995) recommended cutoff of 3.0. Although such results are supportive of the fit of the IRT
model to these data, it should be stressed that even more definitive results could be obtained via
the use of separate and larger validation and holdout samples.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Insert Table 2 about here
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Test Information and Standard Error Functions
Item information functions for each of the five FFM scales were aggregated to form the
test information functions (TIFs) reported in Figure 2. These TIFs indicate the areas on the θ
continuum in which the IPIP scales provide the most information or best discrimination among
test takers. Figure 3 reports plots of the test standard error functions based on the test information
functions. That is, as the SEM for a given level of θ decreases, the information at that level of θ
increases (the SEM is the reciprocal of the square root of the test information).
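The aggregation just described, item information functions summing to a TIF whose reciprocal square root gives the TSE, can be sketched for Samejima's graded-response model. The item parameters below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

def grm_item_information(theta, a, bs):
    """Fisher information for one graded-response item.
    a: discrimination; bs: ascending category thresholds.
    Category probabilities are differences of adjacent boundary
    response curves; information is sum_k (dP_k/dtheta)^2 / P_k."""
    theta = np.asarray(theta, dtype=float)
    # Boundary curves P*_k, padded with 1 (below lowest threshold)
    # and 0 (above highest) so differencing yields all categories.
    pstar = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(bs))))
    pstar = np.hstack([np.ones((len(theta), 1)), pstar,
                       np.zeros((len(theta), 1))])
    dpstar = a * pstar * (1.0 - pstar)    # derivatives of boundary curves
    p = pstar[:, :-1] - pstar[:, 1:]      # category probabilities
    dp = dpstar[:, :-1] - dpstar[:, 1:]   # their derivatives
    return np.sum(dp**2 / p, axis=1)

# Hypothetical (a, thresholds) for a tiny 3-item 'scale'.
theta = np.linspace(-3, 3, 61)
items = [(1.5, [-1.5, -0.5, 0.5, 1.5]),
         (1.0, [-2.0, -1.0, 0.0, 1.0]),
         (2.0, [-0.5, 0.5, 1.0, 2.0])]

# TIF = sum of item information functions; TSE = 1 / sqrt(TIF).
tif = sum(grm_item_information(theta, a, bs) for a, bs in items)
tse = 1.0 / np.sqrt(tif)
```

With real IPIP parameters, the same two lines at the end would reproduce the TIF and TSE plots of Figures 2 and 3.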
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Insert Figures 2-3 about here
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
An examination of the TIF and TSE results presented in Figures 2 and 3 indicates that for
all scales except Neuroticism, test information clearly declines at the higher end of the θ scale;
typically, the TSE for these scales hovers around .25 for mid- and lower ranges of θ, then
gradually rises to the .40’s and higher towards the positive extreme of each latent trait (the
reverse situation is seen for the Neuroticism scale). Unfortunately, because higher scores on the
non-Neuroticism scales would typically be viewed as desirable among applicants for many jobs
(with lower scores desirable on Neuroticism), the fact that the IPIP provides very good levels of
information across most ranges of θ is to some degree offset by the notable drop in information
and measurement precision that occurs in the regions of the θ scale that are most desirable in
selection contexts. Although this loss of precision is to some degree mitigated by the fact that a
smaller percentage of individuals would be expected to lie at the extremes of these FFM scales,
implementing a top-down hiring strategy based on these personality test results would tend to
exacerbate the relative loss of measurement precision existing in this region of θ.
Discussion
Overall, consistent with the earlier McBride and Harvey (2002) study of the IPIP and
NEO-PI, the results of the present study indicate that the IPIP FFM scales are capable of
providing strong measurement precision across most of the range of their respective personality
traits. As with most psychometric inventories, measurement precision tends to decline somewhat
at the extreme ends of the latent traits. Although these findings are strongly supportive of the
usefulness of the IPIP pools as indicators of the FFM constructs for “general purpose”
assessment situations (which are typically focused primarily on the middle range of scores that
contains the majority of examinees), our findings should instead motivate caution on the part of
test users who intend to use these FFM scales as assessed by the IPIP (or the NEO-PI as well,
based on the McBride & Harvey, 2002, results) in order to make employment or other selection
or placement decisions on a top-down basis. Additional research using either actual or Monte
Carlo methods is now needed to determine the degree to which actual employment decisions
might be affected based on the magnitude of measurement error implied by the TSE and TIF
results reported above.
Even given these cautionary notes, it must be stressed that overall, our results are quite
consistent with the McBride and Harvey (2002) results in indicating that these IPIP item pools
provide an impressive level of performance for most ranges of θ. This strong measurement
performance is all the more notable in light of the fact that the IPIP was designed to be a non-
proprietary resource available to all researchers. Given the lack of proprietary restrictions on the
use of these item pools, ideally research on the IPIP will fulfill its designer’s goals of progressing
at a faster rate than would be possible with a proprietary instrument (e.g., Goldberg, 1999).
Of course, applied uses of the full 300-item instrument we used in this study might well
be viewed as problematic, given the lengthy and cumbersome nature of such a survey when used
for employee selection or other assessment purposes. Fortunately, the relatively flat test standard
error and information functions seen for these item pools suggest that this 300-item pool might
form the basis for driving a computer adaptive testing (CAT) version of the IPIP that could
considerably reduce test administration time. That is, unlike many full-length psychological tests
– which typically exhibit a test information function that is strongly center-weighted, and often
considerably lacking at the high and low ends of the scale – the TIFs shown in Figure 2 are (as in
the McBride & Harvey, 2002, study) remarkably flat across a wide range of θ. Thus, it is quite
likely that a CAT-based IPIP for these FFM constructs could appreciably cut testing time without
causing an undue reduction in information and precision in estimating θ for examinees.
Additional research evaluating the degree to which CAT can reduce testing time, plus studies
designed to assess the susceptibility of the IPIP items to differential item functioning (DIF) on
criteria relevant to employee selection (e.g., race, sex) is now needed.
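The maximum-information item-selection step at the heart of such a CAT can be sketched as follows. For brevity this sketch uses 2PL (dichotomous) item information and a hypothetical 10-item bank; an actual IPIP CAT would use graded-response item information instead:

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """2PL item information: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def next_item(theta_hat, a, b, administered):
    """Select the unadministered item with maximum information at the
    current trait estimate -- the core step of a simple CAT."""
    info = item_information_2pl(theta_hat, a, b)
    info[list(administered)] = -np.inf   # exclude items already given
    return int(np.argmax(info))

# Hypothetical item bank: discriminations and difficulties.
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0, 0.9, 1.1, 1.7, 1.3, 0.7])
b = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5])

# At theta_hat = 0, the sharpest item centered at 0 is chosen first;
# once administered, the next most informative item is chosen.
print(next_item(theta_hat=0.0, a=a, b=b, administered=set()))  # 4
print(next_item(theta_hat=0.0, a=a, b=b, administered={4}))    # 2
```

After each response, theta_hat would be re-estimated and the loop repeated until a precision target (e.g., a TSE threshold) is met, which is how a CAT cuts testing time without sacrificing precision where items are informative.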
References
Austin, J.T., & Villanova, P. (1992). The criterion problem. Journal of Applied Psychology, 77,
836-874.
Barrick, M.R., & Mount, M.K. (1991). The Big Five personality dimensions and job
performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Block, J. (1995). A contrarian view of the five factor approach to personality description.
Psychological Bulletin, 117, 187-215.
Costa, P. T., & McCrae, R.R. (1995). Primary traits of Eysenck’s P-E-N system: Three and five
factor solutions. Journal of Personality and Social Psychology, 69, 308-317.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and
NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological
Assessment Resources.
Digman, J.M. (1990). Personality structure: Emergence of the five factor model. Annual Review
of Psychology, 41, 417-440.
Drasgow, F., Levine, M.V., Tsien, S., Williams, B., & Mead, A.D. (1995). Fitting polytomous
item response theory models to multiple choice tests. Applied Psychological
Measurement, 19, 143-165.
Drasgow, F., & Lissak, R. I. (1983). Modified parallel analysis: A procedure for examining the
latent dimensionality of dichotomously scored item responses. Journal of Applied
Psychology, 68, 363-373.
Drasgow, F., & Parsons, C.K. (1983). Application of unidimensional item response theory
models to multidimensional data. Applied Psychological Measurement, 7, 189-199.
Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the
lower-level facets of several five-factor models. In I. Mervielde, I. Deary, F. De Fruyt, &
F. Ostendorf (Eds.), Personality Psychology in Europe, Vol. 7 (pp. 7-28). Tilburg, The
Netherlands: Tilburg University Press.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist,
48, 26-34.
Harvey, R. J., Murry, W.D., & Markham, S.E. (1994). Evaluation of three short-form versions of
the Myers-Briggs Type Indicator. Journal of Personality Assessment, 63, 181-184.
Humphreys, L.G., & Montanelli, R.G., Jr. (1975). An investigation of the parallel analysis
criterion for determining the number of common factors. Multivariate Behavioral
Research, 10, 193-205.
Hurtz, G. M., & Donovan, J.J. (2000). Personality and job performance: The Big Five revisited.
Journal of Applied Psychology, 85, 869-879.
International Personality Item Pool (2001). A Scientific Collaboratory for the Development of
Advanced Measures of Personality Traits and Other Individual Differences
(http://ipip.ori.org/). Internet Web Site.
Kirisci, L., Hsu, T., & Yu, L. (2001). Robustness of item parameter estimation programs to
assumptions of unidimensionality and normality. Applied Psychological Measurement,
25, 146-162.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
McBride, N. L., & Harvey, R. J. (2002, April). Item response theory comparison of the IPIP and
NEO-PI-R. Paper presented at the Annual Conference of the Society for Industrial and
Organizational Psychology, Toronto.
McCrae, R.R. & John, O. P. (1992). An introduction to the five factor model and its application.
Journal of Personality, 60, 175-215.
Mount, M.K. & Barrick, M.R. (1995). The Big Five personality dimensions: Implications for
research and practice in human resources management. In K. M. Rowland and G. Ferris
(Eds.), Research in personnel and human resource management (Vol. 13, pp 153-200).
Greenwich, CT: JAI Press.
Parsons, C.K. & Hulin, C.L., (1982). An empirical comparison of item response theory and
hierarchical factor analysis in applications to the measurement of job satisfaction. Journal
of Applied Psychology, 67, 826-834.
Paunonen, S.V. & Ashton, M.C. (2001). Big five factors and facets and the prediction of
behavior. Journal of Personality and Social Psychology, 81, 524-539.
Pervin, L.A. (1994). A critical analysis of current trait theory. Psychological Inquiry, 5, 103-113.
Reckase, M.D. (1979). Unifactor latent trait models applied to multifactor tests: Results and
implications. Journal of Educational Statistics, 4, 207-230.
Reise, S. P. & Yu, J. (1990). Parameter recovery in the graded response model using
MULTILOG. Journal of Educational Measurement, 27, 133-144.
Rouse, S.V., Finger, M.S., & Butcher, J.N. (1999). Advances in clinical personality
measurement: An item response theory analysis of the MMPI-2 PSY-5 Scales. Journal
of Personality Assessment, 72, 282-307.
Salgado, J.F. (1997). The five factor model of personality and job performance in the European
community. Journal of Applied Psychology, 82, 30-43.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores.
Psychometrika Monograph Supplement, No. 17.
Thissen, D. (1991). MULTILOG version 6.0 user’s guide [computer program]. Chicago:
Scientific Software International.
Table 1

Descriptive Statistics Relevant to Dimensionality for the International Personality Item Pool FFM
Scales

IPIP Scale          Cronbach's Alpha (raw)   First Eigenvalue   Variance Explained by
                                                                First Eigenvalue
Agreeableness               .92                   10.85                 38%
Conscientiousness           .94                   13.38                 45%
Extraversion                .93                   13.45                 44%
Neuroticism                 .95                   15.09                 48%
Openness                    .90                    8.83                 30%
Table 2

χ2 Fit Statistics for the International Personality Item Pool's FFM Scales

Frequency Table of Chi-Square/DF Ratios for IPIP300 Agreeableness Scale
            <1   1<2  2<3  3<4  4<5  5<7  >7    Mean    SD
Singlets    60    0    0    0    0    0    0    0.044   0.051
Doublets    23   25    7    3    0    2    0    1.471   1.098
Triplets     3   13    3    1    0    0    0    1.534   0.691

Frequency Table of Chi-Square/DF Ratios for IPIP300 Conscientiousness Scale
            <1   1<2  2<3  3<4  4<5  5<7  >7    Mean    SD
Singlets    60    0    0    0    0    0    0    0.076   0.088
Doublets    14   32   11    1    0    1    1    1.597   1.105
Triplets     3   15    0    1    1    0    0    1.449   0.817

Frequency Table of Chi-Square/DF Ratios for IPIP300 Extraversion Scale
            <1   1<2  2<3  3<4  4<5  5<7  >7    Mean    SD
Singlets    60    0    0    0    0    0    0    0.039   0.048
Doublets    18   30    8    4    0    0    0    1.484   0.770
Triplets     3   14    3    0    0    0    0    1.452   0.456

Frequency Table of Chi-Square/DF Ratios for IPIP300 Neuroticism Scale
            <1   1<2  2<3  3<4  4<5  5<7  >7    Mean    SD
Singlets    60    0    0    0    0    0    0    0.124   0.136
Doublets     9   33   10    2    4    1    1    1.947   1.353
Triplets     2   12    5    0    1    0    0    1.744   0.838

Frequency Table of Chi-Square/DF Ratios for IPIP300 Openness Scale
            <1   1<2  2<3  3<4  4<5  5<7  >7    Mean    SD
Singlets    60    0    0    0    0    0    0    0.075   0.100
Doublets    25   28    4    2    0    0    1    1.325   1.191
Triplets     6   11    2    1    0    0    0    1.391   0.573
Figure Captions
Figure 1. Modified parallel analysis scree plots for the five IPIP scales.

Figure 2. Test level information functions for the five IPIP scales.

Figure 3. Test level standard error of measurement plots for the five IPIP scales.