Assessing Personality 75 Years After Likert: Thurstone Was Right!
(And some implications for I/O)
Colleagues
Sasha Chernyshenko Steve Stark
Thurstone
In a series of papers in the late 1920s, Thurstone asserted “Attitudes Can Be Measured” and provided several methods for their measurement
He assumed that a conscientious person would endorse a statement that reflected his/her attitude…but
“as a result of imperfections, obscurities, or irrelevancies in the statement, and inaccuracy or carelessness of the subjects” not everyone will endorse a statement, even when it matches their attitude
Thurstone, Psych Review, 1929
For N1 people with attitude S1, all should endorse a statement with scale value S1 if they were conscientious and the item was perfect; but only n1 actually endorse the item
These people will endorse another statement with scale value S2 with a probability p that is a function of |S1-S2|
Figure from Thurstone’s paper:
Thurstone 1929
Thurstone 1928 Attitudes Can Be Measured
Gave an example of an attitude variable, militarism-pacifism, with six statements representing a range of attitudes:
Thurstone 1928
Thurstone 1928
A pacifist “would be willing to indorse all or most of the opinions in the range d to e and … he would reject as too extremely pacifistic most of the opinions to the left of d, and would also reject the whole range of militaristic opinions.”
“His attitude would then be indicated by the average or mean of the range that he indorses”
Implications
On Thurstone’s pacifism-militarism scale, three people might endorse two items each:
• Person 1 endorses f and d, and is very pacifistic
• Person 2 endorses e and b, and is neutral
• Person 3 endorses c and a, and is very militaristic
Thus, it is crucial to know which items are endorsed!
Likert 1932
Proposed a much simpler approach: A five-point response scale with options “Strongly Approve”, “Approve”, “Neutral”, “Disapprove”, and “Strongly Disapprove”.
The numerical values 1 to 5 were assigned to the different response options
And an individual’s score was the sum or mean of the numerical scores
Likert 1932
Likert evaluated his scales by
• Split-half reliability
• Item-total correlations
To make this work, he hit upon the idea of reverse scoring, e.g., statements like d and f from Thurstone needed to be scored in the opposite direction of statements like a and c.
Likert 1932
When computing item-total correlations, “if a zero or very low correlation coefficient is obtained, it indicates that the statement fails to measure that which the rest of the statements measure.” (p. 48)
“Thus item analysis reveals the satisfactoriness of any statement so far as its inclusion in a given attitude scale is concerned”
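A minimal sketch of Likert-style scoring with reverse keying and item-total correlations. The data are hypothetical, and the "corrected" variant (each item against the sum of the remaining items) is a common modern choice rather than something taken from Likert's paper.

```python
from math import sqrt

def reverse_score(value, n_options=5):
    """Map 1..n_options onto n_options..1 (e.g., 1 -> 5 on a five-point scale)."""
    return n_options + 1 - value

def likert_totals(responses, reverse_keyed, n_options=5):
    """Return (scored matrix, sum score per person).

    responses: one list of 1..n_options ratings per person.
    reverse_keyed: set of item indices to reverse before summing.
    """
    scored = [
        [reverse_score(x, n_options) if j in reverse_keyed else x
         for j, x in enumerate(person)]
        for person in responses
    ]
    return scored, [sum(person) for person in scored]

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def corrected_item_total(scored):
    """Correlate each item with the sum of the remaining items."""
    n_items = len(scored[0])
    result = []
    for j in range(n_items):
        item = [person[j] for person in scored]
        rest = [sum(person) - person[j] for person in scored]
        result.append(pearson(item, rest))
    return result

# Hypothetical data: 4 people, 3 items; item index 2 is negatively worded.
responses = [[5, 4, 1], [4, 4, 2], [2, 1, 5], [1, 2, 4]]
scored, totals = likert_totals(responses, reverse_keyed={2})
item_total = corrected_item_total(scored)
```

An item whose correlation came out near zero would be flagged for removal under Likert's rule.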
Likert 1932
Likert discarded intermediate statements like “Compulsory military training in all countries should be reduced but not eliminated”
Such a statement is “double-barreled and of little value because it does not differentiate persons in terms of their attitudes” (p. 34)
Likert Scaling
Although Likert didn’t articulate a psychometric model for his procedure, his analysis implies what Coombs (1964) called a dominance response process.
Specifically, someone high on the trait or attitude measured by a scale is likely to “Strongly Agree” with a positively worded item and “Strongly Disagree” with a negatively worded item
Person endorses item if her standing on the latent trait, theta, is more extreme than that of the item.
[Figure: probability of positive response (0.0 to 1.0) plotted against theta (-3 to 3), with the item and person locations marked]
Example of a Dominance Process
Thurstone Scaling
Thurstone assumed people endorse items reflecting attitudes close to their own feelings
Coombs (1964) called this an ideal point process
Sometimes called an unfolding model
Person endorses item if his standing on the latent trait is near that of the item.
“I enjoy chatting quietly with a friend at a cafe.” Disagree either because:
• Too introverted (uncomfortable in public places)
• Too extraverted (chatting over coffee is boring)
Example of an Ideal Point Process
[Figure: response curve for one item, vertical axis 0.0 to 1.0, theta from -3.0 to 3.0; the regions on either side of the item are labeled “Too Introverted” and “Too Extraverted”]
Important Point:
The item-total correlation of intermediate ideal point items will be close to zero!
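The point above can be checked with a small simulation. This is a sketch under stated assumptions: the Gaussian-shaped endorsement curve, the item locations, and the sample size are all illustrative choices, not Thurstone's model or the GGUM.

```python
import math
import random

def ideal_point_prob(theta, delta, width=1.0):
    """Illustrative ideal point curve: endorsement peaks when theta == delta.
    The Gaussian shape is an assumption made for this sketch only."""
    return math.exp(-((theta - delta) ** 2) / (2.0 * width ** 2))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_people=4000, seed=1):
    """Simulate 0/1 responses to 5 low, 1 intermediate, and 5 high items."""
    rng = random.Random(seed)
    locations = [-2.0] * 5 + [0.0] + [2.0] * 5      # hypothetical item locations
    reverse_keyed = [True] * 5 + [False] * 6        # low items are reverse scored
    data = []
    for _ in range(n_people):
        theta = rng.gauss(0.0, 1.0)
        row = []
        for delta, rev in zip(locations, reverse_keyed):
            endorsed = 1 if rng.random() < ideal_point_prob(theta, delta) else 0
            row.append(1 - endorsed if rev else endorsed)
        data.append(row)
    return data

def corrected_item_total(data, j):
    """Correlation of item j with the sum of all other (scored) items."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson(item, rest)

data = simulate()
r_low, r_mid, r_high = (corrected_item_total(data, j) for j in (0, 5, 10))
```

In this setup the extreme items correlate clearly with the rest of the scale, while the intermediate item's correlation hovers around zero, so traditional item analysis would discard it.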
Which Process is Appropriate for Temperament Assessment?
In a series of studies, we’ve
• Examined appropriateness of dominance process by fitting models of increasing complexity to data from two personality inventories
• Compared fits of dominance and ideal point models of similar complexity to 16PF data
• Compared fits of dominance and ideal point models to sets of items not preselected to fit dominance models
Fitting Traditional Dominance Models to Personality Data
Data
16PF 5th Edition
• 13,059 examinees completed 16 noncognitive scales
Goldberg’s Big Five factor markers
• 1,594 examinees completed 5 noncognitive scales
Models examined
Parametric – 2PLM, 3PLM
Nonparametric – Levine’s Maximum Likelihood Formula Scoring (MFSM)
Three-Parameter Logistic Model
Two-Parameter Logistic Model
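The model equations on these slides did not survive extraction. As a reminder, a sketch of the standard logistic forms follows; the parameter names a, b, c are the usual textbook ones, and the optional scaling constant D = 1.7 is omitted here as a simplifying assumption.

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """P(positive response | theta) under the three-parameter logistic model.

    a: discrimination, b: item location, c: lower asymptote.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """Two-parameter logistic model: the 3PL with the lower asymptote fixed at 0."""
    return irf_3pl(theta, a, b, c=0.0)

# Dominance behavior: the probability only rises as theta increases.
curve = [irf_2pl(t, a=1.2, b=0.0) for t in (-3, -2, -1, 0, 1, 2, 3)]
```

Both curves are monotone in theta, which is what makes them dominance models.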
Methods for Assessing Fit: Fit Plots
[Figure: fit plot of probability of positive response against theta (-3.0 to 3.0), comparing the model IRF with the empirical proportions (EMP)]
Methods for Assessing Fit: Chi-Squares
Chi-squares typically computed for single items
Very important to examine item pairs and triplets
May indicate violations of local independence or misspecified model

$$\chi^2_i = \sum_{k=1}^{s} \frac{[O_i(k) - E_i(k)]^2}{E_i(k)}$$

$$E_i(k) = N \int P(u_i = k \mid \theta)\, f(\theta)\, d\theta$$

$$E_{ij}(k, k') = N \int P(u_i = k \mid \theta)\, P(u_j = k' \mid \theta)\, f(\theta)\, d\theta$$
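A rough sketch of how the expected frequencies and the chi-square could be computed for dichotomous items. It assumes a 2PL response function, a standard normal f(theta), and a simple equally spaced quadrature grid; none of these choices is confirmed by the slides.

```python
import math

def normal_grid(n_points=81, lo=-4.0, hi=4.0):
    """Equally spaced theta grid with normalized standard normal weights."""
    step = (hi - lo) / (n_points - 1)
    thetas = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * t * t) for t in thetas]
    total = sum(dens)
    return thetas, [d / total for d in dens]

def irf_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_single(n, a, b):
    """Expected counts [E_i(0), E_i(1)] for one dichotomous item."""
    thetas, weights = normal_grid()
    p1 = sum(w * irf_2pl(t, a, b) for t, w in zip(thetas, weights))
    return [n * (1.0 - p1), n * p1]

def expected_pair(n, item_i, item_j):
    """Expected 2x2 counts E_ij(k, k') for two items under local independence."""
    thetas, weights = normal_grid()
    table = {}
    for ki in (0, 1):
        for kj in (0, 1):
            total = 0.0
            for t, w in zip(thetas, weights):
                pi = irf_2pl(t, *item_i)
                pj = irf_2pl(t, *item_j)
                total += w * (pi if ki else 1.0 - pi) * (pj if kj else 1.0 - pj)
            table[(ki, kj)] = n * total
    return table

def chi_square(observed, expected):
    """Pearson chi-square over matching lists of observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical item (a=1, b=0) answered by 1,000 people: 450 disagree, 550 agree.
expected = expected_single(1000, a=1.0, b=0.0)
chisq = chi_square([450, 550], expected)
pair_table = expected_pair(1000, (1.0, 0.0), (1.5, 0.5))
```

Pairs and triplets use the same recipe with products of response probabilities inside the integral, which is why they are sensitive to violations of local independence.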
Methods for Assessing Fit: Chi-Squares
To aid interpretation of chi-squares: Adjust to sample size of 3,000
Compare groups of different size
The expected value of a non-central chi-square is equal to its df plus N times the noncentrality parameter

$$E(\chi^2) = df + N\lambda$$

where N is the sample size. So an estimate of the noncentrality parameter is

$$\hat{\lambda} = (\chi^2 - df)/N.$$
Adjusted Chi-square
To adjust to a sample size of, say, 250, use

$$\text{Adjusted } \chi^2 = df + 250\,(\chi^2 - df)/N$$

For IRT, we usually adjust to N = 3000, and divide by the df to get an adjusted chi-square/df ratio
Less than 2 is great, less than 3 is OK
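The adjustment is simple arithmetic; a minimal sketch is below. Function and argument names are illustrative, and no floor is applied when the observed chi-square falls below its df (whether the original software floors such values is not stated here).

```python
def adjusted_chisq(chisq, df, n, target_n=3000):
    """Rescale a chi-square observed at sample size n to target_n."""
    noncentrality = (chisq - df) / n          # estimated noncentrality per person
    return df + target_n * noncentrality

def adjusted_ratio(chisq, df, n, target_n=3000):
    """Adjusted chi-square divided by its degrees of freedom."""
    return adjusted_chisq(chisq, df, n, target_n) / df

# Hypothetical example: chi-square of 50 on 10 df from 1,000 examinees.
example_adjusted = adjusted_chisq(50.0, 10, 1000)   # about 130
example_ratio = adjusted_ratio(50.0, 10, 1000)      # about 13, well above 3
```

Because the ratio is expressed at a common sample size, it can be compared across groups of different size.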
Adjusted Chi-square/df for an Ability Test
AdjChf < 3
FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS
            <1   1<2   2<3   3<4   4<5   5<7    >7    Mean      SD
Singlets    11     3     2     1     0     2     1   1.877   2.923
Doublets    77    44    31    16    12     9     1   1.829   1.734
Triplets   327   424   264    92    14    16     3   1.71    1.092
Results for 16 PF Sensitivity Scale: Mean Chi-sq/df Ratios
Model Singles Doubles Triples
2PL 0.98 4.05 5.45
3PL 0.87 3.89 5.23
SGR 0.99 7.76 7.12
MFS-dich 2.91 2.61 2.42
MFS-poly 1.55 2.68 2.58
What if Items Assessed Trait Values Along the Whole Continuum?
Items on existing personality scales have been pre-screened on item-total correlation
We speculate that items measuring intermediate trait values are systematically deleted
So, what happens if a scale includes some intermediate items?
TAPAS Well-being Scale
Tailored Adaptive Personality Assessment System
Assesses up to 22 facets of the Big Five
Well-being is a facet of emotional stability
We wrote items reflecting low, moderate, and high well-being
For example,
TAPAS Well-Being Scale
• WELL04, “I don’t have as many happy moments in my life as others have”
• WELL17, “My life has had about an equal share of ups and downs”
• WELL41, “Most days I feel extremely good about myself”
In total, 20 items. 5 negative items, 9 positive, and 6 neutral
 #  Item_Name  Initial SME Location  Reverse  Mean   SD    Factor Loading  CITC (alpha=.76)
 1  WELL02     negative              r        2.14   0.80  -0.40            0.35
 2  WELL04     negative              r        2.08   0.87  -0.45            0.40
 3  WELL06     negative              r        2.23   0.78  -0.55            0.45
 4  WELL09     negative              r        2.22   0.76  -0.53            0.42
 5  WELL13     negative              r        2.20   0.77  -0.54            0.45
 6  WELL16     neutral                        2.48   0.85   0.08            0.08
 7  WELL17     neutral                        2.82   0.73   0.13            0.15
 8  WELL19     neutral               r        2.85   0.65  -0.09           -0.05
 9  WELL20     neutral                        3.00   0.89   0.04            0.06
10  WELL23     neutral                        3.03   0.64   0.07            0.11
11  WELL26     neutral               r        2.80   0.78  -0.14            0.06
12  WELL29     positive                       2.89   0.74   0.36            0.48
13  WELL30     positive                       2.77   0.74   0.56            0.42
14  WELL34     positive                       3.13   0.70   0.46            0.35
15  WELL38     positive                       2.80   0.82   0.57            0.49
16  WELL40     positive                       2.53   0.75   0.56            0.48
17  WELL41     positive                       2.96   0.73   0.56            0.50
18  WELL43     positive                       3.13   0.66   0.63            0.55
19  WELL45     positive                       2.82   0.70   0.53            0.46
20  WELL46     positive                       2.89   0.72   0.47            0.41
Traditional Analysis Results
Fit Plot for 2PL WELL17
[Figure: Fit Plot for Item 7. Probability of positive response against theta (-3.0 to 3.0); series IRF7 and EMP7]
An Ideal Point Model: The Generalized Graded Unfolding Model (GGUM)
Roberts, Donoghue, & Laughlin (2000). Applied
Psychological Measurement.
The model assumes that the probability of
endorsement is higher the closer the item to the
person
GGUM software provides maximum likelihood
estimates of item parameters
GGUM
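The probability of disagree is:

$$P(Z_i = 0 \mid \theta_j) = \frac{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\}}{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\} + \exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}$$

and the probability of agree is

$$P(Z_i = 1 \mid \theta_j) = \frac{\exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}{1 + \exp\{3\alpha_i(\theta_j - \delta_i)\} + \exp\{\alpha_i[(\theta_j - \delta_i) - \tau_{i1}]\} + \exp\{\alpha_i[2(\theta_j - \delta_i) - \tau_{i1}]\}}$$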
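A small sketch of the dichotomous GGUM probabilities, written directly from the two formulas on this slide. The parameter values in the example are hypothetical; alpha is the discrimination, delta the item location, and tau the threshold.

```python
import math

def ggum_probs(theta, alpha, delta, tau):
    """Return (P(disagree), P(agree)) for a dichotomous GGUM item."""
    d = theta - delta
    disagree = 1.0 + math.exp(3.0 * alpha * d)
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    total = disagree + agree
    return disagree / total, agree / total

# Hypothetical moderate item located at delta = 0 with tau = -1.
at_item = ggum_probs(0.0, alpha=1.0, delta=0.0, tau=-1.0)[1]
above = ggum_probs(2.0, alpha=1.0, delta=0.0, tau=-1.0)[1]
below = ggum_probs(-2.0, alpha=1.0, delta=0.0, tau=-1.0)[1]
```

Agreement is most likely when the person sits at the item location and drops off symmetrically on either side, which is the single-peaked shape a dominance model cannot produce.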
GGUM Estimated IRF for Moderate Item
[Figure: GGUM ORF for Option 2. Probability of positive response against well-being (-3.0 to 3.0)]
IRF for Agree response to TAPAS Well-being item “My life has had about an equal share of ups and downs.”
TAPAS Well-being Scale
FREQUENCY TABLE OF ADJUSTED (N=3000) CHISQUARE/DF RATIOS

2PL Results:
            <1   1<2   2<3   3<4   4<5   5<7    >7    Mean      SD
Singlets    20     0     0     0     0     0     0   0       0
Doublets    17     1     0     0     1     2     3   2.955   6.439
Triplets     5     0     1     0     0     1     5   5.408   6.512

GGUM Results:
            <1   1<2   2<3   3<4   4<5   5<7    >7    Mean      SD
Singlets    20     0     0     0     0     0     0   0       0
Doublets    22     0     0     0     0     0     2   0.997   3.256
Triplets     9     0     0     1     1     1     0   1.081   2.001
Summary of Findings
2PLM and 3PLM fit scales developed by traditional methods OK, but
• Chi-square doublets and triplets can be large, especially when moderate items are included
• Discrimination parameter estimates are uniformly small for moderate items (and item-total correlations are near zero).
GGUM fits all items, including moderate items
• Adj. chi-square to df ratios are small for doubles and triples
• GGUM discrimination parameter estimates are large for the moderate items!
So, for Well-Being
Fitting a dominance item response theory model (the 2-parameter logistic) produced an adjusted Chi-Square to df ratio of 2.955 for pairs
The ideal point model yielded an adjusted Chi-square/df ratio of 0.997 for pairs
Conclusion
Ideal point model seems more appropriate for temperament assessment
BUT there’s a “Fly in the ointment” for I/O
Correct specification of response process does not guarantee more accurate assessment, because …
Traditional items are easily FAKED
Examples of “Traditional” Items that are Easily Faked
• I get along well with others. (A+)
• I try to be the best at everything I do. (C+)
• I insult people. (A-)
• My peers call me “absent minded.” (C-)
Because these items consist of individual statements, they are commonly referred to as “single stimulus” items.
In each case, the positively keyed response is obvious.
Army Assessment of Individual Motivation (AIM)
Uses tetrads:
• I get along well with others. (A+)
• I set very high standards for myself. (C+)
• I worry a lot. (ES-)
• I like to sit on the couch and eat potato chips. (Physical condition-)
Respondent picks the statement that is Most Like Me and the statement that is Least Like Me
Army AIM has shown less score inflation
What psychometric model would describe this type of data????
So…
US Army researchers Len White and Mark Young (and others) found some fake resistance and criterion-related validity for the tetrad format
But modeling four-dimensional items was too hard for me!
How about two-dimensional items?
Multidimensional Pairwise Preference (MDPP) Format
Create items by pairing stimuli that are similar in
desirability, but representing different dimensions
“Which is more like you?”
• I get along well with others. (A+)
• I always get my work done on time. (C+)
This led to my work on personality assessment
over the past 10 years
And the result is:
Tailored Adaptive Personality Assessment System (TAPAS)
TAPAS is designed to overcome existing limitations of personality assessment for selection by incorporating recent advancements in:
• Temperament/personality assessment
• Item response theory (IRT)
• Computerized adaptive testing (CAT)
Our goal is for TAPAS to be innovative in both how we assess (IRT, CAT) and what we assess (facets of personality)
TAPAS Vision
Fully customizable assessment to fit array of users’ needs
Users can select
• any dimension from a comprehensive superset
• a scale length to suit their needs
• a response format (depends on faking worries)
• adaptive or static
Resulting scores can be used to predict multiple criteria or as source of feedback
TAPAS Facet Dimensions
Based on factor analysis of each of the Big Five dimensions
• E.g., Roberts, B., Chernyshenko, O.S., Stark, S., & Goldberg, L. (2005). The structure of conscientiousness. Personnel Psychology
Analyzed 7 major personality inventories
Currently 21 facets + additional “physical condition” facet for military jobs
TAPAS Facet Dimensions
Conscientiousness
Six facet hierarchical structure:
• Industriousness: task- and goal-directed
• Order: planful and organized
• Self-control: delays gratification
• Traditionalism: follows norms and rules
• Social Responsibility: dependable and reliable
• Virtue: ethical, honest, and moral
Factor Analysis Results
                                                     Factor
Scale Name                 Industriousness  Order  Self-control  Responsibility  Traditionalism  Virtue
neo competence                   .88         -.28      .14            .10            -.01         -.09
neo achievement striving         .76          .02     -.12            .10             .09         -.18
ab5c organization                .75          .11      .05            .11            -.10         -.17
ab5c purposefulness              .67          .18     -.04           -.02            -.11          .24
neo self-discipline              .65          .22     -.11           -.03            -.02          .16
ab5c efficiency                  .63          .36     -.19           -.03            -.07          .21
ab5c rationality                 .50          .16      .12           -.28             .16         -.01
neo dutifulness                  .49         -.05      .14           -.02             .26          .09

For each facet, we have an empirical mapping of existing scales to our facets
• Provide basis for existing scale classification
• Validity of each facet can be investigated via meta-analysis
TAPAS Military Meta-Analysis
42 studies or technical reports 1988-2006
Small number of police and fire-fighter studies were also
included
22 TAPAS facets
8 criteria (e.g., task proficiency, contextual
performance, leadership, attrition, fitness)
1494 empirical correlations
TAPAS Military Meta-Analysis
Industriousness Results
Criterion                   N      kd   kc   Observed Validity   Corrected Validity
Job/Task Performance      38964    14   36         .05                 .06
Contextual Performance    19423     9   18         .21                 .26
Counterproductivity       17673     8   17        -.14                -.18
Attrition                 17912     5    8        -.09                -.10
Leadership                 9429    12   20         .15                 .18
Training Performance       6156     8   27         .14                 .17
Adaptability               1291     3    4         .17                 .21
Physical Fitness          18044     5   17         .18                 .23
Validity tables can be used to guide the choice of facets!
TAPAS Civilian Meta-Analysis
Studies or technical reports in the period
1988-2006
Same 8 criterion categories and 22 TAPAS
facets
4755 validity coefficients (so, in total, we
have over 6,000 validities in our database)
“How” TAPAS Measures
Our research on the item response process for personality stimuli (Stark et al., 2006; Chernyshenko et al., 2007) suggests that
• Response endorsement is driven by the similarity between the person and the behavior described by the stimulus (aka, an ideal point process)
Implications:
• Different models (not the 3PL or SGR) should be used for item administration and scoring: e.g., GGUM
• Multiple stimuli per item are possible (i.e., pairs)
“How” TAPAS Measures
The choice of 4 response formats will be available
• Single statement dichotomous (Agree/Disagree)
• Single statement polytomous (SA, A, D, SD)
• Unidimensional pairwise preference (i.e., two-alternative forced choice)
• Multidimensional pairwise preference (Stark, 2002)
  • Used when faking is likely
Single Statement Scales
Generalized Graded Unfolding Model (GGUM; Roberts et al., 1998)
• Reverse scoring is not needed
• Basic idea: a person endorses an item if it accurately describes him/her
• Thus, the probability of endorsement is higher the closer the item to the person
GGUM IRFs for two Personality Statements
[Figure: P(theta) from 0.0 to 1.0 against theta (-3.0 to 3.0) for "I enjoy chatting quietly with a friend at a café." (Sociability)]
[Figure: P(theta) from 0.0 to 1.0 against theta (-3.0 to 3.0) for "I am about as organized as most people." (Order)]
Multidimensional Pairwise Preference (MDPP) Format
Create items by pairing stimuli that are similar in
desirability, but representing different dimensions
“Which is more like you?”
• I get along well with others. (A+)
• I set very high standards for myself. (C+)
MDPP Roots: Assessment of Individual Motivation (AIM)
AIM utilizes forced-choice tetrad format to reduce social desirability effects
• Greater resistance to faking than ABLE (a single statement personality inventory developed by the Army researchers)
• Low correlations (.00 to .25) with examinee race and gender and measures of cognitive ability
• Predicts attrition and various job and training performance criteria in research and operational testing
MDPP Roots: Assessment of Individual Motivation (AIM)
But, due to quasi-ipsative scoring
• AIM items are difficult to create and
• Score accuracy cannot be checked against known scores, because no formal psychometric model for stimulus endorsement is available
CAT is not possible without a psychometric model
IRT Model for Scoring Multidimensional Pairwise Preference Items
(Stark, 2002; Stark, Chernyshenko, & Drasgow, 2005)

$$P_{(s>t)_i}(\theta_{d_s}, \theta_{d_t}) = \frac{P_{st}\{1,0\}}{P_{st}\{1,0\} + P_{st}\{0,1\}} \approx \frac{P_s\{1\}\,P_t\{0\}}{P_s\{1\}\,P_t\{0\} + P_s\{0\}\,P_t\{1\}}$$

1 = Agree
0 = Disagree
Respondent evaluates each stimulus (personality statement) separately and makes independent decisions about endorsement.
Stimuli may be on different dimensions.
Single stimulus response probabilities P{0} and P{1} computed using a unidimensional ideal point model for “traditional” items (GGUM)
Refer to new pairwise preference model as MDPP
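A minimal sketch of the pairwise preference probability, using the dichotomous GGUM for the single-statement probabilities P{1} and P{0}. The statement parameters and trait values below are hypothetical and only illustrate the mechanics.

```python
import math

def ggum_agree(theta, alpha, delta, tau):
    """P{1}: probability of agreeing with a single statement (dichotomous GGUM)."""
    d = theta - delta
    agree = math.exp(alpha * (d - tau)) + math.exp(alpha * (2.0 * d - tau))
    return agree / (1.0 + math.exp(3.0 * alpha * d) + agree)

def mdpp_prefer_s(theta_s, theta_t, stmt_s, stmt_t):
    """Probability of choosing statement s over statement t.

    theta_s, theta_t: the person's standing on the dimensions measured by s and t.
    stmt_s, stmt_t: (alpha, delta, tau) tuples for each statement.
    """
    ps1 = ggum_agree(theta_s, *stmt_s)
    pt1 = ggum_agree(theta_t, *stmt_t)
    s_only = ps1 * (1.0 - pt1)        # P_s{1} * P_t{0}
    t_only = (1.0 - ps1) * pt1        # P_s{0} * P_t{1}
    return s_only / (s_only + t_only)

# Hypothetical pair: a Sociability statement (s) and an Order statement (t).
sociability = (1.0, 1.0, -1.0)
order = (1.0, 0.0, -1.0)
# Person sits right at the Sociability statement but far below the Order statement.
p_choose_s = mdpp_prefer_s(theta_s=1.0, theta_t=-2.0, stmt_s=sociability, stmt_t=order)
```

The preference depends on how close the person is to each statement on its own dimension, so a single pair carries information about two traits at once.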
MDPP IRF for Item Measuring Sociability and Order
MDPP Model Performance
Stark & Drasgow (2002)
.77 correlation between estimated and known
scores in 2-D tests, 20 pairs, 10% unidimensional
Stark & Chernyshenko
.88 for 5-D tests, 50 items, 5% unidimensional
All possible pairings of dimensions were not
required for good parameter recovery
CAT vs. Nonadaptive
Average Correlation Across Dimensions
                                       Nonadaptive                 Adaptive
% Unidim.  Items Per Construct   3-d   5-d   7-d   10-d     3-d   5-d   7-d   10-d
    5               5            .73   .72   .76   .76      .87   .85   .86   .87
    5              10            .85   .87   .87   .86      .93   .93   .93   .93
    5              20            .93   .93   .93   .94      .96   .96   .96   .96
   10               5            .73   .74   .75   .75      .87   .87   .85   .88
   10              10            .85   .85   .86   .87      .92   .93   .93   .93
   10              20            .93   .93   .94   .94      .96   .96   .96   .96
   20               5            .74   .74   .74   .75      .87   .84   .86   .87
   20              10            .85   .85   .87   .86      .92   .90   .93   .93
   20              20            .92   .93   .93   .94      .96   .96   .96   .96

* CAT yielded similar correlations with only half as many items.
* 10-d CAT correlations > .9 with 100 items (only 5 unidim!).
Summary of MDPP Model Studies
MDPP items are attractive for applied use:
• Faking is more difficult
• Can create huge pool with relatively few statements representing each dimension (20 stimuli = 190 items)
5% unidimensional pairings sufficient for accurate score recovery
As with SS models, MDPP CAT can reduce test length by about 50% while maintaining accuracy, which is important if many dimensions assessed.
Current Empirical TAPAS Studies
Comparing MDPP format to single statement (SS) format
Testing what makes forced-choice items resistant to faking
• # of dimensions?
• Matching on social desirability?
• Matching on statement locations?
Study 1: Benchmark Study
4-D MDPP measure (41 pairs) designed using “conventional
wisdom”
Match stimuli on social desirability (average difference
between SocD did not exceed 1.08 on 5-point scale)
Match stimuli to have different locations on respective
dimensions (average distance 4.3 units on Z-score metric)
4-D SS measure (40 items)
Both measures administered under faking and honest conditions (N
= 510 and N = 574)
2-D SS measure (20 items) – all honest (n=1084)
Very Strong Faking Instructions!
Unlike in the previous sections where the instructions asked you to be as honest and accurate as possible, we now ask that you PRETEND you are not yet in the Army, but very much want to be. Imagine a recruiter asks you to take this questionnaire to determine if you are GOOD ARMY MATERIAL. If you score well, you will be let into the Army. If you don’t score well, you will not.
For the remaining sections, you are to answer the test questions by describing yourself in a way that will make you look like “good Army material” so you are sure to pass the test and get into the Army. Remember you are not yet in the Army, but very much want to be. In other words, create the best possible impression of yourself and convince the Army that you will make a good Soldier.
Study 1: Benchmark Study
Comparability of formats under Honest Conditions
             dom_MDPP  enr_MDPP  ord_MDPP  trad_MDPP
dom_MDPP       1.00      0.27      0.12      0.11
enr_MDPP       0.27      1.00      0.09      0.12
ord_MDPP       0.12      0.09      1.00      0.33
trad_MDPP      0.11      0.12      0.33      1.00
dom_GGUM       0.59      0.22      0.02      0.08
enr_GGUM       0.21      0.49      0.06      0.13
ord_GGUM       0.20      0.15      0.49      0.34
trad_GGUM      0.05      0.10      0.21      0.54
ord_GOLD       0.21      0.13      0.50      0.35
trad_GOLD      0.06      0.10      0.24      0.50
Study 1: Benchmark Study
MDPP scales created using conventional wisdom are as fakable as SS scales in strong faking conditions
In faking conditions, respondents chose items with “more positive” location (i.e., > 20% endorsement shift across conditions)
             Honest   Faking   Difference   Effect Size
dom_MDPP      0.10     0.32       0.21         0.32
enr_MDPP      0.17     0.95       0.78         0.97
ord_MDPP     -0.07     0.32       0.39         0.70
trad_MDPP     0.48     1.56       1.08         1.06
dom_GGUM      0.13     0.44       0.31         0.41
enr_GGUM      0.25     0.65       0.41         0.59
ord_GGUM     -0.19     0.36       0.54         0.71
trad_GGUM     0.65     1.25       0.60         0.77
TRAD_GOLD    31.43    31.31      -0.12        -0.03
ORD_GOLD     29.96    29.69      -0.26        -0.05
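The slides do not say how the effect sizes were computed. One common choice, sketched here purely as an assumption, is the faking-minus-honest mean difference divided by the pooled standard deviation; the scores in the example are made up.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(honest, faking):
    """(mean faking - mean honest) / pooled SD, an assumed effect size formula."""
    n1, n2 = len(honest), len(faking)
    pooled_var = ((n1 - 1) * variance(honest) + (n2 - 1) * variance(faking)) / (n1 + n2 - 2)
    return (mean(faking) - mean(honest)) / sqrt(pooled_var)

# Hypothetical trait scores for two small groups.
honest_scores = [1, 2, 3]
faking_scores = [2, 3, 4]
effect = standardized_difference(honest_scores, faking_scores)
```

A positive value means scores were higher under faking instructions, matching the sign convention of the Difference column.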
Study 2: Location Matching
11-D MDPP static measure with 117 items
Match stimuli on similarity in locations (average distance
2.09 z-score units)
11-D SS measure (7 items each)
Both measures administered under faking and honest
conditions (N = 286 and N = 358)
Again, very strong faking instructions
Study 2: Location Matching
Unlike benchmark study, only 20 out of 117 items showed inflated percent endorsement shifts
• Note that we matched only on locations, not Soc.D
Scored 97 pair 11-D MDPP measure
• Similar correlations across formats as in benchmark study
• But, less score inflation
Study 2: Location Matching
MDPP Scores     Honest (N=358)   Faking (N=276)   Difference   Effect Size
ORD_MDPP97          -0.08             0.10           0.18          0.38
SOC_MDPP97           0.13             0.06          -0.07         -0.12
TRAD_MDPP97         -0.24            -0.01           0.23          0.30
ENR_MDPP97          -0.77            -0.57           0.20          0.28
DOM_MDPP97          -0.29            -0.33          -0.04         -0.06
IND_MDPP97          -0.72            -0.43           0.29          0.49
INTE_MDPP97         -0.17            -0.01           0.15          0.26
TRUST_MDPP97        -0.24            -0.18           0.07          0.07
CURI_MDPP97          0.01             0.13           0.12          0.20
WELL_MDPP97         -0.38            -0.26           0.12          0.20
PHYC_MDPP97         -0.54            -0.28           0.25          0.42

Compare to: SS scales in benchmarking study had .41 SD inflation for DOM, and .79 SD inflation for TRAD
Conclusions
MDPP model (Stark, 2002) can be used effectively to score real MDPP response patterns
• MDPP scores agree with SS scores under honest conditions
Fake resistance of forced-choice format should not be taken for granted
• E.g., must match on item locations, not just Soc.D
Our MDPP CAT algorithm has constraints on location difference and Soc.D difference
• Adaptive testing format may further decrease fakability (e.g., NCAPS results with UPP scales)
But, there is lots of R&D work to be done…
Current Work
TAPAS is being implemented by the US Army for enlistment screening June 8 for applicants without high school diplomas
Will it predict their attrition and counter-productive behaviors?
Current Work
We have about 50 statements for each of the 13 dimensions that are being used by the US Army
Are some statements overused? We don’t have an exposure control algorithm
In principle, each of the approximately 650 statements could be paired with any of the other 649…but there are lots of constraints on item selection…
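The pool sizes mentioned here are easy to check, and the effect of constraints can be sketched with a toy filter. The statement tuples, the gap thresholds, and the rule of skipping same-dimension pairs are hypothetical simplifications; the actual TAPAS item selection rules are not given in these slides.

```python
from itertools import combinations
from math import comb

def count_cross_dimension_pairs(n_dimensions, statements_per_dimension):
    """All pairs in the pool minus pairs drawn from the same dimension."""
    total = n_dimensions * statements_per_dimension
    return comb(total, 2) - n_dimensions * comb(statements_per_dimension, 2)

def eligible_pairs(statements, max_desirability_gap, max_location_gap):
    """Keep cross-dimension pairs that are close in desirability and location.

    statements: list of (id, dimension, location, desirability) tuples.
    """
    pairs = []
    for s, t in combinations(statements, 2):
        if s[1] == t[1]:
            continue                                   # same dimension: skipped in this sketch
        if abs(s[3] - t[3]) > max_desirability_gap:
            continue                                   # too different in social desirability
        if abs(s[2] - t[2]) > max_location_gap:
            continue                                   # too different in location
        pairs.append((s[0], t[0]))
    return pairs

all_pairs_20 = comb(20, 2)                            # 190, as quoted for 20 stimuli
all_pairs_650 = comb(650, 2)                          # every statement with every other
cross_pairs = count_cross_dimension_pairs(13, 50)

# Hypothetical mini-pool: (id, dimension, location, desirability).
pool = [("A1", "A", 0.5, 3.8), ("C1", "C", 0.7, 4.0),
        ("C2", "C", 2.5, 4.1), ("E1", "ES", 0.4, 2.0)]
usable = eligible_pairs(pool, max_desirability_gap=0.5, max_location_gap=1.0)
```

Even in this toy pool most candidate pairs are ruled out, which gives a sense of how quickly constraints shrink the space of usable items.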
In Sum,
TAPAS designed to bring the latest in
• Psychometric theory
• Computer technology
• Personality theory
Our goal is to produce an easily customizable assessment tool to meet the needs of diverse users and researchers