Martin J. Ippel &
Ryan Glaze
CogniMetrics Inc., San Antonio, TX
Identification of Technical Aptitude Based on Criterion Measures of the
U.S. Navy Apprentice Technical Training Program (O.N.R. Contracts Nr. N00014-10-M-0087 & N00014-10-C-0505)
Paper presented at the 53rd Annual Conference of the International Military Testing Association, Bali (Indonesia).
October 31 – November 4, 2011
Outline:
1. Terminology: aptitude, ability and skill
2. Research Questions: (1) Can technical skill be measured independently from technical knowledge, and if so, (2) do we need different predictor variables to predict success in technical training on these aspects?
3. Did we indeed extract general aptitude variance from a collection of training performance measures? Evidence in favor:
i. Study 1: Technical Aptitude: a 2-dimensional concept
ii. Study 2: Observed Score Variance Decomposed
iii. Study 3: Translate SE model parameters into (2PN) IRT parameters
4. Summary of results
Terminology:
Aptitude: entails that someone can learn to perform a task (or a class of tasks) under reasonable time constraints
Ability: entails that someone can perform a task (or class of tasks); example: using DC equipment
Skill: a very specific ability; example: charging a car battery
Spearman-Holzinger Bi-Factor Model of the ASVAB (Ippel-Watson, 2008)
Percentages of Variance Explained per ASVAB Factor estimated under two conditions: (1) regular single-version administration (confounded condition); (2) multitrait-multimethod design, both versions administered (unconfounded condition) (Ippel, 2011).
Conclusions:
1. The Armed Forces form a highly technical and partly high-tech work environment, but have no adequate tools for selection and placement of future personnel in this highly technical work environment.
2. The existing TK tests measure mainly the ASVAB general factor (crystallized intelligence).
2. Opportunity / Challenge
Apprentice Technical Training Program
We have access to training data of the Navy’s Apprentice Technical Training (ATT) program, which provides basic electricity and electronics training to 21 Navy ratings. The program is modular: it consists of 49 modules, which in different combinations prepare recruits for a particular job rating.
[Figure: ATT modules, shown as one block of common modules and several blocks of specific modules, with the modules for a particular job rating marked.]
Examples of Navy ratings trained in the ATT Program:
• (Aviation) Electronics Technician
• Electrician’s Mate
• Interior Communications Technician
• Communications Technician
• Gas Turbine Systems Technician
• Sonar Technician
• Fire Control Technician
• Missile Technician
[Figure: an ATT module as a sequence of lessons (Lesson 1, Lesson 2, ..., Lesson k) with the ATT module tests: an S-test and a K-test.]
Each module consists of 8 – 10 lessons. Modules can be general to all job ratings or specific for a subgroup of job ratings.
Lots of test data available, but (as is often the case with training data) many test-score distributions have undesirable properties. For example,
3. Research Questions
Previous Study:
• Make something out of these criterion scores: Watson & Ippel (2008) developed a logistic model to replace the dichotomized K-scores and S-scores with the probability of passing the Minimum Competence Level (MCL).
Present Project:
• Can technical skill be measured independently from technical knowledge, and if so,
• Do we need different predictor variables to predict success in technical training on these aspects?
[Diagram: relation between the three studies. STUDY I: SE model with dichotomous test scores. STUDY II: CT-CM model (augmented model). STUDY III: (2PN) IRT model, linked to the SE model as equivalent models (Takane & DeLeeuw, 1989).]
STUDY I
Test of an independent cluster model for Technical Aptitude
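[Path diagram: SE model with independent cluster structure (part model). Factor TS with indicators S1, S2, S3 (unique terms u11, u12, u13); factor TK with indicators K1, K2, K3 (unique terms u21, u22, u23).]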
                                                      Model 1              Model 2
Modules / tests                    Factor             Estimate   S.E.      Estimate   S.E.
Introduction to Electricity (S1)   Tech. Skill        0.443      0.096     0.464      0.104
Multi-meter Measurements (S2)      Tech. Skill        0.432      0.147     0.438      0.148
Basic DC Circuits (S3)             Tech. Skill        0.529      0.082     0.554      0.090
Introduction to AC (S4)            Tech. Skill        No test available    No test available
AC Test Equipment (S5)             Tech. Skill        0.678      0.083     0.737      0.091
Transformers (S6)                  Tech. Skill        0.592      0.074     0.633      0.080
Introduction to DC (S7)            Tech. Skill        0.525      0.098     0.607      0.108
Digital Logic Functions (S8)       Tech. Skill        0.440      0.134     0.456      0.143
Introduction to Electricity (K1)   Tech. Knowledge    0.774      0.122     0.922      0.135
Multi-meter Measurements (K2)      Tech. Knowledge    0.121*     0.080     0.119*     0.086
Basic DC Circuits (K3)             Tech. Knowledge    0.092*     0.096     0.087*     0.100
Introduction to AC (K4)            Tech. Knowledge    0.485      0.075     0.541      0.085
AC Test Equipment (K5)             Tech. Knowledge    0.548      0.070     0.610      0.084
Transformers (K6)                  Tech. Knowledge    0.203      0.100     0.220      0.108
Introduction to DC (K7)            Tech. Knowledge    0.392      0.079     0.417      0.088
Digital Logic Functions (K8)       Tech. Knowledge    0.417      0.083     0.460      0.091
* p > 0.05
Table 1: Estimated Factor Loadings on Dimensions of a Technical Learning Aptitude under two different models (Navy Rating: AE, N = 500)
Table 2: Meta-analytic estimates of means of CFA factor loadings of A.T.T. post-tests over 16 samples of Navy Ratings (N = 500 for each sample)
                                                      Sampling            95% CI
Modules / tests                    Mean λ    SD λ     error      SE λ     LL       UL
Introduction to Electricity (S1)   0.312     0.142    0.04%      0.008    0.297    0.328
Multi-meter Measurements (S2)      0.198     0.127    0.06%      0.009    0.180    0.215
Basic DC Circuits (S3)             0.465     0.084    0.02%      0.006    0.453    0.477
Introduction to AC (S4)            No Test Available
AC Test Equipment (S5)             0.406     0.164    0.03%      0.007    0.393    0.419
Transformers (S6)                  0.298     0.137    0.04%      0.008    0.282    0.313
Introduction to DC (S7)            0.450     0.071    0.02%      0.006    0.438    0.462
Digital Logic Functions (S8)       0.325     0.121    0.04%      0.008    0.310    0.339
Introduction to Electricity (K1)   0.540     0.095    0.01%      0.005    0.530    0.550
Multi-meter Measurements (K2)      0.456     0.180    0.02%      0.006    0.444    0.468
Basic DC Circuits (K3)             0.572     0.143    0.01%      0.005    0.563    0.582
Introduction to AC (K4)            0.628     0.077    0.01%      0.004    0.620    0.637
AC Test Equipment (K5)             0.577     0.071    0.01%      0.005    0.567    0.586
Transformers (K6)                  0.525     0.118    0.02%      0.005    0.515    0.536
Introduction to DC (K7)            0.470     0.167    0.02%      0.006    0.458    0.482
Digital Logic Functions (K8)       0.630     0.132    0.01%      0.004    0.622    0.638
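The confidence limits in Table 2 appear consistent with a normal-theory interval, mean ± 1.96 × SE. The sketch below illustrates that arithmetic for the S1 row. The function name `ci95` is introduced here for illustration only, and the slides do not state how the interval (or the SE itself) was actually computed, so small differences from the tabled limits may simply reflect rounding of the inputs.

```python
def ci95(mean, se, z=1.96):
    """Return (lower, upper) limits of a normal-theory 95% confidence interval."""
    return mean - z * se, mean + z * se


# Mean loading and SE copied from the S1 row of Table 2 (three-decimal rounding).
lower, upper = ci95(0.312, 0.008)
print(f"S1 mean loading: 95% CI [{lower:.3f}, {upper:.3f}]")
```

Because the tabled mean and SE are already rounded, the recomputed lower limit can differ from the tabled one in the third decimal.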
                                                      Sampling            95% CI
Modules / tests                    Mean τ    SD τ     error      SE τ     LL        UL
Introduction to Electricity (S1)   -0.309    0.569    0.04%      0.008    -0.324    -0.294
Multi-meter Measurements (S2)      -1.397    0.338    0.01%      -0.004   -1.388    -1.406
Basic DC Circuits (S3)             -0.300    0.281    0.04%      0.008    -0.316    -0.285
Introduction to AC (S4)            No Test Available
AC Test Equipment (S5)             -0.043    0.684    0.13%      0.011    -0.064    -0.022
Transformers (S6)                  -0.810    0.493    0.00%      0.002    -0.814    -0.806
Introduction to DC (S7)            -0.689    0.420    0.01%      0.003    -0.696    -0.682
Digital Logic Functions (S8)       -1.066    0.447    0.00%      -0.001   -1.064    -1.067
Introduction to Electricity (K1)   -1.291    0.349    0.00%      -0.003   -1.284    -1.297
Multi-meter Measurements (K2)      -1.457    0.623    0.01%      -0.005   -1.447    -1.467
Basic DC Circuits (K3)             -0.752    0.391    0.00%      0.003    -0.758    -0.747
Introduction to AC (K4)            -0.743    0.473    0.00%      0.003    -0.748    -0.737
AC Test Equipment (K5)             0.006     0.340    0.17%      0.011    -0.015    0.028
Transformers (K6)                  -1.290    0.337    0.00%      -0.003   -1.283    -1.296
Introduction to DC (K7)            -1.306    0.495    0.00%      -0.003   -1.299    -1.313
Digital Logic Functions (K8)       -1.496    0.467    0.01%      -0.006   -1.485    -1.507
Table 3: Meta-analytic estimates of means of CFA difficulty parameters for each A.T.T. post-test across 16 samples of Navy Ratings (N = 500 for each sample)
STUDY II
Decomposition of Observed Score Variance (augmenting the SE model)
[Path diagram: SE model with independent cluster structure (part model), with factors TS (indicators S1, S2, S3; unique terms u11, u12, u13) and TK (indicators K1, K2, K3; unique terms u21, u22, u23).]
[Path diagram: Correlated Traits – Correlated Modules model (part model). The same TS and TK structure, augmented with module factors labeled Intro AC, Basic DC, and Transformers.]
                                                          Knowledge Type
                                   Knowledge Domain       (Techn. Knowledge / Techn. Skill)
Tests                              Estimate   S.E.        Estimate   S.E.        H2       Unique
Introduction to Electricity (S1)   0.269      0.179       0.403      0.104       0.235    0.765
Introduction to Electricity (K1)   0.602      0.329       0.793      0.163       0.991    0.009
Multi-meter Measurements (S2)      -0.215     0.257       0.463      0.151       0.261    0.739
Multi-meter Measurements (K2)      0.514      0.171       0.048      0.117       0.267    0.734
Basic DC Circuits (S3)             0.303      0.172       0.494      0.094       0.336    0.664
Basic DC Circuits (K3)             0.261      0.176       0.046      0.114       0.070    0.930
Introduction AC (S4)               no test available
Introduction AC (K4)                                      0.492      0.073
AC Test Equipment (S5)             -0.070     0.202       0.686      0.081       0.475    0.525
AC Test Equipment (K5)             0.136      0.171       0.531      0.070       0.300    0.700
Transformers (S6)                  0.291      0.197       0.558      0.088       0.396    0.604
Transformers (K6)                  0.147      0.146       0.178      0.104       0.053    0.947
Introduction to DC (S7)            -0.696     0.353       0.713      0.175       0.993    0.007
Introduction to DC (K7)            0.043      0.151       0.388      0.080       0.152    0.848
Digital Logic Functions (S8)       0.139      0.287       0.424      0.128       0.199    0.801
Digital Logic Functions (K8)       -0.023     0.147       0.426      0.084       0.182    0.818
Bold Face: significant at 0.05 or lower
R(S, K) = 0.881
ΦM: 0.70 for all non-diagonal entries
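The H2 and Unique columns above appear to follow from the two loadings in each row: the communality equals the sum of the squared loadings, and the unique variance is its complement. The sketch below illustrates this for the S1 row. It reads the Knowledge Domain estimate as a module loading and the Knowledge Type estimate as a trait loading, and it assumes standardized loadings with trait factors uncorrelated with module factors; both are interpretations, not statements made on the slides. The function name `decompose` is hypothetical.

```python
def decompose(module_loading, trait_loading):
    """Split standardized test variance into module, trait, and unique parts.

    Assumption: the module factor and the trait factor are uncorrelated,
    so the squared standardized loadings add up to the communality (h2).
    """
    module_var = module_loading ** 2
    trait_var = trait_loading ** 2
    h2 = module_var + trait_var
    return {
        "module": module_var,
        "trait": trait_var,
        "h2": h2,
        "unique": 1.0 - h2,
    }


# Loadings copied from the Introduction to Electricity (S1) row.
s1 = decompose(module_loading=0.269, trait_loading=0.403)
print({name: round(value, 3) for name, value in s1.items()})
```

Comparing the `trait` and `module` entries test by test gives the T>M, T=M, and T<M groupings used when discussing variance absorption.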
Mean Size of Variance Components in Criterion Tests for Common ATT Modules
VARIANCE ABSORPTION
Significantly larger trait loadings in Study I, as a percentage of the total number of tests, when the T-variance component was larger than (T>M), equal to (T=M), or smaller than (T<M) the module component
[Figure: bar chart. X-axis: relation between T- and M-variance components (T<M, T=M, T>M); y-axis: percentage of significant differences (0% to 100%); series: S-tests, K-tests.]
STUDY III
Translating SE model parameters into
(2PN) IRT parameters
Average Probability of Failure on K-tests and S-tests for trainees with a mean score on underlying latent traits at the first trial
IRT Parameters
Modules/Tests bj1 bj2 aj
Introduction to Electricity (S1) 0.301 -0.12
Multi-meter Measurements (S2) 0.296 -1.399
Basic DC Circuits (S3) 0.552 -0.231
Introduction to AC (S4)
AC Test Equipment (S5) 0.489 0.135
Transformers (S6) 0.312 -1.068
Introduction to DC (S7) 0.559 -0.575
Digital Logic Functions (S8) 0.559 -1.107
Introduction to Electricity (K1) 0.588 -1.334
Multi-meter Measurements (K2) 0.584 -1.916
Basic DC Circuits (K3) 1.068 -0.857
Introduction to AC (K4) 1.144 -1.295
AC Test Equipment (K5) 0.735 -0.124
Transformers (K6) 0.699 -1.607
Introduction to DC (K7) 0.559 -1.565
Digital Logic Functions (K8) 1.144 -2.471
Table 4: IRT parameters based on the meta-analytic factor loading and threshold estimates
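For readers unfamiliar with the translation step, the sketch below shows one common textbook conversion from a standardized factor loading λ and threshold τ to normal-ogive discrimination and difficulty: a = λ / √(1 − λ²) and b = τ / λ. It assumes a single unit-variance factor per test (the independent cluster structure) and a standardized latent response. The slides do not state which parameterization or which estimates were used for Table 4, so this is not claimed to reproduce the tabled values; the function name `fa_to_irt` and the input values are hypothetical.

```python
import math


def fa_to_irt(loading, threshold):
    """Convert a standardized loading and threshold to 2PN (a, b).

    Assumptions: one unit-variance factor per test and a standardized
    latent response variable, so the residual variance is 1 - loading**2.
    """
    if not 0.0 < abs(loading) < 1.0:
        raise ValueError("loading must be non-zero and strictly between -1 and 1")
    a = loading / math.sqrt(1.0 - loading ** 2)  # discrimination
    b = threshold / loading                      # difficulty (location)
    return a, b


# Hypothetical inputs, chosen only to illustrate the arithmetic.
a, b = fa_to_irt(loading=0.5, threshold=-0.25)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative threshold gives a negative difficulty, that is, a test that a trainee at the mean of the latent trait is more likely to pass than to fail.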
Is the (2PN) IRT model adequate for the A.T.T. criterion score data?
a. Validating Assumptions
• Independent cluster structure
• Postulated dimensions represent the complete latent space of the construct
b. Properties of ICCs
Probability of passing score should be monotonically increasing with person parameter
c. Model Predictions vs Observed Data
d. Other Psychometric Information
• Test Information Functions
• Distribution Observed Scores, True Scores and Person Parameter Values
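Property (b) can be checked numerically. The sketch below evaluates a two-parameter normal-ogive ICC, P(pass | θ) = Φ(a(θ − b)), on a grid of person-parameter values and verifies that the probabilities never decrease. The item parameters, the grid limits, and the function names are illustrative assumptions, not values or procedures taken from the study.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def icc(theta, a, b):
    """2PN item characteristic curve: probability of passing given theta."""
    return _STD_NORMAL.cdf(a * (theta - b))


def is_monotone_increasing(a, b, lo=-4.0, hi=4.0, steps=161):
    """Check that the ICC never decreases over an evenly spaced theta grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    probs = [icc(theta, a, b) for theta in grid]
    return all(later >= earlier for earlier, later in zip(probs, probs[1:]))


# Hypothetical item: positive discrimination, fairly easy test.
print(is_monotone_increasing(a=0.8, b=-1.0))
# A negative discrimination would violate the property.
print(is_monotone_increasing(a=-0.8, b=-1.0))
```

Under this model the check reduces to the sign of the discrimination parameter, so in practice it flags tests whose estimated loading is negative.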
Item Characteristic Curves (ICCs) for dichotomized criterion measures K4 and K5 based on meta-analytically derived estimates (N = 10,000)
[Figure: histogram of standardized residuals for the TS test. X-axis: standardized residuals (-0.7 to 0.8); y-axis: frequency (0 to 50).]
Distribution of Standardized Residuals for the TS test based on seven dichotomized common A.T.T. criterion scores in a Stratified Sample (N = 1,000) with meta-analytically derived estimates of 2PN IRT item parameters
[Figure: histogram of standardized residuals for the TK test. X-axis: standardized residuals (-0.7 to 0.8); y-axis: frequency (0 to 60).]
Distribution of Standardized Residuals for the TK test based on eight dichotomized common A.T.T. criterion scores in a Stratified Sample (N = 1,000) with meta-analytically derived estimates of 2PN IRT item parameters
Test Information Function for Technical Skill Learning (Navy rating AE, N = 500)
Test Information Function for Technical Concepts Learning (Navy rating AE, N = 500)
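As a reminder of what a test information function summarizes, the sketch below computes Fisher information for dichotomous normal-ogive items, I(θ) = [a·φ(z)]² / (Φ(z)(1 − Φ(z))) with z = a(θ − b), and sums it over the items of a test. The item parameters are hypothetical and the function names are introduced here for illustration; the slides do not show how the plotted functions were computed.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def item_information(theta, a, b):
    """Fisher information of a dichotomous 2PN item at ability theta."""
    z = a * (theta - b)
    p = _STD_NORMAL.cdf(z)
    slope = a * _STD_NORMAL.pdf(z)  # derivative of the ICC with respect to theta
    return slope ** 2 / (p * (1.0 - p))


def total_information(theta, items):
    """Sum item information over (a, b) pairs; assumes local independence."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical (a, b) pairs standing in for the criterion tests of one scale.
items = [(0.6, -1.2), (0.8, -0.5), (1.1, 0.0)]
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:+.1f}  information = {total_information(theta, items):.3f}")
```

Information peaks near the difficulties of the most discriminating items, so a set of easy tests measures low-aptitude trainees more precisely than high-aptitude ones.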
Score distributions for the skill test consisting of seven ATT common modules criterion scores. Person parameters were transformed to the same metric as the observed and true score estimates (Data: Stratified Sample, N = 10,000)
Score distributions for the knowledge test consisting of eight ATT common modules criterion scores. Person parameters were transformed to the same metric as the observed and true score estimates (Data: Stratified Sample, N = 10,000)
Conclusions:
1. The A.T.T. criterion performance data could be modeled as a two-dimensional independent cluster structure.
2. The latent variables underlying these clusters could be shown to be orthogonal to module-specific training effects.
3. The person parameter (theta) was found to be normally distributed (most clearly for the technical skill parameter).
Why is this important? The advantage of deriving relatively “pure” measures directly from training data is obvious:
o The logical distance between construct and behavior is extremely small and the relevance of the theoretical construct for the criterion is indisputable
o It provides a best possible criterion to evaluate existing tests (see: Glaze & Ippel, 2011)
o It provides a best possible criterion for the development of new tests
THANK YOU!
[Figure: bar chart of explained variance (0.00 to 0.70) per post-test of the common ATT modules (S1–S8, K1–K8), Navy Rating AE; series: CFA, M-aptitude, M-module.]
[Figure: probability of success (0 to 1) plotted against the latent dimension, with the test location bi marked on the horizontal axis.]
P(Xij = 1 | θj) = f(θj – bi)
In IRT models, the probability of success is a function of the distance between the location of the test (b) and the location of the examinee (θ) on the latent dimension.