Large-scale testing: Uses and abuses
Richard P. Phelps
Universidad Finis Terrae, Santiago, Chile
January 7, 2014
1. 3 types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected
1. Three types of large-scale tests
Achievement, Aptitude, Non-cognitive
Achievement tests
Historically, achievement tests were larger versions of classroom tests
~ 1900 - “scientific” achievement tests developed (Germany & USA)
SOURCE: Phelps, Standardized Testing Primer, 2007
J.M. Rice - systematically analyzed test structures & effects
E.L. Thorndike - developed scoring scales
Achievement tests
Purpose: to measure how much you know and can recall
Developed using: content coverage analysis
How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades)
Requires mastery of content prior to the test.
Fairness assumes that all have the same opportunity to learn the content
Coachable – specific content is known in advance
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
1890s – A. Binet & T. Simon (France)
- Pre-school children with mental disabilities
- Achievement test not possible
- Developed a content-free test of mental abilities (association, attention, memory, motor skills, reasoning)
1917 – Aptitude testing adapted by the U.S. Army to select and assign soldiers in World War I
1930s – Harvard University president J. Conant
- Wanted a new admission test to identify students from lower social classes with the potential to succeed at Harvard
- Adopted the Scholastic Aptitude Test (SAT) for this purpose
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
Purpose: to predict how much can be learned
Developed using: skills/job analysis
How validated: predictive validity, correlation with future activity (e.g., university or job evaluations)
Content-independent. Measures:
- what the student does with the content provided
- how the student applies skills & abilities developed over a lifetime
Not easily coachable – the content is:
- not known in advance,
- basic, broad, commonly known by all, and curriculum-free, or
- less dependent on the quality of schools
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
Aptitude tests can identify:
- Students bored in school who study what interests them on their own
- Students not well adapted to high school, but well adapted to university
- Students of high ability stuck in poor schools
SOURCE: Phelps, Standardized Testing Primer, 2007
Comparing Achievement & Aptitude tests
              Achievement        Aptitude
Measure       past learning      potential
Development   content analysis   job/skills analysis
Validation    retrospective      predictive
Content       dependent          independent
Coachable?    very much          not much
Non-cognitive tests
More recently developed – measure values, attitudes, preferences
Types: integrity tests, career exploration, matchmaking, employment “fit”
Non-cognitive tests
Purpose: to identify “fit” with others or a situation
Developed using: surveys, personal interviews
How validated: success rate in future activities
Content is personal, not learned
“Faking” can be an issue (e.g., “honesty” tests)
Comparing Achievement, Aptitude, & Non-Cognitive Tests
              Achievement        Aptitude              Non-Cognitive
Measure       past learning      potential             attitudes, values, preferences
Development   content analysis   job/skills analysis   surveys
Validation    retrospective      predictive            predictive
Content       dependent          independent           independent
Coachable?    very much          very little           can be faked
2. Measuring test quality
3 measures are important:
1. Predictive validity
2. Content coverage
3. Sub-group differences
Test reports can be “data dumps”
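The slides list sub-group differences as a quality measure without saying how they are quantified. One common summary, assumed here rather than taken from the source, is the standardized mean difference (Cohen's d): the gap between two groups' mean scores divided by their pooled standard deviation. A minimal sketch; the function name and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: gap between group means divided by the pooled standard deviation.

    Hypothetical helper for illustration; needs at least two scores per group
    and some spread in the scores (otherwise the pooled SD is zero).
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up scores for two hypothetical sub-groups of test takers.
group_a_scores = [500, 520, 540, 560, 580]
group_b_scores = [480, 500, 520, 540, 560]
d = standardized_mean_difference(group_a_scores, group_b_scores)
print(f"d = {d:.2f}")  # prints d = 0.63
```

A positive d means the first group scores higher on average; expressing the gap in standard-deviation units lets differences be compared across tests with different score scales.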
Predictive validity (values from -1.0 to +1.0)
…measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion)
A test with low predictive validity provides little information.
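The -1.0 to +1.0 range is that of a correlation coefficient computed between admission-test scores and a later outcome for the same students. A minimal sketch, assuming the Pearson coefficient; `pearson_r`, `admission_scores`, and `first_year_gpa` are illustrative names, and the five students' numbers are invented.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired observations x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations: positive when above-average scores pair with above-average outcomes.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one measure does not vary")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: admission-test scores and the same students' first-year GPA.
admission_scores = [450, 520, 580, 610, 700]
first_year_gpa = [2.1, 2.8, 2.6, 3.3, 3.5]
print(f"predictive validity r = {pearson_r(admission_scores, first_year_gpa):.2f}")
```

The toy data are deliberately tidy, so the resulting r is much higher than one should expect from a real admission test; with real cohorts, a value near 0 would mean the test says little about later university performance.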
Figure: A positive correlation between two measures (Source: NIST, Engineering Statistics Handbook)
Figure: A negative correlation between two measures (Source: NIST, Engineering Statistics Handbook)
Figure: No correlation between two measures (Source: NIST, Engineering Statistics Handbook)
How does one measure predictive capacity?
Correlation coefficient: a scale running from -1 through 0 to +1
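For reference, the usual (Pearson) correlation coefficient for n paired observations (x_i, y_i), for example a test score and a later university grade, is written out below; the choice of Pearson's coefficient and the notation are assumptions, not taken from the slides. By the Cauchy–Schwarz inequality the numerator never exceeds the denominator in absolute value, which is why r always lies between -1 and +1, reaching an endpoint only when one measure is an exact linear function of the other.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r_{xy} \le 1
```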
Figure: Predictive validities: SAT and PSU (bar chart; series: SAT, PSU 2010; categories: Language, Mathematics, SAT Writing, PSU Social Science; vertical axis 0 to 0.6)
SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
Figure: Predictive validities: SAT and PSU (faculty: Administración [Administration]) (bar chart; series: SAT, PSU Administración; vertical axis 0 to 0.6)
SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013