Introduction to standard setting (cutscores)

SETTING CUTSCORES FOR CERTIFICATION EXAMS

Nathan A. Thompson, Ph.D.

Vice President, ASC

Adjunct Faculty, University of Cincinnati

Why are cutscores necessary?

As Glaser (1963) pointed out, the reason for the existence of many tests is to make decisions about people:

Mastery: Pass/Fail educational content

Credentialing: Award/not professional credential (certification, certificate, license)

Pre-employment: Hire/not for job (or eligible as candidate)

University selection: Admission/not to university or program

From Livingston (1980), discussing the rationale for cutscores:

Why are cutscores necessary?

What does that mean? Most of what we want to measure lies on a continuum (knowledge, intelligence) and does not fall naturally into “states” (e.g., male/female)

So we need to set a cutscore (or cutscores) on the continuum to sort examinees into groups that reflect interpretations and meanings that are useful to us

Pass is “qualified” and Fail is “unqualified”

How do we set a cutscore?

As the Livingston excerpt notes, all cutscores involve a level of subjectivity or arbitrariness

The higher the stakes of the exam, the more we need to reduce the arbitrariness

Standard setting methods differ in their level of objectivity

A more objective method provides an anchor to validity and defensibility

How do we set a cutscore?

Approach                 Example                                     Arbitrariness
Arbitrary round number   70% of items                                MOST
Quota                    Whatever passes 85% of people (z = -1.0)    MOST
Examinee-based           Borderline, Contrasting Groups              LEAST
Content-based            Angoff, Bookmark                            LEAST
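To make the Quota row concrete, here is a minimal Python sketch of the two ways that row describes it: picking the cutscore that passes a target share of examinees, and placing it at a fixed z-score. The function names and the sample scores are illustrative assumptions, not part of the original slides.

```python
from statistics import mean, pstdev


def quota_cutscore(scores, pass_rate):
    """Highest observed score at which at least `pass_rate` of examinees pass.

    An examinee passes when their score is at or above the cutscore.
    """
    n = len(scores)
    # Try candidate cutscores from hardest to easiest.
    for cut in sorted(set(scores), reverse=True):
        passing = sum(1 for s in scores if s >= cut)
        if passing / n >= pass_rate:
            return cut
    raise ValueError("scores must not be empty")


def z_cutscore(scores, z=-1.0):
    """Cutscore placed z standard deviations from the mean of the scores."""
    return mean(scores) + z * pstdev(scores)


# Hypothetical scores for 20 examinees.
scores = [52, 55, 58, 60, 61, 63, 64, 66, 67, 69,
          70, 71, 72, 74, 75, 77, 79, 81, 84, 88]
print(quota_cutscore(scores, 0.85))
print(round(z_cutscore(scores, -1.0), 1))
```

Both versions depend only on who happened to take the test, which is why the table rates this approach as highly arbitrary.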

Examinee-based methods

Borderline Method

Experts familiar with the content AND with all examinees identify the examinees they consider “borderline”

The mean or median score of those examinees is the cutscore
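The Borderline calculation can be sketched in a few lines of Python. The examinee IDs, scores, and the function name are hypothetical; the only logic taken from the slide is "mean or median of the borderline examinees' scores".

```python
from statistics import mean, median


def borderline_cutscore(scores, borderline_ids, use_median=False):
    """Cutscore from the scores of examinees that experts flagged as borderline.

    scores: dict mapping examinee ID to test score.
    borderline_ids: IDs the experts judged to be borderline.
    """
    borderline_scores = [scores[i] for i in borderline_ids]
    if not borderline_scores:
        raise ValueError("at least one borderline examinee is required")
    return median(borderline_scores) if use_median else mean(borderline_scores)


# Hypothetical data: six examinees, three flagged as borderline.
scores = {"A": 61, "B": 74, "C": 66, "D": 58, "E": 70, "F": 83}
flagged = ["A", "C", "E"]
print(borderline_cutscore(scores, flagged))                   # mean
print(borderline_cutscore(scores, flagged, use_median=True))  # median
```

The median is less sensitive to one unusually high or low borderline examinee, which may matter when the borderline group is small.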

Contrasting Groups Method

Experts familiar with the content AND with all examinees sort examinees into Pass and Fail groups (or an external criterion is used)

The point where the two score distributions cross is the cutscore
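One way to locate the crossing point is sketched below in Python. It rests on an assumption the slides do not state: each group's scores are approximated by a normal distribution, and the crossing is found by bisection between the two group means. Other choices (smoothed histograms, logistic regression) are equally plausible; the function name and data are hypothetical.

```python
from statistics import NormalDist


def contrasting_groups_cutscore(fail_scores, pass_scores, tol=1e-9):
    """Score at which normal curves fitted to the two groups intersect."""
    fail_dist = NormalDist.from_samples(fail_scores)
    pass_dist = NormalDist.from_samples(pass_scores)
    lo, hi = fail_dist.mean, pass_dist.mean
    if lo >= hi:
        raise ValueError("Pass group must have the higher mean score")

    def diff(x):
        # Negative where the Fail curve is higher, positive where Pass is higher.
        return pass_dist.pdf(x) - fail_dist.pdf(x)

    if diff(lo) >= 0 or diff(hi) <= 0:
        raise ValueError("Curves do not cross between the group means")

    # Bisection: shrink the interval around the sign change.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical expert-sorted groups.
fail_group = [40, 45, 50, 55, 60]
pass_group = [60, 65, 70, 75, 80]
print(round(contrasting_groups_cutscore(fail_group, pass_group), 2))
```

With equal spread in both groups the crossing falls midway between the group means; unequal spreads shift it toward the tighter group.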

Examinee-based methods

Are conceptually appealing but have two large disadvantages:

Require examinees to take the test first, so pass/fail decisions cannot be made immediately after they finish the test

Require a way to assign examinees to groups WITHOUT test scores: either experts who are familiar with all examinees or some sort of “gold standard”

Example: For a practice test, results on the real test can be used as a gold standard to set the cutscore

Content-based methods

The Angoff and Bookmark methods require experts to look at items rather than candidates

Bookmark: pilot all items, analyze difficulty statistics, order the items by difficulty in a booklet, and ask experts to place a bookmark

Angoff: All experts provide a rating from 0 to 100 for each item; the average serves as the cutscore
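The Angoff arithmetic can be sketched in Python as follows. The ratings matrix (three experts, four items) and the function name are made up for illustration; the sketch averages each item across experts and then averages the items, which gives a percent-correct cutscore, and also converts it to a raw-score cutscore.

```python
from statistics import mean


def angoff_cutscore(ratings):
    """Return (percent cutscore, raw cutscore) from expert ratings.

    ratings: list of rows, one per expert, each holding a 0-100 rating per item.
    """
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every expert must rate every item")
    # Average rating per item across experts.
    item_means = [mean(row[j] for row in ratings) for j in range(n_items)]
    percent_cut = mean(item_means)      # cutscore as percent correct
    raw_cut = sum(item_means) / 100     # cutscore in number of items
    return percent_cut, raw_cut


# Hypothetical ratings: 3 experts x 4 items.
ratings = [
    [80, 60, 70, 90],
    [70, 50, 80, 90],
    [90, 70, 60, 90],
]
percent_cut, raw_cut = angoff_cutscore(ratings)
print(percent_cut, raw_cut)
```

Keeping the per-item means around is useful in practice, since items where experts disagree sharply are the natural ones to discuss before finalizing the cutscore.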

Content-based methods

The Angoff method is the most commonly used approach in certification testing and therefore quite legally defensible

Biggest advantage: does not require test to be administered for data

Can use data too, with Beuk Compromise, to incorporate examinee-based aspects

The drawback is that it requires a group of subject matter experts to rate all items, which can take time

Content-based methods

The Bookmark method has the advantage that a rating is not required for every item from every expert (which takes a lot of time)

The drawback is that it requires all items to be delivered to a decent-sized sample in order to obtain item difficulty statistics (might not be feasible)
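The data step behind the Bookmark method can be sketched in Python: compute each item's proportion correct from pilot responses, order the items from easiest to hardest, and summarize the experts' bookmark placements. This is a simplified illustration under stated assumptions: it uses classical proportion-correct difficulty, and it treats the median number of items placed before the bookmark as a raw cutscore. Operational Bookmark studies may convert bookmarks to a cutscore differently. All names and data are hypothetical.

```python
from statistics import median


def order_booklet(responses):
    """Order item indices from easiest to hardest by proportion correct.

    responses: list of rows, one per examinee, with 1 = correct and 0 = incorrect.
    Returns (ordered item indices, proportion correct per item).
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    p_values = [sum(row[j] for row in responses) / n_examinees
                for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: p_values[j], reverse=True)
    return order, p_values


def bookmark_cutscore(items_before_bookmark):
    """Median count of items each expert placed before the bookmark (assumed rule)."""
    return median(items_before_bookmark)


# Hypothetical pilot data: 4 examinees x 3 items.
responses = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
order, p_values = order_booklet(responses)
print(order, p_values)
print(bookmark_cutscore([2, 2, 1]))  # three experts' placements
```

The sketch also shows why the pilot sample matters: with few examinees the proportions are unstable, and the item order the experts see can change from one sample to the next.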
