MAKING APPROPRIATE PASS-FAIL DECISIONS

Dwight Harley, Ph.D.
DIVISION OF STUDIES IN MEDICAL EDUCATION
UNIVERSITY OF ALBERTA

PASSING SCORES

Essential component of high-stakes exams
Reaffirm standards
Their purpose is to ensure that:
  qualified candidates pass
  unqualified candidates do not pass

How much is enough? Is 50% the passing score on this exam?

REAFFIRMING STANDARDS

Performance standard
  The minimally adequate level of performance required to enter practice
Passing score
  The point on the score scale that separates those who are successful from those who are not

THE BASIS FOR PASSING SCORES

Arbitrary judgment is unavoidable
Reflect the consensus of experts on reasonable expectations for evidence of competence
Impose discrete categories on a continuum
Set to serve the interests of the public and the profession
Process should be as open as possible
Based on as much relevant data as possible
Rationale presented as clearly as possible

PROCESS OF SETTING PASSING SCORES

Unreasonable to expect 100% correct
Possible to construct tests with predetermined passing scores
Possible to adjust passing scores to achieve an acceptable pass rate
Possible to estimate a minimum passing score by combining estimates of the importance of individual test items

PASSING SCORE LEVEL

Determined by the situation and purpose
Provide society with enough sufficiently competent practitioners
Raising the passing score increases the average competence of those who pass but decreases their number
Proportions passing should remain constant
The more relevant and demanding the requirements for writing the test, the fewer candidates are expected to fail
If more than a small proportion of successful candidates fail the exam, its validity may be subject to serious challenge
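
To make the trade-off in the third point concrete, here is a small Python sketch. The scores are invented, and the mean score of those who pass is used as a rough stand-in for their "average competence"; that proxy is an assumption of this example, not something the slide defines.

```python
from statistics import mean

def pass_summary(scores, cut):
    """Return (number who pass, mean score of those who pass) at a given cut score."""
    passing = [s for s in scores if s >= cut]
    return len(passing), (mean(passing) if passing else None)

# Invented percentage scores for ten candidates (illustration only).
scores = [48, 52, 55, 58, 60, 63, 67, 70, 74, 81]

for cut in (50, 60, 70):
    n_pass, avg = pass_summary(scores, cut)
    print(f"cut={cut}: {n_pass} pass, mean score of those passing={avg:.1f}")
```

As the cut rises from 50 to 70, the number passing falls from 9 to 3 while the mean score of those who pass rises.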

CRITERIA FOR DEFENSIBILITY

A standard setting method should:
  produce appropriate classification information
  be sensitive to candidate performance
  be sensitive to instruction
  be statistically sound
  identify the “true” standard
  be easy to implement and compute
  be credible and easily interpretable by lay people

More than three dozen methods exist
Some of the better-known methods include:
  Nedelsky
  Angoff
  Bookmark
  Ebel
  Jaeger
  IRT methods

STANDARD SETTING METHODS

“THE INDUSTRY STANDARD”

The Angoff method is:
  the most commonly used method
  convenient to use
  well researched
  easily explained
  easily customized
  applicable to several response formats

ANGOFF METHOD

Judges assign probabilities that a hypothetical minimally competent, borderline candidate will be able to answer each item correctly.

For each judge, the probabilities are summed to get a minimum performance level (MPL).

The MPLs are averaged to get the final passing score.
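
The three steps above translate directly into code. The following Python function is a minimal sketch; its name and input layout are assumptions of this example. Each judge supplies one probability per item, each judge's probabilities are summed into an MPL, and the MPLs are averaged.

```python
from statistics import mean

def angoff_passing_score(ratings_by_judge):
    """Compute per-judge MPLs and the Angoff passing score.

    ratings_by_judge: one list per judge; each entry is that judge's probability
    (0.0 to 1.0) that a minimally competent, borderline candidate answers the
    corresponding item correctly.
    """
    if not ratings_by_judge:
        raise ValueError("at least one judge is required")
    n_items = len(ratings_by_judge[0])
    for ratings in ratings_by_judge:
        if len(ratings) != n_items:
            raise ValueError("every judge must rate every item")
        if any(not 0.0 <= p <= 1.0 for p in ratings):
            raise ValueError("Angoff ratings must be probabilities between 0 and 1")
    mpls = [sum(ratings) for ratings in ratings_by_judge]  # one MPL per judge
    return mpls, mean(mpls)                                # passing score = mean MPL

# Hypothetical ratings from three judges on a three-item test.
mpls, passing_score = angoff_passing_score([
    [0.90, 0.60, 0.50],
    [0.80, 0.70, 0.40],
    [0.85, 0.55, 0.45],
])
print([round(m, 2) for m in mpls], round(passing_score, 2))
```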

MINIMALLY COMPETENT

The effectiveness of the Angoff method rests on the judges’ ability to accurately conceptualize a “minimally competent, borderline candidate.”

Repeated reference to a formal summary of the behaviours and performance indicators is required.

Judge training and calibration are essential.

ANGOFF CALCULATIONS

Item    Judge 1   Judge 2
1       1.00      0.85
2       0.65      0.50
3       0.80      0.75
4       0.45      0.50
5       0.30      0.40
MPL     3.2       3.0

Passing score for this test is 3.1 items correct out of 5.
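
The arithmetic behind the 3.1 figure, written out from the table values:

```latex
\begin{aligned}
\mathrm{MPL}_1 &= 1.00 + 0.65 + 0.80 + 0.45 + 0.30 = 3.20 \\
\mathrm{MPL}_2 &= 0.85 + 0.50 + 0.75 + 0.50 + 0.40 = 3.00 \\
\text{Passing score} &= \frac{\mathrm{MPL}_1 + \mathrm{MPL}_2}{2} = \frac{3.20 + 3.00}{2} = 3.10
\end{aligned}
```

On this five-item test, 3.10 items corresponds to 3.10/5 = 62% correct.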

A MINOR VARIANT

Judges are asked to imagine a pool of 100 minimally competent, borderline students and then estimate the number of these students who would answer the item correctly.

Reduces the cognitive complexity of the task
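
The variant changes only how judges express their estimates, not the arithmetic. A minimal Python sketch, assuming counts out of a pool of 100 (the function name is hypothetical): dividing each count by the pool size recovers the probabilities used in the standard method, so a judge's MPL is simply the summed counts divided by 100.

```python
def counts_to_probabilities(counts, pool_size=100):
    """Convert 'how many of the pool would answer correctly' into probabilities."""
    if any(not 0 <= c <= pool_size for c in counts):
        raise ValueError("each count must lie between 0 and the pool size")
    return [c / pool_size for c in counts]

# Judge 2's ratings from the Angoff calculations table, restated as counts out of 100.
judge_2_counts = [85, 50, 75, 50, 40]
probabilities = counts_to_probabilities(judge_2_counts)
mpl = sum(judge_2_counts) / 100   # same MPL as summing the probabilities
print(probabilities, mpl)
```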

VARIATIONS ON A THEME

Scales
Iterative process
Feedback between rounds
  Judges’ results
  Past item performance
    p-values
    % passing
Yes/No procedure

SCALES

Probability scales are sometimes provided to simplify the process. For example:

  5%, 20%, 40%, 60%, 75%, 90%, 95%
  0%, 5%, 10%, 15% … 95%, 100%
  20%, 25%, 30% … 95%, 100%
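
If ratings are collected electronically, one way to enforce a provided scale is to snap each raw estimate to the nearest allowed point. This Python sketch is an assumption of this example rather than part of the described procedure; it simply encodes the three example scales as proportions.

```python
# The three example scales above, expressed as proportions.
SCALE_A = [0.05, 0.20, 0.40, 0.60, 0.75, 0.90, 0.95]
SCALE_B = [i / 100 for i in range(0, 101, 5)]    # 0%, 5%, ..., 100%
SCALE_C = [i / 100 for i in range(20, 101, 5)]   # 20%, 25%, ..., 100%

def snap_to_scale(rating, scale):
    """Return the scale point closest to a judge's raw probability estimate."""
    return min(scale, key=lambda point: abs(point - rating))

print(snap_to_scale(0.33, SCALE_A))
print(snap_to_scale(0.10, SCALE_C))
```

Note that the third scale has no points below 20%, so any lower estimate is pulled up to 0.20.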

ANGOFF WITH ITERATION

Most commonly used modification
“Angoff-ing” is done a number of times.
Time between rounds is used for discussion among judges.
Intent is to reduce variability among judges on item estimates.
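
One way to check whether discussion is doing its job is to compare the spread of judges' ratings on each item from one round to the next. A minimal Python sketch with invented ratings; using the population standard deviation as the measure of variability is an assumption of this example.

```python
from statistics import pstdev

def item_spread(ratings_by_judge):
    """Population standard deviation of the judges' ratings for each item."""
    # zip(*...) regroups the per-judge lists into per-item tuples.
    return [pstdev(item_ratings) for item_ratings in zip(*ratings_by_judge)]

# Invented ratings from three judges on three items, before and after discussion.
round_1 = [[0.90, 0.40, 0.70], [0.60, 0.80, 0.50], [0.75, 0.55, 0.90]]
round_2 = [[0.80, 0.55, 0.70], [0.70, 0.65, 0.60], [0.75, 0.60, 0.75]]

for item, (before, after) in enumerate(zip(item_spread(round_1), item_spread(round_2)), start=1):
    print(f"item {item}: SD {before:.3f} -> {after:.3f}")
```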

NORMATIVE DATA

Normative or impact data is presented just prior to the final iteration.
Improves inter-rater reliability
Greatest impact on items that have been greatly over- or underestimated
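
The last point can be made concrete by comparing judges' mean item ratings with observed p-values and flagging large gaps. In this Python sketch the mean ratings come from the Angoff calculations table, while the p-values and the 0.25 threshold are invented for illustration. Because p-values describe all examinees rather than borderline candidates only, some gap is expected; the threshold is a judgment call.

```python
def flag_discrepancies(mean_ratings, p_values, threshold=0.25):
    """Return 1-based item numbers whose mean rating differs from the p-value by more than threshold."""
    return [
        item
        for item, (rating, p) in enumerate(zip(mean_ratings, p_values), start=1)
        if abs(rating - p) > threshold
    ]

# Mean of Judge 1 and Judge 2 for each item in the Angoff calculations table.
mean_ratings = [0.925, 0.575, 0.775, 0.475, 0.35]
# Hypothetical observed p-values (proportion of all candidates answering correctly).
p_values = [0.95, 0.90, 0.80, 0.40, 0.70]

print(flag_discrepancies(mean_ratings, p_values))
```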

YES/NO PROCEDURE

Judges decide whether a single minimally competent, borderline student would or would not answer the item correctly.

An attempt to simplify the cognitive complexity of the judges’ task

Produces results comparable to the traditional method

YES/NO CALCULATIONS

Item    Judge 1   Judge 2
1       1         1
2       1         0
3       1         1
4       0         0
5       0         0
MPL     3         2

Passing score = average of MPLs = (3 + 2)/2 = 2.5 items correct
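
The Yes/No arithmetic is the Angoff calculation restricted to 0/1 judgments. A minimal Python sketch (the function name is hypothetical) reproducing the table:

```python
from statistics import mean

def yes_no_passing_score(judgments_by_judge):
    """Per-judge MPLs and passing score for the Yes/No procedure.

    Each judgment is 1 if the judge thinks a single minimally competent,
    borderline student would answer the item correctly, else 0.
    """
    for judgments in judgments_by_judge:
        if any(j not in (0, 1) for j in judgments):
            raise ValueError("Yes/No judgments must be 0 or 1")
    mpls = [sum(judgments) for judgments in judgments_by_judge]  # items judged "yes"
    return mpls, mean(mpls)

# Judgments from the Yes/No calculations table (items 1-5).
judge_1 = [1, 1, 1, 0, 0]
judge_2 = [1, 0, 1, 0, 0]
mpls, passing_score = yes_no_passing_score([judge_1, judge_2])
print(mpls, passing_score)
```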

IN AN EMERGENCY

When a committee is not available, Angoff-ing can be done solo:
  Assign Angoff values to each item and sum the values
  Ask a colleague to review your Angoff assignments
  Use an item analysis as a reality check
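
For the item-analysis reality check, the basic statistic is each item's p-value (the proportion of candidates answering it correctly). This Python sketch computes p-values from a scored response matrix and lists items whose solo Angoff value exceeds the observed p-value. All data are invented, and treating "rating above p-value" as a warning sign is an assumption of this example: a borderline candidate would usually be expected to do no better than the candidate group as a whole.

```python
def item_p_values(responses):
    """Proportion of candidates answering each item correctly (1 = correct, 0 = incorrect)."""
    n_candidates = len(responses)
    return [sum(item_scores) / n_candidates for item_scores in zip(*responses)]

solo_ratings = [0.90, 0.60, 0.70, 0.40]  # hypothetical Angoff values from one rater
responses = [                            # hypothetical scored responses, one row per candidate
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

p_values = item_p_values(responses)
solo_mpl = sum(solo_ratings)             # the solo passing score
to_review = [item for item, (rating, p) in enumerate(zip(solo_ratings, p_values), start=1)
             if rating > p]
print(p_values, round(solo_mpl, 2), to_review)
```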

ROUNDING PASSING SCORES

Derived passing scores rarely come out as exact whole numbers.
Rounding may have an impact on the pass/fail rate.
Consider the consequences of rounding.
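
A small Python sketch shows why the consequences matter. With whole-number scores, an unrounded cut of 3.1 (the Angoff calculations example) acts like a cut of 4, so rounding down or to the nearest whole number effectively lowers the standard. The candidate scores are invented.

```python
import math

def pass_rate(scores, cut):
    """Proportion of candidates scoring at or above the cut."""
    return sum(score >= cut for score in scores) / len(scores)

raw_cut = 3.1                                   # passing score from the Angoff calculations example
scores = [2, 3, 3, 3, 4, 4, 5, 3, 2, 4]         # hypothetical number-correct scores out of 5

for label, cut in [("unrounded", raw_cut),
                   ("round down", math.floor(raw_cut)),
                   ("round to nearest", round(raw_cut)),
                   ("round up", math.ceil(raw_cut))]:
    print(f"{label:>16}: cut = {cut}, pass rate = {pass_rate(scores, cut):.0%}")
```

Here the pass rate is 40% when the cut is left unrounded or rounded up, and 80% when it is rounded down or to the nearest whole number.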

Questions?