Michigan Assessment Consortium Common Assessment Development Series: Rubrics and Scoring Guides


Developed by…

Bruce R. Fay, PhD

Assessment Consultant

Wayne RESA

Support

The Michigan Assessment Consortium professional development series in common assessment development is funded in part by the Michigan Association of Intermediate School Administrators in cooperation with the Michigan Department of Education, Michigan State University, Ingham and Ionia ISDs, Oakland Schools, and Wayne RESA.

What you will learn

Why and when you need rubrics
Different kinds of rubrics
How to develop a rubric
How to use a rubric
What scoring guides are
How to use scoring guides

Subjectivity in Scoring

No such thing

If it's truly subjective, it's just someone's opinion, and it is of little or no value to the person being assessed

The problem is Bias

There are sources of bias in all assessment methods

Some are common to all methods
Others are unique to each method
All of them must be minimized in order for assessments with scored items to be fair

So, what is a rubric?

"…guidelines, rules, or principles by which student responses, products, or performances are judged. They describe what to look for in student performances or products to judge quality." (p. 4)

Scoring Rubrics in the Classroom

Judith Arter and Jay McTighe

Corwin Press

Assessment methods that require a rubric / scoring guide

Written response
Performance/observation
Interactive/conversation
Portfolio

Where Do Rubrics Fit?

Classroom or large-scale assessment
Free-response written methods
Short- and extended-response items
Performance observation (somewhat)
Across assessment targets, especially complex, hard-to-define ones such as problem-solving, writing, and group processes
Across content areas and grade levels

Assessment Myth #1

True or false? Selected-response tests are more objective than free-response tests.

Answer? False! The only thing truly "objective" about selected-response tests is the scoring.

Assessment Myth #2

True or false? Scoring (or grading) of free-response items is inherently subjective.

Answer? False! On two counts:

1) Scoring and grading are not the same thing

2) "Subjective scoring" isn't scoring at all; it's just your opinion

Assessment of Open-ended Work

Checklists
Performance lists
Scoring rubrics
Scoring guides

Checklists

Simple criteria for simple tasks
Checklist format
Assess presence or absence only
No judgment of quality

Performance Lists

More sophisticated than checklists
Criterion-based
Product, task, or performance broken down into relatively simple, discrete pieces
Each piece scored on a scale
Scale for each piece can be different
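The following minimal Python sketch contrasts the two methods above. A checklist records only presence or absence. A performance list scores each discrete piece on its own scale. The lab-report criteria, piece names, and scale maximums are hypothetical examples and do not come from the source.

```python
from dataclasses import dataclass


def score_checklist(criteria: list[str], observed: set[str]) -> dict[str, bool]:
    """Checklist: record only whether each criterion is present (no judgment of quality)."""
    return {criterion: criterion in observed for criterion in criteria}


@dataclass
class Piece:
    """One discrete piece of a performance list, scored on its own 0..max_points scale."""
    name: str
    max_points: int


def score_performance_list(pieces: list[Piece], ratings: dict[str, int]) -> dict[str, str]:
    """Performance list: each piece gets a score on its own (possibly different) scale."""
    results = {}
    for piece in pieces:
        points = ratings[piece.name]
        if not 0 <= points <= piece.max_points:
            raise ValueError(f"{piece.name}: {points} is outside 0..{piece.max_points}")
        results[piece.name] = f"{points}/{piece.max_points}"
    return results


# Hypothetical lab-report example.
print(score_checklist(["title", "hypothesis", "data table"], {"title", "data table"}))
# {'title': True, 'hypothesis': False, 'data table': True}

pieces = [Piece("hypothesis", 3), Piece("procedure", 5), Piece("conclusion", 4)]
print(score_performance_list(pieces, {"hypothesis": 2, "procedure": 4, "conclusion": 3}))
# {'hypothesis': '2/3', 'procedure': '4/5', 'conclusion': '3/4'}
```

Because each piece carries its own maximum, the scale can differ from piece to piece. This matches the last point on the Performance Lists slide.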

Scoring Rubrics & Guides

Judgments regarding complex tasks
Written criteria
Score points defined/described
Represents the essence of quality work
Reflects the best thinking in the field
Scoring guides often include exemplars in the form of annotated anchor papers

Benefits of Rubrics for Teaching, Learning, and Assessing

Focuses instruction/assessment
Clarifies instructional goals
Clarifies assessment targets
Defines quality work
Integrates assessment and instruction
Develops shared/consistent vocabulary and understanding
Provides consistency in scoring across students by a scorer, across multiple scorers, and across time

Rubric Basics

Types
Uses
Scoring ranges

Types of Rubrics

                  Holistic    Trait-Analytic
Generic               A              B
Task-specific         C              D

Uses of Rubrics

Holistic Rubrics

Strengths
Provide a quick, overall rating of quality
Judge the "impact" of a product or performance
Use for summative or large-scale assessment

Limitations
May lack the diagnostic detail needed to plan instruction or to let students see how to improve
Students may get the same score for vastly different reasons

Trait-Analytic Rubrics

Strengths
Judge aspects of complex work independently
Provide detailed/diagnostic data by trait that can better inform instruction and learning

Limitations
More time-consuming to learn and apply
May result in lower inter-rater agreement when multiple scorers are used (without appropriate procedures)
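To illustrate the diagnostic point above, here is a small Python sketch with two students' trait scores. The trait names, the 1-4 scale, and the scores are all hypothetical. The sketch sums the trait scores only as a stand-in for "one overall number". A real holistic rubric is applied as a single judgment, not computed from traits.

```python
# Hypothetical traits and scores on a 1-4 scale; not from the source.
TRAITS = ("ideas", "organization", "conventions")

scores = {
    "Student A": {"ideas": 4, "organization": 1, "conventions": 3},
    "Student B": {"ideas": 2, "organization": 3, "conventions": 3},
}


def single_number(profile: dict[str, int]) -> int:
    """Stand-in for one overall score: hides which traits are strong or weak."""
    return sum(profile.values())


def weakest_traits(profile: dict[str, int]) -> list[str]:
    """Diagnostic view that is only available from trait-level scores."""
    low = min(profile.values())
    return [trait for trait in TRAITS if profile[trait] == low]


for student, profile in scores.items():
    print(student, single_number(profile), weakest_traits(profile))
# Student A 8 ['organization']
# Student B 8 ['ideas']
```

Both students end up with the same single number. The trait profile, however, points instruction in different directions for each.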

Generic Rubric Strengths

Complex skills that generalize across tasks, grades, or content areas
Situations where students are doing a similar but not identical task
Help students see "the big picture" and generalize their thinking
Promote/require thinking by the student
Allow for creative or unanticipated responses
Can't give away the answer ahead of time
More consistency with multiple raters (only one rubric to learn, so you can learn it well)

Generic Rubric Limitations

Difficult to develop and validate
Takes time and practice to learn, internalize, and apply consistently
Takes time to apply
Takes discipline to apply correctly
Requires a scoring procedure to ensure consistent scores when multiple raters are involved

Task-specific Rubric Strengths

Specialized tasks
Highly structured assignments
Specific/detailed assessment goals
Provide detailed feedback to students on their work
Situations requiring quick but highly consistent scoring from multiple scorers with less training and/or fewer inter-rater control procedures

Task-specific Rubric Limitations

Can't be shown to students ahead of time, because they give away the answer
Do not allow students to see what quality looks like ahead of time
Need a new rubric for each task
A rater on autopilot may miss correct answers not explicitly shown in the rubric

Scoring Ranges

Minimum of 3 levels
Typically 3 to 7 levels (beyond about 8, a scale becomes hard to apply and understand)
Even vs. odd: odd-point scales (3, 5, 7, etc.) allow a middle ground that is psychologically attractive to the rater (which you may want to avoid)
5-point scales tend to look like an A-F grading scheme (which you may also want to avoid)

Distinguish Quality

4 or more points are typically needed to distinguish levels of quality
4 to 7 points is typical, depending on how many levels of quality can actually be distinguished
The more open-ended/complex the task, the broader the range of points needed

A Meta-rubric

A rubric for evaluating rubrics

Trait 1: content/coverage
Trait 2: clarity/detail
Trait 3: usability
Trait 4: technical quality

MRT 1: Content/coverage

Aligned to curriculum and instruction
Includes every important element of quality
Does not include trivial things
Reasonable explanations for what is included and excluded
Reflects best thinking and practice
Work that can't be scored is rarely found

MRT 2: Clarity/detail

Different users are likely to interpret the rubric in the same way; the language is not ambiguous, vague, or contradictory
Use of the rubric supports consistent scoring across students, teachers, and time
Examples of student work illustrate each level of quality on each trait

MRT 3: Usability/Practicality

Can be applied in a reasonable amount of time when scoring
Can easily explain/justify why a particular score was assigned
Student can see what to do differently next time to earn a better score
Teacher can see how to alter instruction for greater student achievement

MRT 4: Technical quality

Evidence of reliability (consistency): across students, teachers, and time
Evidence of validity (appropriateness): students and teachers agree that it supports teaching and learning when used as intended
Evidence of fairness and lack of bias: does not place any group at a disadvantage because of the way the rubric is worded or applied
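A common way to gather evidence of reliability across teachers is to have two raters score the same set of papers and then compare their scores. The source does not prescribe a particular statistic. The Python sketch below therefore uses three widely used measures: exact agreement, adjacent agreement (scores within one point), and Cohen's kappa, which corrects exact agreement for the agreement expected by chance. The ten paired scores on a 1-4 scale are hypothetical.

```python
from collections import Counter


def _check(rater1: list[int], rater2: list[int]) -> None:
    """Both raters must have scored the same, non-empty set of papers."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two equal-length, non-empty score lists")


def exact_agreement(rater1: list[int], rater2: list[int]) -> float:
    """Proportion of papers where both raters gave the identical score."""
    _check(rater1, rater2)
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def adjacent_agreement(rater1: list[int], rater2: list[int]) -> float:
    """Proportion of papers where the two scores differ by at most one point."""
    _check(rater1, rater2)
    return sum(abs(a - b) <= 1 for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1: list[int], rater2: list[int]) -> float:
    """Cohen's kappa: exact agreement corrected for chance agreement."""
    observed = exact_agreement(rater1, rater2)  # also validates the inputs
    n = len(rater1)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: probability both raters pick the same score independently.
    expected = sum(counts1[k] * counts2[k] for k in counts1) / (n * n)
    if expected == 1:
        return 1.0  # degenerate case: both raters used one identical score throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores for ten papers on a 1-4 scale.
r1 = [4, 3, 3, 2, 1, 4, 2, 3, 2, 1]
r2 = [4, 3, 2, 2, 1, 3, 2, 3, 2, 2]
print(exact_agreement(r1, r2))          # 0.7
print(adjacent_agreement(r1, r2))       # 1.0
print(round(cohens_kappa(r1, r2), 2))   # 0.58
```

In this example, the raters agree exactly on 7 of 10 papers and are never more than a point apart. Kappa is lower than 0.7 because some of that agreement would be expected by chance alone.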

Develop Your Own Rubrics

Form a learning team
Locate/acquire additional resources
Modify existing rubrics
Understand the development process
Help each other out
When you are comfortable with the process, introduce it to your students

The Meta-rubric Outline

            WOW    Most    Some    None
Trait 1
Trait 2
Trait 3
Trait 4

Possible Development Process

1) Gather samples of student work
2) Sort student work into groups and write down the reasons for how it is sorted
3) Cluster the reasons into traits
4) Write a value-neutral definition of each trait
5) Find samples of student work that illustrate a possible range of score points on each trait
6) Write value-neutral descriptions of each score level for each trait, if appropriate
7) Evaluate your rubric using the Meta-rubric
8) Test it out and revise it as needed

Acknowledgments

This module is based on material adapted from:

Scoring Rubrics in the Classroom

by Judith Arter and Jay McTighe

Experts in Assessment Series

Corwin Press, Thousand Oaks, CA

and material provided by Edward Roeber of Michigan State University, East Lansing, Michigan
