Best Practices in Item Writing


Workshop: Item Writing

Objectives for Today

Item writing and development:
Terminology
General guidelines
Writing CR items

Next week: Examples and reviewing

Objectives of Item Development

1. Build exams that are high quality, legally defensible, and produce reliable and valid score interpretations

2. Create items that directly assess the knowledge and skills in question

3. Minimize (ultimately eliminate) distractions and undue influence

Terminology

Item: a test ‘question’ or prompt designed to assess knowledge, skills or abilities

Stem: the part of the item that poses the question or problem

Asset/Stimulus: the part of the item that presents or exhibits information, might be a reading passage, image, or audio

Options: the available answers

Key: the correct answer

Distractors: the incorrect options

(Diagram: a labeled example item showing the asset and stem, and the options, which comprise the key and the distractors.)

Terminology

Response: the recorded input of an individual examinee

Rubric: a clearly defined set of criteria (rules) for scoring a free-response item
Converts open responses to a scale of numbers
Can have multiple rubrics on one item

Item Types

There are two major item type categories:

1. Selected Response
Multiple choice
Multiple response
Drag and drop or matching

2. Constructed Response (aka free or open)
Essay or similar (e.g., math problem)
Short answer or fill-in-the-blank (can still be automatically scored)
Performance tests

Example: Multiple Choice

What is the capital of Norway?
a. Oslo*
b. Bergen
c. Stavanger
d. Stockholm

Example: Multiple Response

Which of the following are cities in Norway?
a. Oslo*
b. Copenhagen
c. Stavanger*
d. Stockholm

(Drag-and-drop buckets: City in Norway / Not a City in Norway)

Drag and drop is often scored the same as multiple response!
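How such items convert to points is left open above; below is a minimal Python sketch of two common policies, all-or-nothing and partial credit. The function name and the specific partial-credit rule are illustrative assumptions, not the workshop's prescribed method.

```python
def score_multiple_response(selected, keys, partial=False):
    """Score a multiple-response (or equivalent drag-and-drop) item.

    selected: set of option labels the examinee chose, e.g. {"a", "c"}
    keys: set of correct option labels
    partial: all-or-nothing if False; otherwise +1 per correct
             selection, -1 per incorrect selection, floored at zero.
    """
    if not partial:
        return 1 if selected == keys else 0
    return max(len(selected & keys) - len(selected - keys), 0)

# Norway example above: the keys are a (Oslo) and c (Stavanger).
print(score_multiple_response({"a", "c"}, {"a", "c"}))        # 1
print(score_multiple_response({"a", "d"}, {"a", "c"}, True))  # 0
```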

Example: Scored short answer

James purchased a music album for $8. It was discounted by 20%. What was the regular price?

(Student would type response in the box; acceptable answers might be: $10, ten dollars, 10 dollars)

More difficult and higher fidelity than multiple choice
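As a sketch of how a typed response might be matched against the acceptable answers listed above (assuming simple lowercase/punctuation normalization; the function names are hypothetical). Note the worked arithmetic: $8 is 80% of the regular price, so the regular price is 8 / 0.8 = $10.

```python
def score_short_answer(response, acceptable):
    """Credit a typed response if it matches any keyed acceptable answer."""
    def normalize(text):
        # Lowercase and drop punctuation so "$10" and "10" both key to "10".
        return "".join(c for c in text.lower().strip() if c.isalnum() or c == " ")
    return 1 if normalize(response) in {normalize(a) for a in acceptable} else 0

# Worked arithmetic for the example: 8 / 0.8 = $10 regular price.
print(score_short_answer(" $10 ", ["$10", "ten dollars", "10 dollars"]))  # 1
```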

Example: Fill in the blank

_________ is the capital of Norway.

(Student would type response in the box)

Example: Essay

Write a detailed account of how Oslo became the capital of Norway. Why was it a good choice? Provide three reasons to support your position.

This lends itself to rubrics (see the sketch below):
Position? 0/1 points
Reasons? 0/1/2/3 points
Historical accuracy? 0/1/2 points
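A minimal sketch of this rubric as data plus a scoring function; the criterion names and maximum points mirror the slide, while the function itself is an illustrative assumption.

```python
# Criterion maximums mirror the slide: position 0/1, reasons 0-3, accuracy 0-2.
RUBRIC_MAXIMUMS = {"position": 1, "reasons": 3, "historical_accuracy": 2}

def score_essay(ratings, maximums=RUBRIC_MAXIMUMS):
    """Sum a rater's per-criterion points after range-checking each one."""
    for criterion, points in ratings.items():
        if not 0 <= points <= maximums[criterion]:
            raise ValueError(f"{criterion}: {points} outside 0-{maximums[criterion]}")
    return sum(ratings.values())

print(score_essay({"position": 1, "reasons": 2, "historical_accuracy": 2}))  # 5 of 6
```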

Construct-Irrelevant Variance

The enemy of all tests is construct-irrelevant variance.

Scores should reflect an examinee's knowledge, skills, or abilities as they relate to the construct of interest (in this case, course competency)

Measurement of anything else is irrelevant and unhelpful.
Remember that reliability is ~unidimensionality (a sketch follows).
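One common way to operationalize this is Cronbach's alpha, an internal-consistency index whose interpretation leans on approximate unidimensionality. A minimal sketch, not from the workshop materials:

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix (rows = examinees, cols = items)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Five examinees on four 0/1 items:
data = [[1, 1, 1, 0],
        [1, 0, 1, 0],
        [0, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
print(round(cronbach_alpha(data), 3))
```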

Construct-Irrelevant Variance

Our goal: reduce construct-irrelevant variance.

It's a scientific experiment; we want to hold all variables constant except the variable of interest.

Guidelines for Item Writing

The following are some generally applicable guidelines, regardless of test purpose or item type

Validity: Remember original purpose

What is the goal of the test?
Show minimal subject mastery?
Show mastery at a range of levels?
Differentiate the top students?
Identify students that need remediation?

Clear Information

Be clear and concise in the item's content:

Provide all information that is needed
Do not provide extraneous or superfluous information unless it is a distractor

Make sure formatting is as clear as possible

Utilize Blueprints!

Make sure that the content of items maps to the blueprint as directly as possible

Record rationale/source/etc.

Essential link in the validity chain of evidence

Think like an examinee

While writing an item, it's important to think like an examinee.

Very important but often overlooked: Quality distractors

What is Appropriate Difficulty?

Write items of appropriate difficulty… While a very difficult item might be correct and actually quite "good," it might not serve the purposes of a test of minimal competence.

Some tests call for a narrow range of difficulty; some situations call for a wide range (see the sketch below).

A wider range enhances reliability because it produces more score variance.
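A hedged sketch of the classical statistics behind these claims: item difficulty as the proportion correct, and total-score variance as the quantity reliability is built from. The function names are illustrative.

```python
import numpy as np

def item_difficulties(scores):
    """Classical difficulty (p-value): proportion of examinees correct per item."""
    return np.asarray(scores, dtype=float).mean(axis=0)

def total_score_variance(scores):
    """Variance of examinees' total scores."""
    return np.asarray(scores, dtype=float).sum(axis=1).var(ddof=1)

data = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
print(item_difficulties(data))  # [1.0, 0.667, 0.333]: a wide difficulty range
```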

Rationale and Source

Whenever possible, record the rationale and source or reference for finding the correct answer

For example: “Answer B is correct because the stem says _____; C and D would not have an effect and A would actually counteract because _____.”

“Found on Page 125 of Jackson 2013 Text”

Maintain Grammar

If a question mark completes the stem, the options should be formatted as standalone phrases. No ending punctuation is needed.

What is the capital of Norway?
a. Oslo*
b. Bergen
c. Stavanger
d. Stockholm

Maintain Grammar

If the stem does not end with punctuation, the options should complete the stem’s sentence.

The capital of Norway is
a. Oslo.*
b. Bergen.
c. Stavanger.
d. Stockholm.

Maintain Grammar

Capitalize appropriately – proper nouns require capitalization, but otherwise it is generally unnecessary.

Washington, D.C. is the _____ of the United States.
a. capital*
b. largest city
c. primary port
d. southernmost city

How to Write Items

1. Identify a relevant situation or a piece of necessary knowledge that you’d like to evaluate. Consult your Blueprint.

2. Browse textbooks, references, and sources that are relevant to the exam – generate ideas!

3. Determine how to structure the item
Correct answer
Distractors!

How to Write Items

Best Practices in quality control for Multiple Choice questions:

Ensure that the key is truly correct

Check that distractors are fully incorrect, but plausible

Review the stem to make sure all necessary information is presented

Make sure that the “question” part of the stem is clear and indicates the type of response necessary

How to Write Items

Examples and counterexamples of these specific guidelines will be covered in the next workshop (Item Review)

Constructed Response Items

Goal:

Forming a connection from complex responses and real-life situations to reliable scores

How to Write Items: CR

Examples of constructed response items:

Solving a practical problem (high fidelity)
Proposing solutions with explanations
Creating a solution within certain parameters
Essays (argumentative or creative)
Synthesizing information

How to Write Items: CR

Guidelines for constructed response items:

Determine the topic for the item
Establish the scenario
Determine all necessary information
Reduce/eliminate unnecessary information (unless a distractor!)

Think of the steps, write the item, and answer it yourself as a student would

Scoring CR Items

Scoring of CR items can be difficult due to complexity

Remember that the most interesting item in the world does no good if there is no way to score it accurately!

If possible, link scoring to the algorithm of problem solving (a sketch follows)
Weight errors by difficulty or criticality:
Forgetting to round as the last step, or to provide units
…vs…
Utilizing incorrect information from the scenario
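A minimal sketch of criticality weighting, using the two error types named above; the weights themselves are invented for illustration.

```python
# Weights are invented for illustration; the error labels mirror the slide.
ERROR_WEIGHTS = {
    "forgot_round_or_units": 1,  # minor: last-step slip
    "wrong_scenario_info": 3,    # serious: misused the scenario itself
}

def deduct_for_errors(max_points, observed_errors):
    """Subtract criticality-weighted deductions, floored at zero."""
    deduction = sum(ERROR_WEIGHTS[e] for e in observed_errors)
    return max(max_points - deduction, 0)

print(deduct_for_errors(10, ["forgot_round_or_units"]))  # 9
print(deduct_for_errors(10, ["wrong_scenario_info"]))    # 7
```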

Scoring CR Items

Approaches to CR scoring (keep in mind while writing):
Score on process
Score on results
Score on both

Did the student complete each step? Did they reach the correct answer?

Scoring CR Items

Ways to convert a CR item to points:
Rubrics
Points for errors/completions
Points for answers or multiple answers

These make your life easier and standardize the scoring, making it more reliable

Rubrics

Rubrics are very helpful: a set of rules to convert open responses to score points.

Rubric/criteria: what you are rating
Rating scale: the axis, with point levels
Descriptors: examples of what each level means
(One way to encode these appears below.)
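One possible way to encode these three components as data; the criterion name, point levels, and descriptors below are invented placeholders.

```python
reasons_rubric = {
    "criterion": "Reasons",   # what you are rating
    "scale": [0, 1, 2, 3],    # rating scale: the axis, with point levels
    "descriptors": {          # what each level means
        0: "No relevant reasons given",
        1: "One relevant reason",
        2: "Two relevant reasons",
        3: "Three relevant, well-supported reasons",
    },
}
```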

Rubrics

Identify axes (often driven by curriculum)
Establish relevant point levels (can differ per axis)
Establish descriptors
Revisit point levels

Each axis should be observable or isolatable

Rubrics

Some examples of rubrics, with dos and don'ts, are covered in the next workshop.

Note that rubrics have their own set of statistics: inter-rater reliability, agreement, read-behinds, etc. (A sketch of rater agreement follows.)
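For instance, two standard rater statistics, exact agreement and Cohen's kappa, sketched minimally; the rating vectors are made-up examples.

```python
from collections import Counter

def exact_agreement(rater_a, rater_b):
    """Proportion of responses given the same score by both raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters."""
    n = len(rater_a)
    p_observed = exact_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

a = [3, 2, 3, 1, 0, 2]
b = [3, 2, 2, 1, 0, 3]
print(exact_agreement(a, b))        # 4/6 ≈ 0.667
print(round(cohens_kappa(a, b), 3))
```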

Using multiple answers

Provide a complex scenario; ask the student to list every piece of information they would need to solve it (e.g., there are 5).
3 points for each correct piece
-3 for each missing piece
-3 for each supplied piece that is not correct

Note: You could earn -15!
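The scheme above, sketched directly; the function name is illustrative.

```python
def score_information_list(listed, required, per_piece=3):
    """+3 per correct piece, -3 per missing piece, -3 per incorrect extra."""
    listed, required = set(listed), set(required)
    correct = len(listed & required)
    missing = len(required - listed)
    extra = len(listed - required)
    return per_piece * (correct - missing - extra)

# All five pieces missing, none supplied: 3 * (0 - 5 - 0) = -15,
# matching the note above.
print(score_information_list([], ["a", "b", "c", "d", "e"]))  # -15
```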

Performance Testing

Deductions or additions due to criticality
Example: 100 possible points, cutscore = 80
-7 for each minor error
-14 for each moderate error
-21 for each critical error (e.g., safety)
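This deduction scheme, sketched under the assumption that per-error deductions simply accumulate.

```python
DEDUCTIONS = {"minor": 7, "moderate": 14, "critical": 21}

def performance_score(errors, max_points=100, cutscore=80):
    """Return (points, passed) after criticality-weighted deductions."""
    points = max_points - sum(DEDUCTIONS[severity] for severity in errors)
    return points, points >= cutscore

print(performance_score(["minor", "moderate"]))  # (79, False): just below the cut
```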

Performance Testing

Interestingly, Performance Testing still lacks a true psychometric theory

Readings

Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York, NY: Routledge.

Downing, S. M., & Haladyna, T. M. (Eds.). (2006). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum Associates.

Lots of free resources are available on the internet (ASC has an item writing guide…)

Question and Answer
