Specs design

Language Testing and Assessment

Summary of Units B4 and C4

Unit B4:

Optimal Specification Design

The central question in this unit is:

“What is the optimal design for a test specification and what elements should it include?”

The Difference between Prompt Attributes ‘PA’ & Response Attributes ‘RA’

In Popham’s model, the Prompt Attribute describes the input to the examinee while the Response Attribute describes what the examinee does as a result. Bachman and Palmer (1996) phrased this same distinction in different terms using ‘characteristics of the input’ versus ‘characteristics of the response.’

Clearly, the author disagrees with the majority. It’s also clear that he’s open to a change of perspective. How do we know this?

(a) The author states precisely a willingness to change.

(b) In line 10 ‘there may be some value to the opposing view if...’ suggests a willingness to change.

(c) The title suggests a willingness to change.

(d) The comment near the end indicates willingness to change: ‘I may be persuaded if...’

1- The RA for this item might simply read: “The student will select the correct answer from among the choices given.” If that were the case, then the spec writer has decided on a rather minimalist approach to the PA/RA distinction, describing the actual action performed by the examinee.

2- An alternative RA might read like this:

a- The student will study all four choices.

b- If a particular choice references a particular line in the passage, the student will study that line carefully.

c- He or she will reread the passage to eliminate three choices.

d- Then the student will select the correct answer from among the choices.

Either of these RAs could work in conjunction with a PA similar to the following:

“1- The item stem poses a question about the author’s viewpoints, which will require inference from the text.

2- Choices ‘a’, ‘b’, and ‘d’ are distracters that attribute to the passage a comment that the author didn’t make, or which is taken out of context and misinterpreted.

Choice ‘a’ refers to a comment the author made, without actual reference in the text, while choices ‘b’ and ‘d’ refer to some part of the text (e.g., a line number, a paragraph, a header, a title).

3- Choice ‘c’ will be the key or correct response; it may use any of the locator features given above (line number, paragraph, header, title, etc.), or it can simply refer to the whole passage.”

This PA/RA formula is a classical model of spec for multiple-choice items.

In this formula, all guidelines about the item are in the PA: the entire description of its stem, its choices, why the incorrect choices are incorrect, and why the key is correct is considered to be part of the prompt and not the response.

The choices themselves seem to be part of the examinee’s thinking.

In our multiple-choice item, the examinee will probably double-check whether the author did indeed say what is claimed in line 10 or near the end and if so, whether it is being interpreted correctly. In effect, the item itself is a kind of outline of the examinee’s answering strategy; a layout of the response.

Guidance about both the prompt and the response is important in a test specification.

It is possible to fuse the PA and RA and simply give clear specification guidance on both; actually, we could create a new spec element (the ‘PARA’) in which we can put all this guidance.

Guiding language

The basic element of spec design is producing samples and the guiding language that goes with them.

Guiding language and samples constitute a minimalist definition of a specification, in an attempt to disentangle prompt from response.

‘Event’ vs. ‘Procedure’ & Specplates as a universal design

Event versus Procedure

A testing event is a single task or test item, such as a multiple-choice item.

A procedure is a set of events or tasks, such as an oral interview, a portfolio assessment, or a teacher observation.

Test developers organize items into a test using a ‘table of specs’ that presents information at a very global level:

- How many of each item type and skill are needed?

- What special materials are required to develop the test?

Specplates

The word ‘specplate’ combines ‘specification’ and ‘template’: it is a model for a specification, a generative blueprint which itself produces blueprints.

Over time, certain specs fuse into a higher-order specification. A specplate is a guiding tool to ensure that new specifications meet a common standard established by the existing specs. One type of information that might appear in a specplate is guidance on task type.

PA (excerpt)

For a M.C. task on verb tense and voice agreement: Each incorrect choice (distracter) in the item must be incorrect according to the focus of the item. One distracter should be incorrect in tense, another incorrect in voice, and the third incorrect in both tense and voice.

“When specifying the distracters, the PA should contain the following language: ‘Each incorrect choice (distracter) in the item must be incorrect according to the focus of the item.’ Immediately following that sentence, the PA should clearly specify how each of the three distracters is incorrect.”

PA Specplate (excerpt)

“You are encouraged to employ (if feasible) the dual-feature model of multiple-choice item creation, namely:

Key: both of two features of the item are correct (tense/voice)

Distracter 1: one of two key features of the item is incorrect (tense/voice)

Distracter 2: the other of two key features of the item is incorrect (tense/voice)

Distracter 3: both of two key features of the item are incorrect (tense/voice).”

The ‘magic formula’ model of M.C. item creation means crafting an item that examinees can get right only by doing two things correctly.
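The dual-feature model can be pictured as the four combinations of two yes/no features. The following is a minimal sketch under that reading, not part of the specplate itself: the names Choice, label_choices, and follows_dual_feature_model are hypothetical, and tense and voice are used as the two features only because they are the ones in the excerpt.

```python
from itertools import product
from typing import Dict, List, Tuple

# A choice is described by two flags: (tense is correct, voice is correct).
# This representation is an assumption made for illustration.
Choice = Tuple[bool, bool]


def label_choices() -> Dict[Choice, str]:
    """Label each of the four feature combinations as key or distracter."""
    labels: Dict[Choice, str] = {}
    for tense_ok, voice_ok in product([True, False], repeat=2):
        if tense_ok and voice_ok:
            labels[(tense_ok, voice_ok)] = "key"
        elif tense_ok or voice_ok:
            wrong = "voice" if tense_ok else "tense"
            labels[(tense_ok, voice_ok)] = f"distracter: {wrong} incorrect"
        else:
            labels[(tense_ok, voice_ok)] = "distracter: tense and voice incorrect"
    return labels


def follows_dual_feature_model(choices: List[Choice]) -> bool:
    """True if the item has exactly four choices, one per feature combination."""
    all_combinations = set(product([True, False], repeat=2))
    return len(choices) == 4 and set(choices) == all_combinations
```

An item writer could use a check like follows_dual_feature_model to confirm that no two distracters are wrong in the same way, which is the property the specplate asks for.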

Once the specplate has been written, it can serve as the starting point for new specs that require those features. Rather than starting from scratch each time, the specplate generates the specification shell and important details follow somewhat automatically.

Ownership

A sense of ownership over specs is part of human nature, because authors feel invested in the test-crafting process.

However, a well-crafted test is never owned by a single individual. Thus, a simple historical record of contributions is the best way to attribute a spec to its various authors.

Disagreement is sometimes inevitable in specs design; yet, a compromise between opposing positions is possible.

There is consensus that the faculty will observe the test in action and decide after a while whether more changes are needed.

Summary of Unit B4

The central focus of this unit was the nature of test specs and their elements.

We have raised and tried to answer the question: what are the essential components of specs beyond the bare minimum of guiding language and samples?

Unit C4:

Evolution in Action

In the conclusion to Unit A4 we listed the following elements of a specification-driven testing theory:

■ Specs exist.

■ Specs evolve.

■ The specs are not launched until ready.

■ Discussion leads to transparency.

■ All are welcome to the discussion.

We saw in Unit B4 that all specs share two common features:

1- spec-generated sample items

2- related guiding language

In this Unit, we will focus on some design considerations that arise as a spec evolves.

[Version 1: Guiding language on the scoring scale]

The objective of this spec is for students to produce a role-play task on the pragmatics of making a complaint in a simple everyday situation. In a role-play with the teacher, students are asked to plan and render a complaint about something that has gone wrong.

Scoring of the interaction will be as follows:

1- not competent – the student displayed little command of the situation pragmatics.

2- minimally competent – the student used language of complaint, but the interaction was hesitant and/or impolite.

3- competent – the student’s interactions were smooth and generally fluent, and there was no use of impolite language.

4- superb – the student’s interactions were smooth and very fluent, and in addition, the student displayed subtle command of nuance.

[Version 1, sample one] You’ve recently purchased a radio; back home, you discover that a part is missing from the box.

[Version 1, sample two] After getting back home from shopping, you discover that a jar of peanut butter is open and its seal is punctured, so you’re worried that it may be unsafe to eat.

In both cases, you want to return to the store to resolve the situation with the manager.

(a) Write out a plan of what you will say; then (b) role-play the conversation with your teacher.


1- not competent – the student displayed little command of the pragmatics of the situation. If the student wrote a plan, it was inadequate or not implemented.

2- minimally competent – the student used language of complaint, but the interaction was hesitant and/or impolite. The student’s plan may have been adequate, but the student was unable to implement it.

3- competent – the student’s interactions were smooth and generally fluent, and there was no evidence of impolite language use. The student wrote a viable plan and generally followed it during the interaction.

4- superb – the student’s interactions were smooth and very fluent, and in addition, the student displayed subtle command of nuance. The student wrote a viable plan and generally followed it during the interaction.

After some time, the descriptor for Level 4 is improved again, Levels 1-3 being unchanged:


4- superb – the student’s interactions were smooth and very fluent, and the student displayed subtle command of nuance. He/she wrote a viable plan and generally followed it during the interaction. Alternatively, the student wrote little (or no) plan, but seemed to be able to execute the interaction in a commanding and nuanced manner.
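The change to Level 4 can be made concrete with a small sketch. It assumes the scale that first mentions the written plan is Version 2 and the revised Level 4 descriptor is Version 3 (the text above labels only Version 1), and it checks the Level 4 descriptor only. The Performance fields and the function name level_4_applies are hypothetical rater judgments introduced for illustration; real scoring with this scale remains a holistic human judgment.

```python
from dataclasses import dataclass


@dataclass
class Performance:
    # Hypothetical rater judgments; none of these field names come from the spec.
    fluency: str   # "hesitant", "fluent", or "very fluent"
    nuanced: bool  # the student displayed subtle command of nuance
    plan: str      # "followed", "little_or_none", or "not_followed"


def level_4_applies(p: Performance, version: int) -> bool:
    """Check whether the Level 4 ('superb') descriptor fits, for Version 2 or 3."""
    superb_interaction = p.fluency == "very fluent" and p.nuanced
    if version == 2:
        # Assumed Version 2: a viable plan, generally followed, is required.
        return superb_interaction and p.plan == "followed"
    if version == 3:
        # Assumed Version 3: a followed plan, or little or no plan, both qualify.
        return superb_interaction and p.plan in ("followed", "little_or_none")
    raise ValueError("this sketch covers only Versions 2 and 3")
```

A very fluent, nuanced student who wrote no plan fails the Level 4 check under the earlier wording and passes under the revised one, which is the gap the revision closes.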

There are some interesting questions that arise:

- What is the role of the written plan?

- Why have the instructors adapted the scoring scale to reflect alternative use of the plan?

- Do you suspect that any changes might be coming for level 3 on the scale?

- Do you suspect that the plan may prove to be an optional testing task, in general?

- Do you think that the plan may prove unworkable?

Planning causes debate which in turn causes change

A newcomer arrives at the faculty at a point in time between Version 2 and Version 3, an energetic instructor who plays the role of a productive debater in meetings.

This new instructor asks, “Do we plan when we do complaints in real life, and if yes, do we write it?”

The newcomer causes the teachers to watch carefully the use of this task in the next test administration, and sure enough, there are high-level students for whom the plan is irrelevant and a waste of time.

New questions arise here:

- What obligations do teachers have to challenge each other and help make tests better?

- What ownership should be given to this new teacher or to any new teacher?

However… change stagnates

Gradually, teachers stop teaching the written plan in their lessons, and most students do not produce one during the test.

The instructors simply stop looking at the spec, they stop using a written plan, and the task evolves beyond reference to the spec.

Then, one teacher remembers to teach written plans and the students feel they did better on the test thanks to plan writing.

- Should students be welcome to discussions of test evolution and change?

- Should teachers re-visit and re-affirm the wording of the spec, which does permit a plan?

- Or should they follow their own instinct and ignore this student feedback, encouraging role-plays without written plans?

- Should teachers continue to heed the advice of their ‘energetic colleague’ and teach their students to do such tasks without written plans, because that is more authentic?

Application

Conduct a day-long reverse engineering workshop with your colleagues on a test task.

1- Introduction and Welcome: Orient your colleagues to selected tasks. The goal of this part is not to revise the tasks but to make sure they know what the tasks are.

Orient the participants to the basic design of specs: samples and guiding language. Don’t show actual specs because people will think that the spec samples you show are how all specs should be written. In addition to the critical analysis that is the target of the day, you want an organic, bottom-up growth of specs.

2- Group Phase 1: Divide the participants into groups or pairs, each being assigned the same set of tasks. Ask each group to do straight reverse engineering and write out what they think is the guiding language for the tasks without recommending any changes.

This should be followed by a report back.

3- Vent Your Spleen: In the whole group, allow people to vent about test tasks they have never liked – tasks they did not analyze in Phase 1.

Based on the judgmental splenetic discussion that will certainly result, select a new set of tasks, and proceed to the next step.

4- Group Phase 2: Divide the participants into groups, each having to do critical reverse engineering of some tasks about which they feel particularly splenetic. The goal is a set of specs that improve testing in your situation. A report back should follow.

5- ‘What’s Next?’ The group discusses which specs stand a reasonable chance of implementation. Not everything that arises will be feasible. Some things will be difficult to implement. But some should survive.

Summary

This Unit was a practical application of Units A4 and B4, a way to drill the theoretical notions and concepts that we have studied in both units. The Unit proposes more exercises related to validity, as in Unit C1.
