Testing What You Teach: Eliminating the “Will this be on the final?” Ideology

Dr. Barry Lee Reynolds
National Yang-Ming University
Education Center for Humanities and Social Sciences

Outline
• Introduction
• Backwash
• Validity
• Reliability

Introduction

Why students ask: “Will this be on the final exam?”

The distrust of tests
Who distrusts tests?
• Language teachers
• Language students

Why? Because of their negative effects on learning, tests are often considered more harmful than helpful.
Sometimes the teaching is good, but the test does not reflect the teaching.

The effect of testing on teaching and learning is known as backwash; it can be harmful or beneficial (Hughes, 2003).

Tests are often inaccurate measurements
• Testing technique: if you want to know how well someone writes, you must ask them to write (referred to as validity).
• Consistency: the test must consistently measure the ‘construct’ (e.g., the past tense, vocabulary, writing) (referred to as reliability).

Backwash
How can a teacher achieve beneficial backwash?

Backwash
Harmful backwash
Ex. multiple-choice items to test writing
Beneficial backwash
Ex. writing to test writing
More contextualized (low-stakes exam): e.g., a final exam for a course
More global (high-stakes exam): e.g., a university admissions test such as the TOEFL

How can a teacher achieve beneficial backwash? (1/2)
• Test the abilities whose development you want to encourage
If you want to encourage oral ability, then test oral ability.
• Sample widely and unpredictably
It is important that the sample taken should represent, as far as possible, the full scope of what is specified.
• Use direct testing
If we directly test the skills that we are interested in fostering, then practice for the test represents practice in those skills.

How can a teacher achieve beneficial backwash? (2/2)
• Make testing criterion-referenced
If the test specifications make clear just what students have to be able to do (and with what degree of success), then students will have a clear picture of what they have to achieve.
• Base tests on objectives
If tests are based on objectives, rather than on detailed teaching and textbook content, they will provide a truer picture of what has actually been achieved.
• Ensure the test is known and understood by students and teachers
Students need to understand what the test demands of them. Explain the rationale for the test and its specifications, and provide sample items.

Validity
How can teachers ensure the validity of an assessment?

Construct validity
An assessment is said to be valid, and to have construct validity, if it accurately measures what it is intended to measure, e.g., “reading ability,” “speaking fluency,” or “grammar.”

Does the assessment really test the “construct” it has set out to test?
Construct validity is also used to refer to an overarching notion of validity.

Teachers must ensure that their tests truly assess the skills they have taught in their classrooms.

Content validity

If you wish to test “reading ability,” the assessment must be made up of items that test the language skills associated with “reading ability.”

To ensure content validity, it is not enough just to have students “read” and require them to answer questions; the questions must constitute a proper sample of all the language skills that have been taught in the course. Areas that are not tested tend to be ignored by teachers in their teaching and by students in their learning.

Unfortunately, the content of tests is often determined by what is easiest to test.

Match assessment content to the specifications written for the course (i.e., class goals and objectives).

Criterion-related validity
Criterion-related validity refers to the degree to which one assessment correlates with another assessment (the criterion). It includes concurrent validity and predictive validity.
Concurrent validity is established when the test and the criterion are administered at about the same time. Example – testing of oral and written language abilities
Predictive validity concerns the degree to which a test can predict students’ future performance. Example – prerequisite course; internship opportunities

Criterion-related validity is usually investigated through the use of correlation coefficients.
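As a rough illustration of the correlation coefficients mentioned above, the Python sketch below computes a Pearson correlation between a classroom test and a criterion measure given in the same week (concurrent validity). The function name `pearson_r` and all scores are hypothetical, invented for illustration; the same calculation would serve for predictive validity if the criterion scores were collected later.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they are omitted)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six students: a short classroom speaking test and
# a longer oral interview given in the same week (the criterion).
classroom_test = [12, 15, 9, 18, 14, 10]
oral_interview = [55, 68, 47, 80, 63, 50]

r = pearson_r(classroom_test, oral_interview)
print(f"Concurrent validity coefficient: r = {r:.2f}")
```

A value close to 1 means both assessments rank the students in much the same way; a value near 0 means the test tells us little about performance on the criterion.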

Validity in scoring
An assessment should not test more than one ability (unless it was designed to do so!).
Example – a reading test that also assesses spelling and grammar; a writing test that emphasizes punctuation

Face validity
A test is said to have face validity if it looks as if it measures what it is supposed to measure.

Reliability
How can teachers ensure the reliability of an assessment?

Reliability
Reliability refers to the degree to which an assessment produces stable and consistent results. In other words, giving the assessment on day X should produce much the same results as giving it on day Y.

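This is estimated by calculating a reliability coefficient, commonly with one of two methods:
• test-retest method (the same test is given twice to the same students)
• split-half method (a single test is divided into two halves, which are scored separately and compared)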

Lado (1961) provides benchmarks to follow:
• vocabulary, grammar, and reading assessments: .90-.99
• listening: .80-.89
• speaking: .70-.79
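A minimal Python sketch of the split-half method, assuming each student's answers are recorded item by item as 1 (correct) or 0 (wrong). It splits the test into odd- and even-numbered items, correlates the two half-test totals, and then applies the Spearman-Brown formula to estimate the reliability of the full-length test; that correction step is a standard addition not mentioned in these slides. The function names, the item data, and the rule of treating the lower end of each of Lado's ranges as the minimum to reach are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """Split-half reliability with the Spearman-Brown correction.

    item_scores: one list of item scores (1 = correct, 0 = wrong) per student.
    """
    # Items 1, 3, 5, ... form one half; items 2, 4, 6, ... form the other.
    first_half = [sum(items[0::2]) for items in item_scores]
    second_half = [sum(items[1::2]) for items in item_scores]
    r_half = pearson_r(first_half, second_half)
    # Spearman-Brown: each half is only half as long as the real test,
    # so step the half-test correlation up to full-test length.
    return 2 * r_half / (1 + r_half)


# Lado's (1961) benchmark ranges, as listed on the slide.
LADO_BENCHMARKS = {
    "vocabulary": (0.90, 0.99),
    "grammar": (0.90, 0.99),
    "reading": (0.90, 0.99),
    "listening": (0.80, 0.89),
    "speaking": (0.70, 0.79),
}


def meets_benchmark(skill, coefficient):
    """Assumption: a coefficient at or above the lower bound counts as meeting it."""
    lower_bound, _upper_bound = LADO_BENCHMARKS[skill]
    return coefficient >= lower_bound


# Hypothetical answers of six students to an eight-item reading test.
reading_test = [
    [1, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 0],
]

reliability = split_half_reliability(reading_test)
print(f"Split-half reliability: {reliability:.2f}")
print("Meets Lado's reading benchmark:", meets_benchmark("reading", reliability))
```

The test-retest method would use the same `pearson_r` calculation, applied to the totals from the two administrations instead of the two halves.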

Scorer reliability
Quantifying, by means of a coefficient, the level of agreement between scores given by the same or different scorers on different occasions can help ensure scorer reliability.

Ex. grading essays
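For essay grading, one hedged way to put a number on scorer reliability is sketched below in Python: the Pearson correlation between two sets of scores for the same essays, plus the proportion of essays given the identical band score. The two lists could come from two different scorers or from the same scorer on two occasions. The 1-6 band scale, the scores, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def exact_agreement(xs, ys):
    """Proportion of essays that received the same score in both scorings."""
    return sum(1 for x, y in zip(xs, ys) if x == y) / len(xs)


# Hypothetical band scores (1-6) given to the same eight essays.
scorer_a = [4, 3, 5, 2, 6, 4, 3, 5]
scorer_b = [4, 3, 4, 2, 6, 5, 3, 5]

print(f"Scorer correlation: {pearson_r(scorer_a, scorer_b):.2f}")
print(f"Exact agreement: {exact_agreement(scorer_a, scorer_b):.0%}")
```

The two numbers answer slightly different questions: the correlation shows whether the scorers rank the essays consistently, while exact agreement shows whether they award the same scores. A scorer who is consistently one band harsher can correlate highly yet rarely agree exactly.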

How to make tests more reliable? (1/3)
• Take enough samples of behavior
It is not enough just to include enough items; each item should also be a “fresh start” for the students.
• Exclude items which do not discriminate well between weaker and stronger students.
• Do not allow candidates too much freedom.
• Write unambiguous items.
• Provide clear and explicit instructions.
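To find items that do not discriminate well between weaker and stronger students, one common approach is the upper-lower discrimination index D: the proportion of the strongest students who answer an item correctly minus the proportion of the weakest students who do. The Python sketch below assumes 0/1 item scores, upper and lower groups of one third each, and a cut-off of D < 0.3 for flagging items; these choices, the function name, and the data are illustrative assumptions, not values given in the slides.

```python
def discrimination_indices(item_scores, group_fraction=1 / 3):
    """Upper-lower discrimination index D for each item.

    item_scores: one list of item scores (1 = correct, 0 = wrong) per student.
    group_fraction: share of students placed in each of the upper and lower
    groups (an assumption; one third and 27% are both common choices).
    """
    # Rank students by total score, strongest first.
    ranked = sorted(item_scores, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * group_fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    indices = []
    for item in range(len(item_scores[0])):
        p_upper = sum(student[item] for student in upper) / group_size
        p_lower = sum(student[item] for student in lower) / group_size
        indices.append(p_upper - p_lower)
    return indices


# Hypothetical answers of six students to a five-item test.
answers = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]

CUTOFF = 0.3  # hypothetical threshold for a "weak" item
d_values = discrimination_indices(answers)
weak_items = [number for number, d in enumerate(d_values, start=1) if d < CUTOFF]
print("D per item:", d_values)
print("Items to review or exclude:", weak_items)
```

In this made-up data, item 1 is answered correctly by everyone, and item 5 is answered correctly equally often by the strongest and weakest students, so neither helps separate them.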

How to make tests more reliable? (2/3)
• Ensure that tests are well laid out and perfectly legible.
• Make students familiar with the format and testing techniques.
• Provide uniform and non-distracting conditions of administration.
• Use items that permit scoring which is as objective as possible.
• Make comparisons between students as direct as possible (similar to not allowing students too much freedom).

How to make tests more reliable? (3/3)
• Create a detailed scoring key.
• Train scorers (if you are not scoring the tests yourself).
• Agree on acceptable responses and appropriate scores at the outset of scoring.
• Identify candidates by number, not name.
• Employ multiple, independent scorers (if possible).

Relationship between reliability and validity
To be valid, an assessment must be reliable; however, an assessment can be reliable without being valid.
Ex. a writing test that actually assesses translation

Be careful not to sacrifice validity while ensuring reliability.

Thank You For Your Attention

References
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Lado, R. (1961). Language testing: The construction and use of foreign language tests. A teacher's book.
