WEEK 2 LTA 200913

DESCRIPTION

Discusses the four fundamental concepts of language testing and assessment: validity, reliability, practicability, and washback/backwash.

  • WEEK 2: Fundamentals of Testing & Assessment (Validity, Reliability, Practicability, Backwash/Washback)

  • Source: http://martin-thoma.com/what-is-the-best-programming-language/

  • Introduction: In this lecture, we will look at a number of terms that are closely associated with language testing and assessment. Familiarize yourself with these terms early, as we will use them regularly throughout the course.

  • CRITERIA OF A GOOD LANGUAGE TEST

    Is there a perfect language test? A perfect language test, if such a thing could be found, would satisfy all of the following criteria. In practice, we have to strike a balance between the different criteria, particularly between validity, reliability, and practicability.

  • Source: http://www.maniacworld.com/good-cheap-fast-service.html

  • Source: http://www.peanuts.com/comics/

  • Source: http://funnyexam.com/popular/answers/3 and http://www.roderik.net/me/humour/

  • Source: http://1.bp.blogspot.com/-QoyapCf4y6s/TqDShTS-HcI/AAAAAAAAJ8k/4NjDiP8uxs4/s1600/Find_X.jpg and http://jokeslab.com/media/wp-content/uploads/2011/09/funny-math-exam-expansion.jpg

  • A. VALIDITY 1: Validity is arguably the most important criterion for the quality of a test. The term validity refers to whether or not the test measures what it claims to measure. A test with high validity will have items closely linked to the test's intended focus. Grant (1987: 89) defines validity as follows: "Validity in general refers to the appropriateness of a given test or any of its component parts as a measure of what it is purported to measure. A test is said to be valid to the extent that it measures what it is supposed to measure."

  • Validity 2: The validity of a test is critical because, without sufficient validity, test scores have no meaning. There are several ways to estimate the validity of a test, including content validity, concurrent validity, predictive validity, construct validity, and face validity.

  • 1. Content Validity: A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. To judge content validity, we need a specification of the skills or structures, etc. that the test is meant to cover. A comparison of the test specification and the test content is the basis for a judgement of content validity. Content validity is important because: (1) it provides an accurate measure of what the test is supposed to measure; (2) it has a backwash effect: areas which are not tested are usually ignored in teaching and learning.

  • Criterion-related validity: Criterion-related validity refers to how far the results of the test agree with those provided by some independent and highly dependable assessment of the candidates' ability. There are two kinds of criterion-related validity: concurrent validity and predictive validity.

  • 2. Concurrent Validity: Concurrent validity assesses how well a test agrees with a concurrent assessment of a different type, i.e., the extent to which the test provides results which are more or less comparable with the results of other language tests. Concurrent validation involves the comparison of the test scores with some other measure for the same candidates, taken at roughly the same time as the test. This other measure may be scores from a parallel version of the same test or some other test; the candidates' self-assessments of their language abilities; or ratings of the candidates on relevant dimensions by teachers, subject specialists or other informants.

  • 3. Predictive Validity: Predictive validity is a test's ability to predict how a person will perform at a later date on a different assessment of ability (performance in school or on a job, for example), i.e., the extent to which the results of the test accurately predict the language performance of the candidates when they use the language in the real world. Predictive validation is most common with proficiency tests: tests which are intended to predict how well somebody will perform in the future. The simplest form of predictive validation is to give students a test, and then at some appropriate point in the future give them another test of the ability the initial test was intended to predict.

  • Criterion-related validity: The results of the assessment of predictive and concurrent validity are expressed as correlation coefficients, and the absolute minimum standard is .71. Often evidence of predictive validity is impossible to obtain. For example, university entrance examinations notoriously fail to predict success in university. The reason, though, is not necessarily the inadequacy of the examinations but the inadequacy of the sample. When you compare scores on the entrance examination with success in university, you are looking at the success only of the highest-scoring students on the entrance examination. If people with middling or low scores on the entrance examination were also admitted to university, then you could well find a relationship between the entrance examination results and success in university.
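
    The correlation-coefficient idea above can be made concrete with a small worked sketch. The following Python snippet is not part of the original slides: the score lists and the pearson_r helper are illustrative assumptions, and only the .71 minimum standard itself comes from the slide. It simply correlates candidates' scores on a new test with scores from an independent criterion measure, as in concurrent or predictive validation.

    ```python
    # Hedged sketch of a criterion-related validation: correlate candidates'
    # scores on the test under study with an independent criterion measure.
    # All numbers below are invented for illustration.

    def pearson_r(xs, ys):
        """Pearson product-moment correlation for two equal-length score lists."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / (var_x ** 0.5 * var_y ** 0.5)

    # Hypothetical scores: a new placement test and an established proficiency
    # test taken by the same candidates at roughly the same time.
    new_test  = [55, 62, 70, 48, 81, 66, 73, 59]
    criterion = [58, 60, 75, 50, 85, 63, 70, 61]

    r = pearson_r(new_test, criterion)
    print(f"validity coefficient r = {r:.2f}")
    print("meets the .71 minimum" if r >= 0.71 else "below the .71 minimum")
    ```

    In practice a library routine such as scipy.stats.pearsonr or numpy.corrcoef would normally be used; the hand-rolled helper just keeps the sketch self-contained.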

  • 4. Construct Validity: A construct refers to any underlying ability (or trait) which is hypothesised in a theory of language ability. Construct validity is the extent to which the test matches a coherent view of the nature of language and of language learning: does the test really adopt the theory of language/language learning which it claims to adopt? A test is said to have construct validity if it is capable of measuring certain specific characteristics in accordance with a theory of language behaviour and learning. The more direct the test, the better its construct validity. For example, if a communicative approach to language teaching and learning has been adopted throughout a course, a test comprising chiefly MCQ items (an indirect test) will lack construct validity.

  • 5. Face Validity: A test is said to have face validity if it looks as if it measures what it is supposed to measure. If a test item looks right to other testers, teachers, moderators, and testees, it can be described as having face validity. For example, a test which pretended to measure pronunciation ability but which did not require the candidate to speak might be thought to lack face validity. Face validity is hardly a scientific concept, yet it is very important. A test which does not have face validity may not be accepted by candidates, teachers, education authorities or employers.

  • The use of validity: What use is the teacher to make of the notion of validity? First, every effort should be made in constructing tests to ensure content validity. Where possible, the tests should be validated empirically against some criterion. If indirect testing is used, reference should be made to the research literature to confirm that measurement of the relevant underlying constructs has been demonstrated using the testing techniques that are to be used.

  • B. RELIABILITY: Reliability is simply consistency, i.e., the extent to which the test produces consistent results if the same candidates take the test on repeated occasions. According to Alderson, Clapham and Wall (1995, p. 6): "Reliability is the extent to which test scores are consistent: if candidates took the same test again tomorrow after taking it today, would they get the same result?"

  • Reliability. Source: http://www.upei.ca/~xliu/measurement/week6-7.htm

  • Reliability 2: Another way to look at reliability is as the degree to which test scores are free from measurement error and are consistent from one occasion to another (the degree of stability on repeated administrations of the test). Sources of measurement error, which include fatigue, nervousness, content sampling, answering mistakes, misinterpreting instructions and guessing, contribute to an individual's score and lower a test's reliability.

  • Reliability 3: The reliability of a test can be quantified in the form of a reliability coefficient. Reliability coefficients allow us to compare the reliability of different tests. The ideal reliability coefficient is 1: a test with a reliability coefficient of 1 would give precisely the same results for a particular set of candidates regardless of when it happened to be administered. A test with a reliability coefficient of zero would give sets of results quite unconnected with each other. Certain authors have suggested how high a reliability coefficient we should expect for different types of language tests.

  • Reliability 4: Lado (1961) says that good vocabulary, structure and reading tests are usually in the .90 to .99 range; auditory comprehension tests are more often in the .80 to .89 range; and oral production tests may be in the .70 to .79 range.
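
    Purely as an illustration (not from the slides), Lado's suggested ranges can be restated as a simple lookup, so that a computed coefficient can be checked against the expectation for its test type:

    ```python
    # Lado's (1961) suggested reliability ranges, restated as a lookup table.
    LADO_RANGES = {
        "vocabulary/structure/reading": (0.90, 0.99),
        "auditory comprehension": (0.80, 0.89),
        "oral production": (0.70, 0.79),
    }

    def meets_expectation(test_type, coefficient):
        """True if the coefficient reaches the lower bound of Lado's range."""
        low, _high = LADO_RANGES[test_type]
        return coefficient >= low

    print(meets_expectation("oral production", 0.74))   # True: within the .70-.79 expectation
    ```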

  • Reliability 5: Different types of reliability estimates should be used to estimate the contributions of different sources of measurement error. Inter-rater reliability coefficients provide estimates of errors due to inconsistencies in judgment between raters. Alternate-form reliability coefficients provide estimates of the extent to which individuals can be expected to rank the same on alternate forms of a test.
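
    As an illustration of the reliability coefficients mentioned above, the short sketch below is again not from the slides: the marks are invented, and statistics.correlation (Python 3.10+ standard library) is assumed as one convenient way to obtain Pearson's r. It estimates inter-rater reliability by correlating two raters' marks for the same scripts.

    ```python
    # Hedged sketch of an inter-rater reliability estimate: correlate two
    # raters' marks for the same candidates. Marks are invented for illustration.
    from statistics import correlation  # Pearson's r; Python 3.10+

    rater_a = [14, 11, 17, 9, 15, 12, 18, 10]   # composition marks out of 20, rater A
    rater_b = [13, 12, 16, 9, 14, 13, 17, 11]   # the same scripts marked by rater B

    print(f"inter-rater reliability = {correlation(rater_a, rater_b):.2f}")

    # Correlating scores from two alternate forms of a test taken by the same
    # candidates would, in the same way, give an alternate-form reliability estimate.
    ```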

  • Reliability 6: Reliability should be assessed at every administration of the test. Tests are used if they have been reliable in the past, but they are only useful to you if they are reliable when you use them. Groups of people tested can differ in many important ways, some of which can affect reliability. It is not unusual to find that a highly acclaimed test fails to live up to its history of reliability when you use it. Usually this is not a reflection on the quality of the test (or of you as a test administrator), but simply a reflection of a fact of life: no test is appropriate for everyone. For example, the reliability of many tests varies markedly with the age of the people taking it.

  • How to make tests more reliable: 1. Take enough samples of behaviour. Other things being equal, the more items you have on a test, the more reliable that test will be. 2. Do not allow candidates too much freedom. Candidates should not be given a choice, and the range over which possible answers might vary should be restricted.

  • How to make tests more reliable 2: 3. Write unambiguous items. Candidates should not be presented with items whose meaning is not clear, or to which there is an acceptable response which the test writer has not anticipated. Moderation can help to minimize this problem. 4. Provide clear and explicit instructions to avoid misinterpretation of what candidates are asked to do.

  • How to make tests more reliable 3: 5. Ensure that tests are well laid out and perfectly legible. Avoid tests that are badly typed (or handwritten), have too much text in too small a space, or are poorly reproduced. 6. Ensure candidates are familiar with the format and testing techniques. Any aspect of a test that is unfamiliar to candidates is likely to affect their performance.

  • How to make tests more reliable 4: 7. Provide uniform and non-distracting conditions of administration. 8. Use items that permit scoring which is as objective as possible. 9. Make comparisons between candidates as direct as possible. For example, scoring compositions all on one topic will be more reliable than if the candidates are allowed to choose from six topics. 10. Provide a detailed scoring key. 11. Train scorers.

  • How to make tests more reliable 5: 12. Agree acceptable responses and appropriate scores at the outset of scoring. 13. Identify candidates by number, not name. 14. Employ multiple, independent scoring.

  • Reliability and Validity: To be valid, a test must provide consistently accurate measurements; it must therefore be reliable. A reliable test, however, may not be valid at all. For example, as a writing test we might require candidates to write down the translation equivalents of 500 words in their own language. This could well be a reliable test, but it is unlikely to be a valid test of writing. In making tests reliable, we must be wary of reducing their validity; e.g. restricting the scope of what candidates are permitted to write in a composition might diminish the validity of the task. There will always be some tension between reliability and validity, and we need to balance gains in one against losses in the other.

  • Source: http://fcemprep.co.uk/reliability-validity/

  • Source: http://upload.wikimedia.org/wikipedia/commons/5/5d/Reliability_and_validity.svg

  • C. PRACTICABILITY: Can the test be administered reasonably easily? Can the test be administered without unreasonable expenditure? Can the test be administered in a reasonable amount of time? Can the test be marked reasonably easily? Other things being equal, it is good that a test should be easy and cheap to construct, administer, score and interpret.

  • Practicability 2: Practicability is sometimes called 'logistics'. Frankly, testers wish logistics would just go away; it is a nuisance. We are annoyed at the influence of features like staffing, cost, space, time and other logistical concerns that impinge on test development.

  • Practicability 3: A good way to conceive of this problem is as a tension between 'fast', 'cheap' and 'good'. The idea is that you can only have two of those features: a test can be cheap and good, but it will take forever to develop or re-engineer. A test can be quick and good, but it will cost a fortune because you will have to divert other resources to the test. Finally, a test can be fast and cheap, but it will be terrible: it will probably lack reliability and validity.

  • D. WASHBACK/BACKWASH EFFECT: What influence will the test have on the teaching which takes place before the test? Will this influence be positive (i.e. will it encourage good learning habits)? Or will the influence be negative?

  • WASHBACK/BACKWASH EFFECT 2: The washback effect is powerful: it can be beneficial or detrimental. If we use a test to improve classroom teaching, then the test is said to have positive washback. However, if the test has a negative effect on teaching, it is said to have negative washback.

  • How to achieve Beneficial Backwash: 1. Test the abilities whose development you want to encourage. If you want to encourage oral ability, then test oral ability. 2. Sample widely and unpredictably. It is important that the sample taken should represent, as far as possible, the full scope of what is specified. 3. Use direct testing. Direct testing implies testing of performance skills, with texts and tasks as authentic as possible. 4. Make the test criterion-referenced. If test specifications make clear just what candidates have to be able to do, and with what degree of success, then students will have a clear picture of what they have to do.

  • How to achieve Beneficial Backwash 2: 5. Base achievement tests on objectives. If achievement tests are based on objectives, rather than on detailed teaching and textbook content, they will provide a truer picture of what has actually been achieved. 6. Ensure the test is known and understood by students and teachers. 7. Where necessary, provide assistance to teachers (especially when introducing a new test).

  • Conclusion: We have looked at four fundamental terms associated with LTA: validity, reliability, practicability, and backwash/washback.