7/28/2019 2. Measurement
Measurement
Measurement, test, evaluation
Measurement: the process of quantifying the characteristics of persons according to explicit procedures and rules.
Quantification: assigning numbers, as distinguished from qualitative descriptions.
Characteristics: mental attributes such as aptitude, intelligence, motivation, field dependence/independence, attitude, native language, fluency.
Rules and procedures: observations must be replicable in other contexts and with other individuals.
Test
Carroll: a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.
Elicitation: obtaining a specific sample of behavior.
Interagency Language Roundtable (ILR) oral interview: a test of speaking consisting of (1) a set of elicitation procedures, including a sequence of activities and sets of question types and topics; (2) a measurement scale of language proficiency ranging from a low level of 0 to a high level of 5.
Test?
Years of informal contact with a child, used to rate the child's oral proficiency: the rater did not follow the procedures.
A rating based on a collection of personal letters, used to indicate an individual's ability to write effective argumentative editorials for a news magazine.
A teacher's rating based on informal interactive social language use, used to indicate the student's ability to use language to perform various cognitive/academic language functions.
Evaluation
Definition: the systematic gathering of information for the purpose of making decisions.
Evaluation need not be exclusively quantitative: it may draw on verbal descriptions, performance profiles, letters of reference, and overall impressions.
Relation Between Evaluation,
Test, and Measurement
Relation Between Evaluation, Test, and Measurement
1. qualitative descriptions of student performance for diagnosing learning problems
2. teacher's ranking for assigning grades
3. achievement test to determine student progress
4. proficiency test as a criterion in second language acquisition research
5. assigning code numbers to subjects in second language research according to native language
What Is It: Measurement, Test, or Evaluation?
placement test
classroom quiz
grading of composition
rating of classroom fast reading exercise
rating of dictation
Measurement Qualities
A test must be reliable and valid.
Reliability
Free from errors of measurement.
If a student takes a test twice within a short time, and the test is reliable, the results of the two tests should be the same.
If two raters rate the same writing sample, the ratings should be consistent if they are to be considered reliable.
The primary concern in examining reliability is to identify the different sources of error, and then to use the appropriate empirical procedures for estimating the effect of these sources of error on test scores.
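As one illustration of an empirical procedure, the test-retest idea above can be sketched by correlating the scores from two sittings of the same test. The slides do not name a specific statistic; the Pearson correlation used here is an assumption, and the scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of five students on two sittings of the same test
first_sitting = [52, 61, 70, 45, 88]
second_sitting = [55, 60, 72, 47, 85]

test_retest = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {test_retest:.3f}")
```

A value close to 1 suggests the two sittings rank and space the students similarly. The same function could be applied to two raters' scores for the same writing samples.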
Validity
Validity: the extent to which the inferences or decisions are meaningful, appropriate and useful.
The test should measure the ability and very little else.
If a test is not reliable, it is not valid.
Validity is a quality of test interpretation and use.
The investigation of validity is both a matter of judgment and of empirical research.
Reliability and Validity
Both are essential to the use of tests.
Neither is a quality of tests themselves: reliability is a quality of test scores, while validity is a quality of the interpretations or uses that are made of test scores.
Neither is absolute: we can never attain perfectly error-free measures, and the appropriateness of a particular use of a test score depends upon many factors outside the test itself.
Properties of Measurement Scales
4 properties
distinctiveness: different numbers are assigned to persons with different values of the attribute
ordered in magnitude: the larger the number, the larger the amount of the attribute
equal intervals: equal differences between numbers represent equal differences between ability levels
absolute zero point: a value of zero indicates the absence of the attribute
Four Types of Scales
Nominal: naming classes or categories.
Ordinal: the categories are ordered with respect to each other.
Interval: the distances between the levels are equal.
Ratio: includes the absolute zero point.
Nominal
Examples: license plate numbers; Social Security numbers; names of people, places, objects; numbers used to identify football players
Limitations: cannot specify quantitative differences among categories
Ordinal
Examples: letter grades (ratings from excellent to failing), military ranks, order of finishing a test
Limitations: restricted to specifying relative differences without regard to the absolute amount of difference
Interval
Examples: temperature (Celsius and Fahrenheit), calendar dates
Limitations: ratios are meaningless; the zero point is arbitrarily defined
Ratio
Examples: distance, weight, temperature in degrees Kelvin, time required to learn a skill or subject
Limitations: none, except that few educational variables have ratio characteristics
Nominal, Ordinal, Interval or Ratio?
5 in IELTS
550 in TOEFL
C in BEC
8 in CET-4 writing
58 in the final evaluation of a student
Property and Type of Scale
                       Type of Scale
Property               Nominal   Ordinal   Interval   Ratio
Distinctiveness        +         +         +          +
Ordering               -         +         +          +
Equal intervals        -         -         +          +
Absolute zero point    -         -         -          +
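The property table can also be read as a lookup: given which of the four properties a set of scores has, it names the scale type. The sketch below encodes the table as a dictionary; the names `SCALE_PROPERTIES` and `scale_type` are hypothetical and chosen only for illustration.

```python
# Each scale type mapped to the properties it has, as in the table
# (True corresponds to "+", False to "-").
SCALE_PROPERTIES = {
    "nominal":  {"distinctiveness": True, "ordering": False,
                 "equal_intervals": False, "absolute_zero": False},
    "ordinal":  {"distinctiveness": True, "ordering": True,
                 "equal_intervals": False, "absolute_zero": False},
    "interval": {"distinctiveness": True, "ordering": True,
                 "equal_intervals": True, "absolute_zero": False},
    "ratio":    {"distinctiveness": True, "ordering": True,
                 "equal_intervals": True, "absolute_zero": True},
}

def scale_type(distinctiveness, ordering, equal_intervals, absolute_zero):
    """Return the scale type matching the given properties, or None."""
    wanted = {
        "distinctiveness": distinctiveness,
        "ordering": ordering,
        "equal_intervals": equal_intervals,
        "absolute_zero": absolute_zero,
    }
    for name, props in SCALE_PROPERTIES.items():
        if props == wanted:
            return name
    return None  # combination does not appear in the table

# Scores that are distinct and ordered, but without equal intervals
print(scale_type(True, True, False, False))
```

Because the properties are cumulative, any combination that skips a lower property (for example, equal intervals without ordering) matches no row and returns `None`.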
Limitations in Measurement
It is essential for us to understand the characteristics of measures of mental abilities and the limitations these characteristics place on our interpretation of test scores.
These limitations are of two kinds: limitations in specification, and limitations in observation and quantification.
Limitation in Specification
Two levels of the specification of language ability
Theoretical level
Task: we need to specify the ability in relation to, or in contrast to, other language abilities and other factors that may affect test performance.
Reality: the large number of different individual characteristics (cognitive, affective, physical) that could potentially affect test performance makes the task nearly impossible.
Limitation in Specification
Operational level
Task: we need to specify the instances of language performance that serve as indicators of the ability we wish to measure.
Reality: the complexity of, and the interrelationships among, the factors that affect performance on language tests force us to make simplifying assumptions in designing language tests and interpreting test scores.
Conclusion
Our interpretations and uses of test scores will be of limited validity.
Any theory of language test performance we develop is likely to be underspecified, and we have to rely on measurement theory to deal with the problem of underspecification.
Limitations in Observation and Quantification
All measures of mental ability are indirect, incomplete, imprecise, subjective and relative.
Indirectness
The relationship between test scores and the abilities we want to measure is indirect. Language tests are indirect indicators of the underlying traits in which we are interested. Because scores from language tests are indirect indicators of ability, the valid interpretation and use of such scores depends crucially on the adequacy of the way we have specified the relationship between the test score and the ability we believe it indicates. To the extent that this relationship is not adequately specified, the interpretations and uses made of the test score may be invalid.
Incompleteness
The performance we observe and measure in a language test is a sample of an individual's total performance in that language.
Incompleteness
Since we cannot observe an individual's total language use, one of our main concerns in language testing is assuring that the sample we do observe is representative of that total use, a potentially infinite set of utterances, whether written or spoken.
Incompleteness
It is vitally important that we incorporate into our measurement design principles or criteria that will guide us in determining what kinds of performance will be most relevant to and representative of the abilities we want to measure, for example performance in real-life language use.
Imprecision
Because of the nature of language, it is virtually impossible (and probably not desirable) to write tests with 'pure' items that test a single construct, or to be sure that all items are equally representative of a given ability. Likewise, it is extremely difficult to develop tests in which all the tasks or items are at the exact level of difficulty appropriate for the individuals being tested.
Subjectivity
As Pilliner (1968) noted, language tests are subjective in nearly all aspects.
Test developers
Test writers
Test takers
Test scorers
Relativeness
The presence or absence of language abilities is impossible to define in an absolute sense.
The concept of 'zero' language ability is a complex one.
The individual with absolutely complete language ability does not exist.
All measures of language ability based on domain specifications of actual language performance must be interpreted as relative to some 'norm' of performance.
Steps in Measurement
Three steps
1. identify and define the construct theoretically
2. define the construct operationally
3. establish procedures for quantifying observations
Defining Constructs Theoretically
Historically, there were two distinct approaches to defining language proficiency.
Real-life approach: language proficiency itself is not defined, but a domain of actual language use is identified.
The approach assumes that if we measure features present in language use, we measure language proficiency.
Real Life Approach: Example
American Council on the Teaching of Foreign Languages (ACTFL): definition of advanced level
Able to satisfy the requirements of everyday situations and routine school and work requirements. Can handle with confidence but not with facility complicated tasks and social situations, such as elaborating, complaining, and apologizing. Can narrate and describe with some details, linking sentences together smoothly. Can communicate facts and talk casually about topics of current public and personal interest, using general vocabulary.
Interactional/Ability Approach
Language proficiency is defined in terms of its component abilities. These components can be the skills of reading, writing, listening, and speaking (Lado), a functional framework (Halliday), or communicative frameworks (Munby).
Example of Pragmatic Competence
The knowledge necessary, in addition to organizational competence, for appropriately producing or comprehending discourse. Specifically, it includes illocutionary competence, or the knowledge of how to perform speech acts, and sociolinguistic competence, or the knowledge of the sociolinguistic conventions which govern language use.
Defining Constructs Operationally
This step involves determining how to isolate the construct and make it observable.
We must decide what specific procedures we will follow to elicit the kind of performance that will indicate the degree to which the given construct is present in the individual.
The context in which the language testing takes place influences the operations we would follow.
The test must elicit language performance in a standard way, under uniform conditions.
Quantifying Observations
The units of measurement of language tests are typically defined in two ways.
1. points or levels of language performance.
From zero to five in an oral interview
Different levels in mechanics, grammar, organization, content in writing
Mostly an ordinal scale, therefore needing appropriate statistics for ordinal scales.
2. the number of tasks successfully completed
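As an example of a statistic suited to ordinal scales such as the zero-to-five levels mentioned above, the sketch below computes Spearman's rank correlation between two raters. The choice of Spearman's coefficient is an assumption (the slides do not name a statistic), and the ratings are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 ratings")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all ratings are tied")
    return cov / sqrt(var_x * var_y)

# Hypothetical oral-interview levels (0 to 5) given by two raters
rater_a = [2, 3, 3, 4, 1, 5]
rater_b = [2, 3, 4, 4, 1, 5]
print(f"Spearman rho: {spearman_rho(rater_a, rater_b):.3f}")
```

Only the order of the ratings enters the calculation, so no assumption is made that the distance between level 1 and level 2 equals the distance between level 4 and level 5.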
Quantifying Observations
2. the number of tasks successfully completed
We generally treat such a score as one with an interval scale.
Conditions for an interval scale
Quantifying Observations
The performance tasks must be defined and selected in a way that enables us to determine their relative difficulty and the extent to which they represent the construct being tested.
Relative difficulty: determined from the statistical analysis of responses to individual test items.
How much they represent the construct: depends on the adequacy of the theoretical definition of the construct.
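One simple way to estimate relative difficulty from item responses is the proportion of test takers who answer each item correctly (often called item facility). The slides do not specify which analysis is meant, so this index is an assumption, and the response matrix is hypothetical.

```python
def item_facility(responses):
    """Proportion of correct answers per item.

    responses: one row per test taker, each row a list of 0/1 item scores
    (1 = correct). A lower proportion indicates a more difficult item.
    """
    if not responses:
        raise ValueError("need at least one test taker")
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every test taker must have a score for every item")
    n_takers = len(responses)
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]

# Hypothetical responses of four test takers to three items
responses = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_facility(responses))
```

In this made-up data the first item is the easiest and the other two are equally difficult.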
Score Sorting
Raw score
Score Class
1. Range: $R$ = highest score - lowest score
2. Number of groups: $K = 1.87\,(N-1)^{2/5}$, where $N$ is the number of scores
3. Interval: $I = R/K$
4. Highest and lowest score of each group
5. Arrange the data into groups
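The five steps can be sketched as a small function. Rounding K to the nearest whole number and placing the highest score in the last group are assumptions made here for illustration; the slides do not say how to handle either, and the scores are hypothetical.

```python
def group_scores(scores):
    """Sort raw scores into K class intervals of equal width.

    Returns a list of (lower_limit, upper_limit, count) tuples.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    low, high = min(scores), max(scores)
    score_range = high - low                       # step 1: R
    k = max(1, round(1.87 * (n - 1) ** (2 / 5)))   # step 2: K, rounded (assumption)
    if score_range == 0:
        return [(low, high, n)]                    # all scores identical
    width = score_range / k                        # step 3: I = R / K
    counts = [0] * k
    for s in scores:
        # The highest score would fall just past the last group, so clamp it
        index = min(int((s - low) / width), k - 1)
        counts[index] += 1
    # steps 4 and 5: limits of each group and the number of scores in it
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(k)]

# Hypothetical raw scores
raw_scores = list(range(0, 101, 10))
for lower, upper, count in group_scores(raw_scores):
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

In practice the interval width is often rounded to a convenient whole number; that refinement is left out to keep the sketch close to the listed steps.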
Central Tendency & Dispersion
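Mean: $\bar{x} = \sum x / N$
Median: the middle score when the scores are arranged in order
Mode: the most frequent score, around which the bulk of the data congregate
Variance: $V = \sum (x - \bar{x})^2 / (N - 1)$
Standard deviation: $S = \sqrt{\sum (x - \bar{x})^2 / (N - 1)}$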
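A minimal sketch of these measures, computing the mean, variance and standard deviation directly from the formulas and using Python's standard `statistics` module for the median and mode. The function name `describe` and the scores are hypothetical.

```python
import statistics
from math import sqrt

def describe(scores):
    """Central tendency and dispersion for a list of scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    # Sum of squared deviations divided by (N - 1), as in the variance formula
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(scores),  # middle of the ordered scores
        "mode": statistics.mode(scores),      # most frequent score
        "variance": variance,
        "sd": sqrt(variance),
    }

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With an even number of scores, `statistics.median` returns the average of the two middle scores. If several scores tie for most frequent, `statistics.mode` returns the first one encountered (Python 3.8 and later).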