2. Measurement

  • 7/28/2019 2. Measurement

    Measurement

    Measurement, Test, Evaluation

    Measurement: the process of quantifying the characteristics of persons according to explicit procedures and rules.

    Quantification: assigning numbers, as distinct from qualitative descriptions.

    Characteristics: mental attributes such as aptitude, intelligence, motivation, field dependence/independence, attitude, native language, and fluency.

    Rules and procedures: observations must be replicable in other contexts and with other individuals.

    Test

    Carroll: a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.

    Elicitation: obtaining a specific sample of behavior.

    Interagency Language Roundtable (ILR) oral interview: a test of speaking consisting of (1) a set of elicitation procedures, including a sequence of activities and sets of question types and topics; and (2) a measurement scale of language proficiency ranging from a low level of 0 to a high level of 5.

    Test?

    Years of informal contact with a child used to rate the child's oral proficiency: the rater did not follow explicit procedures.

    A rating based on a collection of personal letters used to indicate an individual's ability to write effective argumentative editorials for a news magazine.

    A teacher's rating based on informal interactive social language use, used to indicate the student's ability to use language to perform various cognitive/academic language functions.

    Evaluation

    Definition: the systematic gathering of information for the purpose of making decisions.

    Evaluation need not be exclusively quantitative: it can also draw on verbal descriptions, performance profiles, letters of reference, and overall impressions.

    Relation Between Evaluation, Test, and Measurement

    1. Qualitative descriptions of student performance for diagnosing learning problems

    2. Teacher rankings for assigning grades

    3. Achievement test to determine student progress

    4. Proficiency test as a criterion in second language acquisition research

    5. Assigning code numbers to subjects in second language research according to native language

    What Is It: Measurement, Test, or Evaluation?

    placement test

    classroom quiz

    grading of composition

    rating of classroom fast reading exercise

    rating of dictation

    Measurement Qualities

    A test must be reliable and valid.

    Reliability

    Reliability means freedom from errors of measurement.

    If a student takes a test twice within a short time, and the test is reliable, the two sets of results should be essentially the same.

    If two raters rate the same writing sample, their ratings should be consistent if the ratings are to be considered reliable.

    The primary concern in examining reliability is to identify the different sources of error, and then to use appropriate empirical procedures to estimate the effect of these sources of error on test scores.
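
One common way to put a number on the test-retest idea is to correlate the scores from the two sittings. The sketch below is only an illustration under assumptions: the slides do not name a specific procedure, the function name `pearson_r` is hypothetical, and the five pairs of scores are invented. A coefficient close to 1 would suggest that the two sittings rank and space the students in much the same way.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores in a list are identical")
    return cross / sqrt(ss_x * ss_y)


# Invented scores of five students on two sittings of the same test.
first_sitting = [52, 61, 70, 78, 85]
second_sitting = [54, 60, 72, 77, 88]

retest_r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {retest_r:.3f}")
```

The same function could be applied to two raters' scores for the same set of writing samples, although ordinal ratings may call for a rank-based coefficient instead.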

    Validity

    Validity: the extent to which the inferences or decisions made on the basis of test scores are meaningful, appropriate, and useful.

    The test should measure the ability in question and very little else.

    If a test is not reliable, it is not valid.

    Validity is a quality of test interpretation and use.

    The investigation of validity is a matter of both judgment and empirical research.

    Reliability and Validity

    Both are essential to the use of tests.

    Neither is a quality of tests themselves: reliability is a quality of test scores, while validity is a quality of the interpretations or uses that are made of test scores.

    Neither is absolute: we can never attain perfectly error-free measures, and the appropriateness of a particular use of a test score depends on many factors outside the test itself.

    Properties of Measurement Scales

    Four properties:

    distinctiveness: different numbers are assigned to persons with different values of the attribute

    ordering in magnitude: the larger the number, the larger the amount of the attribute

    equal intervals: equal differences between numbers correspond to equal differences between ability levels

    absolute zero point: a value of zero indicates the absence of the attribute

    Four Types of Scales

    Nominal: names classes or categories.

    Ordinal: the categories are ordered with respect to each other.

    Interval: the distances between the levels are equal.

    Ratio: includes an absolute zero point.

    Nominal

    Examples: license plate numbers; Social Security numbers; names of people, places, and objects; numbers used to identify football players

    Limitations: cannot specify quantitative differences among categories

    Ordinal

    Examples: letter grades (ratings from excellent to failing), military ranks, order of finishing a test

    Limitations: restricted to specifying relative differences without regard to the absolute amount of difference

    Interval

    Examples: temperature (Celsius and Fahrenheit), calendar dates

    Limitations: ratios are meaningless; the zero point is arbitrarily defined
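
A small worked example may help show why ratios are meaningless on an interval scale. The sketch below uses the standard Celsius, Fahrenheit, and Kelvin conversions; the two temperatures and the helper names are chosen only for illustration. The "twice as warm" ratio changes with the unit, while equal differences stay equal.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Standard conversion: K = C + 273.15 (the Kelvin scale has an absolute zero)."""
    return c + 273.15


cool_c, warm_c, hot_c = 10.0, 20.0, 30.0

# The ratio of the same two temperatures depends on the arbitrary zero point.
ratio_c = warm_c / cool_c
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
print(f"ratio in Celsius: {ratio_c:.3f}, Fahrenheit: {ratio_f:.3f}, Kelvin: {ratio_k:.3f}")

# Differences, by contrast, are preserved: two equal Celsius steps remain
# equal to each other after conversion to Fahrenheit.
step_1_f = celsius_to_fahrenheit(warm_c) - celsius_to_fahrenheit(cool_c)
step_2_f = celsius_to_fahrenheit(hot_c) - celsius_to_fahrenheit(warm_c)
print(f"Fahrenheit steps: {step_1_f:.1f} and {step_2_f:.1f}")
```

Only the Kelvin ratio has a physical interpretation, because only that scale has a true zero.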

    Ratio

    Examples: distance, weight, temperature on the Kelvin scale, time required to learn a skill or subject

    Limitations: none, except that few educational variables have ratio characteristics

    Nominal, Ordinal, Interval, or Ratio?

    5 in IELTS

    550 in TOEFL

    C in BEC

    8 in CET-4 writing

    58 in the final evaluation of a student

    Property and Type of Scale

    Property               Nominal   Ordinal   Interval   Ratio

    Distinctiveness           +         +         +         +

    Ordering                  -         +         +         +

    Equal intervals           -         -         +         +

    Absolute zero point       -         -         -         +
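
The table above can also be written down as a small lookup structure, which makes the cumulative nature of the four scale types easy to see. This is a minimal sketch: the names `SCALE_PROPERTIES`, `has_property`, and `classify_scale` are hypothetical and chosen only for this illustration.

```python
# Each scale type mapped to the set of properties it possesses ("+" in the table).
SCALE_PROPERTIES = {
    "nominal": {"distinctiveness"},
    "ordinal": {"distinctiveness", "ordering"},
    "interval": {"distinctiveness", "ordering", "equal_intervals"},
    "ratio": {"distinctiveness", "ordering", "equal_intervals", "absolute_zero"},
}


def has_property(scale, prop):
    """Return True if the given scale type has the given property."""
    return prop in SCALE_PROPERTIES[scale]


def classify_scale(properties):
    """Return the scale type whose property set matches exactly, or None."""
    wanted = set(properties)
    for scale, props in SCALE_PROPERTIES.items():
        if props == wanted:
            return scale
    return None


# Ranked bands with no claim of equal spacing would be classified as ordinal.
print(classify_scale(["distinctiveness", "ordering"]))
print(has_property("interval", "absolute_zero"))
```

A combination that skips a lower property, such as ordering without distinctiveness, matches no scale type and returns `None`, which reflects the fact that each type includes all the properties of the types before it.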

    Limitations in Measurement

    It is essential for us to understand the characteristics of measures of mental abilities and the limitations these characteristics place on our interpretation of test scores.

    These limitations are of two kinds: limitations in specification, and limitations in observation and quantification.

    Limitation in Specification

    Two levels of the specification of language ability

    Theoretical level

    Task: we need to specify the ability in relation to, or in contrast to, other language abilities and other factors that may affect test performance.

    Reality: the large number of different individual characteristics (cognitive, affective, physical) that could potentially affect test performance makes the task nearly impossible.

    Limitation in Specification

    Operational level

    Task: we need to specify the instances of language performance that serve as indicators of the ability we wish to measure.

    Reality: the complexity of, and the interrelationships among, the factors that affect performance on language tests force us to make simplifying assumptions in designing language tests and interpreting test scores.

    Conclusion

    Our interpretations and uses of test scores will be of limited validity.

    Any theory of language test performance we develop is likely to be underspecified, and we have to rely on measurement theory to deal with the problem of underspecification.

    Limitations in Observation and Quantification

    All measures of mental ability are indirect, incomplete, imprecise, subjective, and relative.

    Indirectness

    The relationship between test scores and the abilities we want to measure is indirect. Language tests are indirect indicators of the underlying traits in which we are interested.

    Because scores from language tests are indirect indicators of ability, the valid interpretation and use of such scores depends crucially on the adequacy of the way we have specified the relationship between the test score and the ability we believe it indicates.

    To the extent that this relationship is not adequately specified, the interpretations and uses made of the test score may be invalid.

    Incompleteness

    The performance we observe and measure in a language test is a sample of an individual's total performance in that language.

    Incompleteness

    Since we cannot observe an individual's total language use, one of our main concerns in language testing is assuring that the sample we do observe is representative of that total use, a potentially infinite set of utterances, whether written or spoken.

    Incompleteness

    It is vitally important that we incorporate into our measurement design principles or criteria that will guide us in determining what kinds of performance will be most relevant to and representative of the abilities we want to measure, for example, real-life language use.

    Imprecision

    Because of the nature of language, it is virtually impossible (and probably not desirable) to write tests with 'pure' items that test a single construct, or to be sure that all items are equally representative of a given ability.

    Likewise, it is extremely difficult to develop tests in which all the tasks or items are at the exact level of difficulty appropriate for the individuals being tested.

    Subjectivity

    As Pilliner (1968) noted, language tests are subjective in nearly all aspects.

    Test developers

    Test writers

    Test takers

    Test scorers

    Relativeness

    The presence or absence of language abilities is impossible to define in an absolute sense.

    The concept of 'zero' language ability is a complex one.

    The individual with absolutely complete language ability does not exist.

    All measures of language ability based on domain specifications of actual language performance must be interpreted as relative to some 'norm' of performance.

    Steps in Measurement

    Three steps

    1. identify and define the construct theoretically

    2. define the construct operationally

    3. establish procedures for quantifying observations

    Defining Constructs Theoretically

    Historically, there were two distinct approaches to defining language proficiency.

    Real-life approach: language proficiency itself is not defined; instead, a domain of actual language use is identified.

    The approach assumes that if we measure features present in language use, we measure language proficiency.

    Real Life Approach: Example

    American Council on the Teaching of Foreign Languages (ACTFL): definition of the advanced level

    Able to satisfy the requirements of everyday situations and routine school and work requirements. Can handle with confidence but not with facility complicated tasks and social situations, such as elaborating, complaining, and apologizing. Can narrate and describe with some details, linking sentences together smoothly. Can communicate facts and talk casually about topics of current public and personal interest, using general vocabulary.

    Interactional/ability Approach

    Language proficiency is defined in terms of its component abilities. These components can be skills such as reading, writing, listening, and speaking (Lado), a functional framework (Halliday), or a communicative framework (Munby).

    Example of Pragmatic Competence

    The knowledge necessary, in addition to organizational competence, for appropriately producing or comprehending discourse. Specifically, it includes illocutionary competence, or the knowledge of how to perform speech acts, and sociolinguistic competence, or the knowledge of the sociolinguistic conventions which govern language use.

    Defining Constructs Operationally

    This step involves determining how to isolate the construct and make it observable.

    We must decide what specific procedures we will follow to elicit the kind of performance that will indicate the degree to which the given construct is present in the individual.

    The context in which the language testing takes place influences the operations we would follow.

    The test must elicit language performance in a standard way, under uniform conditions.

    Quantifying Observations

    The units of measurement of language tests are typically defined in two ways.

    1. Points or levels of language performance

    From zero to five in an oral interview

    Different levels for mechanics, grammar, organization, and content in writing

    Mostly an ordinal scale, and therefore requiring statistics appropriate for ordinal scales

    2. The number of tasks successfully completed

    Quantifying Observations

    2. The number of tasks successfully completed

    We generally treat such a score as lying on an interval scale.

    Conditions for an interval scale:

    Quantifying Observations

    The performance tasks must be defined and selected in a way that enables us to determine their relative difficulty and the extent to which they represent the construct being tested.

    Relative difficulty: determined from the statistical analysis of responses to individual test items.

    Degree to which they represent the construct: depends on the adequacy of the theoretical definition of the construct.
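
As one illustration of what a statistical analysis of item responses can look like, the sketch below computes the proportion of test takers answering each item correctly, an index often called item facility. The slides do not specify which statistic is meant, so this choice is an assumption, as are the function name `item_facility` and the invented response matrix.

```python
def item_facility(responses):
    """Proportion of correct answers per item.

    `responses` is a list with one row per test taker; each row holds
    one 0/1 score per item (1 = correct, 0 = incorrect).
    """
    if not responses:
        raise ValueError("need at least one test taker")
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every test taker needs a score for every item")
    n_takers = len(responses)
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]


# Invented data: five test takers, four items.
responses = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]

facility = item_facility(responses)

# Item indices ordered from hardest (lowest proportion correct) to easiest.
hardest_first = sorted(range(len(facility)), key=lambda i: facility[i])
print(facility, hardest_first)
```

A lower proportion indicates a harder item for this particular group, so the index describes relative difficulty only with respect to the test takers sampled.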

    Score Sorting

    Raw score

    Score Class

    1. Range: R = highest score - lowest score

    2. Number of groups: $K = 1.87\,(N-1)^{2/5}$

    3. Interval: $I = R/K$

    4. Highest and lowest scores of each group

    5. Arrange the data into groups
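
The five steps can be sketched in code as follows. Several details are assumptions, since the slide does not give them: scores are taken to be integers, K is rounded to the nearest whole number, the interval I is rounded up to a whole number, and the first group starts at the lowest score. The function name `group_scores` and the twenty scores are invented for illustration.

```python
from math import ceil


def group_scores(scores):
    """Group integer scores into classes following the five steps on the slide."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    lowest, highest = min(scores), max(scores)
    score_range = highest - lowest                      # step 1: R
    k = max(1, round(1.87 * (n - 1) ** (2 / 5)))        # step 2: K (rounded, an assumption)
    width = max(1, ceil(score_range / k))               # step 3: I (rounded up, an assumption)
    counts = {}
    lower = lowest
    while lower <= highest:
        upper = lower + width - 1                       # step 4: limits of each group
        counts[(lower, upper)] = sum(1 for s in scores if lower <= s <= upper)  # step 5
        lower = upper + 1
    return k, width, counts


# Invented scores for twenty test takers.
scores = [45, 50, 55, 58, 60, 62, 63, 65, 67, 68,
          70, 71, 73, 75, 76, 78, 80, 84, 88, 92]

k, width, counts = group_scores(scores)
for (low, high), count in counts.items():
    print(f"{low}-{high}: {count}")
```

Because the interval is rounded, the number of groups actually produced can occasionally be one more than K; the loop simply keeps adding groups until the highest score is covered.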

    Central Tendency & Dispersion

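    Mean: $\bar{x} = \sum x / N$

    Median: the middle score when the scores are arranged in order

    Mode: the most frequent score, around which the bulk of the data congregate

    Variance: $V = \sum (x - \bar{x})^2 / (N - 1)$

    Standard deviation: $S = \sqrt{\sum (x - \bar{x})^2 / (N - 1)}$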
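
The measures on this slide can be checked on a small data set. The sketch below computes the mean, variance, and standard deviation directly from the formulas (with N - 1 in the denominator) and takes the median and mode from Python's standard `statistics` module. The seven scores are invented for illustration.

```python
import statistics
from math import sqrt

# Invented scores for seven test takers.
scores = [60, 65, 65, 70, 75, 80, 89]
n = len(scores)

# Mean: sum of the scores divided by the number of scores.
mean = sum(scores) / n

# Variance and standard deviation, using N - 1 in the denominator.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = sqrt(variance)

# Median: middle score of the ordered list; mode: most frequent score.
median = statistics.median(scores)
mode = statistics.mode(scores)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"variance={variance:.2f} sd={sd:.2f}")
```

The hand-computed variance and standard deviation should agree with `statistics.variance` and `statistics.stdev`, which also divide by N - 1.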