
  • 1

    Jamie Dunlea, British Council

    Investigating the relationship

    between empirical task difficulty,

    textual features and CEFR levels

    EALTA 2014

    29 May to 1 June

    University of Warwick

    Language Assessment


  • 2



    information to share

  • 3

    What we will do

    Look at task specifications for reading to specify criterial features of input texts for different CEFR levels

    Focus on vocabulary profiles and readability measures which are included in item writing

    Discuss an exploratory analysis of the textual features of texts built to spec and the relationship to empirical difficulty

    Look at the relationship between Rasch difficulty estimates of reading tasks from the item bank of an operational test designed around the CEFR and selected linguistic indices which we use for item specification (and some additional measures).
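One simple way to examine a relationship of this kind is a rank correlation between the difficulty estimates and a single textual index. The sketch below is only an illustration and not necessarily the analysis reported in this talk; the function names are hypothetical and the example values are invented placeholders, not real item-bank data.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented placeholder values for illustration only.
rasch_difficulty = [-1.2, -0.4, 0.3, 1.1, 0.8]
readability_index = [4.0, 6.5, 8.0, 10.5, 11.0]
print(round(spearman(rasch_difficulty, readability_index), 2))
```

A rank-based coefficient is used here because it makes no assumption that the index and the difficulty estimates are linearly related.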

  • 4

    Davidson & Fulcher (2007) encourage test developers to see the framework as a series of guidelines from which tests (and teaching materials) can be built to suit local contextualized needs.

    The CEFR can be a

    springboard to task and test


    Task specs: Where to start?

  • 5

    Test specs from the CEFR CEFR: Vocabulary Range


    Has a good range of vocabulary for matters connected to his field and most general topics. Can vary formulation to avoid frequent repetition, but lexical gaps can still cause hesitation and

    Has a sufficient vocabulary to express him/herself with some circumlocutions on most topics pertinent to his everyday life such as family, hobbies and interests, work, travel, and current events.

    Has sufficient vocabulary to conduct routine, everyday transactions involving familiar situations and topics.

    Has a sufficient vocabulary for the expression of basic communicative needs.

    Has a sufficient vocabulary for coping with simple survival


  • 6

    Task specs: Where to start?

    Descriptors need to remain holistic in order to give an overview; detailed lists of microfunctions, grammatical forms and vocabulary are presented in language specifications for particular languages (e.g. Threshold Level 1990). An analysis of the functions, notions, grammar and vocabulary necessary to perform the communicative tasks described on the scales could be part of the process of developing new sets of language specifications.

    (Council of Europe, 2001, p. 30)

  • 7

    CEFR Grid for Reading Tests


    Text source


    Discourse type



    Nature of content

    Text length



    Vocabulary: Only frequent vocabulary / Mostly frequent vocabulary / Rather extended


    Manual (Council of Europe, 2009)

    Alderson et al. (2006)

  • 8

    Some criteria when considering categories

    Consistency, Transparency, Accountability, Ease of use for item writers

    Specs have different audiences, and different levels of specificity according to the needs of the audience

    No spec is exhaustive: all specs will contain some of a possible range of categories and measures

    No spec is final: specs need to be reviewed and revised

    Some important principles

  • 9

    Test: Aptis General

    Component: Reading

    Task: Matching headings to text

    Features of the Task

    Skill focus: Expeditious global reading of longer text, integrating propositions across a longer text into a discourse-level representation.

    Task level: A1 A2 B1 B2 C1 C2

    Task description: Matching headings to paragraphs within a longer text. Candidates read through a longer text consisting of 7 paragraphs, identifying the best heading for each paragraph from a bank of 8 options.





    Expeditious reading: local (scan/search for specifics)

    Careful reading: local (understanding sentence)

    Expeditious reading: global (skim for gist/search for key

    Careful reading: global (comprehend main idea(s)/overall




    Levels of


    Word recognition

    Lexical access

    Syntactic parsing

    Establishing propositional meaning (cl./sent. level)


    Building a mental model

    Creating a text level representation (disc. structure)

    Creating an intertextual representation (multi-text)

    Task specs: an example

  • 10

    Features of the Input Text

    Words: 700-750 words

    Domain: Public, Occupational, Educational, Personal

    Discourse mode: Descriptive, Narrative, Expository, Argumentative, Instructive

    Content knowledge: General, Specific

    Cultural specificity: Neutral, Specific

    Nature of information: Only concrete, Mostly concrete, Fairly abstract, Mainly abstract

    Lexical level: K1 K2 K3 K4 K5 K6 K7 K8 K9 K10. The cumulative coverage should reach 95% at the K5 level. No more than 5% of words should be beyond the K5 level.

    Readability: Flesch-Kincaid Grade Level 9-12

    Grammar: A1-B2 exponents. Average sentence length: 18-20 words

    Text genre: Magazines, newspapers, instructional materials (such as extracts from undergraduate textbooks describing important events and ideas, etc.).
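Numeric targets like these lend themselves to an automated pre-check before a draft text goes to review. The following is a minimal sketch, assuming the metrics have already been computed with whatever tools the team uses. The class, field and function names are hypothetical; only the ranges are taken from the example spec above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextMetrics:
    """Pre-computed measures for one draft input text (hypothetical structure)."""
    word_count: int
    coverage_at_k5: float        # cumulative % of tokens covered by K1-K5
    fk_grade: float              # Flesch-Kincaid Grade Level
    avg_sentence_length: float   # words per sentence


# Inclusive ranges copied from the example spec.
SPEC_RANGES = {
    "word_count": (700, 750),
    "fk_grade": (9.0, 12.0),
    "avg_sentence_length": (18.0, 20.0),
}
MIN_COVERAGE_AT_K5 = 95.0


def check_against_spec(metrics: TextMetrics) -> List[str]:
    """Return a list of human-readable problems; empty means within spec."""
    problems = []
    for name, (low, high) in SPEC_RANGES.items():
        value = getattr(metrics, name)
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    if metrics.coverage_at_k5 < MIN_COVERAGE_AT_K5:
        problems.append(
            f"coverage_at_k5={metrics.coverage_at_k5} is below {MIN_COVERAGE_AT_K5}"
        )
    return problems


draft = TextMetrics(word_count=800, coverage_at_k5=93.0,
                    fk_grade=10.5, avg_sentence_length=19.0)
for problem in check_against_spec(draft):
    print(problem)
```

A check like this only flags the measurable categories; the judgement-based ones (domain, discourse mode, abstractness) still need a human reviewer.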

    Task specs: an example

  • 11

    Task specs: an example

    Features of the Response

    Targets: Length: Up to 10 words; Lexical: K1-K5; Grammatical: A1-B2

    Distractors: Length: Up to 10 words; Lexical: K1-K5; Grammatical: B1-B2



    Within sentence Across sentences Across paragraphs

    Extra criteria

    Presentation: Written, Aural, Illustration


  • 12

    Lexical Level K1 K2 K3 K4 K5 K6 K7 K8 K9 K10

    Readability Flesch-Kincaid Grade Level 9-12

    Using automated tools

    Lexical profiles: BNC-20 lists

    Derived from British National Corpus spoken corpora by Paul Nation (2006) and adapted by Tom Cobb

    20 1000-word levels, word = word family

    Tools for analysis:

    Alternative frequency lists

    General Service List (2000 word families)

    Academic Word List

    BNC-COCA 25
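The idea behind a lexical profile can be sketched in a few lines: each token is assigned to a frequency band and the share of tokens per band is reported. In the sketch below the band lookup is a tiny invented stand-in, not the BNC-20 lists, and it matches word forms rather than word families, so it only illustrates the mechanics. All names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical toy lookup: word form -> 1000-word band (1 = K1, 2 = K2, ...).
TOY_BANDS = {
    "the": 1, "cat": 1, "sat": 1, "on": 1, "a": 1,
    "mat": 2, "velvet": 4,
}
OFF_LIST = 0  # band code used for words not found in the lookup


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_profile(text: str, bands: Dict[str, int]) -> Dict[int, float]:
    """Return the percentage of tokens falling in each band."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(bands.get(token, OFF_LIST) for token in tokens)
    return {band: 100 * n / len(tokens) for band, n in sorted(counts.items())}


profile = lexical_profile("The cat sat on a velvet mat", TOY_BANDS)
for band, percent in profile.items():
    label = "off-list" if band == OFF_LIST else f"K{band}"
    print(f"{label}: {percent:.1f}%")
```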

  • 13

    Using automated tools

    Readability: Flesch-Kincaid Grade Level

    Based on syllables per word and words per sentence.

    lexical level (longer words tend to be less frequent) and syntactic complexity (longer sentences have more compound sentences and embedded clauses)

    Scaled to US grade levels (higher number, harder text)

    Tools for analysis:

    Readability measures available in Word

    Some alternative readability measures

    Reading Ease (basis for Flesch-Kincaid)

    Coh-Metrix indices

    Lexile measures
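For reference, the Flesch-Kincaid Grade Level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula; the syllable counter is a rough vowel-group heuristic added purely for illustration, so its results will differ somewhat from Word or other tools, which use their own counting rules. Function names are hypothetical.

```python
import re


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def grade_for_text(text: str) -> float:
    """Estimate the grade level of a text using the heuristic counter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)


# 100 words, 5 sentences, 150 syllables:
# 0.39 * 20 + 11.8 * 1.5 - 15.59 = 9.91
print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

The two ratios in the formula correspond to the two proxies named on the slide: syllables per word for lexical level and words per sentence for syntactic complexity.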

  • How much of a text do learners need to be able to comprehend?

    A threshold level of 95% suggested for reasonable comprehension and guessing words from context (Laufer, 1989; Hirsch & Nation, 1992; Chujo & Oghigian, 2009)

    A higher threshold of 98% suggested for reading with ease (Hirsch & Nation, 1992; Hu & Nation, 2000; Nation, 2006)

    Van Zeeland & Schmitt (2012) suggest the different criteria could be suitable for different purposes. 95% suitable for adequate comprehension
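Once a text has been profiled by frequency band, these thresholds can be checked mechanically by accumulating coverage band by band. A minimal sketch follows, assuming a per-band profile is already available; the function name is hypothetical and the profile values are invented for illustration.

```python
from itertools import accumulate
from typing import Optional, Sequence


def band_reaching(coverage_by_band: Sequence[float],
                  threshold: float) -> Optional[int]:
    """Return the first band (1 = K1) at which cumulative coverage
    reaches the threshold, or None if it is never reached.

    coverage_by_band[i] is the percentage of tokens in band K(i+1).
    """
    for band, cumulative in enumerate(accumulate(coverage_by_band), start=1):
        if cumulative >= threshold:
            return band
    return None


# Invented profile: % of tokens in K1, K2, ..., K7.
profile = [80.0, 8.0, 4.0, 2.0, 2.0, 1.5, 0.5]

print(band_reaching(profile, 95.0))  # band needed for 95% coverage
print(band_reaching(profile, 98.0))  # band needed for 98% coverage
```

With this invented profile, 95% coverage is reached at K5 and 98% only at K7, which shows how much more vocabulary the stricter criterion can demand for the same text.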

  • 15

    Lvl | Items | Length | Task focus | Response format

    A1 | 5 | 50-60 | Sentence level meaning (Careful, local reading) | 3-option multiple choice for each gap.

    A2 | 6 | 90-100 | Inter-sentence cohesion (Careful global reading) | Reorder 6 jumbled sentences. All sentences must be used to complete the story.

    B1 | 7 | 125-135 | Text-level comprehension of short texts (Careful global reading) | 7 gaps in a short text. Select the best word to fill each gap from a bank of 9 options.

    B2 | 7 | 700-750 | Text-level comprehension of longer text (Global reading, both careful and expeditious) | 7 paragraphs forming a long text. Select the most appropriate heading for each paragraph from a bank of 8 options.


    Aptis Reading Test Tasks

  • 16

    Lvl Word


