Optimizing EFL Vocabulary Learning with IRT and Online Technology

Barrier-Free Vocabulary Project

Dr. Brent Culligan
EFL Instructor, Aoyama Gakuin Women’s College
Senior Scientist, Lexxica Corp.
bculligan@lexxica.com

Dr. Charles Browne
Professor of Linguistics, Meiji Gakuin University
Co-founder, Lexxica Corp.
cbrowne@lexxica.com

Outline

1. Review key concepts of lexical coverage

2. How to create special purpose lexicons

3. How to identify each learner’s unknown words

4. Word Engine spaced repetition learning tools

5. Introducing V-Lexx - text analysis and control

EFL learners are not meeting and acquiring enough high-frequency vocabulary to achieve sufficient lexical coverage of English

Challenge #1

Part One

Review the key concepts of lexical coverage

Key coverage thresholds

Coverage describes the percentage of words that are known in a given text

Below 80 percent coverage, reading comprehension is almost impossible (Hu & Nation, 2001)

At 95 percent coverage, it becomes possible to read without the help of dictionaries (Laufer, 1989)

80 percent coverage

If * planting rates are * with planting targets satisfied in each * and the forests are * at the earliest opportunity, the * wood supplies could further increase to about 36 million * meters * in the * 2001-2015. The additional * wood supply should greatly * * *, even if much is used for energy production.

12 of 58 words missing

95 percent coverage

1 of 58 words missing

If current planting rates are maintained with planting targets satisfied in each region and the forests are milled at the earliest opportunity, the available wood supplies could further increase to about 36 million *-meters annually in the period 2001-2015. The additional available wood supply should greatly exceed domestic requirements, even if much is used for energy production.

Relationship between high-frequency words and coverage

HF Words   Coverage   Research
1          7%         West (53), Nation (90)
100        50%        West (53), Nation (90)
1000       75%        West (53), Engels (68)
2000       85%        West (53), Nation (90)
4200       95%        Culligan (08)

Latest analysis of 1.2 trillion words in corpora identified the 4200 high-frequency words that provide 95 percent coverage of general texts

These 4200 words are the most direct route to attaining 95 percent coverage of general English

4200 words = 95 Percent Coverage

[Figure: lexical coverage (0 to 100%) plotted against the number of high-frequency words (0, 2000, 4200). The essential 2000 words are marked, and coverage reaches 95% at 4200 words.]

Special purpose vocabulary for the TOEIC and TOEFL exams is quite different from the vocabulary of general English

How can we help learners acquire high-frequency vocabulary for specific purposes?

Challenge #2

Part Two

Creating special purpose lexicons

(The relatively easy part)

Examples of special purpose lexicons: TOEIC and TOEFL

Steps:

1. Assemble digital or scanned text

2. Filter out junk text

3. Implement standards of form and lemmatization

4. Implement Frequency Indexation rules

5. Organize words by coverage and cumulative coverage
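
The steps listed for building a special purpose lexicon can be sketched roughly as follows. This is only a minimal illustration, not Lexxica's actual pipeline: the `LEMMAS` map, the `tokenize` and `build_lexicon` helpers, and the toy corpus are all hypothetical, and the Frequency Indexation rules are not described in the slides, so they are left out.

```python
import re
from collections import Counter

# Hypothetical toy lemma map; a real project would apply its own
# standards of form and lemmatization here.
LEMMAS = {"tests": "test", "tested": "test", "words": "word"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a crude junk filter)."""
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(text):
    """Return (lemma, count, coverage, cumulative coverage) rows, most frequent first."""
    lemmas = [LEMMAS.get(token, token) for token in tokenize(text)]
    counts = Counter(lemmas)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for lemma, count in counts.most_common():
        coverage = count / total
        cumulative += coverage
        rows.append((lemma, count, coverage, cumulative))
    return rows


# Toy corpus standing in for the assembled exam texts.
corpus = "The test tested words. The words on the tests are the words we test."
lexicon = build_lexicon(corpus)
for lemma, count, coverage, cumulative in lexicon:
    print(f"{lemma:<6} {count:>2} {coverage:6.1%} {cumulative:6.1%}")
```

Reading down the cumulative column shows how many lemmas are needed to reach a given coverage target for that corpus.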

Corpus provenance

TOEIC 3461

TOEFL 5290

Analysis of 787,382 total words from 300 top selling TOEIC preparation textbooks and 1000 TOEIC practice exams.

Analysis of 1.24 million total words from 350 top selling TOEFL preparation textbooks and 1300 TOEFL practice exams.

The TOEIC and TOEFL corpora were prepared in cooperation with Compass Publishing

Lexicons by cumulative coverage

[Figure: TOEIC lexicon plotted by cumulative coverage, up to 99%]

Words are not learned in order of their frequency within a specific purpose; rather, they are learned in order of their difficulty within each specific culture

Which high-frequency words are unknown to a given student in any particular country?

Challenge #3

Part Three

Identifying each learner’s unknown high-frequency words

(The relatively difficult part)

Lexxica uses IRT to identify the statistical difficulty of each word

IRT models are defined by:

1. Number of estimated item parameters

2. Number of estimated person parameters

3. Number of intermediate steps in estimated parameters

4. Dimensionality

One item parameter, one person parameter model

P(U = 1 | θ, b)

“The probability of an item being correct, P(U = 1), is conditioned on the ability of the student (θ) and the difficulty of the item (b).”

\[
\theta = \ln\left(\frac{P(\theta)}{1 - P(\theta)}\right)
\]

where P(θ) is the probability of a correct response given by a student with ability θ.

\[
b = \ln\left(\frac{1 - P(\theta)}{P(\theta)}\right)
\]
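
As a rough check on the two log-odds expressions on this slide, the sketch below assumes the standard one-parameter (Rasch) logistic form, P = 1 / (1 + exp(-(θ - b))), which the slides do not spell out. Under that assumption, the log-odds of success recovers θ when b = 0, and the log-odds of failure recovers b when θ = 0. The function names are illustrative only.

```python
import math


def p_correct(theta, b):
    """Assumed Rasch form: probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds_success(p):
    """ln(P / (1 - P)): equals theta when the item difficulty is 0."""
    return math.log(p / (1.0 - p))


def log_odds_failure(p):
    """ln((1 - P) / P): equals b when the student ability is 0."""
    return math.log((1.0 - p) / p)


# Ability 1.5 on an item of difficulty 0: the success log-odds gives back 1.5.
theta_back = log_odds_success(p_correct(1.5, 0.0))

# Ability 0 on an item of difficulty 0.8: the failure log-odds gives back 0.8.
b_back = log_odds_failure(p_correct(0.0, 0.8))

print(round(theta_back, 6), round(b_back, 6))
```

When ability and difficulty are equal, this form gives a probability of exactly 0.5, which is the usual reference point on an item characteristic curve.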

Item characteristic curves

[Figure: item characteristic curves for the words stop, rage, and burn, on a difficulty axis running from Easier to More Difficult]

Probability can be interpreted as:

1. The number of items having the same or lower difficulty and that are likely to be known to a student of a given ability

2. The number of students with the same ability and that are likely to know an item or set of items having known difficulty metrics

Taking advantage of item probability

1. Each word has an associated difficulty metric

2. The probability that any given word is known depends on the ability of the student

3. A student’s coverage of a specific purpose depends on the ability of the student and the difficulty of the subject’s high-frequency words
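
The three points above can be combined into an expected-coverage estimate. The following is a sketch under stated assumptions, not the method used by Lexxica: it assumes the Rasch logistic form for the probability that a word is known, and it weights each word by how often it occurs. The entries for injured and hurt reuse the frequency and difficulty figures quoted in this presentation; the entry for "the" is invented for illustration.

```python
import math


def p_known(theta, b):
    """Assumed Rasch form: probability that a word of difficulty b is known at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_coverage(theta, lexicon):
    """Frequency-weighted share of running words expected to be known.

    lexicon is a list of (word, occurrences, difficulty) tuples.
    """
    total = sum(occurrences for _, occurrences, _ in lexicon)
    known = sum(occurrences * p_known(theta, b) for _, occurrences, b in lexicon)
    return known / total


# The first entry is a made-up value; the other two use figures from the slides.
toy_lexicon = [("the", 600, -3.0), ("hurt", 55, 2.34), ("injured", 25, 1.33)]

coverage_low = expected_coverage(0.0, toy_lexicon)
coverage_high = expected_coverage(3.0, toy_lexicon)
print(f"{coverage_low:.1%} {coverage_high:.1%}")
```

The same lexicon yields a higher expected coverage for a more able student, which is the relationship the third point describes.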

An example of how frequency does not predict difficulty

Item         injured   hurt
Frequency    25x       55x

(Frequency = average occurrences per million words)

If difficulty were correlated with frequency, then hurt would be the easier word, because hurt occurs more frequently in English texts

Item         injured   hurt
Frequency    25x       55x
Difficulty   1.33      2.34

Japanese people with a 1600-word vocabulary will know injured

Japanese people with a 2500-word vocabulary will know hurt

V-Check uses lexical decision tasks to identify the user’s ability

injured: easy word

hurt: not so easy word

ghart: many non-words are used to control for guessing

kohl: is it real, or not?
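
One common way to turn yes/no responses into an ability estimate is maximum likelihood under the Rasch model. The sketch below is an assumption about how such an estimate could be computed, not a description of V-Check itself: the Newton-Raphson routine, the function names, and the response data are hypothetical, and the way non-word responses are used to control for guessing is not described in the slides, so it is left out.

```python
import math


def p_known(theta, b):
    """Assumed Rasch form: probability of a 'yes, I know it' response to a real word."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, theta=0.0, max_iterations=50):
    """Newton-Raphson maximum likelihood estimate of ability.

    responses is a list of (difficulty, answered_yes) pairs for real words.
    No finite estimate exists if every answer is yes or every answer is no.
    """
    for _ in range(max_iterations):
        probabilities = [p_known(theta, b) for b, _ in responses]
        # Gradient of the log-likelihood: observed minus expected number known.
        gradient = sum(int(yes) - p for (_, yes), p in zip(responses, probabilities))
        information = sum(p * (1.0 - p) for p in probabilities)
        if information < 1e-12:
            break
        step = gradient / information
        theta += step
        if abs(step) < 1e-8:
            break
    return theta


# Hypothetical session: the three easiest words are recognised, the two hardest are not.
responses = [(-2.0, True), (-1.0, True), (0.0, True), (1.0, False), (2.0, False)]
theta_hat = estimate_ability(responses)
print(round(theta_hat, 3))
```

At the estimate, the expected number of known words matches the observed number of yes responses, which is the defining property of the Rasch maximum likelihood solution.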

V-Check identifies which specific words are known and reports coverage by each purpose

[Figure: coverage report by purpose (General English, TOEFL, TOEIC, Interchange) across word ranks from 1 to 20,000, with a coverage goal of 99%]

V-Check creates personal target word lists

[Figure: words ordered from high-frequency to low-frequency, divided into known words and a personal list of unknown words]
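
A personal target word list of this kind could be sketched as below. This is an illustration under assumptions: it treats a word as probably unknown when its Rasch probability falls under a threshold of 0.5, and it orders the remaining words by frequency rank. The threshold, the `target_words` helper, the ranks, and the difficulty for stop are hypothetical; only the difficulties for hurt and injured come from the slides.

```python
import math


def p_known(theta, b):
    """Assumed Rasch form: probability that a word of difficulty b is known at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def target_words(theta, lexicon, threshold=0.5):
    """Return probably-unknown words, most frequent first.

    lexicon is a list of (word, frequency_rank, difficulty) tuples,
    where rank 1 is the most frequent word.
    """
    unknown = [
        (rank, word)
        for word, rank, difficulty in lexicon
        if p_known(theta, difficulty) < threshold
    ]
    return [word for _, word in sorted(unknown)]


# Ranks and the difficulty of "stop" are invented for illustration.
toy_lexicon = [("hurt", 2, 2.34), ("injured", 3, 1.33), ("stop", 1, -1.0)]

print(target_words(0.0, toy_lexicon))
print(target_words(1.5, toy_lexicon))
```

Ordering the unknown words by frequency means the learner meets first the words that contribute most to coverage.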

How can we transfer word knowledge from short-term to long-term memory?

Challenge #4

Part Four

Word Engine spaced repetition learning tools

(Available June 2008)

High-speed learning tools menu

All Word Engine learning tools utilize a spaced repetition system to build long-term retention

Words are repeated at increasing time intervals until fully acquired

Spaced repetition engine and database

Time intervals based on research by Ebbinghaus (1885), Leitner (1972), Pimsleur (1967), and Mondria (1994)
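
A spaced repetition schedule in the spirit of Leitner's box system can be sketched as follows. The interval values, the `Card` class, and the `review` function are hypothetical; the slides do not give Word Engine's actual intervals or its rule for deciding when a word is fully acquired.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical review intervals in days, one per box; later boxes wait longer.
INTERVALS = [1, 3, 7, 14, 30]


@dataclass
class Card:
    word: str
    box: int = 0
    due: date = date(2008, 6, 1)


def review(card, correct, today):
    """Promote the card one box on a correct answer, or send it back to the first box."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due = today + timedelta(days=INTERVALS[card.box])
    return card


card = Card("injured")
review(card, correct=True, today=date(2008, 6, 1))
first_due = card.due
review(card, correct=True, today=first_due)
second_due = card.due
review(card, correct=False, today=second_due)
print(first_due, second_due, card.due, card.box)
```

Each correct answer lengthens the gap before the next review, while a miss resets the word to the shortest interval.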

Flashcards focus on comprehension

SightWords focuses on visual automaticity

SoundBubbles focuses on aural automaticity

All learning tools provide Session Reports

Reading is a great way for learners to develop control of their vocabulary, grammar, and style

But reading materials have too many difficult words for learners – even the graded readers!

Challenge #5

Part Five

V-Lexx supports text analysis and editing for extensive graded reading

(Available September 2008)

V-Lexx is a web-based application for creating lexically graded reading materials

V-Lexx analyses text coverage by ability and identifies words that are too difficult

Reading and practice materials may be edited so as to provide 95% coverage for any level of ability
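
The kind of check described here, measuring a text's coverage for a learner and flagging the words that are too difficult, might look roughly like the sketch below. It is not V-Lexx's implementation: the `analyse_text` helper is hypothetical, it matches raw word forms without lemmatization, and the known-word set stands in for whatever a learner's level implies.

```python
import re


def analyse_text(text, known_words):
    """Return (coverage, sorted list of unknown word types) for a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    unknown = [token for token in tokens if token not in known_words]
    coverage = 1.0 - len(unknown) / len(tokens) if tokens else 0.0
    return coverage, sorted(set(unknown))


# Hypothetical known-word set for one learner.
known = {"the", "forests", "are", "at", "earliest", "opportunity"}
sentence = "The forests are milled at the earliest opportunity"

coverage, too_difficult = analyse_text(sentence, known)
needs_editing = coverage < 0.95
print(f"{coverage:.1%}", too_difficult, needs_editing)
```

An editor could then replace or gloss the flagged words and rerun the check until the text reaches the 95% target.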

V-Lexx displays the edited stories in channels (Available from Sept 2008)

Thank you!

www.lexxica.com

To download our vocabulary research
To use the free Word Engine software
To see links to other vocabulary support sites
To download this presentation
