The New General Service List: Celebrating 60 years of Vocabulary Learning

THE NEW GENERAL SERVICE LIST:

CELEBRATING 60 YEARS OF VOCABULARY LEARNING

Dr. Charles Browne
Professor of Applied Linguistics
Meiji Gakuin University, Tokyo
browne@ltr.meijigakuin.ac.jp

A few current Corpus Projects…

1. Business English Word List for NHK TV Show in Japan
2. EnglishCentral (a HUGE video corpus of authentic English)
3. New General Service List (CEC)
4. New Academic Word List (CEC)
5. TOEIC Vocabulary Study List (using past test materials)

A few of my many online vocabulary learning projects…

EFL Vocabulary Learning in Japan…

[Figure: example words shown across the frequency range: the, of, and, permission, chaos, exasperate, digress, abstain, emigrate, torment]

The Negative Effect of “Test English”

PROBLEM: Students NEED to learn the first 5000 words of English to use English in the real world…

But entrance exams and high school textbooks force students to memorize hundreds of low-frequency words…

RESULT? High school students can’t deal with real world English because they don’t know hundreds of the most important high frequency words…

[Figure: chart labelled “sum”, “bid”, “ace” and “HFW”, with the values 2,289; 2,566; 4,441; 14,641; 23,371; 25,537; 42,024; 84,168]

When reading or listening to a text, students will of course not know many words… What percentage of words do you think must be known for them to be able to read easily?

50%? 75%? 85%? 95%?

75% Coverage: 1000 high-frequency words

…another possible problem with _____ _____ is how to _____ learner _____ although research suggests that _____ are a very _____ way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ _____ of doing _____ _____. There is a _____ _____ in the _____ classroom of using games with a _____ purpose to increase and _____ learner _____ (Ersoz, 2000, Uberman, 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982)

[ 19 missing words ]

85% Coverage: 2000 high-frequency words

…another possible problem with _____ _____ is how to _____ learner _____ although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the _____ method of doing _____ _____. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner _____ (Ersoz, 2000, Uberman, 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982)

[ 13 missing words ]

95% Coverage: 5000 high-frequency words

…another possible problem with vocabulary _____ is how to sustain learner motivation although research suggests that _____ are a very efficient way to learn new words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001), students may lose interest if _____ are the sole method of doing vocabulary review. There is a rich tradition in the _____ classroom of using games with a communicative purpose to increase and maintain learner motivation (Ersoz, 2000, Uberman, 1988, Wright, Betteridge & Buckby, 1984), as well as lower the learner affective filter (Asher, 1965, 1977, Dulay, Krashen & Burt, 1982)

[ 4 missing words ]

Vocabulary Thresholds:

• Below 80%, reading comprehension is almost impossible (Hu & Nation, 2001)

• 95% coverage is the point at which learners can read without the help of dictionaries (Laufer, 1989)
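As a rough illustration of what "coverage" means in the thresholds above, here is a minimal Python sketch that computes the share of running words in a text that appear in a known-word list. The function names, the tiny word list, and the sample sentence are all hypothetical; only the 80% and 95% cut-offs come from the studies cited above.

```python
import re

def coverage(text, known_words):
    """Return the fraction of running words (tokens) in `text` found in `known_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

def reading_band(cov):
    """Label a coverage value using the 80% and 95% thresholds cited above."""
    if cov < 0.80:
        return "below 80%: comprehension almost impossible"
    if cov >= 0.95:
        return "95% or more: readable without a dictionary"
    return "between the two thresholds"

# Hypothetical mini word list standing in for a real high-frequency list.
known = {"the", "cat", "sat", "on", "mat"}
sample = "The cat sat on the mat near the window."

cov = coverage(sample, known)
print(f"{cov:.0%}", "-", reading_band(cov))
```

In the sample, 7 of the 9 running words are known, so coverage is about 78%, which falls just under the 80% threshold.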

Goals of the NGSL Project…

1. to update and greatly expand the size of the corpus used (273 million words) compared to the limited corpus behind the original GSL (about 2.5 million words), with the hope of increasing the generalizability and validity of the list

2. to create a NGSL of the most important high-frequency words useful for second language learners of English which gives the highest possible coverage of English texts with the fewest words possible.

3. to make a NGSL that is based on a clearer definition of what constitutes a word

4. to be a starting point for discussion among interested scholars and teachers around the world, with the goal of updating and revising the list based on this input (in much the same way that West did with the original Interim version of the GSL)

Original GSL in a nutshell…

• West’s 1953 GSL was actually a more fully developed version of Faucett’s 1936 “Interim Report on Vocabulary Selection” (sponsored by the Carnegie Corporation)
• Contributors included many famous linguists such as Thorndike, Horn, Maki, Palmer and West
• Based on a 2.5 million word hand-collected corpus (later increased to 5 million words)
• Combined objective (frequency) and subjective (teacher intuition) criteria
• Approximately 2200 words giving about 80% coverage in general texts
• No systematic attempt to define what a word was:

“no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness” (West, 1953, page viii)

General Service Lists GSL (West, 1953) http://jbauman.com/aboutgsl.html#1953

Academic Word List AWL (Coxhead, 2000) http://www.victoria.ac.nz/lals/resources/academicwordlist/

I made a few GSL/AWL apps and have made all the content available for free to teachers and researchers. Please contact me if you need any of the following for the GSL or AWL:

- Word lists
- Parts of speech
- Definitions in easy English
- Definitions in Japanese
- Sound files for pronunciation of words

browne@ltr.meijigakuin.ac.jp

Getting AWL/GSL lists w/definitions & sound files…

Original GSL created in 1930s… 2.5m corpus may have had too many agriculture and religion texts?

AGRICULTURE: plow, mill, spade, cultivator

SEA TRAVEL: sailor, oar, vessel, merchant

RELIGION: kingdom, god, devil, mercy, bless, fellowship, preach, sacred, worship, holy, pray, heaven, grace, pupil, church, Lord

NOT AS IN USE? telegraph, chimney, coal, cottage, gaiety, shilling, headdress, saucer, woolen, amongst

Starting Point for NGSL… Access to Cambridge’s more modern 2 BILLION word corpus

CEC corpora used for preliminary analysis of NGSL

Corpus               Tokens
Newspaper       748,391,436
Academic        260,904,352
Learner          38,219,480
Fiction          37,792,168
Journals         37,478,577
Magazines        37,329,846
Non-Fiction      35,443,408
Radio            28,882,717
Spoken           27,934,806
Documents        19,017,236
TV               11,515,296
Total         1,282,909,322

Problems…

• Newspaper subsection was too large and dominated the frequencies
• Newspaper subsection in CEC had too much of a bias towards financial terms
• Academic subcorpus of CEC not really related to needs of General English for 2nd language learners

Corpus Development & WYPIIWYGO….

Balancing the NGSL Corpus…

CEC corpora included in final analysis for NGSL

Corpus               Tokens
Learner          38,219,480
Fiction          37,792,168
Journals         37,478,577
Magazines        37,329,846
Non-Fiction      35,443,408
Radio            28,882,717
Spoken           27,934,806
Documents        19,017,236
TV               11,515,296
Total           273,613,534*

*The 273 million word subsection used is roughly 100x larger than the original GSL corpus…
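The total and the "100x" footnote can be checked with simple arithmetic. The sketch below sums the nine sub-corpus token counts listed above and divides by the size of the original GSL corpus, taken here as about 2.5 million words as stated earlier in these slides. The variable names are only illustrative.

```python
# Token counts of the CEC sub-corpora kept for the final NGSL analysis (from the slide).
subcorpora = {
    "Learner": 38_219_480,
    "Fiction": 37_792_168,
    "Journals": 37_478_577,
    "Magazines": 37_329_846,
    "Non-Fiction": 35_443_408,
    "Radio": 28_882_717,
    "Spoken": 27_934_806,
    "Documents": 19_017_236,
    "TV": 11_515_296,
}

total = sum(subcorpora.values())
gsl_tokens = 2_500_000  # approximate size of the original GSL corpus, per the slides

ratio = total / gsl_tokens
print(f"{total:,} tokens, about {ratio:.0f}x the original GSL corpus")
# prints "273,613,534 tokens, about 109x the original GSL corpus"
```

The sum matches the slide's total, and the ratio comes out slightly above 100.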

Next steps…

• Removed proper nouns
• Removed numbers, days of the week, months of the year, etc.
• Used statistical procedures to combine the frequencies from the various sub-corpora while adjusting for differences in their relative sizes
• Had meetings with Paul Nation to review the list in relation to other frequency lists and add/delete words as deemed appropriate
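The slides do not say which statistical procedures were used to combine frequencies across sub-corpora. One common way to adjust for differing sub-corpus sizes, shown here only as an assumed illustration, is to convert raw counts to a per-million rate within each sub-corpus and then average those rates. The function names, sub-corpus sizes, and counts below are all hypothetical.

```python
def per_million(count, corpus_size):
    """Convert a raw count into occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

def combined_frequency(counts, sizes):
    """Average the per-million rates so every sub-corpus carries equal weight.

    This is an assumed procedure for illustration, not the NGSL project's documented method.
    """
    rates = [per_million(counts.get(name, 0), size) for name, size in sizes.items()]
    return sum(rates) / len(rates)

# Hypothetical sub-corpus sizes and raw counts for a single word.
sizes = {"spoken": 2_000_000, "fiction": 4_000_000}
counts = {"spoken": 200, "fiction": 200}

pooled = per_million(sum(counts.values()), sum(sizes.values()))
balanced = combined_frequency(counts, sizes)
print(f"pooled: {pooled:.1f} per million, balanced: {balanced:.1f} per million")
```

Pooling the raw counts lets the larger sub-corpus dominate the result, while averaging per-million rates treats each sub-corpus equally. That is the kind of imbalance the newspaper subsection caused in the preliminary analysis.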

Input from Paul Nation – Thanks!

Comparing the GSL and NGSL: Apples and Oranges?

Word Families or Lemmas?

Comparing the GSL and NGSL:

“To be or not to be, that is the question.”

• 10 Tokens: to, to, be, be, or, not, that, is, the, question
• 8 Types: to, be, or, not, that, is, the, question
• 7 Lemmas: to, be, or, not, that, the, question

Rank  Word      Tokens  Coverage
1     be        3       30%
2     to        2       20%
3     not       1       10%
3     or        1       10%
3     question  1       10%
3     that      1       10%
3     the       1       10%
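The token, type, and lemma counts for the Hamlet sentence above can be reproduced with a short Python sketch. The lemma map is a hand-made assumption covering only this sentence (it folds "is" into "be"); a real analysis would use a full lemmatizer.

```python
import re
from collections import Counter

# Hand-made lemma map for this one sentence (assumption, not a real lemmatizer).
LEMMA_OF = {"is": "be"}

sentence = "To be or not to be, that is the question."

tokens = re.findall(r"[a-z]+", sentence.lower())       # every running word
types = set(tokens)                                     # distinct word forms
lemma_counts = Counter(LEMMA_OF.get(t, t) for t in tokens)

print(len(tokens), "tokens,", len(types), "types,", len(lemma_counts), "lemmas")

# Coverage of each lemma = its token count divided by all tokens.
for lemma, count in lemma_counts.most_common():
    print(f"{lemma:<9}{count}  {count / len(tokens):.0%}")
```

The lemma "be" accounts for 3 of the 10 tokens (30% coverage) and "to" for 2 (20%), matching the table.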

The assumption in Word Families is that if the headword is known, so are all derived forms…

ACCEPT: ACCEPTABILITY, ACCEPTABLE, UNACCEPTABLE, ACCEPTANCE, ACCEPTED, ACCEPTING, ACCEPTS

But are they?

Word          BNCf  Difficulty
accept        202   -2.923
acceptable    36    -0.510
unacceptable  12    -0.216
acceptance    27    0.570

THE WORD FAMILY APPROACH (Bauer and Nation, 1993)

Level 1

A different form is a different word. Capitalization is ignored.

Level 2

Regularly inflected words are part of the same family.

Level 3 (10 affixes)

-able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses

Level 4 (10 affixes)

-al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, in-, all with restricted uses.

Level 5 (48 affixes)

-age (leakage), -al (arrival), -an (American), -ance (clearance), -ant (consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom; officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence (emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese; officialese), -esque (picturesque), -ette (usherette; roomette), -hood (childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most (topmost), -ory (contradictory), -ship (studentship), -ward (homeward), -ways (crossways), -wise (endwise; discussion-wise), anti- (anti-inflation), ante- (anteroom), arch- (archbishop), bi- (biplane), circum- (circumnavigate), counter- (counter-attack), en- (encage; enslave), ex- (ex-president), fore- (forename), hyper- (hyperactive), inter- (interweave), mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date), pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean).

Level 6 (10 affixes)

-able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y

Level 7

Classical roots

However, the GSL is not consistent in defining what to count as a word.

“no attempt has been made to be rigidly consistent in the method used for displaying the words: each word has been treated as a separate problem, and the sole aim has been clearness” (West, 1953, page viii)

To get some consistency, Bauman and Culligan (1995) grouped the original GSL headwords using Level 4 affixes. Then they ranked the words according to frequencies from the Brown Corpus.

Subsequently, Nation released a word list with the program Range that grouped words up to Level 6 affixes, and also included numbers, days of the week, months of the year, and metric units of measurement.

NGSL: A Modified Lexeme Approach

• All inflected forms for all parts of speech plus the plural of the gerund
• Includes both British & American spellings
• Examples
– accept: accepts, accepted, accepting, acceptings
– acceptable: acceptables
– paint: paints, painted, painting, paintings
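To make the modified lexeme approach concrete, the sketch below groups inflected forms under their headword using only the three examples from the slide. The lookup table and function name are hypothetical; the actual NGSL lemma list would supply the full mapping.

```python
from collections import Counter

# Headword -> inflected forms, copied from the slide's examples only.
LEXEMES = {
    "accept": ["accepts", "accepted", "accepting", "acceptings"],
    "acceptable": ["acceptables"],
    "paint": ["paints", "painted", "painting", "paintings"],
}

# Reverse lookup: every form (including the headword itself) -> headword.
FORM_TO_HEADWORD = {
    form: head for head, forms in LEXEMES.items() for form in [head, *forms]
}

def headword(word):
    """Return the headword for a known form, or the lowercased word itself."""
    return FORM_TO_HEADWORD.get(word.lower(), word.lower())

words = ["Accepted", "accepting", "acceptable", "paintings", "painted"]
counts = Counter(headword(w) for w in words)
print(counts)
```

Note that "acceptable" is counted under its own headword rather than under "accept". Under the word family approach it would be folded into the ACCEPT family, which is the key difference between the two ways of counting.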

Comparing the GSL and NGSL: Apples and Oranges no longer…

When both lists are lemmatized, the NGSL provides far more coverage with far fewer words, one of the chief goals of this project…

A Dedicated Website… www.newgeneralservicelist.org

List downloadable in many forms:

• Headword list…
• Lemmatized list…
• List with definitions in easy English…
• List with raw data… (coming soon!)

Now available on free Quizlet Program… www.quizlet.com

Quizlet both intuitive and fun… www.quizlet.com

Soon to be available on WordEngine… www.wordengine.com

New Cambridge Text Series Using NGSL (both in text and online)

Links to NGSL Resources…

Free Graded Text Editor & Analysis Tool: www.er-central.com/ogte/

Free Text Helper Tool: identifies words outside your level, gets their meanings, and gives learning tools for them…

Text Helper in Action…

much more to come…

Thank you!
