Online Corpus: Literacy Teachers’ Best Friend
Dominik Lukeš, http://dominiklukes.net
Dyslexia Guild Summer Conference 2011
training.dyslexiaaction.org.uk

Online corpus: Literacy teachers' best friend

DESCRIPTION

Presentation delivered at Dyslexia Guild Summer Conference 2011 in Oxford. (Slideshow updated based on feedback from the session).

Page 1: Online corpus: Literacy teachers' best friend

training.dyslexiaaction.org.uk

Online Corpus: Literacy Teachers’ Best Friend

Dominik Lukeš
http://dominiklukes.net

Dyslexia Guild Summer Conference 2011

Page 2: Online corpus: Literacy teachers' best friend

Outline

http://www.flickr.com/photos/adactio/3563832656

What is a corpus

Answering questions with a corpus

The language of corpus searches

The corpus and the classroom

Practice

Page 3: Online corpus: Literacy teachers' best friend

Corpus / Corpora

????

Page 4: Online corpus: Literacy teachers' best friend

… of knowledge about language

http://www.flickr.com/photos/missturner/3029700617/

Page 5: Online corpus: Literacy teachers' best friend

Prescriptivism: … how language should be used

v

Descriptivism: … how language is used

Page 6: Online corpus: Literacy teachers' best friend

“Most of the prescriptive rules of the language mavens make no sense on any level. They are bits of folklore that originated for screwball reasons several hundred years ago… For as long as they have existed, speakers have flouted them…”

Page 7: Online corpus: Literacy teachers' best friend

Grammarian Geoffrey K Pullum on …

“intellectual abdication”
“should be ashamed”
“current around 1900”
“a perversion of grammatical education”
“blind to textual evidence even when he himself exhibits it”
“dishonest and stupid”
“vile little compendium of tripe about style”

“More passives in Orwell's pompous essay with the warning about how you mustn't use them than in any periodical you can lay your hands on!”

Page 8: Online corpus: Literacy teachers' best friend

This usage stuff is not straightforward and easy. If ever someone tells you that the rules of English grammar are simple and logical and you should just learn them and obey them, walk away, because you're getting advice from a fool.

http://languagelog.ldc.upenn.edu/nll/?p=2790

Page 9: Online corpus: Literacy teachers' best friend

Corpus

Key modern tool for finding out about how language works…

Page 10: Online corpus: Literacy teachers' best friend

Corpus

… is a large database of representative language samples …

Page 11: Online corpus: Literacy teachers' best friend

Corpus

… 100s of millions of words from (mostly) written language in different genres in small samples (~2000 words) …

Page 12: Online corpus: Literacy teachers' best friend

Corpus

… used for linguistic research, making dictionaries, writing grammars, …

Page 13: Online corpus: Literacy teachers' best friend

Page 14: Online corpus: Literacy teachers' best friend

Corpora available for teachers

http://corpus.byu.edu

Page 15: Online corpus: Literacy teachers' best friend

Access to COCA and related BYU corpora is free…

…but free registration required for more than ~10 queries a day

Page 16: Online corpus: Literacy teachers' best friend

Page 17: Online corpus: Literacy teachers' best friend

Brown – the grandfather
COCA
BNC
WebCorp
Google

Page 18: Online corpus: Literacy teachers' best friend

Page 19: Online corpus: Literacy teachers' best friend

Page 20: Online corpus: Literacy teachers' best friend

http://www.flickr.com/photos/atoach/3900591006/

Searching a corpus early on in the process of making a generalization can save you a lot of unpleasant surprises later.

Page 21: Online corpus: Literacy teachers' best friend

How do we use the word dyslexia?

We speak more often of dyslexic children than adults.

We speak more often of dyslexia than any other dys- word.

Page 22: Online corpus: Literacy teachers' best friend

Concordance

BNC: dyslexic [n*]

COCA: dyslexic [n*]

http://www.americancorpus.org/

http://corpus.byu.edu/bnc

Page 23: Online corpus: Literacy teachers' best friend

COCA: dys*

Page 24: Online corpus: Literacy teachers' best friend

Suffixing rules

*yed

*ied

Page 25: Online corpus: Literacy teachers' best friend

Suffixing rules

*yed: played, stayed, portrayed, enjoyed, unemployed, surveyed

*ied: died, tried, married, worried, identified, applied
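
The two wildcard searches can be imitated on any word list. The following is a minimal sketch, assuming a small hand-typed list (the words from the slide) and a hypothetical helper called `split_by_suffix`. It uses Python's standard `fnmatch` module and says nothing about how the online corpus interface itself works.

```python
from fnmatch import fnmatchcase

# Words taken from the slide; in practice this would be a corpus word list.
WORDS = ["played", "stayed", "portrayed", "enjoyed", "unemployed", "surveyed",
         "died", "tried", "married", "worried", "identified", "applied"]

def split_by_suffix(words, patterns=("*yed", "*ied")):
    """Group words by the first wildcard pattern they match."""
    groups = {pattern: [] for pattern in patterns}
    for word in words:
        for pattern in patterns:
            if fnmatchcase(word, pattern):
                groups[pattern].append(word)
                break
    return groups

groups = split_by_suffix(WORDS)

# In this sample, every *yed word has a vowel letter before the y.
vowel_before_y = all(word[-4] in "aeiou" for word in groups["*yed"])
print(groups["*yed"])
print(groups["*ied"])
print(vowel_before_y)
```

A check like `vowel_before_y` only describes the sample it was run on; a real corpus search is what reveals the exceptions.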

Page 26: Online corpus: Literacy teachers' best friend

The Corpus Magic

*

?

[ ]

[n*]

Different corpora use slightly different codes. Read the manual.

Page 27: Online corpus: Literacy teachers' best friend

The Corpus Magic

*      Any number of characters (incl. 0)
?      Any one character
[ ]    Lemma (all inflectional forms of a word)
[n*]   Part of speech tags (e.g. nouns)

Different corpora use slightly different codes. Read the manual.

Page 28: Online corpus: Literacy teachers' best friend

*

*each      each, reach, beach, teach, outreach, …, impeach, …
teach*     teachers, teaching, …, teachable, teacher-librarians, …
t*ch       touch, teach, tech, torch, trench, twitch, …, three-inch, …
teach *    teach the, teach us, teach students, …

Page 29: Online corpus: Literacy teachers' best friend

?

?each      reach, beach, teach, peach, leach, keach, …
each?      each- (1), each# (1) [ie nothing]
?each?     peachy, bleachy, teacha, reachs (2) [ie spelling error], …
t?ch       tech, tach, toch, tuch, tsch, tich
t??ch      touch, teach, torch, tisch, …
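
The behaviour of `*` and `?` can be tried out offline by translating a wildcard into a regular expression. This is a sketch under stated assumptions: the word list is invented for illustration, and `wildcard_to_regex` and `search` are hypothetical helper names, not part of any corpus tool.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (any number of characters, incl. 0) and ? (any one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def search(pattern, words):
    """Return the words that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]

# Invented mini word list for illustration only.
WORDS = ["each", "reach", "teach", "outreach", "teachers",
         "tech", "touch", "torch", "trench", "peachy"]

for query in ["*each", "?each", "teach*", "t?ch", "t??ch", "?each?"]:
    print(query, search(query, WORDS))
```

Note that `*each` also returns `each` itself because `*` may stand for zero characters, while `?each` does not because `?` needs exactly one.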

Page 30: Online corpus: Literacy teachers' best friend

[Lemma]

Page 31: Online corpus: Literacy teachers' best friend

Part of speech tags

[run].[n*]

[run] [n*]

Page 32: Online corpus: Literacy teachers' best friend

Common tags

[n*]                        noun
[NN2]                       plural nouns
[v*]                        verb
[VVD]                       verb past tense
[aj*] (BNC) / [j*] (COCA)   adjective
[av*] (BNC) / [r*] (COCA)   adverb
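
The difference between `[run].[n*]` (forms of the lemma run that are tagged as nouns) and `[run] [n*]` (a form of run followed by a noun) can be shown on a tiny hand-tagged sentence. This is only a sketch: the tokens, the lower-case tags, and the two function names are assumptions made for illustration, and real corpora are tagged automatically with their own tag sets.

```python
from collections import namedtuple
from fnmatch import fnmatchcase

Token = namedtuple("Token", "word lemma tag")

# Hand-tagged toy sentence; the tags are illustrative only.
TOKENS = [
    Token("She", "she", "pnp"),
    Token("runs", "run", "vvz"),
    Token("marathons", "marathon", "nn2"),
    Token("and", "and", "cjc"),
    Token("enjoyed", "enjoy", "vvd"),
    Token("her", "her", "dps"),
    Token("morning", "morning", "nn1"),
    Token("runs", "run", "nn2"),
]

def lemma_with_tag(tokens, lemma, tag_pattern):
    """Like [run].[n*]: forms of one lemma restricted to a part of speech."""
    return [t.word for t in tokens
            if t.lemma == lemma and fnmatchcase(t.tag, tag_pattern)]

def lemma_then_tag(tokens, lemma, tag_pattern):
    """Like [run] [n*]: a form of the lemma followed by a word with the tag."""
    return [(a.word, b.word) for a, b in zip(tokens, tokens[1:])
            if a.lemma == lemma and fnmatchcase(b.tag, tag_pattern)]

print(lemma_with_tag(TOKENS, "run", "n*"))
print(lemma_then_tag(TOKENS, "run", "n*"))
```

The first search finds only the noun use of "runs"; the second finds the verb "runs" because a noun follows it.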

Page 33: Online corpus: Literacy teachers' best friend

Help

Page 34: Online corpus: Literacy teachers' best friend

Page 35: Online corpus: Literacy teachers' best friend

Page 36: Online corpus: Literacy teachers' best friend

You can also

cats and dogs     search for idioms
?each*s           combine wildcards
[=pretty]         search for synonyms
car|bike|horse    search for alternatives
used -car         exclude searches

For more details see:

Page 37: Online corpus: Literacy teachers' best friend

Concordance + KWIC

*ies.[N*]

Page 38: Online corpus: Literacy teachers' best friend

KWIC – Key-Word In Context

*ies.[N*]
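
A KWIC display lines up every hit with a few words of context on either side. The sketch below shows the idea on one invented sentence pair; the function name `kwic`, the sample text, and the three-word window are all assumptions, and matching is by exact word rather than by wildcard or tag.

```python
def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each occurrence."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Ignore case and trailing punctuation when comparing.
        if token.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented sample text for illustration.
SAMPLE = ("The teacher read two stories aloud. "
          "Later the class wrote stories of their own.")

for left, hit, right in kwic(SAMPLE, "stories"):
    print(f"{left:>20}  {hit}  {right}")
```

Reading down the aligned column of hits is what makes patterns of use easy to spot.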

Page 39: Online corpus: Literacy teachers' best friend

Limit searches by genre

Page 40: Online corpus: Literacy teachers' best friend

Other questions a corpus can answer

Are there more nouns or verbs ending in -ies?
*ies.[V*] vs. *ies.[N*]

Are there four-letter verbs ending in -ed in the present tense?
??ed.[VVB]

What are the most common adjectives describing students vs. pupils?
[j*] [student] vs. [j*] [pupil]

What do we say teachers do most often?
[teacher] [vvb]

Page 41: Online corpus: Literacy teachers' best friend

Corpus, rules, and regularity

http://www.flickr.com/photos/51505078@N00/352492687

pre*

*ed

*ies.[V*]

Page 42: Online corpus: Literacy teachers' best friend

Collocations

Limits on variability

See also Kennedy, p. 80-23

Page 43: Online corpus: Literacy teachers' best friend

Collocations (cont)

Limits on variability

See also Kennedy, p. 80-23

Page 44: Online corpus: Literacy teachers' best friend

Collocations (cont)

[teacher] must [v*]
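
A search like this amounts to counting what follows a fixed frame. The sketch below is a rough approximation on an invented text: the lemma is stood in for by a hand-listed set of forms, there is no part of speech filter, and `next_word_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

def next_word_counts(text, node_forms, fixed_word):
    """Count the words that follow '<node form> <fixed word>' in the text."""
    tokens = re.findall(r"[a-z'-]+", text.lower())
    counts = Counter()
    for first, second, third in zip(tokens, tokens[1:], tokens[2:]):
        if first in node_forms and second == fixed_word:
            counts[third] += 1
    return counts

# Invented sample text for illustration.
SAMPLE = ("Teachers must ensure that every pupil is heard. "
          "A teacher must ensure fairness. "
          "The teacher must plan ahead, and teachers must be patient.")

counts = next_word_counts(SAMPLE, {"teacher", "teachers"}, "must")
for word, n in counts.most_common():
    print(word, n)
```

A tagged corpus does the same job more reliably, since the `[v*]` slot keeps out anything that is not a verb.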

Page 45: Online corpus: Literacy teachers' best friend

Idioms and set phrases

275 results

359 results

Page 46: Online corpus: Literacy teachers' best friend

Google as a Corpus

"put the search text in quotes"

use * for the search item

Page 47: Online corpus: Literacy teachers' best friend

Page 48: Online corpus: Literacy teachers' best friend

Google as a Corpus Pros & Cons

PRO: rare, low frequency usage, up-to-date usage

CON: no sampling, no frequency sort, no genre limit, no part of speech tags

Page 49: Online corpus: Literacy teachers' best friend

Google results counts are only rough estimates…

http://searchengineland.com/why-google-cant-count-results-properly-53559

Different people searching in different geographic locations can get different numbers

Sometimes searching for A gives fewer results than searching for A without B

Page 50: Online corpus: Literacy teachers' best friend

…but Google fights can be fun

Page 51: Online corpus: Literacy teachers' best friend

WebCorp makes Google search results linguist-friendly

Page 52: Online corpus: Literacy teachers' best friend

Avoid Common Corpus Errors

Be aware of limitations: sampling, coverage, size, presence of typos and errors, bad part of speech tagging

Beware of low frequency results

Beware of homographs

Check results come from multiple sources

Check KWIC to confirm relevance

Limit search by genre

http://www.flickr.com/photos/andreassolberg/433734311

Page 53: Online corpus: Literacy teachers' best friend

Check examples and sources

Page 54: Online corpus: Literacy teachers' best friend

Always check low frequency results

must [v*] [n*]

…sometimes they come from the same source
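
One way to make this check routine is to count distinct sources next to raw frequency. The sketch below uses invented hits and made-up source labels; `source_spread` and `single_source` are hypothetical helper names, and a real corpus interface shows the source of each hit in its own way.

```python
from collections import defaultdict

# Hypothetical hits: (matched phrase, source text it came from).
HITS = [
    ("must meet requirements", "news_01"),
    ("must meet requirements", "acad_07"),
    ("must meet requirements", "fiction_12"),
    ("must obey orders", "fiction_03"),
    ("must obey orders", "fiction_03"),
    ("must obey orders", "fiction_03"),
]

def source_spread(hits):
    """Map each phrase to (frequency, number of distinct sources)."""
    sources = defaultdict(list)
    for phrase, source in hits:
        sources[phrase].append(source)
    return {phrase: (len(found), len(set(found)))
            for phrase, found in sources.items()}

def single_source(hits):
    """Phrases whose hits all come from one source."""
    return [phrase for phrase, (_, n_sources) in source_spread(hits).items()
            if n_sources == 1]

print(source_spread(HITS))
print(single_source(HITS))
```

Both phrases occur three times here, but only one of them is spread across different texts.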

Page 55: Online corpus: Literacy teachers' best friend

False roots

http://etymonline.com

corner, silly, preface, cockroach, protest, stable …

Page 56: Online corpus: Literacy teachers' best friend

Make your own corpus with TextSTAT

http://neon.niederlandistik.fu-berlin.de/en/textstat

Page 57: Online corpus: Literacy teachers' best friend

Make your own corpus with AntConc

http://www.antlab.sci.waseda.ac.jp/software.html
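
The most basic thing a home-made corpus offers is a word frequency list. The sketch below shows the idea in a few lines, assuming the texts are already loaded as strings; the file names and texts are invented, and the code is independent of TextSTAT and AntConc rather than a description of how they work.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: name -> text. Real use would read files from disk.
TEXTS = {
    "story1.txt": "The cat sat on the mat. The cat slept.",
    "story2.txt": "A dog chased the cat.",
}

def word_frequencies(texts):
    """Count lower-cased word forms across all texts."""
    counts = Counter()
    for text in texts.values():
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

freq = word_frequencies(TEXTS)
for word, n in freq.most_common(2):
    print(word, n)
```

With a few pieces of student writing or a class reader in place of the toy texts, the same list shows which words learners actually meet most often.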

Page 58: Online corpus: Literacy teachers' best friend

Corpus in the classroom

teacher preparation

student discovery

Page 59: Online corpus: Literacy teachers' best friend

Teacher preparation

find relevant, common examples

prepare worksheets

check for exceptions

find out answers to student questions about rules and usage

Page 60: Online corpus: Literacy teachers' best friend

Student discovery

show search results to students to work out rules or word meanings

teach students how to search for questions

ask students to give each other puzzles for searching

Page 61: Online corpus: Literacy teachers' best friend

For heavy classroom use…

register for group access to prevent spam lock out

Page 62: Online corpus: Literacy teachers' best friend

Corpus v dictionary

Page 63: Online corpus: Literacy teachers' best friend

Non-classroom corpus use

supplement dictionary

crossword puzzles

check typical usage when writing

Page 64: Online corpus: Literacy teachers' best friend

Where to go next?

http://www.corpora4learning.net

Page 65: Online corpus: Literacy teachers' best friend

Thank you

Contact: http://dominiklukes.net