The BNC Design Model
Adam Kilgarriff, Sue Atkins, Michael Rundell
The Lexicography MasterClass
http://www.lexmasterclass.com
Birmingham Jul 2007 Kilgarriff Atkins Rundell 2
BNC
Very widely used across lexicography, linguistics, language technology and language teaching
A spectacular success
The BNC design model
Well planned (Atkins, Clear and Ostler 1992)
Produced a successful outcome
A model for others (working on other languages) to follow
Czech National Corpus
American National Corpus
Hungarian National Corpus
Hellenic National Corpus
Croatian National Corpus
Slovak National Corpus
National Corpus for Ireland
Great! However
BNC Design Model past its sell-by
BNC Design Model
1980s
Eighteen years old
Pre-web
Sue Atkins’ dream, ca 1985
The dream
Gazillions of text
More than we could possibly imagine
The plan
Let’s reach for the sky:
Amazing
implausible
ridiculous
you won’t possibly do it
100 million: you must be kidding
2007: Google
everyday access to eighty thousand times as much
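To put that multiplier in absolute terms, here is a back-of-envelope sketch. It assumes the "eighty thousand times" is measured against the 100 million word BNC target quoted earlier; the function name is illustrative and the result is an order of magnitude, not a measured index size.

```python
# Assumption: the multiplier is relative to the 100-million-word BNC target.
BNC_WORDS = 100_000_000
GOOGLE_MULTIPLIER = 80_000  # "eighty thousand times as much"


def implied_web_words(corpus_words: int, multiplier: int) -> int:
    """Return the word count implied by scaling a corpus size by a multiplier."""
    return corpus_words * multiplier


if __name__ == "__main__":
    # Prints the implied size with thousands separators.
    print(f"{implied_web_words(BNC_WORDS, GOOGLE_MULTIPLIER):,} words")
```

Under that assumption the figure comes to roughly eight trillion words.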
Inference
Vision behind the BNC (gazillions, reach for the sky) leads to
1980s: the BNC
2007: something quite different
BNC vision: other aspects
A balance of text types
Substantial share (10%) spoken
No swingeing copyright constraints
A reference corpus
Balance of text types
Good goal – added value for BNC
Design by linguists and publishers
Reflects their ideas/interests
Constrained by collection costs
Prescribes not describes
Costs now quite different
Blogs etc. are free
“What is a good taxonomy of text types?”
Good open research question
Spoken language
Many things are possible
Online transcripts
Hoffman: 300m words of Larry King show
Web 2.0
cheese
Whole BNC: 2,954 occurrences
Spoken BNC: 456 occurrences
YouTube: 34,900 videos
Everyzing.com audio search (formerly Podzinger)
cheese: 37,030 files
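The two BNC counts for "cheese" are only comparable once normalised by corpus size. A minimal sketch, assuming the whole BNC is taken as 100 million words and the spoken part as 10% of that (both are round figures from these slides, not exact token counts); the helper name is illustrative. The YouTube and Everyzing figures count videos and files rather than word occurrences, so they are left out.

```python
# Assumptions: round sizes taken from the slides, not exact token counts.
WHOLE_BNC_WORDS = 100_000_000
SPOKEN_BNC_WORDS = WHOLE_BNC_WORDS // 10  # "Substantial share (10%) spoken"


def per_million(occurrences: int, corpus_words: int) -> float:
    """Normalise a raw count to occurrences per million words."""
    return occurrences / corpus_words * 1_000_000


whole_rate = per_million(2_954, WHOLE_BNC_WORDS)
spoken_rate = per_million(456, SPOKEN_BNC_WORDS)

if __name__ == "__main__":
    print(f"whole BNC:  {whole_rate:.2f} per million words")
    print(f"spoken BNC: {spoken_rate:.2f} per million words")
```

On these assumed sizes, "cheese" is relatively more frequent in the spoken component than in the corpus as a whole, even though its raw count there is much smaller.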
Copyright
BNC isn’t do-as-you-like
Compare WordNet
Corpus collectors are like search engines
Copyright and Web 2:
Website’s defense: “we did not know it was there and will promptly remove it”
OK in US law
A reference corpus
“a reference point for the language”
Balanced
Fixed
Experiments are replicable
Freely available
Size might not be important
Brown still works well for many questions
1990s
Corpus building: expensive
Wanted for many purposes
One-size-fits-all
BNC met many needs
Many text types have too few documents
Medical, technical, children’s
Used ‘because it is available’
No affordable alternatives
2007
Corpus building: cheap
WebBootCaT
Made-to-measure corpora
Different research question: different corpus
Is “general-purpose reference corpus” a useful idea?
In sum: BNC Design Model
1990s innovative and inspiring
2007 historic interest
New thinking needed for a new situation