Between Corpus and Dictionary
Adam Kilgarriff
Lexical Computing Ltd
Lexicography MasterClass Ltd
Universities of Leeds, Sussex
Global WordNet, Szeged, Jan 2008
What is a word sense?
Preliminaries
What is language? What is meaning?
What is language?
- In our heads
- In texts and sound signals
- Both
Methodology: study language in our heads
- Introspection
- Semantic analysis
- Experiments with human subjects
- “Rationalist” (Leibniz, Chomsky)
- Problems: coverage, arbitrariness
Methodology: study text
- “Empiricist” (Locke, Hume)
- Physics: forces, matter
- Chemistry: chemicals, bonds
- Language: text, speech signals
It goes against the grain
- What is important about a sentence? Its meaning.
- Corpus methodology:
  - throw away individual sentence meaning
  - find patterns
Empiricist linguistics
- A new way to find out about language
- 15 years of rapid ascent
- Computers
- Corpora: bigger and bigger data sets available
- Language technology tools: lemmatizers, POS-taggers, parsers, machine learning for pattern finding
Rationalists vs empiricists in the age of the web
semantic web vs Google?
What are you?
- Temperament
- Complementary or alternatives?
- Barbu and Poesio, Keller and Lapata: comparisons, evaluations
- (AK: current research project)
What is meaning?
- Fregean
- Gricean
Gottlob Frege (1848-1925)
- Founder of modern logic
- Truth values
  - The sentence “grass is green” is true if and only if grass is green (Tarski)
- Meanings of words, phrases are such that:
  - put them together in a sentence
  - state basic facts
  - the sentence computes to ‘true’ if it is true, ‘false’ if it is false
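The Fregean/Tarskian picture above, where word meanings combine so that a sentence computes to a truth value, can be illustrated with a tiny hand-built model. This is a minimal sketch under stated assumptions: the model, the names `NAMES`, `PREDICATES` and `evaluate`, and the simplification of treating "grass" as a single entity are all hypothetical, not taken from the talk.

```python
# Hypothetical toy model: word meanings as set-theoretic objects.
# A one-place predicate denotes the set of entities it is true of.
PREDICATES = {
    "green": {"grass"},
    "blue": {"sky"},
    "white": {"snow"},
}

# A name (here also "grass", simplifying away mass-noun semantics) denotes an entity.
NAMES = {"grass": "grass", "sky": "sky", "snow": "snow"}


def evaluate(subject: str, predicate: str) -> bool:
    """Truth value of '<subject> is <predicate>' in the toy model.

    Tarski's schema: the sentence is true if and only if the entity
    denoted by the subject is in the set denoted by the predicate.
    """
    return NAMES[subject] in PREDICATES[predicate]


print(evaluate("grass", "green"))  # True
print(evaluate("snow", "green"))   # False
```

The sentence's truth value is computed entirely from the denotations of its parts, which is the sense in which the Fregean approach "builds on well-defined word meanings".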
Gottlob Frege (1848-1925)
- Formal semantics
  - Sparkling analyses for quantifiers, connectives
  - Montague semantics
- Foundations for maths, databases, ontologies …
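The "sparkling analyses for quantifiers, connectives" can be illustrated with the generalised-quantifier treatment from the formal-semantics tradition: a determiner such as "every" denotes a relation between two sets, and a connective denotes a truth function. This is a minimal sketch over a hypothetical toy model. The sets and function names are our own choices for illustration.

```python
from typing import Set

# Hypothetical toy model.
DOGS: Set[str] = {"rex", "fido"}
CATS: Set[str] = {"tom"}
BARKERS: Set[str] = {"rex", "fido"}


# Generalised quantifiers: relations between a restrictor set A and a scope set B.
def every(a: Set[str], b: Set[str]) -> bool:
    return a <= b          # "every A is B" iff A is a subset of B


def some(a: Set[str], b: Set[str]) -> bool:
    return bool(a & b)     # "some A is B" iff A and B overlap


def no(a: Set[str], b: Set[str]) -> bool:
    return not (a & b)     # "no A is B" iff A and B are disjoint


# Connectives: truth functions over the truth values of whole sentences.
def and_(p: bool, q: bool) -> bool:
    return p and q


def not_(p: bool) -> bool:
    return not p


# "Every dog barks and no cat barks"
print(and_(every(DOGS, BARKERS), no(CATS, BARKERS)))  # True
# "It is not the case that some cat barks"
print(not_(some(CATS, BARKERS)))  # True
```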
H. P. Grice (1913-1988)
An agent means something by an utterance if and only if they intended the utterance to produce some effect in an audience by means of the recognition of this intention.
Dictionary of Philosophy of Mind, http://philosophy.uwaterloo.ca
Meaning is something you do
- Basis of meaning is the meaning event:
  - speaker’s intention
  - speaker’s expectation of the hearer’s interpretation
- (messy, hard)
Strawson commentary (1970s)
“For the sake of a label, we might call it the conflict between the theorists of communication-intention and the theorists of formal semantics. […] A struggle on what seems to be such a central issue in philosophy should have something of a Homeric quality; and a Homeric struggle calls for gods and heroes. I can at least, though tentatively, name some living captains and benevolent shades: on the one side, say, Grice, Austin, and the later Wittgenstein; on the other, Chomsky, Frege, and the earlier Wittgenstein.”
Battle of the two Adams?
Relevance to word senses
- Fregean
  - supports reasoning
  - builds on well-defined word meanings
  - identifying word meanings: can’t help
- Fall back on Grice
Fauconnier and Turner
- “linguistic expressions prompt for meanings rather than express meanings”
- (AK chapter, Agirre and Edmonds WSD book)
Preliminaries over
What is a word sense?
The lexicographers
- They create them
- Methods:
  - introspection
  - other dictionaries
  - corpus
- Atkins, Hanks, Krishnamurthy
What is a word sense (1)
- SFIP: sufficiently frequent, insufficiently predictable
- (a glass of) whisky x (a glass of) tequila
What is a word sense (2)
homonymy
analogy polysemy rules
collocation
What is a word sense (3)
- A cluster of instances of use
- Operationalised as: corpus lines clustered by lexicographers
- Makes sense of:
  - overlapping senses
  - different dictionaries, different senses
  - lumping and splitting
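The corpus lines that lexicographers cluster are normally shown as a keyword-in-context (KWIC) concordance, with the node word centred and a fixed amount of context on each side. The following is a minimal sketch that works over raw text with a character window. The function name `kwic`, the window approach and the example sentences are illustrative assumptions; real concordancers work over tokenised, indexed corpora.

```python
import re
from typing import List


def kwic(text: str, node: str, width: int = 30) -> List[str]:
    """Return one KWIC line per whole-word occurrence of `node` in `text`.

    Each line has `width` characters of left context (right-aligned),
    the node in brackets, then `width` characters of right context.
    """
    lines = []
    pattern = r"\b" + re.escape(node) + r"\b"  # whole-word match only
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        lines.append(f"{left.rjust(width)} [{m.group(0)}] {right.ljust(width)}")
    return lines


# Hypothetical mini-corpus with two senses of "bank".
corpus = (
    "She sat on the bank of the river. "
    "He paid the cheque into the bank on Monday. "
    "The bank raised its interest rates."
)
for line in kwic(corpus, "bank", width=20):
    print(line)
```

Sorting or grouping such lines by their context is the manual step a lexicographer performs when deciding which instances of use belong to one sense.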
I don’t believe in word senses
- Believe in: resurrection, ghost, witch, vampire, god, miracle, fairy
- Philosophy:
  - ontological commitment (same meaning, different register)
  - “good entities to build belief systems on”
But I’m an NLP person
- Automatic clustering?
- Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
  - you can get semantic sense from corpora + stats
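The work cited here rests on the distributional idea that words occurring in similar contexts have similar meanings. The following is a minimal sketch that uses window-based co-occurrence counts and cosine similarity. It is a generic stand-in, not the specific method of any of the cited papers. Several of those papers used grammatical rather than window contexts. The toy corpus and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def context_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """For each word, count the words occurring within `window` tokens of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, tokenised and lower-cased.
corpus = [
    "i drank a glass of whisky".split(),
    "i drank a glass of tequila".split(),
    "i drove a car to work".split(),
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["whisky"], vecs["tequila"]), 2))  # 1.0
print(round(cosine(vecs["whisky"], vecs["car"]), 2))      # 0.0
```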
First attempt
- Longman, 1994
- Abject failure:
  - no grammar
  - corpus too small and noisy
  - naïve clustering
  - useless programmer
Collocations
- Easy
- Most words don’t go with most other words
- Then build on what we can do well (metaphor, analogy, homonymy, rules: all much harder)
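Because most word pairs never co-occur, an association score can separate genuine collocations from chance pairings. One lexicographer-oriented score is logDice (Rychlý 2008), which we use here purely for illustration. The talk does not state which statistic it relies on. The counts below are hypothetical.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: how often x and y occur together (e.g. in one grammatical relation)
    f_x, f_y: overall frequencies of x and y
    The maximum, 14, is reached when x and y only ever occur together.
    """
    if f_xy == 0:
        return float("-inf")  # the pair never co-occurs
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts.
print(log_dice(f_xy=50, f_x=100, f_y=100))   # 13.0
print(log_dice(f_xy=100, f_x=100, f_y=100))  # 14.0
```

Unlike raw co-occurrence frequency, the score does not reward a collocate merely for being common overall, since a large `f_y` lowers it.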
The Sketch Engine
- 2003: programmer problem solved
- Corpora:
  - more available
  - build big clean ones from the web
- Grammar:
  - POS-taggers/lemmatisers available
  - shallow regexp grammars if no full parser
- Stats: progress (Lin, Curran, Evert …)
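A "shallow regexp grammar" approximates grammatical relations with regular expressions over POS tags instead of a full parse. The following is a minimal sketch of that idea. It maps each tag to one character so that a match offset equals a token index. It is not the Sketch Engine's own grammar formalism. The coarse tagset, the two relation patterns and the name `word_sketch_triples` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical coarse tagset: one character per token, so a regex offset
# in the tag string is also a token index. Unknown tags become "O".
COARSE: Dict[str, str] = {"NN": "N", "NNS": "N", "VB": "V", "VBD": "V",
                          "VBZ": "V", "JJ": "A", "DT": "D"}

# Each relation: a regex with named groups h (head) and c (collocate).
# The lookahead lets matches overlap.
RELATIONS: Dict[str, str] = {
    "modifier": r"(?=(?P<c>A)(?P<h>N))",       # adjective right before a noun
    "object": r"(?=(?P<h>V)D?A*(?P<c>N))",     # verb, optional det/adjs, noun
}


def word_sketch_triples(sent: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (relation, head lemma, collocate lemma) triples for one sentence."""
    tag_string = "".join(COARSE.get(tag, "O") for _, tag in sent)
    triples = []
    for relation, pattern in RELATIONS.items():
        for m in re.finditer(pattern, tag_string):
            triples.append((relation, sent[m.start("h")][0], sent[m.start("c")][0]))
    return triples


# "She drank a good whisky" as hypothetical (lemma, tag) pairs.
sentence = [("she", "PRP"), ("drink", "VBD"), ("a", "DT"),
            ("good", "JJ"), ("whisky", "NN")]
print(word_sketch_triples(sentence))
# [('modifier', 'whisky', 'good'), ('object', 'drink', 'whisky')]
```

Counting such triples over a large corpus, and then scoring them, gives collocates organised by grammatical relation.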
demo
Clustering
- Word sketch: collocates organised by grammar
- Dictionary: collocates (and other things) organised by meaning
- How to re-organise?
- Three phases
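One way to re-organise collocates by meaning is to cluster them, so that each cluster is a candidate sense. The talk does not specify an algorithm. This minimal sketch assumes that each collocate is represented by the set of other words in the corpus lines it shares with the target word. It uses single-link clustering on Jaccard overlap. The threshold, the data and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the overlap divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_collocates(profiles: Dict[str, Set[str]], threshold: float = 0.2) -> List[Set[str]]:
    """Single-link clustering: collocates whose context profiles overlap
    with Jaccard >= threshold are put in the same cluster (candidate sense)."""
    parent = {c: c for c in profiles}  # union-find forest

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, Set[str]] = {}
    for c in profiles:
        clusters.setdefault(find(c), set()).add(c)
    return list(clusters.values())


# Hypothetical collocates of "bank", each with words seen in its corpus lines.
profiles = {
    "river":   {"water", "fish", "muddy", "flood"},
    "muddy":   {"water", "river", "flood", "walk"},
    "account": {"money", "loan", "interest", "open"},
    "loan":    {"money", "interest", "repay", "account"},
}
print(sorted(sorted(c) for c in cluster_collocates(profiles)))
# [['account', 'loan'], ['muddy', 'river']]
```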
Semi-automatic dictionary drafting (SADD)
- Automatic clustering of collocates: propose senses
- Iterate:
  - Lexicographer input: confirm/reject/edit sense inventory; assign collocates/corpus lines to senses
  - WSD: use seeds to build full WSD for the word; find more collocates for each sense
- XML dictionary entry: load into dictionary-editing tool
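The WSD step in the SADD workflow takes the lexicographer's seeds and finds more collocates for each sense. This resembles Yarowsky-style bootstrapping. The following is a minimal sketch of one round, not the SADD implementation. It assumes that corpus lines are reduced to sets of lemmas and that a sense is represented only by its collocates. The function name, the `min_count` cut-off, the exclusivity rule and the data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def bootstrap_senses(lines: List[Set[str]], seeds: Dict[str, Set[str]],
                     min_count: int = 2) -> Dict[str, Set[str]]:
    """One round of seed-based bootstrapping.

    1. Label each corpus line with the one sense whose seeds it contains
       (lines matching no sense or several senses stay unlabelled).
    2. A word seen at least `min_count` times in lines of exactly one
       sense becomes a new collocate of that sense.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for words in lines:
        matched = [sense for sense, seed in seeds.items() if words & seed]
        if len(matched) == 1:
            counts[matched[0]].update(words)

    expanded = {sense: set(seed) for sense, seed in seeds.items()}
    for sense, counter in counts.items():
        for word, n in counter.items():
            shared = any(counts[s][word] > 0 for s in counts if s != sense)
            if n >= min_count and not shared:
                expanded[sense].add(word)
    return expanded


# Hypothetical corpus lines for "bank" and lexicographer-supplied seeds.
lines = [
    {"bank", "river", "fishing", "water"},
    {"bank", "river", "flood", "water"},
    {"bank", "loan", "interest", "money"},
    {"bank", "account", "interest", "money"},
    {"bank", "water", "flood"},  # no seed yet: unlabelled in this round
]
seeds = {"riverside": {"river"}, "finance": {"loan", "account"}}
round1 = bootstrap_senses(lines, seeds)
print({s: sorted(w) for s, w in round1.items()})
# {'riverside': ['river', 'water'], 'finance': ['account', 'interest', 'loan', 'money']}
```

A second round with `round1` as the seeds would label the last line via "water". This is how each iteration grows the evidence for a sense before the lexicographer reviews it again.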
Atkins method for bilingual lexicography
- Analyse source language
  - from corpus
  - list all expressions that might possibly have a non-predictable translation
  - very fine-grained
  - lots of collocations
  - target-language-neutral; re-usable
- Translate
- Edit to finalise dictionary
New English-Irish Dictionary
- Irish: Gaelic language, some native speakers, culturally important for Ireland
- Project:
  - to replace dictionary from 1950s
  - government-funded project
  - Lexicography MasterClass (Atkins, Rundell, Kilgarriff) designed project in 2003
English analysis for NEID
- New project, 1st Feb 2008 to late 2010
- Contractor: Lexicography MasterClass
- 12 lexicographers
- Plan:
  - test SADD
  - if viable, use it on industrial scale
demo2
http://corpora.fi.muni.cz/sadd/
Thank you
Sketch Engine: http://www.sketchengine.co.uk
Lexicom workshop
- Pre-Euralex, 10-15 July, Barcelona: http://www.iula.upf.edu/agenda/lexicom
- Pre-CICLING, Mexico, Feb 2009