Page 1: Without data, nothing

Without data, nothing

Adam Kilgarriff
Lexical Computing Ltd

University of Leeds

Page 2: Without data, nothing

Generative Lexicon

Account of non-standard uses of words

So: we need a dataset

Gasteiz-Vitoria, 2012

Kilgarriff: Without Data, Nothing 2

Page 3: Without data, nothing

Method

Sample of words
Sample of corpus instances for each
Choose a dictionary
Sense-tag
Identify mismatches to dictionary senses
For each mismatch
  Does it fit the GL model?
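The method steps can be sketched roughly as below. This is only an illustration under stated assumptions: the `Instance` class, the function names, the toy corpus and the `toy_tag` function (a stand-in for a lexicographer's judgement) are all hypothetical and are not taken from the study.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    word: str   # the sampled headword
    text: str   # the corpus line containing it


def sample_instances(corpus, words, per_word, seed=0):
    """Draw up to per_word corpus instances for each sampled word."""
    rng = random.Random(seed)
    sample = {}
    for word in words:
        hits = [inst for inst in corpus if inst.word == word]
        rng.shuffle(hits)
        sample[word] = hits[:per_word]
    return sample


def find_mismatches(sample, tag):
    """Return the instances for which tag() finds no dictionary sense (None)."""
    return [inst for insts in sample.values() for inst in insts
            if tag(inst) is None]


# Toy data, reusing example lines from the slides.
corpus = [
    Instance("steering", "his steering was careless"),
    Instance("steering", "they overhauled the steering"),
    Instance("handbag", "She moved from handbags through gifts to the flower shop"),
]


def toy_tag(inst):
    # Hypothetical stand-in for sense-tagging against a dictionary entry.
    if inst.word == "steering":
        return "activity" if "careless" in inst.text else "mechanism"
    return None  # no dictionary sense fits


sample = sample_instances(corpus, ["steering", "handbag"], per_word=2)
mismatches = find_mismatches(sample, toy_tag)
print([inst.word for inst in mismatches])
```

Each mismatch would then be inspected by hand to decide whether the GL model accounts for it; that last step is a human judgement and is not modelled here.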

Page 4: Without data, nothing

Resources

Words (random sample)
  modest, disability, steering, seize, sack (v), sack (n), onion, rabbit, handbag
Corpus instances
  between 82 and 718 for each word
  Total: 2276
Dictionary: HECTOR
  OUP/Xerox project in corpus lexicography

Page 5: Without data, nothing

Tagging

Three professional lexicographers
  Assign sense to each corpus instance
For this exercise
  If anything other than 3-way agreement
    Re-examine
  390 of 2276 cases (17%)
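The re-examination rule amounts to a simple filter: any instance on which the three taggers did not all pick the same sense goes back for a second look. A minimal sketch, assuming each instance carries a tuple of three sense labels; the line identifiers and labels below are made up for illustration.

```python
def needs_reexamination(tags):
    """True unless all three taggers chose the same sense."""
    return len(set(tags)) != 1


# Hypothetical taggings: one tuple of three sense labels per corpus line.
taggings = {
    "line-1": ("activity", "activity", "activity"),
    "line-2": ("activity", "mechanism", "activity"),
    "line-3": ("mechanism", "mechanism", "mechanism"),
}

to_reexamine = [line for line, tags in taggings.items()
                if needs_reexamination(tags)]
print(to_reexamine)

# Share of re-examined cases reported on the slide.
share = 390 / 2276
print(f"{share:.0%}")
```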

Page 6: Without data, nothing

modest

Any two dictionaries divide up space differently
  HECTOR: 9
  CIDE: 3
  LDOCE: 4
  COBUILD: 5
Tagger agreement: less than half
Messy but no GL-like cases

Page 7: Without data, nothing

Szeged, Jan 2008 Kilgarriff, Global WordNet 7

What is language?

Page 8: Without data, nothing

steering

2 senses
  Activity: his steering was careless
  Mechanism: they overhauled the steering
16 re-examined, most underspecified
  it has the Peugeot’s steering feel
One more complex case
  After nearly fifty years [as a bus driver] Mr. Hannis stepped down from behind the steering wheel

Page 9: Without data, nothing

onion

Two senses: plant and food
34 cases re-examined
  10 bridged divide
    Plant the sets two inches apart to produce a good yield of medium-sized onions
  Others: medicine, decorative feature, dye, cliché of Frenchness
    It’s not all frogs legs and strings of onions in the South of France

Page 10: Without data, nothing

sack (n)

2 x sack race
One metaphor
Santa Claus
  Ridley pulled another doubtful gift from his sack
  Ridley: British politician

Page 11: Without data, nothing

sack (v)

And Labour MP, Mr. Bruce George, has called for the firm to be sacked from duty at Prince Andrew’s £5 million home at Sunningwell Park near Windsor

Non-standard because end-employment needs PERSON as direct object.
Candidate for GL treatment
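The point about the direct object can be pictured as a selectional-restriction check. The sketch below is a toy illustration only: the type lexicon, the restriction table and the function name are hypothetical, and it does not implement the GL model itself. It merely flags uses where the object's type differs from what the verb sense expects.

```python
# Hypothetical toy lexicon: semantic type of a few nouns.
NOUN_TYPE = {
    "manager": "PERSON",
    "worker": "PERSON",
    "firm": "ORGANIZATION",
}

# Hypothetical restriction: the end-employment sense of "sack"
# expects a PERSON as its direct object.
EXPECTED_OBJECT_TYPE = {"sack": "PERSON"}


def is_standard_use(verb, direct_object):
    """True when the direct object has the type the verb sense expects."""
    expected = EXPECTED_OBJECT_TYPE.get(verb)
    return expected is not None and NOUN_TYPE.get(direct_object) == expected


standard = is_standard_use("sack", "manager")
nonstandard = is_standard_use("sack", "firm")
print(standard, nonstandard)
```

In the slide's example the firm is the logical object of "sacked", so a check of this kind would flag it as non-standard and pass it on for closer inspection.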

Page 12: Without data, nothing

handbag

She moved from handbags through gifts to the flower shop
  [handbag department in department store]
Candidate for GL treatment

Page 13: Without data, nothing

Results
  2276 corpus instances
  390 re-examined
  41 non-standard uses
  2 potentially accounted for by GL
Conclusion
  GL will never account for a large share of non-standard word use
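The proportions behind the conclusion follow directly from the four counts on the slide. A short worked calculation; the variable names are just labels chosen here.

```python
instances = 2276      # corpus instances tagged
reexamined = 390      # anything other than 3-way agreement
non_standard = 41     # non-standard uses found
gl_candidates = 2     # potentially accounted for by GL

reexamined_share = f"{reexamined / instances:.1%}"
non_standard_share = f"{non_standard / instances:.1%}"
gl_share_of_non_standard = f"{gl_candidates / non_standard:.1%}"
gl_share_of_all = f"{gl_candidates / instances:.2%}"

print("re-examined / all instances:   ", reexamined_share)
print("non-standard / all instances:  ", non_standard_share)
print("GL candidates / non-standard:  ", gl_share_of_non_standard)
print("GL candidates / all instances: ", gl_share_of_all)
```

So roughly one in twenty of the non-standard uses in this sample was a GL candidate.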

Page 14: Without data, nothing

What is language?

Page 15: Without data, nothing

What is language?
  In our heads

Page 16: Without data, nothing

What is language?
  In our heads
  In texts and sound signals

Page 17: Without data, nothing

What is language?
  In our heads
  In texts and sound signals
  Both

Page 18: Without data, nothing

Methodology

Study language in our heads
  Introspection
  Semantic analysis
  Experiments with human subjects
“rationalist” (Leibniz, Chomsky)
Problems: coverage, arbitrariness

Page 19: Without data, nothing

Methodology

Study text
“empiricist” (Locke, Hume)
  Physics: forces, matter
  Chemistry: chemicals, bonds
  Language: text, speech signals

Page 20: Without data, nothing

Empiricist linguistics

A new way to find out about language
20 years of rapid ascent
  Computers
  Corpora
    bigger and bigger data sets available
  Language technology tools
    lemmatizers, POS-taggers, parsers, machine learning for pattern finding

Page 21: Without data, nothing

Preliminaries over

What is a word sense

Page 22: Without data, nothing

Preliminaries over

What is a word sense (my PhD in 5 slides)

Page 23: Without data, nothing

Preliminaries over

What is a word sense (my PhD in 5 slides)
Where do you find them?

Page 24: Without data, nothing

Preliminaries over

What is a word sense (my PhD in 5 slides)
Where do you find them?
  Dictionaries!

Page 25: Without data, nothing

The lexicographers

They create them
Methods
  Introspection
  Other dictionaries
  Corpus
    Atkins, Hanks

Page 26: Without data, nothing

What is a word sense (1)

SFIP
  Sufficiently frequent, insufficiently predictable
  (a glass of) whisky x (a glass of) tequila

Page 27: Without data, nothing

What is a word sense (2)

homonymy

analogy polysemy rules

phraseology

Page 28: Without data, nothing

What is a word sense (3)

A cluster
  Of instances of use
  Operationalised as: corpus lines
  Clustered by lexicographers

Page 33: Without data, nothing

What is a word sense (3)

A cluster
  Of instances of use
  Operationalised as: corpus lines
  Clustered by lexicographers
Makes sense of
  Overlapping senses
  Different dictionaries, different senses
  Lumping and splitting
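One way to picture a sense as a cluster of corpus lines is as a partition of the same lines under different assignment functions. In the sketch below the two "lexicographers" (`splitter` and `lumper`) and their sense labels are hypothetical; the corpus lines are the steering examples from earlier in the talk.

```python
from collections import defaultdict


def cluster(lines, assign):
    """Group corpus lines into senses according to an assignment function."""
    senses = defaultdict(list)
    for line in lines:
        senses[assign(line)].append(line)
    return dict(senses)


lines = [
    "his steering was careless",
    "they overhauled the steering",
    "it has the Peugeot's steering feel",
]


def splitter(line):
    # Hypothetical lexicographer who gives the third line its own sense.
    if "careless" in line:
        return "activity"
    if "overhauled" in line:
        return "mechanism"
    return "feel"


def lumper(line):
    # Hypothetical lexicographer who folds the third line into "mechanism".
    return "activity" if "careless" in line else "mechanism"


split_senses = cluster(lines, splitter)
lumped_senses = cluster(lines, lumper)
print(len(split_senses), "senses vs", len(lumped_senses), "senses")
```

The same three lines yield three senses under one clustering and two under the other, which is the "different dictionaries, different senses" situation in miniature.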

Page 34: Without data, nothing

Theory

Hanks
  Norms and exploitations
Task of lexicographer
  Record the norms
Speakers may always exploit norms to say something new

Page 35: Without data, nothing

Boring question
  Homonymy or polysemy
  We all know it’s a cline
Interesting question
  Norm or exploitation

Page 36: Without data, nothing

metaphor

see meaning understand
  Norm

I travelled the path
From life towards art
Desire the horse
Depression the cart
  Leonard Cohen
  Exploitation

Page 37: Without data, nothing

How do they do it?

honeymoon

Page 43: Without data, nothing

The Sketch Engine

Corpus query tool
Used for making dictionaries at
  OUP, CUP, Collins, Macmillan, Le Robert, Cornelsen, Elhuyar Foundation
Also universities
  Linguistic research
  Teaching
    Linguistics, also languages

Page 44: Without data, nothing

60 languages covered

Page 46: Without data, nothing

Individual licences (£4.99/month)
University site licences
Free trial: self-register

Page 47: Without data, nothing

Build instant corpora from the web
  WebBootCaT
Install your corpora
Compare corpora

http://www.sketchengine.co.uk

Page 48: Without data, nothing

Thank you

homonymy

analogy polysemy rules

phraseology