Monitoring the penalization/advantage of lexical ambiguity in vector model representations

1st Joint Conference of the EPS and SEPEX, Granada 2010

Monitoring the penalization/advantage of lexical ambiguity in vector model representations

Guillermo Jorge-Botana, Ricardo Olmos & José A. LeónUniversidad Autónoma de Madrid

Extensional definition of lexical ambiguity

Two criteria for categorizing the phenomenon

Representation of meanings Separate vs. single representation of senses(Homonymy, Polysemy, Metonymy, Metaphors…)

Contextual distribution of word occurrencesOccurrences across few or many contexts

www.elsemantico.com

Representation of senses

Separate senses (Klein & Murphy, 2001).

A common core with specific parts(Rodd, Gaskell & Marslen-Wilson, 2002; Klepousniotou & Baum, 2006).

Unique representation as in LSA models (Kintsch, 2001; Kintsch & Mangalath, in press).

www.elsemantico.com

Contextual distribution of word occurrences

Measured by Contextual Diversity (Adelman, Gordon, Brown & Quesada, 2006, McDonald & Shillcock, 2001)

Polysemy: Polysemic words have greater Contextual Diversity than monosemic.

Since Abstract words have greater Contextual Diversity than Concrete words …

Can abstractness be a kind of lexical ambiguity?

www.elsemantico.com

Concept of Context Diversity in Vector-Models

The vectors that represent the words inside a Vector-Space (as in LSA) can be a measure of indices such Context Diversity

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11

Measuring the thematic focus, for example with entropy formula

www.elsemantico.com

Current Study

So what can a model such as LSA tell us about some of the empirical effects related to lexical ambiguity?

A model of a unique lexical representationAn exclusively linguistic model.

www.elsemantico.com

Current Study

Effects related to lexical ambiguity

1. Associative difficulty2. Lexical Decision Task

www.elsemantico.com

1. Associative difficulty

For polysemic words: difficulty generating meaning without a context (Duffy, Morris, & Rayner, 1988)

Abstract words rarely have semantic relations (they predominantly have associative relations) (e.g. Crutch, 2006; Crutch, Ridha, & Warrington, 2006; Crutch & Warrington, 2005; Warrington & Crutch, 2007; Crutch & Warrington, 2005; Duñabeitia et al., 2009 )

www.elsemantico.com

Associative difficulty for LSA

www.elsemantico.com

A

B

C

D

E

F

G

Main sense The other sense

Penalization

The similaritydecreases Polysemy

Monosemy

Monosemy

2. Data with LDT

Polysemy advantage (Hino & Lupker, 1996; Pexman & Lupker, 1999)

Concrete advantageThe response to LDT is task dependentIt is favored by unspecific activation mediated by semantic representations (Piercey & Joordens, 2000)

www.elsemantico.com

LDT with LSA

Ambiguous vectors have difficulty with main relations but an advantage with other relations.This is due to the distributional properties of the vectors.This advantage with other relations can produce non-specific activation, improving LDT response times

www.elsemantico.com

LDT with LSA

www.elsemantico.com

A

B

C

D

E

F

Advantage with other relationsgenerates non-specific activation

Penalization with other wordswith different senses

Polysemy

Polysemy

Monosemy

HypothesesA vector model like LSA can simulate the associative difficulty

In polysemic wordsIn abstract words

A vector model can simulate the response to LDT in polysemicwordsSince responses to LDT are the opposite in Abstract words (to polysemic). A linguistic model like LSA cannot simulate the response of abstract words to LDT without introducing another source of activation (for example mental imagery).

www.elsemantico.com

Procedure

Three simulations1. Introducing an invented word into semantic space at

different points along the polysemy-monosemycontinuum.

2. With real polysemic-monosemic words

3. With real abstract-concrete words

www.elsemantico.com

The analysis

Extract semantic neighbors and analyze the similarities (cosines) between them and the reference word.Analyze Context Diversity within the vector with ranges and entropy measures.

www.elsemantico.com

First simulationLSA is trained with LEXESP corpus (Sebastián, Cuetos, Carreiras & Martí, 2000)

An artificial nonsense word “noray” is introduced into the space in documents with two possible senses represented in different proportions: the sense of “traditional sport” and sense of “old neighborhood”The proportions of the two senses were

30-0 (monosemy), 25-5 (dominant polysemy)15-15 (equally probable polysemy)5-25 (dominant polysemy)0-30 (monosemy)

www.elsemantico.com

Results(I)

0.3

0.35

0.4

0.45

0.5

0.55

0.6

Series1 0.5918 0.52358 0.44215 0.4206 0.42179 0.47611 0.47638

0-30 5-25 10-20 15-15 20-10 25-5 30-0

Mean of the 30 first semantic neighbors

The penalization exists for polysemic conditions

www.elsemantico.com

Results(I)

0

0,05

0,1

0,15

0,2

0,25

0,3

0,35

0,41

180

359

538

717

896

1075

1254

1433

1612

1791

1970

2149

2328

2507

2686

2865

3044

3223

3402

3581

3760

3939

0-305-2510-2015-1520-1025-530-0

Ranges of the 4000 first semantic neighbors

The pattern changed around neighbor 350 (cosine 0.2).For the polysemic conditions, the penalization became advantageous from this point.

www.elsemantico.com

Results(I)

-0,1

-0,08

-0,06

-0,04

-0,02

0

0,02

0,04

0,06

0,08

0,1

1 15 29 43 57 71 85 99 113 127 141 155 169 183 197 211 225 239

0-3015-1530-0

Scores on the dimension of the vectors in the three main conditions

The effect of the penalization and the change of pattern may be due to the distributional properties of the vectors.The 15-15 condition was spread along all dimensions.

www.elsemantico.com

Results(I)

1st, 2nd and 3rd position in the saturations of dimensions

The position of polysemic condition (15-15) were often medium (scoring low on all the dimensions).

www.elsemantico.com

Second simulation

Same as first simulation but with real polysemicand monosemic words extracted from other studies (Estévez, 1991; Jorge-Botana, León & Olmos, 2010).

Controlled for frequency, concreteness and imaginability.

www.elsemantico.com

Results (II)

0

0.1

0.2

0.3

0.4

0.5

0.6

1

153

305

457

609

761

913

1065

1217

1369

1521

1673

1825

1977

2129

2281

2433

2585

2737

2889

3041

3193

3345

3497

3649

3801

3953

4105

4257

4409

4561

4713

4865

MonosemyPolysemy

www.elsemantico.com

Ranges of the first 5000 semantic neighbors

Monosemy had an advantage for the first relations.The pattern changes around neighbor 550 (cosine 0.2) in favour of polysemic.

Results(II)

GrupomonosemyPolisemy

Entro

pía

0,80

0,70

0,60

0,50

0,40

0,30

87

Global weight

Polysemy Monosemy

The polysemy condition have less global weight thus more entropyThe spread among different contexts for polysemic words is a substantive fact.

www.elsemantico.com

Third simulation

Same as second simulation but with real abstract and concrete words extracted from other studies (Duñabeitia et al., 2009).

www.elsemantico.com

Results(III)

0

0.1

0.2

0.3

0.4

0.5

0.6

1

242

483

724

965

1206

1447

1688

1929

2170

2411

2652

2893

3134

3375

3616

3857

4098

4339

4580

4821

AbstractConcrete

Ranges of the first 5000 semantic neighbors

Concrete words obtained an advantage for the first 400 relations.The pattern changed around neighbor 400 (cosine 0.2) in favor ofabstract words.

www.elsemantico.com

Results(III)

GrupoConcreteAbstract

Entro

pía

0,80

0,70

0,60

0,50

0,40

0,30

Global weight

Abtracts Concretes

The abstract condition had less global weight thus more entropy.The spread among different contexts for abstract words is a substantive fact.

www.elsemantico.com

Preliminary conclusionsSo, what can a model like LSA tell us about the empirical effects in question?

A model of a unique representation

An exclusively linguistic model.

www.elsemantico.com

Preliminary conclusions (I)

In polysemy and in abstractness, associative difficulty could be explained by the linguistic distributional properties of a unique representation.

www.elsemantico.com

Preliminary conclusions (II)

In polysemy, LDT responses could be explained by the unspecific activation produced by the linguistic distributional properties of a unique representation.

www.elsemantico.com

Preliminary conclusions (III)

In the Abstract word condition, LDT responses couldn’t be explained by the linguistic distributional properties. Another source of potential activation is required.

(perceptual representations?)

www.elsemantico.com

Preliminary conclusions (IV)

In abstract word condition:No additional source other than linguistic is needed as an explanation to account for associative difficulty.No differences in cerebral activation between concrete and abstract in semantic tasks (Pexman et al., 2007).

Another source of activation is needed for LDT responses.Differences in cerebral activation between concrete and abstract in LDT (Binder, Westbury, et al, 2005).

www.elsemantico.com

Thank you

www.elsemantico.com

Monitoring the penalization/advantage of lexical ambiguity in vector model representations

Guillermo Jorge-Botana, Ricardo Olmos & José A. León

Universidad Autónoma de Madrid

Technology

Monitoring the penalization/advantage of lexical ambiguity in vector model representations