Word Meaning and Similarity Word Similarity: Distributional Similarity (I)

Source: faculty.cse.tamu.edu/huangrh/Fall17/l13_2_sparse_vectors.pdf


Page 1:

Word Meaning and Similarity

Word Similarity: Distributional Similarity (I)

Page 2:

Problems with thesaurus-based meaning

• We don't have a thesaurus for every language
• Even if we do, they have problems with recall

• Many words are missing
• Most (if not all) phrases are missing
• Some connections between senses are missing
• Thesauri work less well for verbs, adjectives
• Adjectives and verbs have less structured hyponymy relations

Page 3:

Distributional models of meaning

• Also called vector-space models of meaning
• Offer much higher recall than hand-built thesauri

• Although they tend to have lower precision

• Zellig Harris (1954): "oculist and eye-doctor … occur in almost the same environments. … If A and B have almost identical environments we say that they are synonyms."

• Firth (1957): "You shall know a word by the company it keeps!"


Page 4:

Intuition of distributional word similarity

• Nida example:
  A bottle of tesgüino is on the table
  Everybody likes tesgüino
  Tesgüino makes you drunk
  We make tesgüino out of corn.

• From context words humans can guess tesgüino means
• an alcoholic beverage like beer

• Intuition for algorithm:
• Two words are similar if they have similar word contexts.

Page 5:

Reminder: Term-document matrix

• Each cell: count of term t in a document d: tf_{t,d}
• Each document is a count vector in ℕ^|V|: a column below

             As You Like It   Twelfth Night   Julius Caesar   Henry V
battle                    1               1               8        15
soldier                   2               2              12        36
fool                     37              58               1         5
clown                     6             117               0         0

Page 6:

Reminder: Term-document matrix

• Two documents are similar if their vectors are similar

             As You Like It   Twelfth Night   Julius Caesar   Henry V
battle                    1               1               8        15
soldier                   2               2              12        36
fool                     37              58               1         5
clown                     6             117               0         0
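To make this concrete, here is a minimal NumPy sketch (not from the slides; the variable names and the cosine helper are my own) that stores the counts above and compares document columns. Cosine similarity is only introduced later in the deck, so treat it here as just one possible notion of "similar vectors".

```python
# Sketch: the Shakespeare term-document matrix as a NumPy array.
# Rows are terms, columns are documents; each cell is a raw count tf_{t,d}.
import numpy as np

terms = ["battle", "soldier", "fool", "clown"]
docs = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]

counts = np.array([
    [ 1,   1,  8, 15],   # battle
    [ 2,   2, 12, 36],   # soldier
    [37,  58,  1,  5],   # fool
    [ 6, 117,  0,  0],   # clown
])

def cosine(a, b):
    """One way to compare count vectors (the deck returns to cosine later)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Each document is a column; the two comedies end up closer to each other
# than either is to a history play.
print(cosine(counts[:, 0], counts[:, 1]))   # As You Like It vs. Twelfth Night
print(cosine(counts[:, 2], counts[:, 3]))   # Julius Caesar vs. Henry V
print(cosine(counts[:, 0], counts[:, 3]))   # As You Like It vs. Henry V
```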

Page 7:

The words in a term-document matrix

• Each word is a count vector in ℕ^|D|: a row below

             As You Like It   Twelfth Night   Julius Caesar   Henry V
battle                    1               1               8        15
soldier                   2               2              12        36
fool                     37              58               1         5
clown                     6             117               0         0

Page 8:

The words in a term-document matrix

• Two words are similar if their vectors are similar

             As You Like It   Twelfth Night   Julius Caesar   Henry V
battle                    1               1               8        15
soldier                   2               2              12        36
fool                     37              58               1         5
clown                     6             117               0         0

Page 9:

The Term-Context matrix

• Instead of using entire documents, use smaller contexts
• Paragraph
• Window of 10 words

• A word is now defined by a vector over counts of context words

Page 10:

Sample contexts: 20 words (Brown corpus)

• equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,

• on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of

• of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of

• substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this

Page 11:

Term-context matrix for word similarity

• Two words are similar in meaning if their context vectors are similar

             aardvark   computer   data   pinch   result   sugar   …
apricot             0          0      0       1        0       1
pineapple           0          0      0       1        0       1
digital             0          2      1       0        1       0
information         0          1      6       0        4       0
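A matrix like the one above can be collected automatically by sliding a window over a corpus. A rough sketch (the function name, window size, and toy sentence are illustrative, not from the slides):

```python
# Sketch: count context words within +/- `window` tokens of each word.
from collections import Counter, defaultdict

def term_context_counts(tokens, window=10):
    """Return {word: Counter of context words} for a tokenized corpus."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts

# Toy text in the spirit of the Brown-corpus contexts shown earlier.
tokens = ("a tablespoonful of apricot preserve or jam a pinch each "
          "of clove and nutmeg").split()
counts = term_context_counts(tokens, window=4)
print(counts["apricot"])   # words seen within 4 tokens of "apricot"
```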

Page 12:

Should we use raw counts?

• For the term-document matrix
• We used tf-idf instead of raw term counts

• For the term-context matrix
• Positive Pointwise Mutual Information (PPMI) is common

Page 13:

Pointwise Mutual Information

• Pointwise mutual information:
• Do events x and y co-occur more than if they were independent?

$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

• PMI between two words (Church & Hanks 1989):
• Do words x and y co-occur more than if they were independent?

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)\,P(word_2)}$$

• Positive PMI between two words (Niwa & Nitta 1994)
• Replace all PMI values less than 0 with zero

Page 14:

Computing PPMI on a term-context matrix

• Matrix F with W rows (words) and C columns (contexts)
• f_{ij} is the number of times w_i occurs in context c_j

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad
p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad
p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

$$pmi_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}} \qquad\qquad
ppmi_{ij} = \begin{cases} pmi_{ij} & \text{if } pmi_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$
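The definitions above translate almost line for line into code. A sketch (NumPy; the names are mine, not from the slides), applied to the apricot/pineapple/digital/information counts from the earlier term-context matrix; it reproduces the worked example on the next slides, e.g. pmi(information, data) ≈ 0.57.

```python
# Sketch: PPMI over a term-context count matrix F (W rows, C columns).
import numpy as np

def ppmi(F):
    total = F.sum()
    P = F / total                          # p_ij
    p_w = P.sum(axis=1, keepdims=True)     # p_i*  (row marginals)
    p_c = P.sum(axis=0, keepdims=True)     # p_*j  (column marginals)
    with np.errstate(divide="ignore"):     # log2(0) = -inf; clipped below
        pmi = np.log2(P / (p_w * p_c))
    return np.maximum(pmi, 0)              # keep only positive PMI

# Counts for the contexts (computer, data, pinch, result, sugar);
# rows: apricot, pineapple, digital, information.
F = np.array([
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [2, 1, 0, 1, 0],
    [1, 6, 0, 4, 0],
], dtype=float)

M = ppmi(F)
print(round(M[3, 1], 2))   # PPMI(information, data) ≈ 0.57
```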

Page 15:

p(w = information, c = data) = 6/19 = .32
p(w = information) = 11/19 = .58
p(c = data) = 7/19 = .37

p(w,context) and p(w):

             computer   data   pinch   result   sugar  |  p(w)
apricot          0.00   0.00    0.05     0.00    0.05  |  0.11
pineapple        0.00   0.00    0.05     0.00    0.05  |  0.11
digital          0.11   0.05    0.00     0.05    0.00  |  0.21
information      0.05   0.32    0.00     0.21    0.00  |  0.58

p(context)       0.16   0.37    0.11     0.26    0.11

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad
p(w_i) = \frac{\sum_{j=1}^{C} f_{ij}}{N} \qquad
p(c_j) = \frac{\sum_{i=1}^{W} f_{ij}}{N}$$

Page 16:

$$pmi_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}}$$

• pmi(information, data) = log2(.32 / (.37 × .58)) = .57

p(w,context) and p(w):

             computer   data   pinch   result   sugar  |  p(w)
apricot          0.00   0.00    0.05     0.00    0.05  |  0.11
pineapple        0.00   0.00    0.05     0.00    0.05  |  0.11
digital          0.11   0.05    0.00     0.05    0.00  |  0.21
information      0.05   0.32    0.00     0.21    0.00  |  0.58

p(context)       0.16   0.37    0.11     0.26    0.11

PPMI(w,context) (– marks a zero count, where PMI is undefined):

             computer   data   pinch   result   sugar
apricot             –      –    2.25        –    2.25
pineapple           –      –    2.25        –    2.25
digital          1.66   0.00       –     0.00       –
information      0.00   0.57       –     0.47       –

Page 17:

Weighing PMI

• PMI is biased toward infrequent events
• Various weighting schemes help alleviate this

• See Turney and Pantel (2010)

• Add-one smoothing can also help

Page 18:

Add-2 Smoothed Count(w,context):

             computer   data   pinch   result   sugar
apricot             2      2       3        2       3
pineapple           2      2       3        2       3
digital             4      3       2        3       2
information         3      8       2        6       2

p(w,context) [add-2] and p(w):

             computer   data   pinch   result   sugar  |  p(w)
apricot          0.03   0.03    0.05     0.03    0.05  |  0.20
pineapple        0.03   0.03    0.05     0.03    0.05  |  0.20
digital          0.07   0.05    0.03     0.05    0.03  |  0.24
information      0.05   0.14    0.03     0.10    0.03  |  0.36

p(context)       0.19   0.25    0.17     0.22    0.17

Page 19:

PPMI(w,context) [add-2]:

             computer   data   pinch   result   sugar
apricot          0.00   0.00    0.56     0.00    0.56
pineapple        0.00   0.00    0.56     0.00    0.56
digital          0.62   0.00    0.00     0.00    0.00
information      0.00   0.58    0.00     0.37    0.00

PPMI(w,context), unsmoothed:

             computer   data   pinch   result   sugar
apricot             –      –    2.25        –    2.25
pineapple           –      –    2.25        –    2.25
digital          1.66   0.00       –     0.00       –
information      0.00   0.57       –     0.47       –
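Read as code, add-2 smoothing just bumps every raw count by 2 before the PPMI computation. A small follow-up to the earlier ppmi sketch (assumes F and ppmi from that block):

```python
# Add-2 smoothing: increase every count, then recompute PPMI.
# Reproduces the table above, e.g. PPMI[add-2](information, data) ≈ 0.58.
M_add2 = ppmi(F + 2)
print(round(M_add2[3, 1], 2))
```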

Page 20:

Using syntax to define a word's context

• Zellig Harris (1968):
  "The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities"

• Two words are similar if they have similar parse contexts
• Duty and responsibility (Chris Callison-Burch's example)

Modified by adjectives:   additional, administrative, assumed, collective, congressional, constitutional, …
Objects of verbs:         assert, assign, assume, attend to, avoid, become, breach, …
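With a dependency parser, such parse contexts can be read directly off the arcs. A rough sketch using spaCy (assumes the en_core_web_sm model is installed; the sentence and the feature labels are my own illustrations, not from the slides):

```python
# Sketch: dependency-based context features such as ("duty", "object_of", "assume").
import spacy

nlp = spacy.load("en_core_web_sm")   # requires: python -m spacy download en_core_web_sm
doc = nlp("He assumed additional administrative duties.")

for token in doc:
    if token.dep_ == "dobj":     # noun used as the object of a verb
        print(token.lemma_, "object_of", token.head.lemma_)
    if token.dep_ == "amod":     # adjective modifying a noun
        print(token.head.lemma_, "modified_by", token.lemma_)
```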

Page 21:

Co-occurrence vectors based on syntactic dependencies

• The contexts C are different dependency relations
• Subject-of "absorb"
• Prepositional-object of "inside"

• Counts for the word cell:

Dekang Lin, 1998. "Automatic Retrieval and Clustering of Similar Words"

Page 22:

PMI applied to dependency relations

• "Drink it" is more common than "drink wine"
• But "wine" is a better "drinkable" thing than "it"

Object of "drink"   Count    PMI
it                      3    1.3
anything                3    5.2
wine                    2    9.3
tea                     2   11.8
liquid                  2   10.5

Hindle, Don. 1990. Noun Classification from Predicate-Argument Structure. ACL

The same objects, ranked by PMI:

Object of "drink"   Count    PMI
tea                     2   11.8
liquid                  2   10.5
wine                    2    9.3
anything                3    5.2
it                      3    1.3

Page 23:

Word Meaning and Similarity

Word Similarity: Distributional Similarity (I)

Page 24:

Word Meaning and Similarity

Word Similarity: Distributional Similarity (II)

Page 25:

Reminder: cosine for computing similarity

$$\cos(\vec{v},\vec{w}) = \frac{\vec{v}\cdot\vec{w}}{|\vec{v}|\,|\vec{w}|}
= \frac{\vec{v}}{|\vec{v}|}\cdot\frac{\vec{w}}{|\vec{w}|}
= \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^{2}}\,\sqrt{\sum_{i=1}^{N} w_i^{2}}}$$

(the numerator is the dot product; the middle form is the dot product of the two unit vectors)

v_i is the PPMI value for word v in context i; w_i is the PPMI value for word w in context i.

cos(v, w) is the cosine similarity of v and w

Sec. 6.3
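The formula is a one-liner in NumPy; a sketch (the function name is my own):

```python
# Sketch: cosine similarity between two context vectors (counts or PPMI values).
import numpy as np

def cosine(v, w):
    """cos(v, w) = (v . w) / (|v| |w|)."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
```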

Page 26:

Cosine as a similarity metric

• -1: vectors point in opposite directions
• +1: vectors point in the same direction
• 0: vectors are orthogonal

• Raw frequency or PPMI values are non-negative, so cosine ranges from 0 to 1

Page 27:

             large   data   computer
apricot          1      0          0
digital          0      1          2
information      1      6          1

Which pair of words is more similar?

$$\cos(\vec{v},\vec{w}) = \frac{\vec{v}\cdot\vec{w}}{|\vec{v}|\,|\vec{w}|}
= \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^{2}}\,\sqrt{\sum_{i=1}^{N} w_i^{2}}}$$

cosine(apricot, information) = (1 + 0 + 0) / (√(1 + 0 + 0) · √(1 + 36 + 1)) = 1/√38 = .16

cosine(digital, information) = (0 + 6 + 2) / (√(0 + 1 + 4) · √(1 + 36 + 1)) = 8/(√38 · √5) = .58

cosine(apricot, digital) = (0 + 0 + 0) / (√(1 + 0 + 0) · √(0 + 1 + 4)) = 0
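The same three values, checked with NumPy (a sketch; vector order follows the table columns large, data, computer):

```python
import numpy as np

# Rows of the table above, over the contexts (large, data, computer).
apricot     = np.array([1, 0, 0])
digital     = np.array([0, 1, 2])
information = np.array([1, 6, 1])

def cosine(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

print(round(cosine(apricot, information), 2))   # 0.16
print(round(cosine(digital, information), 2))   # 0.58
print(round(cosine(apricot, digital), 2))       # 0.0
```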

Page 28:

Other possible similarity measures

• KL divergence (D)

Page 29:

Word Meaning and Similarity

Word Similarity: Distributional Similarity (II)