15
Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets Distributed word representations: Vector comparison Christopher Potts Stanford Linguistics CS224u: Natural language understanding 1 / 11

Distributed word representations: Vector comparison

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Distributed word representations:Vector comparison

Christopher Potts

Stanford Linguistics

CS224u: Natural language understanding

1 / 11

Page 2: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Running example

dx dy

A 2 4B 10 15C 14 10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

(10,15)B

(2,4)A

(14,10)C

• Focus on distance measures• Illustrations with row vectors

2 / 11

Page 3: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

EuclideanBetween vectors u and v of dimension n:

euclidean(u,v) =

n∑

i=1

|ui − vi|2

dx dy

A 2 4B 10 15C 14 10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

(10,15)B

(2,4)A

(14,10)C

10 − 142 + 15 − 102 = 6.4

2 − 102 + 4 − 152 = 13.6

3 / 11

Page 4: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Length normalization

Given a vector u of dimension n, the L2-length of u is

||u||2 =

n∑

i=1

u2i

and the length normalization of u is�

u1

||u||2,

u2

||u||2, · · · ,

un

||u||2

4 / 11

Page 5: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Length normalization

dx dy ||u||2A 2 4 4.47B 10 15 18.03C 14 10 17.20

row L2 norm⇒

dx dy

A 24.47

44.47

B 1018.03

1518.03

C 1417.20

1017.20

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

(10,15)B

(2,4)A

(14,10)C

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

(0.55,0.83)B

(0.45,0.89)A

(0.81,0.58)C

4 / 11

Page 6: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Cosine distanceBetween vectors u and v of dimension n:

cosine(u,v) = 1−

∑ni=1 ui × vi

||u||2 × ||v||2

dx dy

A 2 4B 10 15C 14 10

5 / 11

Page 7: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Cosine distanceBetween vectors u and v of dimension n:

cosine(u,v) = 1−

∑ni=1 ui × vi

||u||2 × ||v||2

dx dy

A 2 4B 10 15C 14 10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

(10,15)B

(2,4)A

(14,10)C

1 −(10

× 14) + (15

× 10)

||10, 1

5||× ||14

, 10||= 0.

065

1−

(2× 1

0)+ (

4× 1

5)

||2, 4

|| ×||1

0, 1

5||=

0.00

8

5 / 11

Page 8: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Cosine distanceBetween vectors u and v of dimension n:

cosine(u,v) = 1−

∑ni=1 ui × vi

||u||2 × ||v||2

dx dy

A 2 4B 10 15C 14 10

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

(0.55,0.83)B

(0.45,0.89)A

(0.81,0.58)C

1 −(0.

55× 0.

81) + (0.

83× 0.

58)

||0.55

, 0.8

1||× ||0.

81, 0

.58||= 0.

065

1−

(0.4

0.55

) +(0

.89

×0.

83)

||0.4

5, 0

.89||

×||0

.55,

0.8

3||=

0.00

8

5 / 11

Page 9: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Matching-based methods

Matching coefficientmatching(u,v) =

n∑

i=1

min(ui,vi)

Jaccard distancejaccard(u,v) = 1−

matching(u,v)∑n

i=1 max(ui,vi)

Dice distancedice(u,v) = 1−

2×matching(u,v)∑n

i=1 ui + vi

Overlapoverlap(u,v) = 1−

matching(u,v)

min�∑n

i=1 ui ,∑n

i=1 vi�

6 / 11

Page 10: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

KL divergence and variantsKL divergenceBetween probability distributions p and q:

D(p ‖ q) =n∑

i=1

pi log�

pi

qi

p is the reference distribution. Before calculation, smooth byadding ε.

Symmetric KL

D(p ‖ q) +D(q ‖ p)

Jensen–Shannon distance√

√1

2D�

p ‖p+ q

2

+1

2D�

q ‖p+ q

2

7 / 11

Page 11: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Proper distance metric?To qualify as a distance metric, a vector comparison methodd has to be symmetric (d(x,y) = d(y,x)), assign 0 to identicalvectors (d(x,x) = 0), and satisfy the triangle inequality:

d(x, z) ¶ d(x,y) + d(y, z)

Cosine distance as I defined itdoesn’t satisfy this:

Distance metric?

Yes: Euclidean, Jaccard forbinary vectors, Jensen–Shannon,cosine as

cos−1�∑n

i=1ui × vi

||u||2×||v||2

π

No: Matching, Jaccard, Dice,Overlap, KL divergence,Symmetric KL

8 / 11

Page 12: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Comparing the two versions of cosineRandom sample of 100 vectors from our giga20 countmatrix. Correlation is 99.8.

0.2 0.3 0.4 0.5Improper cosine distance

0.175

0.200

0.225

0.250

0.275

0.300

0.325

0.350

Prop

er c

osin

e di

stan

ce

9 / 11

Page 13: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Relationships and generalizations

1. Euclidean, Jaccard, and Dice with raw count vectors willtend to favor raw frequency over distributional patterns.

2. Euclidean with L2-normed vectors is equivalent to cosinew.r.t. ranking.

3. Jaccard and Dice are equivalent w.r.t. ranking.

4. Both L2-norms and probability distributions can obscuredifferences in the amount/strength of evidence, whichcan in turn have an effect on the reliability of cosine,normed-euclidean, and KL divergence. Theseshortcomings might be addressed through weightingschemes.

10 / 11

Page 14: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Code snippetspbKn+Q/2knbQHp2/

J�`+? kj- kykR

(R), BKTQ`i QbBKTQ`i T�M/�b �b T/BKTQ`i pbK

(k), �"* 4 T/X.�i�6`�K2U(( kXy- 9Xy)-(RyXy- R8Xy)-(R9Xy- RyXy))- BM/2t4(^�^- ^"^- ^*^)- +QHmKMb4(^t^- ^v^)V

(j), pbKX2m+HB/2�MU�"*XHQ+(^�^)- �"*XHQ+(^"^)V

(j), RjXeyR9dy8y3dj8999

(9), pbKXp2+iQ`nH2M;i?U�"*XHQ+(^�^)V

(9), 9X9dkRj8N89NNN83

(8), pbKXH2M;i?nMQ`KU�"*XHQ+(^�^)VXp�Hm2b

(8), �``�vU(yX99dkRje - yX3N99kdRN)V

(e), pbKX+QbBM2U�"*XHQ+(^�^)- �"*XHQ+(^"^)V

(e), yXyyddkkRkjk3ejjkkeR

(d), pbKXK�i+?BM;U�"*XHQ+(^�^)- �"*XHQ+(^"^)V

(d), eXy

(3), pbKXD�++�`/U�"*XHQ+(^�^)- �"*XHQ+(^"^)V

(3), yXde

(N), .�h�n>PJ1 4 QbXT�i?XDQBMU^/�i�^- ^pbK/�i�^V

v2HT8 4 T/X`2�/n+bpUQbXT�i?XDQBMU.�h�n>PJ1- ^v2HTnrBM/Qr8@b+�H2/X+bpX;x^V- BM/2tn+QH4yV

R

11 / 11

Page 15: Distributed word representations: Vector comparison

Running example Euclidean Length norm Cosine Other methods Distance metric? Generalizations Code snippets

Code snippets( ),

(N), .�h�n>PJ1 4 QbXT�i?XDQBMU^/�i�^- ^pbK/�i�^V

v2HT8 4 T/X`2�/n+bpUQbXT�i?XDQBMU.�h�n>PJ1- ^v2HTnrBM/Qr8@b+�H2/X+bpX;x^V- BM/2tn+QH4yV

(Ry), pbKX+QbBM2Uv2HT8XHQ+(^;QQ/^)- v2HT8XHQ+(^2t+2HH2Mi^)V

(Ry), yXRRNd9kR89jdyy98R

(RR), pbKX+QbBM2Uv2HT8XHQ+(^;QQ/^)- v2HT8XHQ+(^#�/^)V

(RR), yXR9RR3k8jyjj3333Rd

(Rk), pbKXM2B;?#Q`bU^#�/^- v2HT8VX?2�/UV

(Rk), #�/ yXyyyyyymM7Q`imM�i2Hv yXRReR3jK2KQ`�#H2 yXRkyRdNē yXRkkyk9Q#pBQmbHv yXRkjRky/ivT2, 7HQ�ie9

(Rj), pbKXM2B;?#Q`bU^#�/^- v2HT8- /Bbi7mM+4pbKXD�++�`/VX?2�/UjV

(Rj), #�/ yXyyyyyyXX yX98k9kdi?Qm;? yX939keN/ivT2, 7HQ�ie9

k

11 / 11