
From Word Embeddings To Document Distances

Matt J. Kusner1, Yu Sun1,2, Nicholas I. Kolkin1, Kilian Q. Weinberger1,2

1Department of Computer Science & Engineering, Washington University in St. Louis, USA
2Department of Computer Science, Cornell University, USA

[Figure: two documents in a word embedding space R^d. Document d ("Obama speaks ... Illinois") and document d' ("President greets press Chicago") each place nBOW weight 1/4 on their content words (Obama, speaks, Illinois; President, greets, press, Chicago); arrows indicate the transport between word pairs.]


Word Embedding & Document Vectors
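The nBOW weights in the figure (1/4 per remaining word) come from simple count normalization. A minimal sketch, assuming Python with numpy and using the paper's two example sentences after stopword removal:

```python
import numpy as np

def nbow(words, vocab):
    """Normalized bag-of-words: raw counts divided by the total count."""
    counts = np.array([words.count(w) for w in vocab], dtype=float)
    return counts / counts.sum()

# The paper's example sentences after stopword removal; each of the
# four remaining words gets nBOW weight 1/4.
doc = ["Obama", "speaks", "media", "Illinois"]
doc_prime = ["President", "greets", "press", "Chicago"]
vocab = sorted(set(doc) | set(doc_prime))

d = nbow(doc, vocab)            # 0.25 on each word of doc, 0 elsewhere
d_prime = nbow(doc_prime, vocab)
```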

The Word Mover's Distance

WMD Optimization:

$$\min_{T \geq 0} \sum_{i,j=1}^{n} T_{ij}\, \|x_i - x_j\|_2 \quad \text{s.t.} \quad \sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i, \qquad \sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j$$

The word2vec model [Mikolov et al., 2013] embeds words into R^d.

Unlike [Collobert & Weston, 2008] and [Mnih & Hinton, 2009], word2vec is not deep!

Trained on the Google News corpus (100 billion words); 3 million different words embedded.

Embeds billions of words per hour on a desktop computer.

How can we leverage this high-quality word embedding to compute document distances?

Learns complex word relationships:
vec(Japan) - vec(sushi) + vec(Germany) ≈ vec(bratwurst)
vec(Einstein) - vec(scientist) + vec(Picasso) ≈ vec(painter)
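These analogies can be checked directly with gensim's KeyedVectors. A sketch, assuming gensim is installed and the pretrained Google News vectors have been downloaded locally (the file name below is an assumption; adjust to your copy):

```python
from gensim.models import KeyedVectors

# Path/name of the pretrained vectors is an assumption.
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# vec(Japan) - vec(sushi) + vec(Germany) ~ vec(bratwurst)
print(kv.most_similar(positive=["Japan", "Germany"],
                      negative=["sushi"], topn=3))

# vec(Einstein) - vec(scientist) + vec(Picasso) ~ vec(painter)
print(kv.most_similar(positive=["Einstein", "Picasso"],
                      negative=["scientist"], topn=3))
```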

T_ij: 'transport' between word i in d and word j in d'
x_i: word embedding for word i
d_i: normalized bag-of-words (nBOW) weight for word i in the first document d
d'_j: nBOW weight for word j in the second document d'
n: number of words in the dictionary

In computer vision this is known as the Earth Mover's Distance (EMD) [Rubner et al., 1998].
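The optimization above is just a linear program over the flattened transport matrix T. A minimal sketch, assuming numpy/scipy (not the paper's solver; dedicated EMD solvers are much faster):

```python
import numpy as np
from scipy.optimize import linprog

def wmd(X1, X2, d, d_prime):
    """Word Mover's Distance between two nBOW documents.

    X1: (n1, dim) embeddings of the words in d; X2: (n2, dim) for d';
    d, d_prime: nBOW weight vectors, each summing to 1.
    """
    n1, n2 = len(d), len(d_prime)
    # Cost c(i, j) = ||x_i - x_j||_2, flattened row-major to match T.
    cost = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2).ravel()

    # Row constraints: sum_j T_ij = d_i  (for all i)
    A_rows = np.zeros((n1, n1 * n2))
    for i in range(n1):
        A_rows[i, i * n2:(i + 1) * n2] = 1.0
    # Column constraints: sum_i T_ij = d'_j  (for all j)
    A_cols = np.zeros((n2, n1 * n2))
    for j in range(n2):
        A_cols[j, j::n2] = 1.0

    res = linprog(cost,
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([d, d_prime]),
                  bounds=(0, None))
    return res.fun
```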

Prior Art

[Figure: prior document representations. Bag-of-words and TF-IDF vectors over words such as 'campaign', 'Obama', 'speech', 'Washington' (raw counts like 3, 0, 0, 1, ...; TF-IDF weights like 0.01, 0, ..., 0.2, ...). LSI and LDA map documents to topic distributions over topics such as sports, politics, music, and the Civil War; a politics topic is shown as a distribution over words such as 'Obama', 'speech', 'Washington', 'football', 'Madonna', 'Vicksburg', 'guitar', 'soccer'.]

[Salton & Buckley, 1988] [Deerwester et al., 1990] [Blei et al., 2003]
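For concreteness, a small sketch of the count-based representations above, assuming scikit-learn (LSI/LDA would then be fit on top of these matrices):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["Obama speaks to the media in Illinois",
        "The President greets the press in Chicago"]

bow = CountVectorizer().fit_transform(docs)     # raw word counts
tfidf = TfidfVectorizer().fit_transform(docs)   # TF-IDF reweighted counts
print(bow.toarray())
print(tfidf.toarray().round(2))
```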

k-Nearest Neighbor Results

[Bar chart: average kNN test error relative to BOW across 8 datasets, for BOW, TF-IDF, Okapi BM25, LSI, LDA, mSDA, CCG, and WMD; bar heights 1.29, 1.15, 1.0, 0.72, 0.60, 0.55, 0.49, and 0.42, with WMD lowest at 0.42.]

[Bar chart: average kNN error of WMD under different word embeddings. HLBL: [Mnih & Hinton, 2009]; CW: [Collobert & Weston, 2008]; W2V: [Mikolov et al., 2013].]

Approximations

1. The Word Centroid Distance

Exact WMD costs $O(n^3 \log n)$. The centroid approximation

$$\mathrm{WCD}(d, d') := \|Xd - Xd'\|_2$$

costs only $O(nd)$, where $X$ is the matrix of all word embeddings.
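A one-line realization of WCD under the same nBOW setup as before, assuming numpy (here each document keeps its own embedding matrix, which is equivalent to $Xd$ over the full dictionary since the other weights are zero):

```python
import numpy as np

def wcd(X1, X2, d, d_prime):
    """Word Centroid Distance: distance between nBOW-weighted centroids.

    Equivalent to ||X d - X d'||_2 with X the full embedding matrix.
    """
    return np.linalg.norm(d @ X1 - d_prime @ X2)
```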

Approximation Results

2. The Relaxed Word Mover's Distance

1. The Word Centroid Distance

$$\mathrm{RWMD}(d, d') := \max\left\{ \sum_{i,j=1}^{n} T^{*}_{1,ij} \|x_i - x_j\|_2,\;\; \sum_{i,j=1}^{n} T^{*}_{2,ij} \|x_i - x_j\|_2 \right\}$$

computable in $O(n^2 d)$.

Here $T^{*}_{1}$ and $T^{*}_{2}$ solve relaxed versions of the WMD linear program

$$\min_{T \geq 0} \sum_{i,j=1}^{n} T_{ij}\, \|x_i - x_j\|_2 \quad \text{s.t.} \quad \sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i, \qquad \sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j$$

in which only one of the two constraint sets is kept: $T^{*}_{1}$ keeps only $\sum_{j} T_{ij} = d_i \;\forall i$, and $T^{*}_{2}$ keeps only $\sum_{i} T_{ij} = d'_j \;\forall j$. Each relaxed optimum is a lower bound on WMD, so their maximum is a tighter lower bound (see the sketch below).
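With only one constraint set kept, the relaxed LP has a closed-form optimum: each word optimally ships all of its mass to its nearest neighbor in the other document. A sketch under the same assumptions as the WMD code above:

```python
import numpy as np

def rwmd(X1, X2, d, d_prime):
    """Relaxed WMD: max of the two one-sided relaxed optima.

    With one constraint set dropped, each word sends all of its nBOW
    mass to the closest word in the other document.
    """
    # Pairwise distances ||x_i - x_j||_2, shape (n1, n2).
    dist = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)
    lower1 = d @ dist.min(axis=1)        # keep only sum_j T_ij = d_i
    lower2 = d_prime @ dist.min(axis=0)  # keep only sum_i T_ij = d'_j
    return max(lower1, lower2)
```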

[Figure: the constraint polytope of the WMD linear program, with transport costs $\|x_i - x_j\|_2$ for all $(i, j)$; the optimal $T^{*}$ lies at a vertex.]

d: dimensionality of the word embeddings
X: the d × n matrix of all word embeddings

Approximation Results

[Bar chart: average kNN error w.r.t. BOW for WCD, RWMD_c1, RWMD_c2, RWMD, and WMD; bar heights 0.93, 0.56, 0.54, 0.45, and 0.42, with full WMD lowest.]

[Bar chart: average distance w.r.t. WMD for a random test input; WCD about 0.30, the relaxed lower bounds using T*_1, T*_2, and their max (RWMD) about 0.92, 0.92, and 0.96, and WMD itself 1.0.]

References
1. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In NIPS, pp. 3111–3119, 2013.
2. Collobert, R. and Weston, J. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML, pp. 160–167. ACM, 2008.
3. Mnih, A. and Hinton, G. E. A scalable hierarchical distributed language model. In NIPS, pp. 1081–1088, 2009.
4. Rubner, Y., Tomasi, C., and Guibas, L. J. A metric for distributions with applications to image databases. In ICCV, pp. 59–66. IEEE, 1998.
5. Salton, G. and Buckley, C. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513–523, 1988.
6. Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., and Harshman, R. A. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
7. Blei, D. M., Ng, A. Y., and Jordan, M. I. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
8. Gardner, J., Kusner, M. J., Xu, E., Weinberger, K. Q., and Cunningham, J. Bayesian optimization with inequality constraints. In ICML, pp. 937–945, 2014.

We tuned all hyperparameters for all methods with Bayesian optimization (BO) [Gardner et al., 2014].
