Introduction to word embeddings
Pavel Kalaidin@facultyofwonder
Moscow Data Fest, September, 12th, 2015
distributional hypothesis
лойс
"годно, лойс" (nice, лойс)
"лойс за песню" (лойс for the song)
"из принципа не поставлю лойс" (I won't give a лойс on principle)
"взаимные лойсы" (mutual лойсы)
"лойс, если согласен" (лойс if you agree)
What is the meaning of лойс?
кек
"кек, что ли?" (кек, really?)
"кек)))))))"
"ну ты кек" (well, you are such a кек)
What is the meaning of кек?
vector representations of words
a simple and flexible platform for understanding text (and probably not messing up)
one-hot encoding?
[1 0 0 0 0 0 ... 0 0] — a vector of vocabulary size with a single 1 and zeros everywhere else
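A minimal sketch of one-hot encoding (the toy vocabulary and the numpy choice are mine, not from the talk):

import numpy as np

# toy vocabulary, purely for illustration
vocab = ["кек", "лойс", "песня", "годно"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # a |V|-dimensional vector with a single 1 at the word's index
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("лойс"))  # [0. 1. 0. 0.]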
co-occurrence matrix
recall: word-document co-occurrence matrix for LSA
credits: [x]
from the entire document to a context window (length 5-10)
still seems suboptimal -> big, sparse, etc.
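A rough sketch of collecting window-based co-occurrence counts (the helper name and toy sentences are assumptions for illustration):

from collections import defaultdict

def cooccurrence_counts(sentences, window=5):
    # count how often each pair of words appears within `window` tokens of each other
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1.0
    return counts

sentences = [["годно", "лойс"], ["лойс", "за", "песню"]]
print(cooccurrence_counts(sentences, window=2))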
lower dimensions, we want dense vectors
(say, 25-1000)
How?
matrix factorization?
SVD of co-occurrence matrix
lots of memory?
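One way to get dense vectors from the counts is a truncated SVD; a sketch (scipy's svds and the random stand-in matrix are my assumptions):

import numpy as np
from scipy.sparse.linalg import svds

# stand-in for a |V| x |V| co-occurrence matrix
X = np.random.rand(1000, 1000)

# keep only the top-k singular directions as dense word vectors
U, S, Vt = svds(X, k=50)
word_vectors = U * S  # each row is a 50-dimensional word vector
print(word_vectors.shape)  # (1000, 50)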
idea: directly learn low-dimensional vectors
here comes word2vec
Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al: [paper]
idea: instead of capturing co-occurrence counts
predict surrounding words
Two models:
CBOW: predicting the word given its context
skip-gram: predicting the context given a word
Explained in great detail here, so we’ll skip it for now. Also see: word2vec Parameter Learning Explained, Rong [paper]
CBOW: several times faster than skip-gram, slightly better accuracy for frequent words
Skip-gram: works well with a small amount of data, represents rare words and phrases well
Examples?
W_woman − W_man = W_queen − W_king
classic example
<censored example>
word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method, Goldberg and Levy, 2014 [arxiv]
all done with gensim: github.com/piskvorky/gensim/
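A minimal gensim sketch (API as of the 2015-era versions; `size` became `vector_size` and `model.most_similar` moved to `model.wv.most_similar` in later releases; the toy sentences are made up):

from gensim.models import Word2Vec

# toy corpus of tokenized sentences; a real corpus would be much larger
sentences = [["годно", "лойс"], ["лойс", "за", "песню"], ["кек", "что", "ли"]]

# sg=0 -> CBOW, sg=1 -> skip-gram
model = Word2Vec(sentences, size=100, window=5, min_count=1, sg=1)

# nearest neighbours in the embedding space
print(model.most_similar("лойс", topn=3))

# the classic analogy needs a real corpus to actually work:
# model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)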
...failing to take advantage of the vast amount of repetition
in the data
so back to co-occurrences
GloVe for Global Vectors, Pennington et al, 2014: nlp.stanford.edu/pubs/glove.pdf
Ratios seem to cancel noise
The gist: model ratios with vectors
The model
Preserving linearity
Preventing mixing dimensions
Restoring symmetry, part 1
recall:
Restoring symmetry, part 2
Least squares problem it is now
SGD->AdaGrad
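The equations on the preceding slides did not survive extraction; as a reconstruction following the GloVe paper (with P_ik = X_ik / X_i the co-occurrence probability), the derivation proceeds along these lines:

F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}                        % model ratios with vectors
F(w_i - w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}                       % preserving linearity
F\big((w_i - w_j)^\top \tilde{w}_k\big) = \frac{P_{ik}}{P_{jk}}         % preventing mixing dimensions
w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}                  % F = exp, biases restore symmetry
J = \sum_{i,j=1}^{|V|} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2   % weighted least squares, trained with AdaGrad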
ok, Python code
glove-python: github.com/maciejkula/glove-python
two sets of vectors: input and context, plus biases
average/sum/drop
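A sketch of the same pipeline with glove-python, following its README as I recall it (treat the exact call names as assumptions; the sentences are a toy corpus):

from glove import Corpus, Glove

sentences = [["годно", "лойс"], ["лойс", "за", "песню"]]  # toy corpus

# build the co-occurrence matrix with a context window
corpus = Corpus()
corpus.fit(sentences, window=5)

# fit GloVe vectors on that matrix
glove = Glove(no_components=100, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=10, no_threads=2)
glove.add_dictionary(corpus.dictionary)

print(glove.most_similar("лойс"))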
complexity: |V|² in the worst case
complexity: |C|^0.8 in practice
Evaluation: it works
[plot: hashtag embeddings for #spb #gatchina #msk #kyiv #minsk #helsinki]
Compared to word2vec
[plot: the same hashtags, word2vec vectors]
t-SNE:github.com/oreillymedia/t-SNE-tutorial
seaborn:stanford.edu/~mwaskom/software/seaborn/
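A rough sketch of the visualization step (scikit-learn's TSNE plus matplotlib, rather than the tutorial code linked above; the random vectors stand in for real hashtag embeddings):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

labels = ["#spb", "#gatchina", "#msk", "#kyiv", "#minsk", "#helsinki"]
word_vectors = np.random.rand(len(labels), 100)  # stand-in for real embeddings

# project to 2D; perplexity must be smaller than the number of points
coords = TSNE(n_components=2, perplexity=3).fit_transform(word_vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), label in zip(coords, labels):
    plt.annotate(label, (x, y))
plt.show()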
Abusing models
music playlists: github.com/mattdennewitz/playlist-to-vec
user interests
Paragraph vectors: cs.stanford.edu/~quocle/paragraph_vector.pdf
predicting hashtags
interesting read: #TAGSPACE: Semantic Embeddings from Hashtags [link]
RusVectōrēs: distributional semantic models for Russian: ling.go.mail.ru/dsm/en/
corpus matters
building block for bigger models ╰(*´︶`*)╯
</slides>