
DF1 - Py - Kalaidin - Introduction to Word Embeddings with Python


Introduction to word embeddings

Pavel Kalaidin, @facultyofwonder

Moscow Data Fest, September 12th, 2015


distributional hypothesis


лойс


годно, лойс ("nice one, лойс")
лойс за песню ("лойс for the song")

из принципа не поставлю лойс ("I won't give a лойс, on principle")
взаимные лойсы ("mutual лойсы")

лойс, если согласен ("лойс if you agree")

What is the meaning of лойс?


кек


кек, что ли? ("кек, really?")
кек)))))))
ну ты кек ("you're such a кек")

What is the meaning of кек?


vector representations of words


simple and flexible platform for understanding text and probably not messing up


one-hot encoding?

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
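To make the sparsity concrete, here is a minimal sketch of one-hot encoding over a toy vocabulary (the vocabulary and words are made up for illustration):

import numpy as np

# toy vocabulary; a real one has tens or hundreds of thousands of words
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # a |V|-dimensional vector of zeros with a single 1 at the word's index
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("cat"))  # [0. 1. 0. 0. 0. 0.]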


co-occurrence matrix

recall: word-document co-occurrence matrix for LSA


from the entire document to a context window (length 5-10)
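A minimal sketch of building such a window-based word-word co-occurrence matrix (the corpus and window size are toy values for illustration):

from collections import defaultdict

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2  # symmetric context window

cooc = defaultdict(float)
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[(word, sentence[j])] += 1.0

print(cooc[("cat", "sat")])  # how often "sat" falls inside "cat"'s window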


still seems suboptimal -> big, sparse, etc.


lower dimensions, we want dense vectors

(say, 25-1000)


How?


matrix factorization?


SVD of co-occurrence matrix
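For illustration, a minimal sketch of getting dense vectors via truncated SVD; this uses scipy and random data as a stand-in for a real co-occurrence matrix, and the dimensions are toy values:

import numpy as np
from scipy.sparse.linalg import svds

X = np.random.rand(1000, 1000)  # stand-in for a |V| x |V| co-occurrence matrix

k = 100                   # target dimensionality
U, S, Vt = svds(X, k=k)   # keep only the top-k singular values/vectors
word_vectors = U * S      # one dense k-dimensional vector per word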


lots of memory?


idea: directly learn low-dimensional vectors


here comes word2vec

Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al: [paper]


idea: instead of capturing co-occurrence counts

predict surrounding words


Two models:

CBOW: predicting the word given its context

skip-gram: predicting the context given a word

Explained in great detail here, so we’ll skip it for now. Also see: word2vec Parameter Learning Explained, Rong [paper]


CBOW: several times faster than skip-gram, slightly better accuracy for frequent words

skip-gram: works well with a small amount of data, represents rare words and phrases well
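Both architectures are one switch away in gensim; a minimal sketch with a toy corpus and illustrative hyperparameters (note: the `size` argument was renamed `vector_size` in gensim 4):

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

cbow = Word2Vec(sentences, size=100, window=5, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(sentences, size=100, window=5, min_count=1, sg=1)  # skip-gram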


Examples?


W_woman - W_man = W_queen - W_king

classic example
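With gensim this analogy is a single query; a sketch assuming `model` is a hypothetical Word2Vec instance trained on a large corpus (the toy corpus above is far too small for this to work):

# king - man + woman should land near queen
result = model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to be close to ("queen", <similarity>)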


<censored example>


word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method, Goldberg et al, 2014 [arxiv]


all done with gensim: github.com/piskvorky/gensim/


...failing to take advantage of the vast amount of repetition in the data


so back to co-occurrences


GloVe for Global Vectors, Pennington et al., 2014: nlp.stanford.edu/pubs/glove.pdf


Ratios seem to cancel noise


The gist: model ratios with vectors
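For context (this is from the GloVe paper, not recovered from the slide): the starting point is a function F of word vectors that matches ratios of co-occurrence probabilities,

F(w_i, w_j, \tilde{w}_k) = \frac{P_{ik}}{P_{jk}}, \qquad P_{ik} = \frac{X_{ik}}{X_i}

where X_ik is the co-occurrence count of words i and k, and X_i = \sum_k X_ik.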


The model


Preserving linearity


Preventing mixing dimensions


Restoring symmetry, part 1


recall:


Restoring symmetry, part 2


Least squares problem it is now
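The resulting weighted least-squares objective (as written in the GloVe paper; the equation slide itself was not recovered) is

J = \sum_{i,j=1}^{|V|} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

with a weighting function f that caps the influence of very frequent co-occurrences.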


SGD->AdaGrad


ok, Python code


glove-python: github.com/maciejkula/glove-python
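A minimal sketch of the glove-python workflow, roughly following the project's README; the corpus and hyperparameters here are illustrative:

from glove import Corpus, Glove

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

corpus = Corpus()
corpus.fit(sentences, window=10)          # build the co-occurrence matrix

glove = Glove(no_components=100, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=4)
glove.add_dictionary(corpus.dictionary)   # attach the word -> id mapping

glove.most_similar("cat")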


two sets of vectors: input and context + bias

average/sum/drop


complexity: |V|^2 (|V| = vocabulary size)


complexity: |C|^0.8 (|C| = corpus size)


Evaluation: it works


#spb #gatchina #msk #kyiv #minsk #helsinki


Compared to word2vec


#spb #gatchina #msk #kyiv #minsk #helsinki


Abusing models


music playlists: github.com/mattdennewitz/playlist-to-vec
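The trick is to treat each playlist as a sentence and each track as a word; a minimal sketch with made-up track ids (same gensim-3-style arguments as above):

from gensim.models import Word2Vec

playlists = [["track_42", "track_7", "track_13"],
             ["track_7", "track_99", "track_42"]]

model = Word2Vec(playlists, size=50, window=5, min_count=1, sg=1)
model.wv.most_similar("track_42")  # tracks that appear in similar playlists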


deep walk: DeepWalk: Online Learning of Social Representations [link]


predicting hashtags; interesting read: #TAGSPACE: Semantic Embeddings from Hashtags [link]


RusVectōrēs: distributional semantic models for Russian: ling.go.mail.ru/dsm/en/


corpus matters


building block for bigger models ╰(*´︶`*)╯


</slides>