61
Vector Semantics Dense Vectors

Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Vector Semantics

Dense Vectors

Page 2: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Sparse versus dense vectors

• PPMI vectors are• long (length |V|= 20,000 to 50,000)• sparse (most elements are zero)

• Alternative: learn vectors which are• short (length 200-1000)• dense (most elements are non-zero)

2

Page 3: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Sparse versus dense vectors

• Why dense vectors?• Short vectors may be easier to use as features in machine

learning (less weights to tune)• Dense vectors may generalize better than storing explicit counts• They may do better at capturing synonymy:

• car and automobile are synonyms; but are represented as distinct dimensions; this fails to capture similarity between a word with car as a neighbor and a word with automobile as a neighbor

3

Page 4: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Three methods for getting short dense vectors

• Singular Value Decomposition (SVD)• A special case of this is called LSA – Latent Semantic Analysis

• “Neural Language Model”-inspired predictive models• skip-grams and CBOW

• Brown clustering

4

Page 5: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Vector Semantics

Dense Vectors via SVD

Page 6: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Introduction to Vector

• 선형대수학(linear algebra)은벡터공간(vector space), 벡터공간사이의선형변환(linear mapping)을다루는학문이다.

• 선형대수학은데이터의표현, 분석및변환에서많이활용되고있으며, 수학적기술의편리함때문에많이사용되고있다.

• 선형대수학적기법은많은공학, 과학분야의기반도구로활용되고있다. 6

Page 7: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Introduction to Vector• 선형방정식 (linear equation) 변수(미지수)간의관계가

1차방정식으로표현

• 선형변환 (linear mapping, linear transformation) • 벡터공간간의변환이선형방정식으로표현

Page 8: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Introduction to Vector

• 행렬行列• matrix /matrices • 수, 기호, 수식을행(row)과열(column)이있는사각형배열에나타낸것

Page 9: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

행렬연산과역행렬

9

Page 10: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

전치행렬(transpose)

10

Page 11: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

역행렬(inverse matrix)

11

Page 12: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

역행렬(inverse matrix)

12

Page 13: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

역행렬(inverse matrix)

13

Page 14: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

행렬의분해

14

Page 15: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

벡터의내적(inner/dot product)

15

Page 16: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

벡터의크기(norm)

16

Page 17: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

직교벡터(orthogonal vector)

17

Page 18: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

직교벡터(orthogonal vector)

18

Page 19: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Eigenvalue와 Eigenvector

• Eigenvector(고유벡터)• n xn행렬 A에대하여 Ax = λx를만족하는영벡터(0)가아닌벡터• Scalar λ: eigenvalue(고유값)

• Eigenvalue와 Eigenvector찾기

19

Page 20: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

대각화와고유값분해

20

Page 21: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

고유값분해

21

Page 22: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

특이값(singular value)

22

Page 23: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

특이값분해(singular value decomposition)

23

Page 24: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

특이값분해(singular value decomposition)특이값 분해(SVD)는 임의의 m×n 행렬 A를 A = UΣVT 로 분해하는것으로, U와 V는 직교행렬이고, Σ 는 대각성분에 특이값을 갖는사각행렬이다. 행렬의 rank는 SVD에서 0이 아닌 특이값(singular value)의 개수와같다. SVD는 행렬에 대한 low-rank 근사를 통해 데이터를 압축하는데사용될 수 있다. SVD는 역행렬, pseudo-inverse 등의 계산을 쉽게 할 수 있게 한다. SVD는 데이터 압축, 영상 및 신호 처리, 행렬의 효과적인 연산 등다양한 분야에서 활용되고 있다.

24

Page 25: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Intuition• Approximate an N-dimensional dataset using fewer dimensions• By first rotating the axes into a new space• In which the highest order dimension captures the most

variance in the original dataset• And the next dimension captures the next most variance, etc.• Many such (related) methods:

• PCA – principle components analysis• Factor Analysis• SVD

25

Page 26: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Dimensionality reduction

26

Page 27: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Singular Value Decomposition

Any rectangular w x c matrix X equals the product of 3 matrices:W: rows corresponding to original but m columns represents a dimension in a new latent space, such that

• M column vectors are orthogonal to each other• Columns are ordered by the amount of variance in the dataset each new

dimension accounts for

S: diagonal m x m matrix of singular values expressing the importance of each dimension.C: columns corresponding to original but m rows corresponding to singular values27

Page 28: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Singular Value Decomposition

Landuaer and Dumais 199728

Page 29: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

SVD applied to term-document matrix:Latent Semantic Analysis

• If instead of keeping all m dimensions, we just keep the top k singular values. Let’s say 300.

• The result is a least-squares approximation to the original X• But instead of multiplying,

we’ll just make use of W.• Each row of W:

• A k-dimensional vector• Representing word W

k/

/k

/k

/k

Deerwester et al (1988)

29

Page 30: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

LSA more details

• 300 dimensions are commonly used• The cells are commonly weighted by a product of two weights

• Local weight: Log term frequency• Global weight: either idf or an entropy measure

30

Page 31: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Let’s return to PPMI word-word matrices

• Can we apply to SVD to them?

31

Page 32: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

SVD applied to term-term matrix

(I’m simplifying here by assuming the matrix has rank |V|)32

Page 33: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Truncated SVD on term-term matrix

33

Page 34: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Truncated SVD produces embeddings

• Each row of W matrix is a k-dimensional representation of each word w

• K might range from 50 to 1000• Generally we keep the top k dimensions,

but some experiments suggest that getting rid of the top 1 dimension or even the top 50 dimensions is helpful (Lapesaand Evert 2014).

34

Page 35: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Embeddings versus sparse vectors

• Dense SVD embeddings sometimes work better than sparse PPMI matrices at tasks like word similarity• Denoising: low-order dimensions may represent unimportant

information• Truncation may help the models generalize better to unseen data.• Having a smaller number of dimensions may make it easier for

classifiers to properly weight the dimensions for the task.• Dense models may do better at capturing higher order co-

occurrence. 35

Page 36: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Vector Semantics

Embeddings inspired by neural language models:

skip-grams and CBOW

Page 37: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky Prediction-based models:An alternative way to get dense vectors

• Skip-gram (Mikolov et al. 2013a) CBOW (Mikolov et al. 2013b)• Learn embeddings as part of the process of word prediction.• Train a neural network to predict neighboring words

• Inspired by neural net language models.• In so doing, learn dense embeddings for the words in the training corpus.

• Advantages:• Fast, easy to train (much faster than SVD)• Available online in the word2vec package• Including sets of pretrained embeddings!37

Page 38: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Skip-grams

• Predict each neighboring word • in a context window of 2C words • from the current word.

• So for C=2, we are given word wt and predicting these 4 words:

38

Page 39: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Skip-grams learn 2 embeddingsfor each w

input embedding v, in the input matrix W• Column i of the input matrix W is the 1×d

embedding vi for word i in the vocabulary.

output embedding v′, in output matrix W’• Row i of the output matrix W′ is a d × 1

vector embedding v′i for word i in the vocabulary.

39

Page 40: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Setup

• Walking through corpus pointing at word w(t), whose index in the vocabulary is j, so we’ll call it wj (1 < j < |V |).

• Let’s predict w(t+1) , whose index in the vocabulary is k (1 < k < |V |). Hence our task is to compute P(wk|wj).

40

Page 41: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Intuition: similarity as dot-productbetween a target vector and context vector

41

Page 42: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Similarity is computed from dot product

• Remember: two vectors are similar if they have a high dot product• Cosine is just a normalized dot product

• So:• Similarity(j,k) ∝ ck ∙ vj

• We’ll need to normalize to get a probability

42

Page 43: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Turning dot products into probabilities

• Similarity(j,k) = ck · vj

• We use softmax to turn into probabilities

43

Page 44: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Embeddings from W and W’

• Since we have two embeddings, vj and cj for each word wj• We can either:

• Just use vj

• Sum them• Concatenate them to make a double-length embedding

44

Page 45: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Learning

• Start with some initial embeddings (e.g., random)• iteratively make the embeddings for a word

• more like the embeddings of its neighbors • less like the embeddings of other words.

45

Page 46: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Visualizing W and C as a network for doing error backprop

46

Page 47: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

One-hot vectors

• A vector of length |V| • 1 for the target word and 0 for other words• So if “popsicle” is vocabulary word 5• The one-hot vector is• [0,0,0,0,1,0,0,0,0…….0]

47

Page 48: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Skip-gram h = vj

o = Chok = ckhok = ck∙vj

48

Page 49: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Skip-gram

49

Page 50: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Problem with the softamx

• The denominator: have to compute over every word in vocab

• Instead: just sample a few of those negative words

50

Page 51: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Goal in learning• Make the word like the context words

• We want this to be high:

• And not like k randomly selected “noise words”

• We want this to be low:51

Page 52: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Skipgram with negative sampling:Loss function

52

Page 53: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Relation between skipgrams and PMI!

• If we multiply WW’T

• We get a |V|x|V| matrix M , each entry mij corresponding to some association between input word i and output word j

• Levy and Goldberg (2014b) show that skip-gram reaches its optimum just when this matrix is a shifted version of PMI:

WW′T =MPMI −log k • So skip-gram is implicitly factoring a shifted version of the PMI

matrix into the two embedding matrices.53

Page 54: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Properties of embeddings

• Nearest words to some embeddings (Mikolov et al. 20131)

54

Page 55: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Embeddings capture relational meaning!

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’)vector(‘Paris’) - vector(‘France’) + vector(‘Italy’) ≈ vector(‘Rome’)

55

Page 56: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Vector Semantics

Brown clustering

Page 57: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Brown clustering

• An agglomerative clustering algorithm that clusters words based on which words precede or follow them

• These word clusters can be turned into a kind of vector• We’ll give a very brief sketch here.

57

Page 58: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Brown clustering algorithm

• Each word is initially assigned to its own cluster. • We now consider consider merging each pair of clusters. Highest

quality merge is chosen.• Quality = merges two words that have similar probabilities of preceding

and following words• (More technically quality = smallest decrease in the likelihood of the

corpus according to a class-based language model)

• Clustering proceeds until all words are in one big cluster.

58

Page 59: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Brown Clusters as vectors

• By tracing the order in which clusters are merged, the model builds a binary tree from bottom to top.

• Each word represented by binary string = path from root to leaf• Each intermediate node is a cluster • Chairman is 0010, “months” = 01, and verbs = 1

59

Page 60: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Brown cluster examples

60

Page 61: Dense Vectors - Seoul National Universityling.snu.ac.kr/class/AI_Agent/lecture/vector2.pdf · 2018. 6. 15. · Dan Jurafsky. Sparse versus dense vectors • Why dense vectors? •

Dan Jurafsky

Class-based language model

• Suppose each word was in some class ci:

61