51
Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics of Language Queen Mary University of London, 14 July 2017 University of Cambridge and DeepMind 1

Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

  • Upload
    buidang

  • View
    221

  • Download
    4

Embed Size (px)

Citation preview

Page 1: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Introducing Structure into Neural Network-Based Semantic Models

Stephen Clark

15th Meeting on the Mathematics of Language

Queen Mary University of London, 14 July 2017

University of Cambridge and DeepMind

1

Page 2: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Where’s the Structure?

−−→mat

−→cat

2

−→sat

Page 3: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Where’s the Structure?

3

taken from Colah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 4: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Roadmap

• Background on vector-based semantics

• some existing compositional approaches

• Case study I

• a new evaluation dataset based on relative clauses (RELPRON)

• results using feedforward neural networks

• Case study II

• inducing structure using tree-based LSTMs

• results on a dictionary term-matching task

4

Page 5: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Collaborators

5

Page 6: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

6

Distributional Semantics

... examples of felines are the big cats – the lion , tiger , leopard , ...

... Calico cat is an American name for cats with three colors . Out ...

... Siamese is a breed of cat from Thailand . It is a well-known cat ...

... Cats is a musical . It is based on Old Possums Book of Practical ...

feline lion ... musical color ...

cat

tiger

10 20 4 5

2 4 0 1

target words context words

tTestpwi, cjq “ ppwi, cjq ´ ppwiqppcjqappwiqppcjq

Page 7: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Neural Representations

7

(Mikolov et al., 2013)

Page 8: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Vector Space Semantics

8

furry

stroke

pet��

��✠

✁✁✁✁✁✁✁✕

���

��✒

cat

dog

Page 9: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

From Words to Sentences

s1

s2

s3�

��

��✠

✂✂✂✂✂✍

���✒

man killed dog

man murdered cat

man killed by dog

9

Page 10: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Elementwise Operators for Composition

10

black 0.34 0.64 ... -0.06 ...

cat 0.15 0.29 ... -0.03 ...

+

black+ cat 0.49 0.93 ... -0.09 ...

=cat 0.15 0.29 ... 0.03 ...

blacko cat 0.05 0.19 ... -0.002 ...

=

black 0.34 0.64 ... -0.06 ...

(Mitchell & Lapata, 2008)

Page 11: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Composition in Neural Models

11

Deep Learning for NLP (Socher et al., 2013)

Page 12: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Roadmap

• Background on vector-based semantics

• some existing compositional approaches

• Case study I

• a new evaluation dataset based on relative clauses (RELPRON)

• results using feedforward neural networks

• Case study II

• inducing structure using tree-based LSTMs

• results on a dictionary term-matching task

12

Page 13: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

RELPRON Dataset

13

wisdom: quality that experience teaches

bowler: player that dominates batsmen

theatre: building that hosts premieres

• Relative clauses have an interesting compositional structure

• Intermediate level of syntactic complexity with a function word

Page 14: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

The Ranking Task

14

navyorganisation that fleet destroyorganisation that sailor joinperson that make journeyorganisation that soldier joindevice that seal openingorganisation that defeat fleetorganisation that establish blockademammal that fleet hunt

Page 15: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Terminology

15

telescope: device that ____ detects planets

telescope: device that observatory has ____

RELATIVE CLAUSE

VERB ARGUMENTPROPERTY

TERM HEAD NOUN

VERBARGUMENTTERM HEAD NOUN

RELATIVE CLAUSE

PROPERTY

Page 16: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Selection of Head Nouns and Terms

16

Head Nouns

activity, building, device, document, mammal, material, organization, person, phenomenon, player, quality, room, scientist, vehicle, woman

Terms

person: traveler, intellectual, follower, survivor, ...quality: accuracy, balance, widsom, rhythm, ...

Page 17: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Extraction of SVO Triples

17

it allow travelerwebsite allow traveler

inn serve travelerbusiness serve traveler

she become travelerstudent become traveler

traveler make decisiontraveler make journeytraveler take roadtraveler take advantagetraveler have option

traveler: person that business serves

traveler: person that takes advantage

Page 18: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

18

traveler: person that it allowstraveler: person that website allowstraveler: person that inn servestraveler: person that business servestraveler: person that she becomestraveler: person that student becomes

traveler: person that makes decisiontraveler: person that makes journeytraveler: person that takes roadtraveler: person that takes advantagetraveler: person that has optionstraveler: person that has baggage

Manual Annotation of Properties

Page 19: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Full List of Properties for navy

19

OBJ navy: organization that sailor joinOBJ navy: organization that fleet destroyOBJ navy: organization that vessel serveOBJ navy: organization that battleship fight

SBJ navy: orgnaization that use submarineSBJ navy: organization that maintain blockadeSBJ navy: organization that defeat fleetSBJ navy: organization that protect watersSBJ navy: organization that establish blockadeSBJ navy: organization that blockade port

Page 20: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Data Set Statistics

20

Test Set Development SetHead noun Term Prop Head noun Term Prop

phenomenon 10 80 quality 10 81activity 10 83 organization 10 99material 10 82 device 10 76room 10 80 building 10 69vehicle 10 79 document 10 69mammal 10 70 person 10 86woman 5 37 player 5 38scientist 8 58TOTAL 73 569 TOTAL 65 518

Page 21: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

21

navy

organisation that fleet destroy

organisation that sailor join

person that make journey

organisation that soldier join

device that seal opening

organisation that defeat fleet

organisation that establish blockade

mammal that fleet hunt

= ∑ ∑ Prec(k) × rel(k)1N

N

t=11

tP k=1

M

Mean Average Precision (MAP)

Evaluation

Page 22: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Roadmap

• Background on vector-based semantics

• some existing compositional approaches

• Case study I

• a new evaluation dataset based on relative clauses (RELPRON)

• results using feedforward neural networks

• Case study II

• inducing structure using tree-based LSTMs

• results on a dictionary term-matching task

22

Page 23: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Relative Pronoun Neural Network

23

use telescope

use telescope

person that

person that use telescope

Page 24: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Feedforward Neural Network

24

use telescope⊕

Page 25: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Training the Neural Networks

25

use telescope

use telescope

person that

person that use telescope

Page 26: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

“Holistic” Observed Vectors

26

promote society

promote society

Anarchism s_be_Anarchism_s be a political a_political_philosophy_a philosophy which consider the state undesirable , unnecessary and harmful , and instead s_promote_philosophy_s promote o_promote_society_o a stateless a_stateless_society_a society , or anarchy .it seek to diminish o_diminish_authority_o or even abolish o_abolish_authority_o authority in the conduct of human a_human_relation_a relation . anarchist may widely disagree on what additional a_additional_criterion_a criterion be require in anarchism .

Word2Vec (Mikolov et al.)

promote societyabolish authorityinclude strainsupport economybecome ruler

Page 27: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Holistic Relative Clauses?

27

use telescope

use telescope

person that

person that use telescope

Page 28: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Relative Clauses as Adjectives

28

use telescope

use telescope

person that

“using-telescope” person

Page 29: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

“Holistic” Observed Vectors

29

political philosophy

political philosophy

Anarchism s_Anarchism_be_s be a political a_political_philosophy_a philosophy which consider the state undesirable , unnecessary and harmful , and instead s_philosophy_promote_s promote o_promote_society_o a stateless a_stateless_society_a society , or anarchy .it seek to diminish o_diminish_authority_o or even abolish o_abolish_authority_o authority in the conduct of human a_human_relation_a relation . anarchist may widely disagree on what additional a_additional_criterion_a criterion be require in anarchism .

Word2Vec (Mikolov et al.)

political philosophystateless societyhuman relationadditional criterionred cortina

Page 30: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Implementation Details

• tensorflow

• 300 dim. word2vec embeddings

• hidden layers: 1000, 600, tanh

• MSQE loss, Adagrad, minibatch

• 831M words Wikipedia dump

• 50 freq cutoff for holisitic vectors

• 75,106 adjective nouns

• 54,825 verb objects

• 39,571 subject verbs

30

use telescope

use telescope

person that

person that use telescope

Page 31: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Development Results

31

Model map

Addition 53.8Neural Net (all holistic) 58.2

Page 32: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Definition Ranking Example

32

TERM: navy 10

navy: organization that blockade port navy: organization that fleet destroy navy: organization that battleship fight navy: organization that defeat fleet navy: organization that establish blockade navy: organization that use submarine navy: organization that maintain blockade navy: organization that sailor join garrison: organization that army install army: organization that seize garrison pub: building that attract sailor traveler: person that take ship navy: organization that vessel serve

Page 33: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Definition Ranking Example

33

TERM: expert 10

specification: document that engineer developexpert: person that author monographlearner: person that develop skillexpert: person that have knowledgeintellectual: person that hold seminarintellectual: person that publish studyexpert: person that panel includespecification: document that set standardexpert: person that lawyer consultgolfer: player that bunker challengecharacteristic: quality that determine identitycharity: organization that celebrity representlikelihood: quality that risk assesslearner: person that subject engageexpert: person that give tutorial

Page 34: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Test Results

34

Model map

Addition 53.2Neural Net (all holistic) 54.1Neural Net (holistic + dict) 52.8Ensemble 55.6

Page 35: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Roadmap

• Background on vector-based semantics

• some existing compositional approaches

• Case study I

• a new evaluation dataset based on relative clauses (RELPRON)

• results using feedforward neural networks

• Case study II

• inducing structure using tree-based LSTMs

• results on a dictionary term-matching task

35

Page 36: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Where do the Trees Come From?

36

The cat chased Tweety

A supervised constituency parser Automatically induced

Page 37: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Recurrent Neural Networks (RNNs)

37

beautiful pictures from Colah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

ht = tanh(Wht−1 + Uxt + b)

Page 38: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

LSTMs and Long-Range Dependencies

38

ft = σ(Wf ht−1 + Uf xt + bf )

it = σ(Wi ht−1 + Ui xt + bi)

C̃t = tanh(WC ht−1 + UC xt + bC)

Ct = ft ⊙ Ct−1 + it ⊙ C̃t

ht = ot ⊙ tanh(Ct)

ot = σ(Wo ht−1 + Uo xt + bo)

Page 39: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Chart-Based Vector Composition

39

neuro linguistic programming rocks1

2

3

4

Page 40: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Chart-Based Vector Composition

40

‘neuro’h c

‘linguistic’h c

‘programming’h c

‘rocks’h c

‘linguisticprogramming rocks’

h ch c‘neuro linguisticprogramming’

‘neurolinguistic’

h c‘linguistic

programming’

h c h c‘programming

rocks’

‘neuro linguistic programming rocks’h c

−→nlp = λ f(

−→nl,−→p ) + (1− λ) f(−→n ,

−→lp)

ei = cos(−→u ,−→h i)

λi = softmax(ei/t)

−→h =

�i λi

−→h i

Page 41: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Tree-LSTM Composition Function

41

‘neuro’h c

‘linguistic’h c

‘programming’h c

‘rocks’h c

‘linguisticprogramming rocks’

h ch c‘neuro linguisticprogramming’

‘neurolinguistic’

h c‘linguistic

programming’

h c h c‘programming

rocks’

‘neuro linguistic programming rocks’h c

−→nlp = λ f(

−→nl,−→p ) + (1− λ) f(−→n ,

−→lp)

−→h = F (U

−→hL + V

−→hR +

−→b )

Page 42: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Roadmap

• Background on vector-based semantics

• some existing compositional approaches

• Case study I

• a new evaluation dataset based on relative clauses (RELPRON)

• results using feedforward neural networks

• Case study II

• inducing structure using tree-based LSTMs

• results on a dictionary term-matching task

42

Page 43: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Dictionary Data

43

projection: a forecast or prognosis obtained by extrapolationprojection: a belief or assumption that others have similar thoughts and experiences as oneselfprojection: the image that a translucent object casts onto another objectprojection: an image of an object on a surface of fewer dimensionsdna: a nucleic acid that carries the genetic information in the celldna: the sequence of nucleotides determines individual hereditary characteristicsdna: defense nuclear agencydna: did not answerheathland: a tract of level wasteland uncultivated land with sandy soil and scrubby vegetationmatroid: a structure that captures the essence of a notion of independence that generalizes

linear independence in vector spacespolioptila: an isolated genus of oscine passerine birds typical of the subfamily polioptilinæ polioptila: new world gnatcatchersdangdut: a genre of indonesian music that combines elements of arab and malay folk music

Learning to Understand Phrases by Embedding the Dictionary (Hill et al. 2016)

Page 44: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Term-Definition Matching Task

a forecast or prognosis obtained by extrapolation

44

projection

Page 45: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

End-to-End Training

45

a forecast or prognosis obtained by extrapolation

projection

500D CBOW vectors

max cosine(−−→term,

−−−−−−→definition)

• DyNet neural network library

• Stochastic Gradient Descent

• Batch size = 16

• 852k word-definition pairs in total

• 1.1M parameters

• 5 days training on a GPU

Page 46: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Evaluation

46

control consisting of a mechanical device for controlling fluid flow

1. plug2. clutch3. valve4. gear5. sluice6. brake 7. piston . . . . 66k. amoeba

RANK = 3, TOP-1 = 0, TOP-10 = 1, TOP-100 = 1

Page 47: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Results

47

Page 48: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Parsing with the Model

48

‘neuro’h c

‘linguistic’h c

‘programming’h c

‘rocks’h c

‘linguisticprogramming rocks’

h ch c‘neuro linguisticprogramming’

‘neurolinguistic’

h c‘linguistic

programming’

h c h c‘programming

rocks’

‘neuro linguistic programming rocks’h c

Choose highest scoring node for eachcell in the chart top-down

Neuro linguistic programming rocks

Page 49: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Example Trees

49

Page 50: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Conclusion

• Modelled function word separately from content words (RELPRON)

• Difficult to beat the addition baseline (RELPRON), but structure does help

• Structure helps (a bit) over LSTM sequence model (DICTIONARY)

• Do we need the full binary trees?

• hunch for the future: LSTM for sequences of content words; tree-LSTM defined over the function words

50

Page 51: Introducing Structure into Neural Network-Based …sc609/talks/mol17.pdf · Introducing Structure into Neural Network-Based Semantic Models Stephen Clark 15th Meeting on the Mathematics

Associated Papers

51

• RELPRON: A Relative Clause Evaluation Data Set for Compositional Distributional Semantics. Laura Rimell, Jean Maillard, Tamara Polajnar, Stephen Clark. Computational Linguistics 42(4), pp.661-701, 2016

• Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs. Jean Maillard, Stephen Clark, Dani Yogatama. arXiv:1705.09189