Upload
buidang
View
221
Download
4
Embed Size (px)
Citation preview
Introducing Structure into Neural Network-Based Semantic Models
Stephen Clark
15th Meeting on the Mathematics of Language
Queen Mary University of London, 14 July 2017
University of Cambridge and DeepMind
1
Where’s the Structure?
−−→mat
−→cat
2
−→sat
Where’s the Structure?
3
taken from Colah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Roadmap
• Background on vector-based semantics
• some existing compositional approaches
• Case study I
• a new evaluation dataset based on relative clauses (RELPRON)
• results using feedforward neural networks
• Case study II
• inducing structure using tree-based LSTMs
• results on a dictionary term-matching task
4
Collaborators
5
6
Distributional Semantics
... examples of felines are the big cats – the lion , tiger , leopard , ...
... Calico cat is an American name for cats with three colors . Out ...
... Siamese is a breed of cat from Thailand . It is a well-known cat ...
... Cats is a musical . It is based on Old Possums Book of Practical ...
feline lion ... musical color ...
cat
tiger
10 20 4 5
2 4 0 1
target words context words
tTestpwi, cjq “ ppwi, cjq ´ ppwiqppcjqappwiqppcjq
Neural Representations
7
(Mikolov et al., 2013)
Vector Space Semantics
8
furry
✲
stroke
✻
pet��
��✠
✁✁✁✁✁✁✁✕
���
��✒
cat
dog
From Words to Sentences
s1
✲
s2
✻
s3�
��
��✠
✂✂✂✂✂✍
���✒
❄
man killed dog
man murdered cat
man killed by dog
9
Elementwise Operators for Composition
10
black 0.34 0.64 ... -0.06 ...
cat 0.15 0.29 ... -0.03 ...
+
black+ cat 0.49 0.93 ... -0.09 ...
=cat 0.15 0.29 ... 0.03 ...
blacko cat 0.05 0.19 ... -0.002 ...
=
black 0.34 0.64 ... -0.06 ...
(Mitchell & Lapata, 2008)
Composition in Neural Models
11
Deep Learning for NLP (Socher et al., 2013)
Roadmap
• Background on vector-based semantics
• some existing compositional approaches
• Case study I
• a new evaluation dataset based on relative clauses (RELPRON)
• results using feedforward neural networks
• Case study II
• inducing structure using tree-based LSTMs
• results on a dictionary term-matching task
12
RELPRON Dataset
13
wisdom: quality that experience teaches
bowler: player that dominates batsmen
theatre: building that hosts premieres
• Relative clauses have an interesting compositional structure
• Intermediate level of syntactic complexity with a function word
The Ranking Task
14
navyorganisation that fleet destroyorganisation that sailor joinperson that make journeyorganisation that soldier joindevice that seal openingorganisation that defeat fleetorganisation that establish blockademammal that fleet hunt
Terminology
15
telescope: device that ____ detects planets
telescope: device that observatory has ____
RELATIVE CLAUSE
VERB ARGUMENTPROPERTY
TERM HEAD NOUN
VERBARGUMENTTERM HEAD NOUN
RELATIVE CLAUSE
PROPERTY
Selection of Head Nouns and Terms
16
Head Nouns
activity, building, device, document, mammal, material, organization, person, phenomenon, player, quality, room, scientist, vehicle, woman
Terms
person: traveler, intellectual, follower, survivor, ...quality: accuracy, balance, widsom, rhythm, ...
Extraction of SVO Triples
17
it allow travelerwebsite allow traveler
inn serve travelerbusiness serve traveler
she become travelerstudent become traveler
traveler make decisiontraveler make journeytraveler take roadtraveler take advantagetraveler have option
traveler: person that business serves
traveler: person that takes advantage
18
traveler: person that it allowstraveler: person that website allowstraveler: person that inn servestraveler: person that business servestraveler: person that she becomestraveler: person that student becomes
traveler: person that makes decisiontraveler: person that makes journeytraveler: person that takes roadtraveler: person that takes advantagetraveler: person that has optionstraveler: person that has baggage
Manual Annotation of Properties
Full List of Properties for navy
19
OBJ navy: organization that sailor joinOBJ navy: organization that fleet destroyOBJ navy: organization that vessel serveOBJ navy: organization that battleship fight
SBJ navy: orgnaization that use submarineSBJ navy: organization that maintain blockadeSBJ navy: organization that defeat fleetSBJ navy: organization that protect watersSBJ navy: organization that establish blockadeSBJ navy: organization that blockade port
Data Set Statistics
20
Test Set Development SetHead noun Term Prop Head noun Term Prop
phenomenon 10 80 quality 10 81activity 10 83 organization 10 99material 10 82 device 10 76room 10 80 building 10 69vehicle 10 79 document 10 69mammal 10 70 person 10 86woman 5 37 player 5 38scientist 8 58TOTAL 73 569 TOTAL 65 518
21
navy
organisation that fleet destroy
organisation that sailor join
person that make journey
organisation that soldier join
device that seal opening
organisation that defeat fleet
organisation that establish blockade
mammal that fleet hunt
= ∑ ∑ Prec(k) × rel(k)1N
N
t=11
tP k=1
M
Mean Average Precision (MAP)
Evaluation
Roadmap
• Background on vector-based semantics
• some existing compositional approaches
• Case study I
• a new evaluation dataset based on relative clauses (RELPRON)
• results using feedforward neural networks
• Case study II
• inducing structure using tree-based LSTMs
• results on a dictionary term-matching task
22
Relative Pronoun Neural Network
23
use telescope
use telescope
person that
person that use telescope
Feedforward Neural Network
24
use telescope⊕
Training the Neural Networks
25
use telescope
use telescope
person that
person that use telescope
“Holistic” Observed Vectors
26
promote society
promote society
Anarchism s_be_Anarchism_s be a political a_political_philosophy_a philosophy which consider the state undesirable , unnecessary and harmful , and instead s_promote_philosophy_s promote o_promote_society_o a stateless a_stateless_society_a society , or anarchy .it seek to diminish o_diminish_authority_o or even abolish o_abolish_authority_o authority in the conduct of human a_human_relation_a relation . anarchist may widely disagree on what additional a_additional_criterion_a criterion be require in anarchism .
Word2Vec (Mikolov et al.)
promote societyabolish authorityinclude strainsupport economybecome ruler
Holistic Relative Clauses?
27
use telescope
use telescope
person that
person that use telescope
Relative Clauses as Adjectives
28
use telescope
use telescope
person that
“using-telescope” person
“Holistic” Observed Vectors
29
political philosophy
political philosophy
Anarchism s_Anarchism_be_s be a political a_political_philosophy_a philosophy which consider the state undesirable , unnecessary and harmful , and instead s_philosophy_promote_s promote o_promote_society_o a stateless a_stateless_society_a society , or anarchy .it seek to diminish o_diminish_authority_o or even abolish o_abolish_authority_o authority in the conduct of human a_human_relation_a relation . anarchist may widely disagree on what additional a_additional_criterion_a criterion be require in anarchism .
Word2Vec (Mikolov et al.)
political philosophystateless societyhuman relationadditional criterionred cortina
Implementation Details
• tensorflow
• 300 dim. word2vec embeddings
• hidden layers: 1000, 600, tanh
• MSQE loss, Adagrad, minibatch
• 831M words Wikipedia dump
• 50 freq cutoff for holisitic vectors
• 75,106 adjective nouns
• 54,825 verb objects
• 39,571 subject verbs
30
use telescope
use telescope
person that
person that use telescope
Development Results
31
Model map
Addition 53.8Neural Net (all holistic) 58.2
Definition Ranking Example
32
TERM: navy 10
navy: organization that blockade port navy: organization that fleet destroy navy: organization that battleship fight navy: organization that defeat fleet navy: organization that establish blockade navy: organization that use submarine navy: organization that maintain blockade navy: organization that sailor join garrison: organization that army install army: organization that seize garrison pub: building that attract sailor traveler: person that take ship navy: organization that vessel serve
Definition Ranking Example
33
TERM: expert 10
specification: document that engineer developexpert: person that author monographlearner: person that develop skillexpert: person that have knowledgeintellectual: person that hold seminarintellectual: person that publish studyexpert: person that panel includespecification: document that set standardexpert: person that lawyer consultgolfer: player that bunker challengecharacteristic: quality that determine identitycharity: organization that celebrity representlikelihood: quality that risk assesslearner: person that subject engageexpert: person that give tutorial
Test Results
34
Model map
Addition 53.2Neural Net (all holistic) 54.1Neural Net (holistic + dict) 52.8Ensemble 55.6
Roadmap
• Background on vector-based semantics
• some existing compositional approaches
• Case study I
• a new evaluation dataset based on relative clauses (RELPRON)
• results using feedforward neural networks
• Case study II
• inducing structure using tree-based LSTMs
• results on a dictionary term-matching task
35
Where do the Trees Come From?
36
The cat chased Tweety
A supervised constituency parser Automatically induced
Recurrent Neural Networks (RNNs)
37
beautiful pictures from Colah’s blog: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
ht = tanh(Wht−1 + Uxt + b)
LSTMs and Long-Range Dependencies
38
ft = σ(Wf ht−1 + Uf xt + bf )
it = σ(Wi ht−1 + Ui xt + bi)
C̃t = tanh(WC ht−1 + UC xt + bC)
Ct = ft ⊙ Ct−1 + it ⊙ C̃t
ht = ot ⊙ tanh(Ct)
ot = σ(Wo ht−1 + Uo xt + bo)
Chart-Based Vector Composition
39
neuro linguistic programming rocks1
2
3
4
Chart-Based Vector Composition
40
‘neuro’h c
‘linguistic’h c
‘programming’h c
‘rocks’h c
‘linguisticprogramming rocks’
h ch c‘neuro linguisticprogramming’
‘neurolinguistic’
h c‘linguistic
programming’
h c h c‘programming
rocks’
‘neuro linguistic programming rocks’h c
−→nlp = λ f(
−→nl,−→p ) + (1− λ) f(−→n ,
−→lp)
ei = cos(−→u ,−→h i)
λi = softmax(ei/t)
−→h =
�i λi
−→h i
Tree-LSTM Composition Function
41
‘neuro’h c
‘linguistic’h c
‘programming’h c
‘rocks’h c
‘linguisticprogramming rocks’
h ch c‘neuro linguisticprogramming’
‘neurolinguistic’
h c‘linguistic
programming’
h c h c‘programming
rocks’
‘neuro linguistic programming rocks’h c
−→nlp = λ f(
−→nl,−→p ) + (1− λ) f(−→n ,
−→lp)
−→h = F (U
−→hL + V
−→hR +
−→b )
Roadmap
• Background on vector-based semantics
• some existing compositional approaches
• Case study I
• a new evaluation dataset based on relative clauses (RELPRON)
• results using feedforward neural networks
• Case study II
• inducing structure using tree-based LSTMs
• results on a dictionary term-matching task
42
Dictionary Data
43
projection: a forecast or prognosis obtained by extrapolationprojection: a belief or assumption that others have similar thoughts and experiences as oneselfprojection: the image that a translucent object casts onto another objectprojection: an image of an object on a surface of fewer dimensionsdna: a nucleic acid that carries the genetic information in the celldna: the sequence of nucleotides determines individual hereditary characteristicsdna: defense nuclear agencydna: did not answerheathland: a tract of level wasteland uncultivated land with sandy soil and scrubby vegetationmatroid: a structure that captures the essence of a notion of independence that generalizes
linear independence in vector spacespolioptila: an isolated genus of oscine passerine birds typical of the subfamily polioptilinæ polioptila: new world gnatcatchersdangdut: a genre of indonesian music that combines elements of arab and malay folk music
Learning to Understand Phrases by Embedding the Dictionary (Hill et al. 2016)
Term-Definition Matching Task
a forecast or prognosis obtained by extrapolation
44
projection
End-to-End Training
45
a forecast or prognosis obtained by extrapolation
projection
500D CBOW vectors
max cosine(−−→term,
−−−−−−→definition)
• DyNet neural network library
• Stochastic Gradient Descent
• Batch size = 16
• 852k word-definition pairs in total
• 1.1M parameters
• 5 days training on a GPU
Evaluation
46
control consisting of a mechanical device for controlling fluid flow
1. plug2. clutch3. valve4. gear5. sluice6. brake 7. piston . . . . 66k. amoeba
RANK = 3, TOP-1 = 0, TOP-10 = 1, TOP-100 = 1
Results
47
Parsing with the Model
48
‘neuro’h c
‘linguistic’h c
‘programming’h c
‘rocks’h c
‘linguisticprogramming rocks’
h ch c‘neuro linguisticprogramming’
‘neurolinguistic’
h c‘linguistic
programming’
h c h c‘programming
rocks’
‘neuro linguistic programming rocks’h c
Choose highest scoring node for eachcell in the chart top-down
Neuro linguistic programming rocks
Example Trees
49
Conclusion
• Modelled function word separately from content words (RELPRON)
• Difficult to beat the addition baseline (RELPRON), but structure does help
• Structure helps (a bit) over LSTM sequence model (DICTIONARY)
• Do we need the full binary trees?
• hunch for the future: LSTM for sequences of content words; tree-LSTM defined over the function words
50
Associated Papers
51
• RELPRON: A Relative Clause Evaluation Data Set for Compositional Distributional Semantics. Laura Rimell, Jean Maillard, Tamara Polajnar, Stephen Clark. Computational Linguistics 42(4), pp.661-701, 2016
• Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs. Jean Maillard, Stephen Clark, Dani Yogatama. arXiv:1705.09189