Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
Masashi Tsubaki, Kevin Duh, Masashi Shimbo, Yuji Matsumoto
Nara Institute of Science and Technology (NAIST), Japan
Unsupervised word vector re-training algorithm
considering compositionality
New model of compositionality
in word vector space
Masashi Tsubaki 1
Contributions
Two contributions in our work
Unsupervised word vector re-training algorithm
considering compositionality
New model of compositionality
in word vector space
Masashi Tsubaki 1
Contributions
Two contributions in our work
Masashi Tsubaki 2
Introduction
From word to phrase representation with matrix-vector operation [Mitchell and Lapata 08], [Baroni and Zamparell 10], [Socher+ 12], [Van de Cruys+ 13]
run
company
run company = operate
marathon
run marathon = race
Modeling of compositionality in word vector space
Masashi Tsubaki 2
run
company
run company = operate
marathon
run marathon = race
From word to phrase representation with matrix-vector operation [Mitchell and Lapata 08], [Baroni and Zamparell 10], [Socher+ 12], [Van de Cruys+ 13]
Introduction
Modeling of compositionality in word vector space
Masashi Tsubaki 2
run
company
run company = operate
marathon
run marathon = race
From word to phrase representation with matrix-vector operation [Mitchell and Lapata 08], [Baroni and Zamparell 10], [Socher+ 12], [Van de Cruys+ 13]
Introduction
Modeling of compositionality in word vector space
Masashi Tsubaki 2
run
company
run company = operate
marathon
run marathon = race
New model inspired by Co-Compositionality
From word to phrase representation with matrix-vector operation [Mitchell and Lapata 08], [Baroni and Zamparell 10], [Socher+ 12], [Van de Cruys+ 13]
Introduction
Modeling of compositionality in word vector space
Masashi Tsubaki 3
Verb and object are allowed to modify each other’s meanings and generate the overall semantics
Co-compositionality
Our Model
Main Idea : Co-Compositionality [Pustejovsky 1995]
Masashi Tsubaki 3
f( run , company ) = operate
f( run , marathon ) = race
Verb and object are allowed to modify each other’s meanings and generate the overall semantics
Co-compositionality
Our Model
Main Idea : Co-Compositionality [Pustejovsky 1995]
Masashi Tsubaki 3
f( runcompany , company ) = operate
f( runmarathon , marathon ) = race
Verb and object are allowed to modify each other’s meanings and generate the overall semantics
Co-compositionality
Our Model
Main Idea : Co-Compositionality [Pustejovsky 1995]
Masashi Tsubaki 3
f( runcompany , companyrun ) = operate
f( runmarathon , marathonrun ) = race
Verb and object are allowed to modify each other’s meanings and generate the overall semantics
Co-compositionality
Our Model
Main Idea : Co-Compositionality [Pustejovsky 1995]
Masashi Tsubaki 3
f( runcompany , companyrun ) = operate
f( runmarathon , marathonrun ) = race
Verb and object are allowed to modify each other’s meanings and generate the overall semantics
Co-compositionality
Our Model
Main Idea : Co-Compositionality [Pustejovsky 1995]
Masashi Tsubaki
Verb and object are allowed to modify each other’s meanings and generate the overall semantics
3
How do we implement co-compositionality in vector space ?
Question
f( runcompany , companyrun ) = operate
f( runmarathon , marathonrun ) = race
Co-compositionality
Our Model
Main Idea : Co-Compositionality [Pustejovsky 1995]
Masashi Tsubaki 4
Our Model
Matrix-vector operation as an implementation for Co-Compositionality
Prototype Projection
Prototype Projection for Co-Compositionality
Masashi Tsubaki
run company
5
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
start
build
buy
run
VerbOf
・・・
company
5
Prototype verbs of “company”
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
× start
build
buy
run
VerbOf
・・・
≈ ・・・
company
5
Prototype verbs of “company”
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
run
VerbOf
・・・
≈ ・・・
company
5
Latent subspace formed by prototype verb vectors of company
Prototype verbs of “company”
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
run
VerbOf
runcompany
・・・
≈ ・・・
company Pcompany=VTV
5
(Orthogonal projection matrix to V)
Prototype verbs of “company”
Latent subspace formed by prototype verb vectors of company
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
run
VerbOf
runcompany
・・・
≈ ・・・
company Pcompany=VTV
Prototype Projection
5
(Orthogonal projection matrix to V)
Prototype verbs of “company”
Latent subspace formed by prototype verb vectors of company
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
run
VerbOf
runcompany
ObjectOf
・・・
≈ ・・・
firm
bank
hotel
・・・
company Pcompany=VTV
Prototype Projection
5
(Orthogonal projection matrix to V)
Prototype verbs of “company”
Prototype objects of “run”
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
run
VerbOf
runcompany
ObjectOf
・・・ O
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company Pcompany=VTV
Prototype Projection
5
(Orthogonal projection matrix to V)
Prototype verbs of “company”
Prototype objects of “run”
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
run
VerbOf
runcompany
ObjectOf
・・・ O
Prun=OTO
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
companyrun
Pcompany=VTV Prototype Projection
5
(Orthogonal projection matrix to V)
Prototype verbs of “company”
Prototype objects of “run”
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki
×
V
start
build
buy
operate =
run
VerbOf
runcompany
ObjectOf
・・・ O
Prun=OTO
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
companyrun
+
Pcompany=VTV Prototype Projection
5
Prototype verbs of “company”
Prototype objects of “run”
(Orthogonal projection matrix to V)
Our Model
Co-Compositionality with Prototype Projections
Masashi Tsubaki 6
start finish
buy
build
race
operate
run
Tease out the proper semantics from aggregate representation by projection to latent space
Assume various senses of “run” are aggregated in one vector
Prototype verbs of “marathon”
Prototype verbs of “company”
Our Model
Intuitive image of prototype projection
Masashi Tsubaki 7
Experiment
Evaluation dataset [Grefenstette and Sadrzadeh 11]
200 subject-verb-object triples judged by 25 participants
Subj-Verb-Obj Landmark verb Similarity of human judgment
People-run-company operate 7
People-run-company move 2
Evaluation : Verb disambiguation in subject-verb-object triples
Masashi Tsubaki 7
Evaluation dataset [Grefenstette and Sadrzadeh 11]
200 subject-verb-object triples judged by 25 participants
Subj-Verb-Obj Landmark verb Similarity of human judgment
People-run-company operate 7
People-run-company move 2
Final co-compositional vector for subject-verb-object
subj + cocompositioned(verb,obj)
Experiment
Evaluation : Verb disambiguation in subject-verb-object triples
Masashi Tsubaki
Models are evaluated by Spearman’s rank correlation between vectors’ computed similarity and human judgment
7
Evaluation dataset [Grefenstette and Sadrzadeh 11]
200 subject-verb-object triples judged by 25 participants
Subj-Verb-Obj Landmark verb Similarity of human judgment
People-run-company operate 7
People-run-company move 2
Final co-compositional vector for subject-verb-object
subj + cocompositioned(verb,obj)
Experiment
Evaluation : Verb disambiguation in subject-verb-object triples
Experiment
Masashi Tsubaki 8
×
V
start
build
buy
run
VerbOf ObjectOf
・・・ O
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
Extracted 20 prototype words from ukWaC corpus
Implementation details
Masashi Tsubaki 8
high frequency
×
V
start
build
buy
run
VerbOf ObjectOf
・・・ O
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
Extracted 20 prototype words from ukWaC corpus
Experiment
Implementation details
Masashi Tsubaki 8
high frequency both high frequency and high similarity
×
V
start
build
buy
run
VerbOf ObjectOf
・・・ O
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
Extracted 20 prototype words from ukWaC corpus
Experiment
Implementation details
Masashi Tsubaki 8
high frequency both high frequency and high similarity 80% of the top singular values
×
V
start
build
buy
run
VerbOf ObjectOf
・・・ O
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
Extracted 20 prototype words from ukWaC corpus
Experiment
Implementation details
Masashi Tsubaki 8
high frequency both high frequency and high similarity 80% of the top singular values
Word representation [Blacoe and Lapata 12] ①Distributional vector (2000 dim) ②Neural vector (50 dim)
×
V
start
build
buy
run
VerbOf ObjectOf
・・・ O
≈ ・・・
× firm
bank
hotel
・・・
≈ ・・・
company
Extracted 20 prototype words from ukWaC corpus
Experiment
Implementation details
Experiment
Masashi Tsubaki 9
Add [Mitchell and Lapata 08]
Multiply [Mitchell and Lapata 08]
Grefenstette and Sadrzadeh 11
Mathematical model based on abstract categorical framework
Van de Cruys+13 Multi-way interaction model
based on non-negative matrix factorization
sbj × verb×obj
sbj + verb+obj
Baselines : Models compared to ours
Achieves high performance (ρ = 0.41)
0.31 0.35
0.21
0.37 0.41
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
1 2 3 4 5
Cor
rela
tion ρ
Masashi Tsubaki 10
Add Multiply Grefenstette and Sadrzadeh 11
Van de Cruys+ 13
Our Model
Result and Discussion
Correlation with human judgment (Distributional vector)
Masashi Tsubaki 11
State of the art performance (ρ= 0.44)
0.31 0.3
0.21
0.37
0.44
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
1 2 3 4 5 Add Multiply Grefenstette and Sadrzadeh 11
Van de Cruys+ 13
Our Model
Result and Discussion
Cor
rela
tion ρ
Correlation with human judgment (Neural vector)
Masashi Tsubaki 11
State of the art performance (ρ= 0.44)
0.31 0.3
0.21
0.37
0.44
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
1 2 3 4 5 Add Multiply Grefenstette and Sadrzadeh 11
Van de Cruys+ 13
Our Model
Result and Discussion
Co-Compositionality is useful for word sense disambiguation Prototype projection is effective implementation for Co-Compositionality
Cor
rela
tion ρ
Correlation with human judgment (Neural vector)
Unsupervised word vector re-training algorithm
considering compositionality
New model of compositionality
in word vector space
Masashi Tsubaki 12
Introduction
Two contributions in our work
Unsupervised word vector re-training algorithm
considering compositionality
New model of compositionality
in word vector space
Masashi Tsubaki 13
Introduction
Two contributions in our work
Our Model
Masashi Tsubaki 14
Re-training word representation with decomposition of phrase vector
Compositional Neural Language Model
Masashi Tsubaki 14
v orun company
z = v + o
s=uTz
Re-training word representation with decomposition of phrase vector
u
①Compute the score s of correct phrase
Our Model
Compositional Neural Language Model
Masashi Tsubaki 14
vc eat company
zc = vc + o
Re-training word representation with decomposition of phrase vector
sc=uTz
u
o
①Compute the score s of correct phrase ②Compute the score sc of corrupted incorrect phrase
Our Model
Compositional Neural Language Model
Masashi Tsubaki 14
Re-training word representation with decomposition of phrase vector
vc vc eat company
zc = vc + o
sc=uTz
u
o
J =max 0,1− s+ sc( )Correct score > Incorrect score
①Compute the score s of correct phrase ②Compute the score sc of corrupted incorrect phrase
Our Model
Compositional Neural Language Model
Masashi Tsubaki 14
Re-training word representation with decomposition of phrase vector
vc eat company
zc = vc + o
sc=uTz
u
o
J =max 0,1− s+ sc( )Correct score > Incorrect score
①Compute the score s of correct phrase ②Compute the score sc of corrupted incorrect phrase ③Minimize cost function by SGD, u → unew, z → znew
Our Model
Compositional Neural Language Model
Masashi Tsubaki 14
znew
Re-training word representation with decomposition of phrase vector
s=uTz
unew
J =max 0,1− s+ sc( )Correct score > Incorrect score
①Compute the score s of correct phrase ②Compute the score sc of corrupted incorrect phrase ③Minimize cost function by SGD, u → unew, z → znew
Our Model
Compositional Neural Language Model
Masashi Tsubaki 14
vnew run company
znew
Re-training word representation with decomposition of phrase vector
s=uTz
unew
o
J =max 0,1− s+ sc( )Correct score > Incorrect score
①Compute the score s of correct phrase ②Compute the score sc of corrupted incorrect phrase ③Minimize cost function by SGD, u → unew, z → znew ④New verb vector is vnew = znew - o
Our Model
Compositional Neural Language Model
Masashi Tsubaki 14
vnew run company
J =max 0,1− s+ sc( )Correct score > Incorrect score
znew
Re-training word representation with decomposition of phrase vector
New word representations considering compositionality
s=uTz
unew
o
①Compute the score s of correct phrase ②Compute the score sc of corrupted incorrect phrase ③Minimize cost function by SGD, u → unew, z → znew ④New verb vector is vnew = znew - o
Our Model
Compositional Neural Language Model
Masashi Tsubaki 15
x y
z = x + y
s=uTz
u
Our Model
Compositional Neural Language Model
Masashi Tsubaki 15
x y
z = x + y
s=uTz
u Co-Compositionality with Prototype Projection +
Our Model
Compositional Neural Language Model
Masashi Tsubaki 15
x y
z = x + y
s=uTz
u Co-Compositionality with Prototype Projection +
v orun company
Pverb Pobj
Our Model
Compositional Neural Language Model
Our Model
Masashi Tsubaki 15
Compositional Neural Language Model with Prototype Projection
x y
z = x + y
s=uTz
u Co-Compositionality with Prototype Projection +
v orun company
Pverb Pobj
Co-Compositional Neural Language Model
Masashi Tsubaki 15
Compositional Neural Language Model with Prototype Projection
x y
z = x + y
s=uTz
u
v orun company
Pverb Pobj
①Prototype projection for both verb and object ②Optimize parameters with same method as Compositional NLM ③Minimize
minv
xnew −Pobjv2+λ v 2( )
Our Model
Co-Compositional Neural Language Model
Masashi Tsubaki 15
Compositional Neural Language Model with Prototype Projection
xnew y
s=uTz
znew
unew
v orun company
Pverb Pobj
①Prototype projection for both verb and object ②Optimize parameters with same method as Compositional NLM ③Minimize
minv
xnew −Pobjv2+λ v 2( )
Our Model
Co-Compositional Neural Language Model
Masashi Tsubaki 15
Compositional Neural Language Model with Prototype Projection
xnew y
s=uTz
znew
unew
vnew orun company
Pverb Pobj
①Prototype projection for both verb and object ②Optimize parameters with same method as Compositional NLM ③Minimize
minv
xnew −Pobjv2+λ v 2( )
Our Model
Co-Compositional Neural Language Model
Masashi Tsubaki 15
Compositional Neural Language Model with Prototype Projection
xnew y
s=uTz
vnew orun company
Pverb Pobj
znew
unew
New word representations considering co-compositionality
①Prototype projection for both verb and object ②Optimize parameters with same method as Compositional NLM ③Minimize
minv
xnew −Pobjv2+λ v 2( )
Our Model
Co-Compositional Neural Language Model
Experiment
Masashi Tsubaki 16
Original neural vector [Blacoe and Lapata 12]
Re-trained neural vector with our learning models
vs.
Evaluation : Verb disambiguation [Grefenstette and Sadrzadeh 11]
Masashi Tsubaki
Original neural vector [Blacoe and Lapata 12]
Re-trained neural vector with our learning models
Training data Extracted 5000 Verb-Obj pairs from ukWaC corpus
Hyper-parameters Learning rate:0.01, Regularization:10^4 20 iterations (One iteration is one run through the training data)
vs.
16
Experiment
Evaluation : Verb disambiguation [Grefenstette and Sadrzadeh 11]
Result and Discussion
Masashi Tsubaki 17
Higher performance with re-trained word representation
Cor
rela
tion ρ
0.31
0.44
0.38
0.47
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
1 2 Co-Compositional NLM Compositional NLM
Original
Re-trained
New state of the art performance (ρ= 0.47)
Correlation with human judgment (Re-trained neural vector)
Unsupervised word vector re-training algorithm
considering compositionality
New model of compositionality
in word vector space
Masashi Tsubaki 18
Summary
Co-Compositionality with Prototype Projection
Compositional & Co-Compositional Neural Language Models Achieve state of the art on verb disambiguation task
Conclusion
Masashi Tsubaki 19
Examples Appendix
Masashi Tsubaki 20
Results of the different compositionality models Appendix
Masashi Tsubaki 21
The number of prototype words Appendix
Masashi Tsubaki 22
Variations in model configuration Appendix
Masashi Tsubaki 23
Composition operator and parameter Appendix