Topic Models for Dynamic Translation Model Adaptation
Vladimir Eidelman
Jordan Boyd-Graber
Philip Resnik
(Typical) Domain Adaptation
[Figure: a training corpus of documents grouped into known subcorpora (Newswire, Web, Europarl). The dev and test sets are drawn from one of these, so each subcorpus is marked "in" (in-domain) or "out" (out-of-domain) with respect to them.]
Motivation
[Figure: the same training documents shown without subcorpus labels, with the dev and test sets alongside; the domain structure that this style of adaptation relies on is not given.]
Aims
• Model domain
  – Induce soft unsupervised domains
    • Latent topics
• Apply to MT
  – Bias translation model
    • Introduce topic-dependent lexical weighting
Lexical Weighting
• Estimate phrase pair quality word-by-word
[Example: 粉丝 很多 (fěnsī hěnduō); 粉丝 can translate as "noodles" or "fans", 很多 as "a lot of"]
Topic Models
• Used MALLET (McCallum, 2002)
• Latent Dirichlet Allocation (Blei, Ng, Jordan 2003)
• Topics inferred only on the source side
• Topic distribution is the same for every sentence in a document
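As a minimal sketch of this step (not the authors' code), the per-document topic distributions could be inferred with gensim's LdaModel standing in for MALLET; the toy documents and token strings below are hypothetical:

```python
# Sketch: per-document topic inference, with gensim's LdaModel standing in
# for MALLET. Topics are trained on the source side only; every sentence
# inherits the topic distribution of its containing document. (When no
# document boundaries exist, each sentence can be treated as a document.)
from gensim import corpora, models

# Hypothetical source-side documents (token lists); the real input is the
# Chinese side of the bitext.
docs = [["fensi", "henduo", "mifen", "tang"],   # a "food" document
        ["fensi", "mingxing", "yanchanghui"]]   # a "pop star" document

dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(bow, num_topics=3, id2word=dictionary,
                      alpha="auto", passes=10, random_state=0)

for i, d in enumerate(bow):
    # p(topic | document), shared by every sentence in document i
    print(i, lda.get_document_topics(d, minimum_probability=0.0))
```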
Standard Lexical Weighting
A single translation table, estimated over the whole training corpus:

Translation Table
  Source     Target            P(e|f)
  粉丝很多   lots of noodles   .45
  粉丝很多   lots of fans      .33
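For reference, a minimal sketch of the relative-frequency estimate behind such a table, assuming word-aligned training data; the toy word pairs are hypothetical:

```python
# Sketch: standard lexical translation probabilities pooled over the
# whole corpus, P(e|f) = count(f, e) / count(f).
from collections import Counter

# Hypothetical aligned (source, target) word pairs from training data.
aligned = [("粉丝", "noodles"), ("粉丝", "noodles"), ("粉丝", "fans"),
           ("很多", "a lot of")]

pair_count = Counter(aligned)
src_count = Counter(f for f, _ in aligned)

def p_e_given_f(e, f):
    """Maximum-likelihood estimate of P(e|f)."""
    return pair_count[(f, e)] / src_count[f]

print(p_e_given_f("noodles", "粉丝"))  # 2/3
print(p_e_given_f("fans", "粉丝"))     # 1/3
```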
Domain Lexical Weighting (Chiang 2011)
One translation table per annotated training subcorpus:

Translation Table: nw
  Source     Target            Ps=nw(e|f)
  粉丝很多   lots of noodles   .41
  粉丝很多   lots of fans      .32

Translation Table: Web
  Source     Target            Ps=wb(e|f)
  粉丝很多   lots of noodles   .30
  粉丝很多   lots of fans      .58
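A sketch of the domain-conditioned variant under the same assumptions: the identical estimate, but with counts kept separately per subcorpus, yielding one table per domain (the toy data is again hypothetical):

```python
# Sketch: Chiang (2011)-style domain-conditioned lexical weights.
# Counts are kept per training subcorpus (nw, wb, ...), so each domain
# gets its own table P_{s=d}(e|f).
from collections import Counter

# Hypothetical (domain, source, target) aligned word pairs.
aligned = [("nw", "粉丝", "noodles"), ("nw", "粉丝", "fans"),
           ("wb", "粉丝", "fans"), ("wb", "粉丝", "fans"),
           ("wb", "粉丝", "noodles")]

pair_count = Counter(aligned)
src_count = Counter((d, f) for d, f, _ in aligned)

def p_e_given_f(e, f, domain):
    """P_{s=domain}(e|f) from the subcorpus-specific counts."""
    return pair_count[(domain, f, e)] / src_count[(domain, f)]

print(p_e_given_f("fans", "粉丝", "nw"))  # 1/2
print(p_e_given_f("fans", "粉丝", "wb"))  # 2/3
```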
Lexical Weighting with Topic Models
One translation table per induced latent topic:

Translation Table: Topic 1
  Source     Target            Ptopic=1(e|f)
  粉丝很多   lots of noodles   .71
  粉丝很多   lots of fans      .15

Translation Table: Topic 2
  Source     Target            Ptopic=2(e|f)
  粉丝很多   lots of noodles   .41
  粉丝很多   lots of fans      .47

Translation Table: Topic 3
  Source     Target            Ptopic=3(e|f)
  粉丝很多   lots of noodles   .21
  粉丝很多   lots of fans      .68
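A sketch of how such topic-conditional tables can be estimated with soft membership: instead of a hard domain label, each aligned pair contributes the fractional count p(topic=k | its document) to topic k's counts. The topic mixtures and pairs below are hypothetical:

```python
# Sketch: topic-conditioned lexical weights with soft counts. Each aligned
# pair adds p(topic=k | doc) to topic k's counts instead of a hard 1.
from collections import defaultdict

K = 3
pair_count = defaultdict(float)  # (k, f, e) -> expected count
src_count = defaultdict(float)   # (k, f)    -> expected count

# Hypothetical training events: (p(topic | doc) over K topics, source, target).
events = [((0.8, 0.1, 0.1), "粉丝", "noodles"),
          ((0.1, 0.5, 0.4), "粉丝", "fans"),
          ((0.1, 0.2, 0.7), "粉丝", "fans")]

for theta, f, e in events:
    for k in range(K):
        pair_count[(k, f, e)] += theta[k]
        src_count[(k, f)] += theta[k]

def p_e_given_f(e, f, k):
    """P_{topic=k}(e|f) from expected counts."""
    return pair_count[(k, f, e)] / src_count[(k, f)]

print(round(p_e_given_f("noodles", "粉丝", 0), 2))  # 0.8: topic 0 favors "noodles"
print(round(p_e_given_f("fans", "粉丝", 2), 2))     # 0.92: topic 2 favors "fans"
```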
Lexical Weighting Adaptation Features
Each per-topic probability is scaled by that topic's probability under the test sentence, e.g. with p(topic=1 | test sentence) = 0.65:

  ƒ1(e|f) = Ptopic=1(e|f) × p(topic=1 | test sentence)
  ƒ1(lots of noodles | 粉丝很多) = 0.71 × 0.65 ≈ .46
  ƒ1(lots of fans | 粉丝很多) = 0.15 × 0.65 ≈ .09

Translation Table: Topic 1
  Source     Target            Ptopic(e|f)   ƒ1(e|f)
  粉丝很多   lots of noodles   .71           .46
  粉丝很多   lots of fans      .15           .09

Translation Table: Topic 2
  Source     Target            Ptopic(e|f)   ƒ2(e|f)
  粉丝很多   lots of noodles   .41           .09
  粉丝很多   lots of fans      .47           .10

Translation Table: Topic 3
  Source     Target            Ptopic(e|f)   ƒ3(e|f)
  粉丝很多   lots of noodles   .21           .02
  粉丝很多   lots of fans      .68           .08

The resulting features are attached to every grammar rule:

粉丝很多 ||| lots of fans ||| ƒ1(e|f)=.09 ƒ2(e|f)=.10 ƒ3(e|f)=.08 ƒ1(f|e) ƒ2(f|e) ƒ3(f|e) …
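A sketch of the feature computation above; the sentence-level topic mixture (0.65, 0.22, 0.13) is a hypothetical value chosen to be consistent with the slide's 0.71 × 0.65 ≈ .46 example, so the printed values only approximately match the slide's rounding:

```python
# Sketch: per-topic adaptation features,
#   f_k(e|f) = P_{topic=k}(e|f) * p(topic=k | test sentence),
# computed fresh for each test sentence from its inferred topic mixture.
p_topic = {1: {"lots of noodles": 0.71, "lots of fans": 0.15},
           2: {"lots of noodles": 0.41, "lots of fans": 0.47},
           3: {"lots of noodles": 0.21, "lots of fans": 0.68}}

# Hypothetical p(topic | test sentence), consistent with 0.71 * 0.65 = .46.
sent_topics = {1: 0.65, 2: 0.22, 3: 0.13}

def features(target):
    """One f_k(e|f) value per topic for this phrase pair."""
    return {k: round(p_topic[k][target] * sent_topics[k], 2)
            for k in p_topic}

print(features("lots of noodles"))  # {1: 0.46, 2: 0.09, 3: 0.03}
print(features("lots of fans"))     # {1: 0.1, 2: 0.1, 3: 0.09}
```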
Experiments
• Chinese-English
• Two settings
  – Small (FBIS)
    • 300k sentence pairs
    • Document boundaries
  – Large (~NIST)
    • 1.6m sentence pairs
    • No document boundaries
• NIST MT06 tune, MT03 & MT05 test
• MIRA optimizer
Unsupervised Domain Induction
• What is a document (for topic modeling)?
• Only some MT data have document boundaries
• Treat each sentence as a document
FBIS Document v. Sentence Results
[Results figure: document-level vs. sentence-level topic models on FBIS]
Large Setting
[Results figure: the large (~NIST) setting]
Future Work
• Improve topic model
  – Multilingual topic modeling
  – More (mono-, multi-)lingual data
  – Hierarchical models
• Other languages
Conclusions
• Extend domain adaptation
  – No reliance on collection/genre annotation
  – Finer-grained topic distributions
• Bias translation toward topic
  – Lexical weighting adaptation with soft membership
  – Add Ptopic(e|f) and Ptopic(f|e) features to every rule

Thank You!
Questions?
Feature Representation
• Topic identity
  – Probability under topic 1, topic 2?
  – Cross-domain
• Topic distribution
  – Probability under most probable topic? Second most?
  – Dynamic
Global vs. Local Topic Model
[Figure: global vs. local topic models on a large corpus]