A Context-Aware Topic Model for Statistical Machine Translation
Jinsong Su, Deyi Xiong, Yang Liu, Xianpei Han, Hongyu Lin, Junfeng Yao, Min Zhang
ACL 2015
Introduced by Yusuke Oda (@odashi_t)
2015/9/10 NAIST MT-Study Group
Lexical Selection for SMT
● Lexical selection is important for SMT
● Two categories in previous studies for lexical selection:
– Incorporating sentence-level (local) contexts
– Integrating document-level (global) topics
● Considering the correlation between local and global information
– Has never been explored
– But the two are highly correlated
Proposed Model
● Context-aware topic model (CATM)
– Jointly model both local and global contexts for lexical selection
– Based on topic modeling
– Performing Gibbs sampling to learn parameters of the model
● Terms
– Topical words: related to topics of the document
● In this study, we use content words (= noun, verb, adjective, adverb)
– Contextual words: affect the translation selection of topical words
● In this study, we use all words in the sentence
– Target-side topical items: translation candidates of source-side topical words
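The topical/contextual split above is easy to prototype. A minimal Python sketch, assuming sentences are already POS-tagged with Penn Treebank tags; the tag prefixes and the helper name are my own choices, not from the paper:

```python
# Hypothetical helper for the topical/contextual split described above.
# Content words = nouns, verbs, adjectives, adverbs (per the slide);
# contextual words = all words in the sentence.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")  # Penn Treebank tag prefixes

def split_words(tagged_sentence):
    """tagged_sentence: list of (word, pos_tag) pairs."""
    topical = [w for w, tag in tagged_sentence
               if tag.startswith(CONTENT_TAG_PREFIXES)]
    contextual = [w for w, _ in tagged_sentence]
    return topical, contextual

tagged = [("the", "DT"), ("bank", "NN"), ("approved", "VBD"),
          ("the", "DT"), ("loan", "NN"), ("quickly", "RB")]
print(split_words(tagged))
# topical: ['bank', 'approved', 'loan', 'quickly']; contextual: all six words
```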
Assumption
● Topic consistency: all topical words and target-side topical items should be consistent with the topic of the document
● Context compatibility: all target-side topical items should be compatible with their neighboring contextual words
Graphical Representation of Proposed Model
[Figure: plate diagram of the proposed model, linking the topic distribution of the document, the topic, the target-side topical item, its neighboring target-side topical items, and the associated component distributions]
Generation Steps
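The original slide shows the generation steps as a figure. As a rough illustration only, here is a minimal Python sketch of one plausible reading of the generative story, built from the variables named on the previous slide; the factorization, the symbols (theta, phi, psi), and all sizes and priors except K = 25 are assumptions, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and priors only (not the paper's settings, except K = 25).
K, V_t, V_c = 25, 1000, 5000   # topics, topical-item vocab, contextual-word vocab
alpha, gamma, delta = 2.0, 1.0 / V_t, 2000 / V_c

phi = rng.dirichlet([gamma] * V_t, size=K)    # assumed: per-topic dist. over target-side topical items
psi = rng.dirichlet([delta] * V_c, size=V_t)  # assumed: per-item dist. over contextual words

def generate_document(n_topical, n_context):
    theta = rng.dirichlet([alpha] * K)        # topic distribution of the document
    doc = []
    for _ in range(n_topical):
        z = rng.choice(K, p=theta)            # topic of this topical word
        t = rng.choice(V_t, p=phi[z])         # target-side topical item
        c = rng.choice(V_c, p=psi[t], size=n_context)  # its neighboring contextual words
        doc.append((z, t, list(c)))
    return doc

print(generate_document(n_topical=3, n_context=4))
```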
Joint Probability
● Objective: fit the joint probability distribution below, given the training data:
● …OMG, too complex.
Gibbs Sampling (1)
● Directly fitting the joint probability is computationally intractable
● Use Gibbs sampling instead
● Given the training data, the joint distribution of the latent variables is proportional to:
Gibbs Sampling (2)
● Sample each latent variable (topic assignment, target-side topical item, and its neighbors) in turn from its conditional distribution given everything else
● Notation:
– N_a^b denotes the count of b within range a
– the superscript (-i) means the i-th item is excluded from the counts
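To make the count notation concrete, here is a minimal sketch of one collapsed-Gibbs update for the topic assignment of a topical word. This is a generic LDA-style update under the slide's (-i) convention, not the paper's exact CATM formula (which also conditions on target-side topical items and contexts); all names are illustrative:

```python
import numpy as np

def sample_topic(i, d, t, z, N_dk, N_kt, N_k, alpha, gamma, V_t, rng):
    """One collapsed-Gibbs update for z[i], the topic at position i.

    d: document id, t: target-side topical item at position i.
    N_dk[d, k]: count of topic k in document d; N_kt[k, t]: count of
    item t under topic k; N_k[k]: total items under topic k.
    """
    k_old = z[i]
    # the (-i) convention: remove the i-th assignment from all counts
    N_dk[d, k_old] -= 1; N_kt[k_old, t] -= 1; N_k[k_old] -= 1

    # p(z_i = k | everything else), up to a normalizing constant
    p = (N_dk[d] + alpha) * (N_kt[:, t] + gamma) / (N_k + V_t * gamma)
    k_new = rng.choice(len(p), p=p / p.sum())

    # add the new assignment back into the counts
    z[i] = k_new
    N_dk[d, k_new] += 1; N_kt[k_new, t] += 1; N_k[k_new] += 1
    return k_new
```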
Experiments
● Language pair: Chinese to English
● Corpus:
– Training: FBIS / Hansards (1M sent., 54.6k doc.)
– Dev: NIST MT05
– Test: NIST MT06 / 08
● Alignment: GIZA++ / grow-diag-final-and
● Hyperparameters:
– number of topics = 25
– α = 50 / number of topics
– β = 0.1
– γ = 1.0 / number of topical words
– δ = 2000 / number of contextual words
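The scaled priors above are easy to misread, so here they are spelled out in a few lines of Python; the two vocabulary sizes are hypothetical placeholders, not values from the paper:

```python
num_topics = 25
num_topical_words = 30000      # hypothetical vocabulary size
num_contextual_words = 50000   # hypothetical vocabulary size

alpha = 50 / num_topics                 # = 2.0
beta = 0.1
gamma = 1.0 / num_topical_words
delta = 2000 / num_contextual_words
```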
Result: Impact of Window Size
● Best performance with a window size of 12
– A 12-word window is sufficient for predicting target-side translations of ambiguous source-side topical words
[Figure: a symmetric window of 12 words on each side of the word in focus]
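For concreteness, a minimal sketch of such a symmetric window (the function name and example sentence are my own):

```python
def context_window(words, i, size=12):
    """Up to `size` words on each side of the word at position i."""
    return words[max(0, i - size):i] + words[i + 1:i + 1 + size]

sent = "the central bank raised interest rates again".split()
print(context_window(sent, sent.index("bank"), size=2))
# -> ['the', 'central', 'raised', 'interest']
```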
Result: Overall Performance
● Proposed method achieves the best performancewith statistical significance
[Table: BLEU4 scores of all systems on MT06 and MT08]
Result: Effect of Correlation Modeling
● Comparison with separate (non-joint) models
– CATM (Content): substitutes a uniform distribution for the topic distribution
● Omitting effects from topics
– CATM (Topic): window size = 0
● Omitting effects from contexts
– CATM (Log-linear): combines the above two in a log-linear manner
● The proposed model achieves the best performance
– Jointly learning both context and topic is effective for lexical selection.
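A log-linear combination here just means a weighted sum of log model scores. A tiny sketch with illustrative weights (in practice they would be tuned):

```python
import math

def log_linear_score(p_content, p_topic, w_content=0.5, w_topic=0.5):
    """Weighted sum of the two models' log scores (weights illustrative)."""
    return w_content * math.log(p_content) + w_topic * math.log(p_topic)

print(log_linear_score(0.3, 0.2))
```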
Topic Examples
Summary
● Context-aware topic model (CATM)
– Jointly learns context and topic information
– The first such work, to the authors' knowledge
– Achieves higher translation performance than using only context or only topic information, or than naively combining the two in a log-linear manner
● Future work
– Extending the model from the word level to the phrase level
– Improving the model with monolingual corpora
Impressions
● Is a simple sequence-of-words window the right choice of context?
– How about using some syntax information?
● This model relies on word alignments (GIZA++) to select translation candidates
– What is the effect of alignment accuracy?