A Context-Aware Topic Model for Statistical Machine Translation
Jinsong Su, Deyi Xiong, Yang Liu, Xianpei Han, Hongyu Lin, Junfeng Yao, Min Zhang
ACL 2015
Introduced by Yusuke Oda (@odashi_t)
2015/9/10 NAIST MT-Study Group
Lexical Selection for SMT
● Lexical selection is important for SMT
● Two categories in previous studies for lexical selection:
– Incorporating sentence-level (local) contexts
– Integrating document-level (global) topics
● Considering the correlation between local and global information
– Has never been explored
– But the two are highly correlated
Proposed Model
● Context-aware topic model (CATM)
– Jointly model both local and global contexts for lexical selection
– Based on topic modeling
– Performing Gibbs sampling to learn parameters of the model
● Terms
– Topical words: related to topics of the document
● In this study, we use content words (= noun, verb, adjective, adverb)
– Contextual words: affect the translation selection of topical words
● In this study, we use all words in the sentence
– Target-side topical items: translation candidates of source-side topical words
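The topical/contextual split above is easy to prototype. A minimal Python sketch, assuming sentences are already POS-tagged with Penn Treebank tags; the tag prefixes and the helper name are my own choices, not from the paper:

```python
# Hypothetical helper for the topical/contextual split described above.
# Content words = nouns, verbs, adjectives, adverbs (per the slide);
# contextual words = all words in the sentence.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")  # Penn Treebank tag prefixes

def split_words(tagged_sentence):
    """tagged_sentence: list of (word, pos_tag) pairs."""
    topical = [w for w, tag in tagged_sentence
               if tag.startswith(CONTENT_TAG_PREFIXES)]
    contextual = [w for w, _ in tagged_sentence]
    return topical, contextual

tagged = [("the", "DT"), ("bank", "NN"), ("approved", "VBD"),
          ("the", "DT"), ("loan", "NN"), ("quickly", "RB")]
print(split_words(tagged))
# topical: ['bank', 'approved', 'loan', 'quickly']; contextual: all six words
```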
Assumption
● Topic consistency: all topical words and target-side topical items should be consistent with the topic of the document
● Context compatibility: all target-side topical items should be compatible with their neighboring contextual words
Graphical Representation of Proposed Model
[Figure: plate diagram of the proposed model, linking the topic distribution of the document, the topic, the target-side topical item, its neighboring target-side topical items, and the associated component distributions]
Generation Steps
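The original slide shows the generation steps as a figure. As a rough illustration only, here is a minimal Python sketch of one plausible reading of the generative story, built from the variables named on the previous slide; the factorization, the symbols (theta, phi, psi), and all sizes and priors except K = 25 are assumptions, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and priors only (not the paper's settings, except K = 25).
K, V_t, V_c = 25, 1000, 5000   # topics, topical-item vocab, contextual-word vocab
alpha, gamma, delta = 2.0, 1.0 / V_t, 2000 / V_c

phi = rng.dirichlet([gamma] * V_t, size=K)    # assumed: per-topic dist. over target-side topical items
psi = rng.dirichlet([delta] * V_c, size=V_t)  # assumed: per-item dist. over contextual words

def generate_document(n_topical, n_context):
    theta = rng.dirichlet([alpha] * K)        # topic distribution of the document
    doc = []
    for _ in range(n_topical):
        z = rng.choice(K, p=theta)            # topic of this topical word
        t = rng.choice(V_t, p=phi[z])         # target-side topical item
        c = rng.choice(V_c, p=psi[t], size=n_context)  # its neighboring contextual words
        doc.append((z, t, list(c)))
    return doc

print(generate_document(n_topical=3, n_context=4))
```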
Joint Probability
● Objective: fit the joint probability distribution below, given the training data:
● …OMG, too complex.
Gibbs Sampling (1)
● Directly fitting the joint probability is computationally intractable
● Use Gibbs sampling instead
● Given the training data, the joint distribution of the latent variables is proportional to:
Gibbs Sampling (2)
● Sample each latent variable (topic assignment, target-side topical item, and its neighbors) in turn from its conditional distribution given everything else
● Notation:
– N_a^b denotes the count of b within range a
– the superscript (-i) means the i-th item is excluded from the counts
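To make the count notation concrete, here is a minimal sketch of one collapsed-Gibbs update for the topic assignment of a topical word. This is a generic LDA-style update under the slide's (-i) convention, not the paper's exact CATM formula (which also conditions on target-side topical items and contexts); all names are illustrative:

```python
import numpy as np

def sample_topic(i, d, t, z, N_dk, N_kt, N_k, alpha, gamma, V_t, rng):
    """One collapsed-Gibbs update for z[i], the topic at position i.

    d: document id, t: target-side topical item at position i.
    N_dk[d, k]: count of topic k in document d; N_kt[k, t]: count of
    item t under topic k; N_k[k]: total items under topic k.
    """
    k_old = z[i]
    # the (-i) convention: remove the i-th assignment from all counts
    N_dk[d, k_old] -= 1; N_kt[k_old, t] -= 1; N_k[k_old] -= 1

    # p(z_i = k | everything else), up to a normalizing constant
    p = (N_dk[d] + alpha) * (N_kt[:, t] + gamma) / (N_k + V_t * gamma)
    k_new = rng.choice(len(p), p=p / p.sum())

    # add the new assignment back into the counts
    z[i] = k_new
    N_dk[d, k_new] += 1; N_kt[k_new, t] += 1; N_k[k_new] += 1
    return k_new
```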
Experiments
● Language pair: Chinese to English
● Corpus:
– Training: FBIS / Hansards (1M sent., 54.6k doc.)
– Dev: NIST MT05
– Test: NIST MT06 / 08
● Alignment: GIZA++ / grow-diag-final-and
● Hyperparameters:
– number of topics = 25
– α = 50 / number of topics
– β = 0.1
– γ = 1.0 / number of topical words
– δ = 2000 / number of contextual words
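The scaled priors above are easy to misread, so here they are spelled out in a few lines of Python; the two vocabulary sizes are hypothetical placeholders, not values from the paper:

```python
num_topics = 25
num_topical_words = 30000      # hypothetical vocabulary size
num_contextual_words = 50000   # hypothetical vocabulary size

alpha = 50 / num_topics                 # = 2.0
beta = 0.1
gamma = 1.0 / num_topical_words
delta = 2000 / num_contextual_words
```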
Result: Impact of Window Size
● Best performance with a window size of 12
– A 12-word window is sufficient for predicting target-side translations of ambiguous source-side topical words
[Figure: a symmetric window of 12 words on each side of the word in focus]
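For concreteness, a minimal sketch of such a symmetric window (the function name and example sentence are my own):

```python
def context_window(words, i, size=12):
    """Up to `size` words on each side of the word at position i."""
    return words[max(0, i - size):i] + words[i + 1:i + 1 + size]

sent = "the central bank raised interest rates again".split()
print(context_window(sent, sent.index("bank"), size=2))
# -> ['the', 'central', 'raised', 'interest']
```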
Result: Overall Performance
● Proposed method achieves the best performancewith statistical significance
[Table: BLEU4 scores of all systems on MT06 and MT08]
Result: Effect of Correlation Modeling
● Comparison with separate (non-joint) models
– CATM (Content): substitutes a uniform distribution for the topic distribution
● Omitting effects from topics
– CATM (Topic): window size = 0
● Omitting effects from contexts
– CATM (Log-linear): combines the above two in a log-linear manner
● The proposed model achieves the best performance
– Jointly learning both context and topic is effective for lexical selection.
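A log-linear combination here just means a weighted sum of log model scores. A tiny sketch with illustrative weights (in practice they would be tuned):

```python
import math

def log_linear_score(p_content, p_topic, w_content=0.5, w_topic=0.5):
    """Weighted sum of the two models' log scores (weights illustrative)."""
    return w_content * math.log(p_content) + w_topic * math.log(p_topic)

print(log_linear_score(0.3, 0.2))
```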
Topic Examples
Summary
● Context-aware topic model (CATM)
– Jointly learns context and topic information
– The first such work, to the authors' knowledge
– Achieves higher translation performance than using only context or only topic information, or than naively combining the two in a log-linear manner
● Future work
– Extending the model from the word level to the phrase level
– Improving the model with monolingual corpora
Impressions
● Is a simple sequence-of-words window the right choice of context?
– How about using some syntax information?
● This model relies on word alignments (GIZA++) to select translation candidates
– What is the effect of alignment accuracy?