
Using Percolated Dependencies in PBSMT

Ankit K. Srivastava and Andy Way

Dublin City University

CLUKI XII: April 24, 2009

About

Using Percolated Dependencies in Phrase Based Statistical Machine Translation

Outline: PARSING | PBSMT SYSTEM | NUMBERS | ANALYSIS | ENDNOTE

Syntactic Parsing and Head Percolation


Parsing I: Constituency Structure

Vinken will join the board as a nonexecutive director Nov 29

(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))

Parsing II: Dependency Structure

Vinken will join the board as a nonexecutive director Nov 29

HEAD       DEPENDENT
join       Vinken
join       will
board      the
join       board
join       as
director   a
director   nonexecutive
as         director
29         Nov
join       29

Parsing III: Head Percolation

- It is straightforward to convert a constituency tree to an unlabeled dependency tree (Gaifman 1965)
- Use head percolation tables to identify the head child in a constituency representation (Magerman 1995)
- The dependency tree is obtained by recursively applying head-child and non-head-child heuristics (Xia & Palmer 2001)

(NP (DT the) (NN board))

Percolation table entry: NP | right | NN/NNP/CD/JJ

(NP-board (DT the) (NN board))

"the" is dependent on "board"
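The conversion above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the NP entry mirrors the slide (search right-to-left for NN/NNP/CD/JJ), while the VP and S entries are assumed for completeness; a real table (Magerman 1995) covers all categories.

```python
# Minimal head-percolation sketch (after Magerman 1995; Xia & Palmer 2001).
# Tree nodes: (label, children) for internal nodes, (tag, word) for leaves.
# Only the NP entry comes from the slide; VP and S are assumed examples.

PERCOLATION_TABLE = {
    "NP": ("right", ["NN", "NNP", "CD", "JJ"]),
    "VP": ("left", ["VB", "MD", "VP"]),     # assumed entry
    "S":  ("left", ["VP"]),                 # assumed entry
}

def is_leaf(node):
    return isinstance(node[1], str)

def head_word(node):
    """Recursively percolate heads to find the lexical head of a subtree."""
    if is_leaf(node):
        return node[1]
    label, children = node
    direction, priorities = PERCOLATION_TABLE.get(label, ("left", []))
    ordered = children if direction == "left" else list(reversed(children))
    for cat in priorities:                  # search by category priority
        for child in ordered:
            if child[0] == cat:
                return head_word(child)
    return head_word(ordered[0])            # fallback: first child in direction

def dependencies(node, deps=None):
    """Collect (head, dependent) pairs: each non-head child's head word
    depends on the head word of the head child."""
    if deps is None:
        deps = []
    if is_leaf(node):
        return deps
    h = head_word(node)
    for child in node[1]:
        ch = head_word(child)
        if ch != h:
            deps.append((h, ch))
        dependencies(child, deps)
    return deps

# (NP (DT the) (NN board)) -> "the" depends on "board"
np = ("NP", [("DT", "the"), ("NN", "board")])
print(dependencies(np))   # [('board', 'the')]
```

Applied to a full constituency parse, the same recursion yields the unlabeled head-dependent pairs shown in the table for the Vinken sentence.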

Parsing IV: Three Parses

Constituency (phrase-structure) parses: CON (requires a CON parser)

Dependency (head-dependent) parses: DEP (requires a DEP parser)

Percolated (head-dependent) parses: PERC (requires a CON parser + heuristics)

Phrase-Based Statistical Machine Translation


PBSMT I: Framework

argmax_e p(e|f) = argmax_e p(f|e) p(e)

- PBSMT framework in Moses (Koehn et al., 2007): decoder, translation model, language model
- Phrase table in the translation model := align words + extract phrases + score phrases
- Different methods to extract phrases; Moses phrase extraction as the baseline system…
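The "score phrases" step is conventionally done by relative frequency over the extracted phrase pairs, in both translation directions. A minimal sketch, with made-up phrase pairs; the real Moses pipeline adds further features such as lexical weights:

```python
# Relative-frequency phrase scoring: p(f|e) = count(f,e) / count(e),
# and the reverse direction p(e|f). The phrase pairs are hypothetical.
from collections import Counter

pairs = [("la maison", "the house"), ("la maison", "the house"),
         ("la maison", "the home"), ("maison", "house")]

pair_counts = Counter(pairs)
src_counts = Counter(f for f, _ in pairs)
tgt_counts = Counter(e for _, e in pairs)

p_f_given_e = {(f, e): c / tgt_counts[e] for (f, e), c in pair_counts.items()}
p_e_given_f = {(f, e): c / src_counts[f] for (f, e), c in pair_counts.items()}

print(p_f_given_e[("la maison", "the house")])  # 1.0
print(p_e_given_f[("la maison", "the house")])  # 0.6666666666666666 (= 2/3)
```

Concatenating phrase tables from different extraction methods, as done later in this talk, simply pools the pairs before this re-estimation step.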

PBSMT II: Non-syntactic Phrase Extraction

… baseline Moses:
- Get word alignments (src2tgt, tgt2src)
- Apply the grow-diag-final heuristic (Koehn et al., 2003)
- Extract phrase pairs consistent with the word alignments

String-based (non-syntactic) phrases: STR
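The consistency criterion for STR extraction can be sketched as follows: a phrase pair is kept iff no alignment link connects a word inside the target span to a word outside the source span. This is a simplified sketch; the full algorithm of Koehn et al. (2003) also extends phrases over unaligned boundary words.

```python
# Minimal sketch of consistent phrase-pair extraction (Koehn et al., 2003).
def extract_phrases(src, tgt, alignment, max_len=3):
    """alignment: set of (src_index, tgt_index) links."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            tps = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            # consistency: no link from outside the source span into [j1, j2]
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]),
                          " ".join(tgt[j1:j2 + 1])))
    return pairs

src = ["the", "board"]
tgt = ["le", "conseil"]
align = {(0, 0), (1, 1)}
print(extract_phrases(src, tgt, align))
# [('the', 'le'), ('the board', 'le conseil'), ('board', 'conseil')]
```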

PBSMT III: Syntactic Phrase Extraction

- Get word alignments (src2tgt, tgt2src)
- Parse src sentences; parse tgt sentences
- Use a Tree Aligner to align subtree nodes (Zhechev 2009)
- Extract surface-level chunks from the parallel treebanks (previously: Tinsley et al., 2007; Hearne et al., 2008)

Syntactic phrases: CON, DEP, PERC

System Design


System I: Tools and Resources

- English-French parallel corpora
- Phrase structure parsers (En, Fr)
- Dependency structure parsers (En, Fr)
- Head percolation tables (En, Fr)
- Statistical tree aligner
- Giza++ word aligner
- SRILM (language modeling) toolkit
- Moses decoder

CORPORA      TRAIN    DEV    TEST
JOC          7,723    400    599
EUROPARL   100,000  1,889  2,000

System II: # Entries in Phrase tables: Europarl

Phrase Types   Common to both   Unique in 1st type   Unique in 2nd type
DEP & PERC     369K             213K                 195K
CON & PERC     492K             171K                 72K
STR & PERC     127K             2,018K               437K
CON & DEP      391K             271K                 191K
STR & DEP      128K             2,016K               454K
STR & CON      144K             2,000K               518K

Total phrase-table entries:
STR 2,145K | CON 663K | DEP 585K | PERC 565K

PERC is a unique knowledge source…

… but is it useful?

System III: Combinations

Concatenate phrase tables and re-estimate probabilities.

15 different systems: Σ C(4, r) for 1 ≤ r ≤ 4 = 4 + 6 + 4 + 1 = 15

        UNI   BI           TRI             QUAD
STR     S     SC, SD, SP   SCD, SCP, SDP   SCDP
CON     C     CD, CP       CDP             -
DEP     D     DP           -               -
PERC    P     -            -               -
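The 15 systems in the table above are exactly the non-empty subsets of {STR, CON, DEP, PERC}, which can be enumerated directly:

```python
# Enumerate all non-empty subsets of the four phrase sources:
# sum over r in 1..4 of C(4, r) = 4 + 6 + 4 + 1 = 15 systems.
from itertools import combinations

sources = ["S", "C", "D", "P"]   # STR, CON, DEP, PERC
systems = ["".join(c) for r in range(1, 5)
           for c in combinations(sources, r)]

print(len(systems))  # 15
print(systems)
# ['S', 'C', 'D', 'P', 'SC', 'SD', 'SP', 'CD', 'CP', 'DP',
#  'SCD', 'SCP', 'SDP', 'CDP', 'SCDP']
```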

MT Systems and Evaluation


Numbers I: Evaluation - JOC

Numbers II: Evaluation - Europarl

Numbers III: Uniquely Best

Evaluate the MT systems STR, CON, DEP, PERC at the per-sentence level (Translation Error Rate): for how many sentences is each system uniquely best?

JOC (440 sentences):       STR 183 | CON 73   | DEP 83  | PERC 101
Europarl (2000 sentences): STR 248 | CON 1120 | DEP 301 | PERC 331

Numbers IV: Adding +PERC: Europarl

Analysis of Results


Analysis I: STR

Using Moses baseline phrases (STR) is essential for coverage. SIZE matters!

However, adding any phrase type to STR improves the baseline score. Symbiotic!

Hence, do not replace STR, but augment it.

Analysis II: CON

Seems to be the best complement to STR (S+C is the best-performing system)

Shares the most chunks in common with PERC (492K)

Whether PERC harms a CON system needs more analysis

Analysis III: DEP

PERC chunks are different from DEP chunks, despite the two formalisms being equivalent

PERC can substitute for DEP

Analysis IV: PERC

Is a unique knowledge source.

Sometimes, it helps.

Needs more work on finding its connection with CON / DEP

Conclusion & Future Work



Extended Hearne et al., 2008 by:
- scaling up the data size from 7.7K to 100K sentence pairs
- introducing percolated dependencies into PBSMT

Future work:
- Manual evaluation
- More analysis of results
- More combining strategies
- Seek to determine if each chunk type "owns" sentence types

Thanks

<asrivastava @ computing.dcu.ie>