Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alex Fraser Institute for Natural Language Processing University of Stuttgart 2008.07.23 EMA Summer School

Statistical Machine Translation
Part III – Phrase-based SMT / Decoding

Alex Fraser
Institute for Natural Language Processing
University of Stuttgart

2008.07.23 EMA Summer School

Outline

• Phrase-based translation
• Log-linear model
• Tuning log-linear model
• Decoding

Slides 3-4 from Koehn 2008

Language Model

• Usually a trigram language model is used for p(e)
• P(the man went home) = p(the | START) p(man | START the) p(went | the man) p(home | man went)
• Language models work well for comparing the grammaticality of strings of the same length
  – However, when comparing short strings with long strings, they favor short strings
  – For this reason, a very important component of the language model is the length bonus
    • This is a constant > 1 that is multiplied into the score once for each English word in the hypothesis
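The effect of the length bonus can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the probability table `TOY_PROBS`, the fallback `UNSEEN_PROB`, the function names, and the bonus value 6.0 are invented, and a real system would use a smoothed language model trained on data with a tuned bonus.

```python
import math

# Toy conditional probabilities p(word | context); the numbers are invented.
# Contexts follow the slide's notation: a single START token, then up to
# two preceding words.
TOY_PROBS = {
    (("START",), "the"): 0.5,
    (("START", "the"), "man"): 0.2,
    (("the", "man"), "went"): 0.1,
    (("man", "went"), "home"): 0.4,
}
UNSEEN_PROB = 1e-4  # crude stand-in for smoothing (assumption)


def trigram_log_prob(words):
    """Sum of log p(w_i | up to two preceding words)."""
    history = ["START"]
    total = 0.0
    for word in words:
        context = tuple(history[-2:])
        total += math.log(TOY_PROBS.get((context, word), UNSEEN_PROB))
        history.append(word)
    return total


def score_with_length_bonus(words, bonus):
    """LM log-probability plus log(bonus) for every word in the hypothesis."""
    return trigram_log_prob(words) + len(words) * math.log(bonus)


short = "the man".split()
long_ = "the man went home".split()

# Without the bonus, the shorter string scores higher simply because
# fewer probabilities (each < 1) are multiplied together.
print(trigram_log_prob(short), trigram_log_prob(long_))

# With a large enough bonus (6.0 is an arbitrary choice), the longer
# string is no longer penalized for its length.
print(score_with_length_bonus(short, 6.0), score_with_length_bonus(long_, 6.0))
```

Working in log space turns the per-word multiplication by a constant > 1 into adding a fixed positive amount per word, which is how such a bonus fits naturally as one feature of a log-linear model.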

Slide 6 modified from Koehn 2008

Slides 7-17 from Koehn 2008

Outline

• Phrase-based translation
• Log-linear model
• Tuning log-linear model
• Decoding

Slides 19-26 from Koehn 2008

Outline

• Phrase-based translation model
• Log-linear model
• Tuning log-linear model automatically
• Decoding

Outline

• Phrase-based translation model
• Log-linear model
• Tuning log-linear model automatically
• Decoding
  – Basic phrase-based decoding
  – Dealing with complexity
    • Recombination
    • Pruning
    • Future cost estimation
  – Decoding output

Slides 29-59 from Koehn 2008

Assignment 2

• Build a state-of-the-art phrase-based SMT system!
  – German to English or French to English
  – Using a small amount of data
  – This is a "learning by doing" exercise

• See my home page again

Thank you!