CS395T: Structured Models for NLP
Lecture 12: Machine Translation
Greg Durrett. Adapted from Dan Klein – UC Berkeley
Administrivia
§ Project 2 due one week from today!
§ P1 test set results: top 3
  § Yasumasa Onoe: 78.55 F1 (78.27 P / 78.83 R)
  § Prateek Shrishail Kolhar: 82.32 F1 (82.61 P / 82.07 R)
    § Conjunctions of words, POS, and shapes in neighborhood; very fast vectorized implementation (15 s per epoch)
  § Su Wang: 84.03 F1 (86.10 P / 82.05 R)
    § Larger window size and Wikipedia gazetteer
    § Used transition probabilities from HMM, character 5-grams, and other feature tuning
Machine Translation

Machine Translation: Examples
Levels of Transfer

Word-Level MT: Examples
§ la politique de la haine. (Foreign Original)
§ politics of hate. (Reference Translation)
§ the policy of the hatred. (IBM4 + N-grams + Stack)

§ nous avons signé le protocole. (Foreign Original)
§ we did sign the memorandum of agreement. (Reference Translation)
§ we have signed the protocol. (IBM4 + N-grams + Stack)

§ où était le plan solide ? (Foreign Original)
§ but where was the solid plan? (Reference Translation)
§ where was the economic base? (IBM4 + N-grams + Stack)
Phrasal MT: Examples
Metrics
MT: Evaluation
§ Human evaluations: subjective measures, fluency/adequacy
§ Automatic measures: n-gram match to references
  § NIST measure: n-gram recall (worked poorly)
  § BLEU: n-gram precision (no one really likes it, but everyone uses it)
  § Lots more: TER, HTER, METEOR, …
§ BLEU:
  § P1 = unigram precision
  § P2, P3, P4 = bi-, tri-, 4-gram precision
  § Weighted geometric mean of P1–P4
  § Brevity penalty (why?)
  § Somewhat hard to game…
  § Magnitude only meaningful on same language, corpus, number of references, probably only within system types…
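The BLEU bullets above can be made concrete. Below is a minimal single-reference sketch (real BLEU pools clipped counts over a whole corpus and supports multiple references); the function names are ours, not from any particular library.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU sketch: geometric mean of clipped
    n-gram precisions P1..P4 times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        precisions.append(overlap / total if total > 0 else 0.0)
    if min(precisions) == 0.0:
        return 0.0
    # Brevity penalty: penalize candidates shorter than the reference,
    # since precision alone would reward very short outputs.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

The brevity penalty answers the "(why?)" above: without it, emitting only the few n-grams you are sure of would score perfectly on precision.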
Automatic Metrics Work (?)

Systems Overview

Corpus-Based MT
Modeling correspondences between languages
Sentence-aligned parallel corpus:
  Yo lo haré mañana ↔ I will do it tomorrow
  Hasta pronto ↔ See you soon
  Hasta pronto ↔ See you around

Novel sentence: Yo lo haré pronto
Machine translation system (model of translation) proposes, e.g.:
  I will do it soon
  I will do it around
  See you tomorrow
Phrase-Based System Overview
Sentence-aligned corpus → word alignments → phrase table (translation model):
  cat ||| chat ||| 0.9
  the cat ||| le chat ||| 0.8
  dog ||| chien ||| 0.8
  house ||| maison ||| 0.6
  my house ||| ma maison ||| 0.9
  language ||| langue ||| 0.9
  …
Many slides and examples from Philipp Koehn or John DeNero
Phrase-Based System Overview
§ Unlabeled English data → language model P(e)
§ Phrase table → translation model P(f | e)
§ Noisy channel model: P(e | f) ∝ P(f | e) P(e)
§ Combine scores from the translation model and language model to translate foreign to English: "Translate faithfully but make fluent English"
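A toy sketch of noisy-channel selection, assuming we already have log-probabilities from the two models (the candidate strings and all numbers here are invented for illustration):

```python
import math

# Toy noisy-channel scoring: argmax_e P(f|e) * P(e).
tm_log = {  # log P(f | e) for one foreign sentence under each candidate
    "I will do it soon": math.log(0.4),
    "See you soon": math.log(0.3),
}
lm_log = {  # log P(e) from a language model
    "I will do it soon": math.log(0.01),
    "See you soon": math.log(0.02),
}

def best_translation(candidates):
    # Work in log space to avoid underflow on long sentences.
    return max(candidates, key=lambda e: tm_log[e] + lm_log[e])

print(best_translation(["I will do it soon", "See you soon"]))  # See you soon
```

Here the language model outvotes the slightly better translation-model score, which is exactly the "faithful but fluent" tradeoff.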
Word Alignment

English: What is the anticipated cost of collecting fees under the new proposal?
French: En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?

[Alignment grid figure: English words (What is the anticipated cost of collecting fees under the new proposal ?) aligned against French tokens (En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?)]
Unsupervised Word Alignment
§ Input: a bitext: pairs of translated sentences
§ Output: alignments: pairs of translated words
§ Not always one-to-one!

nous acceptons votre opinion .
we accept your view .
1-to-Many Alignments

Evaluating Models
§ How do we measure quality of a word-to-word model?
§ Method 1: use in an end-to-end translation system
  § Slow development cycle
  § Misleading if your MT system was "tuned" for certain aspects of bad alignments
§ Method 2: measure quality of the alignments produced
  § Easy to measure
  § Hard to know what the gold alignments should be
  § Often does not correlate well with translation quality
Alignment Error Rate
§ Sure alignments S, possible alignments P (with S ⊆ P), predicted alignments A
§ Precision = |A ∩ P| / |A|
§ Recall = |A ∩ S| / |S|
§ AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
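AER is mechanical to compute once the link sets are in hand; a small sketch (the `(i, j)` position pairs are hypothetical):

```python
def aer(sure, possible, predicted):
    """Alignment Error Rate (Och & Ney style): sure links S, possible
    links P (S is a subset of P), predicted links A, each a set of
    (english_pos, french_pos) pairs. Lower is better; 0.0 is perfect."""
    s, a = set(sure), set(predicted)
    p = set(possible) | s  # enforce S subset of P
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Predicting exactly the sure links gives a perfect score of 0.0:
print(aer({(0, 0), (1, 1)}, {(0, 0), (1, 1)}, {(0, 0), (1, 1)}))
```

Note the asymmetry: predicted links are only required to fall inside P, but recall is measured against S alone, so annotator-uncertain links don't hurt either way.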
IBM Model 1

IBM Model 1 (Brown 93)
§ Alignments: a hidden vector called an alignment specifies which English source word (or a special null token) is responsible for each French target word.

E: Thank you , I shall do so gladly .
F: Gracias , lo haré de muy buen grado .
A: 1 3 7 6 8 8 8 8 9 (each French position points to the English position that generated it)

Model Parameters
P(F1 = Gracias | A1 = 1) = P(Gracias | Thank) ← learn these translation probs
P(A1 = 1) = 1/10, nothing to learn (uniform over the 9 English positions plus null)
EM for Model 1
§ Model 1 parameters: translation probabilities P(f | e)
§ Start with uniform, including the null token
§ For each sentence, for each foreign position j:
  § Calculate posterior over English positions:
    P(aj = i | f, e) = P(fj | ei) / Σi′ P(fj | ei′)
  § Increment count of word fj with word ei by these amounts
§ Do for whole corpus, re-estimate P(f | e) with M-step
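A compact runnable sketch of this EM loop, assuming tokenized sentence pairs and a prepended NULL token (variable and function names are ours):

```python
from collections import defaultdict

def train_model1(bitext, iterations=10):
    """EM for IBM Model 1. bitext: list of (french_tokens, english_tokens).
    Returns t, where t[f][e] approximates P(f | e)."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        totals = defaultdict(float)
        for fs, es in bitext:
            es = ["<NULL>"] + es  # null token can absorb unaligned words
            for f in fs:
                # E-step: posterior over English positions for this French word
                norm = sum(t[f][e] for e in es)
                for e in es:
                    frac = t[f][e] / norm
                    counts[f][e] += frac
                    totals[e] += frac
        # M-step: re-estimate P(f|e) from expected counts
        for f in counts:
            for e in counts[f]:
                t[f][e] = counts[f][e] / totals[e]
    return t

bitext = [(["maison"], ["house"]), (["ma", "maison"], ["my", "house"])]
t = train_model1(bitext)
```

Even on this two-sentence bitext, co-occurrence statistics push P(maison | house) above P(maison | my) after a few iterations, which is the whole trick.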
Problems with Model 1
§ There's a reason they designed Models 2–5!
§ Problems: alignments jump around, align everything to rare words
§ Experimental setup:
  § Training data: 1.1M sentences of French-English text, Canadian Hansards
  § Evaluation metric: alignment error rate (AER)
  § Evaluation data: 447 hand-aligned sentences
Intersected Model 1
§ Post-intersection: standard practice to train models in each direction, then intersect their predictions [Och and Ney, 03]
§ Second model is basically a filter on the first
  § Precision jumps, recall drops
  § End up not guessing hard alignments

Model          P/R     AER
Model 1 E→F    82/58   30.6
Model 1 F→E    85/58   28.7
Model 1 AND    96/46   34.8
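Post-intersection itself is just a set operation once both directional models have produced links; a sketch with hypothetical link sets:

```python
def intersect_alignments(e_to_f, f_to_e):
    """Keep only links predicted by both directional models. Each input
    is a set of (english_pos, french_pos) pairs, with the F->E model's
    links already flipped into the same orientation."""
    return set(e_to_f) & set(f_to_e)

ef = {(0, 0), (1, 1), (1, 2)}   # hypothetical E->F links
fe = {(0, 0), (1, 2), (3, 4)}   # hypothetical flipped F->E links
print(sorted(intersect_alignments(ef, fe)))  # [(0, 0), (1, 2)]
```

This is exactly why precision jumps and recall drops in the table above: a link survives only if both models agree on it.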
HMM Model: Local Monotonicity
Monotonic Translation
Le Japon secoué par deux nouveaux séismes
Japan shaken by two new quakes
Local Order Change
Le Japon est au confluent de quatre plaques tectoniques
Japan is at the junction of four tectonic plates
The HMM Model
§ Want local monotonicity: most jumps are small
§ HMM model (Vogel 96): transition probabilities depend on the jump size between successive aligned positions (e.g. −2, −1, 0, 1, 2, 3)
§ Re-estimate using the forward-backward algorithm
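To make the "small jumps" idea concrete, here is an illustrative parameterization of the transition table: an exponential penalty on the distance from a one-step forward jump. This functional form is our assumption for illustration only; real HMM aligners like Vogel's learn the jump distribution from data with forward-backward.

```python
import math

def jump_scores(num_positions, decay=1.0):
    """Transition probabilities P(a_j = i' | a_{j-1} = i) that depend only
    on the jump d = i' - i, peaked at d = 1 (one step forward)."""
    def raw(i, i_next):
        return math.exp(-decay * abs((i_next - i) - 1))
    table = {}
    for i in range(num_positions):
        z = sum(raw(i, j) for j in range(num_positions))  # normalize rows
        for j in range(num_positions):
            table[(i, j)] = raw(i, j) / z
    return table

t = jump_scores(5)
# From position 2, the most likely next position is 3 (a +1 jump).
```

The key property, shared with the learned model, is that probability mass decays smoothly as jumps get larger or go backwards.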
HMM Examples
AER for HMMs

Model          AER
Model 1 INT    19.5
HMM E→F        11.4
HMM F→E        10.8
HMM AND         7.1
HMM INT         4.7
GIZA M4 AND     6.9
Language Modeling
Phrase-Based System Overview
§ Unlabeled English data → language model P(e)
§ Phrase table → translation model P(f | e)
§ Noisy channel model: P(e | f) ∝ P(f | e) P(e); combine scores from the translation model and language model to translate foreign to English.
§ "Translate faithfully but make fluent English"
N-gram Language Modeling
§ Could give several lectures on this!
§ Estimate P(wn | wn−k, wn−k+1, …, wn−1)
§ Generative model: read off counts and normalize
  § P(fox | the quick brown) = 0.9, etc.
§ Very complex distributions, need to smooth
  § Interpolate with lower-order models
  § Lots of complex techniques
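A minimal count-and-normalize bigram model with interpolation, matching the "read off counts and normalize" and "interpolate with lower-order models" bullets (the fixed interpolation weight is a simplistic placeholder; real systems tune it):

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Collect unigram and bigram counts; sentences are token lists."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for sent in corpus:
        toks = ["<s>"] + sent  # sentence-start symbol for the first bigram
        unigrams.update(toks)
        total += len(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams, total

def interp_prob(w, prev, unigrams, bigrams, total, lam=0.7):
    """Interpolated estimate: lam * P(w | prev) + (1 - lam) * P(w)."""
    p_uni = unigrams[w] / total
    p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

corpus = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]
uni, bi, tot = train_bigram_lm(corpus)
```

The interpolation term is what keeps unseen bigrams from getting probability exactly zero, which would otherwise veto any translation hypothesis containing them.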
Phrase-Based MT

Phrase-Based System Overview
§ Sentence-aligned corpus → word alignments → phrase table (translation model)
§ We have a phrase table now (ran aligner, extracted phrases and counted them to get scores). Phrase extraction and counting are tricky, but we'll ignore this...
Phrase-Based System Overview
§ Unlabeled English data → language model P(e)
§ Phrase table → translation model P(f | e)
§ Noisy channel model: P(e | f) ∝ P(f | e) P(e); combine scores from the translation model and language model to translate foreign to English.
§ "Translate faithfully but make fluent English"
Phrase-Based Decoding
这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
(These 7 people include astronauts coming from France and Russia.)
Decoder design is important: [Koehn et al. 03]
Phrase-Based Decoding

Monotonic Word Translation
§ Cost is LM * TM
§ It's an HMM?
  § LM: P(e | e−1, e−2)
  § TM: P(f | e)
§ State includes
  § Exposed English (the last two words generated)
  § Position in foreign
§ Dynamic program loop?

[Lattice example: translation options such as "a → to (0.8)", "a → by (0.1)"; phrase scores "a slap to" 0.02, "a slap by" 0.01; hypotheses "[…, a slap, 5] 0.00001", "[…, slap to, 6] 0.00000016", "[…, slap by, 6] 0.00000001"]

for (fPosition in 1…|f|)
  for (eContext in allEContexts)
    for (eOption in translations[fPosition])
      score = scores[fPosition-1][eContext] * LM(eContext+eOption) * TM(eOption, fWord[fPosition])
      scores[fPosition][eContext[2]+eOption] = max(scores[fPosition][eContext[2]+eOption], score)
Beam Decoding
§ For real MT models, this kind of dynamic program is a disaster (why?)
§ Standard solution is beam search: for each position, keep track of only the best k hypotheses

for (fPosition in 1…|f|)
  for (eContext in bestEContexts[fPosition-1])
    for (eOption in translations[fPosition])
      score = scores[fPosition-1][eContext] * LM(eContext+eOption) * TM(eOption, fWord[fPosition])
      bestEContexts[fPosition].maybeAdd(eContext[2]+eOption, score)
Example from David Chiang
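A runnable miniature of this beam loop, using a toy word-level translation table and a bigram LM (all entries and probabilities invented for illustration); it keeps the k best hypotheses at each foreign position:

```python
import heapq

tm = {"le": {"the": 0.7}, "chat": {"cat": 0.8, "chat": 0.1}}  # P(e option | f word)
lm = {("<s>", "the"): 0.5, ("the", "cat"): 0.4, ("the", "chat"): 0.01}

def beam_decode(foreign, k=2):
    """Monotonic beam decoder: each hypothesis is
    (score, last English word, English output so far)."""
    beam = [(1.0, "<s>", [])]
    for f in foreign:
        candidates = []
        for score, prev, out in beam:
            for e, p_tm in tm[f].items():
                s = score * p_tm * lm.get((prev, e), 1e-6)  # floor for unseen bigrams
                candidates.append((s, e, out + [e]))
        beam = heapq.nlargest(k, candidates)  # keep only the best k hypotheses
    return max(beam)[2]

print(beam_decode(["le", "chat"]))  # ['the', 'cat']
```

The LM floor for unseen bigrams plays the role of smoothing; without it, "chat" as an English output would be vetoed outright rather than merely outscored.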
Phrase Translation
§ If monotonic, almost an HMM; technically a semi-HMM
§ If distortion… now what?

for (fPosition in 1…|f|)
  for (lastPosition < fPosition)
    for (eContext in eContexts)
      for (eOption in translations[fPosition])
        … combine hypothesis for (lastPosition ending in eContext) with eOption
Non-Monotonic Phrasal MT

Pruning: Beams + Forward Costs
§ Problem: easy partial analyses are cheaper
§ Solution 1: use beams per foreign subset
§ Solution 2: estimate forward costs (A*-like)
The Pharaoh Decoder
Hypothesis Lattices
Syntactic Models
Syntactic Translation
§ Lots of complexity: large phrase tables, errors introduced by parsers, parses don't agree, inference is harder, ...
§ Good for some languages (Japanese → English), but generally more trouble than it's worth
§ Easier method: syntactic "pre-reordering"
MT: Takeaways
§ Word alignments: unsupervised process for finding word-level correspondences. Turn these into phrase-level correspondences → phrase table
§ Language model: estimate an n-gram model on a very large corpus
§ Translation process: use beam search to find the best translation, argmaxe P(f | e) P(e)