Dual Decomposition Inference for Graphical Models over Strings

Nanyun (Violet) Peng, Ryan Cotterell, Jason Eisner

Johns Hopkins University



Attention!

• Don’t care about phonology?

• Listen anyway. This is a general method for inferring strings from other strings (if you have a probability model).

• So if you haven’t yet observed all the words of your noisy or complex language, try it!


A Phonological Exercise

Verbs × Tenses (columns: 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.)
TALK: [tɔk] [tɔks] [tɔkt] [tɔkt]
THANK: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
HACK: [hæk] [hæks] [hækt] [hækt]
CRACK: only [kɹæks] and [kɹækt] observed
SLAP: only [slæp] and [slæpt] observed


Matrix Completion: Collaborative Filtering

Users × Movies ratings (? = not observed)
User 1: -37 29 19 29
User 2: -36 67 77 22
User 3: -24 61 74 12
User 4: ? -79 ? -41
User 5: -52 ? -39 ?


Matrix Completion: Collaborative Filtering

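The same ratings, now with a 3-dimensional vector for each user (row) and each movie (column):

Movie vectors (columns): [-6 -3 2] [9 -2 1] [9 -7 2] [4 3 -2]
User [4 1 -5]: -37 29 19 29
User [7 -2 0]: -36 67 77 22
User [6 -2 3]: -24 61 74 12
User [-9 1 4]: ? -79 ? -41
User [3 8 -5]: -52 ? -39 ?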


Matrix Completion: Collaborative Filtering

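Prediction!

Movie vectors (columns): [-6 -3 2] [9 -2 1] [9 -7 2] [4 3 -2]
User [4 1 -5]: -37 29 19 29
User [7 -2 0]: -36 67 77 22
User [6 -2 3]: -24 61 74 12
User [-9 1 4]: 59 -79 -80 -41
User [3 8 -5]: -52 6 -39 46

(The predicted cells are 59 and -80 for user [-9 1 4], and 6 and 46 for user [3 8 -5].)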


Matrix Completion: Collaborative Filtering

A user vector [1,-4,3] and a movie vector [-5,2,1] combine by Dot Product to give -10; Gaussian Noise then turns this into the observed rating -11.
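A minimal Python sketch of the generative story on this slide: a rating is the dot product of a user vector and a movie vector, plus Gaussian noise. The function names and the noise scale `sigma` are assumptions for illustration; the vectors are the ones shown on the slides.

```python
import math
from typing import Sequence

def predict(user: Sequence[float], movie: Sequence[float]) -> float:
    """Noise-free rating: the dot product of the user and movie vectors."""
    return sum(u * m for u, m in zip(user, movie))

def log_likelihood(observed: float, user: Sequence[float], movie: Sequence[float],
                   sigma: float = 1.0) -> float:
    """Log density of an observed rating under Gaussian noise around the dot product."""
    mean = predict(user, movie)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (observed - mean) ** 2 / (2 * sigma ** 2)

print(predict([1, -4, 3], [-5, 2, 1]))  # -10
# Filling a missing cell from the earlier slides: user [-9, 1, 4], movie [-6, -3, 2]
print(predict([-9, 1, 4], [-6, -3, 2]))  # 59
```

Under this model, the observed -11 is slightly less likely than a noise-free -10 would be, which is exactly what `log_likelihood` measures.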


A Phonological Exercise

Verbs × Tenses (columns: 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.)
TALK: [tɔk] [tɔks] [tɔkt] [tɔkt]
THANK: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
HACK: [hæk] [hæks] [hækt] [hækt]
CRACK: only [kɹæks] and [kɹækt] observed
SLAP: only [slæp] and [slæpt] observed

Suffixes (one per column): /Ø/ /s/ /t/ /t/
Stems (one per verb): /tɔk/ /θeɪŋk/ /hæk/ /kɹæk/ /slæp/


A Phonological Exercise

Verbs × Tenses (columns: 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.)
TALK: [tɔk] [tɔks] [tɔkt] [tɔkt]
THANK: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
HACK: [hæk] [hæks] [hækt] [hækt]
CRACK: [kɹæk] [kɹæks] [kɹækt] [kɹækt]
SLAP: [slæp] [slæps] [slæpt] [slæpt]

Suffixes (one per column): /Ø/ /s/ /t/ /t/
Stems (one per verb): /tɔk/ /θeɪŋk/ /hæk/ /kɹæk/ /slæp/

Prediction!


A Model of Phonology

tɔk + s → Concatenate → tɔks “talks”


A Phonological Exercise

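Verbs × Tenses (columns: 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.)
TALK: [tɔk] [tɔks] [tɔkt] [tɔkt]
THANK: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
HACK: [hæk] [hæks] [hækt] [hækt]
CRACK: only [kɹæks] and [kɹækt] observed
SLAP: only [slæp] and [slæpt] observed
CODE: only [koʊdz] and [koʊdɪt] observed
BAT: only [bæt] and [bætɪt] observed

Suffixes (one per column): /Ø/ /s/ /t/ /t/
Stems (one per verb): /tɔk/ /θeɪŋk/ /hæk/ /kɹæk/ /slæp/ /koʊd/ /bæt/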


A Phonological Exercise

CODE and BAT do not follow the pattern: [koʊdz] has z instead of s, and [koʊdɪt] and [bætɪt] have ɪt instead of t.


A Phonological Exercise

Adding EAT, with stem /it/: [it] [eɪt] [itən]. Its past tense is [eɪt] instead of the expected [itɪt].


A Model of Phonology

koʊd + s → Concatenate → koʊd#s → Phonology (stochastic) → koʊdz “codes”

Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015
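A toy Python sketch of this pipeline. It assumes a deterministic stand-in for the phonology: the model described here uses a stochastic phonology, whereas the hypothetical `toy_phonology` below hard-codes only the two patterns from the exercise (z instead of s after a voiced sound, ɪt instead of t after t or d). The names and the voiced-sound set are illustrative assumptions.

```python
# Hypothetical toy: a hand-written, deterministic stand-in for the stochastic
# phonology. It covers only the two alternations seen in the exercise.
VOICED = set("bdgvðzʒmnŋlrwj") | set("aeiouæɑɔəɛɪʊʌ")

def underlying_word(morphemes: list[str]) -> str:
    """Concatenate morpheme URs, marking each boundary with '#'."""
    return "#".join(morphemes)

def toy_phonology(ur: str) -> str:
    """Map a stem#suffix underlying word to a surface form."""
    stem, sep, suffix = ur.rpartition("#")
    if not sep:  # a single morpheme: nothing to adjust
        return ur
    if suffix == "s" and stem[-1] in VOICED:
        suffix = "z"   # e.g., /koʊd#s/ -> [koʊdz]
    elif suffix == "t" and stem[-1] in "td":
        suffix = "ɪt"  # e.g., /bæt#t/ -> [bætɪt]
    return (stem + suffix).replace("#", "")

for stem, suffix in [("tɔk", "s"), ("koʊd", "s"), ("hæk", "t"), ("bæt", "t")]:
    ur = underlying_word([stem, suffix])
    print(ur, "->", toy_phonology(ur))
```

Note that this rule maps /it#t/ to [itɪt], not the observed [eɪt]: a short hand-written rule list cannot cover such exceptions.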


A Model of Phonology

rizaign + ation → Concatenate → rizaign#ation → Phonology (stochastic) → rεzɪgneɪʃn “resignation”


Fragment of Our Graph for English

1) Morphemes: rizaign, dæmn, eɪʃən, z (the 3rd-person singular suffix: very common!)
2) Underlying words: rεzɪgn#eɪʃən, rizajn#z, dæmn#eɪʃən, dæmn#z
3) Surface words: r,εzɪgn’eɪʃn “resignation”, riz’ajnz “resigns”, d,æmn’eɪʃn “damnation”, d’æmz “damns”

Layers are linked by Concatenation (1 → 2) and Phonology (2 → 3).


Limited to concatenation? No, could extend to templatic morphology …


Outline

● A motivating example: phonology
● General framework:
o graphical models over strings
o Inference on graphical models over strings
● Dual decomposition inference
o The general idea
o Substring features and active set
● Experiments and results


Graphical Models over Strings?

● Joint distribution over many strings

● Variables
o Range over Σ*, the infinite set of all strings
● Relations among variables
o Usually specified by (multi-tape) FSTs

A probabilistic approach to language change (Bouchard-Côté et al., NIPS 2008)

Graphical models over multiple strings (Dreyer and Eisner, EMNLP 2009)

Large-scale cognate recovery (Hall and Klein, EMNLP 2011)


Graphical Models over Strings?

● Strings are the basic units in natural languages.
● Use
o Orthographic (spelling)
o Phonological (pronunciation)
o Latent (intermediate steps not observed directly)
● Size
o Morphemes (meaningful subword units)
o Words
o Multi-word phrases, including “named entities”
o URLs


What relationships could you model?

● spelling ↔ pronunciation
● word ↔ noisy word (e.g., with a typo)
● word ↔ related word in another language (loanwords, language evolution, cognates)
● singular ↔ plural (for example)
● root ↔ word
● underlying form ↔ surface form


Factor Graph for phonology

1) Morpheme URs: rizajgn, eɪʃən, dæmn, z
2) Word URs: rεzɪgn#eɪʃən, rizajn#z, dæmn#eɪʃən, dæmn#z
3) Word SRs: r,εzɪgn’eɪʃn, riz’ajnz, d,æmn’eɪʃn, d’æmz

Factors: Concatenation (e.g.) links layers 1 and 2; Phonology (PFST) links layers 2 and 3.

Together, the factors define a log-probability. Let’s maximize it!


Contextual Stochastic Edit Process


Stochastic contextual edit distance and probabilistic FSTs (Cotterell et al., ACL 2014)


Inference on a Factor Graph

1) Morpheme URs: ? ? ? ?
2) Word URs: ? ? ?
3) Word SRs (observed): riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd


Inference on a Factor Graph

1) Morpheme URs: stem bar; suffixes s, foo, da
2) Word URs: ? ? ?
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd


Inference on a Factor Graph

1) Morpheme URs: stem bar; suffixes s, foo, da
2) Word URs: bar#s, bar#foo, bar#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd


Inference on a Factor Graph

1) Morpheme URs: stem bar; suffixes s, foo, da
2) Word URs: bar#s, bar#foo, bar#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

Factor values: 8e-3, 0.01, 0.05, 0.02


Inference on a Factor Graph

1) Morpheme URs: stem bar; suffixes s, foo, da
2) Word URs: bar#s, bar#foo, bar#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

Factor values: 8e-3, 0.01, 0.05, 0.02, and 6e-1200, 2e-1300, 7e-1100


Inference on a Factor Graph

1) Morpheme URs: stem far; suffixes s, foo, da
2) Word URs: far#s, far#foo, far#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

?


Inference on a Factor Graph

1) Morpheme URs: stem size; suffixes s, foo, da
2) Word URs: size#s, size#foo, size#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

?


Inference on a Factor Graph

1) Morpheme URs: stem …; suffixes s, foo, da
2) Word URs: …#s, …#foo, …#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

?


Inference on a Factor Graph

1) Morpheme URs: stem rizajn; suffixes s, foo, da
2) Word URs: rizajn#s, rizajn#foo, rizajn#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd


Inference on a Factor Graph

1) Morpheme URs: stem rizajn; suffixes s, foo, da
2) Word URs: rizajn#s, rizajn#foo, rizajn#da
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

Factor values: 0.01, 2e-5, 0.008


Inference on a Factor Graph

1) Morpheme URs: stem rizajn; suffixes s, eɪʃn, d
2) Word URs: rizajn#s, rizajn#eɪʃn, rizajn#d
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

Factor values: 0.01, 0.001, 0.015


Inference on a Factor Graph

1) Morpheme URs: stem rizajgn; suffixes s, eɪʃn, d
2) Word URs: rizajgn#s, rizajgn#eɪʃn, rizajgn#d
3) Word SRs: riz’ajnz, r,εzɪgn’eɪʃn, riz’ajnd

Factor values: 0.008, 0.008, 0.013
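A small Python sketch of how two such guesses can be compared. It assumes (our reading, not stated on the slides) that the three numbers shown for each guess are factor values and that the joint score is their product, so the log-score is a sum of logs. Under that reading, the rizajgn guess (0.008, 0.008, 0.013) beats the rizajn guess (0.01, 0.001, 0.015), even though it scores lower on two of the three words. The helper names are hypothetical.

```python
import math

def log_score(factor_values: list[float]) -> float:
    """Log of a product of factor values, computed as a sum of logs."""
    return sum(math.log(v) for v in factor_values)

def log_sci(mantissa: float, exponent: int) -> float:
    """log(mantissa * 10**exponent), computed without forming the number itself."""
    return math.log(mantissa) + exponent * math.log(10)

rizajn = [0.01, 0.001, 0.015]    # values shown for the guess with stem rizajn
rizajgn = [0.008, 0.008, 0.013]  # values shown for the guess with stem rizajgn
print(log_score(rizajn) < log_score(rizajgn))  # True: products 1.5e-7 vs. 8.32e-7

# Values like 6e-1200 (from the bar guess) underflow a Python float,
# which is one reason to keep scores in log space.
print(float("6e-1200"))  # 0.0
print(log_sci(6, -1200))
```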


Challenges in Inference


• Global discrete optimization problem.

• Variables range over an infinite set … cannot be solved by ILP or even brute force. Undecidable!

• Our previous papers used approximate algorithms: Loopy Belief Propagation, or Expectation Propagation.

Q: Can we do exact inference?
A: If we can live with 1-best (MAP) inference rather than marginal inference, then we can use Dual Decomposition … which is exact.

(if it terminates! the problem is undecidable in general …)


Graphical Model for Phonology


Jointly decide the values of the interdependent latent variables, which range over an infinite set.

1) Morpheme URs: z, rizajgn, eɪʃən, dæmn
2) Word URs: rεzɪgn#eɪʃən, rizajn#z, dæmn#eɪʃən, dæmn#z
3) Word SRs: r,εzɪgn’eɪʃn, riz’ajnz, d,æmn’eɪʃn, d’æmz

Factors: Concatenation (e.g.) links layers 1 and 2; Phonology (PFST) links layers 2 and 3.

Also shown in the figure: rεzign, eɪʃən


General Idea of Dual Decomp

[Figure: the factor graph over morpheme URs, word URs, and word SRs for resignation, resigns, damnation, damns]


General Idea of Dual Decomp

Subproblem 1: rεzɪgn, eɪʃən → rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn
Subproblem 2: rizajn, z → rizajn#z → riz’ajnz
Subproblem 3: dæmn, eɪʃən → dæmn#eɪʃən → d,æmn’eɪʃn
Subproblem 4: dæmn, z → dæmn#z → d’æmz


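General Idea of Dual Decomp

Subproblem 1: rεzɪgn, eɪʃən → rεzɪgn#eɪʃən → r,εzɪgn’eɪʃn
Subproblem 2: rizajn, z → rizajn#z → riz’ajnz
Subproblem 3: dæmn, eɪʃən → dæmn#eɪʃən → d,æmn’eɪʃn
Subproblem 4: dæmn, z → dæmn#z → d’æmz

Because each subproblem has its own copy of the shared stem, the copies can disagree:
Subproblem 1: “I prefer rεzɪgn.”
Subproblem 2: “I prefer rizajn.”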
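The general idea can be summarized as a bound. This is a sketch in our own notation (an assumption, not taken from the slides), simplified to a single shared string x: θ_k(x) is subproblem k's log-score for its copy of x, γ(x) is a vector of substring-count features, and λ_k is subproblem k's feature-weight vector, with the λ_k constrained to sum to zero.

```latex
\max_{x}\ \sum_{k=1}^{K}\theta_k(x)
  \;=\; \max_{x_1=\cdots=x_K}\ \sum_{k=1}^{K}\theta_k(x_k)
  \;\le\; \min_{\lambda:\ \sum_{k}\lambda_k=\mathbf{0}}\ \sum_{k=1}^{K}\ \max_{x_k}\Big[\theta_k(x_k)+\lambda_k\cdot\gamma(x_k)\Big]
```

The inequality holds because, for any shared x, the terms λ_k · γ(x) sum to zero, so letting each copy maximize separately can only raise the total. Each inner max is an independent subproblem; if all K maximizers coincide, the two sides are equal, and that common string is the exact MAP solution.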


Substring Features and Active Set

[Figure: Subproblems 1 to 4, each with its own copies of the morphemes]

Subproblem 1: “I prefer rεzɪgn.” → Less ε, ɪ, g; more i, a, j (to match others).

Subproblem 2: “I prefer rizajn.” → Less i, a, j; more ε, ɪ, g (to match others).


Features: “Active set” method

• How many features?
• Infinitely many possible n-grams!
• Trick: Gradually increase feature set as needed.
– Like Paul & Eisner (2012), Cotterell & Eisner (2015)
1. Only add features on which strings disagree.
2. Only add abcd once abc and bcd already agree.
– Exception: Add unigrams and bigrams for free.
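A minimal Python sketch of one active-set update, under our own reading of the rules above (an assumption): features are n-gram counts, with '^' and '$' marking the string edges for n ≥ 2 (the slides show '$' but not '^'); unigrams and bigrams are added whenever their counts differ across the copies, and a longer n-gram is added only if it differs while both of its (n−1)-gram parts already agree. The function names are hypothetical.

```python
from collections import Counter

def ngram_counts(s: str, n: int) -> Counter:
    """n-gram counts of s; for n >= 2 the string is padded with '^' and '$'."""
    if n == 1:
        return Counter(s)
    padded = "^" + s + "$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def disagreeing(strings: list[str], n: int) -> set[str]:
    """n-grams whose counts are not identical across all copies."""
    counts = [ngram_counts(s, n) for s in strings]
    keys = set().union(*counts)
    return {g for g in keys if len({c[g] for c in counts}) > 1}

def expand_active_set(active: set[str], strings: list[str], max_n: int = 4) -> set[str]:
    """One active-set update over the current copies of a shared string."""
    # Exception: unigrams and bigrams are added as soon as they disagree.
    new = set(active) | disagreeing(strings, 1) | disagreeing(strings, 2)
    for n in range(3, max_n + 1):
        shorter = disagreeing(strings, n - 1)
        for g in disagreeing(strings, n):
            # Rule 2: add abcd only once abc and bcd already agree.
            if g[:-1] not in shorter and g[1:] not in shorter:
                new.add(g)
    return new

# Copies of the Catalan stem at iterations 4 and 5 of the Catalan example.
active = expand_active_set(set(), ["gris", "griz", "griz", "griz"])
print(sorted(active))  # ['is', 'iz', 's', 's$', 'z', 'z$']
active = expand_active_set(active, ["gris", "griz", "grizo", "griz"])
print(sorted(active - {"is", "iz", "s", "s$", "z", "z$"}))  # ['o', 'o$', 'zo']
```

With these rules the sketch reproduces the nonzero-feature sets listed on the Catalan slides; no trigrams are added there, because each disagreeing trigram contains a bigram that still disagrees.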


Fragment of Our Graph for Catalan

Observed surface words: grizos, gris, grize, grizes. Their underlying forms (marked ?) are unknown, including the Stem of “grey”.

Separate these 4 words into 4 subproblems as before …


[Figure: the Catalan graph for grizos, gris, grize, grizes]

Redraw the graph to focus on the stem …


[Figure: the stem-centered graph for grizos, gris, grize, grizes]

Separate into 4 subproblems – each gets its own copy of the stem


Iteration: 1
Stem copies: ε, ε, ε, ε
nonzero features: { }


Iteration: 3
Stem copies: g, g, g, g
nonzero features: { }


Iteration: 4
Stem copies: gris, griz, griz, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$}


Iteration: 5
Stem copies: gris, griz, grizo, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$}


Iterations 6 to 13
Stem copies: gris, griz, grizo, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$}


Iteration: 14
Stem copies: griz, griz, grizo, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$}


Iteration: 17
Stem copies: griz, griz, griz, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$}


Iteration: 18
Stem copies: griz, griz, grize, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}


Iterations 19 to 29
Stem copies: griz, griz, grize, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}


Iteration: 30
Stem copies: griz, griz, griz, griz
nonzero features (feature weights = dual variables): {s, z, is, iz, s$, z$, o, zo, o$, e, ze, e$}

Converged!


Why n-gram features?

• Positional features don’t understand insertion:
giz vs. griz: “I’ll try to arrange for r not i at position 2, i not z at position 3, z not at position 4.”

• In contrast, our “z” feature counts the number of “z” phonemes, without regard to position.
giz vs. griz: “I need more r’s.” These solutions already agree on “g”, “i”, “z” counts … they’re only negotiating over the “r” count.


Why n-gram features?

• Adjust weights λ until the “r” counts match:
giz vs. griz: “I need more r’s … somewhere.”

• Next iteration agrees on all our unigram features:
girz vs. griz: “I need more gr, ri, iz, less gi, ir, rz.”
– Oops! Features matched only counts, not positions.
– But bigram counts are still wrong … so bigram features get activated to save the day.
– If that’s not enough, add even longer substrings …
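The escalation described above can be checked with a few lines of Python. This sketch uses hypothetical helper names and pads strings with '^' and '$' (the '^' marker is our assumption). It shows giz and griz differing only in the unigram r, girz and griz agreeing on every unigram but not on bigrams, and a made-up pair, abaca and acaba, agreeing on all unigrams and bigrams so that only trigrams can tell them apart.

```python
from collections import Counter

def ngrams(s: str, n: int) -> Counter:
    """n-gram counts; strings are padded with '^' and '$' when n >= 2."""
    t = s if n == 1 else "^" + s + "$"
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def disagreeing(a: str, b: str, n: int) -> set[str]:
    """n-grams whose counts differ between a and b."""
    ca, cb = ngrams(a, n), ngrams(b, n)
    return {g for g in ca.keys() | cb.keys() if ca[g] != cb[g]}

print(disagreeing("giz", "griz", 1))            # {'r'}
print(disagreeing("girz", "griz", 1))           # set(): all unigram counts match
print(sorted(disagreeing("girz", "griz", 2)))   # ['gi', 'gr', 'ir', 'iz', 'ri', 'rz']
print(disagreeing("abaca", "acaba", 2))         # set(): bigrams match too
print(sorted(disagreeing("abaca", "acaba", 3))) # ['^ab', '^ac', 'ba$', 'bac', 'ca$', 'cab']
```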


7 Inference Problems (graphs)

EXERCISE (small)
o 4 languages: Catalan, English, Maori, Tangale
o 16 to 55 underlying morphemes
o 55 to 106 surface words

CELEX (large)
o 3 languages: English, German, Dutch
o 341 to 381 underlying morphemes
o 1000 surface words for each language

[Table: # vars (unknown strings) and # subproblems for each of the 7 graphs]


Experimental Questions

o Is exact inference by DD practical?
o Does it converge?
o Does it get better results than approximate inference methods?
o Does exact inference help EM?


● DD seeks best λ via subgradient algorithm → reduce dual objective → tighten upper bound on primal objective

● If λ gets all sub-problems to agree (x1 = … = xK) → constraints satisfied → dual value is also value of a primal solution → which must be max primal! (and min dual)

[Figure: the primal (function of strings x) and the dual (function of weights λ)]
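A minimal Python sketch of this loop for the simplest case of K subproblems sharing one string. Assumptions (not from the slides): each subproblem scores a small finite list of candidates given as a dict, standing in for a real FST-based subproblem solver; features are unigram and padded bigram counts; the active set grows by the features whose counts differ; and the step size is constant, whereas subgradient methods usually shrink it over time. All names are hypothetical.

```python
from collections import Counter

def features(s: str) -> Counter:
    """Unigram counts plus bigram counts of s padded with '^' and '$'."""
    padded = "^" + s + "$"
    return Counter(s) + Counter(padded[i:i + 2] for i in range(len(padded) - 1))

def dual_decomposition(scores: list[dict[str, float]], step: float = 0.5, max_iter: int = 100):
    """Subgradient DD: returns (shared string, dual value, iteration) on agreement.

    scores[k] maps each candidate string to subproblem k's log-score.
    Returns (None, None, max_iter) if the copies never agree.
    """
    K = len(scores)
    lam = [Counter() for _ in range(K)]  # dual variables: feature weights per subproblem
    active: set[str] = set()             # features that currently carry weights

    def augmented(k: int, x: str) -> float:
        # Subproblem k's own score plus its weighted active-feature counts.
        return scores[k][x] + sum(lam[k][f] * c for f, c in features(x).items() if f in active)

    for it in range(max_iter):
        # Solve each subproblem independently.
        xs = [max(scores[k], key=lambda x: augmented(k, x)) for k in range(K)]
        dual = sum(augmented(k, x) for k, x in enumerate(xs))
        if len(set(xs)) == 1:
            # All copies agree: the dual value equals this string's primal score.
            return xs[0], dual, it
        feats = [features(x) for x in xs]
        active |= {f for f in set().union(*feats) if len({c[f] for c in feats}) > 1}
        for f in active:
            mean = sum(c[f] for c in feats) / K
            for k in range(K):
                # Penalize features a copy has more of than average; weights still sum to 0.
                lam[k][f] -= step * (feats[k][f] - mean)
    return None, None, max_iter

# Toy: subproblem 0 prefers "gris", subproblem 1 strongly prefers "griz".
scores = [{"gris": 1.0, "griz": 0.0}, {"gris": 0.0, "griz": 3.0}]
print(dual_decomposition(scores))  # ('griz', 3.0, 1)
```

In this toy the copies disagree at iteration 0 (gris vs. griz). The update lowers the weights of s, is, s$ and raises those of z, iz, z$ in subproblem 0 (and the reverse in subproblem 1), so at iteration 1 both choose griz. The dual value 3.0 equals griz's primal score 0.0 + 3.0, which certifies griz as the exact maximizer of the summed scores.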


Convergence behavior (full graph)

[Figure: one panel each for Catalan, Maori, English, and Tangale, showing the dual (tighten upper bound) and the primal (improve strings) approaching each other until they meet: optimal!]


Comparisons

● Compare DD with two types of Belief Propagation (BP) inference.

Approximate MAP inference (max-product BP): baseline
Approximate marginal inference (sum-product BP): TACL 2015
Exact MAP inference (dual decomposition): this paper
Exact marginal inference: we don’t know how!

Figure arrows relating these settings are labeled “variational approximation” and “Viterbi approximation”.


Inference accuracy

Setting | Approximate MAP inference (max-product BP, baseline) | Approximate marginal inference (sum-product BP, TACL 2015) | Exact MAP inference (dual decomposition, this paper)
Model 1, EXERCISE | 90% | 95% | 97%
Model 1, CELEX | 84% | 86% | 90%
Model 2S, CELEX | 99% | 96% | 99%
Model 2E, EXERCISE | 91% | 95% | 98%

Model 1 – trivial phonology; Model 2S – oracle phonology; Model 2E – learned phonology (inference used within EM)

Sum-product BP improves on the baseline, and DD improves more! (Sum-product BP is worse on Model 2S, CELEX.)


Conclusion

• A general DD algorithm for MAP inference on graphical models over strings.

• On the phonology problem, terminates in practice, guaranteeing the exact MAP solution.

• Improved inference for supervised model; improved EM training for unsupervised model.

• Try it for your own problems generalizing to new strings!