Applications of Weighted Tree Automata and Tree Transducers in Natural Language Processing

Andreas Maletti
Institute of Computer Science, Universität Leipzig, Germany

Krippen, March 28, 2019

Final Round of Jeopardy!

IBM Watson: $36,681
Brad Rutter: $5,400
Ken Jennings: $2,400

Final Category: U.S. cities

Answer: Its largest airport was named for a World War II hero; its second largest, for a World War II battle.

Watson (bet $947): What is Toronto????? (confidence ≈ 30%)
Brad (bet $5,000): What is Chicago?
Ken (bet $2,400): What is Chicago?

Information Retrieval

Wikipedia page of Toronto Pearson International Airport:

Lester B. Pearson International Airport [...] is the primary international airport serving Toronto, its metropolitan area, and surrounding region known as the Golden Horseshoe in the province of Ontario, Canada.

It is the largest and busiest airport in Canada, the second-busiest international air passenger gateway in the Americas, and the 31st-busiest airport in the world [...].

The airport is named in honour of Lester B. Pearson, Nobel Peace Prize laureate and 14th Prime Minister of Canada.

Mention Detection is the task of identifying all instances of the same entity

Mention Detection

1. Identify all mentions (= noun phrases)
2. Identify which mentions reference the same entity

S(
  NP(PRP(It)),
  VP(
    VBZ(is),
    NP(
      NP(
        NP(DT(the), ADJP(JJS(largest), CC(and), JJS(busiest)), NN(airport)),
        PP(IN(in), NP(NNP(Canada)))),
      ,(,),
      NP(
        NP(DT(the), JJ(second-busiest), JJ(international), NN(air), NN(passenger), NN(gateway)),
        PP(IN(in), NP(DT(the), NNPS(Americas)))),
      ,(,),
      CC(and),
      NP(
        NP(DT(the), JJ(31st-busiest), NN(airport)),
        PP(IN(in), NP(DT(the), NN(world)))))))

Parsing

Parsing: determining the syntactic structure of a sentence subject to a given theory of syntax (encoded in the training data)
• constituent syntax
• dependency syntax
• ...

S(
  NP-SBJ(
    NP(NNP(Mr.), NNP(Hahn)),
    ,(,),
    NP(
      NP(
        DT(the),
        ADJP(NP(CD(62), HYPH(-), NN(year)), HYPH(-), JJ(old)),
        NML(NML(NN(chairman)), CC(and), NML(JJ(chief), JJ(executive), NN(officer)))),
      PP(IN(of), NP(NNP(Georgia), HYPH(-), NNP(Pacific), NNP(Corp.))))),
  VP(
    VBZ(is),
    VP(
      VBG(leading),
      NP(
        NP(
          NP(DT(the), NML(NN(forest), HYPH(-), NN(product)), NN(concern), POS(’s)),
          JJ(unsolicited),
          NML(QP($($), CD(3.19), CD(billion)), -NONE-(*U*)),
          NN(bid)),
        PP(IN(for), NP(NNP(Great), NNP(Northern), NNP(Nekoosa), NNP(Corp)))))),
  .(.))

Constituent Parsing

Example: We must bear in mind the Community as a whole

S(
  NP(PRP(We)),
  VP(
    MD(must),
    VP(
      VB(bear),
      PP(IN(in), NP(NN(mind))),
      NP(
        NP(DT(the), NN(Community)),
        PP(IN(as), NP(DT(a), NN(whole)))))))

POS-tag: part-of-speech tag, “class” of a word

Constituent Parsing

today
Linear-time dependency models; optimized by neural networks

2016
Subcategorization
• manual: Collins (1999), Stanford (2003), BLLIP (2005)
• automatic, e.g. Berkeley (2007)

2000
Statistical approach (cheap, automatically trained)
• Penn and WSJ tree bank (1M and 30M words)
• automatically obtained weighted CFG

1990
Chomskyan approach (perfect analysis, poor coverage)
• hand-crafted CFG, TAG (refined via POS tags)
• corrections and selection by human annotators

Constituent Parsing

All models use weights for disambiguation:

S(NP(PRP(We)), VP(VBD(saw), NP(PRP$(her), NN(duck))))

S(NP(PRP(We)), VP(VBD(saw), S-BAR(S(NP(PRP(her)), VP(VBP(duck))))))
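
To make the role of the weights concrete, here is a minimal Python sketch that scores the two analyses of "We saw her duck" and keeps the heavier one. Trees are nested tuples `(label, child, ...)`. Only the two trees come from the slide; the rule weights in `RULE_WEIGHTS` are invented for illustration and are not taken from any treebank.

```python
from fractions import Fraction

# Hypothetical rule weights, keyed by (label, child labels); not from the slides.
RULE_WEIGHTS = {
    ("S", ("NP", "VP")): Fraction(1),
    ("S-BAR", ("S",)): Fraction(1),
    ("NP", ("PRP",)): Fraction(1, 2),
    ("NP", ("PRP$", "NN")): Fraction(1, 2),
    ("VP", ("VBD", "NP")): Fraction(1, 2),
    ("VP", ("VBD", "S-BAR")): Fraction(1, 4),
    ("VP", ("VBP",)): Fraction(1, 4),
    ("PRP", ("We",)): Fraction(1, 2),
    ("PRP", ("her",)): Fraction(1, 2),
    ("PRP$", ("her",)): Fraction(1),
    ("NN", ("duck",)): Fraction(1),
    ("VBD", ("saw",)): Fraction(1),
    ("VBP", ("duck",)): Fraction(1),
}


def tree_weight(tree):
    """Multiply the weights of all branchings used in the tree."""
    label, children = tree[0], tree[1:]
    if not children:  # a word carries no weight of its own
        return Fraction(1)
    key = (label, tuple(child[0] for child in children))
    weight = RULE_WEIGHTS.get(key, Fraction(0))  # unknown branching: weight 0
    for child in children:
        weight *= tree_weight(child)
    return weight


SUBJECT = ("NP", ("PRP", ("We",)))
HER_DUCK_NP = ("NP", ("PRP$", ("her",)), ("NN", ("duck",)))
HER_DUCK_S = ("S", ("NP", ("PRP", ("her",))), ("VP", ("VBP", ("duck",))))

NOUN_READING = ("S", SUBJECT, ("VP", ("VBD", ("saw",)), HER_DUCK_NP))
VERB_READING = ("S", SUBJECT, ("VP", ("VBD", ("saw",)), ("S-BAR", HER_DUCK_S)))

# Disambiguation: keep the analysis with the larger weight.
best = max([NOUN_READING, VERB_READING], key=tree_weight)
```

With these made-up weights the noun reading wins; different weights could just as well prefer the verb reading, which is the point of training them on data.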

Weight Structure

Definition (Commutative semiring)
Algebraic structure $(S, +, \cdot, 0, 1)$ is a commutative semiring if
• $(S, +, 0)$ and $(S, \cdot, 1)$ are commutative monoids
• $\cdot$ distributes over finite sums ($s \cdot 0 = 0$ for all $s \in S$)

Examples
• semiring $(\mathbb{N}, +, \cdot, 0, 1)$ of nonnegative integers
• semiring $(\mathbb{Q}, +, \cdot, 0, 1)$ of rational numbers
• Viterbi semiring $([0, 1], \max, \cdot, 0, 1)$ of probabilities
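
The definition can be spot-checked mechanically. The following Python sketch models a semiring as a small record of operations and tests the axioms on a handful of sample elements. The names `Semiring` and `satisfies_axioms` are made up for this illustration, and a finite check is of course no proof.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A candidate commutative semiring (S, +, ·, 0, 1)."""
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]
    zero: Any
    one: Any


NATURALS = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
RATIONALS = Semiring(lambda a, b: a + b, lambda a, b: a * b, Fraction(0), Fraction(1))
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)


def satisfies_axioms(sr, samples):
    """Check the commutative-semiring axioms on all triples of samples."""
    for a, b, c in product(samples, repeat=3):
        ok = (
            sr.add(a, b) == sr.add(b, a)                        # + commutative
            and sr.mul(a, b) == sr.mul(b, a)                    # · commutative
            and sr.add(sr.add(a, b), c) == sr.add(a, sr.add(b, c))
            and sr.mul(sr.mul(a, b), c) == sr.mul(a, sr.mul(b, c))
            and sr.add(a, sr.zero) == a                         # 0 neutral for +
            and sr.mul(a, sr.one) == a                          # 1 neutral for ·
            and sr.mul(a, sr.add(b, c)) == sr.add(sr.mul(a, b), sr.mul(a, c))
            and sr.mul(a, sr.zero) == sr.zero                   # 0 absorbing
        )
        if not ok:
            return False
    return True
```

The Viterbi samples in a check should be values such as 0.0, 0.25, 0.5 and 1.0, whose products are exact in floating point; arbitrary floats may fail associativity because of rounding.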

Weighted Tree Automaton

Fix a commutative semiring $(S, +, \cdot, 0, 1)$

Definition (Weighted tree automaton [Berstel, Reutenauer 1982])
Tuple $(Q, \Sigma, I, R)$ is a weighted tree automaton if
• finite set $Q$ of states
• finite set $\Sigma$ of terminals
• initial states $I \subseteq Q$
• finite set $R$ of weighted rules of the form $q \xrightarrow{s} \sigma(q_1, \dots, q_k)$
  ($s \in S$, $\sigma \in \Sigma$, $k \geq 0$, $q, q_1, \dots, q_k \in Q$)

Example rules
$q_4 \xrightarrow{0.4} \text{VP}(q_5, q_2, q_3)$
$q_0 \xrightarrow{0.8} \text{S}(q_1, q_4)$
$q_0 \xrightarrow{0.2} \text{S}(q_6, q_2)$

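Weighted Tree Automaton

Definition (Derivation semantics and recognized tree language)
Let $(Q, \Sigma, I, R)$ be a weighted tree automaton
• for the state $q \in Q$ labeling the leftmost state-labeled leaf position and every rule $q \xrightarrow{s} r \in R$
  $$q \overset{s}{\Longrightarrow}_G r$$
• derivation $D \colon t \overset{s_1}{\Longrightarrow} \dotsb \overset{s_n}{\Longrightarrow} u$ from $t$ to $u$ with weight $\prod_{i=1}^{n} s_i$
• recognized weighted tree language $L$
  $$L(t) = \sum_{q \in I} \Biggl( \sum_{\substack{D \colon \text{derivation} \\ \text{from } q \text{ to } t}} \text{weight}(D) \Biggr)$$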
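
As a rough illustration of this semantics, the Python sketch below computes $L(t)$ by a bottom-up recursion over the tree instead of enumerating derivations. Under the assumption that every leftmost derivation of $t$ corresponds to exactly one assignment of rules to the nodes of $t$, both give the same value. The two `S` rules mirror the shape of the example rules for $q_0$; the leaf rules and all names (`RULES`, `state_weight`, `language_weight`) are hypothetical.

```python
import operator
from fractions import Fraction

# A tree is a tuple (label, child_1, ..., child_k); a leaf is (label,).
# A rule q -s-> sigma(q_1, ..., q_k) is stored as (q, s, sigma, (q_1, ..., q_k)).
RULES = [
    ("q0", Fraction(4, 5), "S", ("q1", "q4")),
    ("q0", Fraction(1, 5), "S", ("q6", "q2")),
    ("q1", Fraction(1), "a", ()),      # hypothetical leaf rules
    ("q6", Fraction(1, 2), "a", ()),
    ("q4", Fraction(1), "b", ()),
    ("q2", Fraction(1, 2), "b", ()),
]
INITIAL = {"q0"}


def state_weight(q, tree, rules, plus, zero):
    """Total weight of all ways to derive `tree` starting in state q."""
    label, children = tree[0], tree[1:]
    total = zero
    for state, s, sigma, successors in rules:
        if state == q and sigma == label and len(successors) == len(children):
            weight = s
            for q_i, t_i in zip(successors, children):
                weight = weight * state_weight(q_i, t_i, rules, plus, zero)
            total = plus(total, weight)
    return total


def language_weight(tree, rules, initial, plus=operator.add, zero=Fraction(0)):
    """L(t): combine the weights over all initial states."""
    total = zero
    for q in sorted(initial):
        total = plus(total, state_weight(q, tree, rules, plus, zero))
    return total


tree = ("S", ("a",), ("b",))
sum_weight = language_weight(tree, RULES, INITIAL)             # ordinary sum
best_weight = language_weight(tree, RULES, INITIAL, plus=max)  # Viterbi-style
```

Passing `plus=max` swaps the semiring addition, so the same recursion returns the weight of the single best derivation rather than the sum over all of them.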

Constituent Parsing

grammar                          F1-score (|w| ≤ 40 / full)
CFG                              62.7
TSG [Post, Gildea, 2009]         82.6
TSG [Cohn et al., 2010]          85.4 / 84.7
CFGsub [Collins, 1999]           88.6 / 88.2
CFGsub [Petrov, Klein, 2007]     90.6 / 90.1
CFGsub [Petrov, 2010]            91.8
TSGsub [Shindo et al., 2012]     92.9 / 92.4

Local Tree Languages

CFG (in tree-generation mode) = Local tree grammar

Definition (Local tree grammar [Gécseg, Steinby 1984])
Local tree grammar $(\Sigma, I, P)$ is a finite set $P$ of weighted branchings $\sigma \xrightarrow{s} w$ (with $\sigma \in \Sigma$, $s \in S$, and $w \in \Sigma^k$) together with a set $I \subseteq \Sigma$ of root labels

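Local Tree Languages

Example (with root label S)

$\text{S} \xrightarrow{0.3} \text{NP1}\ \text{VP2}$
$\text{VP2} \xrightarrow{0.6} \text{MD}\ \text{VP3}$
$\text{NP2} \xrightarrow{0.4} \text{NP2}\ \text{PP}$
$\text{VP3} \xrightarrow{0.2} \text{VB}\ \text{PP}\ \text{NP2}$
$\text{MD} \xrightarrow{0.1} \text{must}$
...

S(
  NP1(PRP(We)),
  VP2(
    MD(must),
    VP3(
      VB(bear),
      PP(IN(in), NP1(NN(mind))),
      NP2(
        NP2(DT(the), NN(Community)),
        PP(IN(as), NP2(DT(a), NN(whole)))))))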
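
A local tree grammar only ever looks at one node together with its children. The short Python sketch below reads those branchings off the example tree and confirms that the five rules listed above are among them. Weights are left out because the slide elides most of them; the helper name `branchings` and the tuple encoding of trees are assumptions of this sketch.

```python
def branchings(tree):
    """Collect (label, child labels) for every inner node of the tree."""
    label, children = tree[0], tree[1:]
    if not children:
        return set()
    found = {(label, tuple(child[0] for child in children))}
    for child in children:
        found |= branchings(child)
    return found


# The example tree, assembled from named parts to keep the nesting readable.
NP_COMMUNITY = ("NP2", ("DT", ("the",)), ("NN", ("Community",)))
PP_WHOLE = ("PP", ("IN", ("as",)), ("NP2", ("DT", ("a",)), ("NN", ("whole",))))
PP_MIND = ("PP", ("IN", ("in",)), ("NP1", ("NN", ("mind",))))
VP3 = ("VP3", ("VB", ("bear",)), PP_MIND, ("NP2", NP_COMMUNITY, PP_WHOLE))
EXAMPLE = ("S", ("NP1", ("PRP", ("We",))), ("VP2", ("MD", ("must",)), VP3))

# Left-hand and right-hand sides of the rules that are listed explicitly.
LISTED = {
    ("S", ("NP1", "VP2")),
    ("VP2", ("MD", "VP3")),
    ("NP2", ("NP2", "PP")),
    ("VP3", ("VB", "PP", "NP2")),
    ("MD", ("must",)),
}

covered = LISTED <= branchings(EXAMPLE)
```

The weight of the whole tree would be the product of the weights of all its branchings, which needs the rules hidden behind the ellipsis as well.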

Local Tree Languages

not closed under union

these singletons are local

  S(NP2(PRP$(My), NN(dog)), VP1(VBZ(sleeps)))
  S(NP2(DT(The), NN(candidates)), VP2(VBD(scored), ADVP(RB(well))))

but their union cannot be local, as any local tree grammar generating both trees also generates these trees (overgeneralization)

  S(NP2(DT(The), NN(candidates)), VP1(VBZ(sleeps)))
  S(NP2(PRP$(My), NN(dog)), VP2(VBD(scored), ADVP(RB(well))))
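
The overgeneralization can be reproduced in a few lines of Python. The sketch collects the branchings of the two original trees and then enumerates every tree that this unweighted local grammar generates from the root label S. The helper names are hypothetical, and `generate` only terminates because this particular grammar has no recursion.

```python
from itertools import product


def branchings(tree):
    """Collect (label, child labels) for every inner node of the tree."""
    label, children = tree[0], tree[1:]
    if not children:
        return set()
    found = {(label, tuple(child[0] for child in children))}
    for child in children:
        found |= branchings(child)
    return found


def generate(label, rules):
    """All trees rooted at `label`; labels without rules are treated as words."""
    options = sorted(kids for lhs, kids in rules if lhs == label)
    if not options:
        return [(label,)]
    trees = []
    for kids in options:
        for subtrees in product(*(generate(kid, rules) for kid in kids)):
            trees.append((label, *subtrees))
    return trees


MY_DOG = ("NP2", ("PRP$", ("My",)), ("NN", ("dog",)))
THE_CANDIDATES = ("NP2", ("DT", ("The",)), ("NN", ("candidates",)))
SLEEPS = ("VP1", ("VBZ", ("sleeps",)))
SCORED_WELL = ("VP2", ("VBD", ("scored",)), ("ADVP", ("RB", ("well",))))

T1 = ("S", MY_DOG, SLEEPS)
T2 = ("S", THE_CANDIDATES, SCORED_WELL)

rules = branchings(T1) | branchings(T2)
generated = generate("S", rules)
```

Besides the two original trees, `generated` contains the two mixed trees shown above and further combinations such as "My candidates", since the NN branchings are shared as well.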

Local Tree Languages

Question

Given a regular tree language L, determine whether L is local

Answer

decidable

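Tree Substitution Languages

Definition (Tree substitution grammar [Joshi, Schabes 1997])
Tree substitution grammar $(\Sigma, I, P)$ is a finite set $P$ of weighted fragments $\mathrm{root}(t) \xrightarrow{s} t$ (with $s \in S$ and $t \in T_\Sigma$) together with a set of root labels $I \subseteq \Sigma$

Typical fragments [Post 2011]

$\text{VP} \xrightarrow{0.2} \text{VP}(\text{VBD}, \text{NP}(\text{CD}), \text{PP})$
$\text{S} \xrightarrow{0.4} \text{S}(\text{NP}(\text{PRP}), \text{VP})$
$\text{S} \xrightarrow{0.6} \text{S}(\text{NP}, \text{VP}(\text{TO}, \text{VP}))$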

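Tree Substitution Languages

(Leftmost) Derivation step $\xi \overset{s}{\Rightarrow}_G \zeta$

$\xi = c[\mathrm{root}(t)]$ and $\zeta = c[t]$ for some context $c$ and $\mathrm{root}(t) \xrightarrow{s} t \in P$

for each fragment $t \in P$ with root label $\sigma$
$$\sigma \overset{s}{\Longrightarrow}_G t$$

recognized weighted tree language $L$
$$L(t) = \begin{cases} \displaystyle\sum_{\substack{D \colon \text{derivation} \\ \text{from } \mathrm{root}(t) \text{ to } t}} \text{weight}(D) & \text{if } \mathrm{root}(t) \in I \\ 0 & \text{otherwise} \end{cases}$$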

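Tree Substitution Languages

Fragments

$\text{S} \xrightarrow{0.3} \text{S}(\text{NP1}(\text{PRP}), \text{VP2})$
$\text{PRP} \xrightarrow{0.5} \text{PRP}(\text{We})$
$\text{VP2} \xrightarrow{0.1} \text{VP2}(\text{MD}, \text{VP3}(\text{VB}, \text{PP}, \text{NP2}))$
$\text{MD} \xrightarrow{0.2} \text{MD}(\text{must})$

Derivation

S(
  NP1(PRP(We)),
  VP2(
    MD(must),
    VP3(VB, PP, NP2)))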
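
The derivation can be replayed step by step. In the Python sketch below each step replaces the leftmost leaf that carries the root label of the chosen fragment, and the weights of the steps are multiplied. This is a simplification of the leftmost derivation step, and the function names are made up for this illustration. The third fragment is keyed by its root VP2, as the definition $\mathrm{root}(t) \xrightarrow{s} t$ requires.

```python
from fractions import Fraction


def substitute(tree, fragment):
    """Replace the leftmost leaf labeled root(fragment); return (tree, done)."""
    label, children = tree[0], tree[1:]
    if not children:
        if label == fragment[0]:
            return fragment, True
        return tree, False
    new_children, done = [], False
    for child in children:
        if not done:
            child, done = substitute(child, fragment)
        new_children.append(child)
    return (label, *new_children), done


def derive(start, steps):
    """Apply the (weight, fragment) steps in order, starting from one leaf."""
    tree, weight = (start,), Fraction(1)
    for step_weight, fragment in steps:
        tree, done = substitute(tree, fragment)
        if not done:
            raise ValueError(f"no leaf labeled {fragment[0]!r} to substitute")
        weight *= step_weight
    return tree, weight


# The four fragments with the weights 0.3, 0.5, 0.1 and 0.2, in derivation order.
STEPS = [
    (Fraction(3, 10), ("S", ("NP1", ("PRP",)), ("VP2",))),
    (Fraction(1, 2), ("PRP", ("We",))),
    (Fraction(1, 10), ("VP2", ("MD",), ("VP3", ("VB",), ("PP",), ("NP2",)))),
    (Fraction(1, 5), ("MD", ("must",))),
]

tree, weight = derive("S", STEPS)
```

The resulting weight is the product 0.3 · 0.5 · 0.1 · 0.2 = 0.003, kept exact here by using fractions instead of floats.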

Tree Substitution Languages

not closed under union

these languages are tree substitution languages individually

  S(C(C(C(a))), a)        S(C(C(C(b))), b)

  $L_1 = \{ S(C^n(a), a) \mid n \in \mathbb{N} \}$        $L_2 = \{ S(C^n(b), b) \mid n \in \mathbb{N} \}$

but their union is not (exchange subtrees below the indicated cuts)

Tree Substitution Languages

not closed under intersection

these languages $L_1$ and $L_2$ are tree substitution languages individually
for $n \geq 1$ and arbitrary $x_1, \dots, x_n \in \{a, b\}$

$$S'(x_1, S(x_1, S(x_2, S(x_2, S(x_3, \dotsc S(x_{n-1}, S(x_n, S(x_n, c))) \dotsc))))) \in L_1$$

$$S'(x_1, S(x_2, S(x_2, S(x_3, S(x_3, \dotsc S(x_{n-1}, S(x_{n-1}, S(x_n, c))) \dotsc))))) \in L_2$$

but their intersection only contains trees with $x_1 = x_2 = \dots = x_n$ and is not a tree substitution language

Tree Substitution Languages

not closed under complement

this language L is a tree substitution language

  S(A(A(A′(A′(a))))) ∈ L        S(B(B(B′(B′(b))))) ∈ L

  S(A(A(A′(A′(b))))) ∉ L        S(B(B(A′(A′(a))))) ∉ L

but its complement is not (exchange as indicated in red)

Tree Substitution Languages

Question

Given a regular tree language L, is it a tree substitution language?

Answer

???

Subcategorization

Tags: official tags often conservative
• English: ≈ 50 tags
• German: more than 200 tags, e.g. ADJA-Sup-Dat-Sg-Fem

all modern parsers use refined tags → subcategorization

  S-1(NP-4(PRP$-3(My), NN-2(dog)), VP-5(VBZ-7(sleeps)))

but return parse over official tags → relabeling

  S(NP(PRP$(My), NN(dog)), VP(VBZ(sleeps)))

Comparison: rule of subcategorized CFG vs. corresponding rule of tree automaton

  S-1 → ADJP-2 S-1        S-1 → S(ADJP-2, S-1)
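
The relabeling step is a simple projection from refined tags back to official tags. A minimal Python sketch, assuming that subcategories are written as a numeric suffix `-n` on the tag and that words (leaves) must stay untouched:

```python
import re


def relabel(tree):
    """Strip the numeric subcategory suffix from every inner node label."""
    label, children = tree[0], tree[1:]
    if not children:  # a word: keep it as it is
        return tree
    official = re.sub(r"-\d+$", "", label)
    return (official, *(relabel(child) for child in children))


REFINED = ("S-1",
           ("NP-4", ("PRP$-3", ("My",)), ("NN-2", ("dog",))),
           ("VP-5", ("VBZ-7", ("sleeps",))))

OFFICIAL = relabel(REFINED)
```

Only a trailing `-n` with digits is removed, so function tags such as `NP-SBJ` survive the projection.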

Constituent Parsing

F1-score by grammar (columns: |w| ≤ 40, full)

CFG                             62.7
TSG [Post, Gildea, 2009]        82.6
TSG [Cohn et al., 2010]         85.4   84.7
CFGsub [Collins, 1999]          88.6   88.2
CFGsub [Petrov, Klein, 2007]    90.6   90.1
CFGsub [Petrov, 2010]           91.8
TSGsub [Shindo et al., 2012]    92.9   92.4

[Figure: inclusion diagram of the classes TA, TSGsub, TSG, CFGsub, CFG]

Hence:

- subcategorization = finite-state
- all modern models equivalent to tree automata in expressive power


Parsing

Parsing: determining the syntactic structure of a given sentence, subject to a given theory of syntax (encoded in the training data)

- constituent syntax
- dependency syntax
- . . .

> John saw a dog yesterday which was a Yorkshire Terrier

Practical results:

- linear-time statistical parsers
- Google’s “Parsey McParseface” [Andor et al., 2016]
- 94% F1-score; linguists achieve 96–97%


Combinatory Categorial Grammars

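Combinators (Compositions)

Composition rules of degree $k$ are

$$ax/c,\; cy \;\to\; axy \qquad \text{(forward rule)}$$

$$cy,\; ax\backslash c \;\to\; axy \qquad \text{(backward rule)}$$

with $y = |_1 c_1\, |_2 c_2 \cdots |_k c_k$, where each $|_i$ is one of the slashes $/$ or $\backslash$.

Examples

$$\frac{C \qquad D/E/D\backslash C}{D/E/D}\ \text{(degree 0)} \qquad\qquad \frac{D/E/D \qquad D/E\backslash C}{D/E/E\backslash C}\ \text{(degree 2)}$$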


Combinatory Categorial Grammars

Combinatory Categorial Grammar (CCG) $(\Sigma, A, k, I, L)$

- terminal alphabet $\Sigma$ and atomic categories $A$
- maximal degree $k \in \mathbb{N} \cup \{\infty\}$ of composition rules
- initial categories $I \subseteq A$
- lexicon $L \subseteq \Sigma \times C(A)$ with $C(A)$ the categories over $A$

Notes:

- all rules up to the given degree $k$ are always allowed
- k-CCG = CCG using all composition rules up to degree $k$


Combinatory Categorial Grammars

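Derivation of the string c c d d e e with lexical categories C, C, D/E/D\C, D/E\C, E, E (each step combines two adjacent categories):

1. C, D/E/D\C → D/E/D (second c and first d)
2. D/E/D, D/E\C → D/E/E\C (degree 2)
3. C, D/E/E\C → D/E/E (first c)
4. D/E/E, E → D/E
5. D/E, E → D

2-CCG generates a string language $L$ with $L \cap c^+ d^+ e^+ = \{c^i d^i e^i \mid i \geq 1\}$ for initial categories $\{D\}$ and lexicon

$L(c) = \{C\}$, $L(d) = \{D/E\backslash C,\; D/E/D\backslash C\}$, $L(e) = \{E\}$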
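The composition rules and the example lexicon can be tried out with a small Python sketch. The encoding is an assumption made for illustration only: a category is either an atom or a triple `(result, slash, argument)`, the helper names `parse`, `show`, `compose`, `forward` and `backward` are hypothetical, and the parser handles only left-associative categories with atomic arguments.

```python
import re

# A category is an atom (str) or a triple (result, slash, argument),
# so D/E/D\C is ((("D", "/", "E"), "/", "D"), "\\", "C").

def parse(text):
    # Left-associative categories with atomic arguments only (sketch assumption).
    tokens = re.findall(r"[/\\]|[A-Za-z]+", text)
    cat = tokens[0]
    for slash, arg in zip(tokens[1::2], tokens[2::2]):
        cat = (cat, slash, arg)
    return cat

def show(cat):
    if isinstance(cat, str):
        return cat
    result, slash, arg = cat
    return show(result) + slash + show(arg)

def compose(primary, secondary, direction, k):
    # primary = ax|c with slash | equal to direction; secondary = c y with |y| <= k.
    # Returns axy, or None if the rule of degree <= k does not apply.
    if isinstance(primary, str) or primary[1] != direction:
        return None
    ax, _, c = primary
    peeled = []                      # arguments of y, outermost first
    rest = secondary
    while rest != c:
        if isinstance(rest, str) or len(peeled) == k:
            return None
        rest, slash, arg = rest
        peeled.append((slash, arg))
    result = ax
    for slash, arg in reversed(peeled):
        result = (result, slash, arg)
    return result

def forward(left, right, k):         # ax/c, cy -> axy
    return compose(left, right, "/", k)

def backward(left, right, k):        # cy, ax\c -> axy
    return compose(right, left, "\\", k)

C, E = parse("C"), parse("E")
d1, d2 = parse("D/E/D\\C"), parse("D/E\\C")
step1 = backward(C, d1, 2)           # second c with first d
step2 = forward(step1, d2, 2)        # composition of degree 2
step3 = backward(C, step2, 2)        # first c
step4 = forward(step3, E, 2)
step5 = forward(step4, E, 2)
print([show(s) for s in (step1, step2, step3, step4, step5)])
```

The five steps reproduce the derivation of c c d d e e and end in the initial category D. Restricting the maximal degree to 1 makes the second step fail, which is the point of the degree bound.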


Combinatory Categorial Grammars

allow (deterministic) relabeling (to allow arbitrary labels)

Question

Which tree languages are legal proof trees of CCG?


Machine Translation

Review translation [by Translate]

1 The room it is not narrowly was a simple, bathtub was also attached.
2 Wi-fi, TV and I was available.
3 Church looked When morning awake open the curtain.
4 But was a little cold, morning walks was good.

Original [Japanese]

1 部屋もシンプルでしたが狭くなく、バスタブもついていました。
2 Wi-fi、テレビも利用出来ました。
3 朝起きてカーテンを開けると教会が見えました。
4 ちょっと寒かったけれど、朝の散策はグッドでしたよ。


Machine Translation

Review translation [by Translate after 2016]

1 The room was simple, but it was not small, and the bathtub was also attached.
2 Wi-fi, TV was also available.
3 When I woke up in the morning and opened the curtain, I saw the church.
4 It was a bit cold, but walking in the morning was good.


Machine Translation

Short History:

today
  Neural networks
2016
  Reformation: phrase-based and syntax-based systems; statistical approach (cheap, automatically trained)
1991
  Dark age: rule-based systems; Chomskyan approach (perfect translation, poor coverage)
1960


Machine Translation

Vauquois triangle:

[Figure: Vauquois triangle with the levels phrase, syntax, and semantics between foreign and English]

Translation model: string-to-string

Translation model: string-to-tree

Translation model: tree-to-tree

Machine Translation

parallel corpus, word alignments, parse tree

I would like your advice about Rule 143 concerning inadmissibility

Könnten Sie mir eine Auskunft zu Artikel 143 im Zusammenhang mit der Unzulässigkeit geben

[Figure: word alignments between the two sentences and the German parse tree with preterminals KOUS PPER PPER ART NN APPR NN CD AART NN APPR ART NN VV and constituents PP, PP, PP, NP, S]

Word alignments via GIZA++ [Och, Ney: A systematic comparison of various statistical alignment models. Computational Linguistics 29(1), 2003]

Parse tree via Berkeley parser [Petrov, Barrett, Thibaux, Klein: Learning accurate, compact, and interpretable tree annotation. Proc. ACL, 2006]

Weighted Synchronous Grammars

Synchronous tree substitution grammar: productions N → (r, r1)

- nonterminal N
- right-hand side r of context-free grammar production
- right-hand side r1 of tree substitution grammar production
- (bijective) synchronization of nonterminals

Example:

S → (PPER KOUS PPER advice PP, S(KOUS, PPER, PPER, NP(ART(eine), NN(Auskunft), PP), VV(geben)))

variant of [M., Graehl, Hopkins, Knight: The power of extended top-down tree transducers. SIAM Journal on Computing 39(2), 2009]


Synchronous Grammars

S → (PPER KOUS PPER advice PP, S(KOUS, PPER, PPER, NP(ART(eine), NN(Auskunft), PP), VV(geben)))

Production application:

1. Selection of synchronous nonterminals
2. Selection of suitable production
3. Replacement on both sides


Selected production for the linked nonterminal KOUS:

KOUS → (would like, KOUS(Könnten))

Result of the replacement on both sides:

S → (PPER would like PPER advice PP, S(KOUS(Könnten), PPER, PPER, NP(ART(eine), NN(Auskunft), PP), VV(geben)))
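The three steps of production application (select a linked nonterminal pair, pick a production, replace on both sides) can be sketched as follows. The data layout is an assumption for illustration: the string side is a list of words and `(label, link)` pairs, the tree side uses `(label, children)` nodes with an integer link in place of the child list for linked nonterminal leaves, and the link numbers in the example are chosen arbitrarily. The sketch also assumes that links in the production do not clash with links already present in the form.

```python
# Sentential form: (string side, tree side).
# String side: list of words (str) and linked nonterminals (label, link).
# Tree side: (label, children); a linked nonterminal leaf is (label, link)
# with an integer link in place of the child list.

def word(w):
    return (w, [])

def replace_in_tree(tree, link, replacement):
    label, rest = tree
    if isinstance(rest, int):                 # linked nonterminal leaf
        return replacement if rest == link else tree
    return (label, [replace_in_tree(child, link, replacement) for child in rest])

def apply_production(form, link, production):
    """Replace the nonterminal pair carrying `link` on both sides."""
    lhs, rhs_string, rhs_tree = production
    string_side, tree_side = form
    if (lhs, link) not in string_side:        # steps 1 and 2: pair and production must match
        raise ValueError("no linked nonterminal %s with link %d" % (lhs, link))
    new_string = []
    for token in string_side:                 # step 3: replacement on the string side
        new_string.extend(rhs_string if token == (lhs, link) else [token])
    return new_string, replace_in_tree(tree_side, link, rhs_tree)

# Illustrative link numbers (hypothetical): 1 and 3 for the two PPER pairs, 2 for KOUS, 4 for PP.
form = (
    [("PPER", 1), ("KOUS", 2), ("PPER", 3), "advice", ("PP", 4)],
    ("S", [("KOUS", 2), ("PPER", 3), ("PPER", 1),
           ("NP", [("ART", [word("eine")]), ("NN", [word("Auskunft")]), ("PP", 4)]),
           ("VV", [word("geben")])]),
)
kous = ("KOUS", ["would", "like"], ("KOUS", [word("Könnten")]))
result = apply_production(form, 2, kous)
print(result[0])
```

Because every link occurs exactly once on each side, one application touches exactly one position in the string and one leaf in the tree.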


Second application, for the linked nonterminal PP:

PP → (APPR NN CD PP, PP(APPR, NN, CD, PP))

Result of the replacement on both sides:

S → (PPER would like PPER advice APPR NN CD PP, S(KOUS(Könnten), PPER, PPER, NP(ART(eine), NN(Auskunft), PP(APPR, NN, CD, PP)), VV(geben)))

Production Extraction

(extractable productions marked in red)

I would like your advice about Rule 143 concerning inadmissibility

Könnten Sie mir eine Auskunft zu Artikel 143 im Zusammenhang mit der Unzulässigkeit geben

[Figure: word alignments between the two sentences and the German parse tree with preterminals KOUS PPER PPER ART NN APPR NN CD AART NN APPR ART NN VV and constituents PP, PP, PP, NP, S]

following [Galley, Hopkins, Knight, Marcu: What’s in a translation rule? Proc. NAACL, 2004]


Production Extraction

Removal of extractable production:

[Figure: the aligned sentence pair and German parse tree shown above, before the removal]

After the removal:

PPER would like your advice about Rule 143

Könnten Sie eine Auskunft zu Artikel 143 geben

[Figure: remaining word alignments and parse tree with preterminals KOUS PPER PPER ART NN APPR NN CD VV and constituents PP, PP, NP, S; the removed parts are replaced by linked nonterminals]


Production Extraction

Repeated production extraction: (extractable productions marked in red)

PPER would like your advice about Rule 143

Könnten Sie eine Auskunft zu Artikel 143 geben

[Figure: word alignments and parse tree with preterminals KOUS PPER PPER ART NN APPR NN CD VV and constituents PP, PP, NP, S]


Synchronous Tree Substitution Grammars

Advantages:

- very simple
- implemented in framework ‘Moses’ [Koehn et al.: Moses: Open source toolkit for statistical machine translation. Proc. ACL, 2007]
- “context-free”

Disadvantages:

- problems with discontinuities
- composition and binarization not possible [M., Graehl, Hopkins, Knight: The power of extended top-down tree transducers. SIAM Journal on Computing 39(2), 2009] [Zhang, Huang, Gildea, Knight: Synchronous Binarization for Machine Translation. Proc. NAACL, 2006]
- “context-free”


Evaluation

English → German translation task: (higher BLEU is better)

Type              System   BLEU: vanilla   WMT 2013   WMT 2015
string-to-string  FST      16.8            20.3       25.2
string-to-tree    STSG     15.2            19.4       24.5
tree-to-tree      STSG     14.5            -          15.3

STSG = synchronous tree substitution grammar

Observations:

- syntax-based systems competitive with manual adjustments
- much less so for vanilla systems
- very unfortunate situation (more supervision yields lower scores)

from [Seemann, Braune, M.: A systematic evaluation of MBOT in statistical machine translation. Proc. MT-Summit, 2015]
and [Bojar et al.: Findings of the 2013 workshop on statistical machine translation. Proc. WMT, 2013]
and [Bojar et al.: Findings of the 2015 workshop on statistical machine translation. Proc. WMT, 2015]


Production Extraction

PPER would like your advice about Rule 143

Könnten Sie eine Auskunft zu Artikel 143 geben

[Figure: word alignments and parse tree with preterminals KOUS PPER PPER ART NN APPR NN CD VV and constituents PP, PP, NP, S]

- very specific production
- every production for ‘advice’ contains sentence structure (syntax “in the way”)


Synchronous Grammars

Synchronous multi tree substitution grammar: productions N → (r, ⟨r1, . . . , rn⟩)
variant of [M.: Why synchronous tree substitution grammars? Proc. NAACL, 2010]

- nonterminal N
- right-hand side r of context-free grammar production
- right-hand sides r1, . . . , rn of regular tree grammar production
- synchronization via a map from the nonterminals of r1, . . . , rn to the nonterminals of r

Example:

ART-NN-VV → (advice, ⟨ART(eine), NN(Auskunft), VV(geben)⟩)


Example:

NP-VV → (ART-NN-VV about Rule 143 PP, ⟨NP(ART, NN, PP(APPR(zu), NN(Artikel), CD(143), PP)), VV⟩)


Production application:

1. synchronous nonterminals
2. suitable production
3. replacement

NP-VV → (ART-NN-VV about Rule 143 PP, ⟨NP(ART, NN, PP(APPR(zu), NN(Artikel), CD(143), PP)), VV⟩)


Selected production for the linked nonterminal ART-NN-VV:

ART-NN-VV → (advice, ⟨ART(eine), NN(Auskunft), VV(geben)⟩)

Result of the replacement:

NP-VV → (advice about Rule 143 PP, ⟨NP(ART(eine), NN(Auskunft), PP(APPR(zu), NN(Artikel), CD(143), PP)), VV(geben)⟩)
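For the multi-tree case, one string-side nonterminal is linked to several leaves on the tree side, and a production supplies one tree per linked leaf. A minimal Python sketch, under assumptions made here for illustration: a linked leaf is encoded as `(label, (link, i))`, where `i` selects the i-th tree of the production, ordinary nodes are `(label, children)` with a list of children, and the link numbers are arbitrary.

```python
# String side: list of words (str) and linked nonterminals (label, link).
# Tree side: a tuple of trees; nodes are (label, children) with a child list,
# linked leaves are (label, (link, i)) and receive the i-th tree of a production.

def word(w):
    return (w, [])

def replace_leaves(tree, link, trees):
    label, rest = tree
    if isinstance(rest, tuple):                       # linked leaf: rest == (link, i)
        return trees[rest[1]] if rest[0] == link else tree
    return (label, [replace_leaves(child, link, trees) for child in rest])

def apply_multi(form, link, production):
    lhs, rhs_string, rhs_trees = production
    string_side, tree_sides = form
    if (lhs, link) not in string_side:
        raise ValueError("no linked nonterminal %s with link %d" % (lhs, link))
    new_string = []
    for token in string_side:
        new_string.extend(rhs_string if token == (lhs, link) else [token])
    new_trees = tuple(replace_leaves(t, link, rhs_trees) for t in tree_sides)
    return new_string, new_trees

# Right-hand side of the NP-VV production; links 1 and 2 are hypothetical.
form = (
    [("ART-NN-VV", 1), "about", "Rule", "143", ("PP", 2)],
    (("NP", [("ART", (1, 0)), ("NN", (1, 1)),
             ("PP", [("APPR", [word("zu")]), ("NN", [word("Artikel")]),
                     ("CD", [word("143")]), ("PP", (2, 0))])]),
     ("VV", (1, 2))),
)
art_nn_vv = ("ART-NN-VV", ["advice"],
             (("ART", [word("eine")]), ("NN", [word("Auskunft")]), ("VV", [word("geben")])))
new_string, new_trees = apply_multi(form, 1, art_nn_vv)
print(new_string)
```

The single English word "advice" is thereby realized by three German subtrees in two different trees of the sequence, which is the kind of discontinuity a plain synchronous tree substitution grammar cannot express in one small production.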


Production Extraction

(extractable productions marked in red)

PPER would like your advice about Rule 143

Könnten Sie eine Auskunft zu Artikel 143 geben

[Figure: word alignments and parse tree with preterminals KOUS PPER PPER ART NN APPR NN CD VV and constituents PP, PP, NP, S]

variant of [M.: How to train your multi bottom-up tree transducer. Proc. ACL, 2011]


Synchronous Multi Tree Substitution Grammars

Advantages:

complicated discontinuities

implemented in framework ‘Moses’ [Braune, Seemann, Quernheim, M.: Shallow local multi bottom-up tree transducers in SMT. Proc. ACL, 2013]

binarizable, composable

Disadvantages:

output non-regular (tree-level) or non-context-free (string-level) (in fact output is captured by MRTG = MCFTG without variables)

not symmetric (input context-free; output not)

Evaluation

Task                 BLEU              Productions
                     STSG    SMTSG     STSG    SMTSG
English → German     15.0    *15.5     14M     144M
English → Arabic     48.2    *49.1     55M     491M
English → Chinese    17.7    *18.4     17M     162M
English → Polish     21.3    *23.4     —       —
English → Russian    24.7    *26.1     —       —

STSG = synchronous tree substitution grammar
SMTSG = synchronous multi tree substitution grammar

Observations:

consistent improvements

about one order of magnitude more productions

SMTSGs alleviate some of the problems of syntax-based systems

from [Seemann, Braune, M.: A systematic evaluation of MBOT in statistical machine translation. Proc. MT-Summit, 2015] and [Seemann, M.: Discontinuous statistical machine translation with target-side dependency syntax. Proc. WMT, 2015]
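The two observations can be checked directly against the table. The following sketch only redoes the arithmetic on the reported numbers; the dictionary layout and function names are assumptions made for illustration.

```python
# (BLEU STSG, BLEU SMTSG, productions STSG in millions, productions SMTSG in millions)
# None marks the production counts that the table does not report.
results = {
    "English-German": (15.0, 15.5, 14, 144),
    "English-Arabic": (48.2, 49.1, 55, 491),
    "English-Chinese": (17.7, 18.4, 17, 162),
    "English-Polish": (21.3, 23.4, None, None),
    "English-Russian": (24.7, 26.1, None, None),
}


def bleu_gain(row):
    """BLEU difference SMTSG minus STSG, rounded to one decimal."""
    return round(row[1] - row[0], 1)


def production_ratio(row):
    """How many times more productions the SMTSG has (None if unreported)."""
    if row[2] is None or row[3] is None:
        return None
    return row[3] / row[2]


for task, row in results.items():
    ratio = production_ratio(row)
    ratio_text = "n/a" if ratio is None else f"{ratio:.1f}x"
    print(f"{task}: BLEU +{bleu_gain(row)}, productions {ratio_text}")
```

Every task shows a positive BLEU gain, and the three reported production ratios all lie between roughly 9 and 10.3, which matches the order-of-magnitude observation.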

Synchronous Grammars

Evaluation properties:

rotations implementable? (for arbitrary t1, t2, t3)

    σ(σ(t1, t2), t3) ↦ σ(t1, σ(t2, t3))

symmetric?

domain regular?

range regular?

closed under composition?

following [Knight: Capturing practical natural language transformations. Machine Translation 21(2), 2007] and [May, Knight, Vogler: Efficient inference through cascades of weighted tree transducers. Proc. ACL, 2010]
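The rotation property asks whether a model can realize the tree transformation σ(σ(t1, t2), t3) ↦ σ(t1, σ(t2, t3)). As a plain function on trees (not as a transducer), the transformation itself can be sketched as follows; the tuple encoding `(label, left, right)` and the function name are assumptions made for this sketch.

```python
def rotate(tree):
    """Map sigma(sigma(t1, t2), t3) to sigma(t1, sigma(t2, t3)).

    Trees are tuples (label, left, right) or leaf strings.
    Returns None if the input does not have the required shape.
    """
    if not (isinstance(tree, tuple) and len(tree) == 3):
        return None
    sigma, left, t3 = tree
    if not (isinstance(left, tuple) and len(left) == 3 and left[0] == sigma):
        return None
    _, t1, t2 = left
    return (sigma, t1, (sigma, t2, t3))


# t1, t2, t3 may be arbitrary subtrees; here t2 is itself a binary tree.
before = ("σ", ("σ", "a", ("γ", "b", "c")), "d")
print(rotate(before))
```

The difficulty for a transducer is that t1, t2 and t3 are arbitrary, so the rule has to look two levels deep on the input side, which is what extended left-hand sides (as in STSG) provide.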

Icons by interactivemania (http://www.interactivemania.com/) and UN Office for the Coordination of Humanitarian Affairs

Synchronous Grammars

Illustration of rotation:

S(NP(NNP(Alice)), VP(VBD(carries), NP(NNP(Bob))))

↦

S(NP(NNP(Bob)), VP(VBZ(is), VP(VBN(carried), PP(IN(by), NP(NNP(Alice))))))

Top-down Tree Transducer

Hasse diagram (nodes only):

TOP^R_1
TOP_2    s-TOP^R_1
s-TOP_2    n-TOP^(R)_1
ns-TOP^(R)_1

(composition closure in subscript)

Model      rotations  symmetric  domain regular  range regular  closed under composition
ns-TOP     ✗          ✗          ✓               ✓              ✓
n-TOP      ✗          ✗          ✓               ✓              ✓
s-TOP      ✗          ✗          ✓               ✓              ✗_2
s-TOP^R    ✗          ✗          ✓               ✓              ✓
TOP        ✗          ✗          ✓               ✓              ✗_2
TOP^R      ✗          ✗          ✓               ✓              ✓
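The last column and the symmetry column refer to operations on the relations that the transducers compute. As a minimal sketch, assuming a transformation is given as a finite set of (input, output) pairs (real transducers usually denote infinite relations), composition and inversion look as follows; `compose` and `inverse` are names chosen for this illustration.

```python
def compose(first, second):
    """Relational composition: apply `first`, then `second`."""
    return {(s, u) for (s, t) in first for (t2, u) in second if t == t2}


def inverse(relation):
    """Swap input and output; a class is symmetric if it is closed under this."""
    return {(t, s) for (s, t) in relation}


# Toy transformations on trees encoded as nested tuples.
tau1 = {(("f", "a"), ("g", "a")), (("f", "b"), ("g", "b"))}
tau2 = {(("g", "a"), ("h", "a", "a"))}

print(compose(tau1, tau2))
print(inverse(tau2))
```

A class is closed under composition if `compose` of any two of its transformations is again in the class; a subscript n in the table indicates that the composition closure is reached after n-fold composition.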

Synchronous Tree Substitution Grammars

Hasse diagram (nodes only):

STSG^R_3
STSG_4
n-STSG^(R)_∞    s-STSG^R_2
s-STSG_2    TOP^R_1
ns-STSG^(R)_2
TOP_2
n-TOP^(R)_1    s-TOP^R_1
s-TOP_2
ns-TOP^(R)_1

(composition closure in subscript)

Model        rotations  symmetric  domain regular  range regular  closed under composition
n-TOP        ✗          ✗          ✓               ✓              ✓
TOP          ✗          ✗          ✓               ✓              ✗_2
TOP^R        ✗          ✗          ✓               ✓              ✓
ns-STSG      ✓          ✓          ✓               ✓              ✗_2
n-STSG       ✓          ✗          ✓               ✓              ✗_∞
s-STSG^(R)   ✓          ✗          ✓               ✓              ✗_2
STSG         ✓          ✗          ✓               ✓              ✗_4
STSG^R       ✓          ✗          ✓               ✓              ✗_3

composition closures by [Engelfriet, Fülöp, M.: Composition closure of linear extended top-down tree transducers. Theory of Computing Systems, to appear 2016]

Synchronous Multi Tree Substitution Grammars

Advantages of SMTSG:

always have regular look-ahead

can always be made nondeleting & shallow

closed under composition

Disadvantages of SMTSG:

non-regular range (theoretically interesting?)

[Engelfriet, Lilin, M.: Extended multi bottom-up tree transducers — composition and decomposition. Acta Informatica 46(8), 2009]

Synchronous Multi Tree Substitution Grammars

Hasse diagram (nodes only):

(n)-SMTSG^(R)_1
(n)s-SMTSG^(R)_1
STSG^R_3
STSG_4
n-STSG^(R)_∞    s-STSG^R_2
s-STSG_2    TOP^R_1
ns-STSG^(R)_2
TOP_2
n-TOP^(R)_1    s-TOP^R_1
s-TOP_2
ns-TOP^(R)_1

(composition closure in subscript)

Model          rotations  symmetric  domain regular  range regular  closed under composition
n-TOP          ✗          ✗          ✓               ✓              ✓
TOP            ✗          ✗          ✓               ✓              ✗_2
TOP^R          ✗          ✗          ✓               ✓              ✓
ns-STSG        ✓          ✓          ✓               ✓              ✗_2
n-STSG         ✓          ✗          ✓               ✓              ✗_∞
s-STSG^(R)     ✓          ✗          ✓               ✓              ✗_2
STSG           ✓          ✗          ✓               ✓              ✗_4
STSG^R         ✓          ✗          ✓               ✓              ✗_3
SMTSG          ✓          ✗          ✓               ✗              ✓
  reg. range   ✓          ✗          ✓               ✓              ✓
  symmetric    ✓          ✓          ✓               ✓              ✓

(string-level) range characterization by [Gildea: On the string translations produced by multi bottom-up tree transducers. Computational Linguistics 38(3), 2012]

Synchronous Multi Tree Substitution Grammars

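Theorem: (STSG^R)^3 ⊊ reg.-range SMTSG

[Figure: proof illustration with trees u0, u1, u2, u3 connected by steps (1), (2), (3), where
u0 = δ(t2, δ(t3, ... δ(tn−1, δ(tn, t1)) ...))
u3 = δ(t1, δ(t2, δ(t3, ... δ(tn−1, tn) ...)))
and the intermediate trees contain subtrees v12, ..., v(n−1)2, vn2 and vn3, v(n−1)3, ..., v13]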

[M.: The power of weighted regularity-preserving multi bottom-up tree transducers. Int. J. Found. Comput. Sci. 26(7), 2015]

Summary

Parsing:

tree automata = CFG with subcategorization (which are the state-of-the-art models for many languages)

wealth of open problems for non-constituent parsing (alternative theories seem to be on the rise; “Parsey McParseface”)

Machine translation:

all major (non-neural) translation models in use are grammar-based (and their expressive power is often ill-understood)

combination of parser and translation model challenging (although that is typically just a regular domain restriction)

evaluation of theoretically well-behaved models (in practice)

Thank you for the attention.
