Self-training with Products of Latent Variable Grammars
Zhongqiang Huang, Mary Harper, and Slav Petrov
Overview
Motivation and Prior Related Research
Experimental Setup
Results
Analysis
Conclusions
[Diagram: the model relates a sentence and its parse tree to the grammar parameters through latent-variable derivations]
PCFG-LA Parser [Matsuzaki et al. '05, Petrov et al. '06, Petrov & Klein '07]
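For readers without the figure, the cited work defines the model roughly as follows (a hedged reconstruction, not a formula from the slides): each treebank category carries a latent subscript, and the probability of an observed tree sums over all latent derivations.

```latex
% PCFG-LA (reconstruction from the cited papers): \pi(D) strips the latent
% subscripts from a derivation D, and \theta are the rule parameters.
P(T \mid w) \;=\; \sum_{D:\, \pi(D) = T} P(D \mid w),
\qquad
P(D) \;=\; \prod_{A_x \to B_y C_z \,\in\, D} \theta(A_x \to B_y C_z)
```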
PCFG-LA Parser
[Diagram: hierarchical splitting (& merging) of the original node NP: split to 2 (NP1 NP2), split to 4 (NP1–NP4), split to 8 (NP1–NP8), with increased model complexity at each level]
The n-th grammar is the grammar trained after n split-merge rounds.
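For intuition, a minimal sketch of one split round follows (an illustration under simplified assumptions, not the authors' code; in the real trainer each split round is followed by EM and by a merge step that undoes splits that do not pay off):

```python
import itertools
import random

def split_symbol(subsym):
    """('NP', 1) -> [('NP', 2), ('NP', 3)]: each latent subsymbol spawns two."""
    sym, i = subsym
    return [(sym, 2 * i), (sym, 2 * i + 1)]

def split_grammar(rule_probs):
    """One split round: every subsymbol in every rule is split in two, and the
    rule's conditional probability is divided uniformly over the refined
    combinations, with a little noise so EM can break the symmetry."""
    new_probs = {}
    for (parent, children), p in rule_probs.items():
        child_splits = [split_symbol(c) for c in children]
        for new_parent in split_symbol(parent):
            for new_children in itertools.product(*child_splits):
                share = p / 2 ** len(children)
                new_probs[(new_parent, new_children)] = share * (
                    1 + random.uniform(-0.01, 0.01))
    return new_probs

# Each round doubles every category: NP has 2, 4, 8, ... subsymbols.
g = {(('NP', 0), (('DT', 0), ('NN', 0))): 1.0}
for _ in range(3):
    g = split_grammar(g)
```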
Grammar Order Selection
Select the grammar order (the number of split-merge rounds) using the development set.
[Figure: typical learning curve]
Max-Rule Decoding (Single Grammar)
[Diagram: example parse tree with S, NP, and VP nodes]
[Goodman ’98, Matsuzaki et al. ’05, Petrov & Klein ’07]
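The slide does not spell the objective out; as a hedged reconstruction from the cited work, max-rule-product decoding scores each candidate rule by its posterior probability, summing out the latent subsymbols with inside/outside scores, and returns the tree maximizing the product of these posteriors:

```latex
% Max-rule-product decoding (our reconstruction from the citations above):
% q(r) is the posterior probability that rule r, over its span, is part of
% the true tree, computed from inside (P_in) and outside (P_out) scores.
T^{*} \;=\; \arg\max_{T} \prod_{r \in T} q(r),
\qquad
q(A \to B\,C,\, i,k,j) \;=\;
\frac{\sum_{x,y,z} P_{\mathrm{out}}(A_x, i, j)\,
      \theta(A_x \to B_y C_z)\,
      P_{\mathrm{in}}(B_y, i, k)\, P_{\mathrm{in}}(C_z, k, j)}
     {P(w_1 \cdots w_n)}
```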
Variability [Petrov, '10]
Max-Rule Decoding (Multiple Grammars)
[Petrov, ’10]
[Diagram: multiple grammars trained from the treebank with different random seeds are combined with max-rule decoding]
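Our hedged reading of the product model in Petrov '10: the same max-rule machinery is used, but each rule's score multiplies its posteriors under all n independently trained grammars, so a rule must look good to every grammar:

```latex
% Product of n independently trained grammars: the decoding score of each
% rule is the product of its posteriors under the individual grammars.
q(r) \;=\; \prod_{i=1}^{n} q_i(r),
\qquad
T^{*} \;=\; \arg\max_{T} \prod_{r \in T} \; \prod_{i=1}^{n} q_i(r)
```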
Product Model Results [Petrov, '10]
Motivation for Self-Training
Self-training (ST)
[Diagram: train a grammar on the hand-labeled data, use it to label the unlabeled data automatically, then train a new grammar on the hand-labeled plus automatically labeled data; select the grammar with the dev set]
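A minimal sketch of this loop, assuming hypothetical train/evaluate hooks and grammars that expose a parse method (the actual system trains PCFG-LA grammars with split-merge EM):

```python
def self_train(train, evaluate, hand_labeled, unlabeled, dev, max_round=7):
    """One self-training pass. `train(trees, r)` returns a grammar after r
    split-merge rounds; `evaluate(grammar, dev)` returns its dev F score.
    Both hooks are hypothetical stand-ins for the paper's trainer."""
    # Train candidate grammars on the hand-labeled data; select with dev.
    base = max((train(hand_labeled, r) for r in range(1, max_round + 1)),
               key=lambda g: evaluate(g, dev))
    # Use the selected grammar to label the unlabeled sentences.
    auto = [base.parse(s) for s in unlabeled]
    # Retrain on hand-labeled + automatically labeled data; select again.
    cands = [train(hand_labeled + auto, r) for r in range(1, max_round + 1)]
    return max(cands, key=lambda g: evaluate(g, dev))
```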
Self-Training Curve
WSJ Self-Training Results [Huang & Harper, '09]
[Figure: F score comparison]
Self-Trained Grammar Variability
[Figures: variability of self-trained grammars at round 6 and round 7]
Summary
Two issues: variability & over-fitting
Product model: makes use of variability, but over-fitting remains in the individual grammars
Self-training: alleviates over-fitting, but variability remains in the individual grammars
Next step: combine self-training with product models
Experimental Setup
Two genres:
WSJ: Sections 2-21 for training, 22 for dev, 23 for test; 176.9K sentences per self-trained grammar
Broadcast News: WSJ + 80% of BN for training, 10% for dev, 10% for test (see paper)
Training scenarios: train 10 models with different seeds and combine using max-rule decoding
Regular: treebank training with up to 7 split-merge iterations
Self-Training: three methods with up to 7 split-merge iterations
ST-Reg
[Diagram: the round-6 product model provides a single automatically labeled set; grammars are trained on the hand-labeled data plus this set and selected with the dev set. Open question on the slide: combine multiple grammars into a product?]
ST-Prod
[Diagram: the same single automatically labeled set from the round-6 product model is combined with the hand-labeled data to train multiple grammars, which are combined into a product. Open question on the slide: use more data?]
ST-Prod-Mult
[Diagram: 10 different automatically labeled sets, each produced by a round-6 product model, are combined with the hand-labeled data to train the individual grammars, which are combined into a product]
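To summarize the three diagrams, here is a sketch of the configurations as we read the slides; `train`, `product`, and `select_with_dev` are hypothetical hooks, and the seeds reflect the 10-seed setup above:

```python
def st_reg(train, select_with_dev, hand, auto):
    """ST-Reg: regular self-training; grammars are trained on the hand-labeled
    data plus the single auto-labeled set and selected with the dev set."""
    return select_with_dev([train(hand + auto, seed=s) for s in range(10)])

def st_prod(train, product, hand, auto):
    """ST-Prod: the same single auto-labeled set is shared by all 10
    grammars, which are then combined into a product."""
    return product([train(hand + auto, seed=s) for s in range(10)])

def st_prod_mult(train, product, hand, autos):
    """ST-Prod-Mult: 10 different auto-labeled sets, one per grammar,
    keeping the individual grammars more diverse."""
    return product([train(hand + autos[s], seed=s) for s in range(10)])
```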
A Closer Look at Regular Results
[Figures: results of regular treebank training, shown over three slides]
A Closer Look at Self-Training Results
[Figures: self-training results, shown over three slides]
Analysis of Rule Variance
We measure the average empirical variance of the log posterior probabilities of the rules among the learned grammars over a held-out set S, to quantify the diversity among the grammars:
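The formula itself did not survive extraction; one plausible reconstruction (our assumption: an unbiased variance across the n grammars, averaged over the rules encountered on S; the paper's exact weighting may differ) is:

```latex
% Average empirical variance of log rule posteriors across n grammars,
% over the rules r encountered when parsing the held-out set S.
\mathrm{Var}(S) \;=\;
\frac{1}{|\mathcal{R}(S)|} \sum_{r \in \mathcal{R}(S)}
\frac{1}{n-1} \sum_{i=1}^{n}
\Big( \log q_i(r) - \frac{1}{n} \sum_{j=1}^{n} \log q_j(r) \Big)^{2}
```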
English Test Set Results (WSJ 23)
[Figure: F scores of single parsers (Charniak '00; Petrov et al. '06; Carreras et al. '08; Huang & Harper '08; this work), product models (Petrov '10; this work), rerankers (Charniak & Johnson '05; Huang '08), and parser combinations (McClosky et al. '06; Sagae & Lavie '06; Fossum & Knight '09; Zhang et al. '09)]
Broadcast News
Conclusions
Very high parse accuracies can be achieved by combining self-training and product models on newswire and broadcast news parsing tasks.
Two important factors:
1. Accuracy of the model used to parse the unlabeled data
2. Diversity of the individual grammars