Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov


Page 1: Self-training with Products of Latent Variable Grammars

Self-training with Products of Latent Variable Grammars

Zhongqiang Huang, Mary Harper, and Slav Petrov

Page 2: Self-training with Products of Latent Variable Grammars

Overview

- Motivation and Prior Related Research
- Experimental Setup
- Results
- Analysis
- Conclusions

Page 3: Self-training with Products of Latent Variable Grammars

PCFG-LA Parser [Matsuzaki et al. ’05] [Petrov et al. ’06] [Petrov & Klein ’07]

[Figure: a sentence, its parse tree, the grammar parameters, and the latent derivations]
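The core idea of a latent variable grammar is that an observed parse tree stands for many latent derivations, so its probability is a sum over latent annotations. A minimal sketch of that computation for a single rule application, with made-up probabilities and a hypothetical 2-way split:

```python
from itertools import product

# Observed rule NP -> DT NN, each symbol split into 2 latent variants.
# p_rule[(a, b, c)] = P(NP_a -> DT_b NN_c); values are invented for illustration.
p_rule = {
    (0, 0, 0): 0.4, (0, 0, 1): 0.1, (0, 1, 0): 0.3, (0, 1, 1): 0.2,
    (1, 0, 0): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25,
}
p_word = {  # invented emission probabilities for "the" and "dog"
    ("DT", 0): 0.5, ("DT", 1): 0.7,
    ("NN", 0): 0.1, ("NN", 1): 0.2,
}

def tree_probability(root_weight):
    """P(tree) = SUM over latent annotations (a, b, c) of
    P(NP_a) * P(NP_a -> DT_b NN_c) * P(DT_b -> the) * P(NN_c -> dog)."""
    total = 0.0
    for a, b, c in product(range(2), repeat=3):
        total += (root_weight[a] * p_rule[(a, b, c)]
                  * p_word[("DT", b)] * p_word[("NN", c)])
    return total

print(tree_probability({0: 0.6, 1: 0.4}))
```

The sum over annotations, rather than a max over a single best derivation, is what makes exact decoding with these grammars nontrivial and motivates the max-rule approximation on the later slides.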

Page 4: Self-training with Products of Latent Variable Grammars

PCFG-LA Parser

Hierarchical splitting (& merging), with increased model complexity:

Original node: NP
Split to 2:    NP1 NP2
Split to 4:    NP1 NP2 NP3 NP4
Split to 8:    NP1 NP2 NP3 NP4 NP5 NP6 NP7 NP8

n-th grammar: the grammar trained after n split-merge rounds

Grammar order selection: use the development set [Figure: typical learning curve]
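The splitting step above can be sketched in a few lines. This is not the Berkeley parser's actual code, just a toy illustration of one split: each latent variant of a symbol is split in two, the probability mass is divided among the offspring, and a small random perturbation breaks the symmetry so EM can specialize the new subcategories:

```python
import random

def split_symbols(rules, seed=0):
    """rules: {(variant_index,): prob} over the current latent variants of
    one nonterminal. Returns rules over twice as many variants, with a
    small symmetry-breaking perturbation on each new probability."""
    rng = random.Random(seed)
    new_rules = {}
    for (parent,), prob in rules.items():
        for child in (2 * parent, 2 * parent + 1):
            eps = rng.uniform(-0.01, 0.01)      # symmetry-breaking noise
            new_rules[(child,)] = prob / 2 * (1 + eps)
    return new_rules

g1 = {(0,): 1.0}                  # one variant of, say, NP
g2 = split_symbols(g1)            # 2 variants: NP_0, NP_1
g4 = split_symbols(g2, seed=1)    # 4 variants after a second split round
```

In the real split-merge procedure, EM retraining and a merging pass (undoing splits that do not help) follow each split; this sketch shows only the split itself.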

Page 5: Self-training with Products of Latent Variable Grammars

Max-Rule Decoding (Single Grammar)
[Goodman ’98, Matsuzaki et al. ’05, Petrov & Klein ’07]

[Figure: example parse tree with S, NP, and VP nodes]
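The essence of max-rule decoding: score an unannotated tree by the product of its rules' posterior probabilities (each posterior already marginalizes out the latent annotations), and pick the tree with the highest product. A minimal sketch with invented posterior values and two hand-built candidate trees:

```python
import math

# Hypothetical rule posteriors, keyed by (rule, span); the real parser
# computes these from inside/outside scores over the latent annotations.
posterior = {
    ("S -> NP VP", (0, 5)): 0.9,
    ("NP -> DT NN", (0, 2)): 0.8,
    ("VP -> V NP", (2, 5)): 0.7,
    ("NP -> DT NN", (3, 5)): 0.6,
    ("VP -> V NP NP", (2, 5)): 0.1,   # an alternative, flatter VP
}

def tree_score(rules_with_spans):
    """log score of a candidate tree = sum of log rule posteriors."""
    return sum(math.log(posterior[r]) for r in rules_with_spans)

tree_a = [("S -> NP VP", (0, 5)), ("NP -> DT NN", (0, 2)),
          ("VP -> V NP", (2, 5)), ("NP -> DT NN", (3, 5))]
tree_b = [("S -> NP VP", (0, 5)), ("NP -> DT NN", (0, 2)),
          ("VP -> V NP NP", (2, 5))]

best = max([tree_a, tree_b], key=tree_score)
```

In practice the maximization runs over all trees via dynamic programming (CKY-style), not over an enumerated candidate list as in this toy example.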

Page 6: Self-training with Products of Latent Variable Grammars

Variability

[Petrov, ’10]

Page 7: Self-training with Products of Latent Variable Grammars

Max-Rule Decoding (Multiple Grammars) [Petrov, ’10]

[Figure: multiple grammars trained from the Treebank and combined at decoding time]
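The product-model combination can be sketched directly: each independently trained grammar assigns a posterior to a rule, the product model multiplies the posteriors (equivalently, sums the log posteriors), and max-rule decoding then runs over the combined scores. The numbers below are invented:

```python
import math

# Hypothetical rule posteriors from three independently trained grammars
# for one NP span (the paper uses 10 grammars with different seeds).
per_grammar_posteriors = [
    {"NP -> DT NN": 0.8, "NP -> DT JJ NN": 0.2},   # grammar 1
    {"NP -> DT NN": 0.6, "NP -> DT JJ NN": 0.4},   # grammar 2
    {"NP -> DT NN": 0.9, "NP -> DT JJ NN": 0.1},   # grammar 3
]

def product_score(rule):
    """Combined log score: sum of log posteriors over all grammars."""
    return sum(math.log(g[rule]) for g in per_grammar_posteriors)

best_rule = max(per_grammar_posteriors[0], key=product_score)
```

Because the scores multiply, a rule must be supported by every grammar to survive; that is why diversity among the grammars helps rather than hurts.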

Page 8: Self-training with Products of Latent Variable Grammars

Product Model Results

[Petrov, ’10]

Page 9: Self-training with Products of Latent Variable Grammars

Motivation for Self-Training


Page 10: Self-training with Products of Latent Variable Grammars

Self-training (ST)

1. Train a parser on the Hand-Labeled Data.
2. Label the Unlabeled Data with it, yielding Automatically Labeled Data.
3. Train on the combined data and select the grammar with the dev set.
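The recipe on this slide can be sketched as a short loop. `train`, `parse`, and `evaluate` below are hypothetical stand-ins for the real split-merge trainer, the parser, and dev-set F-score evaluation:

```python
def self_train(hand_labeled, unlabeled, dev, train, parse, evaluate, rounds=7):
    """Basic self-training: train, label, retrain, select with dev."""
    base = train(hand_labeled, rounds)            # 1) train on hand-labeled data
    auto = [parse(base, s) for s in unlabeled]    # 2) label the unlabeled data
    # 3) retrain on the combined data, keeping one grammar per split-merge round
    candidates = [train(hand_labeled + auto, r) for r in range(1, rounds + 1)]
    # 4) select the grammar order on the development set
    return max(candidates, key=lambda g: evaluate(g, dev))
```

The dev-set selection in step 4 matters because, as the learning curves on the surrounding slides show, accuracy does not increase monotonically with the number of split-merge rounds.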

Page 11: Self-training with Products of Latent Variable Grammars

Self-Training Curve


Page 12: Self-training with Products of Latent Variable Grammars

WSJ Self-Training Results

[Figure: F score results] [Huang & Harper, ’09]

Page 13: Self-training with Products of Latent Variable Grammars

Self-Trained Grammar Variability

[Figure: variability of self-trained grammars at rounds 6 and 7]

Page 14: Self-training with Products of Latent Variable Grammars

Summary

Two issues: variability & over-fitting
- Product model: makes use of variability, but over-fitting remains in the individual grammars
- Self-training: alleviates over-fitting, but variability remains in the individual grammars

Next step: combine self-training with product models

Page 15: Self-training with Products of Latent Variable Grammars

Experimental Setup

Two genres:
- WSJ: Sections 2-21 for training, 22 for dev, 23 for test; 176.9K sentences per self-trained grammar
- Broadcast News: WSJ + 80% of BN for training, 10% for dev, 10% for test (see paper)

Training scenarios (train 10 models with different seeds and combine using max-rule decoding):
- Regular: treebank training with up to 7 split-merge iterations
- Self-Training: three methods with up to 7 split-merge iterations

Page 16: Self-training with Products of Latent Variable Grammars

ST-Reg

Hand-Labeled Data → Train → Label Unlabeled Data → Automatically Labeled Data → Train ⁞ (multiple grammars?) → Product → select with dev set

Single automatically labeled set, produced by the round-6 product.

Page 17: Self-training with Products of Latent Variable Grammars

ST-Prod

Hand-Labeled Data → Train ⁞ → Product → Label Unlabeled Data → Automatically Labeled Data → Train ⁞ (use more data?) → Product

Single automatically labeled set, produced by the round-6 product.

Page 18: Self-training with Products of Latent Variable Grammars

ST-Prod-Mult

Hand-Labeled Data → Train ⁞ → Product → Label ⁞ → Train ⁞ → Product

10 different automatically labeled sets, produced by the round-6 product.

Page 19: Self-training with Products of Latent Variable Grammars


Page 20: Self-training with Products of Latent Variable Grammars

A Closer Look at Regular Results


Page 21: Self-training with Products of Latent Variable Grammars

A Closer Look at Regular Results


Page 22: Self-training with Products of Latent Variable Grammars

A Closer Look at Regular Results


Page 23: Self-training with Products of Latent Variable Grammars

A Closer Look at Self-Training Results


Page 24: Self-training with Products of Latent Variable Grammars

A Closer Look at Self-Training Results


Page 25: Self-training with Products of Latent Variable Grammars

A Closer Look at Self-Training Results


Page 26: Self-training with Products of Latent Variable Grammars

Analysis of Rule Variance

We measure the average empirical variance of the log posterior probabilities of the rules among the learned grammars over a held-out set S to get at the diversity among the grammars.
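The equation on this slide did not survive extraction. The sketch below computes the quantity as described in the text: for each rule occurrence in the held-out set S, take the empirical variance of the log posterior probability across the n learned grammars, then average over all occurrences. The paper's exact normalization may differ, and the numbers are invented:

```python
import math

def avg_rule_variance(log_posteriors):
    """log_posteriors: one entry per rule occurrence in S, each a list of
    the n grammars' log posterior probabilities for that occurrence.
    Returns the average (over occurrences) of the empirical variance
    (over grammars)."""
    total = 0.0
    for occ in log_posteriors:
        mean = sum(occ) / len(occ)
        total += sum((x - mean) ** 2 for x in occ) / len(occ)
    return total / len(log_posteriors)

# Grammars that agree closely yield low variance; disagreement yields high:
agree = [[math.log(0.80), math.log(0.79)] for _ in range(5)]
disagree = [[math.log(0.90), math.log(0.10)] for _ in range(5)]
```

A higher value of this measure indicates more diverse grammars, which is exactly what the multiplicative product combination exploits.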

Page 27: Self-training with Products of Latent Variable Grammars

Analysis of Rule Variance

32

Page 28: Self-training with Products of Latent Variable Grammars

English Test Set Results (WSJ 23)

[Figure: F scores grouped as Single Parser, Reranker, Product, and Parser Combination, for: [Charniak ’00], [Petrov et al. ’06], [Carreras et al. ’08], [Huang & Harper ’08], This Work, [Petrov ’10], [Charniak & Johnson ’05], [Huang ’08], [McClosky et al. ’06], [Sagae & Lavie ’06], [Fossum & Knight ’09], [Zhang et al. ’09]]

Page 29: Self-training with Products of Latent Variable Grammars

Broadcast News


Page 30: Self-training with Products of Latent Variable Grammars

Conclusions

Very high parse accuracies can be achieved by combining self-training and product models on newswire and broadcast news parsing tasks.

Two important factors:
1. Accuracy of the model used to parse the unlabeled data
2. Diversity of the individual grammars