Self-training with Products of Latent Variable Grammars
Zhongqiang Huang, Mary Harper, and Slav Petrov
Overview
Motivation and Prior Related Research
Experimental Setup
Results
Analysis
Conclusions
[Diagram: the model relates a sentence and its parse tree to the grammar parameters through latent-variable derivations]
PCFG-LA Parser [Matsuzaki et al. '05, Petrov et al. '06, Petrov & Klein '07]
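For readers without the figure, the cited work defines the model roughly as follows (a hedged reconstruction, not a formula from the slides): each treebank category carries a latent subscript, and the probability of an observed tree sums over all latent derivations.

```latex
% PCFG-LA (reconstruction from the cited papers): \pi(D) strips the latent
% subscripts from a derivation D, and \theta are the rule parameters.
P(T \mid w) \;=\; \sum_{D:\, \pi(D) = T} P(D \mid w),
\qquad
P(D) \;=\; \prod_{A_x \to B_y C_z \,\in\, D} \theta(A_x \to B_y C_z)
```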
PCFG-LA Parser
[Diagram: hierarchical splitting (& merging) of the original node NP: split to 2 (NP1 NP2), split to 4 (NP1–NP4), split to 8 (NP1–NP8), with increased model complexity at each level]
The n-th grammar is the grammar trained after n split-merge rounds.
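For intuition, a minimal sketch of one split round follows (an illustration under simplified assumptions, not the authors' code; in the real trainer each split round is followed by EM and by a merge step that undoes splits that do not pay off):

```python
import itertools
import random

def split_symbol(subsym):
    """('NP', 1) -> [('NP', 2), ('NP', 3)]: each latent subsymbol spawns two."""
    sym, i = subsym
    return [(sym, 2 * i), (sym, 2 * i + 1)]

def split_grammar(rule_probs):
    """One split round: every subsymbol in every rule is split in two, and the
    rule's conditional probability is divided uniformly over the refined
    combinations, with a little noise so EM can break the symmetry."""
    new_probs = {}
    for (parent, children), p in rule_probs.items():
        child_splits = [split_symbol(c) for c in children]
        for new_parent in split_symbol(parent):
            for new_children in itertools.product(*child_splits):
                share = p / 2 ** len(children)
                new_probs[(new_parent, new_children)] = share * (
                    1 + random.uniform(-0.01, 0.01))
    return new_probs

# Each round doubles every category: NP has 2, 4, 8, ... subsymbols.
g = {(('NP', 0), (('DT', 0), ('NN', 0))): 1.0}
for _ in range(3):
    g = split_grammar(g)
```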
Grammar Order Selection
Select the grammar order (the number of split-merge rounds) using the development set.
[Figure: typical learning curve]
Max-Rule Decoding (Single Grammar)
[Diagram: example parse tree with S, NP, and VP nodes]
[Goodman ’98, Matsuzaki et al. ’05, Petrov & Klein ’07]
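The slide does not spell the objective out; as a hedged reconstruction from the cited work, max-rule-product decoding scores each candidate rule by its posterior probability, summing out the latent subsymbols with inside/outside scores, and returns the tree maximizing the product of these posteriors:

```latex
% Max-rule-product decoding (our reconstruction from the citations above):
% q(r) is the posterior probability that rule r, over its span, is part of
% the true tree, computed from inside (P_in) and outside (P_out) scores.
T^{*} \;=\; \arg\max_{T} \prod_{r \in T} q(r),
\qquad
q(A \to B\,C,\, i,k,j) \;=\;
\frac{\sum_{x,y,z} P_{\mathrm{out}}(A_x, i, j)\,
      \theta(A_x \to B_y C_z)\,
      P_{\mathrm{in}}(B_y, i, k)\, P_{\mathrm{in}}(C_z, k, j)}
     {P(w_1 \cdots w_n)}
```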
Variability [Petrov, '10]
Max-Rule Decoding (Multiple Grammars)
[Petrov, ’10]
[Diagram: multiple grammars trained from the treebank with different random seeds are combined with max-rule decoding]
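Our hedged reading of the product model in Petrov '10: the same max-rule machinery is used, but each rule's score multiplies its posteriors under all n independently trained grammars, so a rule must look good to every grammar:

```latex
% Product of n independently trained grammars: the decoding score of each
% rule is the product of its posteriors under the individual grammars.
q(r) \;=\; \prod_{i=1}^{n} q_i(r),
\qquad
T^{*} \;=\; \arg\max_{T} \prod_{r \in T} \; \prod_{i=1}^{n} q_i(r)
```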
Product Model Results [Petrov, '10]
Motivation for Self-Training
Self-training (ST)
[Diagram: train a grammar on the hand-labeled data, use it to label the unlabeled data automatically, then train a new grammar on the hand-labeled plus automatically labeled data; select the grammar with the dev set]
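A minimal sketch of this loop, assuming hypothetical train/evaluate hooks and grammars that expose a parse method (the actual system trains PCFG-LA grammars with split-merge EM):

```python
def self_train(train, evaluate, hand_labeled, unlabeled, dev, max_round=7):
    """One self-training pass. `train(trees, r)` returns a grammar after r
    split-merge rounds; `evaluate(grammar, dev)` returns its dev F score.
    Both hooks are hypothetical stand-ins for the paper's trainer."""
    # Train candidate grammars on the hand-labeled data; select with dev.
    base = max((train(hand_labeled, r) for r in range(1, max_round + 1)),
               key=lambda g: evaluate(g, dev))
    # Use the selected grammar to label the unlabeled sentences.
    auto = [base.parse(s) for s in unlabeled]
    # Retrain on hand-labeled + automatically labeled data; select again.
    cands = [train(hand_labeled + auto, r) for r in range(1, max_round + 1)]
    return max(cands, key=lambda g: evaluate(g, dev))
```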
Self-Training Curve
WSJ Self-Training Results [Huang & Harper, '09]
[Figure: F score comparison]
Self-Trained Grammar Variability
[Figures: variability of self-trained grammars at round 6 and round 7]
Summary
Two issues: variability & over-fitting
Product model: makes use of variability, but over-fitting remains in the individual grammars
Self-training: alleviates over-fitting, but variability remains in the individual grammars
Next step: combine self-training with product models
Experimental Setup
Two genres:
WSJ: Sections 2-21 for training, 22 for dev, 23 for test; 176.9K sentences per self-trained grammar
Broadcast News: WSJ + 80% of BN for training, 10% for dev, 10% for test (see paper)
Training scenarios: train 10 models with different seeds and combine using max-rule decoding
Regular: treebank training with up to 7 split-merge iterations
Self-Training: three methods with up to 7 split-merge iterations
ST-Reg
[Diagram: the round-6 product model provides a single automatically labeled set; grammars are trained on the hand-labeled data plus this set and selected with the dev set. Open question on the slide: combine multiple grammars into a product?]
ST-Prod
[Diagram: the same single automatically labeled set from the round-6 product model is combined with the hand-labeled data to train multiple grammars, which are combined into a product. Open question on the slide: use more data?]
ST-Prod-Mult
[Diagram: 10 different automatically labeled sets, each produced by a round-6 product model, are combined with the hand-labeled data to train the individual grammars, which are combined into a product]
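To summarize the three diagrams, here is a sketch of the configurations as we read the slides; `train`, `product`, and `select_with_dev` are hypothetical hooks, and the seeds reflect the 10-seed setup above:

```python
def st_reg(train, select_with_dev, hand, auto):
    """ST-Reg: regular self-training; grammars are trained on the hand-labeled
    data plus the single auto-labeled set and selected with the dev set."""
    return select_with_dev([train(hand + auto, seed=s) for s in range(10)])

def st_prod(train, product, hand, auto):
    """ST-Prod: the same single auto-labeled set is shared by all 10
    grammars, which are then combined into a product."""
    return product([train(hand + auto, seed=s) for s in range(10)])

def st_prod_mult(train, product, hand, autos):
    """ST-Prod-Mult: 10 different auto-labeled sets, one per grammar,
    keeping the individual grammars more diverse."""
    return product([train(hand + autos[s], seed=s) for s in range(10)])
```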
A Closer Look at Regular Results
[Figures: results of regular treebank training, shown over three slides]
A Closer Look at Self-Training Results
[Figures: self-training results, shown over three slides]
Analysis of Rule Variance
We measure the average empirical variance of the log posterior probabilities of the rules among the learned grammars over a held-out set S, to quantify the diversity among the grammars:
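The formula itself did not survive extraction; one plausible reconstruction (our assumption: an unbiased variance across the n grammars, averaged over the rules encountered on S; the paper's exact weighting may differ) is:

```latex
% Average empirical variance of log rule posteriors across n grammars,
% over the rules r encountered when parsing the held-out set S.
\mathrm{Var}(S) \;=\;
\frac{1}{|\mathcal{R}(S)|} \sum_{r \in \mathcal{R}(S)}
\frac{1}{n-1} \sum_{i=1}^{n}
\Big( \log q_i(r) - \frac{1}{n} \sum_{j=1}^{n} \log q_j(r) \Big)^{2}
```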
English Test Set Results (WSJ 23)
[Figure: F scores of single parsers (Charniak '00; Petrov et al. '06; Carreras et al. '08; Huang & Harper '08; this work), product models (Petrov '10; this work), rerankers (Charniak & Johnson '05; Huang '08), and parser combinations (McClosky et al. '06; Sagae & Lavie '06; Fossum & Knight '09; Zhang et al. '09)]
Broadcast News
Conclusions
Very high parse accuracies can be achieved by combining self-training and product models on newswire and broadcast news parsing tasks.
Two important factors:
1. Accuracy of the model used to parse the unlabeled data
2. Diversity of the individual grammars