Updating a Name Tagger Using Contemporary Unlabeled Data
ACL-IJCNLP 2009, Singapore, August 3rd - 5th
Cristina Mota (1,2) and Ralph Grishman (2)
(1) IST & L2F INESC-ID (Portugal); (2) New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)

DESCRIPTION

Presentation at ACL-IJCNLP 2009 of Cristina Mota & Ralph Grishman (2009a). “Updating a name tagger using contemporary unlabeled data.” Proc. of the Joint conference of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, August, 2009, Singapore.

Page 2: Updating a Name Tagger Using  Contemporary Unlabeled Data

Motivation

[Figure: F-measure (0.79 to 0.85) against time gap in years (0 to 7), with linear fit y = −0.00391x + 0.82479, R² = 0.3647]

The performance of a co-trained named entity tagger decreases as the time gap between training and test sets increases (Mota & Grishman, 2008)

Do we need to update the seeds or the unlabeled data?

Does more older data help?

Page 4: Updating a Name Tagger Using  Contemporary Unlabeled Data

Related Work

“More data are better data” (Church & Mercer, 1993)

  • Enlarge labeled data as a way of improving performance

Contemporary (labeled) data reduces out-of-vocabulary rates

  • Time-adaptive language model (Auzanne et al., 2000)
  • Generation of offline name lists (Palmer & Ostendorf, 2005)
  • Daily adaptation of the language model of a broadcast news transcription system (Martins et al., 2006)

Page 5: Updating a Name Tagger Using  Contemporary Unlabeled Data

Data Sets

Data sets were drawn from the Politics section of the CETEMPúblico corpus (Santos & Rocha, 2001)

Language: Portuguese

Time span: 8 years (1991-1998)

Time gap: 1 unit = 6 months

For each six-month period:

  • Seeds (S): names collected from the first 192 extracts*
  • Test data (T): next 208 extracts
  • Unlabeled data (U): next 7856 extracts

* 1 extract = approx. 2 paragraphs
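The per-period split described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names, the `PeriodData` container and the "91a" to "98b" label generator are assumptions, and the extracts are treated as an already ordered list.

```python
from dataclasses import dataclass

# Sizes taken from the slide: 192 seed extracts, 208 test extracts, 7856 unlabeled extracts.
SEED_EXTRACTS = 192
TEST_EXTRACTS = 208
UNLABELED_EXTRACTS = 7856


def semester_labels(first_year=91, last_year=98):
    """Return six-month period labels such as '91a', '91b', ..., '98b' (hypothetical helper)."""
    return [f"{year}{half}" for year in range(first_year, last_year + 1) for half in "ab"]


@dataclass
class PeriodData:
    seed_source: list  # extracts from which the seed names are collected
    test: list         # extracts used as test data
    unlabeled: list    # extracts used as unlabeled training data


def split_period(extracts):
    """Split the ordered extracts of one six-month period into S, T and U portions."""
    needed = SEED_EXTRACTS + TEST_EXTRACTS + UNLABELED_EXTRACTS
    if len(extracts) < needed:
        raise ValueError(f"need at least {needed} extracts, got {len(extracts)}")
    seed_end = SEED_EXTRACTS
    test_end = seed_end + TEST_EXTRACTS
    return PeriodData(
        seed_source=extracts[:seed_end],
        test=extracts[seed_end:test_end],
        unlabeled=extracts[test_end:test_end + UNLABELED_EXTRACTS],
    )
```

With eight years of data this yields 16 periods, each consuming 8256 extracts.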

Page 7: Updating a Name Tagger Using  Contemporary Unlabeled Data

Named Entity Tagger

[Diagram: tagger architecture. Training components: Unlabeled text, Identification, Pairs (spelling features, contextual features), Seeds, Co-training, Spelling + contextual rules. Testing components: Test text, Identification, Classification, Labeled pairs, Propagation, Text with classified NE.]

  • Based on a co-training classifier (Collins & Singer, 1999)
  • Includes a propagation step
  • Needs few seeds and performance is high (above 80%)
  • Performance is parametrized by the combination of seeds, unlabeled set and test set: (S,U,T)
  • Tagger is evaluated after propagation with the HAREM scoring programs

Page 8: Updating a Name Tagger Using  Contemporary Unlabeled Data

Update seeds or unlabeled data?

[Diagram: timeline from 91a to 98b with seeds Si, unlabeled examples Ui and test set Tn]

Experiment 1: Baseline (vary seeds and unlabeled data synchronously, as in Mota & Grishman (2008))

Page 16: Updating a Name Tagger Using  Contemporary Unlabeled Data

Update seeds or unlabeled data?

[Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for the configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]

Performance decays as the time gap increases (Mota & Grishman, 2008)

Page 17: Updating a Name Tagger Using  Contemporary Unlabeled Data

Update seeds or unlabeled data?

[Diagram: timeline from 91a to 98b with contemporary seeds Sn, unlabeled examples Ui and test set Tn]

Experiment 2: Update seeds (vary unlabeled data but use contemporary seeds)

Page 25: Updating a Name Tagger Using  Contemporary Unlabeled Data

Update seeds or unlabeled data?

[Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for the configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]

Contemporary seeds slightly attenuate the decrease

Page 26: Updating a Name Tagger Using  Contemporary Unlabeled Data

Update seeds or unlabeled data?

[Diagram: timeline from 91a to 98b with seeds Si, contemporary unlabeled examples Un and test set Tn]

Experiment 3: Update unlabeled data (vary seeds but use contemporary unlabeled data)

Page 34: Updating a Name Tagger Using  Contemporary Unlabeled Data

Updating the unlabeled data is better than updating the seeds

[Figure: F-measure (0.74 to 0.84) by training epoch (91a to 98b) for the configurations (i,i,98b), (98b,i,98b) and (i,98b,98b)]

Contemporary unlabeled data maintain the performance
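The three (S,U,T) configurations compared in Experiments 1 to 3 can be enumerated as below. This is only a sketch of the notation in the plot legend; the function names and dictionary keys are hypothetical, and 98b is assumed to be the fixed test period as the legend indicates.

```python
def semester_labels(first_year=91, last_year=98):
    """Six-month period labels '91a' ... '98b' (hypothetical helper)."""
    return [f"{year}{half}" for year in range(first_year, last_year + 1) for half in "ab"]


def experiment_configs(periods, test_period=None):
    """Build (seeds, unlabeled, test) triples for the three experiments.

    baseline:         (i, i, 98b)    seeds and unlabeled data vary together
    update_seeds:     (98b, i, 98b)  contemporary seeds, varying unlabeled data
    update_unlabeled: (i, 98b, 98b)  varying seeds, contemporary unlabeled data
    """
    test = test_period if test_period is not None else periods[-1]
    return {
        "baseline": [(i, i, test) for i in periods],
        "update_seeds": [(test, i, test) for i in periods],
        "update_unlabeled": [(i, test, test) for i in periods],
    }


configs = experiment_configs(semester_labels())
```

All three series meet at the last training epoch, where every configuration reduces to (98b, 98b, 98b).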

Page 35: Updating a Name Tagger Using  Contemporary Unlabeled Data

Augment unlabeled data?

[Diagram: timeline from 91a to 98b with contemporary seeds Sn, test set Tn, and unlabeled examples Ui accumulated over successive periods]

Experiment 4: Enlarge unlabeled data with older data and use contemporary seeds

Page 42: Updating a Name Tagger Using  Contemporary Unlabeled Data

Augment unlabeled data?

[Figure: F-measure (0.74 to 0.84) by time frame (semester, 91a to 98b) for the configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]

Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards

Blue line: different seeds for each tagger; same unlabeled data for all taggers (98b)

Larger amounts of older unlabeled data do not always result in better performance

Page 44: Updating a Name Tagger Using  Contemporary Unlabeled Data

Augment unlabeled data?

[Diagram: timeline from 91a to 98b with seeds Si, test set Tn, and unlabeled examples Ui accumulated over successive periods]

Experiment 5: Enlarge the size of unlabeled data and vary seeds

Page 51: Updating a Name Tagger Using  Contemporary Unlabeled Data

Updating the unlabeled data is better than accumulating older unlabeled data

[Figure: F-measure (0.74 to 0.84) by time frame (semester, 91a to 98b) for the configurations (i,98b,98b), (i,u[i,...,98a],98b) and (98b,u[i,...,98a],98b)]

Violet line: seeds from the same time frame as the unlabeled set being added; unlabeled data is enlarged backwards

Blue line: seeds are the same as in the violet line; same unlabeled data for all taggers (98b)

Green line: same seeds for all taggers (98b); unlabeled data is enlarged backwards

  • Larger amounts of unlabeled data are worse than training with contemporary unlabeled data
  • Larger amounts of unlabeled data do not outperform the tagger trained with contemporary seeds and unlabeled data
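The "enlarging backwards" sets written u[i,...,98a] in the legend can be sketched as growing windows of periods. This follows the legend literally, so each window ends at 98a; whether the 98b unlabeled set is also part of the training data is not stated on the slide and is left out here. All names are hypothetical.

```python
def semester_labels(first_year=91, last_year=98):
    """Six-month period labels '91a' ... '98b' (hypothetical helper)."""
    return [f"{year}{half}" for year in range(first_year, last_year + 1) for half in "ab"]


def accumulated_windows(periods):
    """Map each start period i to the list of periods i, ..., 98a.

    The last period (98b) is the test period and is excluded, as in the
    legend notation u[i,...,98a].
    """
    older = periods[:-1]
    return {start: older[k:] for k, start in enumerate(older)}


def accumulation_configs(periods):
    """(seeds, unlabeled window, test) triples for Experiments 4 and 5."""
    test = periods[-1]
    windows = accumulated_windows(periods)
    return {
        # Experiment 4: contemporary seeds, unlabeled data enlarged backwards.
        "contemporary_seeds": [(test, tuple(w), test) for w in windows.values()],
        # Experiment 5: seeds from the period being added, same enlarged data.
        "varying_seeds": [(i, tuple(w), test) for i, w in windows.items()],
    }


windows = accumulated_windows(semester_labels())
```

The window starting at 91a is the largest (15 periods) and the one starting at 98a is the smallest (a single period).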

Page 54: Updating a Name Tagger Using  Contemporary Unlabeled Data

Final remarks

Contemporary unlabeled data are better data

But...

  • Why doesn’t the labeled data impact the performance more?
  • Are other semi-supervised approaches also sensitive?

Page 55: Updating a Name Tagger Using  Contemporary Unlabeled Data

Acknowledgments

This research work was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)

Page 57: Updating a Name Tagger Using  Contemporary Unlabeled Data

Example of misclassification

Test set 98b includes two instances of “Tizi Ouzou”:

  • Tizi Ouzou tem (en: Tizi Ouzou has)
  • manifestações em Tizi Ouzou (en: demonstrations in Tizi Ouzou)

The name does not occur in the unlabeled set u 91a, so classification depends on contexts:

  • ("n v" "tem") ORGANIZATION 0.52
  • ("type" "nprop v") PERSON 0.43
  • ("len" 2) PERSON 0.62

But it occurs in u 98b:

  • noite em Tizi (en: night in Tizi)
  • ruas de Tizi Ouzou (en: streets of Tizi Ouzou)
  • ir a Tizi-Ouzou (en: go to Tizi Ouzou)
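To see why the fallback rules above can mislabel the name, here is a small sketch of decision-list style rule application. It assumes that the highest-confidence matching rule wins, which is how decision lists generally work; the slides do not say how the tagger resolves competing rules, and the assumption that all three features fire for "Tizi Ouzou tem" is also ours.

```python
# Rules copied from the slide: (feature, label, confidence).
RULES = [
    (("n v", "tem"), "ORGANIZATION", 0.52),
    (("type", "nprop v"), "PERSON", 0.43),
    (("len", 2), "PERSON", 0.62),
]


def classify(features, rules):
    """Return (label, confidence) of the strongest matching rule, or None.

    Assumption: the single highest-confidence rule decides the label.
    """
    matching = [(conf, label) for feature, label, conf in rules if feature in features]
    if not matching:
        return None
    conf, label = max(matching)
    return label, conf


# Hypothetical feature set for the instance "Tizi Ouzou tem".
tizi_features = {("n v", "tem"), ("type", "nprop v"), ("len", 2)}
result = classify(tizi_features, RULES)
```

Under these assumptions the place name is labeled PERSON by the generic length rule, since no rule learned from u 91a mentions the name itself.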

Page 58: Updating a Name Tagger Using  Contemporary Unlabeled Data

NE tagger: Identification

[Diagram: identification pipeline. Stages: Raw text, Lexical analysis, Chunking, NE + context identification. Resources: Portuguese dictionary, priority dictionaries, morphological grammars, chunking grammars, NE + context grammars. Outputs: Text with unclassified NE, Pairs (NE, context).]

Identification designed with NooJ (Silberztein, 2004)

1 Elisa Ferreira começou por criticar Cavaco Silva (en: Elisa Ferreira started by criticizing Cavaco Silva)

2 [Elisa Ferreira]SEQM [começou por criticar]V+Complexo+Pred=criticar [Cavaco Silva]SEQM

3 [Elisa Ferreira]nprop v+criticar começou por criticar [Cavaco Silva]v nprop+criticar

4 [Elisa Ferreira]nprop v+criticar [Cavaco Silva]v nprop+criticar

Page 59: Updating a Name Tagger Using  Contemporary Unlabeled Data

NE tagger: Classification

[Diagram: co-training loop over a list of examples. Steps: label with name rules (starting from the seeds), infer context rules, label with context rules, infer name rules; then label with name + context rules and infer name + context rules.]

Spelling features ← SEEDS: (Elisa Ferreira, PESSOA, 0.9999) (PESSOA = PERSON)

1 LABEL: Elisa Ferreira, criticar ← PESSOA

2 INFER: (criticar, PESSOA, 0.98)

3 LABEL: Cavaco Silva, criticar ← PESSOA

4 INFER: (Silva, PESSOA, 0.97)

5 REPEAT
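The LABEL / INFER cycle above can be sketched in a few lines. This is a simplified illustration of co-training over two views, not the authors' implementation: it uses whole names rather than spelling features such as "Silva", unsmoothed confidences (so inferred rules get 1.0 instead of 0.98 or 0.97), and a hypothetical `min_confidence` threshold.

```python
from collections import Counter, defaultdict

NAME, CONTEXT = 0, 1  # positions of the two views inside a (name, context) pair


def infer_rules(labeled, view, min_confidence=0.9):
    """Infer feature -> (label, confidence) rules for one view from labeled pairs."""
    counts = defaultdict(Counter)
    for pair, label in labeled:
        counts[pair[view]][label] += 1
    rules = {}
    for feature, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        confidence = hits / sum(by_label.values())  # unsmoothed relative frequency
        if confidence >= min_confidence:
            rules[feature] = (label, confidence)
    return rules


def apply_rules(pairs, rules, view):
    """Label every pair whose feature in the given view is covered by a rule."""
    return [(pair, rules[pair[view]][0]) for pair in pairs if pair[view] in rules]


def co_train(pairs, seed_rules, rounds=3):
    """Alternate between the name view and the context view."""
    name_rules = dict(seed_rules)
    context_rules = {}
    for _ in range(rounds):
        # LABEL with name rules, INFER context rules.
        context_rules = infer_rules(apply_rules(pairs, name_rules, NAME), CONTEXT)
        # LABEL with context rules, INFER name rules; seed rules are kept as given.
        inferred = infer_rules(apply_rules(pairs, context_rules, CONTEXT), NAME)
        name_rules = {**inferred, **seed_rules}
    return name_rules, context_rules


pairs = [("Elisa Ferreira", "criticar"), ("Cavaco Silva", "criticar")]
seeds = {"Elisa Ferreira": ("PESSOA", 0.9999)}
name_rules, context_rules = co_train(pairs, seeds)
```

On the two example pairs, the seed labels "Elisa Ferreira", the shared context "criticar" becomes a PESSOA rule, and that rule in turn labels "Cavaco Silva".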

Page 60: Updating a Name Tagger Using  Contemporary Unlabeled Data

NE tagger performance decreases over time (Mota & Grishman, 2008)

Detailed analysis using six-month periods (instead of periods of 1 year)

Linear fits for the (Si, Ui, Tj) configurations:

     a (intercept)   b (slope)   R2
P    0.827           -0.0024     0.24824
R    0.773           -0.0022     0.19393
F    0.799           -0.0023     0.23765

[Figure: F-measure (0.74 to 0.82) against time gap (0 to 15; 1 = 6 months), with linear fit y = −0.00232x + 0.79906, R² = 0.2376]

The performance decreases at an estimated rate of 0.00232 in F-measure every 6 months (0.0348 after 8 years)

The low R-squared values show that not all variation is attributable to the increasing time gap
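The "0.0348 after 8 years" figure follows from the fitted slope: 8 years give 16 six-month periods, so the largest possible gap between training and test is 15 units. A short check, with hypothetical function names:

```python
# Coefficients of the fitted line for F-measure, taken from the slide.
INTERCEPT = 0.79906
SLOPE = -0.00232


def predicted_f(gap):
    """F-measure predicted by the linear fit for a time gap in six-month units."""
    return INTERCEPT + SLOPE * gap


def total_decay(slope, n_periods):
    """Estimated loss between gap 0 and the largest gap, n_periods - 1."""
    return -slope * (n_periods - 1)


periods = 8 * 2  # 8 years of six-month periods
print(round(total_decay(SLOPE, periods), 4))   # 0.0348
print(round(predicted_f(periods - 1), 5))      # 0.76426
```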

Page 61: Updating a Name Tagger Using  Contemporary Unlabeled Data

Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)

[Figure: F-measure (0.74 to 0.82) against time gap (0 to 15; 1 = 6 months), with linear fit y = −0.00232x + 0.79906, R² = 0.2376 (no update)]

Update?      a (intercept)   b (slope)   R2
No           0.799           -0.0023     0.238
Seeds        0.800           -0.0019     0.192
Unlabeled    0.807           -0.0005     0.019
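One way to read the table is to evaluate each fitted line at the largest gap (15 six-month units) and to compare the slopes. The sketch below uses the rounded coefficients from the table, so the results are approximate; the names are hypothetical.

```python
# (intercept a, slope b) per update strategy, rounded values from the table.
FITS = {
    "none": (0.799, -0.0023),
    "seeds": (0.800, -0.0019),
    "unlabeled": (0.807, -0.0005),
}


def predicted_f(strategy, gap):
    """F-measure predicted by the linear fit of a strategy at a given time gap."""
    intercept, slope = FITS[strategy]
    return intercept + slope * gap


def slope_ratio(strategy, baseline="none"):
    """Decay rate of a strategy relative to the no-update baseline."""
    return FITS[strategy][1] / FITS[baseline][1]


MAX_GAP = 15
for name in FITS:
    print(name, round(predicted_f(name, MAX_GAP), 4), round(slope_ratio(name), 2))
```

With these coefficients, updating the unlabeled data gives the highest predicted F-measure at the largest gap and a decay rate of roughly a fifth of the baseline.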

Page 62: Updating a Name Tagger Using  Contemporary Unlabeled Data

Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)

[Figure: F-measure (0.76 to 0.82) against time gap (0 to 15; 1 = 6 months), with linear fit y = −0.00189x + 0.80025, R² = 0.1917 (updating the seeds)]

Page 63: Updating a Name Tagger Using  Contemporary Unlabeled Data

Updating the unlabeled data is better than updating the seeds (Complete training-test configurations)

[Figure: F-measure (0.77 to 0.83) against time gap (0 to 15; 1 = 6 months), with linear fit y = −0.00051x + 0.80769, R² = 0.0189 (updating the unlabeled data)]

Page 64: Updating a Name Tagger Using  Contemporary Unlabeled Data

Confusion matrices

91a   335  12  22 | 330  16  20 | 393  12  22
       52 453  79 |  52 456  69 |  12 463  38
       23  21 330 |  28  14 342 |   5  11 371

92b   368  19  42 | 368  16  40 | 391  11  22
       19 435  55 |  23 445  39 |  14 463  29
       23  32 334 |  19  25 352 |   5  12 380

95b   375  14  34 | 387  14  30 | 394  12  26
       22 465  78 |  13 461  73 |  12 463  43
       13   7 319 |  10  11 328 |   4  11 362

98a   390  16  31 | 386  16  28 | 395  11  28
       11 458  58 |  13 460  48 |  11 464  39
        9  12 342 |  11  10 355 |   4  11 364

98b   394   9  20 | 394   9  20 | 394   9  20
        8 467  29 |   8 467  29 |   8 467  29
        8  10 382 |   8  10 382 |   8  10 382
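A quick way to compare these matrices is the share of counts on the diagonal. The sketch below assumes the diagonal holds the correctly classified names; the slide gives no row or column labels and does not say which configuration each of the three matrices per period belongs to, so they are referred to by position only.

```python
def accuracy(matrix):
    """Fraction of counts on the diagonal of a square confusion matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# The three 3x3 matrices listed for period 91a, in the order given on the slide.
MATRICES_91A = [
    [[335, 12, 22], [52, 453, 79], [23, 21, 330]],
    [[330, 16, 20], [52, 456, 69], [28, 14, 342]],
    [[393, 12, 22], [12, 463, 38], [5, 11, 371]],
]

# The matrix listed (three times, identically) for period 98b.
MATRIX_98B = [[394, 9, 20], [8, 467, 29], [8, 10, 382]]

for position, matrix in enumerate(MATRICES_91A, start=1):
    print("91a, matrix", position, round(accuracy(matrix), 4))
print("98b", round(accuracy(MATRIX_98B), 4))
```

Every matrix sums to the same 1327 names, so the diagonal shares are directly comparable across periods and positions.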