[Freedman+ EMNLP11] Extreme Extraction Machine Reading in a Week 23 Dec 2011 Nakatani Shuyo @ Cybozu labs, Inc twitter : @shuyo

Page 1: Extreme Extraction - Machine Reading in a Week

[Freedman+ EMNLP11] Extreme Extraction - Machine Reading in a Week

23 Dec 2011

Nakatani Shuyo @ Cybozu labs, Inc

twitter : @shuyo

Page 2: Extreme Extraction - Machine Reading in a Week

Abstract

• Target:

– Rapid construction of a concept and relation extraction system

• Method:

– Extend an existing ACE system for new relations

– in a short time with minimal training data

• in a week (<50 person hours) with <20 example pairs

– Evaluate with a question answering task

Page 3: Extreme Extraction - Machine Reading in a Week

Phases

1. Ontology and resources

2. Extending system for new ontology

3. Extracting relations

4. Evaluation

Page 4: Extreme Extraction - Machine Reading in a Week

1. Ontology and resources

• possibleTreatment( Substance, Condition )

– SSRIs(S) are effective treatments for depression(C)

• expectedDateOnMarket( Substance , Date )

– More drugs for type 2(S) expected on market soon(D)

• responsibleForTreatment( Substance, Agent )

– Officials(A) Responsible for Treatment of War Dead(S)

• studiesDisease( Agent , Condition )

– cancer(C) researcher Dr. Henri Joyeux(A)

• hasSideEffect( Substance, Condition )

(not sure)
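
The five relation signatures above can be written down as a small typed table. This is only a sketch of one possible representation: the names `ONTOLOGY`, `Mention`, `RelationInstance`, and `is_well_typed` are hypothetical and do not come from [Freedman+ 11].

```python
from dataclasses import dataclass

# Relation name -> (class of first argument, class of second argument),
# copied from the ontology on this slide.
ONTOLOGY = {
    "possibleTreatment": ("Substance", "Condition"),
    "expectedDateOnMarket": ("Substance", "Date"),
    "responsibleForTreatment": ("Substance", "Agent"),
    "studiesDisease": ("Agent", "Condition"),
    "hasSideEffect": ("Substance", "Condition"),
}


@dataclass(frozen=True)
class Mention:
    text: str
    cls: str  # "Substance", "Condition", "Date" or "Agent"


@dataclass(frozen=True)
class RelationInstance:
    name: str
    arg1: Mention
    arg2: Mention


def is_well_typed(rel: RelationInstance) -> bool:
    """True if the argument classes match the relation's signature."""
    return ONTOLOGY.get(rel.name) == (rel.arg1.cls, rel.arg2.cls)


# First example from the slide: SSRIs(S) ... depression(C).
example = RelationInstance(
    "possibleTreatment",
    Mention("SSRIs", "Substance"),
    Mention("depression", "Condition"),
)
print(is_well_typed(example))
```

A type check like this is a cheap way to reject extracted pairs whose arguments were tagged with the wrong class.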

Page 5: Extreme Extraction - Machine Reading in a Week

2. Extending system for new ontology

• Add new relation/class detectors into “our” extraction system for ACE task

– Details of the system are not clear...

• Class detectors with unsupervised word clustering

• Bootstrap relation learner with a template and seeds

• Pattern learning for relation extraction

• Annotate words for 4 classes

• Coreference

Page 6: Extreme Extraction - Machine Reading in a Week

Bootstrap relation learner

• DAP (Double-Anchored Pattern) (Kozareva+ 08)

– Web search with a query based on “<CLASS> such as <SEED> and *”

– Add the words found at the position “*” in each snippet to the class members as new seeds

– Repeat “the bootstrapping loop” while seeds are available
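
The loop described above can be sketched as follows. This is a minimal illustration, not the implementation of (Kozareva+ 08) or [Freedman+ 11]: the `search` callable stands in for a web search engine, `toy_search` and its snippets are made-up data, only a single word is taken from the “*” slot, and no ranking or filtering of candidates is done.

```python
import re
from collections import deque
from typing import Callable, Iterable


def dap_bootstrap(
    cls: str,
    seed: str,
    search: Callable[[str], Iterable[str]],
    max_members: int = 100,
) -> set[str]:
    """Grow a class from one seed with '<CLASS> such as <SEED> and *'."""
    members = {seed}
    queue = deque([seed])  # seeds that still have to be queried
    while queue and len(members) < max_members:
        current = queue.popleft()
        query = f"{cls} such as {current} and"
        # Capture the single word that fills the '*' slot.
        slot = re.compile(re.escape(query) + r"\s+(\w+)", re.IGNORECASE)
        for snippet in search(query):
            for word in slot.findall(snippet):
                word = word.lower()
                if word not in members:
                    members.add(word)
                    queue.append(word)  # a new member becomes a new seed
    return members


# Toy stand-in for a search engine (hypothetical snippets).
TOY_SNIPPETS = [
    "a disease such as cold and flu spreads in winter",
    "any disease such as flu and pneumonia needs care",
]


def toy_search(query: str) -> list[str]:
    return [s for s in TOY_SNIPPETS if query.lower() in s.lower()]


print(sorted(dap_bootstrap("disease", "cold", toy_search)))
```

The loop stops by itself once no queued seed yields a new word; `max_members` is an extra safety limit added for this sketch.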

Page 8: Extreme Extraction - Machine Reading in a Week

Relation detection with DAP

• CLASS = disease / SEED = cold

• Web search = “disease such as cold and”

– disease such as cold and flu (9). ...

– disease such as cold and heat, external ...

– disease such as cold and pneumonia. ...

– disease (such as cold and hot diseases), ...

– disease such as cold and flu viruses. ...

– disease such as cold and food poisoning. ...
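
Applying the pattern to the six snippets listed on this slide shows how noisy the raw candidates are. The regular expression below is an assumption made for illustration (it takes one word after “and” and tolerates the parenthesis in the fourth snippet); the slides do not say how the “*” slot is actually parsed.

```python
import re
from collections import Counter

# Snippets as listed on the slide for CLASS = disease, SEED = cold.
snippets = [
    "disease such as cold and flu (9). ...",
    "disease such as cold and heat, external ...",
    "disease such as cold and pneumonia. ...",
    "disease (such as cold and hot diseases), ...",
    "disease such as cold and flu viruses. ...",
    "disease such as cold and food poisoning. ...",
]

# Allow an optional "(" before "such as", as in the fourth snippet.
slot = re.compile(r"disease \(?such as cold and (\w+)")

candidates = [m.group(1) for s in snippets if (m := slot.search(s))]
counts = Counter(candidates)
print(counts.most_common())
```

Besides “flu” and “pneumonia”, the candidates include “heat”, “hot”, and a truncated “food” (from “food poisoning”), so some filtering, or multi-word slot handling, is needed before new words are trusted as seeds.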

Page 9: Extreme Extraction - Machine Reading in a Week

Four classes to annotate

• Substance-Name

– medicine name

• Substance-Description

– e.g. “new drugs”

• Condition-Name

– name of disease

• Condition-Description

– e.g. “the illness”

Page 10: Extreme Extraction - Machine Reading in a Week

Annotation

• Name tagging with active learning (Miller+ 04)

– Unsupervised word clustering on binary tree (Brown+ 90)

– Tagging with clustering information

• Averaged Perceptron (Collins 02)

– Request annotation for selected sentence based on “confidence score”

• score = (highest perceptron score) - (second one)

!?
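
The confidence score above is a margin between the two best-scoring outputs, and sentences with a small margin are the ones sent to the annotator. A minimal sketch, assuming each sentence comes with perceptron scores for at least two candidate labelings; the function names and the numbers in `pool` are hypothetical.

```python
def margin_confidence(scores: dict[str, float]) -> float:
    """Highest score minus the second-highest score."""
    best, second = sorted(scores.values(), reverse=True)[:2]
    return best - second


def pick_for_annotation(pool: dict[str, dict[str, float]], k: int = 1) -> list[str]:
    """Return the k sentences whose top two labelings are closest."""
    return sorted(pool, key=lambda s: margin_confidence(pool[s]))[:k]


# Hypothetical perceptron scores for candidate labelings of each sentence.
pool = {
    "sentence A": {"labeling 1": 9.0, "labeling 2": 2.0},
    "sentence B": {"labeling 1": 5.0, "labeling 2": 4.5},
    "sentence C": {"labeling 1": 7.0, "labeling 2": 6.0},
}

print(pick_for_annotation(pool, k=2))
```

A small margin means the model can hardly tell its best guess from the runner-up, so a human label for that sentence is likely to be informative.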

Page 11: Extreme Extraction - Machine Reading in a Week

Results of Class Detection

• substances & conditions

– -Name / -Description respectively

• without/with lists of known substances and conditions

from [Freedman+ 11]

What’s GS (Gold Standard)?

Page 12: Extreme Extraction - Machine Reading in a Week

Coreference

• It took the most time (20 of 43 hours)

• But the details are not clear...

– domain independent heuristics

– appositive linking

Page 13: Extreme Extraction - Machine Reading in a Week

3. Extracting relations

• Learned Patterns vs. Handwritten Patterns

from [Freedman+ 11]

Page 14: Extreme Extraction - Machine Reading in a Week

from [Freedman+ 11]

Page 15: Extreme Extraction - Machine Reading in a Week

4. Evaluation

• Question Answering with extracted information

• Query examples

– Find possible treatments for diabetes

– What is the expected date on market for Abilify?

Page 16: Extreme Extraction - Machine Reading in a Week

Answer Example

• ACME produces a wide range of drugs including treatments for malaria and athletes foot

– responsibleForTreatment(drugs, ACME)

– possibleTreatment(drugs, malaria)

– possibleTreatment(drugs, athletes foot)
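
Once relations are stored as triples, the query examples reduce to lookups. The sketch below uses the three relations from this slide; the index layout and the function names are assumptions made for illustration, not the question answering system of [Freedman+ 11].

```python
from collections import defaultdict

# Relations from the answer example, as (relation, arg1, arg2).
extracted = [
    ("responsibleForTreatment", "drugs", "ACME"),
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athletes foot"),
]

# Index by (relation, arg2) so "treatments for X" is a dictionary lookup.
by_second_arg = defaultdict(list)
for relation, arg1, arg2 in extracted:
    by_second_arg[(relation, arg2)].append(arg1)


def possible_treatments(condition: str) -> list[str]:
    """Substances S with possibleTreatment(S, condition)."""
    return by_second_arg.get(("possibleTreatment", condition), [])


def agents_for(condition: str) -> list[str]:
    """Agents responsible for any Substance that may treat the condition."""
    substances = set(possible_treatments(condition))
    return sorted(
        agent
        for relation, substance, agent in extracted
        if relation == "responsibleForTreatment" and substance in substances
    )


print(possible_treatments("malaria"))
print(agents_for("malaria"))
```

The answer “drugs” is correct but not very informative, which is the kind of case the later slides separate out when non-useful answers are removed.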

Page 17: Extreme Extraction - Machine Reading in a Week

• useful = answering complex query

from [Freedman+ 11]

Page 18: Extreme Extraction - Machine Reading in a Week

When non-useful answers are removed

• annotator’s recall (A)

• using a combination of both (C)

• using only handwritten rules (H, HW)

• using only learned patterns (L)

from [Freedman+ 11]

Page 19: Extreme Extraction - Machine Reading in a Week

from [Freedman+ 11]

Page 20: Extreme Extraction - Machine Reading in a Week

Discussion

from [Freedman+ 11]

Page 21: Extreme Extraction - Machine Reading in a Week

Conclusions

• The combination system can achieve F1 of 0.51 in a new domain in a week.

• It requires very little training data.

• The effectiveness of learning algorithms is still not competitive with handwritten patterns.
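
For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it over sets of relation triples; the `gold` and `predicted` sets are made-up data, and the exact scoring protocol of [Freedman+ 11] is not described in these slides.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 = 2PR / (P + R)."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold-standard and system output.
gold = {
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "athletes foot"),
}
predicted = {
    ("possibleTreatment", "drugs", "malaria"),
    ("possibleTreatment", "drugs", "flu"),
}

print(precision_recall_f1(predicted, gold))
```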

Page 22: Extreme Extraction - Machine Reading in a Week

References

• [Freedman+ 11] Extreme Extraction - Machine Reading in a Week

• [Kozareva+ 08] Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs

• [Miller+ 04] Name Tagging with Word Clusters and Discriminative Training

– [Brown+ 90] Class-based n-gram models of natural language

– [Collins 02] Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms