Open IE to KBP Relations in 3 Hours Stephen Soderland John Gilmer, Rob Bart, Oren Etzioni, Daniel S. Weld Turing Center University of Washington 11/18/2013 TAC-KBP Workshop 1

Page 1: Open IE to KBP Relations  in 3 Hours

Open IE to KBP Relations in 3 Hours

Stephen Soderland
John Gilmer, Rob Bart, Oren Etzioni, Daniel S. Weld

Turing Center
University of Washington

11/18/2013 TAC-KBP Workshop 1

Page 9: Open IE to KBP Relations  in 3 Hours

Open IE

“Steve Jobs, the co-founder of Apple, died of cancer in his Palo Alto home.”

Arg1 Rel Arg2
(Steve Jobs, died of, cancer)
(Steve Jobs, died in, his Palo Alto home)
(Steve Jobs, is co-founder of, Apple)

“Hamas denied responsibility for the attacks, which threaten to derail ongoing peace talks.”

Arg1 Rel Arg2
(Hamas, denied responsibility for, the attacks)
(the attacks, threatened to derail, ongoing peace talks)

“Ribosomes, which are complexes made of ribosomal RNA and protein, are the cellular components that carry out protein synthesis.”

Arg1 Rel Arg2
(Ribosomes, are complexes made of, ribosomal RNA and protein)
(Ribosomes, are, the cellular components)
(Ribosomes, carry out, protein synthesis)
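
A minimal sketch of how such (Arg1, Rel, Arg2) tuples might be held in code. The `Extraction` type, its field names, and the `relations_for` helper are hypothetical and are not taken from the Open IE software; the tuples are the ones listed for the Steve Jobs sentence.

```python
from typing import NamedTuple


class Extraction(NamedTuple):
    """One binary Open IE extraction (hypothetical type for illustration)."""
    arg1: str
    rel: str
    arg2: str


# Tuples listed on the slide for the Steve Jobs sentence.
jobs_extractions = [
    Extraction("Steve Jobs", "died of", "cancer"),
    Extraction("Steve Jobs", "died in", "his Palo Alto home"),
    Extraction("Steve Jobs", "is co-founder of", "Apple"),
]


def relations_for(arg1, extractions):
    """Return the relation phrases of all tuples whose Arg1 equals arg1."""
    return [e.rel for e in extractions if e.arg1 == arg1]


jobs_relations = relations_for("Steve Jobs", jobs_extractions)
```

The point of the representation is that one sentence yields several independent tuples, and the relation is an arbitrary phrase from the text rather than a label from a fixed ontology.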

Page 10: Open IE to KBP Relations  in 3 Hours

• Advantages of Open IE
  – Robust
  – Massively scalable
  – Works out of the box
  – Finds whatever relations are expressed in the text
  – Not tied to an ontology of relations

• Disadvantages
  – Finds whatever relations are expressed in the text
  – Not tied to an ontology of relations

• Challenge
  – Map Open IE to an ontology of relations
  – Minimum of user effort

github/knowitall/openie

Page 11: Open IE to KBP Relations  in 3 Hours

per:cause_of_death:

Head (high frequency):
(Steve Jobs, died of, cancer)
(Steve Jobs, died from, cancer)
(Steve Jobs, passed away from, cancer)
(Steve Jobs, succumbed to, cancer)
(cancer, killed, Steve Jobs)
…

Long tail (low frequency):
(cancer, claimed the life of, Steve Jobs)
(Steve Jobs, lost his battle to, cancer)
(Steve Jobs, was a victim of, cancer)
(Steve Jobs, could not beat, cancer)
(Steve Jobs, could not have prevented, his death from cancer)
(Steve Jobs, joins the ranks of, cancer fatalities)

Page 12: Open IE to KBP Relations  in 3 Hours

Outline

• Rules to map to target relations
  – Rule language
  – Semantic taggers
• KBP system
  – Architecture
  – 3 hour rule set vs. 12 hour rule set
• Results and discussion
• Future work

Page 13: Open IE to KBP Relations  in 3 Hours

Desiderata for Target Relation Mapping

• Works even with no annotated training data
• User may have limited skill in NLP and ML
• Rules are understandable to user
• High precision and good generalization

Approach:
  – Manually created rules based on Open IE tuples
  – Simple rule language
  – Rules combine lexical and semantic type constraints
  – Extensible semantic types based on keyword tagger

Page 14: Open IE to KBP Relations  in 3 Hours

Rule language

(Smith, was appointed, Acting Director of Acme Corporation)
entity: Smith; slotfill: Acme Corporation

Terms in Rule      Example
Target relation    per:employee_or_member_of
Query entity in    Arg1
Slotfill in        Arg2
Slotfill type      Organization
Arg1 terms         -
Relation terms     appointed
Arg2 terms         <JobTitle> of
Functional?        no

Page 15: Open IE to KBP Relations  in 3 Hours

Rule language

(Smith, was appointed, Acting Director of Acme Corporation)
per:employee_or_member_of (Smith, Acme Corporation)

Terms in Rule      Example
Target relation    per:employee_or_member_of
Query entity in    Arg1
Slotfill in        Arg2
Slotfill type      Organization
Arg1 terms         -
Relation terms     appointed
Arg2 terms         <JobTitle> of
Functional?        no
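
The example rule can be sketched in a few lines of Python. This is an illustrative reading of the rule terms, not the system's actual rule applier: the `Rule` class, `apply_rule`, and the toy `TYPE_TERMS` lists (standing in for the semantic taggers) are all assumptions, and the "Functional?" field is ignored here.

```python
import re
from dataclasses import dataclass

# Toy keyword lists standing in for the semantic taggers (assumed values).
TYPE_TERMS = {
    "JobTitle": {"acting director", "director"},
    "Organization": {"Acme Corporation"},
}


@dataclass
class Rule:
    target_relation: str
    relation_term: str   # word that must appear in Rel
    arg2_type: str       # semantic type that must start Arg2
    arg2_word: str       # literal word following the typed span
    slotfill_type: str   # required semantic type of the slotfill


def apply_rule(rule, extraction):
    """Return (relation, entity, slotfill) if the rule fires, else None.

    Assumes the query entity is Arg1 and the slotfill is inside Arg2.
    """
    arg1, rel, arg2 = extraction
    if rule.relation_term not in rel.lower().split():
        return None
    for term in TYPE_TERMS[rule.arg2_type]:
        # "<JobTitle> of X": a typed term, the literal word, then the slotfill.
        pattern = rf"^{re.escape(term)}\s+{re.escape(rule.arg2_word)}\s+(.+)$"
        match = re.match(pattern, arg2, flags=re.IGNORECASE)
        if match and match.group(1) in TYPE_TERMS[rule.slotfill_type]:
            return (rule.target_relation, arg1, match.group(1))
    return None


rule = Rule("per:employee_or_member_of", "appointed", "JobTitle", "of", "Organization")
result = apply_rule(rule, ("Smith", "was appointed", "Acting Director of Acme Corporation"))
```

The rule fires only when the lexical constraint (the relation contains "appointed") and the type constraints (a job title before "of", an organization after it) hold together, which is how a rule combines lexical and semantic type constraints.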

Page 16: Open IE to KBP Relations  in 3 Hours

Semantic Tagging

• General types
  – Person, Organization, Location, Date
  – NER tagger
  – WordNet

• User-specified types
  – Keyword tagger
  – User creates file of terms for the semantic type
  – Tagger takes file as input
  – Used lists from CMU’s NELL for KBP

github/knowitall/taggers
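
A keyword tagger of this kind can be sketched as follows. This is a minimal illustration, not the interface of the taggers software: `load_terms` and `tag` are hypothetical names, the one-term-per-line file format is an assumption, and the sample sentence is made up.

```python
import re


def load_terms(lines):
    """Read one term per line, skipping blanks; accepts an open file or a list."""
    return {line.strip().lower() for line in lines if line.strip()}


def tag(text, type_name, terms):
    """Return (type_name, matched_text) for each term occurrence in text."""
    if not terms:
        return []
    # Try longer terms first so a multi-word term beats a shorter one inside it.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in alternatives) + r")\b"
    return [
        (type_name, m.group(0))
        for m in re.finditer(pattern, text, flags=re.IGNORECASE)
    ]


# Sample terms for illustration; a real list would be read from the user's file.
job_titles = load_terms(["academic coordinator\n", "vice-director\n", "\n"])
tags = tag("She was named Academic Coordinator in May.", "JobTitle", job_titles)
```

Adding a new semantic type then requires only a new file of terms, with no change to the tagger code.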

Page 17: Open IE to KBP Relations  in 3 Hours

Semantic Types from CMU’s NELL

• 4K Job titles
  – academic coordinator … zonal underwriting manager
• 182 Head job titles
  – acting chief director … vice-director
• 47 Religions
  – Adventist … Zoroastrianism
• 114 Nationalities
  – Akkadian … Zambian
• 5K Cities: Aachen … Zwolle
• 536 State-provinces: Ad Dali … Zlitan
• 241 Countries: Afghanistan … Zimbabwe

Page 18: Open IE to KBP Relations  in 3 Hours

Outline

• Rules to map to target relations
  – Rule language
  – Semantic taggers
• KBP system
  – Architecture
  – 3 hour rule set vs. 12 hour rule set
  – Co-reference
• Results and discussion
• Future work

Page 19: Open IE to KBP Relations  in 3 Hours

KBP Architecture

[Figure: KBP system architecture diagram; surviving label: 200M tuples]

Page 20: Open IE to KBP Relations  in 3 Hours

What We Did Not Handle

• Entity disambiguation needed for KBP precision
  – Good extraction for “Paul Gray”, but wrong Paul Gray

• Mostly ignored this in our system
  – Find any tuple that matched entity string
  – Detect ambiguous entities if linked to multiple KB entries
  – Discard all results for ambiguous entities
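
The discard step can be sketched as a simple filter. This is an assumed reading of the bullets above, not the system's code: `filter_ambiguous`, the `kb_links` mapping, the KB entry ids, and the one-entry "Smith" example are all hypothetical.

```python
def filter_ambiguous(results, kb_links):
    """Drop every result whose query entity string links to several KB entries.

    results:  list of (relation, entity, slotfill) triples
    kb_links: dict mapping an entity string to its candidate KB entry ids
    """
    return [r for r in results if len(kb_links.get(r[1], [])) <= 1]


# Hypothetical link table: "Paul Gray" matches two KB entries, "Smith" only one.
kb_links = {
    "Paul Gray": ["kb_entry_a", "kb_entry_b"],
    "Smith": ["kb_entry_c"],
}
results = [
    ("per:title", "Paul Gray", "bassist"),
    ("per:title", "Paul Gray", "president"),
    ("per:employee_or_member_of", "Smith", "Acme Corporation"),
]
kept = filter_ambiguous(results, kb_links)
```

The filter trades recall for precision: every result for an ambiguous name is lost, including the ones that were about the right person.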

Page 21: Open IE to KBP Relations  in 3 Hours

Creating Rule Sets

• 3 hour rule set
  – Avg 3 rules per relation
  – Light editing of NELL keyword lists

  per:cause_of_death = “died of”, “died from”, “died as a result of”, “died due to”

• 12 hour rule set (over two week period)
  – Avg 16 rules per relation
  – Refined rules, testing on 2012 KBP answer key
  – Further editing of NELL keyword lists

  per:cause_of_death = “die of”, “dies of”, “dying of”, … “succumbed to”, “succumbs to”, …
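
The difference in coverage between the two rule sets can be sketched with the phrase lists shown above. The matching function and variable names are hypothetical, whole-phrase containment is a simplifying assumption, and the 12 hour list holds only the phrases visible on the slide (the elided entries are not filled in).

```python
# Phrase lists copied from the slide; elided ("…") entries are left out.
THREE_HOUR = {
    "per:cause_of_death": ["died of", "died from", "died as a result of", "died due to"],
}
TWELVE_HOUR_PARTIAL = {
    "per:cause_of_death": ["die of", "dies of", "dying of", "succumbed to", "succumbs to"],
}


def matching_relations(rel_phrase, rule_set):
    """Return target relations with a listed phrase inside rel_phrase.

    Padding with spaces makes the containment test respect word boundaries.
    """
    rel = f" {rel_phrase.lower()} "
    return [
        target
        for target, phrases in rule_set.items()
        if any(f" {phrase} " in rel for phrase in phrases)
    ]


head_match = matching_relations("died of", THREE_HOUR)
tail_miss = matching_relations("succumbed to", THREE_HOUR)
tail_match = matching_relations("succumbed to", TWELVE_HOUR_PARTIAL)
```

The short list covers the high-frequency phrasings; the longer list adds inflections and less common phrasings such as “succumbed to”.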

Page 22: Open IE to KBP Relations  in 3 Hours

Outline

• Rules to map to target relations
  – Rule language
  – Semantic taggers
• KBP system
  – Architecture
  – 3 hour rule set vs. 12 hour rule set
  – Co-reference
• Results and discussion
• Future work

Page 23: Open IE to KBP Relations  in 3 Hours

KBP Results

Extractor Precision:
per:title(Paul Gray, bassist)
per:title(Paul Gray, president)

KBP Precision:
per:title(Paul Gray, bassist)
per:title(Paul Gray, president)

35% recall boost from 12 hours

Page 24: Open IE to KBP Relations  in 3 Hours

Error Analysis

• 31% “Looked right to me”
  “Tantawi was the grand sheik” => per:title(Tantawi, sheik)
  “ETA's political wing Batasuna” => org:subsidiary(ETA, Batasuna)

• 23% Overgeneralized rules
  “Ginzburg was an outspoken critic” => per:title(Ginzburg, critic)
  “Meredith led the NFL in scoring” => per:employee_or_member_of(Meredith, NFL)

• 19% Rules matched on non-head terms
  “Kahn’s younger sister married Shankar” => per:spouse(Kahn, Shankar)

• 15% Open IE errors
• 12% Coref errors

Page 25: Open IE to KBP Relations  in 3 Hours

Ceiling for Recall from Open IE

• 42% Extracts all information for KBP relation
• 16% Extractor truncates an argument
  Omits appositive or parenthetical
  “Sheikh Tantawi, the top Egyptian cleric who died on Wednesday…”
  (the top Egyptian cleric, died on, Wednesday)
• 10% Extractor misses “relational noun”
  “Tantawi, the Grand Imam of Al-Azhar”
• 10% No extraction of relevant part of sentence
  Syntactic complexity
• 4% Extraction error
• 18% Other

68%

Page 26: Open IE to KBP Relations  in 3 Hours

Future Work

• Increase recall of Open IE
• Increase precision of rule applier
• General method not tied to KBP task
  – Plug in any ontology of relations
  – Results not tied to query entity
• Release as open-source software

Page 27: Open IE to KBP Relations  in 3 Hours

Conclusion

• Novel approach for KBP Slot Filling
  – Run Open IE extractor on corpus
  – Semantic taggers based on user-written keyword lists
  – User-written rules to map target relations to Open IE

• Results
  – High extraction precision 0.80
  – Moderate recall 0.10 (comparable to all but top sites)

• Low human effort
  – Requires no NLP or ML experience
  – Only 3 hours effort gives high precision

Page 28: Open IE to KBP Relations  in 3 Hours

Thank you

github/knowitall/openie
github/knowitall/taggers