Enhancing Recall in Information Extraction through Ontological Semantics

Sergei Nirenburg, Marjorie McShane and Stephen Beale
Institute for Language and Information Technologies
University of Maryland, Baltimore County
Baltimore, MD, USA


Fact Extraction

[Diagram: Text Sources, Ontological-Semantic Analysis, Text Meaning Representation (TMR), Fact Repository (FR) and Question Answering, supported by Static Knowledge Sources (Ontology, Lexicons, Grammars). Legend: Data and Control Flow; Knowledge Support.]


Fact Production

[Diagram: Text Sources, Ontological-Semantic Analysis, Text Meaning Representation (TMR) and Fact Repository (FR), supported by Static Knowledge Sources (Ontology, Lexicons, Grammars), together with Question Answering, Report Generation, Summary Production, Trend Identification and the User. Legend: Data and Control Flow; Knowledge Support.]


Query Formulation

[Diagram: Query, Ontological-Semantic Analysis, Text Meaning Representation (TMR), Fact Repository (FR) and Answer, supported by Static Knowledge Sources (Ontology, Lexicons, Grammars). Legend: Data and Control Flow; Knowledge Support.]


(he
  (he-pro1
    (cat n)
    (MORPH )
    (ANNO (DEF "the pronoun 'he'")
          (EX "He kicked the can.")
          (COMMENTS "Expect the same weights for constraints for 'he' and 'she'"))
    (SYN-STRUC ((root $var0) (cat n) (type pro)))
    (SEM-STRUC (ANIMAL))
    (MEANING-PROCEDURE (TRIGGER-REFERENCE
      (third singular male) (same-clause .1) (preceding-clause .7)
      (pre-preceding-clause .5) (preceding-sent .5) (sentence-minus-2 .2)
      (sentence-minus-3 .1) (para-break .5)
      (repeat-collocation .7) (synonym-collocation .6)
      (agent-theme .8) (pp-embedded .2) (function-match .7)
      (coord .7)
      ;; (refl-null-sem .5) NA
      ;; (subj-of-same-clause .5) NA
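The first element of TRIGGER-REFERENCE, (third singular male), reads like the agreement features that any antecedent of 'he' must share. A minimal Python sketch of such a pre-filter, assuming that reading; the Candidate class, its fields and the rule that an unknown feature value does not disqualify a candidate are illustrative choices, not part of the entry.

```python
from dataclasses import dataclass

# Agreement features taken from the 'he' entry; reading them as
# person/number/gender constraints is an assumption.
HE_FEATURES = {"person": "third", "number": "singular", "gender": "male"}


@dataclass
class Candidate:
    """Hypothetical antecedent candidate with its agreement features."""
    head: str
    person: str
    number: str
    gender: str  # "male", "female", "neuter" or "unknown"


def agrees(candidate: Candidate, features: dict) -> bool:
    """Reject a candidate only if a known feature contradicts the pronoun."""
    for name, required in features.items():
        value = getattr(candidate, name)
        if value != "unknown" and value != required:
            return False
    return True


def filter_candidates(candidates, features=HE_FEATURES):
    """Keep the candidates that agree with the pronoun's features."""
    return [c for c in candidates if agrees(c, features)]


if __name__ == "__main__":
    pool = [
        Candidate("John", "third", "singular", "male"),
        Candidate("Mary", "third", "singular", "female"),
        Candidate("the engineers", "third", "plural", "unknown"),
        Candidate("the pilot", "third", "singular", "unknown"),
    ]
    print([c.head for c in filter_candidates(pool)])  # ['John', 'the pilot']
```

In this sketch, filtering happens before any weighting, so the heuristic scores can never promote a feminine or plural candidate as the antecedent of 'he'.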


A description of the heuristics

same-clause The candidate is in the same clause.

preceding-clause The candidate is in the preceding clause.

pre-preceding-clause The candidate is in any clause of the given sentence farther back than the preceding clause.

preceding-sent The candidate is in the preceding sentence.

sentence-minus-2 The candidate is two sentences back.

sentence-minus-3 The candidate is three sentences back.

para-break The candidate is in the preceding paragraph. Valid only for candidates that are not themselves in the first sentence of a paragraph.

repeat-collocation The nominal candidate has previously been the same argument of the given verb.

syn-collocation (written synonym-collocation in the lexicon entry) The nominal candidate has previously been the same argument of a synonymous (or similar) verb.

agent-theme The nominal candidate is one of the main arguments in its clause, which for now we can define as the agent or theme (not path, beneficiary, instrument, etc.).

pp-embedded The nominal candidate is embedded in a PP. This is mostly to weed out non-prominent adjuncts, but since some arguments are PPs, it cannot be too strong a heuristic; it is better to look at case roles.


function-match The syntactic function of the candidate matches the function of the referring expression.

coord The candidate is an argument in the preceding conjunct of a coordinate structure, but the coordinate structure must be larger than the category itself. That is, we want to capture the fact that the VP coordination in 'I picked up the book and read it' is a strong indicator of coreference, but we do not want to assume coreference in 'I told John and him'.

refl-null-sem The reflexive directly follows an NP with matching features; in this case, it is rendered as null semantics (I myself know… He himself thought… The plans themselves are…).

subj-of-same-clause The candidate is the head of the subject of the same clause (used to distinguish anaphors). By 'subject of same clause' we mean whichever of the following is nearest:
1) the overt or elided subject of the minimal tensed clause
   - Mary-i likes herself-i.
   - Mary-i is happy-go-lucky and pro-i likes herself-i.
2) the PRO (non-overt) subject of the given infinitival clause
   - John-i forces his children-j PRO-j to fight for themselves-j.
3) the overt PP "subject" of the given infinitival clause
   - For me-i to hurt myself-i would be stupid.
It is important to say 'the head' of the subject because, for example, in 'Mary's-i dog likes her-i' the coreference holds between the possessive modifier of the head (Mary's) and the direct object.

refl-cl-subj The reflexive has to match features with the subject of the clause: He went to the movies himself.
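The 'he' entry attaches a weight to most of these heuristics, but the slides do not say how the weights are combined. The sketch below assumes the simplest option: sum the weights of every heuristic that fires and pick the highest-scoring candidate. The weights are copied from the entry; the Candidate fields and sample candidates are hypothetical, and only the distance heuristics are computed, while the others (including para-break) are passed in as precomputed features.

```python
from dataclasses import dataclass, field
from typing import Optional

# Weights copied from the TRIGGER-REFERENCE list of the 'he' entry.
WEIGHTS = {
    "same-clause": 0.1, "preceding-clause": 0.7, "pre-preceding-clause": 0.5,
    "preceding-sent": 0.5, "sentence-minus-2": 0.2, "sentence-minus-3": 0.1,
    "para-break": 0.5, "repeat-collocation": 0.7, "synonym-collocation": 0.6,
    "agent-theme": 0.8, "pp-embedded": 0.2, "function-match": 0.7, "coord": 0.7,
}


def distance_feature(sentences_back: int, clauses_back: int = 0) -> Optional[str]:
    """Return the distance heuristic that fires, or None beyond three sentences."""
    if sentences_back == 0:
        if clauses_back == 0:
            return "same-clause"
        if clauses_back == 1:
            return "preceding-clause"
        return "pre-preceding-clause"
    return {1: "preceding-sent", 2: "sentence-minus-2",
            3: "sentence-minus-3"}.get(sentences_back)


@dataclass
class Candidate:
    """Hypothetical antecedent candidate for a pronoun."""
    head: str
    sentences_back: int
    clauses_back: int = 0
    # Other heuristics that hold for this candidate, e.g. {"agent-theme"}.
    features: set = field(default_factory=set)


def score(candidate: Candidate, weights: dict = WEIGHTS) -> float:
    """Sum the weights of all heuristics that fire (an assumed combination rule)."""
    fired = set(candidate.features)
    distance = distance_feature(candidate.sentences_back, candidate.clauses_back)
    if distance is not None:
        fired.add(distance)
    return sum(weights.get(name, 0.0) for name in fired)


def best_antecedent(candidates):
    """Pick the highest-scoring candidate."""
    return max(candidates, key=score)


if __name__ == "__main__":
    pool = [
        Candidate("John", 1, features={"agent-theme", "function-match"}),
        Candidate("Bill", 1, features={"pp-embedded"}),
        Candidate("Peter", 2, features={"agent-theme"}),
    ]
    for c in pool:
        print(c.head, round(score(c), 2))
    print("antecedent:", best_antecedent(pool).head)  # antecedent: John
```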


;United Airlines files for US Bankruptcy protection.

PREPROCESSOR OUTPUT: notice the ambiguities:

((UNITED-AIRLINES N ((TYPE PN) (TYPE COMPANY)))
 (FILE N ((NUMBER PL)))
 (FILE V ((PERSON THIRD) (TENSE PRESENT)) *REPEAT*)
 (FOR CONJ NIL)
 (FOR PREP NIL *REPEAT*)
 (US N ((TYPE PRO)))
 (US N ((TYPE COUNTRY) (TYPE PN)) *REPEAT*)
 (BANKRUPTCY N NIL)
 (PROTECTION N NIL)
 (*PERIOD* PUNCT NIL))
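Three tokens in this headline receive two readings each ("files" as noun or verb, "for" as conjunction or preposition, "US" as pronoun or country), so the analyzer starts from several lexical paths. A small Python sketch counting them, assuming a trailing *REPEAT* marks another reading of the previous token; the reading labels are simplified from the output above.

```python
from math import prod

# Readings copied from the preprocessor output; a trailing *REPEAT* is read
# as "another reading of the previous token" (an assumption).
READINGS = [
    ("UNITED-AIRLINES", "N", False),
    ("FILE", "N", False),
    ("FILE", "V", True),
    ("FOR", "CONJ", False),
    ("FOR", "PREP", True),
    ("US", "N (pronoun)", False),
    ("US", "N (country)", True),
    ("BANKRUPTCY", "N", False),
    ("PROTECTION", "N", False),
    ("*PERIOD*", "PUNCT", False),
]


def group_readings(readings):
    """Collapse *REPEAT* lines into the preceding token's list of readings."""
    tokens = []
    for word, reading, is_repeat in readings:
        if is_repeat and tokens and tokens[-1][0] == word:
            tokens[-1][1].append(reading)
        else:
            tokens.append((word, [reading]))
    return tokens


tokens = group_readings(READINGS)
for word, alternatives in tokens:
    print(f"{word}: {alternatives}")
print("reading combinations:", prod(len(alts) for _, alts in tokens))  # 8
```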


SYNTAX OUTPUT

0 - 6 : CL ((0 1 NP UNITED-AIRLINES) (1 2 V FILE) (2 6 PP FOR)) FILE 1
0 - 1 : NP ((0 1 N UNITED-AIRLINES)) UNITED-AIRLINES 0
0 - 1 : N NIL UNITED-AIRLINES 0
1 - 2 : V NIL FILE 1
2 - 6 : PP ((2 3 PREP FOR) (3 6 NP PROTECTION)) FOR 2
2 - 3 : PREP NIL FOR 2
3 - 6 : NP ((3 3 ART ***) (3 6 N PROTECTION)) PROTECTION 5

;; notice some ambiguity left as to (US (BANKRUPTCY PROTECTION)) vs.
;; ((US BANKRUPTCY) PROTECTION)
;; The semantic analyzer picked the one it liked best, but we will continue
;; working on these n-n compounds

3 - 6 : N ((3 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
3 - 6 : N ((3 4 N US) (4 6 N PROTECTION)) PROTECTION 5
3 - 5 : N ((3 4 N US) (4 5 N BANKRUPTCY)) BANKRUPTCY 4
3 - 4 : N NIL US 3
3 - 3 : ART NIL *** NIL
4 - 6 : N ((4 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
4 - 5 : N NIL BANKRUPTCY 4
5 - 6 : N NIL PROTECTION 5
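The chart keeps both N spans over positions 3 - 6, one for each way of grouping US BANKRUPTCY PROTECTION. A short Python sketch that enumerates every binary bracketing of a noun-noun compound; it only shows where the ambiguity comes from and says nothing about how the semantic analyzer chooses between the options.

```python
def bracketings(nouns):
    """Enumerate every binary bracketing of a noun-noun compound."""
    if len(nouns) == 1:
        return [nouns[0]]
    results = []
    for split in range(1, len(nouns)):
        for left in bracketings(nouns[:split]):
            for right in bracketings(nouns[split:]):
                results.append(f"({left} {right})")
    return results


for analysis in bracketings(["US", "BANKRUPTCY", "PROTECTION"]):
    print(analysis)
# (US (BANKRUPTCY PROTECTION))
# ((US BANKRUPTCY) PROTECTION)
```

The number of bracketings follows the Catalan numbers (1, 1, 2, 5, 14, ...), so a four-noun compound already has five analyses, which is why n-n compounds need semantic preferences rather than exhaustive syntactic enumeration.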


SEMANTIC OUTPUT:

(APPLY-FOR-28
  (TIME (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (THEME (VALUE (COMMON PROTECT-28)))
  (AGENT (VALUE (COMMON CORPORATION-28)))
  (INSTANCE-OF (VALUE (COMMON APPLY-FOR))))

(PROTECT-28
  (RELATION (VALUE (COMMON BANKRUPT-28)))
  (THEME-OF (VALUE (COMMON APPLY-FOR-28)))
  (INSTANCE-OF (VALUE (COMMON PROTECT))))


(CORPORATION-28
  (HAS-NAME (VALUE (COMMON "UAL CORP")))   ;; "UAL CORP" is the "official" FR name
  (AGENT-OF (VALUE (COMMON APPLY-FOR-28)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION))))

(UNITED-STATES-OF-AMERICA-28
  (INSTANCE-OF (VALUE (COMMON UNITED-STATES-OF-AMERICA))))

(BANKRUPT-28
  (RELATION (VALUE (COMMON UNITED-STATES-OF-AMERICA-28)))
  (INSTANCE-OF (VALUE (COMMON BANKRUPT))))
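Each case-role link in these frames is recorded twice: AGENT in APPLY-FOR-28 is mirrored by AGENT-OF in CORPORATION-28, and THEME by THEME-OF in PROTECT-28. A minimal Python consistency check over the headline TMR, flattened to single-valued slots (a simplification); RELATION is left out because it is not mirrored in this sample.

```python
# Headline TMR flattened to {instance: {slot: filler}}.
TMR = {
    "APPLY-FOR-28": {"THEME": "PROTECT-28", "AGENT": "CORPORATION-28",
                     "INSTANCE-OF": "APPLY-FOR"},
    "PROTECT-28": {"RELATION": "BANKRUPT-28", "THEME-OF": "APPLY-FOR-28",
                   "INSTANCE-OF": "PROTECT"},
    "CORPORATION-28": {"HAS-NAME": "UAL CORP", "AGENT-OF": "APPLY-FOR-28",
                       "INSTANCE-OF": "CORPORATION"},
    "UNITED-STATES-OF-AMERICA-28": {"INSTANCE-OF": "UNITED-STATES-OF-AMERICA"},
    "BANKRUPT-28": {"RELATION": "UNITED-STATES-OF-AMERICA-28",
                    "INSTANCE-OF": "BANKRUPT"},
}

# Slot pairs that must mirror each other across frames.
INVERSE = {"AGENT": "AGENT-OF", "THEME": "THEME-OF"}
INVERSE.update({inv: slot for slot, inv in list(INVERSE.items())})


def missing_inverses(tmr):
    """Return (instance, slot, filler) triples whose filler lacks the inverse link."""
    problems = []
    for instance, slots in tmr.items():
        for slot, filler in slots.items():
            inverse = INVERSE.get(slot)
            if inverse and tmr.get(filler, {}).get(inverse) != instance:
                problems.append((instance, slot, filler))
    return problems


print(missing_inverses(TMR))  # []
```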


;*b57-3*; 10 December 2002

(DATE-29
  (VALUE (VALUE (COMMON ((YEAR \2002) (DATE \10)) ((YEAR \2002) (MONTH \12)))))
  (INSTANCE-OF (VALUE (COMMON DATE))))
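DATE-29 splits the dateline into YEAR, MONTH and DATE (day of month) components. A minimal Python sketch of that decomposition using the standard datetime parser; it assumes the English month names of the default C locale and returns a plain dict rather than the two partial specifications shown in the frame.

```python
from datetime import datetime


def dateline_to_frame(text: str) -> dict:
    """Split a dateline like '10 December 2002' into DATE-frame slots."""
    # %B matches the full month name in the current (C/English) locale.
    parsed = datetime.strptime(text.strip(), "%d %B %Y")
    return {"YEAR": parsed.year, "MONTH": parsed.month, "DATE": parsed.day}


print(dateline_to_frame("10 December 2002"))
# {'YEAR': 2002, 'MONTH': 12, 'DATE': 10}
```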


;*b57-4*; UAL Corporation filed for Chapter 11 protection.

(APPLY-FOR-153
  (TIME (VALUE (COMMON (< (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON PROTECT-154)))
  (AGENT (VALUE (COMMON CORPORATION-153)))
  (INSTANCE-OF (VALUE (COMMON APPLY-FOR))))

(CORPORATION-153
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (AGENT-OF (VALUE (COMMON APPLY-FOR-153)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION))))

(PROTECT-154
  (RELATION (VALUE (COMMON CHAPTER-11-BANKRUPTCY-PROTECTION-155)))
  (THEME-OF (VALUE (COMMON APPLY-FOR-153)))
  (INSTANCE-OF (VALUE (COMMON PROTECT))))
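The headline ("United Airlines") and sentence b57-4 ("UAL Corporation") both yield a CORPORATION instance named "UAL CORP", the official Fact Repository name. A minimal Python sketch of grouping TMR instances under one FR entry per canonical name; the alias table is hypothetical, and matching on names alone is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical alias table mapping surface names to the official FR name.
ALIASES = {"United Airlines": "UAL CORP", "UAL Corporation": "UAL CORP"}

# (TMR instance, surface name) from the headline and sentence b57-4.
MENTIONS = [("CORPORATION-28", "United Airlines"),
            ("CORPORATION-153", "UAL Corporation")]


def to_fact_repository(mentions, aliases):
    """Group TMR instances under one FR entry per canonical name."""
    fr = defaultdict(list)
    for instance, surface in mentions:
        canonical = aliases.get(surface, surface)  # fall back to the surface form
        fr[canonical].append(instance)
    return dict(fr)


print(to_fact_repository(MENTIONS, ALIASES))
# {'UAL CORP': ['CORPORATION-28', 'CORPORATION-153']}
```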


;*b57-5*; The company has said it will look at all aspects of its operations.

(SPEECH-ACT-232
  (TIME (VALUE (COMMON (< (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON CONSIDER-232)))
  (AGENT (VALUE (COMMON CORPORATION-232)))
  (INSTANCE-OF (VALUE (COMMON SPEECH-ACT))))

(CORPORATION-232
  (HAS-NAME (VALUE (COMMON "UAL CORP")))   ;; reference resolution: "company" = "UAL CORP"
  (AGENT-OF (VALUE (COMMON SPEECH-ACT-232)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION))))


(CONSIDER-232
  (THEME-OF (VALUE (COMMON SPEECH-ACT-232)))
  (TIME (VALUE (COMMON (> (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON SET-232)))   ;; all aspects
  (AGENT (VALUE (COMMON PHYSICAL-OBJECT-232)))
  (INSTANCE-OF (VALUE (COMMON CONSIDER))))

(MILITARY-ACTIVITY-232   ;; "operations" - obviously not correct here
  (POSSESSED-BY   ;; its operation = operation of UAL
    (VALUE (COMMON PHYSICAL-OBJECT-233)))
  (PARTS   ;; "aspect" of the operations
    (VALUE (COMMON OBJECT-232)))
  (CARDINALITY (VALUE (COMMON (> 1))))
  (INSTANCE-OF (VALUE (COMMON MILITARY-ACTIVITY))))


(SET-232   ;; all aspects
  (SET-MEMBER-TYPE (VALUE (COMMON OBJECT-232)))
  (QUANT (VALUE (COMMON \1)))
  (INSTANCE-OF (VALUE (COMMON SET))))

(OBJECT-232   ;; aspect of its operation
  (PART-OF (VALUE (COMMON MILITARY-ACTIVITY-232)))
  (CARDINALITY (VALUE (COMMON (> 1))))
  (THEME-OF (VALUE (COMMON CONSIDER-232)))
  (INSTANCE-OF (VALUE (COMMON OBJECT))))


(PHYSICAL-OBJECT-232   ;; reference resolution: it = UAL
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (COREFERENCE (VALUE (COMMON +)))
  (AGENT-OF (VALUE (COMMON CONSIDER-232)))
  (INSTANCE-OF (VALUE (COMMON PHYSICAL-OBJECT))))

(PHYSICAL-OBJECT-233   ;; reference resolution: its = UAL's
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (COREFERENCE (VALUE (COMMON +)))
  (INSTANCE-OF (VALUE (COMMON PHYSICAL-OBJECT))))
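Across these sample TMRs the TIME slot follows tense: present "files" yields (FIND-ANCHOR-TIME), past "filed" and "has said" yield (< (FIND-ANCHOR-TIME)), and future "will look" yields (> (FIND-ANCHOR-TIME)). A minimal Python sketch of that mapping, inferred only from these examples; treating present tense as equality with the anchor and taking the 10 December 2002 dateline as the anchor are assumptions.

```python
import operator
from datetime import date

# Tense -> relation to the anchor time, as observed in the sample TMRs.
RELATION_FOR_TENSE = {"present": None, "past": "<", "future": ">"}
OPERATORS = {None: operator.eq, "<": operator.lt, ">": operator.gt}


def time_filler(tense: str) -> str:
    """Build the TIME filler string for a clause in the given tense."""
    relation = RELATION_FOR_TENSE[tense]
    if relation is None:
        return "(FIND-ANCHOR-TIME)"
    return f"({relation} (FIND-ANCHOR-TIME))"


def consistent(tense: str, event_day: date, anchor_day: date) -> bool:
    """Check whether a dated event fits the constraint implied by the tense."""
    return OPERATORS[RELATION_FOR_TENSE[tense]](event_day, anchor_day)


if __name__ == "__main__":
    anchor = date(2002, 12, 10)  # assumed anchor: the b57-3 dateline
    print(time_filler("past"))                             # (< (FIND-ANCHOR-TIME))
    print(consistent("past", date(2002, 12, 9), anchor))   # True
```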


Measure                                           Text1   Text2   Text3  Text19  Text57  Text59  Totals
Events                                                15       8       6      10       6      15      60
Full NP subjects                                      10       5       3       6       5      10      39
Pro subjects                                           3       1       1       2       1       3      11
No subjects                                            2       2       2       2       0       2      10
Proper name subjects                                  10       6       4       6       5      12      43
Subjects which are common nouns                        2       0       0       1       0       1       4
Nominalized subjects                                   1       0       0       1       1       0       3
No subjects                                            2       2       2       2       0       2      10
Events identified correctly                           15       8       4       6       6      12      51
Events identified incorrectly                          1       0       0       2       0       2       5
Events not identified                                  0       0       2       4       0       3       9
Agents identified without reference resolution      5/15     2/8     2/6    2/10     1/6    6/15   18/60
Agents identified with reference resolution        13/15     6/8     4/6    5/10     5/6    8/15   41/60
Agent referents needing to be resolved                10       5       4       4       3       4      30
Agent referents resolved correctly                     7       3       2       4       3       2      21
Agent referents resolved incorrectly                   1       0       0       0       0       0       1
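The totals make the gain from reference resolution easy to quantify. A short Python sketch recomputing the pooled rates from the per-text counts in the table; the variable names are illustrative, and pooling over all six texts (rather than averaging per-text rates) is the choice that reproduces the Totals column.

```python
from fractions import Fraction

# Per-text counts copied from the table (Text1, Text2, Text3, Text19, Text57, Text59).
EVENTS = [15, 8, 6, 10, 6, 15]
AGENTS_WITHOUT_RR = [5, 2, 2, 2, 1, 6]
AGENTS_WITH_RR = [13, 6, 4, 5, 5, 8]
REFERENTS_NEEDED = [10, 5, 4, 4, 3, 4]
REFERENTS_CORRECT = [7, 3, 2, 4, 3, 2]


def pooled_rate(hits, totals):
    """Micro-averaged rate: total hits over total opportunities."""
    return Fraction(sum(hits), sum(totals))


without_rr = pooled_rate(AGENTS_WITHOUT_RR, EVENTS)          # 18 of 60
with_rr = pooled_rate(AGENTS_WITH_RR, EVENTS)                # 41 of 60
resolved = pooled_rate(REFERENTS_CORRECT, REFERENTS_NEEDED)  # 21 of 30

print(f"agents found without reference resolution: {float(without_rr):.1%}")
print(f"agents found with reference resolution:    {float(with_rr):.1%}")
print(f"agent referents resolved correctly:        {float(resolved):.1%}")
```

This prints 30.0%, 68.3% and 70.0%: pooled over the six texts, the share of events whose agent is identified rises from 18/60 to 41/60 once reference resolution is applied, and 21 of the 30 agent referents that needed resolution were resolved correctly.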