Enhancing Recall in Information Extraction through Ontological Semantics
Sergei Nirenburg, Marjorie McShane and Stephen Beale
Institute for Language and Information Technologies
University of Maryland, Baltimore County
Baltimore, MD, USA
Fact Extraction
[Diagram labels: Text Sources; Ontological-Semantic Analysis; Text Meaning Representation (TMR); Fact Repository (FR); Question Answering. Static Knowledge Sources: Ontology, Lexicons, Grammars. Legend: Data and Control Flow; Knowledge Support.]
Fact Production
[Diagram labels: Text Sources; Ontological-Semantic Analysis; Text Meaning Representation (TMR); Fact Repository (FR); Question Answering; Report Generation; Summary Production; Trend Identification; User. Static Knowledge Sources: Ontology, Lexicons, Grammars. Legend: Data and Control Flow; Knowledge Support.]
Query Formulation
[Diagram labels: Query; Ontological-Semantic Analysis; Text Meaning Representation (TMR); Fact Repository (FR); Answer. Static Knowledge Sources: Ontology, Lexicons, Grammars. Legend: Data and Control Flow; Knowledge Support.]
(he
  (he-pro1
    (cat n)
    (MORPH)
    (ANNO
      (DEF "the pronoun 'he'")
      (EX "He kicked the can.")
      (COMMENTS "Expect the same weights for constraints for 'he' and 'she'"))
    (SYN-STRUC ((root $var0) (cat n) (type pro)))
    (SEM-STRUC (ANIMAL))
    (MEANING-PROCEDURE
      (TRIGGER-REFERENCE
        (third singular male)
        (same-clause .1)
        (preceding-clause .7)
        (pre-preceding-clause .5)
        (preceding-sent .5)
        (sentence-minus-2 .2)
        (sentence-minus-3 .1)
        (para-break .5)
        (repeat-collocation .7)
        (synonym-collocation .6)
        (agent-theme .8)
        (pp-embedded .2)
        (function-match .7)
        (coord .7)
        ;; (refl-null-sem .5) NA
        ;; (subj-of-same-clause .5) NA
        ))))
A description of the heuristics

same-clause: The candidate is in the same clause.
preceding-clause: The candidate is in the preceding clause.
pre-preceding-clause: The candidate is in any clause of the given sentence farther back than the preceding clause.
preceding-sent: The candidate is in the preceding sentence.
sentence-minus-2: The candidate is two sentences back.
sentence-minus-3: The candidate is three sentences back.
para-break: The candidate is in the preceding paragraph. Valid only for candidates that, themselves, are not in the first sentence of a paragraph.
repeat-collocation: The nominal candidate has previously been the same argument of the given verb.
synonym-collocation: The nominal candidate has previously been the same argument of a synonymous (or similar) verb.
agent-theme: The nominal candidate is one of the main arguments in its clause, which we can define for now as the agent or theme (not path, beneficiary, instrument, etc.).
pp-embedded: The nominal candidate is embedded in a PP. This is mostly to weed out non-prominent adjuncts, but since some arguments are PPs, it can't be too strong a heuristic; it is better to look at case roles.
function-match: The syntactic function of the candidate matches the function of the referring expression.
coord: The candidate is an argument in the preceding conjunct of a coordinate structure, BUT the coordinate structure must be larger than the category itself. That is, we want to catch the fact that the coordination in 'I picked up the book and read it' (VP coordination) is a strong indicator of coreference, but we don't want to assume that there is coreference in 'I told John and him'.
refl-null-sem: The reflexive directly follows an NP that has matching features; in this case, it is rendered as null semantics (I myself know… He himself thought… The plans themselves are …).
subj-of-same-clause: The candidate is the head of the subject of the same clause (used to distinguish anaphors). By 'subject of same clause' we mean whichever of the following is the nearest:
  1) the overt or elided subject of the minimal tensed clause:
     - Mary-i likes herself-i.
     - Mary-i is happy-go-lucky and pro-i likes herself-i.
  2) the PRO (non-overt) subject of the given infinitival clause:
     - John-i forces his children-j PRO-j to fight for themselves-j.
  3) the overt PP "subject" of the given infinitival clause:
     - For me-i to hurt myself-i would be stupid.
  It's important to say 'the head' of the subject because, for example, 'Mary's-i dog likes her-i' has coreference between the modifier of the head and the direct object.
refl-cl-subj: The reflexive has to match features with the subject of the clause: He went to the movies himself.
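The weights in the 'he' entry and the heuristics described above suggest a simple candidate-ranking scheme. The following is a minimal sketch only: the slides do not say how the weights are combined, so summing the weights of the heuristics that fire is an assumption, as are the function names `score` and `best_antecedent` and the two example candidates. The feature filter (third singular male) is not modeled here.

```python
# Weights copied from the TRIGGER-REFERENCE list of the 'he' lexicon entry.
HE_WEIGHTS = {
    "same-clause": 0.1,
    "preceding-clause": 0.7,
    "pre-preceding-clause": 0.5,
    "preceding-sent": 0.5,
    "sentence-minus-2": 0.2,
    "sentence-minus-3": 0.1,
    "para-break": 0.5,
    "repeat-collocation": 0.7,
    "synonym-collocation": 0.6,
    "agent-theme": 0.8,
    "pp-embedded": 0.2,
    "function-match": 0.7,
    "coord": 0.7,
}


def score(fired, weights=HE_WEIGHTS):
    """Sum the weights of the heuristics that fired for one candidate.

    ASSUMPTION: additive combination; the source does not specify this.
    """
    return sum(weights[name] for name in fired)


def best_antecedent(candidates, weights=HE_WEIGHTS):
    """Return the candidate whose fired heuristics give the highest score.

    `candidates` maps a candidate name to the set of heuristics that fired.
    """
    return max(candidates, key=lambda name: score(candidates[name], weights))


# Hypothetical candidates for an occurrence of "he" (not from the source).
candidates = {
    "John": {"preceding-sent", "agent-theme", "function-match"},
    "Bill": {"preceding-sent", "pp-embedded"},
}

print(best_antecedent(candidates))
```

Under this additive assumption a candidate that is a main argument with a matching syntactic function outranks one that is merely embedded in a PP in the same sentence.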
;United Airlines files for US Bankruptcy protection.

PREPROCESSOR OUTPUT: notice the ambiguities:

((UNITED-Airlines n ((type pn) (type company)))
 (FILE N ((NUMBER PL)))
 (FILE V ((PERSON THIRD) (TENSE PRESENT)) *REPEAT*)
 (FOR CONJ NIL)
 (FOR PREP NIL *REPEAT*)
 (US N ((TYPE PRO)))
 (US N ((TYPE COUNTRY) (TYPE PN)) *REPEAT*)
 (BANKRUPTCY N NIL)
 (PROTECTION N NIL)
 (*PERIOD* PUNCT NIL))

SYNTAX OUTPUT

0 - 6 : CL ((0 1 NP UNITED-AIRLINES) (1 2 V FILE) (2 6 PP FOR)) FILE 1
0 - 1 : NP ((0 1 N UNITED-AIRLINES)) UNITED-AIRLINES 0
0 - 1 : N NIL UNITED-AIRLINES 0
1 - 2 : V NIL FILE 1
2 - 6 : PP ((2 3 PREP FOR) (3 6 NP PROTECTION)) FOR 2
2 - 3 : PREP NIL FOR 2
3 - 6 : NP ((3 3 ART ***) (3 6 N PROTECTION)) PROTECTION 5

;; notice some ambiguity left as to (US (BANKRUPTCY PROTECTION)) vs.
;; ((US BANKRUPTCY) PROTECTION)
;; The semantic analyzer picked the one it liked best, but we will continue
;; working on these n-n compounds

3 - 6 : N ((3 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
3 - 6 : N ((3 4 N US) (4 6 N PROTECTION)) PROTECTION 5
3 - 5 : N ((3 4 N US) (4 5 N BANKRUPTCY)) BANKRUPTCY 4
3 - 4 : N NIL US 3
3 - 3 : ART NIL *** NIL
4 - 6 : N ((4 5 N BANKRUPTCY) (5 6 N PROTECTION)) PROTECTION 5
4 - 5 : N NIL BANKRUPTCY 4
5 - 6 : N NIL PROTECTION 5
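The two competing analyses of the noun-noun compound in the syntax output are exactly the binary bracketings of a three-noun sequence. As a rough illustration (not the parser used in the system; the function name `bracketings` is hypothetical), they can be enumerated like this:

```python
def bracketings(words):
    """Return every binary bracketing of a noun sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    result = []
    # Try every split point and combine all left and right sub-bracketings.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                result.append((left, right))
    return result


compound = ["US", "BANKRUPTCY", "PROTECTION"]
parses = bracketings(compound)
for parse in parses:
    print(parse)
```

For three nouns this yields the two structures noted in the comment, (US (BANKRUPTCY PROTECTION)) and ((US BANKRUPTCY) PROTECTION); choosing between them is left to the semantic analyzer.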
SEMANTIC OUTPUT:

(APPLY-FOR-28
  (TIME (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (THEME (VALUE (COMMON PROTECT-28)))
  (AGENT (VALUE (COMMON CORPORATION-28)))
  (INSTANCE-OF (VALUE (COMMON APPLY-FOR))))

(PROTECT-28
  (RELATION (VALUE (COMMON BANKRUPT-28)))
  (THEME-OF (VALUE (COMMON APPLY-FOR-28)))
  (INSTANCE-OF (VALUE (COMMON PROTECT))))

(CORPORATION-28
  (HAS-NAME (VALUE (COMMON "UAL CORP")))  ;; "UAL CORP" is the "official" FR name
  (AGENT-OF (VALUE (COMMON APPLY-FOR-28)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION))))

(UNITED-STATES-OF-AMERICA-28
  (INSTANCE-OF (VALUE (COMMON UNITED-STATES-OF-AMERICA))))

(BANKRUPT-28
  (RELATION (VALUE (COMMON UNITED-STATES-OF-AMERICA-28)))
  (INSTANCE-OF (VALUE (COMMON BANKRUPT))))

;*b57-3*; 10 December 2002

(DATE-29
  (VALUE (VALUE (COMMON ((YEAR \2002) (DATE \10)) ((YEAR \2002) (MONTH \12)))))
  (INSTANCE-OF (VALUE (COMMON DATE))))
;*b57-4*; UAL Corporation filed for Chapter 11 protection.

(APPLY-FOR-153
  (TIME (VALUE (COMMON (< (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON PROTECT-154)))
  (AGENT (VALUE (COMMON CORPORATION-153)))
  (INSTANCE-OF (VALUE (COMMON APPLY-FOR))))

(CORPORATION-153
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (AGENT-OF (VALUE (COMMON APPLY-FOR-153)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION))))

(PROTECT-154
  (RELATION (VALUE (COMMON CHAPTER-11-BANKRUPTCY-PROTECTION-155)))
  (THEME-OF (VALUE (COMMON APPLY-FOR-153)))
  (INSTANCE-OF (VALUE (COMMON PROTECT))))
;*b57-5*; The company has said it will look at all aspects of its operations.

(SPEECH-ACT-232
  (TIME (VALUE (COMMON (< (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON CONSIDER-232)))
  (AGENT (VALUE (COMMON CORPORATION-232)))
  (INSTANCE-OF (VALUE (COMMON SPEECH-ACT))))

(CORPORATION-232
  (HAS-NAME (VALUE (COMMON "UAL CORP")))  ;; reference resolution: "company" = "UAL CORP"
  (AGENT-OF (VALUE (COMMON SPEECH-ACT-232)))
  (INSTANCE-OF (VALUE (COMMON CORPORATION))))

(CONSIDER-232
  (THEME-OF (VALUE (COMMON SPEECH-ACT-232)))
  (TIME (VALUE (COMMON (> (FIND-ANCHOR-TIME)))))
  (THEME (VALUE (COMMON SET-232)))  ;; all aspects
  (AGENT (VALUE (COMMON PHYSICAL-OBJECT-232)))
  (INSTANCE-OF (VALUE (COMMON CONSIDER))))

(MILITARY-ACTIVITY-232  ;; "operations" - obviously not correct here
  (POSSESSED-BY  ;; its operation = operation of UAL
    (VALUE (COMMON PHYSICAL-OBJECT-233)))
  (PARTS  ;; "aspect" of the operations
    (VALUE (COMMON OBJECT-232)))
  (CARDINALITY (VALUE (COMMON (> 1))))
  (INSTANCE-OF (VALUE (COMMON MILITARY-ACTIVITY))))

(SET-232  ;; all aspects
  (SET-MEMBER-TYPE (VALUE (COMMON OBJECT-232)))
  (QUANT (VALUE (COMMON \1)))
  (INSTANCE-OF (VALUE (COMMON SET))))

(OBJECT-232  ;; aspect of its operation
  (PART-OF (VALUE (COMMON MILITARY-ACTIVITY-232)))
  (CARDINALITY (VALUE (COMMON (> 1))))
  (THEME-OF (VALUE (COMMON CONSIDER-232)))
  (INSTANCE-OF (VALUE (COMMON OBJECT))))

(PHYSICAL-OBJECT-232  ;; reference resolution: it = UAL
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (COREFERENCE (VALUE (COMMON +)))
  (AGENT-OF (VALUE (COMMON CONSIDER-232)))
  (INSTANCE-OF (VALUE (COMMON PHYSICAL-OBJECT))))

(PHYSICAL-OBJECT-233  ;; reference resolution: its = UAL's
  (HAS-NAME (VALUE (COMMON "UAL CORP")))
  (COREFERENCE (VALUE (COMMON +)))
  (INSTANCE-OF (VALUE (COMMON PHYSICAL-OBJECT))))
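The frames above show why reference resolution matters for recall: once "United Airlines", "UAL Corporation", "the company" and "it" all carry the name "UAL CORP", every event they take part in can be retrieved under one entity. The sketch below is a simplified illustration, not the system's Fact Repository interface. It keeps only the AGENT, HAS-NAME and INSTANCE-OF slots, drops the (VALUE (COMMON ...)) wrappers, and uses hypothetical helper names (`agent_name`, `events_with_agent`).

```python
# Simplified copies of the TMR frames shown above (selected slots only).
tmr = {
    "APPLY-FOR-28": {"INSTANCE-OF": "APPLY-FOR", "AGENT": "CORPORATION-28"},
    "CORPORATION-28": {"INSTANCE-OF": "CORPORATION", "HAS-NAME": "UAL CORP"},
    "APPLY-FOR-153": {"INSTANCE-OF": "APPLY-FOR", "AGENT": "CORPORATION-153"},
    "CORPORATION-153": {"INSTANCE-OF": "CORPORATION", "HAS-NAME": "UAL CORP"},
    "SPEECH-ACT-232": {"INSTANCE-OF": "SPEECH-ACT", "AGENT": "CORPORATION-232"},
    "CORPORATION-232": {"INSTANCE-OF": "CORPORATION", "HAS-NAME": "UAL CORP"},
    "CONSIDER-232": {"INSTANCE-OF": "CONSIDER", "AGENT": "PHYSICAL-OBJECT-232"},
    "PHYSICAL-OBJECT-232": {
        "INSTANCE-OF": "PHYSICAL-OBJECT",
        "HAS-NAME": "UAL CORP",
        "COREFERENCE": "+",
    },
}


def agent_name(event_id, frames):
    """Follow an event's AGENT slot and return the agent's HAS-NAME, if any."""
    agent_id = frames[event_id].get("AGENT")
    if agent_id is None:
        return None
    return frames.get(agent_id, {}).get("HAS-NAME")


def events_with_agent(name, frames):
    """Collect the ids of all events whose resolved agent carries `name`."""
    return {
        frame_id
        for frame_id, slots in frames.items()
        if "AGENT" in slots and agent_name(frame_id, frames) == name
    }


print(sorted(events_with_agent("UAL CORP", tmr)))
```

Without the HAS-NAME values supplied by reference resolution, only events whose agent is named explicitly in the text would be found by such a lookup.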
                                                   Text1   Text2   Text3  Text19  Text57  Text59  Totals
Events                                                15       8       6      10       6      15      60
Full NP Subjects                                      10       5       3       6       5      10      39
Pro Subjects                                           3       1       1       2       1       3      11
No Subjects                                            2       2       2       2       0       2      10
Proper Name Subjects                                  10       6       4       6       5      12      43
Subjects which are common nouns                        2       0       0       1       0       1       4
Nominalized subjects                                   1       0       0       1       1       0       3
No subjects                                            2       2       2       2       0       2      10
Events identified correctly                           15       8       4       6       6      12      51
Events identified incorrectly                          1       0       0       2       0       2       5
Events not identified                                  0       0       2       4       0       3       9
Agents identified without Reference Resolution      5/15     2/8     2/6    2/10     1/6    6/15   18/60
Agents identified with Reference Resolution        13/15     6/8     4/6    5/10     5/6    8/15   41/60
Agent Referents needing to be resolved                10       5       4       4       3       4      30
Agent Referents resolved correctly                     7       3       2       4       3       2      21
Agent Referents resolved incorrectly                   1       0       0       0       0       0       1
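The effect of reference resolution on agent recall can be read off the Totals column. The short calculation below only restates those totals as percentages; the variable names and the `percent` helper are illustrative, not from the source.

```python
from fractions import Fraction

# Totals column of the table above.
agents_without_rr = Fraction(18, 60)  # agents identified without reference resolution
agents_with_rr = Fraction(41, 60)     # agents identified with reference resolution
resolved_correctly = Fraction(21, 30) # referents resolved correctly / needing resolution


def percent(ratio):
    """Convert an exact ratio to a percentage rounded to one decimal place."""
    return round(float(ratio * 100), 1)


print("Agent recall without reference resolution:", percent(agents_without_rr))
print("Agent recall with reference resolution:   ", percent(agents_with_rr))
print("Referents resolved correctly:             ", percent(resolved_correctly))
```

On these totals, agent recall rises from 30.0% to about 68.3% when reference resolution is applied, and 70.0% of the referents that needed resolution were resolved correctly.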