CROSS-DOCUMENT RELATION DISCOVERY, TRUTH FINDING Heng Ji
[email protected] Nov 12, 2014
Slide 2
2 Task Definition Supervised Models Basic Features World
Knowledge Learning Models Joint Inference Semi-supervised Learning
Domain-independent Relation Extraction Outline
Slide 3
relation: a semantic relationship between two entities ACE
relation typeexample Agent-Artifact Rubin Military Design, the
makers of the Kursk Discourse each of whom Employment/
MembershipMr. Smith, a senior programmer at Microsoft
Place-Affiliation Salzburg Red Cross officials Person-Social
relatives of the dead Physical a town some 50 miles south of
Salzburg Other-Affiliation Republican senators Relation Extraction:
Task
Slide 4
Test Sample Train Sample K=3 A Simple Baseline with
K-Nearest-Neighbor (KNN)
Slide 5
Test Sample Train Sample: Employment Train Sample: Physical
Train Sample: Employment Train Sample: Physical 1. If the heads of
the mentions dont match: +8 2. If the entity types of the heads of
the mentions dont match: +20 3. If the intervening words dont
match: +10 the president of the United States the previous
president of the United States the secretary of NIST US forces in
Bahrain Connecticut s governor his ranch in Texas 4626 46 36 0
Relation Extraction with KNN
Slide 6
Lexical Heads of the mentions and their context words, POS tags
Entity Entity and mention type of the heads of the mentions Entity
Positional Structure Entity Context Syntactic Chunking Premodifier,
Possessive, Preposition, Formulaic The sequence of the heads of the
constituents, chunks between the two mentions The syntactic
relation path between the two mentions Dependent words of the
mentions Semantic Gazetteers Synonyms in WordNet Name Gazetteers
Personal Relative Trigger Word List Wikipedia If the head extent of
a mention is found (via simple string matching) in the predicted
Wikipedia article of another mention References: Kambhatla, 2004;
Zhou et al., 2005; Jiang and Zhai, 2007; Chan and Roth, 2010,2011
Typical Relation Extraction Features
Slide 7
7 Using Background Knowledge (Chan and Roth, 2010) Features
employed are usually restricted to being defined on the various
representations of the target sentences Humans rely on background
knowledge to recognize relations Overall aim of this work Propose
methods of using knowledge or resources that exists beyond the
sentence Wikipedia, word clusters, hierarchy of relations, entity
type constraints, coreference As additional features, or under the
Constraint Conditional Model (CCM) framework with Integer Linear
Programming (ILP) 7
Slide 8
8 8 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge
Slide 9
9 9 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge
Slide 10
10 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge
Slide 11
11 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge
Slide 12
12 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge David Brian Cone (born January 2, 1963) is a former Major
League Baseball pitcher. He compiled an 83 postseason record over
21 postseason starts and was a part of five World Series
championship teams (1992 with the Toronto Blue Jays and 1996, 1998,
1999 & 2000 with the New York Yankees). He had a career
postseason ERA of 3.80. He is the subject of the book A Pitcher's
Story: Innings With David Cone by Roger Angell. Fans of David are
known as "Cone-Heads." Major League BaseballpitcherWorld
Series1992Toronto Blue Jays1996199819992000New York YankeesRoger
AngellCone-Heads Cone lives in Stamford, Connecticut, and is
formerly a color commentator for the Yankees on the YES Network.
[1]Stamford, Connecticut color commentatorYES Network [1] Contents
[hide]hide 1 Early years 2 Kansas City Royals 3 New York Mets
Partly because of the resulting lack of leadership, after the 1994
season the Royals decided to reduce payroll by trading pitcher
David Cone and outfielder Brian McRae, then continued their salary
dump in the 1995 season. In fact, the team payroll, which was
always among the league's highest, was sliced in half from $40.5
million in 1994 (fourth-highest in the major leagues) to $18.5
million in 1996 (second-lowest in the major leagues)David ConeBrian
McRae1995 season1996
Slide 13
13 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge fine-grained Employment:Staff0.20
Employment:Executive0.15 Personal:Family0.10 Personal:Business0.10
Affiliation:Citizen0.20 Affiliation:Based-in0.25
Slide 14
14 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge fine-grainedcoarse-grained Employment:Staff0.20
0.35Employment Employment:Executive0.15 Personal:Family0.10
0.40Personal Personal:Business0.10 Affiliation:Citizen0.20
0.25Affiliation Affiliation:Based-in0.25
Slide 15
15 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge fine-grainedcoarse-grained Employment:Staff0.20
0.35Employment Employment:Executive0.15 Personal:Family0.10
0.40Personal Personal:Business0.10 Affiliation:Citizen0.20
0.25Affiliation Affiliation:Based-in0.25
Slide 16
16 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Using Background
Knowledge fine-grainedcoarse-grained Employment:Staff0.20
0.35Employment Employment:Executive0.15 Personal:Family0.10
0.40Personal Personal:Business0.10 Affiliation:Citizen0.20
0.25Affiliation Affiliation:Based-in0.25 0.55
Slide 17
17 Knowledge 1 : Wikipedia 1 (as additional feature) We use a
Wikifier system (Ratinov et al., 2010) which performs
context-sensitive mapping of mentions to Wikipedia pages Introduce
a new feature based on: introduce a new feature by combining the
above with the coarse- grained entity types of m i, m j 17 mimi
mjmj r ?
Slide 18
18 Knowledge 1 : Wikipedia 2 (as additional feature) Given m i,
m j, we use a Parent-Child system (Do and Roth, 2010) to predict
whether they have a parent-child relation Introduce a new feature
based on: combine the above with the coarse-grained entity types of
m i, m j 18 mimi mjmj parent-child?
Slide 19
19 Knowledge 2 : Word Class Information (as additional feature)
Supervised systems face an issue of data sparseness (of lexical
features) Use class information of words to support generalization
better: instantiated as word clusters in our work Automatically
generated from unlabeled texts using algorithm of (Brown et al.,
1992) apple pear Apple IBM 0 10 1 0 1 boughtrun of in 0 1 0 1 0 1 0
1 19
Slide 20
20 Knowledge 2 : Word Class Information Supervised systems face
an issue of data sparseness (of lexical features) Use class
information of words to support generalization better: instantiated
as word clusters in our work Automatically generated from unlabeled
texts using algorithm of (Brown et al., 1992) apple pear Apple 0 10
1 0 1 boughtrun of in 0 1 0 1 0 1 0 1 20 IBM
Slide 21
21 Knowledge 2 : Word Class Information Supervised systems face
an issue of data sparseness (of lexical features) Use class
information of words to support generalization better: instantiated
as word clusters in our work Automatically generated from unlabeled
texts using algorithm of (Brown et al., 1992) apple pear Apple 0 10
1 0 1 boughtrun of in 0 1 0 1 0 1 0 1 21 IBM011
Slide 22
22 Knowledge 2 : Word Class Information All lexical features
consisting of single words will be duplicated with its
corresponding bit-string representation apple pear Apple IBM 0 10 1
0 1 boughtrun of in 0 1 0 1 0 1 0 1 22 00 0110 11
Slide 23
23 weight vector for local models collection of classifiers
Constraint Conditional Models (CCMs) (Roth and Yih, 2007; Chang et
al., 2008)
Slide 24
24 Constraint Conditional Models (CCMs) (Roth and Yih, 2007;
Chang et al., 2008) 24 weight vector for local models collection of
classifiers penalty for violating the constraint how far y is from
a legal assignment
Slide 25
25 Constraint Conditional Models (CCMs) (Roth and Yih, 2007;
Chang et al., 2008) 25 Wikipedia word clusters hierarchy of
relations entity type constraints coreference
Slide 26
26 David Cone, a Kansas City native, was originally signed by
the Royals and broke into the majors with the team Constraint
Conditional Models (CCMs) fine-grainedcoarse-grained
Employment:Staff0.20 0.35Employment Employment:Executive0.15
Personal:Family0.10 0.40Personal Personal:Business0.10
Affiliation:Citizen0.20 0.25Affiliation
Affiliation:Based-in0.25
Slide 27
27 Key steps Write down a linear objective function Write down
constraints as linear inequalities Solve using integer linear
programming (ILP) packages 27 Constraint Conditional Models (CCMs)
(Roth and Yih, 2007; Chang et al., 2008)
Slide 28
28 Knowledge 3 : Relations between our target relations...
personal... employment family bizexecutive staff 28
Slide 29
29 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 29 coarse-grained classifier
fine-grained classifier
Slide 30
30 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 30 mimi mjmj coarse-grained?
fine-grained?
Slide 31
31 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 31
Slide 32
32 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 32
Slide 33
33 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 33
Slide 34
34 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 34
Slide 35
35 Knowledge 3 : Hierarchy of Relations... personal...
employment family bizexecutive staff 35
Slide 36
36 Knowledge 3 : Hierarchy of Relations Write down a linear
objective function 36 coarse-grained prediction probabilities
fine-grained prediction probabilities
Slide 37
37 Knowledge 3 : Hierarchy of Relations Write down a linear
objective function 37 coarse-grained prediction probabilities
fine-grained prediction probabilities coarse-grained indicator
variable fine-grained indicator variable indicator variable ==
relation assignment
Slide 38
38 Knowledge 3 : Hierarchy of Relations Write down constraints
If a relation R is assigned a coarse-grained label rc, then we must
also assign to R a fine-grained relation rf which is a child of rc.
(Capturing the inverse relationship) If we assign rf to R, then we
must also assign to R the parent of rf, which is a corresponding
coarse-grained label 38
Slide 39
39 Knowledge 4 : Entity Type Constraints ( Roth and Yih, 2004,
2007) Entity types are useful for constraining the possible labels
that a relation R can assume 39 mimi mjmj Employment:Staff
Employment:Executive Personal:Family Personal:Business
Affiliation:Citizen Affiliation:Based-in
Slide 40
40 Entity types are useful for constraining the possible labels
that a relation R can assume 40 Employment:Staff
Employment:Executive Personal:Family Personal:Business
Affiliation:Citizen Affiliation:Based-in per org per org per org
gpe per mimi mjmj Knowledge 4 : Entity Type Constraints ( Roth and
Yih, 2004, 2007)
Slide 41
41 We gather information on entity type constraints from
ACE-2004 documentation and impose them on the coarse-grained
relations By improving the coarse-grained predictions and combining
with the hierarchical constraints defined earlier, the improvements
would propagate to the fine-grained predications 41
Employment:Staff Employment:Executive Personal:Family
Personal:Business Affiliation:Citizen Affiliation:Based-in per org
per org per org gpe per mimi mjmj Knowledge 4 : Entity Type
Constraints ( Roth and Yih, 2004, 2007)
43 Knowledge 5 : Coreference In this work, we assume that we
are given the coreference information, which is available from the
ACE annotation. 43 mimi mjmj Employment:Staff Employment:Executive
Personal:Family Personal:Business Affiliation:Citizen
Affiliation:Based-in null
Slide 44
44 Experiment Results 44 F1% improvement from using each
knowledge source All nwire10% of nwire BasicRE50.5%31.0%
Slide 45
Consider different levels of syntactic information Deep
processing of text produces structural but less reliable results
Simple surface information is less structural, but more reliable
Generalization of feature-based solutions A kernel (kernel
function) defines a similarity metric (x, y) on objects No need for
enumeration of features Efficient extension of normal features into
high-order spaces Possible to solve linearly non-separable problem
in a higher order space Nice combination properties Closed under
linear combination Closed under polynomial extension Closed under
direct sum/product on different domains References: Zelenko et al.,
2002, 2003; Aron Culotta and Sorensen, 2004; Bunescu and Mooney,
2005; Zhao and Grishman, 2005; Che et al., 2005, Zhang et al.,
2006; Qian et al., 2007; Zhou et al., 2007; Khayyamian et al.,
2009; Reichartz et al., 2009 Most Successful Learning Methods:
Kernel-based
Slide 46
, where 1) Argument 2) Local dependency, where Kernel Examples
for Relation Extraction K T is a token kernel defined as: (Zhao and
Grishman, 2005) 3) Path, where Composite Kernels:
Slide 47
Occurrences of seed tuples: Computer servers at Microsofts
headquarters in Redmond In mid-afternoon trading, share of
Redmond-based Microsoft fell The Armonk-based IBM introduced a new
line The combined company will operate from Boeings headquarters in
Seattle. Intel, Santa Clara, cut prices of its Pentium processor.
Initial Seed TuplesOccurrences of Seed Tuples Generate Extraction
Patterns Generate New Seed Tuples Augment Table Bootstrapping for
Relation Extraction
Slide 48
s headquarters in -based, Initial Seed TuplesOccurrences of
Seed Tuples Generate Extraction Patterns Generate New Seed Tuples
Augment Table Learned Patterns: Bootstrapping for Relation
Extraction (Cont)
Slide 49
Initial Seed TuplesOccurrences of Seed Tuples Generate
Extraction Patterns Generate New Seed Tuples Augment Table Generate
new seed tuples; start new iteration Bootstrapping for Relation
Extraction (Cont)
Slide 50
50 State-of-the-art: About 71% F-score on perfect mentions, and
50% F-score on system mentions Single human annotator: 84% F-score
on perfect mentions Remaining Challenges Context generalization to
reduce data sparsity Test: ABC's Sam Donaldson has recently been to
Mexico to see him Training: PHY relation ( arrived in, was
traveling to, ) Long context Davies is leaving to become chairman
of the London School of Economics, one of the best-known parts of
the University of London Disambiguate fine-grained types U.S.
citizens and U.S. businessman indicate GPE-AFF relation while U.S.
president indicates EMP-ORG relation Parsing errors
State-of-the-art and Remaining Challenges
Slide 51
School Attended: University of Houston Jim Parsons
eng-WL-11-174592-12943233 PER E0300113 per:date_of_birth per:age
per:country_of_birth per:city_of_birth Knowledge Base Population
(Slot Filling)
Slot Filling & Slot filler Validation Slot Filling SF
Definition: The slot filling task is to search a document
collection to fill in values for predefined slots (attributes) for
a given entity to populate a reference KB. Queries: 50 person
queries and 50 organization queries such as Marc Bolland and Public
Library of Science Response: Claim + Evidence 41 slot types single
or multiple attribute values Slot Filling Validation (SFV) 52 runs
from 18 SF teams
Slide 54
Extracting true claims from multiple sources Problems:
different information sources may generate claims with varied
trustability various SF systems may generate erroneous,
conflicting, redundant, complementary, ambiguously worded, or
inter- dependent claims from the same set of documents
Slide 55
Solution Truth Finding: Determine the veracity of multiple
conflicting claims from various sources and providers (i.e. systems
or humans)
Slide 56
Truth Finding Problem We require not only high-confidence
claims but also trustworthy evidence to verify them. deep
understanding is needed. Previous truth finding work assumed most
claims are likely to be true. Most of them relied on the wisdom of
the crowd. In SF, 72.02% responses are false. Certain truths might
only be discovered by a minority of systems or from a few sources
(62% from 1 or 2 systems)
Slide 57
Multi-dimensional truth-finding model (MTM)
Slide 58
Heuristics Explored in MTM Heuristic 1: A response is more
likely to be true if derived from many trustworthy sources. A
source is more likely to be trustworthy if many responses derived
from it are true. Heuristic 2: A response is more likely to be true
if it is extracted by many trustworthy systems. A system is more
likely to be trustworthy if many responses generated by it are
true.
Slide 59
Credibility Initialization
Slide 60
Credibility Propagation
Slide 61
Manually crafted/edited patterns: low coverage; expensive
Bootstrapping: hard to generalize; long-tail distribution Typical
Dependency patterns for per:place_of_birth nsubjpass-1 born prep_in
partmod born prep_in nsubjpass-1 born prep_on rcmod born prep_in
Missing some simple cases Charles Gwathmey [1] was born on June 19,
1938, in Charlotte [2], N.C.. Dependency path between [1] and [2]:
[ 'nsubjpass', 'born', 'prep_on', 'June', 'prep_in', 'N.C', 'nn') ]
Bottleneck: Low Coverage of Patterns
Slide 62
Typical Dependency Patterns for per:place_of_death nsubj-1 dies
prep_in nsubj-1 died prep_in nsubj-1 died prep_on nsubj-1 died
prep_in hospital nn Missing some simple cases ``60 Minutes'' was
the brainchild of Don Hewitt [1], the show 's longtime executive
producer who died Wednesday of pancreatic cancer at his home in
Bridgehampton, N.Y. [2], at age 86. Dependency path between [1] and
[2]: [ 'appos', "producer", 'nsubj', 'died', "who", 'rcmod',
'died', 'prep_at', 'home', 'prep_in] Bottleneck: Low Coverage of
Patterns
Slide 63
Deep Knowledge Acquisition: Nominal Coreference Almost
overnight, he became fabulously rich, with a $3-million book deal,
a $100,000 speech making fee, and a lucrative multifaceted
consulting business, Giuliani Partners. As a celebrity rainmaker
and lawyer, his income last year exceeded $17 million. His
consulting partners included seven of those who were with him on
9/11, and in 2002 Alan Placa, his boyhood pal, went to work at the
firm. After successful karting career in Europe, Perera became part
of the Toyota F1 Young Drivers Development Program and was a
Formula One test driver for the Japanese company in 2006. Alexandra
Burke is out with the video for her second single taken from the
British artists debut album a woman charged with running a
prostitution ring her business, Pamela Martin and Associates Our
Solution: Online knowledge graph construction; enrich paths with
semantic annotations and Information Extraction
(coreference/relation/event) Knowledge Gap 1
Slide 64
Deep Knowledge Acquisition: Implicit paraphrases &
long-tail distribution employee/member : Sutil, a trained pianist,
tested for Midland in 2006 and raced for Spyker in 2007 where he
scored one point in the Japanese Grand Prix. Daimler Chrysler
reports 2004 profits of $3.3 billion; Chrysler earns $1.9 billion.
In her second term, she received a seat on the powerful Ways and
Means Committee Jennifer Dunn was the face of the Washington state
Republican Party for more than two decades State of Residence:
Davis became Virginia's first Republican woman elected to Congress
in 2000, and she was a member of the House Armed Services Committee
and the Foreign Affairs Committee Buchwald lied about his age and
escaped into the Marine Corps. By 1942, Peterson was performing
with one of Canada's leading big bands, the Johnny Holmes
Orchestra. Even more: would join, would be appointed, will start
at, went to work, was transferred to, was recruited by, took over
as, succeeded PERSON, began to teach piano, spouse: Buchwald 's
1952 wedding -- Lena Horne arranged for it to be held in London 's
Westminster Cathedral -- was attended by Gene Kelly, John Huston,
Jose Ferrer, Perle Mesta and Rosemary Clooney, to name a few
Knowledge Gap 2
Slide 65
Linguistic Indicators: Knowledge Graph Construction Mays had
died sleep his home Tampa 50 June,28 amod nsubj aux prep_in poss
prep_at prep_of nn poss located_in {PER.Individual, NAM, Billy
Mays} Query {NUM } Per:age {Death-Trigger} {PER.Individual.PRO,
Mays} {FAC.Building-Grounds.NOM}
Slide 66
Linguistic Indicators Linguistic Indicators: (binary
classification result) Linguistic indicators make use of linguistic
features on varying levels - surface form, sentential syntax,
semantics, and pragmatics. Node Indicators Path Indicators
Interdependent Claims
Slide 67
Node Indicators Surface: stop words, lowercased Entity type,
subtype and mention type Fillers for org:top_employee Fillers for
org:website Entity attributes mined by the NELL system (Carlson et
al., 2010)
Slide 68
Path Indicators Trigger phrases Examples: top-employees: chief
executive officer, chief financial officer, chief operating
officer, chief strategy and development officer, chiev information
officer, e- commerce and security officer, headquarters: based,
headquarter, headquarters, 's Disease list from medical ontology
Relations and events: e.g. Start-Position indicates slot type:
per:employee_or_member_of Path length: e.g. the path length for
per:title is usually 1.
Slide 69
Independent Claims Indicators Conflicting slot fillers
Inter-dependent slot types: After initial credibility scores for
each response, we check whether evidence exists for any implied
claims. e.g.: Given A is Bs son and C is As sibling brother-> A
is Cs parent.
Slide 70
Inter-dependent Slots Query: Beverly Sills
Slide 71
Example: local structure for death related slots We already
know Beverly Sills, 78, died on Monday in Brookly, NY. Given the
knowledge graph of Paul Gillmor and a similar local structure, we
can predict the slot types of nodes.
Slide 72
Truth Finding Overall Performance
MethodsPrecisionRecallF-measureAccuracyMAP* 1.
Random28.64%50.48%36.54%50.54%34% 2.
Voting42.16%70.18%52.68%62.54%62% 3. Linguistic Indicators
50.24%70.69%58.73%72.29%60% 4. SVM (3+system+source)
56.59%48.72%52.36%75.86%56% 5. MTM (3+system+source)
53.94%72.11%61.72%81.57%70% MAP: Mean Average Precision
Slide 73
Truth Finding Efficiency
Slide 74
Enhance Individual SF Systems
Slide 75
75 Remaining Challenges Name Tagging Errors Coreference
Resolution Errors He worked his way up the organization under
founder Ted Arison and his son Micky, who now leads Carnival Corp.
and called Dickinson, `` one of the most influential people in the
development of the modern-day cruise industry. Indiana Muslim
running for Congress wants to combat ignorance about his [Andre
Carson] faith INDIANAPOLIS -- A convert to Islam stands an election
victory away from becoming the second Muslim elected to Congress
and a role model for a faith community seeking to make its mark in
national politics. Vague Justification It was in December 1970 that
Anderson criticized Hoover 's pretrial attack on two Roman Catholic
priests, Daniel J. and Philip F. Berrigan, who were later convicted
of destroying draft board records. religion filler? Fuzzy
Definition She and Russell Simmons, 50, have two daughters:
8-year-old Ming Lee and 5-year-old Aoki Lee.
Slide 76
76 Remaining Challenges Distinguish Slot Directions
Organization parent/subsidiary; members/member_of Implicit
Relations He [Pascal Yoadimnadji] has been evacuated to France on
Wednesday after falling ill and slipping into a coma in Chad,
Ambassador Moukhtar Wawa Dahab told The Associated Press. His wife,
who accompanied Yoadimnadji to Paris, will repatriate his body to
Chad, the amba. is he dead? in Paris? Until last week, Palin was
relatively unknown outside Alaska, and as facts have dribbled out
about her, the McCain campaign has insisted that its examination of
her background was thorough and that nothing that has come out
about her was a surprise. does she live in Alaska? The list says
that the state is owed $2,665,305 in personal income taxes by
singer Dionne Warwick of South Orange, N.J., with the tax lien
dating back to 1997. does she live in NJ? Vernon Bellecourt --
whose Ojibwe name, WaBun-Inini, means "Man of Dawn" or "Daybreak"
-- was born on the White Earth Indian Reservation in Minnesota. He
left home at 15 after finding work in a carnival. did he live in
Minnesota?